Programming the Basic Computer
The Assembler

Assembler Overview

An assembler is a program that converts a symbolic language program, known as the source program, into its binary machine language equivalent, referred to as the object program. The assembler processes character strings and translates them into a binary format that can be executed by the computer.

Representation of Symbolic Program in Memory

Before the assembly process begins, the symbolic program must be loaded into memory. This is typically done by the user typing the symbolic program on a terminal. A loader program then inputs these characters into memory using an alphanumeric character code. Each character is represented by an 8-bit code where the high-order bit is always 0, and the remaining seven bits follow the ASCII standard.

A line of code in the symbolic program is stored in consecutive memory locations with two characters per memory word (16 bits per word). Each character occupies 8 bits, enabling two characters to fit in one word. Symbols such as labels, operations, and addresses are terminated by specific codes: labels with a comma, operations and addresses with a space, and the end of a line with a carriage return (CR) code.

Example

Consider the following line of code:

PL3 LDA SUB I

This line will be stored in memory as follows:

Memory WordSymbolHexadecimal CodeBinary Representation
1P L50 4C0101 0000 0100 1100
23 ,33 2C0011 0011 0010 1100
3L D4C 440100 1100 0100 0100
4A ' '41 200100 0001 0010 0000
5S U53 550101 0011 0101 0101
6B ' '42 200100 0010 0010 0000
7I CR49 0D0100 1001 0000 1101

The CR code is produced when the return key is pressed, signaling the end of the line.

Assembly Process

The assembler program takes the user's symbolic language program in ASCII and processes it in two passes to produce the equivalent binary program.

First Pass

During the first pass, the assembler generates a table correlating all user-defined address symbols with their binary equivalents. This involves:

  1. Initializing the Location Counter (LC): The LC is set to 0 initially and keeps track of the memory location for each instruction or operand being processed.
  2. Processing Each Line of Code:
    • If the line contains a label, it is stored in the address symbol table along with the current LC value.
    • If the line contains an ORG pseudoinstruction, LC is set to the value that follows ORG.
    • If the line contains an END pseudoinstruction, the first pass ends.
Hello

Example of Address Symbol Table

For a program with labels such as MIN, SUB, and DIF, the address symbol table might look like this:

Memory WordSymbolHexadecimal CodeBinary Representation
1MI4D 490100 1101 0100 1001
2N,4E 2C0100 1110 0010 1100
3(LC)01 060000 0001 0000 0110
4SU53 550101 0011 0101 0101
5B'42 2C0100 0010 0010 1100
6(LC)01 070000 0001 0000 0111
7DI44 490100 0100 0100 1001
8F'46 2C0100 0110 0010 1100
9(LC)01 080000 0001 0000 1000

Second Pass

In the second pass, the assembler translates machine instructions using table-lookup procedures. The assembler uses four main tables:

  1. Pseudoinstruction Table: Contains symbols like ORG, END, DEC, and HEX.
  2. MRI Table: Contains memory-reference instruction symbols and their 3-bit opcode equivalents.
  3. Non-MRI Table: Contains register-reference and input-output instruction symbols and their 16-bit binary codes.
  4. Address Symbol Table: Generated during the first pass, it maps symbolic addresses to their binary equivalents.

The assembler processes each line of code, referring to these tables to find and convert symbols to their binary equivalents. Instructions are assembled by combining the opcode with the binary address and any necessary bits.

Hello

Error Diagnostics

The assembler also checks for possible errors in the symbolic program:

  • Invalid Machine Code Symbols: Detected when a symbol is absent in the MRI and non-MRI tables.
  • Undefined Symbols: Occur when a symbolic address does not appear as a label in the program.

For each error, the assembler prints an error message indicating the line of code where the error occurred.

Practical Considerations

While the described assembler provides a basic understanding, practical assemblers are often more complex. They may offer more flexibility, such as allowing addresses to be specified by numbers or symbols, supporting arithmetic expressions in addresses, and including more pseudoinstructions to aid programming. As the sophistication of the assembly language increases, so does the complexity of the assembler.

This documentation provides a comprehensive overview of how an assembler operates, emphasizing its processes, data structures, and error-handling mechanisms.