Assembly Language Syntax

The following context-free grammar (CFG) gives the syntax of legal assembly language programs. The start symbol for the CFG is Program. EOL is an end-of-line token and EOF indicates the end of the file. Square brackets indicate an optional item. Items in parentheses followed by "*" indicate 0 or more copies of those items. Items in parentheses followed by "+" indicate 1 or more copies of those items. Terminal symbols are displayed in bold and are surrounded by quotes. Terminal symbols are case sensitive. A range of characters is indicated by the first and last characters of the range separated by a dash. For example, the lowercase letters are indicated by "a-z". All characters in an assembly language program must be ASCII characters.

Note: In this grammar, we assume that the character starting a comment is a semicolon (;), the character ending a label is a colon (:), and the character beginning all pseudoinstructions is a period (.). Users can specify other characters to play those three roles in the Preferences Dialog.

Program -> [CommentsAndEOLs] EquMacroIncludePart InstructionPart EOF
CommentsAndEOLs -> ([Comment] EOL)+
Comment -> ";" <any-sequence-of-characters-not-including-EOF-or-EOL>
EquMacroIncludePart -> ((EquDeclaration | MacroDeclaration | Include) CommentsAndEOLs)*
Include -> ".include" <string-of-characters-not-including-EOL-or-EOF>
EquDeclaration -> Symbol "EQU" Operand
MacroDeclaration -> "MACRO" Symbol [Symbol ([","] Symbol)*] CommentsAndEOLs InstructionPart "ENDM"
InstructionPart -> ((RegularInstructionCall | DataPseudoinstructionCall | AsciiPseudoinstructionCall) CommentsAndEOLs)*
RegularInstructionCall -> (Label CommentsAndEOLs)* [Label] Symbol [Operand ([","] Operand)*]
DataPseudoinstructionCall -> (Label CommentsAndEOLs)* [Label] ".data" Operand [","] ( Operand | [Operand [","]] "[" [Operand ([","] Operand)*] "]")
AsciiPseudoinstructionCall -> (Label CommentsAndEOLs)* [Label] ".ascii" String
Label -> Symbol ":"
Operand -> Symbol | Literal
Literal -> [ "-" | "+" ] ( ( 0-9 )+ | "0x"( 0-9a-fA-F )+ | "0b"( 0 | 1 )+ | <single-quoted-character>)

Here is a summary of the parts of an assembly language program. The basic building blocks consist of the following items.

A literal is one of the following:
(a) a decimal integer,
(b) a hexadecimal integer (denoted by the prefix "0x", optionally preceded by a sign, followed by one or more of the characters 0-9, a-f, A-F),
(c) a binary integer (denoted by the prefix "0b", optionally preceded by a sign, followed by one or more 0's or 1's),
(d) a single character surrounded by single quotes.
Literals other than single-quoted characters can have an optional plus or minus sign in front. No commas or decimal points are allowed in literals. In the case of the single-quoted character, the value of the literal is the ASCII value of the character.
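
The literal forms above can be illustrated with a short Python sketch. It is not CPU Sim's own parser; the function name parse_literal is hypothetical. It follows the Literal rule of the grammar, which permits a sign in front of every form, including the single-quoted character, and it assumes the quoted character is a printable ASCII character.

```python
import re

# Mirrors the Literal rule: optional sign, then hex, binary, decimal,
# or a single-quoted character (assumed here to be printable ASCII).
_LITERAL = re.compile(
    r"(?P<sign>[+-])?"
    r"(?:0x(?P<hex>[0-9a-fA-F]+)"
    r"|0b(?P<bin>[01]+)"
    r"|(?P<dec>[0-9]+)"
    r"|'(?P<char>[ -~])')"
)


def parse_literal(text):
    """Return the integer value of a literal, or raise ValueError."""
    m = _LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a literal: {text!r}")
    if m.group("hex") is not None:
        value = int(m.group("hex"), 16)
    elif m.group("bin") is not None:
        value = int(m.group("bin"), 2)
    elif m.group("dec") is not None:
        value = int(m.group("dec"), 10)
    else:
        # A single-quoted character evaluates to its ASCII value.
        value = ord(m.group("char"))
    return -value if m.group("sign") == "-" else value
```

With this sketch, parse_literal("-0x1F") evaluates to -31 and parse_literal("'A'") to 65, while "1,000" and "1.5" are rejected because commas and decimal points are not allowed.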

A string is any sequence of characters surrounded by double quotes, such as "abcde", or surrounded by chevrons, as in <abcde>. Note that the sequence of characters inside the quotes cannot include the double-quote character, the characters inside the chevrons cannot include a chevron, and a string cannot contain the EOL character.
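
As a rough illustration of the string rules, the following Python sketch checks a token and returns its contents. The name parse_string is hypothetical, and two points are assumptions not stated above: an empty string is accepted, and neither "<" nor ">" may appear inside a chevron-delimited string.

```python
def parse_string(token):
    """Return the contents of a double-quoted or chevron-delimited string."""
    if len(token) < 2:
        raise ValueError(f"not a string: {token!r}")
    first, last, body = token[0], token[-1], token[1:-1]
    if first == '"' and last == '"':
        forbidden = '"'
    elif first == "<" and last == ">":
        forbidden = "<>"  # assumption: both chevron characters are excluded
    else:
        raise ValueError(f"not a string: {token!r}")
    # The delimiter may not appear inside, and a string may not span lines.
    if any(c in forbidden or c in "\r\n" for c in body):
        raise ValueError(f"illegal character inside string: {token!r}")
    return body
```

Both parse_string('"abcde"') and parse_string("<abcde>") return "abcde".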

A symbol consists of a sequence of one or more characters. These characters include all letters a-z and A-Z, the digits 0-9, and the set of punctuation characters specified by the user in the Preferences Dialog. Any character can start a symbol except a digit, a plus sign followed by a digit, or a minus sign followed by a digit, since the parser will parse a number instead of a symbol in those three cases. CPU Sim distinguishes between upper and lower case letters; hence, "Data" and "data" are considered different symbols.
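
The symbol rules can be sketched in Python as follows. The function name is_symbol and the default punctuation set "_+-" are assumptions made only for illustration; the real set is whatever the user specifies in the Preferences Dialog.

```python
import string


def is_symbol(text, punctuation="_+-"):
    """Check whether text is a legal symbol for the given punctuation set."""
    allowed = set(string.ascii_letters + string.digits + punctuation)
    if not text or any(c not in allowed for c in text):
        return False
    # A leading digit would be parsed as a number, not a symbol.
    if text[0] in string.digits:
        return False
    # So would a plus or minus sign immediately followed by a digit.
    if text[0] in "+-" and len(text) > 1 and text[1] in string.digits:
        return False
    return True
```

Under these assumptions, "Data", "data", and "-x" are accepted, while "9lives" and "-5x" are rejected.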

A label is a symbol followed immediately by a colon. The colon is just a separator and is not considered part of the label. A label and its colon may optionally appear on any line of an assembly language program, including lines that are otherwise blank or contain only comments. In the latter two cases, the assembler will treat the label as if it referred to the next regular instruction or data pseudoinstruction. Labels can be used as operands in statements.
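
The way a label on an otherwise empty line attaches to the next instruction can be sketched in Python. This is a simplified illustration, not the assembler's algorithm: the name resolve_labels is hypothetical, labels are mapped to instruction indices rather than memory addresses, comments are removed by splitting at the first semicolon (which would mishandle a semicolon inside a string or character literal), and the instruction names in the usage note are invented.

```python
def resolve_labels(lines):
    """Map each label to the index of the instruction it refers to."""
    labels = {}
    pending = []        # labels seen but not yet attached to an instruction
    instructions = []
    for line in lines:
        text = line.split(";", 1)[0].strip()   # drop the comment, if any
        head, colon, rest = text.partition(":")
        # Treat "name:" at the start of the line as a label.
        if colon and head and " " not in head and "\t" not in head:
            pending.append(head)
            text = rest.strip()
        if text:
            # Every waiting label refers to this instruction.
            for name in pending:
                labels[name] = len(instructions)
            pending.clear()
            instructions.append(text)
    return labels, instructions
```

Given the lines "start: ; entry point", "", "loop: load x", "jump loop", and "x: .data 2 0", both start and loop map to index 0 and x maps to index 2.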

A comment is any sequence of characters preceded by a semicolon ";" and ending at the end of the line. Any line of a program can contain a comment. Blank lines and lines containing only comments are also allowed in assembly language programs; the assembler ignores them and discards them when the program is assembled. Lines containing regular instructions and data pseudoinstructions, including the comments on the ends of those lines, are saved and appear after each line of the assembled program in the Comments column of the RAM window into which the assembled program is loaded.
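
The distinction between lines that are kept and lines that are discarded can be sketched in Python. The name comments_column is hypothetical, and the sketch is deliberately naive: it looks only for the first comment character, so it does not account for a semicolon inside a string or character literal, and it does not treat label-only lines specially.

```python
def comments_column(lines, comment_char=";"):
    """Return the source lines that would be kept alongside assembled code."""
    kept = []
    for line in lines:
        code = line.partition(comment_char)[0]
        # Blank lines and comment-only lines are discarded.
        if code.strip():
            kept.append(line.strip())   # keep the line with its comment
    return kept
```

For the input "; header", "", "load x ; fetch x", and "stop" (with invented instruction names), only the last two lines are kept, and the third retains its trailing comment.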

A program consists of two parts. The first part contains any number of EQU declarations, include directives, and macro definitions in any order. The second part contains any number of regular instructions, data pseudoinstructions, and ascii pseudoinstructions, one per line.

Note: To separate tokens in assembly language programs that the assembler would otherwise treat as one token, use one or more spaces or tab characters.