Chapter 3 Strings

Fig-Forth v2 provides a variety of string-oriented functions for your programs, the most basic of which is the ." (dot-quote) word. When input, this word scans the input stream until a closing quote mark is found, adding the string either to the dictionary space or to the end of the definition in progress, depending upon the active mode. When run, the enclosed text is sent to the output device.

          : SAY-HELLO ." This is a sample of a string to be printed." ;

The above shows a typical display line, which can be used for program titles, user prompts, or program results. Note the space between the dot-quote word and the first letter of the enclosed text: it separates ." from the text so that Forth recognizes dot-quote as a word, which then parses the string that follows.

The second string function in Fig-Forth v2 is the ," (comma-quote) word, which adds the text to the current definition in the same manner as dot-quote but, upon execution, returns only the address of the string itself. This is most useful in block open statements for files and commands, or in building strings whose content is expected to change. For DOS function calls, the comma-quote word also appends a null terminator to the string, which limits the contained text to 254 characters.

The format of all strings saved using these words is <kernel-function> <count byte> <text characters>, and the 8-bit count limits them to 255 characters including any terminator.

Additional printing words are CR, SPACE and SPACES for a carriage return, single space or multiple spaces respectively.
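As a rough sketch of how these printing words combine with dot-quote, the hypothetical word BANNER below prints a two-line heading. The name BANNER and the text are illustrative only, and SPACES is assumed to take its count from the stack as in other fig-Forth systems.

```forth
( BANNER is a hypothetical word used only for illustration )
: BANNER
   CR ." Inventory Report"      ( start a new line, then print the title )
   CR 4 SPACES ." page 1" ;     ( new line, indent four spaces, print the text )

BANNER
```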

Strings in Arrays

Strings can also be saved into arrays of variable sizes and entered from the keyboard through the EXPECT command. The EXPECT word requires two values: the address of the array or string space set aside in memory, and the maximum number of characters to fetch from the keyboard. Note that EXPECT terminates the input with a full word (two bytes) of zero, so that requesting an input of 25 characters will consume a maximum of 27 bytes. Simple editing functions are available through EXPECT, and the function exits automatically when the buffer is full or a carriage return is entered. The number of characters received, including any terminator, is stored in the system variable HLD.
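A minimal sketch of keyboard entry with EXPECT follows. It assumes that the ARRAY defining word (used later in this chapter) reserves its size in bytes and that the array name returns the buffer address. NAME-BUF and GET-NAME are hypothetical names.

```forth
27 ARRAY NAME-BUF              ( 25 characters plus the two-byte zero terminator )

: GET-NAME
   CR ." Enter a name: "
   NAME-BUF 25 EXPECT ;        ( buffer address, then the maximum character count )
```

Sizing the array two bytes larger than the requested count leaves room for the terminator that EXPECT appends.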

Basic String Functions

Strings saved in arrays or set-aside areas, rather than by dot-quote or comma-quote, have a variety of function options, the simplest being the TYPE word. TYPE expects two values just like EXPECT: the address of the string and the length of its content. For ease of use, the length of any counted string may be found with the COUNT word, which retrieves the first byte of the string and adjusts the address to point to the first character. Thus a sequence like this would emulate the dot-quote word:

          : MYLINE ," This string will be printed." COUNT TYPE ;

Strings saved in arrays or comma-quote words may be filled with nulls by the ERASE word, which accepts a memory address and the number of bytes to fill. The BLANKS word performs a similar function, filling the memory area with spaces. Finally, the FILL word accepts the memory address, the number of bytes to fill, and the character byte to place into the string area.

          50 ARRAY ASCII-A ASCII-A 50 65 FILL -- fills the array with "AAAAA..."
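The three fill words can be compared side by side in a short sketch. WORK-AREA is a hypothetical array name, and the array size is assumed to be counted in bytes.

```forth
50 ARRAY WORK-AREA        ( hypothetical fifty-byte work area )

WORK-AREA 50 ERASE        ( every byte becomes a null )
WORK-AREA 50 BLANKS       ( every byte becomes a space )
WORK-AREA 50 42 FILL      ( every byte becomes an asterisk, ASCII 42 )
```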

Strings may be moved by the STR> word once their addresses are known, and may optionally be manipulated by the memory words defined in Table 3-1. The STR> word takes two addresses and moves the counted string from the first location to the second, including its count byte:

          NAME1 NAME2 STR> -- move contents of counted string buffer name1 to name2

The CMOVE and -CMOVE words, on the other hand, require the source and destination addresses and the number of bytes to move. The direction of the move is indicated by the name: the hyphenated version moves the data starting from the tail of the buffers specified. To allow access to all of system memory, Fig-Forth v2 provides segmented extensions of these words as LMOVE and -LMOVE. All of these words operate as outlined in the table below.

Table 3-1. String and Memory moves.

word      stack                       operation
STR>      adr1 adr2 --                moves counted string from adr1 to adr2
CMOVE     adr1 adr2 n --              moves n bytes from adr1 to adr2
-CMOVE    adr1 adr2 n --              moves n bytes backward from adr1 to adr2
LMOVE     adr1 seg1 adr2 seg2 n --    moves n bytes from seg1:adr1 to seg2:adr2
-LMOVE    adr1 seg1 adr2 seg2 n --    moves n bytes backward from seg1:adr1 to seg2:adr2
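As an illustration of the stack order given in the table, the sketch below copies one buffer to another with CMOVE. SRC-BUF and DST-BUF are hypothetical names, and the arrays are assumed to be sized in bytes.

```forth
30 ARRAY SRC-BUF                ( hypothetical source buffer )
30 ARRAY DST-BUF                ( hypothetical destination buffer )

SRC-BUF 30 65 FILL              ( fill the source with the letter A )
SRC-BUF DST-BUF 30 CMOVE        ( source address, destination address, byte count )
DST-BUF 30 TYPE                 ( display the copy to confirm the move )
```

When the two regions overlap and the destination lies above the source, the backward-moving -CMOVE is normally the safer choice, since it copies from the tail first.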

Advanced String Functions

Fig-Forth v2 offers 11 advanced string operations, some of which are specialized to the compiler activity. These functions are DIGIT, ENCLOSE, FIND, FIND$, INTERPRET, NUMBER, WORD, -TRAILING, >UPPER, $= and SIZE$. They are briefly defined below.

The DIGIT word is used to convert an ASCII character to a numeric value for the current or a selected number base within Fig-Forth. The unknown character is placed upon the parameter stack followed by the number base desired, then DIGIT is executed. Upon return, the stack contains a false flag if the character does not qualify as a digit in the selected base, or the converted binary value and a true flag if it does.
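The two possible results can be handled with a simple conditional, sketched below. SHOW-DIGIT is a hypothetical helper, and the sketch assumes the flag is returned on top of the stack with the converted value beneath it, as in other fig-Forth systems.

```forth
( SHOW-DIGIT is a hypothetical helper: char base -- )
: SHOW-DIGIT
   DIGIT
   IF   ." digit value: " .               ( true flag: print the converted value beneath it )
   ELSE ." not a digit in this base"      ( false flag: nothing else was left on the stack )
   THEN ;

65 16 SHOW-DIGIT      ( the letter A tested as a hexadecimal digit )
71 16 SHOW-DIGIT      ( the letter G tested as a hexadecimal digit )
```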

ENCLOSE is the input parse function used by the compiler, accepting a buffer address to scan and the parse character. The values returned are the address of the buffer and counts that reflect the first non-delimiter character, and the length of the input to the next delimiter.

The FIND word searches the current vocabulary path for the counted string given it, returning a zero and the string address if the word is not in the dictionary, or the token address of the function and a flag value if it is located. The flag value is -1 if the word definition is immediate, or a 1 if not. Note that the FIND word is ANS Forth compliant and therefore does not recognize the dotted operations mentioned in Chapters 2 or 6.
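A small sketch of testing for a word follows. DUP-KNOWN? is a hypothetical name, and the sketch assumes that FIND leaves its flag on top of the stack (as in ANS Forth) and that a comma-quote string is acceptable as the counted string argument.

```forth
( DUP-KNOWN? is a hypothetical word used only for illustration )
: DUP-KNOWN?
   ," DUP" FIND                              ( counted string address in, result and flag out )
   IF   DROP ." DUP is in the dictionary"    ( nonzero flag: discard the token address )
   ELSE DROP ." DUP was not found"           ( zero flag: discard the string address )
   THEN ;
```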

The FIND$ word searches for a string within a memory block. The string sought may be counted or not, but must be terminated either by a null byte or by setting the highest bit of its last character. FIND$ expects three values: the buffer address to search, the address of the string to find, and the length of the buffer to scan. Upon return, the addresses are advanced to the point where the search ended and are accompanied by the number of bytes remaining in the buffer. A zero top of stack value indicates the search failed.

NUMBER accepts a counted string's address and attempts to convert the ASCII characters found into a valid double word number for the current system base. The mathematical base is stored in the system variable BASE, and the NUMBER process will recognize and accept the Ampersand (&) override character to temporarily change the base to Hexadecimal. Before processing, the string is converted to its uppercase equivalent regardless of the mode set by CASELOCK, then the string is parsed one character at a time to create the value. The Decimal Point Location variable (DPL) is set to the point where a period was found within the string, or to -1 if no such character was found. Additional notes: 1) The Fig-Forth command system will discard the top word of the double integer formed by NUMBER if no decimal point is located. 2) See the Compatibility Appendix for changes in the operation of this word as of version 2.25.

The -TRAILING word adjusts the count of a string given its address and current count, eliminating from the count value any trailing spaces the string may include. It does not change the string in any fashion, including the count byte; it only changes the count value located on the stack.
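A brief sketch of -TRAILING in front of TYPE follows. DEMO-FIELD is a hypothetical array name, and the array size is assumed to be counted in bytes.

```forth
20 ARRAY DEMO-FIELD              ( hypothetical twenty-byte field )

DEMO-FIELD 20 BLANKS             ( start with twenty spaces )
DEMO-FIELD 5 65 FILL             ( overwrite the first five bytes with the letter A )
DEMO-FIELD 20 -TRAILING TYPE     ( trim the count on the stack, then display what remains )
```

Because only the stack value changes, the field itself still holds all twenty bytes after the call.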

The >UPPER word accepts a string address and converts the letters a to z in that string to uppercase. The string is expected to contain its length count as the first byte value.
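The sketch below applies >UPPER to a comma-quote string before printing it. SHOUT is a hypothetical name, and the sketch assumes that >UPPER consumes the address and leaves nothing on the stack.

```forth
( SHOUT is a hypothetical word used only for illustration )
: SHOUT
   ," make this loud"     ( leaves the address of the counted string )
   DUP >UPPER             ( convert the string in place, keeping a copy of the address )
   COUNT TYPE ;           ( display the converted text )
```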

The $= word compares two strings for equivalence. The two strings may or may not be counted strings, because $= requires both string addresses and the count of characters to be compared, minus one. If the two strings are equal, a zero is returned; if the first is higher on the ASCII table or shorter than the second string, a -1 is returned; and if the first string is lower on the ASCII table or longer than the second, a 1 is returned. The routine deems a string shorter when it has a zero termination character.
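A hedged sketch of an equality test with $= follows. FRUIT-MATCH? is a hypothetical name, and the sketch assumes the stack order is first address, second address, then the count minus one, with a single result value returned.

```forth
( FRUIT-MATCH? is a hypothetical word used only for illustration )
: FRUIT-MATCH?
   ," APPLE" 1+           ( address of the first character, skipping the count byte )
   ," APPLE" 1+           ( the same for the second string )
   4 $=                   ( five characters to compare, so pass five minus one )
   0= IF ." equal" ELSE ." different" THEN ;
```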

The SIZE$ word is used to determine the printing size of a counted string for the Font Printer. See Chapter 8.

The WORD function calls the ENCLOSE operation listed above for the current input stream buffer, using the parse character given to extract the next term. Leading terminator characters are ignored, and the resulting counted string is placed at the end of the dictionary (HERE) for further processing. For example, to parse out the next text word, use the sequence 32 WORD. (32 is the ASCII space character.) The source location of the input stream is stored in the system variable TIB, with the current offset within that buffer being stored in the system variable IN. When reading from a disk file, the input location is saved in the system variable BLK, which is the numeric block of the file input to be parsed, while the system variable IN points to the offset within that block.

INTERPRET performs the operation of WORD defined above, in addition to operating upon the words parsed from the input stream. Using this function, direct commands can be given to the compiler, whether accepted from the keyboard or an input file, or constructed within a string space of any size or type. See the Black-Jack game example in Part 2, as well as the definition of RUN$. INTERPRET will process the contents of the current input buffer until it becomes empty, at which time control will return to the input processor or the application program as defined by the events executed.
