Chapter 3: Scanning—Theory and Practice


Getting Started

Abstract: In this assignment you will implement a simple finite-state transducer, which uses the actions of a finite-state automaton to perform a simple language translation. You will also gain experience with the notation used by CUP (Constructor of Useful Parsers). Specifically, you will design and implement a deterministic finite-state automaton that recognizes fully-declared right-linear CUP grammar specifications. The output (transduction) from your program consists of node and edge declarations for a finite-state automaton.

As usual, you will find a TestFiles subdirectory for this assignment. That directory contains a collection of files you can use to test your solution. While your solution is expected to work on these inputs, your code may be evaluated on other, secret inputs. You are encouraged to test your program to ensure that it operates reasonably on any possible input.

Behold the general structure of a CUP specification:

Symbol Declarations:
        terminal stringtype lparen, rparen, digit;
    non terminal inttype    Number;
    non terminal stringtype Parens;

Start Declaration:
    start with Number;

Rules:
    Number ::= Number digit
             | digit
             | Parens
             ;

    Parens ::= lparen rparen;

The phrases terminal and non terminal declare terminal and nonterminal symbols, respectively, and start with declares the start symbol. In a terminal or non terminal declaration, the word immediately following the phrase is not a symbol being declared; it is the semantic type of the symbols declared on that line. So, in the example above, stringtype and inttype are object types, while lparen, rparen, digit, Number, and Parens are the declared symbols. The distribution of white space is arbitrary, but the example above was constructed to show good formatting style.
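To make the type-versus-symbol rule concrete, here is a minimal Java sketch that splits one declaration, already reduced to its non-blank words, into its kind, its semantic type, and the symbols it declares. The classes `Declaration` and `DeclarationSketch` and the list-of-strings input are hypothetical; your solution is expected to do this work in the finite-state machine's tables, not with a separate parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type: the kind, semantic type, and symbols of one declaration.
class Declaration {
    final boolean terminal;      // true for "terminal", false for "non terminal"
    final String type;           // semantic type, e.g. "stringtype"
    final List<String> symbols;  // declared symbol names, in order

    Declaration(boolean terminal, String type, List<String> symbols) {
        this.terminal = terminal;
        this.type = type;
        this.symbols = symbols;
    }
}

class DeclarationSketch {
    // words: the non-blank tokens of one declaration, for example
    // ["non", "terminal", "obj", "Rest", ",", "D", ";"]
    static Declaration parse(List<String> words) {
        int i = 0;
        boolean terminal = true;
        if (words.get(i).equals("non")) {        // "non terminal" declares nonterminals
            terminal = false;
            i++;
        }
        if (i >= words.size() || !words.get(i).equals("terminal")) {
            throw new IllegalArgumentException("expected 'terminal'");
        }
        i++;
        if (i >= words.size()) {
            throw new IllegalArgumentException("missing semantic type");
        }
        String type = words.get(i++);            // the word after the keyword is the type
        List<String> symbols = new ArrayList<>();
        boolean closed = false;
        while (!closed) {                        // symbol, then ',' or ';', repeated
            if (i + 1 >= words.size()) {
                throw new IllegalArgumentException("missing ';'");
            }
            symbols.add(words.get(i));
            String sep = words.get(i + 1);
            i += 2;
            if (sep.equals(";")) {
                closed = true;
            } else if (!sep.equals(",")) {
                throw new IllegalArgumentException("expected ',' or ';' but saw " + sep);
            }
        }
        return new Declaration(terminal, type, symbols);
    }

    public static void main(String[] args) {
        Declaration d = parse(List.of("non", "terminal", "obj", "Rest", ",", "D", ";"));
        System.out.println((d.terminal ? "terminal " : "nonterminal ") + d.type + " " + d.symbols);
    }
}
```

Running `main` prints `nonterminal obj [Rest, D]`: obj is the type, and only Rest and D are declared.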

Upon encountering a terminal symbol, your code should record the symbol as terminal in the symbol table by issuing:

Similarly, nonterminals are recorded by issuing:

For this assignment, you will modify only the file, and that is the only file you will turn in.

In your submitted file, be sure to document your approach, including how you decided to treat keywords such as non, with, etc.

The file as supplied contains a skeletal finite-state machine (simulator), driven by the tables GOTO[][] and ACTION[][]. Your assignment is to expand and fill in these tables appropriately, and to add code in the switch statement to perform the required translation.
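For orientation, the following Java sketch shows the general shape of such a table-driven simulator: `GOTO[state][token]` names the next state, and `ACTION[state][token]` selects a case in a `switch`. The token codes, state numbers, the `ERROR` marker, and the toy language it accepts (a symbol, then zero or more `, symbol`, then `;`) are invented for illustration and are not taken from the supplied skeleton.

```java
import java.util.ArrayList;
import java.util.List;

class TableFsaSketch {
    // Hypothetical token codes (the real ones are defined in the supplied file).
    static final int STR = 0, COMMA = 1, SEMI = 2;
    static final int ERROR = -1;                   // GOTO entry meaning "no transition"

    // Hypothetical action codes, dispatched by the switch in run().
    static final int NONE = 0, RECORD = 1, FINISH = 2;

    // States: 0 = expect symbol, 1 = expect ',' or ';', 2 = done (accepting).
    static final int[][] GOTO = {
        //  STR     COMMA   SEMI
        {   1,      ERROR,  ERROR },               // state 0
        {   ERROR,  0,      2     },               // state 1
        {   ERROR,  ERROR,  ERROR },               // state 2
    };
    static final int[][] ACTION = {
        {   RECORD, NONE,   NONE   },
        {   NONE,   NONE,   FINISH },
        {   NONE,   NONE,   NONE   },
    };

    // Runs the machine over parallel arrays of token codes and token text,
    // returning what the actions recorded; throws if no transition exists.
    static List<String> run(int[] types, String[] text) {
        List<String> recorded = new ArrayList<>();
        int state = 0;
        for (int k = 0; k < types.length; k++) {
            int t = types[k];
            int next = GOTO[state][t];
            if (next == ERROR) {
                throw new IllegalStateException("unexpected token " + t + " in state " + state);
            }
            switch (ACTION[state][t]) {
                case RECORD: recorded.add(text[k]); break;   // remember the symbol
                case FINISH: recorded.add("<end>"); break;   // mark the closing ';'
                default:     break;                          // NONE: only change state
            }
            state = next;
        }
        if (state != 2) {
            throw new IllegalStateException("input ended in state " + state);
        }
        return recorded;
    }

    public static void main(String[] args) {
        int[] types = { STR, COMMA, STR, SEMI };
        String[] text = { "plus", ",", "minus", ";" };
        System.out.println(run(types, text));               // [plus, minus, <end>]
    }
}
```

Keeping transitions and actions in separate tables means the `switch` only needs cases for steps that produce output; everything else about the machine lives in the tables.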

While processing the rules section, you will check each production for right-linear form. That is, the left side of each rule must be a single nonterminal, and each alternative on the right side must be a terminal followed by at most one nonterminal. To determine the type of a symbol previously entered into the symbol table, invoke

which will return true if symbol is terminal, and false otherwise. The method may throw the error SymbolNotFoundError if the symbol was not recorded.
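The right-linear check can be sketched in Java as follows. The `SymbolTable` class, its `declare` and `isTerminal` methods, and the local `SymbolNotFoundError` class are stand-ins written for this example (the real lookup method and error class come with the supplied file); the sketch only assumes that the lookup returns true for terminals and throws for undeclared symbols, as described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the assignment's error class; the real one ships with the skeleton.
class SymbolNotFoundError extends Error {
    SymbolNotFoundError(String sym) { super("undeclared symbol: " + sym); }
}

// Hypothetical symbol table mapping each declared symbol to "is it a terminal?".
class SymbolTable {
    private final Map<String, Boolean> kinds = new HashMap<>();

    void declare(String sym, boolean terminal) { kinds.put(sym, terminal); }

    boolean isTerminal(String sym) {
        Boolean k = kinds.get(sym);
        if (k == null) throw new SymbolNotFoundError(sym);
        return k;
    }
}

class RightLinear {
    // lhs ::= rhs is right-linear when lhs is a nonterminal and rhs is one
    // terminal optionally followed by one nonterminal.
    static boolean check(SymbolTable st, String lhs, List<String> rhs) {
        if (st.isTerminal(lhs)) return false;
        if (rhs.size() < 1 || rhs.size() > 2) return false;
        if (!st.isTerminal(rhs.get(0))) return false;
        return rhs.size() == 1 || !st.isTerminal(rhs.get(1));
    }

    public static void main(String[] args) {
        SymbolTable st = new SymbolTable();
        st.declare("S", false);
        st.declare("Rest", false);
        st.declare("zero", true);
        System.out.println(check(st, "S", List.of("zero", "Rest")));   // true
        System.out.println(check(st, "S", List.of("Rest", "zero")));   // false
    }
}
```

An undeclared symbol anywhere in the rule propagates the `SymbolNotFoundError` to the caller, which is where you would report it.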

To assist you, the Fsa constructor is passed an Enumeration that is the token stream. There is no end-of-input token: the token stream is exhausted when the Enumeration has no more elements. Each element of the Enumeration is a Token x such that x.type() is one of the following:

Token.Blank     A string consisting of blanks, tabs, and newlines. Although the scanner tries to return the longest such string it can, in some situations it returns multiple consecutive Blank tokens. Your finite-state machine must accommodate this.
Token.Define    The character sequence ::=
Token.Semi      The character ;
Token.Or        The character |
Token.Comma     The character ,
Token.Str       A string of characters not covered by any other token. The actual string is accessible as x.strValue() for token x.
Token.Terminal  The string terminal
Token.Non       The string non
Token.Start     The string start
Token.With      The string with
Token.Other     Anything else (always an error)

The tokens above are conveniently numbered 0 through 10; see the file for the correspondence.
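Assuming the token stream is a `java.util.Enumeration<Token>`, it is drained with `hasMoreElements()` and `nextElement()`. The sketch below uses a hypothetical stand-in `Token` class (its numeric codes are invented) and discards Blank tokens, which shows why a run of Blanks split across several tokens needs no special handling: in table form this corresponds to a Blank transition that leaves the state unchanged. Treating whitespace as insignificant everywhere is itself an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical stand-in for the assignment's Token class. The real numeric
// codes (0 through 10) are defined in the supplied file and may differ.
class Token {
    static final int Blank = 0, Define = 1, Semi = 2, Or = 3, Comma = 4, Str = 5,
                     Terminal = 6, Non = 7, Start = 8, With = 9, Other = 10;
    private final int type;
    private final String text;

    Token(int type, String text) { this.type = type; this.text = text; }
    int type() { return type; }
    String strValue() { return text; }
}

class BlankSkipping {
    // Drains the stream until hasMoreElements() is false (there is no
    // end-of-input token) and keeps only non-Blank tokens, so a run of
    // Blanks has the same effect whether it arrives as one token or several.
    static List<Token> nonBlank(Enumeration<Token> tokens) {
        List<Token> out = new ArrayList<>();
        while (tokens.hasMoreElements()) {
            Token x = tokens.nextElement();
            if (x.type() == Token.Blank) {
                continue;                      // like a self-loop: no state change
            }
            out.add(x);
        }
        return out;
    }

    public static void main(String[] args) {
        // "non terminal obj S;" with the blank after "non" split into two tokens.
        List<Token> input = List.of(
            new Token(Token.Non, "non"), new Token(Token.Blank, " "),
            new Token(Token.Blank, "\t"), new Token(Token.Terminal, "terminal"),
            new Token(Token.Blank, " "), new Token(Token.Str, "obj"),
            new Token(Token.Blank, " "), new Token(Token.Str, "S"),
            new Token(Token.Semi, ";"));
        List<String> words = new ArrayList<>();
        for (Token t : nonBlank(Collections.enumeration(input))) {
            words.add(t.strValue());
        }
        System.out.println(String.join(" ", words));   // non terminal obj S ;
    }
}
```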

Sample input:

non terminal obj S;
    terminal obj plus, minus, zero, one;
non terminal obj Rest, D;
    terminal obj bogus, not, really, used;
start with S;
S	::= zero Rest
	|   one Rest
	;

Rest	::= plus    D
	|   minus   D
	;

D	::= zero
	|   one
	|   really
	;

Sample output:

Start S
Edge S Rest zero
Edge S Rest one
Edge Rest D plus
Edge Rest D minus
Edge D $FINAL$ zero
Edge D $FINAL$ one
Edge D $FINAL$ really

Expected output for the lab's inputs can be found in the Solns folder of the download.
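Judging from the sample, a single `Start` line names the start symbol, each alternative `A ::= t B` becomes `Edge A B t`, and each alternative `A ::= t` becomes `Edge A $FINAL$ t`. The Java sketch below reproduces the sample output from a hypothetical in-memory list of alternatives (the `TransductionSketch` and `Alt` classes are invented for this example); in the assignment itself, each line would instead be printed from the `switch` as the corresponding production is recognized.

```java
import java.util.ArrayList;
import java.util.List;

class TransductionSketch {
    // Hypothetical representation of one alternative of a right-linear rule:
    // lhs ::= terminal [next], where next == null means "no trailing nonterminal".
    static final class Alt {
        final String lhs, terminal, next;
        Alt(String lhs, String terminal, String next) {
            this.lhs = lhs;
            this.terminal = terminal;
            this.next = next;
        }
    }

    // Emits one "Start" line, then one "Edge from to label" line per alternative;
    // an alternative without a trailing nonterminal goes to the $FINAL$ node.
    static List<String> emit(String start, List<Alt> alts) {
        List<String> out = new ArrayList<>();
        out.add("Start " + start);
        for (Alt a : alts) {
            String to = (a.next == null) ? "$FINAL$" : a.next;
            out.add("Edge " + a.lhs + " " + to + " " + a.terminal);
        }
        return out;
    }

    public static void main(String[] args) {
        // The productions of the sample input, in order.
        List<Alt> alts = List.of(
            new Alt("S", "zero", "Rest"), new Alt("S", "one", "Rest"),
            new Alt("Rest", "plus", "D"), new Alt("Rest", "minus", "D"),
            new Alt("D", "zero", null), new Alt("D", "one", null),
            new Alt("D", "really", null));
        emit("S", alts).forEach(System.out::println);
    }
}
```

Running `main` prints exactly the eight lines of the sample output above.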