In this lab assignment, you will develop (in C++ in Visual Studio 2013 on the Windows lab machines) a new executable program that processes command line arguments, reads expressions from a file, and parses them. The emphasis for this assignment is on the internal processing, representation, and matching activities the program will perform. Specifically, in this assignment, you will extend your solution from the previous lab assignment to (1) read and parse an input file as a stream of Horn clauses according to the parenthesized notation described by the input grammar defined below; (2) construct and store representations of those elements according to their types and other attributes; and (3) perform basic matching operations to see if different instances of those elements are compatible, based on the idea of unification from logic programming.
In the next (and final) lab assignment this semester, we will extend this program to perform resolution, further semantic evaluation, and query processing of Horn clauses. The design and implementation decisions you make in this assignment will also affect the design and implementation decisions for the next lab assignment, so you are encouraged to think ahead where possible as you design your solution to this assignment.
As you implement your solution to this lab, please feel free to declare and define new
structs, classes, enumerations, functions, and other types that are helpful in the
design and implementation of your lab solution. As you do that please maintain good
modularity of your implementation at the file level as well as at the type level, e.g., giving
each major abstraction like a class and its related functions and enumerations its own separate
header (.h) file and source (.cpp) file.
hornclause -> LEFTPAREN head [body] RIGHTPAREN
head -> predicate
body -> LEFTPAREN predicate {predicate} RIGHTPAREN
predicate -> LEFTPAREN name {symbol} RIGHTPAREN
name -> LABEL
symbol -> LABEL | NUMBER
For this lab assignment, the parsing should largely ignore the line breaks in the input file as long as the current parse is succeeding, so that a well-formed Horn clause is allowed to span multiple lines of the input file, and the end of one well formed Horn clause and the beginning of another well formed Horn clause both may be within the same line of the input file.
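One way to obtain this line-break-insensitive behavior is sketched below as a small recursive-descent recognizer for the grammar above. It assumes (as in the example input later in this assignment) that tokens are separated by whitespace, so that the stream extraction operator >> can be used to read them; the function names tokenize, parsePredicate, parseBody, parseHornClause, and countClauses are hypothetical, and the skip-one-token recovery in countClauses is only one of the possible policies.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-delimited tokens. operator>> skips spaces and newlines
// alike, so a clause may span lines or share a line with another clause.
std::vector<std::string> tokenize(std::istream& in) {
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.push_back(tok);
    return tokens;
}

bool isLabel(const std::string& t) {
    if (t.empty()) return false;
    for (char c : t)
        if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) return false;
    return true;
}

bool isNumber(const std::string& t) {
    return !t.empty() && t.find_first_not_of("0123456789") == std::string::npos;
}

// predicate -> LEFTPAREN name {symbol} RIGHTPAREN
// On success pos is advanced past the predicate; on failure it is unchanged.
bool parsePredicate(const std::vector<std::string>& t, std::size_t& pos) {
    std::size_t p = pos;
    if (p >= t.size() || t[p] != "(") return false;
    ++p;
    if (p >= t.size() || !isLabel(t[p])) return false;  // name
    ++p;
    while (p < t.size() && (isLabel(t[p]) || isNumber(t[p]))) ++p;  // symbols
    if (p >= t.size() || t[p] != ")") return false;
    pos = p + 1;
    return true;
}

// body -> LEFTPAREN predicate {predicate} RIGHTPAREN
bool parseBody(const std::vector<std::string>& t, std::size_t& pos) {
    std::size_t p = pos;
    if (p >= t.size() || t[p] != "(") return false;
    ++p;
    if (!parsePredicate(t, p)) return false;  // at least one predicate
    while (parsePredicate(t, p)) {}           // zero or more further ones
    if (p >= t.size() || t[p] != ")") return false;
    pos = p + 1;
    return true;
}

// hornclause -> LEFTPAREN head [body] RIGHTPAREN
bool parseHornClause(const std::vector<std::string>& t, std::size_t& pos) {
    std::size_t p = pos;
    if (p >= t.size() || t[p] != "(") return false;
    ++p;
    if (!parsePredicate(t, p)) return false;  // head
    parseBody(t, p);                          // optional body; p unchanged if absent
    if (p >= t.size() || t[p] != ")") return false;
    pos = p + 1;
    return true;
}

// Count the well-formed clauses in a stream.
std::size_t countClauses(std::istream& in) {
    const std::vector<std::string> t = tokenize(in);
    std::size_t pos = 0;
    std::size_t count = 0;
    while (pos < t.size()) {
        if (parseHornClause(t, pos)) ++count;
        else ++pos;  // one possible recovery policy: skip a token and retry
    }
    return count;
}
```

This sketch only recognizes clauses; a full solution would build its representation of each element at the points where the corresponding parse function succeeds.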
However, if the parse discovers that the Horn clause it is currently trying to parse is not well formed, then how you handle that case is up to you. For example, upon encountering such an error the program could immediately start looking for the start of another well formed Horn clause, or it could skip the rest of the current line of the input file and start a fresh parse beginning at the first token of the next line of the input file. However you decide to handle this issue, please make sure you present your design choice, and your reasoning for making it, in your project report writeup.
As in the previous lab assignment, you will need to implement either
specific scanning functions or additional logic in your parsing code
for the terminal tokens in the grammar above, which are defined as
follows (which is the same way they were defined in the previous lab
assignment): LABEL tokens consist entirely of (lowercase or uppercase)
alphabetic characters: i.e., characters in 'a' to 'z' or in 'A' to 'Z';
NUMBER tokens consist entirely of decimal digit characters: '0' to '9';
LEFTPAREN is the string "("; and a RIGHTPAREN is the string ")".
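The token definitions above could be checked with a small classification function such as the following sketch. The TokenKind enumeration and the classify function are hypothetical names introduced here for illustration; your scanning code from the previous lab may already do the equivalent.

```cpp
#include <string>

// Hypothetical token categories mirroring the grammar's terminals.
enum class TokenKind { Label, Number, LeftParen, RightParen, Unknown };

// Classify one candidate token string according to the definitions above.
TokenKind classify(const std::string& tok) {
    if (tok == "(") return TokenKind::LeftParen;
    if (tok == ")") return TokenKind::RightParen;
    if (tok.empty()) return TokenKind::Unknown;

    bool allAlpha = true;
    bool allDigit = true;
    for (char c : tok) {
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool digit = (c >= '0' && c <= '9');
        allAlpha = allAlpha && alpha;
        allDigit = allDigit && digit;
    }
    if (allAlpha) return TokenKind::Label;
    if (allDigit) return TokenKind::Number;
    return TokenKind::Unknown;  // mixed or otherwise invalid, e.g. "x1"
}
```

Explicit character ranges are used here rather than the <cctype> functions so that the result does not depend on the current locale.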
Hint: Since C++ provides object-oriented and aliasing features, one highly effective way to generate such a representation is to construct a parse graph for each well formed Horn clause, within which the nodes of the graph are objects corresponding to the different (terminal or non-terminal) elements of the grammar and the edges of the graph are implemented as pointers. Note also that some of the C++ classes you have developed for the studio exercises may be a valuable starting point towards developing such a representation.
In your project report, please describe the abstract representation that you used, and how that choice affected your implementation of the symbol table and of the predicate matching code.
Every distinct token that matches a symbol non-terminal within a well-formed Horn clause
should be represented by exactly one entry in the symbol table for all
occurrences of the same symbol throughout the entire file. A LABEL token symbol
should be interpreted as a variable, with a label attribute containing the token string, while
a NUMBER token symbol should be interpreted as an unsigned integer
constant, with a value attribute containing the numeric value
corresponding to the token (hint: wrapping a C++ string with an istringstream whose
extraction operator >> produces a numeric value is one good way to do this).
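The istringstream hint might look like the following minimal sketch. The function name toUnsigned is hypothetical, and the sketch assumes the token has already been validated as a NUMBER (digits only), since the extraction operator by itself is more permissive than the token definition.

```cpp
#include <sstream>
#include <string>

// Convert a NUMBER token to its numeric value.
// Returns false (leaving value untouched) if extraction fails, for example
// on an empty string or on a value too large for unsigned int, or if any
// characters remain after the number.
bool toUnsigned(const std::string& token, unsigned int& value) {
    std::istringstream iss(token);
    unsigned int parsed = 0;
    if (!(iss >> parsed)) return false;
    char extra;
    if (iss >> extra) return false;  // trailing characters, e.g. "12x"
    value = parsed;
    return true;
}
```

Checking the stream state after extraction is what distinguishes a successful conversion from a failed one; the extracted variable alone does not tell you.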
Every expression that matches a predicate non-terminal
within a well-formed Horn clause (except possibly for identical
predicates, which you may want to avoid duplicating, as specified in the extra
credit section below) should have its own distinct entry within the symbol table,
which should have a name attribute containing the token string corresponding to
the LABEL token that matched the predicate's name non-terminal and
should point to the symbol table entries for the variables and constants within
that expression.
For example, if the input file contained only the following well-formed Horn clauses
( ( greater x y ) ( ( greater x z ) ( greater z y ) ) )
( ( greater z 3 ) )
then there would be one constant (with value 3), three variables (with labels x, y, and z
respectively), and (even with duplicate elimination) four predicates all named greater and
with pointers to the x and y variables, the x and z variables, the z and y variables, and
the z variable and the constant 3, respectively.
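The following is one possible sketch of a symbol table with these properties, not a required design. The Symbol, Predicate, and SymbolTable types and their members are hypothetical. Symbols are keyed by their token string, so each distinct token gets exactly one shared entry, and predicates hold pointers to those entries. The sketch assumes the tokens it receives have already been validated by the parser.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
    bool isVariable;     // true for LABEL tokens, false for NUMBER tokens
    std::string label;   // meaningful when isVariable is true
    unsigned int value;  // meaningful when isVariable is false
};

struct Predicate {
    std::string name;
    std::vector<std::shared_ptr<Symbol>> symbols;  // point into the table
};

class SymbolTable {
public:
    // Return the single shared entry for this token, creating it on first use.
    std::shared_ptr<Symbol> intern(const std::string& token) {
        auto it = symbols_.find(token);
        if (it != symbols_.end()) return it->second;

        const bool digits = !token.empty() &&
            token.find_first_not_of("0123456789") == std::string::npos;
        auto sym = std::make_shared<Symbol>();
        sym->isVariable = !digits;
        sym->value = 0;
        if (digits) {
            std::istringstream iss(token);
            iss >> sym->value;
        } else {
            sym->label = token;
        }
        symbols_[token] = sym;
        return sym;
    }

    // Each predicate gets its own entry (no duplicate elimination here).
    void addPredicate(const std::string& name,
                      const std::vector<std::string>& symbolTokens) {
        Predicate p;
        p.name = name;
        for (const std::string& tok : symbolTokens) p.symbols.push_back(intern(tok));
        predicates_.push_back(p);
    }

    std::size_t symbolCount() const { return symbols_.size(); }
    const std::vector<Predicate>& predicates() const { return predicates_; }

private:
    std::map<std::string, std::shared_ptr<Symbol>> symbols_;
    std::vector<Predicate> predicates_;
};
```

With the two example clauses above, calling addPredicate once per predicate would leave four symbol entries and four predicate entries, with both occurrences of x referring to the same Symbol object.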
After your program has finished parsing the input file, it should traverse the symbol table and print (to the standard output stream) a separate line for each predicate by printing its name attribute followed by the attributes of the symbols to which it points.
For example, for the input file above the program should print out something like:
greater x y
greater x z
greater z y
greater z 3
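A traversal producing output in that shape could be sketched as follows. The Symbol and Predicate types here are simplified, hypothetical stand-ins for whatever your symbol table actually stores, and formatPredicate and printPredicates are illustrative names.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
    bool isVariable;     // true: print label, false: print value
    std::string label;
    unsigned int value;
};

struct Predicate {
    std::string name;
    std::vector<std::shared_ptr<Symbol>> symbols;
};

// Build one output line: the name followed by each symbol's attribute.
std::string formatPredicate(const Predicate& p) {
    std::ostringstream out;
    out << p.name;
    for (const auto& s : p.symbols) {
        out << ' ';
        if (s->isVariable) out << s->label;
        else out << s->value;
    }
    return out.str();
}

// Print every predicate on its own line.
void printPredicates(std::ostream& os, const std::vector<Predicate>& preds) {
    for (const auto& p : preds) os << formatPredicate(p) << '\n';
}
```

Passing std::cout as the stream argument prints to the standard output stream; taking the stream as a parameter also makes the function easy to test with a string stream.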
In your project report, please describe the structure of your symbol table implementation, and the design considerations that led to that structure.
If the predicates have different numbers of symbols then they should not match.
For example, ( greater x y ) would not match with ( greater x y 3 )
even though the names and the first two variables are the same.
If the names of the predicates and the number of symbols they have are the same, the
program should make working copies of the two predicates being unified (leaving
the original predicates unchanged), and in those copies repeatedly compare the symbols
at each position within their ordered lists of symbols looking for mismatches and/or
performing substitutions, as appropriate.
If at any position the same constant or variable appears in both working copies, the
program should just continue checking at the next position, and if at any position
there are two different constants, the predicates should not match and the working copies
simply should be discarded.
For example, ( greater x 2 ) would not match with ( greater x 3 )
even though the names and the number of symbols are the same.
If at least one of the symbols at a position is a variable, the program should check repeatedly (until no more substitutions can be applied) whether any previously recorded substitutions for that particular pair of working copies for the predicates being matched need to be applied to the variable (or variables) at that position in either of the working copies. The program should then compare the resulting symbols in the working copies, after all the previously recorded substitutions have been made at that position.
If the resulting symbols are not the same and are not mis-matched constants (which
would identify the predicates as not matching) then the program should add a new substitution
to the list of substitutions before moving on to the next position: if both of the
symbols are variables, then the name of the symbol in the current predicate should
be substituted for the name of the symbol in the other predicate; otherwise the constant
should be substituted for the name of the variable.
For example, ( greater 3 y x ) (the current predicate) would match with
( greater z z w ) as follows:
The substitution 3/z -- 3 for z -- would be generated for the first position,
and the working copies would become ( greater 3 y x ) -- the working copy of the
current predicate is unchanged since z does not appear in it anywhere -- and
( greater 3 3 w ) respectively.
The substitution 3/y -- 3 for y -- then would be generated for the second position,
and the working copy of the current predicate would become ( greater 3 3 x ), with
the other working copy remaining ( greater 3 3 w ) in that step.
Finally, the substitution x/w -- x for w -- then would be generated for the third
position, resulting in the transformed unified version of both working copies being
( greater 3 3 x ) after that substitution is applied to the second working copy.
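The matching steps described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than a required design: the Term, Pred, and Substitution types and the var, num, applySubstitution, and unify functions are hypothetical, constants are compared by their token text, and each new substitution is applied immediately to every position of both working copies, so that by the time a position is compared it already reflects all previously recorded substitutions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a symbol: a variable label or a constant's digits.
struct Term {
    bool isVariable;
    std::string text;
};

struct Pred {
    std::string name;
    std::vector<Term> args;
};

// Written "replacement/variable"; 3/z means 3 is substituted for z.
struct Substitution {
    Term replacement;
    Term variable;
};

Term var(const std::string& label) { Term t; t.isVariable = true; t.text = label; return t; }
Term num(const std::string& digits) { Term t; t.isVariable = false; t.text = digits; return t; }

bool sameTerm(const Term& a, const Term& b) {
    return a.isVariable == b.isVariable && a.text == b.text;
}

// Replace every occurrence of the substituted variable in a working copy.
void applySubstitution(const Substitution& s, Pred& p) {
    for (Term& t : p.args)
        if (t.isVariable && t.text == s.variable.text) t = s.replacement;
}

// Try to unify two predicates. On success, subs holds the substitutions in
// the order generated and unified holds the transformed predicate. On
// failure, subs is left empty. The originals are never modified.
bool unify(const Pred& current, const Pred& other,
           std::vector<Substitution>& subs, Pred& unified) {
    subs.clear();
    if (current.name != other.name || current.args.size() != other.args.size())
        return false;

    Pred a = current;  // working copy of the current predicate
    Pred b = other;    // working copy of the other predicate
    std::vector<Substitution> found;
    for (std::size_t i = 0; i < a.args.size(); ++i) {
        const Term lhs = a.args[i];
        const Term rhs = b.args[i];
        if (sameTerm(lhs, rhs)) continue;                      // nothing to do
        if (!lhs.isVariable && !rhs.isVariable) return false;  // two different constants

        Substitution s;
        if (rhs.isVariable) {  // both variables, or constant vs. variable
            s.replacement = lhs;
            s.variable = rhs;
        } else {               // variable in current, constant in other
            s.replacement = rhs;
            s.variable = lhs;
        }
        found.push_back(s);
        applySubstitution(s, a);
        applySubstitution(s, b);
    }
    subs = found;
    unified = a;
    return true;
}
```

Applied to the example above, with ( greater 3 y x ) as the current predicate and ( greater z z w ) as the other, this sketch generates 3/z, 3/y, and x/w in that order and leaves ( greater 3 3 x ) as the unified result.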
If the program reaches the end of the predicates' symbol lists without detecting a mismatch it should print out a message identifying the match, including printing out the original versions of the predicates (as they appeared in the input file), the list of substitutions (if any) that were needed to unify the predicates, and (a single copy of) the resulting transformed unified version of the predicate(s) (with all of the substitutions applied).
Before beginning to unify the next pair of predicates, the list of substitutions again should be made empty so that each list of substitutions pertains only to the matching of a particular pair of predicates (not beyond them). The working copies of the previous pair of predicates should be discarded (or completely overwritten by copies of the new pair of predicates to be matched, depending on how you implement this).
In your project report, please describe how you implemented the code that matches predicates under unification, and the design considerations that led to that implementation.
In your project report please document which test cases you ran, and summarize what your program did and whether what you saw was or was not correct behavior (and why or why not) in each case.
IMPORTANT: please make sure that no .exe file is included in any .zip file you send, as the WUSTL e-mail servers will block delivery of email with a .zip attachment that includes a .exe file. Please also make sure to send a .zip file (not a .7z file or other archive format) when you send your solutions.
Please add an extra credit section to your project report documenting your design and implementation of that additional capability. In that same section, please show examples of input and output from the different cases that you tested in order to validate that it is working correctly.
Please submit both the required and extra credit portions within the same code base, along with your project report.