Fall 2015 CSE 425 Lab 1: Basic Semantic Analysis of Horn Clauses in C++

Due by Tuesday October 6, 2015, at 11:59 pm
(deadline for our e-mail server's receipt of your submission with a .zip file with your solution)
Final grade percentage: 10 percent

In this lab you are allowed (and in fact encouraged) to work in teams of 2 or 3 people (but not more), though you also are allowed to work individually if you prefer.

Objective:

This lab is intended to give you experience with basic techniques for semantic analysis (using Horn clauses from the domain of logic programming as a specific example), and to extend your experience with syntax directed techniques, including:

In this lab assignment, you will extend your solution from the previous lab assignment to: (1) record attributes for some of the tokens recognized during scanning (specifically, the "values" of label and number tokens); (2) construct an abstract representation of recognized Horn clauses during the operation of the parsing function(s) you developed in the previous lab assignment; and (3) in a separate semantic analysis function use the abstract representation to print out ordered lists of the unique label and number tokens that were seen, and to translate and output recognized Horn clauses in a different but similar syntax, based on a simple output grammar.

Assignment:

    Part I - Readings and Resources:

  1. The following readings in the required course text book may be useful as reference material while working on this lab assignment:

  2. The following on-line resource also may be useful while working on this lab assignment:

    Part II - Program Design and Implementation:

    Note: the details of this assignment are intentionally somewhat under-specified, leaving you some room to choose what you think is the best way to implement them, as long as what you do is reasonable and you explain your design decisions in comments in the code and in your project report.

  3. Open up Visual Studio 2013 and create a new C++ Win32 Console Application project for your lab 1 assignment (for example, named cse425_lab1).

    Delete the automatically generated main source file from your lab 1 project and from the new project's source file folder.

    In the ReadMe.txt file, put the lab number and names of the people submitting the lab solution, and as you work on this lab please record your design, implementation, and testing approach in this file as your project report.

    Copy the source and header files you wrote for lab 0 into the corresponding directory for lab 1, and add them to your lab 1 project.

    Please make sure you make separate copies of the files in the directory where Visual Studio 2013 expects to find the project's source and header files, rather than attempting to add the original lab 0 files to the lab 1 project. Doing so makes it easier for Visual Studio 2013 to find things where it expects them (and thus avoids some errors that can crop up otherwise), and it also leaves you with a version of your lab 0 code that you can look back at (and possibly revert to) even as you change the version you are updating in lab 1.

    When we return your graded lab 0 solution to you, please also incorporate into your lab 1 code fixes for any problems we pointed out, even (or perhaps especially) in code that you moved over from lab 0.

  4. Add a public constructor to the token struct type that you wrote for your lab 0 solution, which takes two parameters -- one of the struct's enumerated type and one with a reference to a const (C++ style) string -- and initializes the respective member variables with those parameters. The struct type should still have a default constructor that initializes the enumerated member variable to UNKNOWN and leaves the string member variable empty.
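
    As an illustration only, the two constructors described in this step might look like the following sketch. The struct name token, the member names kind and text, and every enumerator other than UNKNOWN, LABEL, and NUMBER are assumptions made for the example; your lab 0 solution defines the real ones. The virtual destructor is also an addition: it makes the type polymorphic, which later steps rely on.

```cpp
#include <string>

// Sketch of a token base struct. Names other than UNKNOWN, LABEL and NUMBER
// are hypothetical placeholders for whatever your lab 0 solution used.
struct token {
    enum kind_type { UNKNOWN, LABEL, NUMBER, LEFTPAREN, RIGHTPAREN };

    kind_type kind;    // which kind of token this is
    std::string text;  // the characters that were scanned

    // Default constructor: kind is UNKNOWN, text is left empty.
    token() : kind(UNKNOWN) {}

    // Two-parameter constructor: initializes both members from its arguments.
    token(kind_type k, const std::string& s) : kind(k), text(s) {}

    // Virtual destructor so the type is polymorphic and derived tokens
    // can be destroyed safely through a base pointer.
    virtual ~token() {}
};
```

    For example, token() yields an UNKNOWN token with an empty string, while token(token::LABEL, "ancestor") yields a LABEL token holding "ancestor".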

  5. Declare and define a struct for a label token, which is derived via public inheritance from the token struct type that you wrote for your lab 0 solution: it should have a public constructor that takes a reference to a (C++ style) string and calls the appropriate base struct type's constructor with LABEL and the passed string parameter.

  6. Declare and define a struct for a number token, which is derived via public inheritance from the token struct type that you wrote for your lab 0 solution: it should have a member variable of type unsigned int, and a public constructor that takes a reference to a (C++ style) string and calls the appropriate base struct type's constructor with NUMBER and the passed string parameter.

    The body of the constructor should then assign the numeric value represented by the passed string parameter, to the unsigned int member variable (hint: it is easy to convert the string representation of an unsigned number into its equivalent unsigned int representation using the >> extraction operator of an istringstream with which you can wrap the string parameter).
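
    One possible shape for the two derived structs from steps 5 and 6 is sketched below, using the istringstream hint above. The names label_token, number_token, and value are hypothetical, and the base struct is a stand-in for your own. The constructors here take a reference to a const string (an assumption that goes slightly beyond the text) so that temporaries can be passed.

```cpp
#include <sstream>
#include <string>

// Stand-in for the lab 0 token struct; member names are assumptions.
struct token {
    enum kind_type { UNKNOWN, LABEL, NUMBER, LEFTPAREN, RIGHTPAREN };
    kind_type kind;
    std::string text;
    token() : kind(UNKNOWN) {}
    token(kind_type k, const std::string& s) : kind(k), text(s) {}
    virtual ~token() {}
};

// A label token only forwards LABEL and the string to the base constructor.
struct label_token : public token {
    explicit label_token(const std::string& s) : token(LABEL, s) {}
};

// A number token also stores the numeric value of its string.
struct number_token : public token {
    unsigned int value;

    explicit number_token(const std::string& s) : token(NUMBER, s), value(0) {
        std::istringstream iss(s);  // wrap the string in an input stream
        iss >> value;               // extract its unsigned int representation
    }
};
```

    With these definitions, number_token("78") stores both the string "78" and the unsigned int 78. The sketch does not check whether the extraction succeeded; whether and how to handle that is a design decision to explain in your report.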

  7. Modify the scanner class that you wrote for the previous assignment so that it constructs all token objects dynamically on the heap (using the new operator) and stores aliases to token objects rather than storing the objects themselves by value. This will preserve polymorphism and avoid the "class slicing problem" for the derived label and number struct types. Please make sure to use a reference counted smart pointer type like the C++11 shared_ptr, and/or other techniques to ensure that dynamically allocated objects are not leaked, cleaned up too soon, etc.

    Modify your scanner class so that when it recognizes a label token or number token it constructs the appropriate derived struct type for each of those, but constructs the base struct type for each of the other kinds of tokens, using the appropriate constructor and storing an alias to the (dynamically allocated) token object in each case.
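
    The following sketch shows one way a scanner might construct tokens on the heap and hand back a shared_ptr to the base type. The helper make_token is hypothetical, and its classification rules (all digits means NUMBER, all lower case letters means LABEL) are assumptions standing in for the lab 0 token definitions.

```cpp
#include <cctype>
#include <memory>
#include <sstream>
#include <string>

// Stand-ins for the token types from the earlier steps (names are assumptions).
struct token {
    enum kind_type { UNKNOWN, LABEL, NUMBER, LEFTPAREN, RIGHTPAREN };
    kind_type kind;
    std::string text;
    token() : kind(UNKNOWN) {}
    token(kind_type k, const std::string& s) : kind(k), text(s) {}
    virtual ~token() {}
};
struct label_token : public token {
    explicit label_token(const std::string& s) : token(LABEL, s) {}
};
struct number_token : public token {
    unsigned int value;
    explicit number_token(const std::string& s) : token(NUMBER, s), value(0) {
        std::istringstream iss(s);
        iss >> value;
    }
};

// Hypothetical helper: builds the right token type for one lexeme and
// returns it through a base-class smart pointer, so nothing is sliced.
std::shared_ptr<token> make_token(const std::string& s) {
    if (s.empty()) return std::shared_ptr<token>(new token());
    bool all_digits = true, all_lower = true;
    for (char c : s) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isdigit(uc)) all_digits = false;
        if (!std::islower(uc)) all_lower = false;
    }
    if (all_digits) return std::shared_ptr<token>(new number_token(s));
    if (all_lower)  return std::shared_ptr<token>(new label_token(s));
    if (s == "(")   return std::shared_ptr<token>(new token(token::LEFTPAREN, s));
    if (s == ")")   return std::shared_ptr<token>(new token(token::RIGHTPAREN, s));
    return std::shared_ptr<token>(new token(token::UNKNOWN, s));
}
```

    Because the returned pointer refers to the complete derived object, a number token created this way still carries its unsigned int member even though callers see only a shared_ptr to the base type.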

  8. Modify the parsing function (and depending on your design, any other part of the program that deals with tokens) that you wrote for the previous assignment, so that it operates only on aliases to the dynamically allocated token objects and does not copy them, pass them by value, etc. (so as to avoid the "class slicing problem" for the derived label and number struct types). One reasonable way to achieve this is to pass a reference counted smart pointer type like the C++11 shared_ptr between the scanner and parser, as a handle for each dynamically allocated token object -- when neither the scanner nor the parser (nor any other part of the program) has a reference to a token object, it will be deallocated automatically by the destructor of the last smart pointer object that had an alias to it.
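
    As a rough sketch of working only through aliases, the hypothetical helper numeric_value below receives a token by reference to a shared_ptr and uses std::dynamic_pointer_cast to reach the derived number type without copying the token. All type and member names are assumptions carried over for illustration.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Stand-ins for the token types (names are assumptions).
struct token {
    enum kind_type { UNKNOWN, LABEL, NUMBER, LEFTPAREN, RIGHTPAREN };
    kind_type kind;
    std::string text;
    token() : kind(UNKNOWN) {}
    token(kind_type k, const std::string& s) : kind(k), text(s) {}
    virtual ~token() {}  // makes the type polymorphic, as dynamic casts require
};
struct label_token : public token {
    explicit label_token(const std::string& s) : token(LABEL, s) {}
};
struct number_token : public token {
    unsigned int value;
    explicit number_token(const std::string& s) : token(NUMBER, s), value(0) {
        std::istringstream iss(s);
        iss >> value;
    }
};

// Hypothetical helper: if t refers to a number token, stores its value in
// out and returns true; otherwise leaves out unchanged and returns false.
bool numeric_value(const std::shared_ptr<token>& t, unsigned int& out) {
    std::shared_ptr<number_token> n = std::dynamic_pointer_cast<number_token>(t);
    if (!n) return false;
    out = n->value;
    return true;
}
```

    Copying the shared_ptr itself (for example, when the parser stores it in its own data structure) only increases the reference count; the token object is never copied, so the derived part is never sliced off.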

    Also modify your parsing function so that for all of the well formed input Horn clauses it recognizes, it builds (and makes available to the main function) an abstract representation that is easily translated into a comparable output Horn clause format whose structure is defined by the following output grammar (where the metasymbols and terminal tokens have the same definitions as in the lab 0 assignment):

    hornclause -> LEFTPAREN head [body] RIGHTPAREN

    head -> predicate

    body -> LEFTPAREN predicate {predicate} RIGHTPAREN

    predicate -> LEFTPAREN name {symbol} RIGHTPAREN

    name -> LABEL

    symbol -> LABEL | NUMBER

  9. Write a semantic analysis function (which the main function should call after the parsing function completes) that traverses the abstract representation produced by the parsing function and prints out (to an output file as in the previous lab assignment): (1) a lexically ordered list of all of the unique label tokens that were seen (each label should appear exactly once in that list), (2) a numerically ordered list of all of the unique number tokens that were seen (each number should appear exactly once in that list), and (3) versions of the valid input Horn clauses translated into their equivalents under the output grammar (instead of according to the input grammar, which was how the output was formatted in the previous lab assignment).

    For example, the input Horn clauses

    ancestor ( x , z ) :- parent ( x , y ) ^ ancestor ( y , z )

    age ( x , 78 )

    age ( y , 53 )

    age ( z , 2 )

    would produce the lexically ordered labels

    age
    ancestor
    parent
    x
    y
    z

    and the numerically ordered numbers

    2
    53
    78

    and would be translated into output Horn clauses closely resembling

    ( ( ancestor x z ) ( ( parent x y ) ( ancestor y z ) ) )

    ( ( age x 78 ) )

    ( ( age y 53 ) )

    ( ( age z 2 ) )

    but possibly with different spacing between the symbols.
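
    One simple way to obtain the two ordered lists of unique values is to insert them into std::set containers, which keep their elements sorted and discard duplicates. The sketch below assumes the hypothetical token types from the earlier steps, and a flat vector of token pointers in place of your actual abstract representation; symbol_tables and collect are made-up names.

```cpp
#include <memory>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Stand-ins for the token types (names are assumptions).
struct token {
    enum kind_type { UNKNOWN, LABEL, NUMBER, LEFTPAREN, RIGHTPAREN };
    kind_type kind;
    std::string text;
    token() : kind(UNKNOWN) {}
    token(kind_type k, const std::string& s) : kind(k), text(s) {}
    virtual ~token() {}
};
struct label_token : public token {
    explicit label_token(const std::string& s) : token(LABEL, s) {}
};
struct number_token : public token {
    unsigned int value;
    explicit number_token(const std::string& s) : token(NUMBER, s), value(0) {
        std::istringstream iss(s);
        iss >> value;
    }
};

// Hypothetical result type: each set is sorted and holds unique elements.
struct symbol_tables {
    std::set<std::string> labels;    // lexical order
    std::set<unsigned int> numbers;  // numeric order
};

// Collects every label string and number value seen in the token sequence.
// Assumes none of the pointers is null.
symbol_tables collect(const std::vector<std::shared_ptr<token>>& tokens) {
    symbol_tables out;
    for (const auto& t : tokens) {
        if (t->kind == token::LABEL) {
            out.labels.insert(t->text);
        } else if (auto n = std::dynamic_pointer_cast<number_token>(t)) {
            out.numbers.insert(n->value);
        }
    }
    return out;
}
```

    Storing numbers as unsigned int rather than as strings matters for the ordering: as strings, "10" would sort before "9", whereas the set of unsigned int values orders 9 before 10.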

  10. Hint: One straightforward abstract representation is simply a list of predicates, each with (smart) pointers to the appropriate token objects, since at least for the required portion of the assignment the head of the clause is a single predicate (which would be the first predicate and the ones after that would be the body of the clause; note however that the extra credit portion may require a change in this part of the assignment to use a list of lists of predicates so that the head and body are more clearly differentiated).

    In your project report, please describe the abstract representation that you used, and how that choice affected your implementation of output functions that generated the output Horn clauses from it.
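
    To make the hint above concrete, here is a minimal sketch of a list-of-predicates representation and of functions that emit it in the output grammar. The names predicate, horn_clause, and to_output are hypothetical, as is the simplified token struct; the sketch covers only the required portion (a single-predicate head) and is one of many reasonable designs.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the token struct (names are assumptions).
struct token {
    enum kind_type { UNKNOWN, LABEL, NUMBER, LEFTPAREN, RIGHTPAREN };
    kind_type kind;
    std::string text;
    token() : kind(UNKNOWN) {}
    token(kind_type k, const std::string& s) : kind(k), text(s) {}
    virtual ~token() {}
};

// A predicate holds smart pointers to its name token and its symbol tokens.
struct predicate {
    std::shared_ptr<token> name;                  // a LABEL token
    std::vector<std::shared_ptr<token>> symbols;  // LABEL or NUMBER tokens
};

// A clause is a list of predicates: the first is the head, the rest the body.
typedef std::vector<predicate> horn_clause;

// predicate -> LEFTPAREN name {symbol} RIGHTPAREN
std::string to_output(const predicate& p) {
    std::string s = "( " + p.name->text;
    for (const auto& sym : p.symbols) s += " " + sym->text;
    return s + " )";
}

// hornclause -> LEFTPAREN head [body] RIGHTPAREN
// body       -> LEFTPAREN predicate {predicate} RIGHTPAREN
std::string to_output(const horn_clause& c) {
    if (c.empty()) return std::string();
    std::string s = "( " + to_output(c.front());
    if (c.size() > 1) {
        s += " (";
        for (std::size_t i = 1; i < c.size(); ++i) s += " " + to_output(c[i]);
        s += " )";
    }
    return s + " )";
}
```

    With this layout the body parentheses are emitted only when the clause has more than one predicate, which mirrors the optional [body] in the output grammar.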

  11. Build your program, and to make sure that all features of your program are working correctly, run the executable program through a series of trials that test it with good coverage of cases involving both well formed and badly formed command lines, with existing and missing files, and with input files containing lines with different well formed and badly formed input Horn clauses, etc. in as many possible combinations as you can think of.

    Before submitting your solution for this lab, please read our grading comments on your solution to the previous lab and address any problems that we pointed out - our evaluation of your solution for this lab may include regression testing of features from the previous lab as well as testing new features for this one.

    In your project report please document which cases you ran, summarize what your program did and whether or not that was correct behavior (and why or why not), in each case. Please make sure to distinguish which cases you ran as regression tests of the functionality that was retained from the previous lab, versus cases you ran to test the new functionality you developed for this lab.

  12. Prepare a .zip file that contains all of your project's header and source code files, input/output traces for the important cases you tested, and your project report. Send the .zip file containing your lab 1 solution as an e-mail attachment to the course e-mail account (cse425@seas.wustl.edu) early enough that it is received by the submission deadline for this assignment. If you need to make changes to your lab solution you are welcome to send a new .zip file, and we will grade the latest one received prior to the deadline (according to the time stamp the server puts on the e-mail).

    IMPORTANT: please make sure that no .exe file is included in any .zip file you send, as the WUSTL e-mail servers will block delivery of e-mail with a .zip attachment that includes a .exe file. Please also make sure to send a .zip file (not a .7z file or other archive format) when you send your solutions.

    Part III - Extra Credit (1 to 5 percent of assignment's value depending on quality and completeness):

  13. Optionally, extend your solution so that (like the definition of the body) the head of a Horn clause may be either a conjunction of multiple predicates or a single predicate, in both the input and output grammars and the corresponding input and output files. Note that if you already received extra credit for this augmentation to the input grammar and input files on the previous lab assignment, we will only consider the part that you completed beyond that original implementation in the previous assignment.

    Please add an extra credit section to your project report documenting your design and implementation of that additional capability (clearly identifying which parts were completed in the previous lab and which parts were completed in this lab), including showing how the input and output grammars were extended to support that feature, and describing how you modified your implementation to parse and emit those grammatical extensions. In that same section, please show examples of input and output from the different cases that you tested in order to validate that it is working correctly.

    Please submit both the required and extra credit portions of your program code and your project report together (rather than in a separate directory).


Posted 9:10am Wednesday September 16, 2015, by
Chris Gill