In CSE 131, you studied the following ADTs:
In the upcoming studio and lab sessions, you will design an interface for using the KWIC index and add some interesting features.
- If you work in pairs, then do all the work together in one person's repo.
- When you demo, indicate that you want the results propagated to the other person's repo.
It should fail at this point with a NullPointerException.
Suppose you wanted to search a document for all phrases that include a given word, say "swordfish". There are two approaches you could take:
We shall assume that offline preprocessing pays off, and that it would be expensive to search the document each time a word is supplied. As an analogy, consider the difficult task undertaken by Google to index the WWW. Imagine how slow it would be for Google to search the entire WWW each time you ask it to find a word.
As an example, consider the three phrases listed below. The table that follows them shows which phrases should be returned for words that might be supplied to KWIC.
- Swordfish goes well with pasta; the pasta should not be overcooked.
- The password for entry to the castle is: "swordfish".
- All's well that ends well.
Notice that case and punctuation do not matter in matches, but that the returned phrases are exactly as they were entered. Also, although "well" is contained twice in one phrase, the set of phrases contains each phrase once (as sets are supposed to do).
Word: swordfish
Set of Phrases:
- Swordfish goes well with pasta; the pasta should not be overcooked.
- The password for entry to the castle is: "swordfish".
Word: Well
Set of Phrases:
- Swordfish goes well with pasta; the pasta should not be overcooked.
- All's well that ends well.
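The lookup behavior in the table can be sketched with a map from a normalized word to the set of original phrases that contain it. This is only an illustration under stated assumptions: the class name KwicSketch, its methods, and the use of plain Strings are hypothetical and are not the lab's Word, Phrase, or KWIC classes. Normalization here simply lower-cases a token and drops anything that is not a letter or digit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical sketch; not the lab's Word/Phrase/KWIC classes.
class KwicSketch {
    // Normalized word -> set of the original phrases that contain it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Assumed matching rule: ignore case and any non-alphanumeric characters.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Index every word of the phrase; the phrase itself is stored unchanged.
    void addPhrase(String phrase) {
        StringTokenizer tokens = new StringTokenizer(phrase);
        while (tokens.hasMoreTokens()) {
            String key = normalize(tokens.nextToken());
            if (key.isEmpty()) {
                continue; // token was punctuation only
            }
            // A Set keeps each phrase once, even if the word repeats in it.
            index.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(phrase);
        }
    }

    // Return the phrases for a word, or an empty set if the word is unknown.
    Set<String> lookup(String word) {
        return index.getOrDefault(normalize(word), Collections.emptySet());
    }

    public static void main(String[] args) {
        KwicSketch kwic = new KwicSketch();
        kwic.addPhrase("Swordfish goes well with pasta; the pasta should not be overcooked.");
        kwic.addPhrase("The password for entry to the castle is: \"swordfish\".");
        kwic.addPhrase("All's well that ends well.");
        for (String phrase : kwic.lookup("Well")) {
            System.out.println(phrase);
        }
    }
}
```

Because the phrases are indexed once up front, each later lookup is a single map access rather than a scan of the whole document.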
You can find many sites that offer KWIC or concordance indices. For example, this site has Shakespeare's works. Check out the results for rogue.
In summary, our KWIC index will:
Remember that the summary of a method contains only the first line of its documentation. The rest is found in the detailed section of the JavaDoc. You get there by clicking on the method name in the summary section or by scrolling down. For example, see the Phrase documentation, which mentions the equals method in the summary; the detailed documentation is found by clicking on the equals method.
For example, if you have a Set<Word> s, then you can iterate over its words using the relatively beautiful:
for (Word w : s) {
}
instead of the relatively ugly:
Iterator<Word> i = s.iterator();
while (i.hasNext()) {
    Word w = i.next();
}
Your code will be penalized for failure to be concise and clear, so use the short form of the iteration.
Get your lab working first without doing the File part.
Word w = new Word("Dog");
assertEquals("Dog", w.getOriginalWord());
assertEquals("dog", w.getMatchWord());
assertEquals(new Word("dog"), w);
assertEquals(new Word("DOG"), w);
assertEquals(new Word("DOG").hashCode(), new Word("dog").hashCode());
But be sure to fix them up before you test!
You must use StringTokenizer to break apart the String into words. This is something you are supposed to learn in CSE132. Build a Set of Words from the tokens and return that Set in this method.
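As a rough illustration of the tokenizing step, the sketch below collects the tokens of a String into a Set. It is written under assumptions: the class TokenizeSketch and its tokens method are hypothetical, and it stores plain Strings rather than the lab's Word objects, so punctuation stays attached to each token.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical sketch; the lab's method would build Word objects instead.
class TokenizeSketch {
    // Split on StringTokenizer's default delimiters (whitespace) and
    // collect the distinct tokens in the order they first appear.
    static Set<String> tokens(String text) {
        Set<String> result = new LinkedHashSet<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("All's well that ends well."));
    }
}
```

Note that "well" and "well." are different Strings here; deciding that they match is the job of whatever normalization the word type applies.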
Notes about reading the input file:
- In Studio 2, you read a file using the DataInputStream.
- While you might be tempted to use its readLine() method, that has been deprecated, so don't do that!
- Follow the advice given in the JavaDoc for DataInputStream's readLine().
- This should lead you to BufferedReader's readLine() to read one complete line of the input file.
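The line-by-line reading described above might look like the following sketch. The class ReadLinesSketch and its method are hypothetical names, and the sketch accepts any Reader so it can be tried on a String; for a file you would typically wrap the input stream in an InputStreamReader, or use a FileReader, before handing it to BufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading complete lines with BufferedReader.
class ReadLinesSketch {
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // readLine() returns null once the end of the input is reached.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Wrapped only to keep this sketch free of checked exceptions.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        Reader sample = new StringReader("first fortune\nsecond fortune\n");
        for (String line : readAllLines(sample)) {
            System.out.println(line);
        }
    }
}
```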
You can also open the fortunes.txt file in Eclipse to see how it looks.
- If you find 68, or 70, or 71, or something like that, that's fine; change the unit test if you need to for that.
- You may be overlooking commas, and a nice way to get rid of stuff is the replaceAll(String,String) method. That in turn will expose you to regular expressions, which is something good to know.
- Ask for help if you need it!
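To make the replaceAll suggestion concrete, here is a small sketch that strips punctuation from a token with a regular expression. The class and method names are hypothetical, and removing every ASCII punctuation character (including apostrophes) is an assumption; your own matching rule may differ, which is one reason the phrase count can vary slightly.

```java
import java.util.Locale;

// Hypothetical sketch of cleaning a token with replaceAll and a regex.
class StripPunctuationSketch {
    // \p{Punct} matches any one ASCII punctuation character,
    // such as , ; : . " ' ! and ?
    static String matchForm(String token) {
        return token.replaceAll("\\p{Punct}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(matchForm("\"swordfish\"."));
        System.out.println(matchForm("pasta;"));
    }
}
```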
Review the material here (slides 44 and beyond) to gain an understanding of how those two methods work:
if a.equals(b) then a.hashCode() == b.hashCode()
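One common way to honor this rule is to compute both equals and hashCode from the same field. The sketch below is a hypothetical class, CaseInsensitiveLabel, not part of the lab's code; it compares labels ignoring case and hashes the same lower-cased key, so equal objects always produce equal hash codes.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of a class whose equals and hashCode agree.
final class CaseInsensitiveLabel {
    private final String original; // text exactly as supplied
    private final String key;      // lower-cased form used for comparison

    CaseInsensitiveLabel(String original) {
        this.original = Objects.requireNonNull(original);
        this.key = original.toLowerCase(Locale.ROOT);
    }

    String original() {
        return original;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CaseInsensitiveLabel)) {
            return false;
        }
        return key.equals(((CaseInsensitiveLabel) other).key);
    }

    // Derived from the same field as equals, so a.equals(b)
    // implies a.hashCode() == b.hashCode().
    @Override
    public int hashCode() {
        return key.hashCode();
    }

    public static void main(String[] args) {
        Set<CaseInsensitiveLabel> labels = new HashSet<>();
        labels.add(new CaseInsensitiveLabel("Dog"));
        labels.add(new CaseInsensitiveLabel("DOG"));
        System.out.println(labels.size());
    }
}
```

If hashCode were left at its default while equals ignored case, a HashSet could place "Dog" and "DOG" in different buckets and keep both, even though equals says they are the same.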
When you are done with this studio, you must be cleared by the TA to receive credit.
- Commit all your work to your repository!
If you do not commit, the TAs cannot grade your work and you will receive a 0 for this assignment!
- Fill in the form below with the relevant information
- Have a TA check your work
- The TA should check your work and then fill in his or her name
- Click OK while the TA watches
- If you request propagation, it does not happen immediately, but should be posted in the next day or so