In this assignment, you will apply a hashing scheme in pursuit of finding occurences of one string in another, using the Rabin–Karp algorithm.
That algorithm is discussed in Section 32.2 of our text, on page 990.
a a b a a c a a d a a b a a b a a | | | a a b a | | a a b a | a a b aNotice that the second and third matches overlap. As discused in class, the most straightforward algorithm is as follows:
for each i in the interval [0, n) match = true for each j in the interval [0, m) if target[i+j] ≠ query[j] match = false end if end for if match // definite match of query starting at index i of target end if end forThe worst-case complexity of this approach is Θ(mn).
We will use in particular the following hash function for a window of size m in a character array, with the first character of the window at index start:
private static int hashFor(char[] chars, int start, int m) { int ans = 0; for (int i=start; i < start+m; ++i) { int val = 0; // // Within the bounds of the chars array? // if (i ≥ 0 && i < chars.length) val = chars[i]; // then use the character value there ans = mod( mod(31*ans, 511) + mod(val, 511), 511 ); } return ans; }The above function is similar to how Java computes .hashCode() for a String, except that we confine the results to being strictly less than 511 using mod (the % operator). The following identities related to mod should be useful in this assignment:
You should investigate how large the intermediate results of addition and multiplication can get before the final mod p is applied.
Applied to our running example, the rolling hash values at the end of each window within our target string are as follows:
a | a | b | a | a | c | a | a | d | a | a | b | a | a | b | a | a | |||||||||||||||||
97 | 38 | 254 | 306 | 214 | 430 | 337 | 153 | 73 | 368 | 92 | 224 | 306 | 214 | 429 | 306 | 214 |
In our running example, that hash value is 306, which can be computed by invoking hashFor("aaba".toCharArray(), 0, 4)
- In the example, the value for the first a is computed at indices -3, -2, -1, and 0, for a window of length 4 that ends at the first index of the target. The numerical values for characters outside the window are 0, and the numerical value for a is its ASCII encoding of 97.
- Thus, the window for a (the first character in our target string) has hash value 97.
- Moving one character to the right encounters another a. Its hash value is at indices -2, -1, 0, and 1, which produces 31×97+97, which is 3104, but taken mod 511 is 38.
- Thus the window for a a (the first two characters in our target string) has value 38.
You must complete the constructor and the nextCh method, subject to the constraints stated below.
In terms of our running example, the following calls after instantiation of a KR(4) object would return values as follows:
To receive full credit for this lab, your solution must:
- Suppose h is the hash value of a window of m characters.
- Suppose character c is about to leave that window, and character d is the next to be incorporated into the window.
- We can compute the new value of h using: h = (h×31 – 31^{m}×c + d) mod 511
Note that we will run your solution against the JUnit test as it was given to you. If you modify that test in some way, the results you see may not correspond with the testing necessary to receive credit.You can always revert any changes you make to the testing code back to what we gave you in the repository using Team…Revert, but be sure not to do that to your source code as it will wipe away work you may not have yet committed.
You can tell that your code was pushed by logging into bitbucket.org, clicking on your repository, and seeing the push in the associated log traffic.Because of this, you have no excuse for failing to push your code. Check and make sure it was pushed so your work is counted.
Work that is not pushed will receive no credit.