Name: _____________________________ 6 Digit StudentID: ___ ___ ___ ___ ___ ___

Worksheet 13: HashTables, MatrixMapReduce, and Sunflowers

Recalling this slighly modified passage from Hamlet: Act II Scene 2

    Hamlet: We must abstract! Functions of hash for all!

    Polonius: [Aside] Though this be madness, yet there is method in't.

For some reason, we specified a hashFunction to use when implementing the NotThreadSafeHashTable. One might ask: Why not just use Objects::hashCode(object) directly?

class HashUtils 
    int hashCodeOf(K key, ToIntFunction<K> hashFunction) 
    int arrayIndexForKey(K key, int arrayLength, ToIntFunction<K> hashFunction) 
    E itemInArrayForKey(K key, E[] array, ToIntFunction<K> hashFunction) 

class NotThreadSafeHashTable<K, V> implements Map<K, V>
    NotThreadSafeHashTable(int chainCount, ToIntFunction<K> hashFunction) {

Today's assignment also specifies a hashFunction to use when implementing MatrixMapReduceFramework.

public class MatrixMapReduceFramework<E, K, V, A, R> implements MapReduceFramework<E, K, V, A, R> 
    public MatrixMapReduceFramework(Mapper<E, K, V> mapper,
        AccumulatorCombinerReducer<V, A, R> accumulatorCombinerReducer,
        int mapAndAccumulateTaskCount,
        int combineAndReduceTaskCount,
        ToIntFunction<K> hashFunction) 

Scenario

Both HashMap and MatrixMapReduceFramework use the hashCode of a key to distribute the load. HashMap uses hashCode to determine which bucket a key belongs to. MatrixMapReduceFramework use the hashCode to determine which column's HashMap a key belongs to.

The MatrixMapReduceFramework uses the number of processors for the number of columns. This is reasonable. Typically the number of processors is a power of 2. Let’s say it is 8 for this scenario.

HashMap starts with an initial capacity, say 16:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V> {
    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

and doubles bucket count when the “load” gets too high. This is also reasonable. By maintaining a power of 2 bucket count, the relatively expensive floorMod can be replaced by: (buckets.length - 1) & hash.

Would using the same hashCode for both break the system?

 

 

Would there be performance implications?

 

 

How does it relate to sun flowers?

Post Lecture

Synthesize today's class session

 

 

What is unclear?