CSE 532S Concurrency Design Studio

Please complete the required studio exercises listed below, along with any of the (optional) enrichment exercises that interest you.

As you work through the exercises, please record your answers in a file. When you are finished, please e-mail your answers to the course e-mail account, cse532@seas.wustl.edu, with "Concurrency Design Studio" in the subject line.

Please make sure that the name of each person who worked on these exercises is listed in the first answer, and that you number your answers so they are easy for us to match up with the appropriate exercise.

Required Exercises

1. As the answer to the first exercise, list the names of the people who worked together on this studio.

2. Open up Visual Studio 2013, make sure your settings are for C++, and create a new project for this studio (for example named something like `concurrency_design`).

In the main C++ source code file for the project (which should be named something like `concurrency_design.cpp`) please modify the main function signature so that it looks like the standard (i.e., portable between Windows and Linux) main function entry point for C++: `int main (int, char * [])`

In your main function, declare a two dimensional (2D) C style array of integers with 4 rows and 16 columns, another one with 16 rows and 4 columns, and another one with 8 rows and 8 columns. Initialize their elements with different values, and for each of the arrays print out the address, row number, and column number of each of its elements.

Build and run your program, and as the answer to this exercise (1) show your program's output, (2) based on that output explain how to define rectangular regions of each array whose elements are contiguous in memory, and (3) explain whether or not there is a single common formula for defining contiguous rectangular regions that would apply to all three arrays (and if so what it is, or if not why not).
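As a possible starting point for the printing step, here is a minimal sketch. The helper names `print_elements` and `element_offset` are hypothetical (they are not part of the studio), and the sketch assumes only that a C style 2D array of `int` is laid out without padding between its elements. It is one way to gather the data, not the expected answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>

// Hypothetical helper: prints the address, row number, and column number of
// every element of a C style 2D array with ROWS rows and COLS columns.
template <std::size_t ROWS, std::size_t COLS>
void print_elements(const int (&a)[ROWS][COLS]) {
    for (std::size_t r = 0; r < ROWS; ++r) {
        for (std::size_t c = 0; c < COLS; ++c) {
            // Cast to const void* so the stream prints the address itself.
            std::cout << static_cast<const void*>(&a[r][c])
                      << " row " << r << " col " << c << '\n';
        }
    }
}

// Hypothetical helper: number of ints between a[0][0] and a[r][c],
// computed from the two addresses.
template <std::size_t ROWS, std::size_t COLS>
std::size_t element_offset(const int (&a)[ROWS][COLS],
                           std::size_t r, std::size_t c) {
    assert(r < ROWS && c < COLS);
    const auto first = reinterpret_cast<std::uintptr_t>(&a[0][0]);
    const auto here = reinterpret_cast<std::uintptr_t>(&a[r][c]);
    return static_cast<std::size_t>(here - first) / sizeof(int);
}
```

For example, after `int a[4][16] = {};` a call to `print_elements(a)` prints one line per element, and comparing `element_offset` results across your three arrays may help with parts (2) and (3).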

3. Design a function that takes an integer value and additional parameters defining a contiguous rectangular region of one of your 2D arrays from the previous exercise (which could be as little as a single element or as much as the entire array). The function should search the region for all occurrences of the passed value and, once it is done searching, print out the value and the number of times it was found. Because several threads will run this function, feel free to synchronize the threads' access to the standard output stream if you see output races.

In your main function, call `std::thread::hardware_concurrency()` to obtain the number of threads to use for this exercise (if the function is not implemented or returns a value less than 2, please use 4 as a default number of threads for this exercise).

Spawn that many threads in your main function, and use the search function for each of two variations: in one variation partition the array into disjoint regions and have all of the threads search for the same value; in the other variation have each thread search the entire array for its own unique value. After spawning all the threads, the main function should join with each of them.

As the answer to this exercise, please (1) say how many threads you used in each variation and whether that number was returned by `std::thread::hardware_concurrency()` or was used by default, (2) show your code for the thread function and its use in the main function, and (3) show the output your program produced, for both variations.
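The first variation of this exercise could be sketched roughly as follows. All of the names here (`choose_thread_count`, `search_region`, `partitioned_search`, `output_mutex`) are hypothetical, and the sketch assumes a region is described by a pointer to its first element plus an element count, which is only one of several reasonable ways to define a contiguous region.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex output_mutex;  // guards std::cout against interleaved output

// Maps the value reported by std::thread::hardware_concurrency() (which may
// be 0 when it cannot be computed) to the number of threads to spawn.
unsigned int choose_thread_count(unsigned int reported, unsigned int fallback = 4) {
    assert(fallback > 0);
    return (reported < 2) ? fallback : reported;
}

// Counts occurrences of value in count contiguous ints starting at first,
// stores the count in found, and prints the result under the output lock.
void search_region(const int* first, std::size_t count, int value,
                   std::size_t& found) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (first[i] == value) ++n;
    }
    found = n;
    std::lock_guard<std::mutex> lock(output_mutex);
    std::cout << "value " << value << " found " << n << " time(s)\n";
}

// Splits total elements into num_threads disjoint chunks, has every thread
// search its chunk for the same value, and returns the combined count.
std::size_t partitioned_search(const int* first, std::size_t total, int value,
                               unsigned int num_threads) {
    assert(num_threads > 0);
    std::vector<std::size_t> counts(num_threads, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = total / num_threads;
    for (unsigned int t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes any leftover elements.
        const std::size_t len = (t + 1 == num_threads) ? total - begin : chunk;
        threads.emplace_back(search_region, first + begin, len, value,
                             std::ref(counts[t]));
    }
    for (auto& th : threads) th.join();
    std::size_t sum = 0;
    for (std::size_t c : counts) sum += c;
    return sum;
}
```

A call might look like `partitioned_search(&a[0][0], 64, 3, choose_thread_count(std::thread::hardware_concurrency()))` for an array with 64 elements. The second variation can reuse `search_region` by giving every thread the whole array and its own value.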

4. Design a matrix addition function (see http://en.wikipedia.org/wiki/Matrix_addition) that takes parameters for the same contiguous rectangular region of three different 2D arrays, and assigns each element of the third region the sum of the values of the same elements (i.e., with the same row and column number) of the first two regions.

The main function should spawn the same number of threads as in the previous exercise, to add two 2D arrays and store the result in a third one, with each thread handling a disjoint portion of the work in parallel. All three arrays should have the same number of rows and columns; feel free to declare and initialize additional arrays as needed for this exercise or any of the following ones. After spawning all the threads, the main function should join with each of them, and then print out the first, second, and third arrays.

As the answer to this exercise, please show your code and the output of your program, and describe any interesting aspects of your design or of the program's behavior that you observed.
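One way to structure the parallel addition is sketched below. The names `add_region` and `parallel_add` are hypothetical, and the sketch assumes each region is a pointer to its first element plus an element count, with the three arrays sharing the same dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Adds count contiguous elements: c[i] = a[i] + b[i].
void add_region(const int* a, const int* b, int* c, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Splits total elements into num_threads disjoint chunks and adds each chunk
// on its own thread. No locking is needed because the chunks do not overlap.
void parallel_add(const int* a, const int* b, int* c, std::size_t total,
                  unsigned int num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = total / num_threads;
    for (unsigned int t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes any leftover elements.
        const std::size_t len = (t + 1 == num_threads) ? total - begin : chunk;
        threads.emplace_back(add_region, a + begin, b + begin, c + begin, len);
    }
    for (auto& th : threads) th.join();
}
```

For three arrays declared as `int x[4][16]`, `int y[4][16]`, and `int z[4][16]`, a call might look like `parallel_add(&x[0][0], &y[0][0], &z[0][0], 64, 4)`.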

5. Design a matrix multiplication function (see http://en.wikipedia.org/wiki/Matrix_multiplication) that takes parameters for appropriate rectangular regions of three different 2D arrays (each of appropriate extent according to the definition of matrix multiplication), and assigns each element of the third region the dot product of the corresponding row of the first region with the corresponding column of the second region.

The main function should spawn the same number of threads as in the previous exercise, to multiply two 2D arrays and store the result in a third one, with each thread handling a disjoint portion of the work in parallel. After spawning all the threads, the main function should again join with each of them, and then print out the first, second, and third arrays.

As the answer to this exercise, please show your code and the output of your program, and describe any interesting aspects of your design or of the program's behavior that you observed.
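A rough sketch of one possible decomposition follows, in which each thread computes a disjoint band of output rows. The names `multiply_rows` and `parallel_multiply` are hypothetical, and the sketch assumes the first array is n by m, the second is m by p, the third is n by p, and each is passed as a pointer to its first element.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Computes output rows [row_begin, row_end): each c[r][col] is the dot
// product of row r of a (m columns) with column col of b (p columns).
void multiply_rows(const int* a, const int* b, int* c,
                   std::size_t m, std::size_t p,
                   std::size_t row_begin, std::size_t row_end) {
    for (std::size_t r = row_begin; r < row_end; ++r) {
        for (std::size_t col = 0; col < p; ++col) {
            int sum = 0;
            for (std::size_t k = 0; k < m; ++k) {
                sum += a[r * m + k] * b[k * p + col];
            }
            c[r * p + col] = sum;
        }
    }
}

// Gives each thread a disjoint band of output rows. Threads only read a and
// b and write different rows of c, so no locking is needed.
void parallel_multiply(const int* a, const int* b, int* c,
                       std::size_t n, std::size_t m, std::size_t p,
                       unsigned int num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> threads;
    const std::size_t rows_per_thread = n / num_threads;
    for (unsigned int t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * rows_per_thread;
        // The last thread also takes any leftover rows.
        const std::size_t end =
            (t + 1 == num_threads) ? n : begin + rows_per_thread;
        threads.emplace_back(multiply_rows, a, b, c, m, p, begin, end);
    }
    for (auto& th : threads) th.join();
}
```

Splitting by rows is simple, but it leaves most threads idle when there are more threads than output rows; splitting by individual output elements is one alternative worth considering.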

Enrichment Exercises (Optional)

6. Repeat the matrix addition (and/or matrix multiplication) exercise with smaller and larger matrices, and with different numbers of threads (e.g., a single thread, the same number of threads as in the original exercise, and all the way up to one thread per element of the output array).

Using the `std::chrono::steady_clock` class or another suitable time source, measure the time from just before the first thread is spawned until the last join is completed. Run each experiment multiple times, and see if you can detect meaningful (and stable) variations in the program's performance with more or fewer threads and smaller or larger arrays.

As the answer to this exercise please describe how you designed this experiment, what you saw, and what conclusions (if any) can be drawn from it.
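The timing measurement described in this exercise might be sketched as follows. The helper `time_threads` is hypothetical, and it assumes the per-thread work is a callable that takes the thread's index; adapt it to however your own thread functions are parameterized.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Runs work(t) on num_threads threads (t = 0 .. num_threads - 1) and returns
// the elapsed time in microseconds, measured from just before the first
// thread is spawned until the last join has completed.
template <typename Work>
long long time_threads(unsigned int num_threads, Work work) {
    assert(num_threads > 0);
    std::vector<std::thread> threads;
    threads.reserve(num_threads);  // keep vector allocation out of the timing

    const auto start = std::chrono::steady_clock::now();
    for (unsigned int t = 0; t < num_threads; ++t) {
        threads.emplace_back(work, t);
    }
    for (auto& th : threads) th.join();
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Because a single run can be dominated by scheduling noise and thread start-up cost, it is worth collecting several measurements per configuration and comparing their spread as well as their typical value.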

7. In your main function, define an array of large square 2D arrays (e.g., with 50 rows and 50 columns or larger), and initialize the values in each of the 2D arrays. Use the matrix addition function from exercise 4, and the Active Object pattern (feel free to re-use code from the Active Object Studio), to implement a parallel thread pipeline that sums up a number of arrays, as in

`A + B + C + D + E + F + G + H + I + J`

where `A` through `J` are all square arrays of the same dimension and `+` is the matrix addition operation.

Declare a struct with three pointers (and any other parameters you would like) defining the regions that should be added and the region where the sums should be stored, and pass appropriately valued instances of that struct to the active objects. For example, you might declare some temporary arrays that are initialized to all zeroes (say `T1` through `Tn` where n is the number of threads), pass the first active object pointers to `A`, `B`, and `T1`, pass the second active object pointers to `C`, `D`, and `T2`, etc.

After all of the pairs of input arrays have been added, the program should then add the temporary arrays to each other, so that the result of all the additions ends up in a single output array (say `Z`). How you assign the remaining additions and how you synchronize them with the previous ones (in light of data dependences) is up to you.
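To make the struct and its hand-off concrete, here is a deliberately simplified sketch. The names `add_request` and `adder_active_object` are hypothetical, and the class is a stand-in rather than the Active Object Studio code: it runs queued requests on its own thread and, as an assumption made for brevity, signals completion only by finishing all queued work before its destructor returns.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Three pointers defining the regions to add and where to store the sums.
struct add_request {
    const int* left;    // first input region
    const int* right;   // second input region
    int* result;        // region that receives the sums
    std::size_t count;  // number of contiguous elements in each region
};

class adder_active_object {
public:
    adder_active_object()
        : done_(false), worker_(&adder_active_object::run, this) {}

    // Drains any queued requests, then stops and joins the worker thread.
    ~adder_active_object() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        worker_.join();
    }

    void enqueue(const add_request& req) {
        assert(req.left && req.right && req.result);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(req);
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return done_ || !requests_.empty(); });
            if (requests_.empty()) return;  // done_ is set and nothing is left
            const add_request req = requests_.front();
            requests_.pop();
            lock.unlock();  // do the addition without holding the lock
            for (std::size_t i = 0; i < req.count; ++i) {
                req.result[i] = req.left[i] + req.right[i];
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<add_request> requests_;
    bool done_;
    std::thread worker_;  // declared last so the other members exist first
};
```

With this sketch, a first stage could enqueue `{&A[0][0], &B[0][0], &T1[0][0], count}` and `{&C[0][0], &D[0][0], &T2[0][0], count}` on two objects, let both objects go out of scope, and only then enqueue the addition of `T1` and `T2`. Waiting on destruction is the crudest way to respect the data dependences; futures or completion callbacks would allow later stages to start earlier.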