"There are two main reasons to use concurrency in an application: separation of concerns and performance. In fact I'd go so far as to say that they're pretty much the only reasons to use concurrency;"
—Anthony Williams, C++ Concurrency in Action, Chapter 1.2
As we have discussed previously, the Linux kernel provides a variety of schedulers, each of which is better (or worse) suited to different types of tasks. A good understanding of the semantics of the different schedulers, and of which scenarios each suits best, can make a significant difference in the performance of a system.
In this lab, you will:
Please complete the exercises described below. As you work through them, take notes on both your observations and your answers to the questions given below. When you are finished, write up a cohesive report (see next) and e-mail it, along with your code files, Makefile, etc., to email@example.com with the phrase Scheduler Profiling in the subject line.
What you will turn in for this lab will differ somewhat from previous lab and studio exercises. You will submit a cohesive report that unifies your observations and thoughts -- doing so will be a helpful intermediate step between the format of the studios and first lab, and the final project report you will present and submit at the end of the semester.
Instead of answering this lab's questions independently of each other, make note of your observations and answers to each question as you go. As you move from one question to the next, consider new answers in light of previous ones, and feel free to go back to earlier questions and note connections among them. This will help you synthesize a cohesive report when you are finished.
This lab is also meant to focus less on how things are implemented and more on what you learn and notice about the different scheduling classes. For more information, see "What to turn in" below.
Note: this lab can be done in the language of your choosing. The lab's purpose is to demonstrate knowledge of the Linux scheduler and the ability to think critically, not to demonstrate mastery of any particular programming language. That said, everything in this lab is fairly straightforward to do in C, and even more so if you use the C++11 threading libraries. You are free to adopt algorithms, code fragments, etc. from other sources (such as Williams' book quoted above), but if you do so you must comment those portions of your code and also discuss and cite what you've used (and the source from which it came) in your submitted report.
In this lab you will create a program that spawns a certain number of threads to be pinned on each core. These threads will then wait at a barrier until all other threads have been successfully spawned and pinned. Once all threads have arrived at the barrier, each will (safely) select the next number from a shared data structure and cube that number repeatedly (for a given number of iterations). This activity of selecting a number and repeatedly cubing it (which is intended to define a basic unit of workload for the thread to perform) is then repeated for a given number of rounds in each thread, giving each thread a sustained and configurable overall workload and creating some degree of contention among the threads, through which the performance of each scheduler can be evaluated.
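As a concrete illustration, pinning a thread to a single core on Linux can be done with the non-portable pthread_setaffinity_np extension. This is only a sketch, assuming a glibc/Linux system; the helper name is ours, not something the lab prescribes, and you could equally call pthread_setaffinity_np on each spawned thread's handle from the main thread:

```cpp
// Sketch: pin the calling thread to one CPU on Linux.
// pthread_setaffinity_np is a non-portable (_np) GNU extension,
// declared in <pthread.h> when _GNU_SOURCE is defined.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>

// Returns true on success; call once per thread, one core each.
bool pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);      // start with an empty CPU set
    CPU_SET(cpu, &set);  // allow exactly one CPU
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

A thread pinned this way will only ever be scheduled on the given core, which is what lets you attribute per-core behavior to the scheduler under test.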
The program will take in four or more arguments indicating (1) the scheduling class to be used (SCHED_FIFO, SCHED_RR, or SCHED_NORMAL), (2) a positive number of rounds for each thread to perform overall, (3) a positive number of iterations of cubing the selected number that each thread will perform in each round, and (4+) one or more additional numbers that should be used to populate the data structure from which the threads will repeatedly obtain numbers. For example, a command line such as
./myprog SCHED_RR 100 1000 2 3 5 7 11
would use the round-robin real-time scheduler, and each thread would perform one hundred rounds of: obtaining one of the prime numbers in the range 2 through 11 inclusive, then computing the cube of that same number 1000 times (not re-cubing the result of the previous iteration, which could easily introduce overflow and other representation issues we won't go into).
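The inner loop of one such round might be sketched as follows. The volatile sink is our own assumption, added so an optimizing compiler cannot delete the otherwise-unused work; note that each iteration cubes the original number, never the previous result:

```cpp
#include <cstdint>

// One cube of a single number.
int64_t cube(int64_t x) { return x * x * x; }

// One "round": cube the same selected number `iterations` times.
// The volatile sink keeps the compiler from optimizing the loop away;
// each result is discarded rather than re-cubed.
void do_round(int64_t number, long iterations) {
    volatile int64_t sink = 0;
    for (long i = 0; i < iterations; ++i)
        sink = cube(number);  // always cube the original number
    (void)sink;
}
```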
NOTE: Some of these exercises/questions will likely freeze your Pi. Save your work often, and read ahead to make sure you are aware where we expect such freezes may occur.
<program_name> <scheduler> <rounds> <iterations> <number>+
The scheduler argument should indicate either the SCHED_RR, SCHED_FIFO, or SCHED_NORMAL scheduler (note that SCHED_NORMAL is sometimes called SCHED_OTHER).
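One way to translate the scheduler argument into a policy constant is a simple lookup against the <sched.h> macros. This is only a sketch (the function name is ours); actually applying the chosen policy to a thread, e.g. via pthread_setschedparam or sched_setscheduler, is not shown here, and keep in mind that the real-time classes generally require root privileges (or CAP_SYS_NICE):

```cpp
#include <sched.h>
#include <string>

// Map the command-line scheduler argument to a <sched.h> policy
// constant; returns -1 for an unrecognized name. SCHED_NORMAL is
// the kernel's name for what the userspace headers call SCHED_OTHER.
int policy_from_name(const std::string& name) {
    if (name == "SCHED_FIFO") return SCHED_FIFO;
    if (name == "SCHED_RR")   return SCHED_RR;
    if (name == "SCHED_NORMAL" || name == "SCHED_OTHER")
        return SCHED_OTHER;
    return -1;
}
```

When you do apply the policy, remember that SCHED_FIFO and SCHED_RR take a nonzero static priority in their sched_param, while SCHED_OTHER requires a priority of 0.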
The rounds argument gives the number of times each thread should select a new number from the data structure.
The iterations argument gives the number of times within each round that the selected number should be cubed by the thread.
One or more arguments should be given after that, indicating number values that should be read into the data structure (from which the threads will then select specific numbers to cube).
The data structure holding the numbers will be accessed by multiple threads at once, and should safely maintain a variable (an index, counter, pointer, etc.) indicating which number the next thread should read. Each time a number is read by a thread, that variable should advance to the next number in the data structure (and, after the last number is read, wrap back to the first number). You must allow concurrent access to this structure but avoid data races (particularly on that variable). Atomic variables and/or different kinds of locks are possibilities for this.
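For example, one possible lock-free approach (a sketch, not the required design -- a mutex-guarded index would also satisfy the requirement) uses an atomic counter whose fetch_add hands each reader a unique ticket, with the modulo providing the wraparound:

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Shared bag of numbers with a race-free round-robin cursor.
// fetch_add gives every caller a unique ticket atomically; the
// modulo wraps back to the first number after the last is read.
class NumberBag {
public:
    explicit NumberBag(std::vector<long> nums) : nums_(std::move(nums)) {}
    long next() {
        std::size_t ticket = cursor_.fetch_add(1, std::memory_order_relaxed);
        return nums_[ticket % nums_.size()];
    }
private:
    std::vector<long> nums_;
    std::atomic<std::size_t> cursor_{0};
};
```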
Furthermore, it would defeat the purpose of the lab to allow certain threads to begin their important work of cubing integers while other threads were still being spawned and pinned. Therefore, create a way for threads to spin-wait until all threads are ready to begin their task. This is known as a thread barrier (it can be accomplished similarly, using atomic variables and/or mutexes -- Anthony Williams has a nice C++ implementation on page 269 of his book C++ Concurrency in Action, though his barrier yields the processor, and yours needs to spin). Have your threads wait on the barrier once more after they have finished their work.
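One way such a spinning barrier might look in C++11 (a sketch using a generation counter -- not Williams's version, which yields instead of spinning; the class and helper names are ours):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Reusable spinning barrier: threads busy-wait (no yield) until all
// n have arrived. The generation counter makes the barrier reusable
// for the second wait after the work is done.
class SpinBarrier {
public:
    explicit SpinBarrier(unsigned n) : n_(n), remaining_(n) {}
    void arrive_and_wait() {
        unsigned gen = generation_.load(std::memory_order_acquire);
        if (remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            remaining_.store(n_, std::memory_order_relaxed);      // reset for reuse
            generation_.fetch_add(1, std::memory_order_release);  // free waiters
        } else {
            while (generation_.load(std::memory_order_acquire) == gen) {
            }  // spin: deliberately no yield or sleep
        }
    }
private:
    const unsigned n_;
    std::atomic<unsigned> remaining_;
    std::atomic<unsigned> generation_{0};
};

// Demo: n threads each record an arrival, meet at the barrier, and
// only then read the total -- so every thread must observe all n.
// A second wait exercises the barrier's reuse, as the lab requires.
int barrier_demo(unsigned n) {
    SpinBarrier barrier(n);
    std::atomic<int> hits{0};
    std::atomic<int> observed{0};
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([&] {
            hits.fetch_add(1);
            barrier.arrive_and_wait();     // no thread passes until all arrive
            observed.store(hits.load());   // therefore every thread sees n
            barrier.arrive_and_wait();     // second wait, after the "work"
        });
    for (auto& t : threads) t.join();
    return observed.load();
}
```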
What to turn in: (1) all the code and compilation files used to implement and run your solution (including a Makefile if you used one, etc.); (2) a readme.txt file with the contents described next, and (3) other files (e.g., with screen-shots from Kernelshark) that enhance your report.
The first section of your readme.txt file should include: