"There are two main reasons to use concurrency in an application: separation of concerns and performance. In fact I'd go so far as to say that they're pretty much the only reasons to use concurrency;"
—Anthony Williams, C++ Concurrency in Action, Chapter 1.2
As we have discussed previously, the Linux kernel provides a variety of schedulers, each of which may be better or worse suited to different types of tasks. A good understanding of the semantics of the different schedulers, and of which scenarios each one suits best, can make a significant difference in a system's performance.
As with the previous lab assignment, you are encouraged to work on this lab as a team with one or two other people, though you may complete it individually if you prefer. Teams of more than three people are not allowed for this assignment.
In this lab, you will:
Please complete the exercises described below. As you work through them, take notes on both your observations and your answers to the questions given below. When you are finished, write up a cohesive report (see next) and e-mail it, along with your code files, Makefile, etc., to email@example.com with the phrase Lab2 in the subject line.
What you turn in for this lab will differ somewhat from previous lab and studio exercises. Instead of answering this lab's questions independently of each other, you will submit a cohesive report that unifies your observations and thoughts. As you go, note your observations and your answer to each question. As you move from one question to the next, consider new answers in light of previous ones and, where appropriate, go back to earlier questions and note connections with other questions. This should help you synthesize a cohesive report when you are finished.
This lab is also meant to focus less on how things are implemented and more on what you learn and notice about the different scheduling classes. For more information, see "What to turn in" below.
Note: the user-space program(s) for this lab can be written in any language of your choosing, as long as that language supports the necessary features. The lab's purpose is to cultivate and demonstrate (1) knowledge about different Linux schedulers and (2) the ability to think critically about (and discuss) their behaviors, rather than to demonstrate mastery of any particular programming language. That said, everything in this lab is fairly straightforward to do in C, and even more so in C++ using the C++11 standard library's threading and synchronization features. You are free to adopt algorithms, code fragments, etc. from other sources (such as Williams' book noted above), but if you do so you must (1) comment those portions of your code to clearly indicate where they came from, and (2) discuss and cite what you used, and its source, in your submitted report.
In this lab you will create a program that spawns a certain number of threads to be pinned on each core. These threads will then wait at a synchronization barrier until all other threads have been successfully spawned and pinned. Once all threads have passed the barrier (which happens only once), each thread repeats the following operations: (1) safely select the next number from a shared data structure (this selection must be protected from data races), and (2) cube that number repeatedly for a given number of iterations (this defines the basic unit of work for the thread to perform). Each thread repeats this pair of operations for a given number of rounds, which gives it a sustained and configurable overall workload and creates some contention among the threads, through which the performance of each scheduler can be evaluated.
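The spawn, pin, and work structure just described can be sketched as follows. This is a minimal sketch assuming Linux with glibc (`pthread_setaffinity_np` and the `CPU_SET` macros are GNU extensions). It pins threads only to CPUs in the process's affinity mask, takes the number source as a callable that must itself be race-free, and deliberately omits the barrier and the scheduling-class setup. The helper names and the `threads_per_core` parameter are hypothetical.

```cpp
// Sketch only: spawn threads_per_core threads per allowed CPU, pin each one,
// and run the rounds/iterations workload. Barrier and scheduler setup omitted.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Pin the calling thread to one CPU; returns 0 on success or an errno value.
int pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// CPUs in this process's affinity mask (possibly fewer than the machine has).
std::vector<int> allowed_cpus() {
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
            if (CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
    }
    return cpus;
}

// One thread's workload: each round selects a number (next_number must be
// race-free) and cubes that same number `iterations` times. Inputs are assumed
// small enough that n * n * n fits in a long long. Returns rounds completed.
long run_worker(long rounds, long iterations,
                const std::function<long long()>& next_number) {
    volatile long long sink = 0;  // volatile keeps the compiler from deleting the loop
    for (long r = 0; r < rounds; ++r) {
        const long long n = next_number();
        for (long i = 0; i < iterations; ++i)
            sink = n * n * n;  // cube the same value; never re-cube the result
    }
    (void)sink;
    return rounds;
}

// Spawn threads_per_core pinned threads on each allowed CPU and join them all.
// Returns the total number of rounds completed across all threads.
long spawn_pinned(int threads_per_core, long rounds, long iterations,
                  const std::function<long long()>& next_number) {
    std::atomic<long> total_rounds{0};
    std::vector<std::thread> threads;
    for (int cpu : allowed_cpus()) {
        for (int k = 0; k < threads_per_core; ++k) {
            threads.emplace_back([&, cpu] {
                pin_current_thread(cpu);  // real code should check this result
                total_rounds += run_worker(rounds, iterations, next_number);
            });
        }
    }
    for (auto& t : threads) t.join();
    return total_rounds.load();
}
```

In the full program, each thread would wait at the barrier right after `pin_current_thread` returns and before calling `run_worker`.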
The program will take five or more arguments indicating (1) the scheduling class to be used (SCHED_FIFO, SCHED_RR, or SCHED_NORMAL), (2) whether threads should consume CPU cycles or suspend their execution while they wait for each other (spin or sleep), (3) a positive number of rounds for each thread to perform overall, (4) a positive number of iterations of cubing the selected number that each thread will perform in each round, and (5+) one or more additional numbers that should be used to populate the data structure from which the threads will repeatedly obtain numbers. For example, a command line such as
./myprog SCHED_RR spin 100 1000 2 3 5 7 11
would use the round-robin real-time scheduler; threads would spin-wait in order to synchronize; and each thread would perform one hundred rounds of obtaining one of the prime numbers in the range 2 through 11 inclusive and computing the cube of that same number 1000 times (not re-cubing the result of the previous iteration, which could easily introduce overflow and other representation issues we won't go into).
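The arithmetic behind this example, including the overflow concern, can be worked out for the largest input, 11. For reference, a signed 32-bit integer tops out at 2^31 - 1 and a signed 64-bit integer at 2^63 - 1:

```latex
\begin{aligned}
\text{cubings per thread} &= 100 \text{ rounds} \times 1000 \text{ iterations} = 10^{5} \\
11^{3} &= 1331 \\
(11^{3})^{3} = 11^{9} &= 2\,357\,947\,691 > 2^{31} - 1 = 2\,147\,483\,647 \\
((11^{3})^{3})^{3} = 11^{27} &\approx 1.3 \times 10^{28} > 2^{63} - 1 \approx 9.2 \times 10^{18}
\end{aligned}
```

Re-cubing would therefore overflow a 32-bit integer on the second iteration and a 64-bit integer on the third, whereas cubing the same value every time always yields 1331.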
NOTE: Some of these exercises/questions may freeze your Pi. Save your work often, and read ahead so that you know where we expect such freezes may occur.
<program_name> <scheduler> <wait-strategy> <rounds> <iterations> <number>+
The scheduler argument should be one of "SCHED_FIFO", "SCHED_RR", or "SCHED_NORMAL" (note that SCHED_NORMAL is sometimes called SCHED_OTHER, but we will use SCHED_NORMAL).
The wait-strategy argument should be either "spin" or "sleep", indicating whether active waiting (e.g., via spin locks) or passive waiting (e.g., via library features such as mutexes and condition variables) should be used to synchronize threads.
The rounds argument gives the number of times each thread should select a new number from the data structure.
The iterations argument gives the number of times within each round that the selected number should be cubed by the thread.
One or more arguments should be given after that, indicating the values (number arguments) that should be read into the data structure (from which the threads will then select specific numbers to cube).
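A minimal sketch of parsing and validating these arguments, and of applying the chosen scheduling class, is shown below. It assumes Linux with glibc, whose `<sched.h>` spells the default class `SCHED_OTHER` (the kernel's SCHED_NORMAL). `Config`, `parse_args`, `parse_ll`, and `apply_policy` are hypothetical names, and picking the midpoint of the real-time priority range is an arbitrary choice made only for illustration.

```cpp
// Sketch only: parse
//   <program_name> <scheduler> <wait-strategy> <rounds> <iterations> <number>+
// and apply the chosen scheduling class to the calling thread.
#include <pthread.h>
#include <sched.h>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <vector>

struct Config {
    int policy = -1;                 // SCHED_FIFO, SCHED_RR, or SCHED_OTHER
    bool spin = false;               // true for "spin", false for "sleep"
    long long rounds = 0;
    long long iterations = 0;
    std::vector<long long> numbers;  // values for the shared data structure
};

// Parse a base-10 integer that must consume the whole string.
static bool parse_ll(const char* s, long long& out) {
    char* end = nullptr;
    errno = 0;
    out = std::strtoll(s, &end, 10);
    return errno == 0 && end != s && *end == '\0';
}

// Returns false if the arguments are invalid (cfg may then be partially filled).
bool parse_args(int argc, char* argv[], Config& cfg) {
    if (argc < 6) return false;  // program name + 4 fixed arguments + >= 1 number
    const std::string sched = argv[1];
    if (sched == "SCHED_FIFO") cfg.policy = SCHED_FIFO;
    else if (sched == "SCHED_RR") cfg.policy = SCHED_RR;
    else if (sched == "SCHED_NORMAL") cfg.policy = SCHED_OTHER;  // glibc's name
    else return false;

    const std::string wait = argv[2];
    if (wait != "spin" && wait != "sleep") return false;
    cfg.spin = (wait == "spin");

    if (!parse_ll(argv[3], cfg.rounds) || cfg.rounds <= 0) return false;
    if (!parse_ll(argv[4], cfg.iterations) || cfg.iterations <= 0) return false;

    cfg.numbers.clear();
    for (int i = 5; i < argc; ++i) {
        long long n = 0;
        if (!parse_ll(argv[i], n)) return false;
        cfg.numbers.push_back(n);
    }
    return true;
}

// Apply the policy to the calling thread; returns 0 or an errno value (EPERM
// is typical when a real-time class is requested without privileges).
// SCHED_OTHER requires priority 0; for the real-time classes this sketch uses
// the midpoint of the allowed priority range.
int apply_policy(int policy) {
    sched_param param{};
    if (policy != SCHED_OTHER) {
        const int lo = sched_get_priority_min(policy);
        const int hi = sched_get_priority_max(policy);
        param.sched_priority = lo + (hi - lo) / 2;
    }
    return pthread_setschedparam(pthread_self(), policy, &param);
}
```

On Linux, threads created with default attributes inherit the creating thread's policy and priority, so one option is to call `apply_policy` in `main` before spawning; another is to call it in each thread after pinning. Either way, check the return value: SCHED_FIFO and SCHED_RR usually require root (e.g., running under `sudo`) or `CAP_SYS_NICE`.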
The data structure holding the numbers will be accessed by multiple threads at once, and should safely (i.e., without race conditions) maintain a variable (e.g., an index, counter, pointer, etc.) that keeps track of which number the next thread should read. Each time a thread reads a number, that variable should advance to refer to the next number in the data structure, wrapping back to the first number after the last one is read. You must allow concurrent access to this structure but avoid data races (particularly on that variable), e.g., by using spin locks if the program's wait-strategy argument was "spin", or mutexes and condition variables if it was "sleep".
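One way to build this shared data structure is sketched below, assuming C++11. `NumberPool` keeps the numbers plus a wrap-around index that is only read and advanced while a lock is held; the lock type is a template parameter, so the same pool works with `std::mutex` for "sleep" or with the busy-waiting `SpinLock` for "spin". Both class names are hypothetical, and the test-and-set design shown is one common spin-lock design, not the only one.

```cpp
// Sketch only: a shared number pool guarded by either a sleeping or a
// spinning lock, chosen at compile time via the Lock template parameter.
#include <atomic>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Test-and-set spin lock: a waiting thread keeps the CPU instead of sleeping.
// It provides lock()/unlock(), so std::lock_guard accepts it like std::mutex.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the holder clears the flag
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

template <typename Lock>
class NumberPool {
public:
    explicit NumberPool(std::vector<long long> numbers)
        : numbers_(std::move(numbers)) {}

    // Return the current number and advance the index, wrapping to the first
    // number after the last one. Requires a non-empty pool.
    long long next() {
        std::lock_guard<Lock> guard(lock_);
        const long long value = numbers_[index_];
        index_ = (index_ + 1) % numbers_.size();
        return value;
    }

private:
    std::vector<long long> numbers_;
    std::size_t index_ = 0;  // guarded by lock_
    Lock lock_;
};
```

For example, `NumberPool<SpinLock>` would serve the "spin" strategy and `NumberPool<std::mutex>` the "sleep" strategy; because the read and the advance happen under one lock, no two threads can observe the same index value.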
Furthermore, it would defeat the purpose of the lab to allow some threads to begin their (important :-) work of cubing integers while other threads were still being spawned and pinned. Therefore, create a way for threads to synchronize and wait (again using spin locks if the program's wait-strategy argument was "spin", or mutexes and condition variables if it was "sleep") until all threads are ready to begin their work. This is known as a barrier.
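A one-shot barrier for each wait strategy might look like the sketch below, assuming C++11. Each thread calls `arrive_and_wait()` exactly once, after it has been spawned and pinned, and no thread returns until all `count` threads have arrived. The class names are hypothetical; POSIX `pthread_barrier_wait` and C++20 `std::barrier` are ready-made alternatives.

```cpp
// Sketch only: one-shot barriers (never reset) for the two wait strategies.
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// "spin": busy-wait on an atomic counter until every thread has arrived.
class SpinBarrier {
public:
    explicit SpinBarrier(std::size_t count) : remaining_(count) {}
    void arrive_and_wait() {
        remaining_.fetch_sub(1, std::memory_order_acq_rel);
        while (remaining_.load(std::memory_order_acquire) != 0) {
            // spin: the waiting thread keeps consuming CPU cycles
        }
    }
private:
    std::atomic<std::size_t> remaining_;
};

// "sleep": block on a condition variable until the last thread arrives.
class SleepBarrier {
public:
    explicit SleepBarrier(std::size_t count) : remaining_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (--remaining_ == 0) {
            cv_.notify_all();  // the last arrival wakes everyone
        } else {
            // The predicate guards against spurious wake-ups.
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t remaining_;  // guarded by mutex_
};
```

The two classes share the same interface, so the program can construct whichever one matches the wait-strategy argument and hand it to every thread.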
"spin" and then run it again with the wait-strategy argument set to
"sleep", and when you answer each of the questions below please discuss whether or not you saw any differences in behavior when using one strategy versus the other (and if you did, what those differences were and why you think they may have occurred).
How you obtain the timing information is up to you. Be creative. Possibilities include creating a kernel module that monitors which tasks are on the CPU and/or writing a script that extracts that information from a trace-cmd .dat file. Note and explain your observations. Please use appropriately large numbers of rounds and iterations so that the scheduling behavior is clear (setting both to 1000 should suffice).
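As a complement to kernel-side tracing, the program itself can record coarse per-thread timing. The sketch below assumes Linux with glibc (`sched_getcpu` is a GNU extension); `ThreadTiming` and `timed_run` are hypothetical names. Each thread would wrap its post-barrier workload and store the result in a per-thread slot that the main thread prints after joining.

```cpp
// Sketch only: per-thread wall-clock timestamps plus the CPU observed at the end.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <chrono>
#include <functional>

struct ThreadTiming {
    std::chrono::steady_clock::time_point start, finish;
    int cpu = -1;  // CPU observed when the work finished (-1 if unavailable)
    long long elapsed_us() const {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   finish - start).count();
    }
};

// Wrap one thread's workload (e.g., its rounds loop) with timestamps.
ThreadTiming timed_run(const std::function<void()>& work) {
    ThreadTiming t;
    t.start = std::chrono::steady_clock::now();
    work();
    t.finish = std::chrono::steady_clock::now();
    t.cpu = sched_getcpu();
    return t;
}
```

Because `steady_clock` is shared by all threads in the process, comparing start and finish times across threads shows which ones ran first and which finished last. A thread's interval also includes any time it spent preempted, though, so these numbers are best read alongside trace-cmd/KernelShark data rather than instead of it.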
Question 4: Which of the above may help to address the problem of your Pi (potentially) freezing, and how and why would it help? Evaluate your hypothesis by implementing the necessary change and running your program with both SCHED_RR and SCHED_FIFO, examining how their traces differ from what you saw previously, and considering and discussing why they might have done so.
What to turn in: (1) all the code and compilation files used to implement and run your solution (including a Makefile if you used one, etc.); (2) a readme.txt file with the contents described next; and (3) other files (e.g., screenshots from KernelShark) that enhance your report.
The first section of your readme.txt file should include:
Changes since original posting: