"Although it would be nice if every critical region consisted of code that did nothing more complicated than incrementing a variable, reality is much crueler."
—Robert Love, Linux Kernel Development 3rd Ed., Chapter 10, pp. 183
Locking primitives are important userspace tools for concurrent and parallel programming. Two main types of locks exist: locks that spin while a program waits to acquire the lock, and locks that cause the program to sleep. Spinlocks consume CPU cycles while a process waits, but are well suited to low-latency applications in which critical sections are very short. Other locks allow processes to sleep while waiting, which costs fewer CPU cycles but may increase latency, since processes must be put to sleep and woken up.
In this studio, you will:
Please complete the required exercises below.
As you work through these exercises, please record your answers, and when finished email your results to email@example.com with the phrase Userspace Locks in the subject line.
Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.
workload.c program. This program produces several seconds of work on each processor in a system. Build and run the program on your Raspberry Pi (see the comment in the program's code for the command line you will need to use to build it) and verify that it occupies all of your system's cores by using top and pressing 1.
As the answer to this exercise, please explain why (based on the output of the top utility) you think the program is occupying all of the cores.
critical_section() function concurrently (or even simultaneously if they are on different cores). This is undesirable, since critical sections typically protect important shared data. First we will build a spin lock to protect access to the
Write initially empty lock and unlock functions. Each of these functions should take a pointer to a volatile int.
Note: Recall that the volatile specifier tells the compiler that the value of a variable may change at any time, so every access to it must actually read or write memory. In this case, the compiler interprets the volatile int * type declaration to mean "a pointer to an int that's volatile" (the value pointed at by the pointer may change unexpectedly, not the pointer itself).
To treat an integer like a lock, we need to define two values that represent the locked and unlocked states, respectively. Define these at the top of your program with preprocessor #define directives.
Also create two integer variables that will hold these values, and initialize them
Inside the parallel region of the program, insert calls to the lock and unlock functions before and after the critical section of the workload.c program, respectively, passing the address of the lock variable into each of those calls.
As the answer to this exercise, please show your statements to declare the two values and the three variables.
To implement the lock and unlock functions we'll use GCC's built-in atomic functions. If we were working in C++ we could use C++11's atomic operations. If we didn't have access to GCC, or if speed was very critical, we could implement these with assembly instructions.
The atomic built-in functions are documented in the GCC manual, in the section on built-in functions for memory model aware atomic operations.
For the spin lock we will use the function __atomic_compare_exchange(). The first three arguments determine the meaning of this function: ptr, expected, and desired. When called, this function atomically compares the contents of the location pointed to by ptr with the contents of the location pointed to by expected, and if they are equal, writes the value pointed to by desired into the location pointed to by ptr. The fourth argument selects the weak or strong variant of the operation, and the last two arguments specify memory orderings for the success and failure cases. We'll just opt for the strong variant with a strong ordering for this studio, as in:
__atomic_compare_exchange( ptr, expected, desired, false, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)
To implement the lock function, you should check for the unlocked state and write the value of the locked state. However, it's possible that this function will fail (for example, if another process already holds the lock). Thus, your lock function should attempt to swap the state of the lock variable, and continue retrying until it succeeds. WARNING: The __atomic_compare_exchange() function may overwrite the value of expected when it fails! Please make sure to re-initialize its value as needed.
Implement the unlock function using the same __atomic_compare_exchange() function. However, since we only expect the unlock function to be called when we already hold the lock, it should succeed unless something is drastically wrong. If the __atomic_compare_exchange() fails, then rather than retrying the function, your code should print out an error message and return.
As the answer to this exercise, please show your code for both the lock and unlock functions.
As the answer to this exercise, please explain why (based on the output you saw) the program only allows one thread at a time into the critical section.
unlock() functions, and the variables that held the locked and unlocked states (don't remove the lock variable or the statements that #define the locked and unlocked state values).
The sleep and wakeup mechanism we will use for this second version of userspace synchronization is a system call named futex, which stands for fast userspace mutex. This system call handles the mechanisms for sleeping and waking processes, but userspace library code must decide how and when to use this capability. Specifically, the futex system call is designed to implement a semaphore on top of an integer. There are three states:
|Unlocked:||1|
|Locked:||0|
|At least one process is sleeping:||any negative number|
Since the futex is designed to implement a semaphore, this means that processes lock and unlock the futex by atomic increments and decrements. When a process claims the futex, it atomically decrements the integer by one. When a process releases the futex, it atomically increments the integer by one. If two processes never conflict, then the value of the futex integer will always be zero or one, and no process will ever have to sleep (and thus, you will never need to make a futex system call).
However, if multiple processes try to lock the futex simultaneously, they will decrement the integer value to be negative. In this case, a process that gets some value less than zero will want to go to sleep, and the kernel then must become involved. The semantics and the particulars of this process are documented in the man pages produced by the following commands: man 2 futex and man 7 futex.
Make sure that the #define values you have declared for the unlocked and locked state values are consistent with the semantics described above, and if not, update them accordingly.
As the answer to this exercise, please explain whether or not you needed to change the #define values in your code, and why or why not.
ret_val = __atomic_sub_fetch( ptr, 1, __ATOMIC_ACQ_REL );
__atomic_store_n( ptr, -1, __ATOMIC_RELEASE );
syscall( SYS_futex, ptr, FUTEX_WAIT, -1, NULL );
ret_val = __atomic_add_fetch( ptr, 1, __ATOMIC_ACQ_REL );
__atomic_store_n( ptr, 1, __ATOMIC_RELEASE );
syscall( SYS_futex, ptr, FUTEX_WAKE, INT_MAX );
As the answer to this exercise, please show your implementations of the new lock and unlock functions.
trace-cmd record -e sched_switch. Take a screenshot showing both behaviors.
Notice that your spin lock is able to do synchronization entirely in userspace, while the futex lock sometimes requires the intervention of the kernel. As the answer to this exercise, please explain whether or not it would be possible to do a sleep lock entirely in userspace (i.e., with no system calls), and why or why not.