CSE 522S: Studio 11

Build Your Own Locks

Even as they spoke there came a blare of trumpets. Then there was a crash and a flash of flame and smoke. The waters of the Deeping-stream poured out hissing and foaming: they were choked no longer, a gaping hole was blasted in the wall. A host of dark shapes poured in.

The Two Towers, Chapter 7, Book III

Locking primitives are important userspace tools for concurrent and parallel programming. Two main types of locks exist. Spinlocks consume processor time while a process waits, but they are ideally suited to low-latency, low-overhead applications in which critical sections are very short. Other locks allow processes to sleep while waiting, which can make better use of processor time but incurs the higher overhead of putting processes to sleep and waking them up.

In this studio, you will:

  1. Build a userspace spin lock with atomic instructions
  2. Build a userspace sleep lock with atomic instructions and futexes

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete.

As you work through these exercises, please record your answers, and when finished email your results to dferry@email.wustl.edu with the phrase Locks in the subject line.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Download this program. This is a short parallel workload that creates about five seconds of work on each processor in the system. Build the program and verify that it occupies all of your system's cores by running top and pressing 1 to display per-CPU usage.

  3. Right now, all threads can execute the critical_section() function simultaneously. This is undesirable, since critical sections typically protect important shared data. First we will build a spin lock to protect access to the critical_section() function.

    Create empty lock() and unlock() functions, each taking a pointer to a volatile integer, and insert calls to them around the critical section.

    Note: Recall that the volatile qualifier tells the compiler that the value of a variable may change in ways the compiler cannot see, so it must actually perform every read and write of that variable. Here, volatile int* means that the integer pointed to by the pointer may change unexpectedly, not that the pointer itself may change.

  4. To treat an integer as a lock, we need two values that represent the locked and unlocked states, respectively. Define these at the top of your program with preprocessor pound-defines (#define). Then create two integer variables that hold these values; the atomic built-in used in step 5 takes its expected and desired values by pointer, so the values must live in variables.
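    A minimal skeleton for steps 3 and 4, assuming the downloaded program calls critical_section() from each worker thread. The state values 0 and 1 and the names UNLOCKED, LOCKED, unlocked_val, locked_val, and thread_body() are hypothetical choices for illustration, and the critical_section() below is only a stand-in for the one in the studio program.

```c
#include <assert.h>
#include <stdio.h>

/* Step 4: hypothetical values for the two states of a spin lock. */
#define UNLOCKED 0
#define LOCKED   1

/* Step 4: variables holding those values, so their addresses can be
   passed to the atomic built-in in step 5. */
int unlocked_val = UNLOCKED;
int locked_val   = LOCKED;

/* Step 3: lock functions that do nothing yet except sanity-check their
   argument. volatile applies to the int being pointed at: other threads
   may change the lock word at any time. */
void lock(volatile int *lock_ptr)   { assert(lock_ptr != NULL); }
void unlock(volatile int *lock_ptr) { assert(lock_ptr != NULL); }

/* Hypothetical stand-in for the studio program's critical_section(). */
void critical_section(void) { puts("inside the critical section"); }

/* Hypothetical per-thread body: the critical section is bracketed by
   lock() and unlock(). */
void thread_body(volatile int *lock_ptr) {
    lock(lock_ptr);
    critical_section();
    unlock(lock_ptr);
}
```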

  5. To implement the lock and unlock functions we'll use GCC's built-in atomic functions. If we were working in C++, we could use C++11's atomic operations instead. If we didn't have access to GCC, or if speed were critical, we could implement these with assembly instructions.

    The atomic built-in functions are documented here. For the spin lock we will use the function __atomic_compare_exchange(). Its first three arguments, ptr, expected, and desired, determine what it does: when called, it atomically compares the value that ptr points to with the value that expected points to, and if they are equal, writes the value that desired points to into *ptr and returns true; otherwise it returns false. The last three arguments choose between the weak and strong forms of the operation and give the memory orderings to use on success and on failure. For this studio we'll use the strong form with conservative orderings, so every call looks like this:

    __atomic_compare_exchange( ptr, expected, desired, false, __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)
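    The small program below calls __atomic_compare_exchange() once with a matching expected value and once with a mismatched one, so both outcomes are visible, including the overwrite of expected on failure. The function name cas_demo() and the values 0 and 1 are arbitrary choices for illustration; the argument list follows the call shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Makes one successful and one failed call to GCC's generic
   __atomic_compare_exchange() with the arguments used in this studio,
   checks the documented effects, and returns the final value of expected. */
int cas_demo(void) {
    int word = 0;      /* ptr points here: the variable being updated */
    int expected = 0;  /* the value we expect to find in word */
    int desired = 1;   /* the value to write if the comparison succeeds */

    /* word == expected, so word becomes 1 and the call returns true;
       expected is left untouched. */
    bool ok = __atomic_compare_exchange(&word, &expected, &desired, false,
                                        __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
    assert(ok && word == 1 && expected == 0);

    /* word (1) != expected (0): word is unchanged, the call returns
       false, and the value found in word is copied into expected. */
    ok = __atomic_compare_exchange(&word, &expected, &desired, false,
                                   __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
    assert(!ok && word == 1 && expected == 1);

    printf("after the failed call, expected = %d\n", expected);
    return expected;
}
```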

    To implement the lock function, compare the lock variable against the unlocked state and, if they match, write the locked state. This attempt can fail (for example, if another thread already holds the lock), so your lock function should keep retrying the swap until it succeeds. WARNING: When the swap fails, __atomic_compare_exchange() overwrites expected with the current value of the lock variable, so reset expected to the unlocked state before each retry!

    Implement the unlock function with the same built-in, this time expecting the locked state and writing the unlocked state. We only expect unlock() to be called by a thread that already holds the lock, so if the call to __atomic_compare_exchange() fails, print an error message and return rather than retrying.
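    One possible shape for the two functions, sketched under the assumption that UNLOCKED is 0 and LOCKED is 1 (the studio leaves the values up to you). This sketch copies the state values into local variables inside each call instead of passing shared globals: because a failed swap writes into expected, a global expected variable would be overwritten by whichever thread failed last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define UNLOCKED 0   /* hypothetical state values: any two distinct */
#define LOCKED   1   /* integers work for a spin lock */

/* Spin until the lock word is atomically changed from UNLOCKED to LOCKED. */
void lock(volatile int *lock_ptr) {
    int expected = UNLOCKED;
    int desired = LOCKED;
    while (!__atomic_compare_exchange(lock_ptr, &expected, &desired, false,
                                      __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)) {
        /* The failed call copied the current lock value into expected;
           restore the unlocked value before trying again. */
        expected = UNLOCKED;
    }
}

/* Change the lock word from LOCKED back to UNLOCKED. The caller should
   hold the lock, so a failed swap is reported rather than retried. */
void unlock(volatile int *lock_ptr) {
    int expected = LOCKED;
    int desired = UNLOCKED;
    if (!__atomic_compare_exchange(lock_ptr, &expected, &desired, false,
                                   __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)) {
        fprintf(stderr, "unlock: lock not held (found %d)\n", expected);
    }
}
```

    Calling unlock() on a lock that is not held prints the error and leaves the lock word unchanged, which matches what step 5 asks for.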

  6. Lastly, create a volatile integer to serve as your lock and initialize it with your pound-defined unlocked value. Run your program and use each thread's finishing statement to verify that only one thread enters the critical section at a time.
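    Besides reading the finishing statements, a shared counter gives a quick check of mutual exclusion. The sketch below is a hypothetical harness, not the studio's program: each thread (worker() and run_spin_demo() are made-up names) performs a plain, non-atomic increment inside the lock as a stand-in for critical_section(), so the final count equals nthreads * iters only if no two increments overlapped. It repeats the lock() and unlock() logic from step 5, with the same assumed state values, so that it compiles on its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define UNLOCKED 0
#define LOCKED   1
#define MAX_THREADS 64

static volatile int lock_var = UNLOCKED;  /* the lock from step 6 */
static long counter = 0;                  /* shared data guarded by lock_var */
static long iters_per_thread = 0;

static void lock(volatile int *lock_ptr) {
    int expected = UNLOCKED, desired = LOCKED;
    while (!__atomic_compare_exchange(lock_ptr, &expected, &desired, false,
                                      __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
        expected = UNLOCKED;  /* reset after a failed attempt */
}

static void unlock(volatile int *lock_ptr) {
    int expected = LOCKED, desired = UNLOCKED;
    if (!__atomic_compare_exchange(lock_ptr, &expected, &desired, false,
                                   __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE))
        fprintf(stderr, "unlock: lock not held (found %d)\n", expected);
}

/* Stand-in for critical_section(): a non-atomic increment that can lose
   updates if two threads ever run it at the same time. */
static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < iters_per_thread; i++) {
        lock(&lock_var);
        counter++;
        unlock(&lock_var);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final count. */
long run_spin_demo(int nthreads, long iters) {
    pthread_t tids[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;
    iters_per_thread = iters;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

    For example, run_spin_demo(4, 10000) should return 40000.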

  7. Now we will implement a sleep lock. The lock above consumes processor time while it's waiting, because it continually retries the lock operation until it succeeds. To implement our sleep lock, we will replace this behavior with one where we try to acquire the lock, but if we fail, the thread sleeps until it is later woken up. To begin, make a copy of your program and delete the function bodies of lock() and unlock(), as well as your lock state pound-defines.

  8. The sleep and wakeup mechanism we will use is a system call named futex, short for fast userspace mutex. The system call handles the mechanics of sleeping and waking processes, but a userspace component must decide how and when to use this capability. In this studio, the futex implements a semaphore on top of an integer, which has three states:

    Unlocked: 1
    Locked, no process is sleeping: 0
    At least one process is sleeping: any negative number

    Because the integer is used as a semaphore, processes lock and unlock the futex with atomic decrements and increments. When a process claims the futex, it atomically decrements the integer by one. When a process releases the futex, it atomically increments the integer by one. If two processes never conflict, the value of the futex integer will always be zero or one, and no process will ever have to sleep (so you will never need to make a futex system call).

    However, if multiple processes try to lock the futex simultaneously, they will decrement the integer below zero. A process that gets a value less than zero needs to put itself to sleep, and for that the kernel must become involved. The semantics and details of this mechanism are documented in the man pages man 2 futex and man 7 futex.
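    To see the kernel's side of this contract before building the lock, the Linux-only sketch below (futex_demo() is a hypothetical name) makes the two futex operations this studio uses on a word no other thread touches. FUTEX_WAIT puts the caller to sleep only if the word still contains the value passed in, and otherwise fails immediately with EAGAIN; FUTEX_WAKE returns the number of waiters it woke.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Shows the check the kernel makes in FUTEX_WAIT: the caller sleeps only
   if the futex word still holds the value passed in. Returns the number
   of waiters woken by a FUTEX_WAKE on the same word. */
long futex_demo(void) {
    int word = 1;  /* 1 = unlocked in this studio's encoding */

    /* The word is 1, not -1, so the kernel returns at once with EAGAIN
       instead of putting this thread to sleep. */
    long rc = syscall(SYS_futex, &word, FUTEX_WAIT, -1, NULL);
    assert(rc == -1 && errno == EAGAIN);

    /* Nobody is waiting on word, so FUTEX_WAKE reports 0 woken. */
    rc = syscall(SYS_futex, &word, FUTEX_WAKE, 1);
    printf("FUTEX_WAKE woke %ld waiter(s)\n", rc);
    return rc;
}
```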

  9. First, declare the locked and unlocked states with pound-defines at the top of your program. Unlocked is 1 and locked is 0, matching the checks in steps 10 and 11.

  10. Implement your lock function with the following algorithm.

    1. Decrement the lock integer with ret_val = __atomic_sub_fetch( lock_ptr, 1, __ATOMIC_ACQ_REL );
    2. Check to see if the return value is less than zero
    3. If yes, we need to sleep. Set the lock integer to -1 with __atomic_store_n( lock_ptr, -1, __ATOMIC_RELEASE );
    4. Then call the system call: syscall( SYS_futex, lock_ptr, FUTEX_WAIT, -1, NULL );
    5. Then go back to step 1
    6. If the answer in step 2 was no, the lock has been acquired: exit the lock() function
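    A direct transcription of steps 1 through 6 into C, assuming the pound-defines from step 9 are named UNLOCKED (1) and LOCKED (0); those names are illustrative. The loop condition covers steps 1, 2, 5, and 6. One caution about the algorithm as written: if another thread's entire unlock() (its increment and its store of 1) lands between this thread's decrement and its store of -1, the store overwrites the freshly released value, leaving the lock free but marked -1 while this thread sleeps. The window is only a few instructions wide, but it is worth knowing about if a run ever hangs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define UNLOCKED 1   /* futex states from step 8 */
#define LOCKED   0

/* Step 10: decrement the lock word. A negative result means the lock is
   held, so mark it as contended (-1), sleep, and start over at step 1. */
void lock(volatile int *lock_ptr) {
    while (__atomic_sub_fetch(lock_ptr, 1, __ATOMIC_ACQ_REL) < 0) {
        __atomic_store_n(lock_ptr, -1, __ATOMIC_RELEASE);
        /* Sleeps only while *lock_ptr is still -1; returns immediately
           (EAGAIN) if an unlock has already changed it. */
        syscall(SYS_futex, lock_ptr, FUTEX_WAIT, -1, NULL);
    }
    /* A non-negative result: the lock was free and now belongs to us. */
}
```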

  11. Implement your unlock function with the following algorithm.

    1. Increment the lock integer with ret_val = __atomic_add_fetch( lock_ptr, 1, __ATOMIC_ACQ_REL );
    2. Check to see if the return value is one
    3. If yes, exit the unlock() function
    4. If no, we need to wake up some sleeping thread. Set the lock integer to 1 with __atomic_store_n( lock_ptr, 1, __ATOMIC_RELEASE );
    5. Then call the system call: syscall( SYS_futex, lock_ptr, FUTEX_WAKE, INT_MAX );
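    The matching transcription of steps 1 through 5, again assuming the illustrative names UNLOCKED (1) and LOCKED (0). Waking every sleeper with INT_MAX works because each woken thread returns to step 1 of lock() and decrements again: only one thread can take the word from 1 to 0, and the others go back to sleep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define UNLOCKED 1   /* futex states from step 8 */
#define LOCKED   0

/* Step 11: increment the lock word. A result of 1 means nobody was
   waiting. Anything else means sleepers may exist, so reset the word to
   unlocked and wake every thread blocked on it. */
void unlock(volatile int *lock_ptr) {
    if (__atomic_add_fetch(lock_ptr, 1, __ATOMIC_ACQ_REL) != UNLOCKED) {
        __atomic_store_n(lock_ptr, UNLOCKED, __ATOMIC_RELEASE);
        syscall(SYS_futex, lock_ptr, FUTEX_WAKE, INT_MAX);
    }
}
```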

  12. Create a volatile integer to serve as your lock and initialize it with your pound-defined unlocked value. Run your program and verify that only one thread is able to access the critical section at a time.
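    A shared counter can check the sleep lock the same way it checks a spin lock. The harness below is hypothetical (worker() and run_futex_demo() are illustrative names), repeats the lock() and unlock() code from steps 10 and 11 so it compiles on its own, and uses a plain increment as a stand-in for critical_section(). The state names UNLOCKED (1) and LOCKED (0) are assumed, as in step 9.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

#define UNLOCKED 1
#define LOCKED   0
#define MAX_THREADS 64

static volatile int lock_var = UNLOCKED;  /* step 12: the futex word */
static long counter = 0;                  /* shared data guarded by lock_var */
static long iters_per_thread = 0;

static void lock(volatile int *lock_ptr) {
    while (__atomic_sub_fetch(lock_ptr, 1, __ATOMIC_ACQ_REL) < 0) {
        __atomic_store_n(lock_ptr, -1, __ATOMIC_RELEASE);
        syscall(SYS_futex, lock_ptr, FUTEX_WAIT, -1, NULL);
    }
}

static void unlock(volatile int *lock_ptr) {
    if (__atomic_add_fetch(lock_ptr, 1, __ATOMIC_ACQ_REL) != UNLOCKED) {
        __atomic_store_n(lock_ptr, UNLOCKED, __ATOMIC_RELEASE);
        syscall(SYS_futex, lock_ptr, FUTEX_WAKE, INT_MAX);
    }
}

/* Stand-in for critical_section(): a plain increment that is only
   correct if mutual exclusion holds. */
static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < iters_per_thread; i++) {
        lock(&lock_var);
        counter++;
        unlock(&lock_var);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final count. */
long run_futex_demo(int nthreads, long iters) {
    pthread_t tids[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;
    iters_per_thread = iters;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

    run_futex_demo(2, 2000) should return 4000. Unlike with a spin lock, threads that fail to get this lock block in FUTEX_WAIT instead of spinning.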

  13. Trace both versions of your program with trace-cmd record -e sched_switch. Take a screenshot of each trace that shows the difference in behavior.

  14. Notice that your spin lock is able to do synchronization entirely in userspace, while the futex lock requires the intervention of the kernel. Would it be possible to do a sleep lock entirely in userspace (i.e. with no system calls)?

Things to turn in