CSE 522S: Studio 15

Inter-process Shared Memory

Shared memory is a low-level, fast, and powerful method for sharing data between concurrently executing processes. It allows two separate processes to read and write the same physical memory, providing an inter-process programming paradigm that is very close to that of multi-threaded or parallel programming.

In this studio, you will:

  1. Create fixed-size shared memory regions across processes
  2. Implement a basic but robust concurrency protocol to manage concurrent reads and writes
  3. Clean up the shared memory regions safely
  4. Benchmark shared memory speed

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete.

As you work through these exercises, please record your answers, and when finished email your results to dferry@email.wustl.edu with the phrase Shared Memory in the subject line.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.

Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Once constructed, the basic interface to a shared memory region is just a void pointer. Rather than working at this low level (and to avoid the temptation for unsafe programming), we will start by defining a structure that imposes order on the region. Our basic shared data structure will be a constant-size array.

    1. Create a header file
    2. Define a char* string that will serve as the name for your shared memory region.
    3. Add a #define that holds the size of your shared array. For testing, a size of around 10 should be sufficient.
    4. Declare a structure that will organize your shared data. In this case, declare four fields:
      • volatile int write_guard
      • volatile int read_guard
      • volatile int delete_guard
      • volatile int data[shared_mem_size]
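Putting these pieces together, a header along the following lines would work (a sketch: the file name, the region name, and the identifiers SHARED_MEM_SIZE and shared_region are illustrative choices, not requirements; note that POSIX shared memory names should begin with a slash):

```c
/* shared.h -- layout of the shared memory region (names are illustrative) */
#ifndef SHARED_H
#define SHARED_H

/* Name under which the region appears in /dev/shm */
static const char *shared_name = "/studio15_shm";

/* Number of integers in the shared array; ~10 is enough for testing */
#define SHARED_MEM_SIZE 10

/* All fields are volatile: both processes poll and modify them */
struct shared_region {
    volatile int write_guard;    /* slave sets to 1 when master may write  */
    volatile int read_guard;     /* master sets to 1 when slave may read   */
    volatile int delete_guard;   /* slave sets to 1 when master may unlink */
    volatile int data[SHARED_MEM_SIZE];
};

#endif /* SHARED_H */
```

The guard fields are not used until exercise 5, but declaring them now fixes the struct's layout so both programs agree on it.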

  3. Our concurrency approach today will be to implement a master/slave solution. We will assume that one process, called the master, is the one that is always executed first and creates the shared memory region. Otherwise, we would have to account for a concurrency race on which process creates and sets up the region.

    Create a new file called master.c. This file should create the shared memory region with the following steps. General documentation for Linux shared memory regions is found in man 7 shm_overview.

    1. Create a shared memory file descriptor with shm_open(). You should use the flags O_RDWR | O_CREAT, which specify that this shared memory is read/write and that this call should create the region if it does not already exist. It is sufficient to use S_IRWXU for the third parameter (which defines user permissions to the shared region).
    2. The previous step creates the shared memory region, but it is created with a capacity of zero bytes. Use the function ftruncate() to resize the memory region. Since we are organizing our shared memory via the struct declared in your header file, set the size to be the sizeof this struct.
    3. Now we need a way to read and write to the newly created region. Use the mmap() function to map the shared memory into your program's address space. The addr and offset fields should be NULL and 0, respectively. The permissions parameter should specify both PROT_READ and PROT_WRITE, and the flags parameter should specify that this is a shared mapping with MAP_SHARED.
    4. Finally, we want to treat our shared memory region as though it were a struct of the type declared in our header file. Create a struct pointer and use it to cast the return value of mmap(). Now, you can read and write your shared structure via this pointer.

    Define an array the same size as the data[] array in your shared struct. In your program, use the srand() and rand() functions to populate this local array. Then, copy this array into the shared struct, either with the memcpy() function or through element-wise assignment. Have your program print out the local array.

  4. Make a copy of your master program named slave.c. This program will gain access to the shared memory region in nearly the same way, with two modifications. First, the call to shm_open() should not specify O_CREAT. Second, the call to ftruncate() is unnecessary, though keeping it (as long as you don't change the size of the region) doesn't hurt anything.

    Modify slave.c so that it prints the contents of the shared data field. Build both programs, and then execute the master and slave programs, in that order. Verify that the program output is identical. Copy and paste the output of both programs as the answer to this exercise.

  5. Right now our processes are effectively acting like they are reading and writing a shared file, but we would like them to share more dynamically. In particular, we want our processes to react to events that occur in their partner. The desired execution is as follows:
    1. The master creates the shared memory region and waits for the slave to be created.
    2. The slave is created, notifies the master to start writing, and waits for the data to be written to the shared struct.
    3. The master writes the data to the struct, notifies the slave to start reading, and waits for the slave to finish reading.
    4. The slave prints the data to the console, notifies the master that it is finished, and unlinks itself.
    5. The master destroys the shared memory region.

    The purpose of the other non-data fields in our shared struct is to facilitate the waiting and notification of these events between processes. For example, in the sequence above the master must wait for the slave to be created before it starts writing data into the shared region. The master can wait on the value of the write_guard variable by spinning,

    while( shared_ptr->write_guard == 0 ){}

    and the slave can notify the master it is safe to proceed by modifying the value,

    shared_ptr->write_guard = 1;

    Modify your programs to reflect the sequence of events given above, using the write_guard, read_guard, and delete_guard variables. Also, once it is safe to do so, the master should remove the shared region with the function shm_unlink().

    Note that the shared memory region we created lives outside of either process and persists when neither program is running (existing shared memory regions can be found under the directory /dev/shm/). The above synchronization code relies on the fact that the variables are initialized to zero, which may not be true if your shared memory region is not properly destroyed after being used. If you have inexplicable program bugs, you can verify that this is not the issue by manually checking and deleting your shared memory region in the above directory.

  6. Once you are convinced that your concurrency protocol is working, modify the slave so that, rather than printing the contents of the shared structure to the console, you simply copy the shared data into a local array. With the protocol described above, the slave process lives just long enough for the master to write data into the shared array and for the slave to copy data out of the shared array. That is, the lifetime of the slave process is approximately one complete transfer of data through shared memory.

    Use the time command with the slave program to obtain a rough estimate of the bandwidth through shared memory, in bytes per second. Take measurements where the shared array size is one million integers, ten million integers, and one-hundred million integers.

    As the answer to this problem, report your recorded values.
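One way to take each measurement (a sketch; the binary names are assumptions, and you must rebuild both programs after changing SHARED_MEM_SIZE for each array size):

```sh
# Rebuild with e.g. #define SHARED_MEM_SIZE 1000000, then:
./master &            # master creates the region and waits for the slave
time ./slave          # the slave's lifetime spans one complete transfer

# Bandwidth estimate: bytes transferred / elapsed real time. For example,
# 1,000,000 ints * 4 bytes = 4,000,000 bytes; if real = 0.02 s, that is
# roughly 200 MB/s.
```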

Things to Turn In:

Optional Enrichment Exercises

  1. Shared memory has a reputation for being fast, but that's not necessarily always the case. For example, when typical reads and writes are much smaller than the size of a page (4 kilobytes), or when memory is heavily loaded, this method is known to not perform as well as some others (such as pipes) due to paging overhead. Try benchmarking pipes and sockets versus shared memory for a specific access pattern and see which method is actually fastest!