CSE 522S: Studio 15

Inter-process Shared Memory

Shared memory is a low-level, fast, and powerful method for sharing data between concurrently executing processes. This technique allows two separate processes to read and write the same physical memory, providing an inter-process programming paradigm that is very close to that of multi-threaded or parallel programming.

In this studio, you will:

  1. Create fixed-size shared memory regions across processes
  2. Implement a basic but robust concurrency protocol to manage concurrent reads and writes
  3. Clean up the shared memory regions safely
  4. Benchmark shared memory speed

Please complete the required exercises below, as well as any optional enrichment exercises that you wish to complete.

As you work through these exercises, please record your answers, and when finished email your results to eng-cse522@email.wustl.edu with the phrase Shared Memory in the subject line.

Make sure that the name of each person who worked on these exercises is listed in the first answer, and make sure you number each of your responses so it is easy to match your responses with each exercise.


Required Exercises

  1. As the answer to the first exercise, list the names of the people who worked together on this studio.

  2. Once constructed, the basic interface to a shared memory region is just a void pointer. Rather than working at this low level (and to avoid the temptation of unsafe programming), we will start by defining a structure that we will use to impose order. Our basic shared data structure will be a constant-size array. A sketch of one possible header appears after the list below.

    1. Create a header file
    2. Define a char* string that will serve as the name for your shared memory region.
    3. Define (#define) a symbolic value that holds the size of your shared array: for testing, a size of around 10 should be sufficient.
    4. Declare a structure that will organize your shared data. In this case, declare four fields:
      • volatile int write_guard
      • volatile int read_guard
      • volatile int delete_guard
      • volatile int data[shared_mem_size]
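
    One possible header along these lines is sketched below. The file name, the region name, and the identifiers shared_mem_name, SHARED_MEM_SIZE, and struct shared_mem are placeholder choices for illustration, not requirements of the assignment:

    /* shared_mem.h -- layout shared by leader.c and follower.c */
    #ifndef SHARED_MEM_H
    #define SHARED_MEM_H

    /* Name of the shared memory region; by convention it begins with a
     * slash (see man 7 shm_overview). */
    static const char *shared_mem_name = "/studio15_shm";

    /* Number of elements in the shared array; around 10 is enough for testing. */
    #define SHARED_MEM_SIZE 10

    /* Structure imposed on the raw shared memory region. */
    struct shared_mem {
            volatile int write_guard;    /* set by the follower: safe to write  */
            volatile int read_guard;     /* set by the leader: safe to read     */
            volatile int delete_guard;   /* set by the follower: safe to unlink */
            volatile int data[SHARED_MEM_SIZE];
    };

    #endif /* SHARED_MEM_H */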

  3. Our concurrency approach in these exercises will implement a classic leader/follower solution. We will assume that one process, called the leader, is always executed first and creates the shared memory region. Otherwise, we would have to account for a race over which process creates and sets up the region.

    Create a new file called leader.c. This file should create the shared memory region with the following steps. General documentation for Linux shared memory regions is found in man 7 shm_overview.

    1. Create a shared memory file descriptor with shm_open(). You should use the flags O_RDWR | O_CREAT, which specify that this shared memory is both readable and writable and that this call should create the region if it does not already exist. It is sufficient to use S_IRWXU for the third parameter (which defines user permissions to the shared region).
    2. The previous step creates the shared memory region, but it is created with a capacity of zero bytes. Use the function ftruncate() to resize the memory region. Since we are organizing our shared memory via the struct declared in your header file, set the size to be the sizeof this struct.
    3. Now we need a way to read and write the newly created region. Use the mmap() function to map the shared memory into your program's address space. The addr and offset arguments should be NULL and 0, respectively. The protection argument should specify both PROT_READ and PROT_WRITE, and the flags argument should specify that this is a shared mapping with MAP_SHARED.
    4. Finally, we want to treat our shared memory region as though it were a struct of the type declared in our header file. Declare a pointer to that struct type and assign it the return value of mmap(), casting as needed. Now you can read and write your shared structure via this pointer.

    Define an array the same size as the data[] array in your shared struct. In your program, use the srand() and rand() functions to populate this local array. Then, copy this array into the shared struct, either with the memcpy() function or through element-wise assignment. Have your program print out the local array.
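
    Under the assumptions of the earlier header sketch (hypothetical names, error checking omitted for brevity), leader.c might combine the steps above roughly as follows:

    /* leader.c -- create, size, map, and populate the shared region */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include "shared_mem.h"

    int main(void)
    {
            int local[SHARED_MEM_SIZE];
            int i;

            /* Step 1: create the region, readable and writable. */
            int fd = shm_open(shared_mem_name, O_RDWR | O_CREAT, S_IRWXU);

            /* Step 2: grow the region from zero bytes to the size of our struct. */
            ftruncate(fd, sizeof(struct shared_mem));

            /* Steps 3 and 4: map the region and treat it as our struct. */
            struct shared_mem *shared_ptr = (struct shared_mem *)
                    mmap(NULL, sizeof(struct shared_mem),
                         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

            /* Fill a local array with random values, copy it into the shared
             * struct element-wise, and print the local copy. */
            srand(time(NULL));
            for (i = 0; i < SHARED_MEM_SIZE; i++) {
                    local[i] = rand();
                    shared_ptr->data[i] = local[i];
                    printf("%d\n", local[i]);
            }

            return 0;
    }

    On older systems you may need to link against the real-time library (for example, gcc leader.c -o leader -lrt) for shm_open() to resolve.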

  4. Make a copy of your leader program named follower.c. This program will gain access to the shared memory region in nearly the same way, but with two modifications. First, the call to shm_open() should not specify O_CREAT. Second, the call to ftruncate() is unnecessary and should be removed (leaving it in doesn't hurt anything as long as the region's size never changes, but it could become a source of inconsistency if the two programs ever disagreed on that size).

    Modify follower.c so that it prints the contents of the shared data field. Build both programs, and then execute the leader and follower programs, in that order. Verify that the program output is identical. Copy and paste the output of both programs as the answer to this exercise.
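
    Under the same assumptions, a sketch of the follower's setup might look like this:

    /* follower.c -- open and map the existing region, then print its data */
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include "shared_mem.h"

    int main(void)
    {
            int i;

            /* No O_CREAT: the leader has already created the region. */
            int fd = shm_open(shared_mem_name, O_RDWR, S_IRWXU);

            /* No ftruncate(): the leader has already sized the region. */
            struct shared_mem *shared_ptr =
                    mmap(NULL, sizeof(struct shared_mem),
                         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

            for (i = 0; i < SHARED_MEM_SIZE; i++)
                    printf("%d\n", shared_ptr->data[i]);

            return 0;
    }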

  5. Right now our processes are effectively acting like they are reading and writing a shared file, but we would like them to interact more dynamically. In particular, we want each of our processes to react to events that occur in the other. The desired execution is as follows:
    1. The leader creates the shared memory region and waits for the follower to be created.
    2. The follower is created, notifies the leader to start writing, and waits for the data to be written to the shared struct.
    3. The leader writes the data to the struct, notifies the follower to start reading, and waits for the follower to finish reading.
    4. The follower prints the data to the console, notifies the leader that it is finished, and detaches itself from the region.
    5. The leader destroys the shared memory region.

    The purpose of the other non-data fields in our shared struct is to facilitate the waiting and notification of these events between processes. For example, in the sequence above the leader must wait for the follower to be created before it starts writing data into the shared region. The leader can wait on the value of the write_guard variable by spinning,

    while( shared_ptr->write_guard == 0 ){}

    and the follower can notify the leader it is safe to proceed by modifying the value,

    shared_ptr->write_guard = 1;

    Modify your programs to reflect the sequence of events given above, using the write_guard, read_guard, and delete_guard variables. Also, once it is safe to do so, the leader should remove the shared region with the function shm_unlink(). A fragment illustrating this handshake appears after the note below.

    Note that the shared memory region we created lives outside of either process and persists when neither program is running (existing shared memory regions can be found under the directory /dev/shm/). The above synchronization code relies on the fact that the region's contents are initialized to zero when it is created, which may not be true if your shared memory region is not properly destroyed after being used. If you have inexplicable program bugs, you can verify that this is not the issue by manually checking and deleting your shared memory region in the above directory.
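
    As a rough illustration, the handshake might be arranged as in the fragments below, which slot into the earlier leader and follower sketches after the region is mapped (the numbers in the comments correspond to the steps in the sequence above):

    /* leader.c, after mapping the region: */
    while (shared_ptr->write_guard == 0) {}      /* 1. wait for the follower    */
    for (i = 0; i < SHARED_MEM_SIZE; i++)        /* 3. write the data...        */
            shared_ptr->data[i] = local[i];
    shared_ptr->read_guard = 1;                  /*    ...and notify follower   */
    while (shared_ptr->delete_guard == 0) {}     /*    wait for it to finish    */
    munmap(shared_ptr, sizeof(struct shared_mem));
    shm_unlink(shared_mem_name);                 /* 5. destroy the region       */

    /* follower.c, after mapping the region: */
    shared_ptr->write_guard = 1;                 /* 2. tell the leader to write */
    while (shared_ptr->read_guard == 0) {}       /*    wait for the data        */
    for (i = 0; i < SHARED_MEM_SIZE; i++)        /* 4. print the data...        */
            printf("%d\n", shared_ptr->data[i]);
    shared_ptr->delete_guard = 1;                /*    ...and notify the leader */
    munmap(shared_ptr, sizeof(struct shared_mem));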

  6. Once you are convinced that your concurrency protocol is working, modify the follower so that, rather than printing the contents of the shared structure to the console, you simply copy the shared data into a local array. With the protocol described above, the follower process lives just long enough for the leader to write data into the shared array and for the follower to copy data out of the shared array. That is, the lifetime of the follower process is approximately one complete transfer of data through shared memory.

    Use the time command with the follower program to obtain a rough estimate of the bandwidth through shared memory, in bytes per second. Take measurements with shared array sizes of one million, ten million, and one hundred million integers.
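
    As a reminder, each int is typically 4 bytes, so a rough bandwidth estimate is

        bandwidth = (array length * sizeof(int)) / elapsed real time

    where the elapsed time is the "real" value reported by time. This figure also includes process startup and the time the follower spends spinning on the guard variables, so treat it as an approximation.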

    As the answer to this problem, report your recorded values.

Things to Turn In:

Your numbered responses to the exercises above, emailed to eng-cse522@email.wustl.edu with the phrase Shared Memory in the subject line.


Optional Enrichment Exercises

  1. Shared memory has a reputation for being fast, but that is not always the case. For example, when typical reads and writes are much smaller than the size of a page (4 kilobytes), or when memory is heavily loaded, this method is known to perform worse than some others (such as pipes) due to paging overhead. Try benchmarking pipes and sockets against shared memory for a specific access pattern and see which method is actually fastest under different conditions!
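
    As a starting point, the following sketch (no error handling; the file name, transfer size, and chunking are arbitrary choices, not part of the assignment) shows one way a pipe transfer between a parent and child process could be timed from inside the program:

    /* pipe_bench.c -- time a bulk transfer of integers through a pipe */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>
    #include <sys/wait.h>

    #define NUM_INTS (10 * 1000 * 1000)     /* arbitrary transfer size */

    int main(void)
    {
            int fds[2];
            pipe(fds);

            if (fork() == 0) {                       /* child: read and discard */
                    int buf[4096];
                    close(fds[1]);
                    while (read(fds[0], buf, sizeof(buf)) > 0)
                            ;
                    exit(0);
            }

            /* Parent: write NUM_INTS integers through the pipe and time it. */
            close(fds[0]);
            int *data = calloc(NUM_INTS, sizeof(int));
            size_t total = NUM_INTS * sizeof(int), sent = 0;
            struct timespec start, end;

            clock_gettime(CLOCK_MONOTONIC, &start);
            while (sent < total) {
                    ssize_t n = write(fds[1], (char *)data + sent, total - sent);
                    if (n <= 0)
                            break;
                    sent += n;
            }
            close(fds[1]);
            wait(NULL);                              /* reader has drained the pipe */
            clock_gettime(CLOCK_MONOTONIC, &end);

            double secs = (end.tv_sec - start.tv_sec) +
                          (end.tv_nsec - start.tv_nsec) / 1e9;
            printf("%zu bytes in %.3f s (%.1f MB/s)\n", sent, secs, sent / secs / 1e6);
            free(data);
            return 0;
    }

    A similar harness around the shared memory programs (or simply the time command) lets you compare the approaches under the same transfer size.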