Administrative stuff
The instructor for the course this semester is
Roger Chamberlain.
Preferred email for class business is roger AT cse.wustl.edu.
Class meetings are generally from 4 pm to 5:30 pm on Tuesdays and Thursdays.
Office hours are from 3:30 pm to 4 pm on Tuesdays
and whenever you can find me in my office
(usually midday is better than early morning or late afternoon).
We're trying out the use of the new
course
management system (CMS)
for the discussion forum, posting of assignments and readings, etc.
Let me know if you have difficulties accessing it.
Homework assignments
The following are .pdf files of the homework assignments.
- Assignment #1.
The serial Monte Carlo code for problem 5 is
here.
The simple parallel template code is
here.
Included in the program is a random number generator that can let each thread
run with its own seed.
Collect timing information for 1, 2, 3, and 4 threads (processors).
The total number of iterations should be sufficiently large for the
serial program to run for at least a few minutes.
Use this same number of iterations when running on multiple processors.
- Assignment #2.
The MPI-based Monte Carlo code for problems 1 and 2 is
here.
Where the documentation describes mpicc and mpiexec, use
mpicc.local and mpiexec.local if you are running on
your local machine (either rote.cec or one of the lab machines).
You do not need to put anything in your execution path (i.e., ignore the
earlier instructions).
Instructions for running across more than one machine are provided
here, which describes how
to set up, script, and submit a job.
- Assignment #3.
- Assignment #4.
Here are a few statistics from the class as a whole:
Median round trip time
for a minimum size message is 28 microseconds.
Median all_reduce time for
2 processors is 47 microseconds, and for 8 processors is 476 microseconds.
Median barrier time (both lock and trylock) for 2 processors is 2 microseconds,
and for 4 processors is 7 microseconds.
Median time to throw a dart is 49 nanoseconds.
- Assignment #5.
Reading assignments
- Course description.
- This
Introduction
to Parallel Computing Tutorial from LLNL is very good.
- This
Pthreads
Tutorial from LLNL is also very good.
- Section 3.1 of R. Chamberlain, D. Chace, and A. Patil,
"How
Are We Doing? An Efficiency Measure for Shared, Heterogeneous Systems,"
in Proc. of the ISCA 11th Int'l Conf. on Parallel and Distributed
Computing Systems, September 1998, pp. 15-21.
- Richard M. Fujimoto,
"Parallel
Discrete Event Simulation,"
Communications of the ACM, Vol. 33, No. 10, pp. 30-53, October 1990.
- Mary L. Bailey, Jack V. Briner, Jr., and Roger D. Chamberlain,
"Parallel
Logic Simulation of VLSI Systems,"
Computing Surveys, Vol. 26, No. 3, pp. 255-294, September 1994.
- Here is the
MPI
Tutorial from LLNL.
- Help in preparing MPI programs for execution is provided in the
MPICH2 User's Guide.
- Here is a cool YouTube video of an
n-body simulation
using MPI to coordinate 32 graphics processors.
- Here is a not quite so cool YouTube video of an
ocean simulation
that follows the effects of the tsunami in Dec. 2004.
- A few pages
from the AMD Architecture Programmer's Manual describing the MOESI
protocol. The full manual is here.
- Chapter 7 of the Intel 64 and IA-32
Architectures Software Developer's Manual Volume 3A: System Programming
Guide, Part 1. This chapter describes multiple-processor issues (e.g.,
atomic instructions and memory models).
- Milo M.K. Martin, Mark D. Hill, and David A. Wood,
"Token
Coherence: A New Framework for Shared-Memory Multiprocessors,"
IEEE Micro, Vol. 23, No. 6, pp. 108-116, November-December 2003.
- My shared-memory Monte Carlo code for homework 1, problem 5 is
here. The revised code that
eliminates false sharing in the cache is
here.
- My MPI-based Monte Carlo code for homework 2, problem 2 is
here.
- C.N. Keltcher, K.J. McGrath, A. Ahmed, and P. Conway,
"The
AMD Opteron Processor for Multiprocessor Servers,"
IEEE Micro, Vol. 23, No. 2, pp. 66-76, March-April 2003.
The presentation
from Hot Chips, August 2002.
- J. Andrews and N. Baker,
"Xbox 360
System Architecture,"
IEEE Micro, Vol. 26, No. 2, pp. 25-37, March-April 2006.
The presentation from Hot
Chips, August 2006.
- D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy,
"The
Directory-based Cache Coherence Protocol for the DASH Multiprocessor,"
in Proc. of the Int'l Symp. on Computer Architecture,
1990, pp. 148-159.
- Ian Foster,
"Designing
and Building Parallel Programs" (Online), Addison-Wesley, Inc.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and
Clifford Stein,
"Chapter 27
Multithreaded Algorithms,"
in Introduction to Algorithms, Third Edition,
The MIT Press.
- Project idea: Parallel A*
Search.
- M. Gschwind, H.P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and
T. Yamazaki,
"Synergistic Processing in Cell's
Multicore Architecture,"
IEEE Micro, Vol. 26, No. 2, pp. 10-24, March-April 2006.
The presentation from Hot
Chips, August 2006, and a
second presentation, also
from Hot Chips, on the various on-chip interfaces.
- A. Gara et al.,
"Overview of the Blue Gene/L system
architecture,"
IBM J. Res. & Dev., Vol. 49, No. 2/3, pp. 195-212, March/May 2005.
A talk on the same
subject from a 2005 workshop.
- A.Y. Grama, A. Gupta, and V. Kumar,
"Isoefficiency:
Measuring the Scalability of Parallel Algorithms and Architectures,"
IEEE Parallel & Distributed Technology, Vol. 1, No. 3, pp. 12-21,
August 1993.
Lectures
- January 13 - Introduction,
including video
and 4-up notes.
- January 15 - Nomenclature,
including video
and 4-up notes.
- January 20 - Programming Paradigms,
including video
and 4-up notes.
- January 22 - Synchronization,
including video
and 4-up notes.
- January 22 - Monte Carlo Simulation,
including 4-up notes.
- January 29 - Logic Simulation,
including video
and 4-up notes.
- February 03 - Program Design,
including video
and 4-up notes.
- February 05 - Applications,
including video
and 4-up notes.
- February 12 - Stream Computing,
including video.
- February 17 - Cache Coherence,
including video
and 4-up notes.
- February 19 - Memory Consistency,
including video
and 4-up notes.
- February 24 - Coherence Protocols,
including video
and 4-up notes.
- February 26 - Coherence Protocols (cont.),
including video
and 4-up notes.
- March 3 - Shared Memory Synchronization,
including video
and 4-up notes.
- March 5 - Locks and Barriers,
including video
and 4-up notes.
- March 17 - AMD Opteron,
including video.
- March 17 - Midterm Review,
including video
and 4-up notes.
- March 19 - Xbox 360,
including video.
- March 26 - Real Caches,
including video
and 4-up notes.
- April 9 - Memory Consistency (cont.),
including video
and 4-up notes.
- April 14 - Cell Processor,
Cell Interconnect, and
IBM Blue Gene,
including video.
- April 16 - Scaling,
including video
and 4-up notes.
- April 23 - Final Review,
including video
and 4-up notes.
Examinations
There will be one exam during the semester plus a final.
The midterm exam was March 24.
All exams are open book, open notes. Calculators are allowed; laptop computers
are not.
Last modified 23 Apr 2009.
Roger Chamberlain <roger AT wustl.edu>