## Racing Arrays

Warning: This studio is too long to be finished in the studio time you have today. Please get as far as you can, but going sufficiently slowly that all group members understand what is happening.

Then, on your own, after studio is over, please continue working on the studio. This material is covered on Exam I, so be sure you get to the end of studio work on your own, if necessary, and feel free to work collaboratively on all elements of this studio.

We will propagate code as you direct us in the green box, and the propagation will be done the day after studio.

Authors:
• Jacob Frank
• Ryan Smith
• Yonatan David
• Tim Heyer
• Ron Cytron

Abstract: Arrays are a fundamental and useful data type. Once an array is allocated, it cannot change size. Nonetheless, we can build lists and other data structures using arrays if we are willing to replace an old, full array with a bigger one.

In this assignment, you study various ways to formulate a bigger array to replace one that has no room left. The point of this work is to understand that the choices you make matter greatly in terms of performance.

What you should learn through this assignment:
• You can use our course infrastructure to measure ticks and time.
• Ticks are a reliable way to measure the number of operations in a computation, but you must insert instrumentation (Java code) to count ticks into our code.
• Time is a more realistic measurement, but it varies between computers.
Also, because computers are so fast, the time of a relatively brief computation may register as 0, even though the number of ticks is positive.
• Allocation of a n integers (or boolean, objects, etc.) does not take constant time. It takes time proportional to n.

In studio this semester, you will be supplying answers to questions posed here. Such questions appear in boxes like this one.

You are in a group, but only one of you needs to open and complete the text file. When you demo, you can specify whose repository contains the write up, and we will propagate code and write ups to the other group members.

Ideally, the write up is on your shared monitor if you have one.

So, one of you, please find and open the studio0.txt file in the studiowriteups folder.

## A. Time and Ticks

In this course, we are interested in the time taken to execute an algorithm. We will analyze algorithms mathematically and empirically, but how do we measure time? We will be using two measurements:
Time
It seems most natural to measure time using time. However, on modern computers, most computations are so fast that their time is difficult to measure with any accuracy. But a prof can dream, so we will try to measure time.
Ticks
We can count operations in your program, using a counter that you advance as appropriate in your code. This assignment investigates how you can do that. You will be using this technique throughout this course.
Your computer is busy doing many things. When you run a program on it, that program competes with other programs for your computer's attention.

The most widely available timer that we can use measures time in milliseconds, or one thousandth of a second. If your computer's clock runs at 3 GHz (billions of cycles per second), then in one millisecond, your computer can execute 3 million instructions. Thus, as much as we would like to measure the time of relatively short programs, unless that short program does at least 3 million things, it won't even take a measurable amount of time.

Let's investigate this.

• In the coursesupport source folder, find and open the timing.examples package and find and open the Linear class.
• Take a look at the code, in particular the run() method:
	public void run() {
for (int i=0; i < n; ++i) {
ticker.tick();
this.value = this.value + i;
}
}

• The statement inside the loop perform a simple addition, storing the result in the value instance variable.
• Also, the call to ticker.tick() advances the tick counter by one.
• If you run this program, you will notice that it runs quickly, using values for n that range from 10000 to 90000 in steps of 10000.
The particular values of n that are used in the run() method above were generated in the main method by:
		GenSizes sizes = GenSizes.arithmetic(10000, 100000, 10000);

which generates an arithmetic sequence for n, starting at 10,000, up to but not including 100,000, in steps of 10,000. The result is then used by:
		ExecuteAlgorithm.timeAlgorithm(
"linear",
"timing.examples.Linear",
new IntArrayGenerator(),
sizes
);


which runs the Linear class on the specified sizes, with values for n of 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, and 90000.

Throughout this course, and in the text as well, you will see n as the variable we use to describe the size of an input. It's generally a better practice to name a variable such that it corresponds more closely to how it is used. So, while size would have been a better variable name, we use n so that you become accustomed to thinking of the size of an input as n.

For example, we will consider in this course:

• Sorting n numbers
• Adding to the end of a linked list that has n items in it
• Finding the shortest path from one intersection to another in a map that has n street segments
• Operations on a set containing n elements
• The output in the console window shows the size, ticks, and time taken for each sample of the size variable.
• There is an outputs folder in your repository, but eclipse does not know that the run of the Linear class wrote files into that folder.
To see what's there, you can right (control) click on the outputs folder and select Refresh.
• Once you see the files there, double-click on linear-ticks.csv, which should cause the file to open in excel. What you see might look like this:
These values were generated by your run of Linear.
• Select the data including the columm headings, and use excel to create a marked scatter chart. Get help from the TA if you need it. If you don't know how to do this in Excel, here is some advice:
1. Highlight all data including labels
2. Go to the Insert tab on the Excel toolbar
3. In the Charts header, click on Scatter Plot and select Scatter with Straight Lines and Markers
• The legend may flip the axes by default, in which the legend names will appear as Series1-5
• To resolve this go to the Chart Tools tab and select Switch Row/Column
• You should see a plot that looks like this:
• The plot you see shows how many times ticker.tick() was called in the run of your Linear program for each value of n.
The plot depicts the running time of Linear as a function of the input paramter n. Not surprisingly, the plot shows a linear relationship between the input parameter and the number of times ticker.tick() was called.
• Examine the code and convince yourselves that the cost of executing
			this.value = this.value + i;

n times is indeed a linear function of n.
which implies that the cost of executing the sum and assignment to this.value takes 1 tick (or, constant time)
• There is another file in the outputs folder, namely linear-time.csv. Open it and plot the values the same way we plotted the tick counts.
• What do you see?
You should see values at or near 0, because the cost of running this program is so low, it cannot be measured in milliseconds.
• What can you do to increase the runtime of the program, so that it shows up in the timings?
Modify the line we saw earlier so that much larger values are generated for n:
		GenSizes sizes = GenSizes.arithmetic(1000000, 10000000, 1000000);

so that the sizes that are used are sufficiently large to generate positive times.

If you are unsure of what each parameter means, mouse over the arithmetic method name in eclipse and the JavaDoc will be shown to you.

• Are you able to find values that generate a nice linear plot of time? Discuss what you find with your TA.
You may notice in the console output that the Linear program is run several times with the same input value. Each time we run for a given size (value of n), the ticker count should be the same. However, because of other things running on your computer, the time may vary.

The code we provide for you runs your run() method several times, and then takes the fastest time found among those runs as the best indicator of the time taken by your program.

Let's try this same thing with a different toy algorithm:

• Find and open the Quadratic class in the timing.examples package in the coursesupport source folder.
• Be sure to refresh the eclipse view of the outputs folder, as you did above.
• Find, open, and create a plot of the data you find in both quadratic-ticks.csv and quadratic-time.csv.
Question A1: What do you see in both plots? Are there any differences between the two? What could account for those differences?

Question A2: Why do the times provided for Quadratic produce such a nice plot, while the original values of Linear did not?

• Look at the run() method of Quadratic, and notice that ticker.tick() is called in just one spot:
	public void run() {
for (int i=0; i < n; ++i) {
for (int j=0; j < n; ++j) {
ticker.tick();
this.value = this.value + i;
}
}
}

• Each ticker.tick() call models the cost of one operation. In the extreme, you might have arrived at the code shown below, with a ticker.tick() call for each operation performed in your code, as shown in the comments:
	public void run() {
ticker.tick();                      // i = 0
for (int i=0; i < n; ++i) {
ticker.tick();                  // i < n
ticker.tick();                      // j = 0
for (int j=0; j < n; ++j) {
ticker.tick();                  // j < n
//
// original one below
//
this.value = this.value + i;
ticker.tick();                  // ++j
}
ticker.tick();                  // ++i
}
}

• Experiment now by different placements of ticker.tick() throughout the run() method, and run Quadratic to see how the ticks and time change.
Question A3: From the runs you have tried so far, how does the placement of ticker.tick() calls affect the plots you see? In particular, do the changes affect the shapes of the curves, the values plotted, or both?
• Does it matter if you have extra ticker.tick() calls? In particular:
Question A4: In terms of n, how would you characterize, in the most simple terms, the time and ticks curves that have been generated so far?

Question A5: What would happen if you deleted all ticker.tick() calls in the innermost loop, while leaving other calls to ticker.tick() that you just placed in run()?

Throughout this course, you will be asked to place ticker.tick() calls in your code. Placing them everywhere seems like the right thing to do, so that every operation is counted, as shown in the box above.

You could put them everywhere, and the results will be right, but isn't that tedious? Did your added ticker.tick() statements affect the shape of the resulting curves that show ticks or time?

How do you decide where to put them most parsimoniously?

We will soon speak of this as analyzing an algorithm's asymptotic complexity:

• From a practical point of view, it's the part of your algorithm whose time dominates any curve you could generate that reflects the time or ticks taken by your algorithm.
• We will say that constant differences in time really don't matter: they could attributed, for example, to the difference in speed between one computer and another.
• We will care about the shape of the curves that correspond to ticks and time, and we will reason about those curves in terms of their asympototic behavior.

For now, you have to look at the code and reason about where it spends most of its time, as a function of n.

For the run() method we have been considering, most of the time must be spent in the innermost loop, where we have:

			this.value = this.value + i;

That is why placing just one ticker.tick() there creates operation counts that adequately model the time spent by the entire run() method.

## B. Some statements take constant time, some don't

When we look at a line of Java code above, for example:

   this.value = this.value + i;

we want to reason about how much time it takes to execute that line of code. Our model so far, in terms of ticker.tick() is that it takes one tick, or operation, for such a line of code to execute.

But is this true for all statements? Surely not. For example, the single line of code:

   Arrays.sort(array);

would sort an array of values, which surely takes more than one operation. In fact, for an array of n values, it would take at least n operations to show the results of the sorted array.

Let's investigate this for allocating an array of integers:

• In the studios source folder, find and open the AddsTwoNumbers class in the studio0.allocates package.
• Run AddsTwoNumbers, refresh the outputs folder, and open and plot the time and ticks shown in the and files.
Question B1: What do you see? How do the curves reflect the code inside AddsTwoNumbers?

Do you think the value of n matters here in terms of the time it takes to perform the operation? Why or why not?

• Now find, open, and run the Allocates class, which generates ticks and time data for allocating an array of n integers, with n varying from 0 to 10000.
• Refresh the outputs folder and then open the allocates-time.csv file. Plot the runtime against the value of n.
Question B2: What do the data and plot tell you about the time it takes to allocate an array of n integers?

Is it reasonable to say that the line of code

     this.array = new int[this.n]

takes a constant amount of time, independent of the value of this.n?
• Now open and plot the data in allocates-ticks.csv.
Question B3: Do the ticks agree in shape with the time we measured in running the Allocates code?
• There is another call you can make on a ticker which is
   ticker.tick(n);

The above call causes the ticker to register n ticks instead of just 1.
• Modify the call to ticker in Allocates so that it registers this.n ticks for the new int[this.n] allocation.
• Rerun Allocates and refresh the outputs folder.
• Generate plots for allocates-ticks.csv and allocates-time.csv.
Question B4: Are the plots more similar to each other than before? What does this tell you about how much time it takes to allocate an array of n integers?
So we see (hopefully) that a line of code doesn't always take a constant amount of time. While it was fair to judge
   this.value = this.value + i;

as taking one tick, it is not fair to say that
     this.array = new int[this.n]

takes one tick. Instead we must insist it takes n ticks.

Why does the time depend on n? Java must allocate a chunk of memory for the n integers. While that might take constant time, Java must also initialize that chunk of storage to 0. That cannot take constant time: that cost must depend on n.

Throughout this course, you are asked to reason about the time of a computation. Eventually, you must reason about how much time a given operation or statement takes, in terms of the computation's input size n.

You will be producing timing curves, and we will be looking at your code to ensure you have accounted properly for how time is spent.

• Perhaps the most important message of this course is the following:
How you achieve a particular result—how you implement an algorithm—can have a profound impact on the time necessary to obtain that result.
• To continue with this idea, perform the following exercise in your group:
• One person announces a positive integer n. There is no limit on how large n can be. For example, I might announce the integer five. However, you should pick a larger integer, say between 50 and 75.
• Divide the rest of the group into two teams: the decimal team and the tally mark team.
• The decimal team writes down the announced number in decimal form. For my announced five, they would write 5.
• The tally mark team writes down the announced number in tally mark form. The tally mark form for five is .
Question B5: Which group do you expect to finish first?

Can you formalize, in terms of n the amount of work (ticks) that each group must do to write n in the form required for that group?

Both groups achieve the same result, namely the recording of an integer n.

• Under what circumstances is the decimal notation more efficient than tally marks?
• Under what circumstances is the tally mark notation more efficient than decimal notation? If you are not sure, take a look this.

## C. Growing a List

Let's apply what we have learned to growing an array-based list.

• In your repository, find the Rarrays class in the studio0.growinglist package in the studios source folder.
You will see that this is an abstract class, because the method int getNewSize() is missing. This is intentional, because we want to experiment with various ways of replacing the full array, and we do that by specifying the array's new size.

We provide two extensions of Rarrays for you:

Doubling
causes the new array to be twice the old array's size. This may seem wasteful: a full 1024 cell array would grow to contain 2048 cells just to accommodate, at present, one more cell.
causes the new array to be one item greater than the previous array's size. This is the least we can do to make room for one new element in an already full array.
Each of those classes completes the Rarrays abstract class by providing the missing method int getNewSize().

Take a look.

• Let's look at the code from reset(Ticker):
	public void reset(Ticker ticker) {
this.ticker = ticker;
this.array  = new int[2];
ticker.tick(2);
}

• We save ticker in an instance variable this.ticker so we can use it throughout this class and any of its extensions.
• We start our array with two elements.
• We use the ticker.tick(2) to account for allocating the 2-cell array, recalling from above we model allocating n integers by spending n ticks.
• The run() method attempts to fill the current array, calling replaceArrayWithBiggerOne when the current array no longer has sufficient room.
• We can't make an existing array bigger, but the array instance variable that had two elements can be subsequently reassigned to reference another (perhaps bigger) array.
For example, if for some reason two elements are insufficent for the array, the following would allow the instance variable to reference an 8-element array:
   this.array = new int[8];

However, reassignment of the array loses the reference to the previous array. Any data in that previous array would be lost.

To preserve information as you provision for a larger array, you must

• hold on to both arrays,
• copy the old information into the newer one, and
• then reassign the instance variable.
• Find and look at the comments in the replaceArrayWithBiggerOne method.

Your first task is to complete this method until it passes the TestRarrays unit tests.

The TestRarrays contains 3 test cases:
• testInit should work as given to you.
• testGrowPreservesData ensures that you retain information from the old array as you make the newer, larger one.
• testGrowSufficientTicks ensures that you account properly for the ticks taken for allocating and filling the newer, larger array.

• Work together to write the appropriate code below the comments.
• When your code successfully passes the testGrowPreservesData test, your code is copying data reliably from the old to new array.
• When your code successfully passes the testGrowSufficientTicks test, you are accounting properly for the allocations.
Work on this until you get the green bar, indicating all tests are passing.

## D. How much does it cost to grow the array?

OK we really don't grow the array, but we continually replace it with a larger one. Let's study how the growth rate of the arrays affects the overall cost of Rarrays.

• Run AddOne, which should run without error.
• As before, the output of the run is in your outputs folder, but you won't see anything until you refresh eclipse's view.
• Open the growbyone-ticks.csv and plot the ticks against n.
Question D1: How would you describe the curve you see?

As a team, think about possible polynomial functions that could generate such a curve.

• Repeat the above steps by running Doubling, refreshing your eclipse view, opening doubling-ticks.csv in Excel, and plotting the same data.
• Question D2 Why does your program generate the data you see?
• While the data has a certain disconnected quality to it, let's analyze the plotted data as follows:
• Using any straight edge you can find nearby, see how tightly the straight edge can bound all the points from below.
• Then see if that straight edge can also tightly bound all the points from above.
• Question D3: Describe what you were able to do with the straight edge.
• If you are successful, you see that the points generated by Doubling, while somewhat strange-looking,
• make sense, because the jumps are due to the time spent doubling
• are boundable below by one linear function
• are boundable above by one linear function
You may not be able to complete this part in studio. Take a look at it on your own if necessary.

## E. Mathematical analysis

Now let's look mathematically at how much time is spent growing the list by our two methods: add one, and doubling.

Here, we reach a total of n cells in the array, growing by one cell each time, and copying the old array into the slightly larger new array.

How much time does this take? We copy 1, then 2, then 3, and so on, until we have copied n-1 items to make room for the nth item. If we count the time for allocation, we can extend the sum to n. If T(n) represents the total time taken to achieve an array of n elements by growing one at at time, then

T(n) = 1 + 2 + … n $=\sum _{i=1}^{n}i$
Question E1: What is the closed form solution for T(n)?
Search the web for summation formulas if you need help. You are expected to know such sums in this course.
Does it agree with what you have seen empirically?
Doubling
Here, we reach a total of n cells. Let's assume n=2k for some k, so that we have performed work:
T(n) = 1 + 2 + 4 + … 2k $=\sum _{i=0}^{k}{2}^{i}$
Question E2: What is the closed-form solution for T(n)?

Recalling that n=2k, can you express the result in terms of n?

Based on your analysis and what you have seen, what effect does the growing strategy have on the time necessary to accommodate the array-based list that grew to n items?

## Further exploration

Based on what you have seen emprically and shown mathematically, what do you think would happen if you tried the following strategies for increasing the size of the current array as it fills:
• For one strategy, you grow the array by 10% of its previous size, making sure that the growth adds at least one extra cell to the next array.
• For another strategy, you increase the array by 20 cells each time it fills.

Returning to your code, you will see an OurGrowth1 and OurGrowth2 class. Use those to experiment with these ideas and see what results you get.

• If there is a file for you to complete in studiowriteups, please respond to the questions and supply any requested information.
• You must commit and push all of your work to your repository. It's best to do this from the top-most level of your repository, which bears your name and student ID.