## Extension 1: Pascal's Triangle (3 points):

Authors
• Robert Sedgewick (book)
Take a look at the Wikipedia page on Pascal's Triangle. We will implement this using a two-dimensional array, so it may be easier to imagine the entries left-justified rather than in triangular form:
```        column
0  1  2  3  4
row
0       1
1       1  1
2       1  2  1
3       1  3  3  1
4       1  4  6  4  1
.
.
.
```
Viewed as a two-dimensional array, the computation is specified as follows, for each entry at row r and column c:
• If c is 0, then the entry is 1.
• If c==r, then the entry is 1.
• If r<0 or c<0 or c>r, then the entry is 0 and is not shown.
• Everywhere else, the entry is computed as the sum of the entries at:
• r-1,c
• r-1,c-1

#### Procedure

1. Open the PascalsTriangle class, found in the arrays package of the extensions folder.
2. Insert code to obtain from the user the value N which is the number of rows you should compute of the triangle.
3. Instantiate the two-dimensional array needed to hold the results.
Java lets you do this as a ragged array, where each row can be of a different length.

You are welcome to instantiate the array that way, but it is simpler and equally acceptable to instantiate the array with the same number of columns in each row. Java syntax for that is much simpler. An example of this appears in SelfAvoidingWalk, found in the book.ch1 package of the book source folder in your repositories.

4. Compute the triangle as a two-dimensional array and print the results left-justified as shown above.
5. For extra fun, try to print the triangle centered as shown on the Wikipedia page.
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.1
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 2: Birthday Problem (5 points):

A big part of Computer Science revolves around finding ways to organize and analyze statistical data. This often includes writing algorithms to sort the information in a useful manner. There are many different algorithms that computer scientists use to sort information. You can learn about a few popular ones below:

Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.

#### Procedure

To make things simple, let's assume that all months have 31 days.
1. Open the Birthday class, found in the arrays package of the extensions folder.
2. Insert code to obtain from the user the number of people N that will enter the room.
3. For each person, randomly generate a month and day on which that person was born.
4. Using a two-dimensional array that is indexed by month and by day, keep track of the number of people born on that month and day.
For example, perhaps you randomly decide the person was born in month 4 and day 28.
5. After processing all of the N people, iterate over your array to compute:
• For each month, the fraction (or percentage) of people born in the that month.
• The average of the 12 values you computed for the above.
• For each day, the fraction (or percentage) of people born on that day, whether in the same or in a different month.
• The average of the 31 values you computed for the above.
• The fraction (or percentage) of people born on exactly the same day.
Be careful! A 1 entry means just one person was born on that particular day. You are looking for the fraction of people who share a birthday with at least one other person.
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.2
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 3: GC Content (2 points):

Authors
Computer science plays a large role in modern biology. BioJava is a framework for working with biological data. It provides a framework for using sequences and performing common biology functions. In this extension you will be reading in a DNA sequence and computing its GC-Content.

#### Notes

• You can find the Java files for this extension in the extensions source folder.
Locate the following files in the biojavagc package:
• GCContent.java is the file you complete. When you run it, analysis is performed on four genomes and the results are printed as output.
• GCContentTest.java is a unit test for your work. You will receive credit for this extension only if this test passes.
• You can do this extension by following the instructions below. However, you might want to take some time to familiarize yourself with the BioJava website. Of special importance is the BioJava Cookbook. The cookbook has information and examples for performing common tasks. You may also find the BioJava API helpful.

• You will be working with some DNA sequences, represented in the FASTA format, which you can find in the genomes folder of your workspace:
• NC_002677.fasta
• mysterya.fasta
• mysteryb.fasta
• mysteryc.fasta
As you might guess from their names, there is an air of mystery surrounding some of these genomes.

Consider a bacterium and a phage that might be hosted by that bacterium. It turns out that the DNA of the host and of the phage often need to be similar in terms of their GC content for the bacterium to play host to the phage:

Most bacteriophage and other bacteria are lower in GC content than Salmonella and its relatives, so invading DNA is an obvious target for H-NS. "It's like a primitive immune system," says Fang. "Reduce their expression, and the foreign genes can be tolerated."
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064161/
In other words, if the GC content in the bacterium and phage is dissimilar, the bacterium may be immune to infection by the phage.

#### Procedure

1. Complete the method percentGC in GCContent.
Note: BioJava already has a function to get the GC-count (number of G and C nucleotides) of a sequence.

Do not use this function.

Instead, devise a way to iterate through the array of characters (representing nucleotides) yourself.

2. Run the unit test GCContentTest and make sure it passes.
Most students' code fails when given a string with no DNA in it whatsover (an empty string or "". Its GC content is 0%.
3. Run GCContent as a Java Application and be prepared to answer questions by the TA at the demo.
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.3
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 4: Reading Frame of a Sequence (10 points):

Authors
Sequences of nucleotides can be read as sequences of nucleotide triplets, known as codons. This means there are 3 different ways of reading a nucleotide sequence. One starts with the first nucleotide in the sequence, one starts with the second, and one starting with the third, as shown below:

Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation

• The sequences shown above are RNA. The corresponding DNA sequences would have a T everywhere you see a U.
• Reading frame 1 begins at the first nucleotide. Interpreting AGT using the genetic code, Serine would be generated as the first amino acid of the resulting protein, if this is the proper reading frame.
• However, if the reading frame should start one base later, then the first DNA triple is GTC, which encodes Valine.
• Finally, if the reading frame should start two bases later, then the first DNA triple is TCT, which encodes Serine. While this protein begins with the same amino acid as in reading frame 1, the next amino acid is different, as shown in the above figure.
• There is no other reading frame of interest. If we considered a reading frame that begins at the fourth nucleotide, that is the same as if the frame began at the first nucleotide.

• Important: In the picture above, the reading frames are denoted 1, 2, or 3. Keeping with computer science custom, you must compute these as 0, 1, or 2. The unit test will not work if you fail to follow this convention!
How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
• Protein translation is initiated only by a special sequence of 3 nucleotides called a start codon.
• Translation continues until a stop codon is encountered, which is also a specially designated sequence of 3 DNA nucleotides.
Note that it is the first stop codon that ends the translation. While the translated region could contain other start codons, which translate as the amino acid methionine, any stop codon encounered after the start codon ends the translation process.
• If we scan random DNA, we expect each of the four bases (A, T, C, G) to occur with probability 1 in 4.
• If we look for two specific bases, one after the other, in random DNA, scanning pairs a time, we expect the two bases to occur with probability 1 in 16 (1 in 4 times 1 in 4).
• If we scan DNA looking for a specific triplet, moving 3 bases each time we look, we expect that given triplet to occur with probability 1 in 64 (1 in 4 times 1 in 4 times 1 in 4).

For example, we expect the start codon to occur with probability 1 in 64.

• Because there are three possible stop codons, the probability in random DNA that a given triplet is one of the three stop codons is 3 in 64, which is approximately 1 in 20.
• This means that if DNA is random, a protein sequence would be about 20 amino acids in length, once its start codon is found.
• It turns out that proteins are much longer than that, say at least 60 amino acids in length.
• If we start in the wrong reading frame, the DNA should appear to us as random, and we should obtain very short amino acid chains by translation.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.

Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.

#### Notes

• You can find the Java files for this extension in the extensions source folder.
Locate the following files in the biofindframe package:
• FindTheFrame.java is the file you complete.
• FindTheFrameTest.java is a unit test for your work. You will receive credit for this extension if this test (usually) passes.
• You will be working with some DNA sequences, represented in the FASTA format, which you can find in the genomes folder of your workspace.

#### Procedure

1. The code given to you prompts the user for a genome DNA file, and then reads that file into a char array using methods provided by bioJava.

If you are uncertain about char arrays, this would be a good time to review the relevant material in your text and in the lecture notes.

2. Your work takes place in the method bestReadingFrame. Open that file now and take a look at what is provided.

There are comments in that file to direct your work, which consists of the steps described below.

You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
3. First, define 3 char arrays, one for each of the possible stop codons: ochre, amber, and opal.
4. Next, define a char array named methionine for the start codon.
The most common start codon is Methionine, which is encoded by the DNA sequence AUG. For the purposes of this extension, Methionine is the only start codon you will consider.

5. The rest of the code you write will attempt to read the DNA in each of the possible 3 reading frames. Find the best reading frame and return the index at which that best reading frame occurs. Thus, your method will return a value as follows:
• 0, if reading frame 1 is the right answer. That frame begins at index 0 of the dna array you are given.
• 1, if reading frame 2 is the right answer. That frame begins at index 1 of the dna array you are given.
• 2, if reading frame 3 is the right answer. That frame begins at index 2 of the dna array you are given.
• -1 if no reading frame exists at all
Hint: Scan the DNA one base a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets.

Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.

When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.4
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 5: Data Sorting and Analysis (0 points):

Authors
• Brennan Morell
• Jarett Gross
• 2
A big part of Computer Science revolves around finding ways to sort data to be organized in a useful manner. There are many different algorithms that computer scientists use to sort information. You can learn about a few popular ones below:
For the purposes of this assignment, we will implement our sorting using a naive process of repeatedly selecting the smallest value from a data set of type int[], and swapping it to the front of the collection. In computer science, this is called a Selection Sort .

#### Procedure

1. Open the Sorting class, found in the arrays package of the extensions folder.
2. Prompt the user for the size of the collection
If the user enters a negative number, continue to prompt them with a useful message until they enter a positive number.
3. Continue to prompt the user to input numbers one at a time, using an array to store the input data.
4. After processing all of the numbers entered, the data will be sorted using the following naive algorithm:
• Create an int sortCount to keep track of the number of times you've scanned the data
• Create an outer loop to limit the number of scans to be equal to the size of the collection (while(sortCount < size))
• Create an inner loop to iterate over the unsorted portion of the array, each time selecting the minimum value (keep track of this value and its index in the array)
The unsorted portion will always be the sub-array with indexes sortCount to size inclusive.
• Swap this minimum value with the value currently at the head of the unsorted portion of the array
• At this point, sortCount should be incremented to reflect the new unsorted portion of the array.
5. After the sorting the data, we will take advantage of its useful organization to compute the following statistics:
Mean
The simple average of the data.
Median
The middle value in the ordered dataset (Note: How does this computation vary for even and odd sized datasets?)
Min
The smallest value in the ordered dataset.
Max
The largest value in the ordered dataset.
Range
The difference between the maximum and minimum values of the data set.
6. Finally, arrange for your output to display the sorted dataset and accompanying statistics in the following manner:
1 2 3 4 5
Mean: 3.0
Median: 3.0
Min: 1
Max: 5
Range: 4
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.5
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 6: Blackjack! (0 points):

Authors
• Nathan Vogt
In this assignment, we will create the game blackjack. To simplify the code, assume that each player is dealt one card at a time and aces are always worth 11.

• Loops
• Arrays
• Math.round

1. Prompt the user for the number of players
2. Create a new array with size of the number of players
3. Create a loop that iterates through each player

Now we have our array of size players and our loop. We will store the score or count of each player in the array. The loop follows the game mechanics of Blackjack that are explained in wiki. Now we must think of how to randomly draw a card.

Remember that cards 2-9 will count as their number
ex. 2 diamonds = 2
7 hearts = 7
10-King all count as 10
Aces count as 11
We have 3 different scenarios that all have a different probability of occurring. How do we perform a random operation in java? (See a TA if you need help)

1. Create the game mechanics inside the loop
2. Store each player’s data in the array from task 1
3. Verify that the array stores each player’s count correctly
4. Verify that the outcomes have realistic values
Ex. Very unlikely that each player would draw an ace (have a count of 11)
5. Recommend to check with TA before moving on

In Blackjack, each player is allowed to hit or draw an extra card. Create this mechanic in the game mechanic loop. Hint: Recommend using a while loop for this task

Task 4 If a player’s count exceeds 21, that hand is known as a bust.

1. Which players bust?
2. Which player wins the round? (player closest to 21 without going over)
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.6
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 100: Pascal's Triangle (C++) (5 points):

Authors
• Dan Nolan
Issues: This extension is not available for any credit.
Take a look at the Wikipedia page on Pascal's Triangle. We will implement this using a two-dimensional array, so it may be easier to imagine the entries left-justified rather than in triangular form:
```        column
0  1  2  3  4
row
0       1
1       1  1
2       1  2  1
3       1  3  3  1
4       1  4  6  4  1
.
.
.
```
Viewed as a two-dimensional array, the computation is specified as follows, for each entry at row r and column c:
• If c is 0, then the entry is 1.
• If c==r, then the entry is 1.
• If r<0 or c<0 or c>r, then the entry is 0 and is not shown.
• Everywhere else, the entry is computed as the sum of the entries at:
• r-1,c
• r-1,c-1

#### Procedure

1. Create the pascals source file, in the arrays package of the extensions folder.
2. Insert code to obtain from the user the value N which is the number of rows you should compute of the triangle.
3. Instantiate the two-dimensional array needed to hold the results.
C++ will not let you do this as a ragged array, unlike java, where each row can be of a different length. You will have to find a way to create filler values that are ignored when the triangle is printed.

It is simpler to instantiate the array with the same number of columns in each row.

4. Compute the triangle as a two-dimensional array and print the results left-justified as shown above.
5. For extra fun, try to print the triangle centered as shown on the Wikipedia page.
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.100
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 200: Birthday Problem (C++) (5 points):

Authors
• Dan Nolan
Issues: This extension is not available for any credit.
Some number of people, say N, wander into a room. We are interested in computing the fraction of those people who are:
• are born in the same month
• are born on the same day, in the same or in different months
• are born on the exact same day (the same day in the same month)
Before proceeding, you might think a bit about the above statistics and hypothesize what values you will see in your simulation.

#### Procedure

To make things simple, let's assume that all months have 31 days.
1. Create the Birthday source file, in the arrays package of the extensions folder.
2. Insert code to obtain from the user the number of people N that will enter the room.
3. For each person, randomly generate a month and day on which that person was born.
4. Using a two-dimensional array that is indexed by month and by day, keep track of the number of people born on that month and day.
For example, perhaps you randomly decide the person was born in month 4 and day 28.
5. After processing all of the N people, iterate over your array to compute:
• For each month, the fraction (or percentage) of people born in the that month.
• The average of the 12 values you computed for the above.
• For each day, the fraction (or percentage) of people born on that day, whether in the same or in a different month.
• The average of the 31 values you computed for the above.
• The fraction (or percentage) of people born on exactly the same day.
Be careful! A 1 entry means just one person was born on that particular day. You are looking for the fraction of people who share a birthday with at least one other person.
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.200
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 300: GC Content (C++) (4 points):

Authors
• Dan Nolan
Issues: This extension is not available for any credit.
Computer science plays a large role in modern biology. In this extension you will be reading in a DNA sequence and computing its GC-Content.

#### Notes

• You can find the C++ files for this extension in the extensions source folder. In this extension we will introduce header files, a concept not found in Java. If you already know what a header file is, you can continue as normal. If you don't, then more information can be found here for a primer on headers, includes, and why we need them.
Locate the following files in the fastaReader directory:
• fastaReader.h This header, combined with the source file, has been provided for you in order to allow you to use the .fasta files.
• fastaReader.cpp This file has only one function readFasta which takes a filename and a string, the file is read into the string and returned to you for use.
• cgContent.cpp This is where your extension code and testing goes.

• You will be working with some DNA sequences, represented in the FASTA format, which you can find in the genomes folder of your workspace:
• NC_002677.fasta
• mysterya.fasta
• mysteryb.fasta
• mysteryc.fasta
As you might guess from their names, there is an air of mystery surrounding some of these genomes.

Consider a bacterium and a phage that might be hosted by that bacterium. It turns out that the DNA of the host and of the phage often need to be similar in terms of their GC content for the bacterium to play host to the phage:

Most bacteriophage and other bacteria are lower in GC content than Salmonella and its relatives, so invading DNA is an obvious target for H-NS. "It's like a primitive immune system," says Fang. "Reduce their expression, and the foreign genes can be tolerated."
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064161/
In other words, if the GC content in the bacterium and phage is dissimilar, the bacterium may be immune to infection by the phage.

#### Procedure

1. Complete the method gcContent.cpp
2. Write a test to prove to us that your methods work.
3. Run these tests.
4. Be prepared to answer questions by the TA at the demo.
When you done with this extension, you must be cleared by the TA to receive credit.
• Fill in the form below with the relevant information
• Have a TA check your work
• The TA should check your work and then fill in his or her name
• Click OK while the TA watches
• If you request propagation, it does not happen immediately, but should be posted in the next day or so

This demo box is for extension 3.300
 Last name WUSTL Key Propagate? (or your numeric ID) Do not propagate e.g. Smith j.smith 1 Copy from 1 to all others 2 Copy from 2 to all others

## Extension 400: Reading Frame of a Sequence (C++) (8 points):

Authors
• Dan Nolan
Issues: This extension is not available for any credit.
Sequences of nucleotides can be read as sequences of nucleotide triplets, known as codons. This means there are 3 different ways of reading a nucleotide sequence. One starts with the first nucleotide in the sequence, one starts with the second, and one starting with the third, as shown below:

Image from https://wikispaces.psu.edu/display/Biol230WFall09/Protein+Translation

• The sequences shown above are RNA. The corresponding DNA sequences would have a T everywhere you see a U.
• Reading frame 1 begins at the first nucleotide. Interpreting AGT using the genetic code, Serine would be generated as the first amino acid of the resulting protein, if this is the proper reading frame.
• However, if the reading frame should start one base later, then the first DNA triple is GTC, which encodes Valine.
• Finally, if the reading frame should start two bases later, then the first DNA triple is TCT, which encodes Serine. While this protein begins with the same amino acid as in reading frame 1, the next amino acid is different, as shown in the above figure.
• There is no other reading frame of interest. If we considered a reading frame that begins at the fourth nucleotide, that is the same as if the frame began at the first nucleotide.

• Important: In the picture above, the reading frames are denoted 1, 2, or 3. Keeping with computer science custom, you must compute these as 0, 1, or 2. The unit test will not work if you fail to follow this convention!
How do we determine which reading frame is correct? Our approach is based on the following observations, which are somewhat simplified to suit our purposes. The interested student is encouraged to take Bio 2960 for a much more thorough and revealing treatment of this material.
• Protein translation is initiated only by a special sequence of 3 nucleotides called a start codon.
• Translation continues until a stop codon is encountered, which is also a specially designated sequence of 3 DNA nucleotides.
Note that it is the first stop codon that ends the translation. While the translated region could contain other start codons, which translate as the amino acid methionine, any stop codon encounered after the start codon ends the translation process.
• If we scan random DNA, we expect each of the four bases (A, T, C, G) to occur with probability 1 in 4.
• If we look for two specific bases, one after the other, in random DNA, scanning pairs a time, we expect the two bases to occur with probability 1 in 16 (1 in 4 times 1 in 4).
• If we scan DNA looking for a specific triplet, moving 3 bases each time we look, we expect that given triplet to occur with probability 1 in 64 (1 in 4 times 1 in 4 times 1 in 4).

For example, we expect the start codon to occur with probability 1 in 64.

• Because there are three possible stop codons, the probability in random DNA that a given triplet is one of the three stop codons is 3 in 64, which is approximately 1 in 20.
• This means that if DNA is random, a protein sequence would be about 20 amino acids in length, once its start codon is found.
• It turns out that proteins are much longer than that, say at least 60 amino acids in length.
• If we start in the wrong reading frame, the DNA should appear to us as random, and we should obtain very short amino acid chains by translation.
Thus, the best reading frame for a sequence of DNA is the offset at which translation would produce the longest chain of amino acids. The DNA sequence that produces that chain begins with the start codon. The start codon and all subsequent codons are interpreted in groups of 3 bases. The sequence contains no stop codon until the end of the translated region. In other words, the translated region may contain multiple start codons in the middle (which are translated as the amino acid methionine), but the region contains no stop codons in the middle.

Your task in this extension is to analyze a sequence of DNA to determine its best reading frame.

#### Notes

• You can find the C++ files for this extension in the extensions source folder.
Locate the following files in the biofindframe folder:
• FindTheFrame.cpp is the file you complete.
• You will be working with some DNA sequences, represented in the FASTA format, which you can find in the genomes folder of your workspace.

#### Procedure

1. The code given to you prompts the user for a genome DNA file, and then reads that file into a std::string.

If you are uncertain about strings in C++, this would be a good time to review as they are very similar to Java. Open that file now and take a look at what is provided.

There are comments in that file to direct your work, which consists of the steps described below.

You should read these instructions and the comments carefully before you begin. Solutions that fail to follow the instructions will receive no credit!
2. First, define 3 char arrays, one for each of the possible stop codons: ochre, amber, and opal.
3. Next, define a char array named methionine for the start codon.
The most common start codon is Methionine, which is encoded by the DNA sequence AUG. For the purposes of this extension, Methionine is the only start codon you will consider.

4. The rest of the code you write will attempt to read the DNA in each of the possible 3 reading frames. Find the best reading frame and return the index at which that best reading frame occurs. Thus, your method will return a value as follows:
• 0, if reading frame 1 is the right answer. That frame begins at index 0 of the dna array you are given.
• 1, if reading frame 2 is the right answer. That frame begins at index 1 of the dna array you are given.
• 2, if reading frame 3 is the right answer. That frame begins at index 2 of the dna array you are given.
• -1 if no reading frame exists at all
Hint: Scan the DNA one base a time, looking for the longest coding sequence that begins at that base. A coding sequence begins with a start codon and ends at the next stop codon that is found by scanning triplets.

Based on the index at which the longest coding sequence occurs, compute the corresponding reading frame (0, 1, or 2). That is the value you want to return.

When you done with this extension, you must be cleared by the TA to receive credit.