Assignment 1: Exploring information on the Internet.
One point for each question. The full score for this assignment is 18.
Remember to consult the FAQ page (http://helix-web.stanford.edu/bmi214/faq.html) periodically, as it will be updated with answers to common questions.
This assignment is due at 5 pm on April 9, 2007. Submit it electronically by sending an email to firstname.lastname@example.org. Please submit only .pdf files.
You were recently at the Pacific Symposium on Biocomputing (http://psb.stanford.edu/) on Maui, Hawaii, and spent most of the meeting on a whale-watching expedition. You have acquired a profound interest in sperm whales (Physeter catodon). You are specifically interested in getting data about cytochrome b, a mitochondrial protein involved in the electron transport chain. Luckily, you have access to the Internet and can get all this information in a matter of moments.
How many million total nucleotide bases are in GenBank (the traditional GenBank division) as of August 2006?
How many million total sequence records are in GenBank (the traditional GenBank division) as of August 2006?
Search GenBank for the DNA sequence of sperm whale (Physeter catodon) cytochrome b and open the search result. What is the value of the ACCESSION field in the retrieved GenBank file?
(THINGS TO NOTE: We are looking for the entry that contains the complete cds. "cds", meaning "coding sequence", is part of a standard annotation vocabulary for DNA sequences. There is only one such entry. In this entry there are 1140 bases that code for protein.)
In the record you found in question 3, there is a link to its corresponding protein sequence record. What is the value of the ACCESSION field in the protein sequence file?
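The note's 1140 coding bases can be turned into a codon count with simple arithmetic: at three bases per codon, 1140 / 3 = 380 codons. Whether the annotated CDS ends with a stop codon is an assumption to check against the record; if it does, the encoded protein is 379 residues long. A minimal sketch of that arithmetic (the function name `cds_lengths` is hypothetical):

```python
def cds_lengths(n_bases: int, ends_with_stop: bool = True) -> tuple[int, int]:
    """Return (codons, amino acids) for a coding sequence of n_bases.

    ends_with_stop is an assumption about the annotation: set it to False
    if the CDS length does not include a terminating stop codon.
    """
    if n_bases % 3 != 0:
        raise ValueError("a complete CDS should be a multiple of 3 bases long")
    codons = n_bases // 3
    # A stop codon ends translation and contributes no amino acid.
    residues = codons - 1 if ends_with_stop else codons
    return codons, residues


print(cds_lengths(1140))  # prints (380, 379)
```

Comparing the result with the sequence length shown in the linked protein record is a quick consistency check.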
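If you save the records as GenBank flat files, both answers sit on predictable lines. The CDS feature of a nucleotide record carries a `/protein_id` qualifier naming the linked protein record (with a version suffix such as `.1`), and every record has an `ACCESSION` line whose first token is the primary accession. A minimal sketch assuming that flat-file layout; the function names are hypothetical and the sample record is made up, not the sperm whale entry:

```python
import re
from typing import List, Optional


def primary_accession(record_text: str) -> Optional[str]:
    """Return the first accession on the ACCESSION line, or None if absent."""
    for line in record_text.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            # fields[0] is the keyword itself; further tokens are accessions.
            return fields[1] if len(fields) > 1 else None
    return None


def protein_ids(record_text: str) -> List[str]:
    """Return every /protein_id="..." value found in the feature table."""
    return re.findall(r'/protein_id="([^"]+)"', record_text)


# Hypothetical record fragment, for illustration only.
sample = '''LOCUS       XX000001
ACCESSION   XX000001
FEATURES             Location/Qualifiers
     CDS             1..12
                     /protein_id="YYY00001.1"
'''
print(primary_accession(sample))  # prints XX000001
print(protein_ids(sample))        # prints ['YYY00001.1']
```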
Other organisms might also have cytochromes with sequences similar to those in sperm whales. BLAST is a tool that lets you search NCBI’s various sequence databases for sequences similar to a query. Read about BLAST at http://www.ncbi.nlm.nih.gov/BLAST/.
Which BLAST method would you use to search for a protein sequence against a protein database (e.g., blastn, blastp, tblastn, or blastx)?
Which BLAST method would you use to search for a nucleotide sequence against a nucleotide database?
Which BLAST method would you use to search for a nucleotide sequence against a protein database (think about why you might want to do this)?
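Choosing among these methods starts with knowing whether the query and the database hold nucleotide or protein sequences. One rough way to classify a raw query is to check what fraction of its letters are nucleotide symbols. The sketch below is a heuristic with an assumed 90% cutoff and a hypothetical function name; it is not how BLAST itself decides, and a protein unusually rich in A, C, G, T, or N could fool it.

```python
NUCLEOTIDE_SYMBOLS = set("ACGTUN")  # N = any base


def looks_like_nucleotide(seq: str, cutoff: float = 0.9) -> bool:
    """Guess whether seq is DNA/RNA from the share of nucleotide letters.

    The 0.9 cutoff is an arbitrary assumption, not a BLAST setting.
    Digits, spaces and other non-letters (e.g. row numbers) are ignored.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    share = sum(c in NUCLEOTIDE_SYMBOLS for c in letters) / len(letters)
    return share >= cutoff


print(looks_like_nucleotide("atgaccaacattcga"))  # prints True
print(looks_like_nucleotide("MKWVTFISLL"))       # prints False
```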
Run a BLAST search against the SwissProt database using the first three rows of the sperm whale cytochrome b protein sequence.
(THINGS TO NOTE: The results give a long list of "hits" along with scores. The cytochrome b sequence has hits in many different organisms, indicating that the cytochrome b sequence is highly conserved.)
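When copying the query from a GenBank-style record, the ORIGIN section shows the sequence in rows that start with a position number and split the residues into blocks of ten; in that layout each full row holds 60 residues, so three rows correspond to 180 residues. A sketch that turns such rows into a clean FASTA query, assuming that layout; the function names are hypothetical and the residues in the example are made up:

```python
from typing import Iterable, Optional


def origin_rows_to_sequence(rows: Iterable[str], n_rows: Optional[int] = None) -> str:
    """Join GenBank ORIGIN-style rows into one uppercase sequence.

    Row numbers and spaces are dropped; only letters are kept.
    If n_rows is given, only the first n_rows rows are used.
    """
    rows = list(rows)
    if n_rows is not None:
        rows = rows[:n_rows]
    return "".join(c for row in rows for c in row if c.isalpha()).upper()


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format seq as a FASTA entry with lines of at most `width` residues."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


# Made-up residues, not the real cytochrome b sequence.
origin = [
    "        1 acdefghikl mnpqrstvwy",
    "       21 acdefghikl",
]
query = origin_rows_to_sequence(origin, n_rows=1)
print(to_fasta("query", query))
```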
The top few hits are sperm whale (Physeter catodon) protein sequences. Look at the first hit that is not from the sperm whale. What organism is that sequence from?
In a BLAST alignment, a '+' indicates what type of mutation?
Go to the Protein Data Bank (http://www.rcsb.org), a database of protein structural information.
How many structures are in the PDB?
Search the Protein Data Bank for a crystal structure of E. coli CAP (catabolite activator protein). You should get several possible hits – we will focus on the record 1ZRC. This file contains the 3D structure for the CAP protein as well as DNA bound to it.
According to the results page of your search, what experimental method was used to obtain the protein structure?
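The same entry can also be inspected as a text file. A sketch, assuming RCSB's current download URL pattern (https://files.rcsb.org/download/<ID>.pdb, which may differ from what was available when this assignment was written) and the legacy PDB format, in which the EXPDTA record names the experimental technique in columns 11-79. The function names are hypothetical.

```python
import urllib.request
from typing import Optional


def pdb_download_url(pdb_id: str) -> str:
    """Build the RCSB download URL for an entry in legacy PDB format."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id: str) -> str:
    """Download an entry as text (requires network access)."""
    with urllib.request.urlopen(pdb_download_url(pdb_id), timeout=30) as resp:
        return resp.read().decode("utf-8")


def experimental_method(pdb_text: str) -> Optional[str]:
    """Return the technique from the first EXPDTA record, if present."""
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            return line[10:79].strip()  # columns 11-79
    return None


print(pdb_download_url("1zrc"))  # prints https://files.rcsb.org/download/1ZRC.pdb
# experimental_method(fetch_pdb("1ZRC")) reads the method from the file itself.
```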
In the structure you will shortly see an obvious kink in the DNA. The PDB entry should list the abstract of the paper that produced this crystal structure. According to that abstract, what is the roll angle of the DNA kink? (Note: This paper reports on four different structures, each with a different base pair substitution at position 6 of the DNA consensus sequence. Record 1ZRC is the structure in which the original DNA sequence is maintained.)
PDB offers a variety of visualization programs for looking at the protein structure. Open the applet entitled Jmol view. The PDB crystal structure contains not only the protein structure of CAP, but also the DNA that binds to the protein. With the current representation it is difficult to tell the protein and the DNA apart. Color the DNA red. (Dragging your mouse over the image rotates the structure and may help you see the DNA more clearly.)
Choose the AT pairs and color them blue. Choose the GC pairs and color them yellow.
(Note #1: To make a selection, right-click on the image and go to the select menu. You can then choose to select nucleotides, proteins, etc. After selecting the appropriate elements, go to the color -> cartoon menu and choose your color. Jmol lets you make successive selections, so it is important to reset the selection before focusing on a different element (select -> frames -> all frames). For example, if you select all the valine residues in the structure, you must reset the selection before you can highlight all the leucines.)
(Note #2: In a few cases Jmol confuses T with U in the DNA. If highlighting the T’s does not work, try highlighting the U’s instead.)
Approximately how many AT pairs are represented in this structure?
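A count from the viewer can be cross-checked against the coordinate file. A sketch, assuming the legacy fixed-column PDB format (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26) and DNA residues named DA/DC/DG/DT or A/C/G/T; U is counted with T because of the naming confusion described in Note #2. If the two strands pair completely, the number of A-T pairs equals the number of A's plus T's in either strand; overhanging or unpaired bases make this an approximation. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Residue names treated as nucleotides (newer and older PDB naming).
NUCLEOTIDE_NAMES = {"A", "C", "G", "T", "U", "DA", "DC", "DG", "DT", "DU"}


def nucleic_sequences(pdb_lines: Iterable[str]) -> Dict[str, str]:
    """Return one-letter nucleotide sequences keyed by chain ID.

    Each residue is counted once, however many atoms it has.
    """
    bases: Dict[str, List[str]] = {}
    seen: Set[Tuple[str, str, str]] = set()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        res_name = line[17:20].strip()
        if res_name not in NUCLEOTIDE_NAMES:
            continue
        chain = line[21]
        icode = line[26] if len(line) > 26 else " "
        key = (chain, line[22:26], icode)
        if key in seen:
            continue
        seen.add(key)
        bases.setdefault(chain, []).append(res_name[-1])  # "DA" -> "A"
    return {chain: "".join(seq) for chain, seq in bases.items()}


def at_pair_count(strand: str) -> int:
    """Count A-T pairs from one strand, assuming it is fully base-paired.

    U is counted like T (see Note #2 on T/U naming in Jmol).
    """
    return sum(base in "ATU" for base in strand.upper())


# Usage with a local copy of the entry (hypothetical file name):
# with open("1ZRC.pdb") as fh:
#     for chain, seq in nucleic_sequences(fh).items():
#         print(chain, seq, at_pair_count(seq))
```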
We were taught in class that Adenine (A) always pairs with Thymine (T). What does U stand for and under what circumstances do you see it instead of Thymine?
Now select Protein->all. In cartoon view we can color the secondary structures. The CAP protein contains both alpha helices and beta sheets. One secondary structure element should visibly interact with the DNA major groove. What is it?
DNA-binding proteins rarely "reinvent the wheel". In fact, most bind DNA with one of only a few motifs. According to this website, which DNA-binding domain does CAP use?
a. Leucine zipper
b. Zinc finger
c. CRP domain
Now we want to compare the positions of valine and histidine in your protein. Select these amino acids and color them on the image. Does this comparison give us any clues about their physicochemical properties?
b. Valine has a more rigid side-chain than Histidine
c. Valine is hydrophobic, Histidine is hydrophilic
d. Valine and Histidine are both essential to alpha-helices.
View the structure using the KiNG viewer. Beta-strands are drawn as arrows, and each arrow points along the direction of its strand (from the N-terminus toward the C-terminus).
Which of the following secondary structures is also visible in the protein structure?
a. Parallel beta-sheet
b. Antiparallel beta-sheet
c. Mixed beta-sheet
d. Anti-clockwise beta-sheet
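What KiNG shows with arrows is also recorded in the SHEET records of a legacy-format PDB file: columns 12-14 hold the sheet ID, and columns 39-40 hold the sense of each strand relative to the previous strand (0 for the first strand, 1 for parallel, -1 for antiparallel). A sketch that labels each sheet from those fields, assuming the file actually contains SHEET records; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def classify_sheets(pdb_lines: Iterable[str]) -> Dict[str, str]:
    """Label each sheet as parallel, antiparallel, mixed, or single strand.

    Uses the SHEET record's sense field (columns 39-40):
    0 = first strand, 1 = parallel to previous, -1 = antiparallel to previous.
    """
    senses: Dict[str, List[int]] = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("SHEET"):
            continue
        sheet_id = line[11:14].strip()
        strand_senses = senses[sheet_id]  # registers the sheet on first sight
        field = line[38:40].strip()
        if field and int(field) != 0:
            strand_senses.append(int(field))

    labels: Dict[str, str] = {}
    for sheet_id, values in senses.items():
        kinds = set(values)
        if not kinds:
            labels[sheet_id] = "single strand"
        elif kinds == {1}:
            labels[sheet_id] = "parallel"
        elif kinds == {-1}:
            labels[sheet_id] = "antiparallel"
        else:
            labels[sheet_id] = "mixed"
    return labels


# Usage with a local copy of the entry (hypothetical file name):
# with open("1ZRC.pdb") as fh:
#     print(classify_sheets(fh))
```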