Biol 591 
Introduction to Bioinformatics
Fall 2003 

Scenario 3: Comparison of protein sequence against database
How to translate the DG47 DNA sequence

You want to figure out what amino acid sequence the DNA in the DG47 fragment might encode. You don't know where translation begins in the gene and you don't even know whether it is on the strand you provide or on the complementary strand. So a useful translation tool needs to translate all three reading frames in the forward direction and all three reading frames in the (implied) backwards direction. It's up to you to choose what you think is a reasonable open reading frame. For starters, you might look for large regions devoid of stop codons.
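As a rough illustration of what a six-frame translation involves, here is a minimal Python sketch. It is not the code behind any of the online tools; the function names, the frame labels, and the toy sequence are all assumptions made for this example (the toy sequence is not DG47). It uses the standard genetic code, marks stop codons with `*`, and ignores any trailing nucleotides that do not fill a codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# A '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at offset; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Translate three forward frames and three frames of the complementary strand."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"Forward frame {offset + 1}"] = translate(dna, offset)
        frames[f"Reverse frame {offset + 1}"] = translate(rev, offset)
    return frames


# Hypothetical toy sequence, used only to exercise the functions.
toy = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
for name, protein in six_frames(toy).items():
    print(name, protein, "stops:", protein.count("*"))
```

Counting the `*` characters in each frame is a crude stand-in for eyeballing which frame has a long region devoid of stop codons.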

There are several online tools to translate DNA sequences into amino acid sequences. One useful tool is the ORF Finder at NCBI, but this page will describe the use of the translation tool provided in the Colorado State University Molecular Toolkit. It is also possible to avoid a separate translation step altogether by running the DNA sequence through BlastX, which translates the query in all six reading frames and compares the translation products to a protein database.
 

Action | Explanation and notes
--- | ---
Go to the Colorado State University Molecular Toolkit. | This toolkit has a variety of useful features. Why not bookmark the site?
Click on Translate under Nucleic Acid Analysis. | Look at that... Isn't it nice to find a web tool that supplies its own clear instructions?
Scroll down to the white box. Paste in the sequence of DG47. | Warning: Most programs accept sequences in FastA format, but this one does not. Copy only the sequence itself, not the FastA header line.
If you don't have the sequence, get it now by clicking here. | Your browser may ask you whether to save or open the file DG47.nt. Saving it is best, since you're going to use it later. You can open it in Notepad to get the sequence. The DG47.nt sequence is in FastA format. Blast recognizes the FastA header (the line beginning >) and uses it as a label, so for Blast it's best to copy the entire file, but for this translation tool copy only the sequence lines, as warned above.
Click the Translate DNA button. | This will produce graphical output in the black box. Note that there are six red lines (= 3 reading frames x 2 strands). Note that only one, Forward frame 1, lacks pink stop codons. Note also that although this reading frame contains a possible start codon (ATG), ATG also encodes methionine within a protein, so it does not necessarily mark where translation begins.
Click the Text output button. Make sure the left box is set to Forward frame 1. | Now the lower box has the translated amino acid sequence. Check that it is 63 amino acids long: the 190 nucleotides give 63 complete codons, with one nucleotide left over. If you wanted to look at other reading frames, you could reset the left box.
Click on the third box (showing Amino acids and DNA) and change it to read Amino acids only. Change the fourth box from 60 characters/line to 90 characters/line. | These manipulations make it possible to copy just the amino acid sequence (without accompanying coordinates).
Congratulate self. | You're done!
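
Two of the steps above, dropping the FastA header before pasting and checking that 190 nucleotides give 63 amino acids, can also be sketched in Python. This is only an illustration under stated assumptions: the helper names `fasta_to_sequence` and `codon_count` are made up for this example, and the FastA record shown is a hypothetical toy, not the contents of DG47.nt.

```python
def fasta_to_sequence(text):
    """Drop FastA header lines (those beginning with '>') and join the rest."""
    lines = (line.strip() for line in text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def codon_count(n_nucleotides):
    """Return (complete codons, leftover nucleotides) for a given length."""
    return divmod(n_nucleotides, 3)


# Hypothetical toy record; the real file would be read from disk instead,
# e.g. with open("DG47.nt") as handle: text = handle.read()
example = ">toy_fragment hypothetical header\nATGGCCATTG\nTAATGGGCC\n"
sequence = fasta_to_sequence(example)
print(sequence, len(sequence))

# The length check from the table: 190 nucleotides.
codons, leftover = codon_count(190)
print(codons, "complete codons,", leftover, "nucleotide left over")
```

The string returned by `fasta_to_sequence` is what a tool that does not accept FastA format would expect in its input box.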