The Institute for Genomic Research (TIGR) has developed what is rather ambitiously termed the Comprehensive Microbial Resource (CMR), which provides a common interface through which to analyze all completed microbial genomes. You can also use CMR to download genomic sequences and the sets of proteins deduced from them, in case you want to do a kind of analysis the site does not support.
The instructions that follow are designed to download
the set of proteins deduced from the E. coli K-12 genomic sequence.
You can modify the instructions to download sets from other organisms or
to download the complete genomic sequence.
Go to the TIGR/CMR web site | (When you have a moment, pay a visit to the many other links on this page.)
Scroll most of the way down to a section on the left called Multi-Genome Applications, and click on Batch Download | This gets you to the CMR Batch Download page.
Choose the type of sequence you want to download: protein | Choosing DNA instead would let you download the genomic sequence.
Choose the specific organism: Escherichia coli K12-MG1655
Choose the specific molecule: Main Escherichia coli... | I know there's only one molecule. Click on it anyway.
Scroll down and click on Submit
Save the file to the same subdirectory in which you saved Blast, giving it a descriptive name like EcK12.FA | Your browser will probably ask whether to open the file or save it to disk. If so, choose save. If not, go to the browser's File menu and click on Save As. If your browser shows you a Save as type box, ignore it. The FA extension stands for FastA, a file format discussed below.
Congratulate self | You're done!
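After saving, you may want to confirm that the file really holds a set of proteins and not, say, an error page saved by the browser. In FastA format (described below), every record starts with a line beginning with >, so counting those lines counts the sequences. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and the default path assumes the file was saved as EcK12.FA in the current directory.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FastA records by counting header lines (lines that start with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def check_download(path: str = "EcK12.FA") -> int:
    """Report how many sequences a saved FastA file holds.

    The default path is an assumption: it is the name suggested in the
    steps above, in the current directory. Pass the real location if it differs.
    """
    # errors="replace" keeps the count going even if the file has stray bytes.
    with open(path, encoding="ascii", errors="replace") as handle:
        n = count_fasta_records(handle)
    if n == 0:
        print(f"{path}: no '>' header lines found; this may not be a FastA file")
    else:
        print(f"{path}: {n} sequences")
    return n
```

Run check_download() from the directory holding the file. A count of zero suggests the browser saved something other than the sequence file.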
FastA format
This is a common format for protein and DNA sequences, originally used by the FastA program (a program similar to Blast). The format is:
>[description]
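[sequence, in one-letter code if protein. Upper/lower case doesn't matter.]
The sequence may be split over several lines, and a single file, such as the one downloaded above, can hold many records, each beginning with its own > line.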
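The format is simple enough to read with a few lines of code. The sketch below is a minimal Python reader, not the FastA program's own: it assumes each record starts with a > line, treats every following line up to the next > line as sequence (so sequences wrapped over several lines are joined), skips blank lines, and upper-cases the sequence since case doesn't matter. The name read_fasta is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FastA-formatted lines."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            # Sequence lines are concatenated; case is not significant.
            chunks.append(line.upper())
    # Emit the last record in the file.
    if description is not None:
        yield description, "".join(chunks)
```

For example, opening EcK12.FA and looping over read_fasta(handle) gives each protein's description line and its full sequence, whose length is len(sequence).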
Example of a protein sequence in FastA format
>b5937: a mythical protein
MTAQQDPRES...
Example of a DNA sequence in FastA format
>zlr4284: a mythical gene
ATGCCCGACGAAGAC...
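Going the other way, a record like the two examples above can be written by emitting the > description line followed by the sequence. This minimal Python sketch wraps the sequence at 60 characters per line, a common convention that is an assumption here (the format as described above does not fix a line length); the name format_fasta_record is hypothetical.

```python
def format_fasta_record(description: str, sequence: str, width: int = 60) -> str:
    """Return one FastA record: a '>' header line, then the sequence wrapped at `width`."""
    if width < 1:
        raise ValueError("width must be positive")
    header = ">" + description
    # Slice the sequence into consecutive pieces of at most `width` characters.
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"
```

Writing several records one after another into the same file produces a multi-record FastA file of the kind downloaded from CMR.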