I presume that you have already downloaded and installed Blast (if not, then click here) and downloaded two sets of proteins deduced from genomic sequences, one from the genomic sequence of E. coli K-12 and the other from the genomic sequence of either E. coli O157:H7 EDL933 or E. coli O157 Sakai (if not, then click here). (If you don't know which strain to choose, click here.)
Blasting the proteins of one genome against the proteins of another proceeds in two steps. First, you need to let Blast analyze one set of proteins to create a database it can understand. Second, you need to run Blast to compare each protein of the OTHER set to that database. You'll make the database from the set of E. coli K12 proteins. You'll run the set of proteins from your pathogenic strain against that database.
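As a rough sketch, the two steps could also be driven from a short Python script instead of typing the commands by hand. The helper names (formatdb_command, blastall_command, run_pipeline) are hypothetical. The flags and the example file names are the ones used in the steps below. The sketch assumes that the formatdb and blastall programs can be found from the directory where the script runs.

```python
import subprocess

def formatdb_command(fasta_path, db_name):
    """Step 1: argument list that asks formatdb to build a protein database."""
    return ["formatdb", "-i", fasta_path, "-p", "T", "-o", "T", "-n", db_name]

def blastall_command(db_name, query_path, out_path, evalue="0.001"):
    """Step 2: argument list that asks blastall to run blastp against db_name."""
    return ["blastall", "-p", "blastp", "-d", db_name,
            "-i", query_path, "-o", out_path, "-e", evalue]

def run_pipeline(k12_fasta, pathogen_fasta, db_name, out_path):
    """Run both steps in order; stop if either program reports an error."""
    # check=True raises CalledProcessError when a program exits with an error.
    subprocess.run(formatdb_command(k12_fasta, db_name), check=True)
    subprocess.run(blastall_command(db_name, pathogen_fasta, out_path), check=True)

# Example call, using the example file names from this page:
# run_pipeline("eck12.FA", "Edl-Prot.FA", "K12-Prot", "EdVsK12.txt")
```

Keeping the command-building functions separate from the function that runs them makes it easy to print a command and inspect it before committing to a search that may take hours.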
1. Create a database of E. coli K12 proteins
a. Get into a DOS window (Run Command or Cmd)
b. Get into the directory where Blast and the FA files reside (CD \Blast)
c. Type the following command to format the database:
formatdb -ieck12.FA -pT -oT -nK12-Prot
- formatdb invokes the Blast accessory program to create the database
- -i tells the program that the path that follows leads to the input file. The file name eck12.FA is used only as an example. Use whatever name you gave the file of E. coli K12 protein sequences you downloaded.
- -pT tells the program "True, the file consists of protein sequences" (-pF would have been appropriate for DNA sequences)
- -oT tells the program "True, you should make an index of the identification numbers for the protein sequences" (the K12 file uses ID numbers like b0001). Frankly, I don't know what good the index does, but it's cheap.
- -n tells the program that the characters that follow should be used as the name of the database (you can name it anything you want, so long as you use 8 or fewer legal characters).
[NULL_Caption] WARNING: lcl|1445 has zero-length sequence
[NULL_Caption] WARNING: lcl|2827 has zero-length sequence
[NULL_Caption] WARNING: lcl|3800 has zero-length sequence
[NULL_Caption] WARNING: lcl|3973 has zero-length sequence
These messages mean that the protein sequences b1445, b2827, b3800, and b3973 don't have any amino acids, which is not very likely. TIGR evidently screwed up, but problems with four out of about four thousand proteins aren't going to hurt us much.
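If you would like to see which records are empty before formatting the database, a few lines of Python can scan the FASTA file. This is only a sketch: the function name empty_fasta_records is hypothetical, and it assumes the usual FASTA layout in which each record starts with a header line beginning with ">" and is followed by zero or more lines of sequence.

```python
def empty_fasta_records(lines):
    """Return the headers (without '>') of records that have no sequence letters."""
    empty = []
    header = None   # header of the record being read, if any
    length = 0      # number of sequence characters seen for that record
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record begins; report the previous one if it was empty.
            if header is not None and length == 0:
                empty.append(header)
            header = line[1:]
            length = 0
        elif header is not None:
            length += len(line)
    # The last record has no following header, so check it separately.
    if header is not None and length == 0:
        empty.append(header)
    return empty

# Example call, using the example file name from this page:
# with open("eck12.FA") as handle:
#     print(empty_fasta_records(handle))
```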
2. Run Blast to compare the set of proteins from the pathogenic strain to the database
a. Type the following command to run Blast:
blastall -pblastp -dK12-Prot -iEdl-Prot.FA -oEdVsK12.txt -e.001
b. Be prepared to wait a while. It may be a couple of hours, depending on how fast your computer is. The program is done when it brings you back to a DOS prompt (>). It will not ring a bell. It will not print a message.
- blastall invokes the program that compares individual sequences to a database of sequences
- -pblastp tells Blastall that the specific Blast program you want to use is blastp (good for comparing protein sequences to protein sequences)
- -dK12-Prot tells the program to use K12-Prot as the database. If you named the database made by FormatDB something else, then use that name instead
- -iEdl-Prot.FA tells the program to use Edl-Prot.FA as the input to Blast. The file name Edl-Prot.FA is given only as an example. Use whatever name you gave the file of pathogenic E. coli protein sequences you downloaded.
- -oEdVsK12.txt tells the program to use EdVsK12.txt as the output file. The file name EdVsK12.txt is given only as an example. Use any name you like.
- -e.001 tells the program to ignore matches with an e-value greater than 0.001 (the e-value is the number of matches at least that good that you would expect to find by chance in a database of this size, so smaller is better)
c. Be prepared to fill your disk drive with LOTS of output, something on the order of 40 megabytes.
d. How do you know whether the program worked? Don't try to read the output into something like Word (you risk choking it). I don't think that Microsoft has any solution for us, but there is an ancient freeware program from the pre-Windows era that will do the job. Click here to download DR (standing for DiRectory). Put it in the Blast directory.
e. Run DR (>DR) to get a list of files in \Blast, then press the F10 key to sort the files by date of creation, then press the End key to go to the end of the list. You should see the file you just made. Press the Enter key to see the contents of the file (you can scroll through the file using the keys you expect).
f. You should see something like:
BLASTP 2.1.3 [Apr-1-2001]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= thrL, thr operon leader peptide, Escherichia coli O157:H7 (EDL933)
(27 letters)

Database: K12-Prot
4289 sequences; 1,355,879 total letters
If so, you win!
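If Python is installed, it offers another way to check a very large output file without loading all of it into an editor. The sketch below reads only the first few lines, and separately counts how many query proteins have been written so far. The function names peek and count_queries are hypothetical. The count assumes that each query's report begins with a line starting "Query=", as in the sample above.

```python
from itertools import islice

def peek(path, n_lines=20):
    """Return the first n_lines of a text file without reading the rest of it."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]

def count_queries(path):
    """Count the lines that start with 'Query=', one per query protein."""
    # The file is read one line at a time, so its size does not matter.
    with open(path, "r", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith("Query="))

# Example calls, using the example output file name from this page:
# for line in peek("EdVsK12.txt"):
#     print(line)
# print(count_queries("EdVsK12.txt"), "queries reported")
```

Comparing the count with the number of proteins in the input file gives a rough idea of whether the run covered every query.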