Biol 591 | Scenarios | Fall 2002
Scientific story (html)
In brief: You hit on the idea of understanding the basis for pathogenesis by the deadly E. coli O157:H7 by comparing its total complement of proteins with that of the nonpathogenic strain E. coli K12. Unfortunately, the comparison nets you a file bigger than anything you could go through in a year. How can you extract the useful information from the file and put it in a form a human can understand?
Bioinformatic tools
Blast
The standard program for finding similarities between sequences or sets of sequences.
Notes - Molecular biology (PDF) (Questions)
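To make "finding similarities" concrete: Blast starts by looking for short words that two sequences share (three letters by default for proteins). It then extends those seeds into alignments and attaches statistics. The toy Python sketch below covers only the exact-word part of that first step. It is not Blast. Real Blast also counts similar words, not just identical ones, scored with a substitution matrix. The function names and peptides are hypothetical.

```python
# Toy illustration of the word-seeding idea behind Blast: find every short
# word (k-mer) that occurs exactly in both sequences. NOT Blast itself, which
# also uses similar words, extends seeds into alignments, and reports E-values.
from collections import defaultdict
from typing import Dict, List, Tuple


def index_words(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each length-k word in seq to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_word_hits(query: str, subject: str, k: int = 3) -> List[Tuple[int, int, str]]:
    """Return (query_pos, subject_pos, word) for every exact k-mer match."""
    subject_index = index_words(subject, k)
    hits = []
    for qpos in range(len(query) - k + 1):
        word = query[qpos:qpos + k]
        for spos in subject_index.get(word, []):  # .get does not add keys
            hits.append((qpos, spos, word))
    return hits


if __name__ == "__main__":
    # Hypothetical short peptides, not real E. coli proteins.
    for hit in shared_word_hits("MKVLAAGIV", "TTMKVLSAGIV"):
        print(hit)
```

In this example every hit has a subject position two greater than its query position, so all the hits lie on one diagonal. That is the kind of seed cluster an alignment program would go on to extend.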
Parsing program
Scans Blast output for items of interest, as you define them, and writes them to a separate file.
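The parsing program described above can be sketched in a few lines. This is a minimal Python sketch, not BlastParser.pl. It assumes Blast was run with tabular output: one hit per line, 12 tab-separated columns, with the E-value in the 11th column. It copies each line that passes a user-supplied test to a separate output. The criterion `strong_hit`, its cutoff, and the sequence names are hypothetical.

```python
# Sketch of a Blast output filter: keep the hits you define as interesting
# and write them to a separate file. ASSUMPTION: tabular Blast output with
# 12 tab-separated columns, E-value in the 11th. Not the course's BlastParser.pl.
import io
from typing import Callable, Iterable, List, TextIO

EVALUE_COL = 10  # 0-based index of the E-value column in the assumed layout


def filter_hits(lines: Iterable[str],
                keep: Callable[[List[str]], bool],
                out: TextIO) -> int:
    """Write each hit line for which keep(fields) is true to out; return the count."""
    kept = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if keep(fields):
            out.write("\t".join(fields) + "\n")
            kept += 1
    return kept


def strong_hit(fields: List[str], max_evalue: float = 1e-10) -> bool:
    """Hypothetical criterion: the hit's E-value is at or below max_evalue."""
    return float(fields[EVALUE_COL]) <= max_evalue


if __name__ == "__main__":
    # Two made-up hits; real input might come from open("blast_out.tsv").
    sample = io.StringIO(
        "O157_p1\tK12_p9\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t390\n"
        "O157_p2\tK12_p4\t31.00\t80\t50\t2\t5\t84\t10\t89\t0.5\t28.1\n"
    )
    result = io.StringIO()
    print(filter_hits(sample, strong_hit, result))  # 1
    print(result.getvalue(), end="")
```

Passing the criterion in as a function keeps "items of interest as you define them" separate from the scanning loop. For the O157:H7 question, you might instead look at each protein's best hit and flag proteins whose best E-value is above a cutoff you choose. Proteins with no hit at all never appear in tabular output, so you would find those by checking the full list of query names against the names that do appear.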
Programs
Blast (obtainable from the NCBI site; see instructions on how to download and run the program)
Perl focus: Pattern matching and extraction of strings through regular expressions
Most people run Blast from the web. For now, the point is to learn how to download the program so that you can tailor it to your own purposes.
Protein databases (obtainable from the TIGR-CMR site; see instructions)
Files containing all proteins deduced from the completed DNA sequences of E. coli strains; these are the databases Blast searches.
Parsing program: BlastParser.pl - slightly simplified
BlastParser2.pl - full-strength version
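The Perl focus listed above, pattern matching and extraction of strings through regular expressions, is the core of any Blast parser. This sketch is in Python rather than the course's Perl, although the regular expressions would read almost the same in Perl. It is not the code of BlastParser.pl or BlastParser2.pl. It assumes the classic plain-text pairwise report, with `Query=` lines, `>subject` lines, and `Score = ... Expect = ...` lines. It also assumes that some report versions abbreviate an E-value such as 1e-100 as `e-100`. The sample report text and names are made up.

```python
# Sketch of regular-expression extraction from a plain-text Blast report.
# ASSUMPTIONS: classic pairwise layout ("Query=", ">subject", and
# "Score = ... Expect = ..." lines); not the code of BlastParser.pl.
import re
from typing import List, Optional, Tuple

QUERY_RE = re.compile(r"^Query=\s*(\S+)")                   # name after "Query="
SUBJECT_RE = re.compile(r"^>(\S+)")                          # name after ">"
EXPECT_RE = re.compile(r"Expect(?:\(\d+\))?\s*=\s*(\S+)")  # E-value token


def parse_evalue(token: str) -> float:
    """Convert an E-value token to float; tolerate a trailing comma and 'e-100'."""
    token = token.rstrip(",")
    if token.startswith("e"):
        token = "1" + token
    return float(token)


def extract_hits(report: str) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for every Expect line in the report."""
    hits: List[Tuple[str, str, float]] = []
    query: Optional[str] = None
    subject: Optional[str] = None
    for line in report.splitlines():
        if m := QUERY_RE.match(line):
            query, subject = m.group(1), None  # new query: forget old subject
        elif m := SUBJECT_RE.match(line):
            subject = m.group(1)
        elif (m := EXPECT_RE.search(line)) and query and subject:
            hits.append((query, subject, parse_evalue(m.group(1))))
    return hits


if __name__ == "__main__":
    # Made-up report fragment for illustration only.
    sample = """Query= O157_p1 hypothetical protein
>K12_p9 putative transporter
 Score =  390 bits (1002), Expect = e-100
>K12_p4 hypothetical protein
 Score = 28.1 bits (61), Expect = 0.50
"""
    for hit in extract_hits(sample):
        print(hit)
```

Each pattern captures only the piece of a line you care about, with a parenthesized group. Everything else in the long report is skipped, which is how a file too big to read becomes a short list a human can scan.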
Problem Set - Molecular biology (PDF)
Problem Set - Programming (html)