Biol 591 | Scenarios | Fall 2003
Scientific story (html)
In brief: Genes that give bacteria exotic abilities, such as pathogenesis, often arise by horizontal transfer from other organisms. You would like to identify all genes of foreign origin in the sequenced genome of a bacterium. Current methods work well with large blocks of DNA (i.e., many tens of genes in length) but not so well with individual genes, because they do not extract enough information from the DNA of a single gene for the characteristics of foreign genes to rise reliably above random variation. You would like to adapt a technique that makes greater use of the information within genes and use it to identify foreign genes.
Bioinformatic tools: Markov models
Molecular biology concepts: Compositional inhomogeneities in genomic sequences
Contrary to all those disclaimers from investment advisors, past performance CAN predict future behavior.
Perl focus: Using hashes
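The idea that past characters can predict the next one maps naturally onto a hash keyed by the preceding characters. The sketch below is a minimal illustration written in Python, where a dict plays the role of a Perl hash; it is not the course's Hamlet.pl. The function names, the model order of 2, and the sample sentence are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each `order`-character context to counts of the characters that follow it."""
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - order):
        context = text[i:i + order]
        next_char = text[i + order]
        model[context][next_char] += 1
    return model

def generate(model, seed_context, length, rng):
    """Extend seed_context one character at a time by sampling from the model."""
    order = len(seed_context)
    out = seed_context
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # context never seen in the training text: stop early
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out += rng.choices(chars, weights=weights)[0]
    return out

# Hypothetical training text; any longer text file could be read in instead.
sample = "to be or not to be that is the question"
model = build_model(sample, order=2)
pseudotext = generate(model, "to", 30, random.Random(0))
print(pseudotext)
```

Every three-character window of the generated string occurs somewhere in the training text, which is why higher-order models produce pseudotext that looks increasingly like the original.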
Papers
Ute Hentschel and Jörg Hacker (2001). Pathogenicity islands: the tip of the iceberg [Review]. Microbes and Infection 3:545-548
A quick review of pathogenicity islands (referred to in Notes for Nov 24)
Samuel Karlin (2001). Review: Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends in Microbiology 9:335-343
Review of methods to detect pathogenicity islands (main focus of Notes for Nov 24)
Jan Mrázek, Devaki Bhaya, Arthur R. Grossman, and Samuel Karlin (2001). Highly expressed and alien genes of the Synechocystis genome. Nucleic Acids Research 29:1590-1601
Attempt to apply methods for detecting pathogenicity islands to the detection of individual foreign genes. Most of the article is concerned with highly expressed genes, however. (referred to in Notes for Nov 24)
Notes
Detection of anomalous regions of a genome (PDF)
Programs
Construction of programs to detect anomalous genes (PDF)
Hamlet.pl - Creates a Markov model from the text in an input file and uses it to generate pseudotext
DATA: HamletSpeech.txt - Possible input for Hamlet.pl
DATA: Carols.txt - Possible input for Hamlet.pl
Display_hash.pl - Displays the contents of a hash in a logical format
MakeMarkov.pl - Creates a Markov model from a set of DNA sequences. You'll write this based on Hamlet.pl
DATA: 6803PHX.nt - Training set of DNA sequences from bona fide genes of Synechocystis PCC 6803
Run_orfs_through_model.pl - Assesses open reading frames using the Markov model
DATA: 6803Orfs.nt - All protein-encoding genes from Synechocystis PCC 6803
Problem Set: Problem Set 8
Alternate results (used in PS8.1h): 6803orfs_codon_bias.xls
Data file (used in PS8.6): aa_info.txt
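To make the train-then-assess workflow behind the programs above concrete, here is a rough sketch in Python (a dict standing in for a Perl hash). It is not the course's MakeMarkov.pl or Run_orfs_through_model.pl, whose scoring method is not described here. Everything in it is an assumption made for illustration: the function names, the second-order model, the add-one pseudocount, the uniform fallback for unseen contexts, the average log2 probability used as the score, and the invented toy sequences.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumes uppercase sequences containing only these letters

def train(sequences, order=2, pseudocount=1.0):
    """Return {context: {nucleotide: probability}} estimated from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(len(seq) - order):
            counts[seq[i:i + order]][seq[i + order]] += 1
    model = {}
    for context, followers in counts.items():
        total = sum(followers.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            n: (followers.get(n, 0.0) + pseudocount) / total for n in ALPHABET
        }
    return model

def score(seq, model, order=2):
    """Average log2 probability per predicted nucleotide (lower = less like training set)."""
    uniform = 1.0 / len(ALPHABET)  # fallback when a context was never seen in training
    total, n = 0.0, 0
    for i in range(len(seq) - order):
        probs = model.get(seq[i:i + order])
        p = probs.get(seq[i + order], uniform) if probs else uniform
        total += math.log2(p)
        n += 1
    return total / n if n else float("nan")

# Invented toy data, not taken from 6803PHX.nt or 6803Orfs.nt.
training = ["ATGGCGGCGGCGGCGTAA", "ATGGCGGCCGCGGCGTAA"]
orfs = {"native_like": "ATGGCGGCGGCCGCGTAA", "at_rich": "ATGAAATTTAAATTTTAA"}

dna_model = train(training)
scores = {name: score(seq, dna_model) for name, seq in orfs.items()}
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}\t{value:.3f}")
```

Sorting by score lists the open reading frame least like the training set first. With real data, genes near the top of that list would be candidates for closer inspection, not confirmed foreign genes.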