Scenario 3: Nuanced search for motifs in a genome

Biol 591

Introduction to Bioinformatics
Scenarios

Fall 2003

Nuanced search for motifs in a genome

Scientific story (html)

In brief: There must exist a gene in the cyanobacterium Anabaena that is regulated by nitrogen deprivation (through the DNA-binding protein NtcA) and whose product regulates differentiation (leading to N₂-fixing heterocysts). You have in hand a reasonable collection of sequences known to bind NtcA, but you're not sure exactly what features the protein finds important. Your task is to extract as much information from the known binding sequences as possible and use it to scan the genome of Anabaena looking for candidate binding sites.

Bioinformatic tools

Position-specific scoring matrices (PSSMs)
Identify the positions in a sequence alignment that carry the most information, and use the nucleotide frequencies at those positions to characterize the aligned motif.
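To make "positions that carry the most information" concrete, here is a minimal sketch, written in Python rather than the course's Perl. It assumes ungapped aligned sequences of equal length containing only A, C, G, and T, and a uniform background of 0.25 per base. The function names and the toy sequences are hypothetical; they are not taken from FindMotif.pl or from the NtcA data set.

```python
import math
from collections import Counter

BASES = "ACGT"

def column_frequencies(aligned):
    """Return one {base: frequency} dict per column of the alignment."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    freqs = []
    for i in range(length):
        column = [seq[i].upper() for seq in aligned]
        if any(base not in BASES for base in column):
            raise ValueError("only A, C, G and T are supported in this sketch")
        counts = Counter(column)
        freqs.append({base: counts[base] / len(column) for base in BASES})
    return freqs

def information_content(freq):
    """Information in bits at one position: 2 minus the Shannon entropy.

    Assumes a uniform background, so a fully conserved position scores 2
    and a position where all four bases are equally common scores 0.
    """
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy

# Toy alignment (made up for illustration, not real binding sites)
toy_sites = ["GTAAC", "GTATC", "GTAGC", "GTACC"]
toy_freqs = column_frequencies(toy_sites)
toy_info = [information_content(f) for f in toy_freqs]
```

In the toy alignment the first three and the last positions are fully conserved while the fourth is not conserved at all, so the fourth position would contribute nothing to telling a real site from background.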

Molecular biology concepts: Nothing new

Perl focus: Hashes; Sorting

Programs

FindMotif.pl - Constructs a PSSM from aligned sequences, scans the genome, and produces a list of the most plausible motifs
Data: Small set of aligned sequences (71NpNtSm.txt)
MEME - Web-based program designed to find statistically overrepresented motifs in a collection of sequences.
Click on "MEME - Submission form" to use the program. Explore the other links to learn more about it.
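The three steps attributed to FindMotif.pl above (build a PSSM, scan, rank) can be sketched as follows. This is not the course program: it is a Python illustration under stated assumptions, namely log-odds scores against a uniform background of 0.25, a pseudocount of 1 per base, forward-strand scanning only, and sequences containing only A, C, G, and T in the aligned sites. All names and the toy data are hypothetical. A dictionary keyed by start position plays the role a hash would play in Perl, and the ranking is a sort on its values.

```python
import math

BASES = "ACGT"

def build_pssm(aligned, pseudocount=1.0, background=0.25):
    """Return one {base: log-odds score} dict per alignment column."""
    length = len(aligned[0])
    pssm = []
    for i in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score
        counts = {base: pseudocount for base in BASES}
        for seq in aligned:
            counts[seq[i].upper()] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2((counts[base] / total) / background)
                     for base in BASES})
    return pssm

def scan(sequence, pssm, top=5):
    """Score every window of the forward strand; return the best `top` hits."""
    width = len(pssm)
    sequence = sequence.upper()
    scores = {}  # start position -> score
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous characters such as N
        scores[start] = sum(col[base] for col, base in zip(pssm, window))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(start, sequence[start:start + width], score)
            for start, score in ranked[:top]]

# Toy data (made up for illustration, not real binding sites or genome sequence)
toy_sites = ["GTAAC", "GTATC", "GTAGC", "GTACC"]
toy_genome = "AAAAGTATCAAAA"
toy_hits = scan(toy_genome, build_pssm(toy_sites), top=3)
```

A real scan would also need to score the reverse complement and would probably use the genome's actual base composition as the background rather than 0.25.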

Notes

Position-specific scoring matrices (PDF) (Questionnaire)
PSSM program (PDF) (Questionnaire)

Problem Set: Just one for this scenario (HTML)