P1P.1. SequenceSearch2.pl searches for the pattern GTA.{8}TAC.{20,24}TA.{3}T, which stands for GTA followed by a gap of eight positions, then TAC followed by a gap of 20 to 24 positions, then TA followed by a gap of three positions and a T.
Exact matches: 147
Matches with one possible mismatch 3734
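To get a feel for how this pattern behaves, it may help to try it on a small made-up sequence. The sketch below is written in Python, not Perl, and is not the code of SequenceSearch2.pl; the regular expression syntax for this pattern is the same in both languages. The function names and the toy sequence are invented for illustration. The one-mismatch search assumes that "one possible mismatch" means that at most one of the nine fixed bases may differ. That is only a guess at how the program counts, and under this guess exact matches are included in the count.

```python
import re

# The pattern quoted in the text (same syntax in Perl and Python).
PATTERN = r"GTA.{8}TAC.{20,24}TA.{3}T"


def count_exact(seq, pattern=PATTERN):
    """Count start positions where the pattern matches (overlaps allowed)."""
    # A lookahead consumes no characters, so every start position is tested.
    return sum(1 for _ in re.finditer("(?=" + pattern + ")", seq))


def one_mismatch_variants(pattern=PATTERN):
    """Yield copies of the pattern with one fixed base replaced by '.'."""
    # Split the pattern into single bases and gap tokens such as .{8}
    tokens = re.findall(r"[ACGT]|\.\{\d+(?:,\d+)?\}", pattern)
    for i, tok in enumerate(tokens):
        if tok in "ACGT":  # only single-base tokens pass this test
            yield "".join(tokens[:i] + ["."] + tokens[i + 1:])


def count_up_to_one_mismatch(seq, pattern=PATTERN):
    """Count start positions matching with at most one fixed base wrong.

    ASSUMPTION: this is one reading of "one possible mismatch"; the real
    program may count differently.
    """
    combined = "|".join(one_mismatch_variants(pattern))
    return sum(1 for _ in re.finditer("(?=" + combined + ")", seq))


if __name__ == "__main__":
    # Toy sequence built by hand to contain exactly one site.
    toy = "CC" + "GTA" + "A" * 8 + "TAC" + "C" * 20 + "TA" + "GGG" + "T" + "CC"
    print(count_exact(toy))  # prints 1

    # Damage the first fixed base: no exact match remains, but the site
    # is still found when one mismatch is tolerated.
    damaged = toy.replace("GTA", "CTA", 1)
    print(count_exact(damaged), count_up_to_one_mismatch(damaged))
```

Each altered pattern in the problems that follow can be tested the same way by passing it as the second argument to count_exact.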
Suppose we decide to narrow the search: we are interested only in some of the sequences counted above, namely those where GTA is immediately followed by another G. How would you alter the pattern?
Check: Perhaps you're pretty sure you've got the right pattern and would like some confirmation. Here's what I get when I run the program with my solution:
Exact matches: 34
Matches with one possible mismatch 1017
P1P.2. Now suppose that we narrow the search further: as well as a G following the initial GTA, we want TAC to be immediately followed by a C. What pattern should we use?
Check: this pattern should give 9 exact and 294 inexact matches.
P1P.3. Go back again to the original pattern, GTA.{8}TAC.{20,24}TA.{3}T. How would you change this pattern to search only for the consensus NtcA binding site, without caring if it's in range of a promoter or not?
Check: this pattern should give 1497 exact and 28731 inexact matches.
P1P.4. Figure 3 on page 5 of the Scenario 1 Molecular Biology notes (PDF) lists the sequences upstream from 20 cyanobacterial genes regulated by nitrogen deprivation. In each sequence the bases corresponding to the consensus NtcA binding site and to the promoter sequence are printed in bold. Sometimes the correspondence is exact; the upstream sequence has all nine of the bold bases. But that's not always true. For instance, in the second line (nirB-ntcB), the second base of the promoter is a T rather than an A, so that line has only eight bold bases.
Examine the sixth sequence (amtl). Find a reason for wondering if our simulation might underestimate the probability of finding a match for the consensus binding sequence and promoter. (Consider exact and inexact matches.)
P1P.5. Examine the first two bold columns of Figure 3. Find a reason for wondering if our simulation might overestimate the probability of finding a match. (Again consider exact and inexact matches.)