Program Outline


Here's one reasonable outline: read in the genome, find the ORFs, sort them, and print them.

############################# Files ###################################

$genome = get_genome_sequence("Synecho.nt");

########################## Main Program ###############################

find_orfs();
sort_orfs_by_starting_position();
print_orfs();

For a quick test, we can fill in some stubs (subroutines that don't do anything yet, inserted so that Perl won't complain about calls to undefined subroutines), and borrow a genome-reading subroutine from an earlier program.

######################## Subroutines ##################################

sub find_orfs {
}

sub sort_orfs_by_starting_position {
}

sub print_orfs {
}

# Read in a FASTA-format sequence and return it as one uppercase string
#
sub get_genome_sequence {
    my ($file_path) = @_;
    open(GENOME, "<$file_path") or die "Can't open $file_path: $!\n";
    $_ = <GENOME>;                   # Discard the first (header) line
    my $sequence = "";
    while (<GENOME>) {
        chomp($_);                   # Remove line break
        s/\r//;                      # And carriage return
        $_ = uc $_;                  # Make the line uppercase
        $sequence = $sequence . $_;  # Append the line to the sequence
    }
    close GENOME;
    return $sequence;
}

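
To see why the stubs matter, here is a small sketch of what Perl does with and without them. The name not_written_yet is hypothetical and exists only for this illustration; eval is used to trap the run-time error so the demonstration can continue.

```perl
use strict;
use warnings;

# A stub: defined, but it does nothing yet.
sub find_orfs {
}

# Calling the stub succeeds silently.
find_orfs();
print "find_orfs() ran without complaint\n";

# Calling a subroutine that has never been defined is a run-time error.
# eval traps the error so the script keeps going.
# (not_written_yet is a made-up name used only for this illustration.)
my $ok = eval { not_written_yet(); 1 };
print "Perl complained: $@" unless $ok;
```

Without the eval, the second call would stop the program with an "Undefined subroutine" message, which is what the empty stubs prevent.
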
Here's orf1.pl, which has what we've done so far.

If you run it, it will pause briefly as it reads in the genome, but it doesn't do anything else yet. We need to start filling in the other subroutines, but where should we start? This is a chicken-and-egg problem: we've done printing and sorting before, so those are very likely easier than find_orfs, and we'll want to have them done so we can check the output from find_orfs; but without a few ORFs to sort and print, we'll have a hard time testing the sort and print routines.
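
Separately from that question, the one subroutine that already does real work can be checked by itself. The following is a minimal sketch, assuming a tiny made-up FASTA file written to a temporary location with the standard File::Temp module; the sequence data and the variable names $fh and $test_path are invented for the test and are not part of orf1.pl. The subroutine is repeated so that the sketch runs without any other file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Same logic as get_genome_sequence in the program, repeated here so
# this sketch is self-contained.
sub get_genome_sequence {
    my ($file_path) = @_;
    open(GENOME, "<$file_path") or die "Can't open $file_path: $!\n";
    $_ = <GENOME>;                   # Discard the first (header) line
    my $sequence = "";
    while (<GENOME>) {
        chomp($_);                   # Remove line break
        s/\r//;                      # And carriage return
        $sequence = $sequence . uc $_;  # Uppercase the line and append it
    }
    close GENOME;
    return $sequence;
}

# Write a tiny made-up FASTA file: one header line, two sequence lines.
# The first sequence line ends in \r\n to exercise the carriage-return fix.
my ($fh, $test_path) = tempfile(UNLINK => 1);
print $fh ">test_sequence (made-up data)\n";
print $fh "atgaaa\r\n";
print $fh "cccTAG\n";
close $fh;

my $sequence = get_genome_sequence($test_path);
print "Sequence: $sequence\n";
print "Length:   ", length($sequence), "\n";
```

If the reader works as intended, the header line is dropped and the two sequence lines are joined in uppercase, giving ATGAAACCCTAG with a length of 12.
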

SQ2: Think about possible solutions to the chicken-and-egg problem.
SQ3: What will the body of the sort routine look like?

Think these over, then go to the next page.