Active Stubs, Sorting, and Printing

An active stub is a subroutine that does just a little more than nothing -- it produces an arbitrary result that we use to test the rest of the program. Here's an active stub for find_orfs:

sub find_orfs {
   @orfs = ([23, 42, "d"], [0, 1, "c"]);
}

Here we're creating two rows in the @orfs array. The first is a supposed ORF from the genome as read in (that's the "d"), starting at position 23 and ending at 42. The second is a supposed ORF from the reverse complement (that's the "c"), starting at 0 and ending at 1.
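To see how the rows and columns are laid out, here is a minimal sketch that builds the same two rows and picks out individual entries. The variable names $rows, $first_start, and $second_dir are made up for this illustration, and @orfs is declared with my only so the snippet runs on its own under use strict.

```perl
use strict;
use warnings;

# Same stub data as above: each row is [start, end, direction].
my @orfs = ([23, 42, "d"], [0, 1, "c"]);

my $rows        = @orfs;        # an array in scalar context gives its row count
my $first_start = $orfs[0][0];  # row 0, column 0
my $second_dir  = $orfs[1][2];  # row 1, column 2

print "rows: $rows\n";                      # rows: 2
print "first start: $first_start\n";        # first start: 23
print "second direction: $second_dir\n";    # second direction: c
```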

SQ4: Could a real ORF-finding program produce these? (Hint: No.) Why not?

Here's a routine that sorts the rows of @orfs by starting position, which is the first column -- we've seen statements like this before.

sub sort_orfs_by_starting_position {
   @orfs = sort { $$a[0] <=> $$b[0] } @orfs;
}
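As a quick check of what the comparison does, the sketch below applies the same sort block to the stub rows and lists the starting positions afterwards. It sorts into a separate array, @sorted, rather than back into @orfs; that name and @starts are assumptions made for this example only.

```perl
use strict;
use warnings;

my @orfs = ([23, 42, "d"], [0, 1, "c"]);

# Inside the block, $a and $b each point to one row;
# $$a[0] is that row's starting position.
my @sorted = sort { $$a[0] <=> $$b[0] } @orfs;

# Collect the starting positions in their new order.
my @starts;
foreach my $row (@sorted) {
   push @starts, $$row[0];
}

print "@starts\n";   # prints "0 23"
```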

For printing we could do something like the following, running through each row of @orfs. The starting position is in the first column, the ending position is in the second column, and the direction is in the third column. Because Perl numbers things starting from zero, we add 1 to the starting and ending positions before printing them.

sub print_orfs {
   my $size = @orfs;
   foreach my $o (0..$size-1) {
      my $start = $orfs[$o][0];
      my $end = $orfs[$o][1];
      my $direction = $orfs[$o][2];
      $start = $start + 1;
      $end = $end + 1;
      print "$start\t$end\t$direction\n";
   }
}

Here's another way to do the same thing. Instead of running through the row numbers and referring to each row with $orfs[$o], it uses a pointer to each row (Perl's own name for such a pointer is a "reference"). When $orf points to a row, we can refer to the entire row by saying @$orf, which lets us be more succinct:

sub print_orfs {
   foreach my $orf (@orfs) {
      my ($start, $end, $direction) = @$orf;
      $start = $start + 1;
      $end = $end + 1;
      print "$start\t$end\t$direction\n";
   }
}
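The pointer idea can be tried out on its own. This sketch takes the first stub row and reads it three ways: as a whole row with @$orf, as a single element with $$orf[0], and with the arrow spelling $orf->[0], which this lesson does not use but which means the same thing. The variable names are invented for the illustration.

```perl
use strict;
use warnings;

my @orfs = ([23, 42, "d"], [0, 1, "c"]);

my $orf     = $orfs[0];      # $orf now points to the first row
my @row     = @$orf;         # a copy of the whole row: (23, 42, "d")
my $start_a = $$orf[0];      # one element, reached through the pointer
my $start_b = $orf->[0];     # arrow spelling of the same thing
my $start_c = $orfs[0][0];   # the row-number form used earlier

print "row: @row\n";                              # row: 23 42 d
print "starts: $start_a $start_b $start_c\n";     # starts: 23 23 23
```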

Here's orf3.pl, which you can download and run. You should see the following output:

1       2       c
24      43      d
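The contents of orf3.pl are not reproduced on this page. As a rough sketch only, the three routines shown above could be combined as follows to produce that output; the real file may be organized differently and may do more. Declaring @orfs with my at the top of the file is an assumption made here so the sketch runs under use strict.

```perl
use strict;
use warnings;

my @orfs;   # shared by the subroutines below; each row is [start, end, direction]

sub find_orfs {
   # Active stub: made-up rows, not real ORFs.
   @orfs = ([23, 42, "d"], [0, 1, "c"]);
}

sub sort_orfs_by_starting_position {
   @orfs = sort { $$a[0] <=> $$b[0] } @orfs;
}

sub print_orfs {
   foreach my $orf (@orfs) {
      my ($start, $end, $direction) = @$orf;
      $start = $start + 1;   # convert from counting at 0 to counting at 1
      $end = $end + 1;
      print "$start\t$end\t$direction\n";
   }
}

find_orfs();
sort_orfs_by_starting_position();
print_orfs();
```

The columns are separated by tab characters, so the exact spacing on screen depends on the terminal's tab stops.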

Now we're finally ready to find some real ORFs. We have two subtasks here: finding ORFs in the direct frames, and finding ORFs in the reverse complement frames. Let's start off with a slightly more detailed stub:

sub find_orfs {
   my @direct_orfs;    # ORFs found in the genome as given
   my @reverse_orfs;   # ORFs found in the reverse complement

   @direct_orfs = ([23, 42, "d"]);

   @reverse_orfs = ([0, 1, "c"]);

   # Set @orfs to the @direct_orfs followed by the @reverse_orfs
   @orfs = ???;
}

Oops! We haven't seen a Perl feature (yet) that lets us combine two arrays into one with a single statement.

SQ5: How would you perform the task with the Perl features you know?
SQ6: Perl does have a way to do this in a single statement. What do you think it looks like? Go to the next page to find out.