The File Decoded

One way to do it is this: first read in the eight bytes with no decoding, using the a8 packing code. That gives us a string of eight bytes. Reverse the bytes (with reverse), and decode the reversed string using the d packing code. Something like this:

   while (read ORF_DATA, $buffer, $record_length) {
      my ($orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr)
         = unpack("x5x4A11x4x4x1a8x1a8x4A1x4x14x4x5x1x8x4A50", $buffer);
      my $reverse_left = reverse($orf_left);
      $orf_left = unpack("d", $reverse_left);
      my $reverse_right = reverse($orf_right);
      $orf_right = unpack("d", $reverse_right);
      print "$orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr\n";
   }
   close ORF_DATA;

And sure enough, we get:

all0001, -311, 918, c, unknown protein
all0002, 981, 1718, c, unknown protein
asl0003, 2617, 2805, c, unknown protein
arl5500, 2861, 3247, c, ssrA: 10Sa RNA
all0004, 3418, 4365, c, AtpC: ATP synthase subunit gamma
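
To convince ourselves that the reversal does what we want, we can try it on a value we already know. The sketch below is not part of the original program. It packs 918 as a native double, reverses the bytes to imitate a field stored in the opposite byte order (an assumption about why the file's bytes look backwards), and then decodes the result both ways. The variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Eight bytes holding 918 in this machine's own byte order.
my $native = pack("d", 918);

# Stand-in for a field from the file: the same bytes in the opposite order.
# (Assigning to a scalar puts reverse in scalar context, so it reverses the string.)
my $foreign = reverse($native);

# Decoding the foreign bytes directly gives a meaningless number...
my $wrong = unpack("d", $foreign);

# ...while reversing them first restores the original value.
my $right = unpack("d", scalar reverse($foreign));

# With an explicit byte-order modifier no reversal is needed.
# "d>" means big-endian; that the file is big-endian is an assumption here.
my $big_endian = pack("d>", 918);
my $explicit   = unpack("d>", $big_endian);

print "direct: $wrong, reversed first: $right, explicit: $explicit\n";
```

The d> and d< modifiers need Perl 5.10 or later, and they only help once we know which byte order the file uses.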

A more succinct way of writing

      my $reverse_left = reverse($orf_left);
      $orf_left = unpack("d", $reverse_left);
      my $reverse_right = reverse($orf_right);
      $orf_right = unpack("d", $reverse_right);

is the following:

      foreach my $coordinate ($orf_left, $orf_right) {
         $coordinate = unpack("d", reverse($coordinate));
      }
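
The loop works because foreach makes $coordinate an alias for each variable in the list, so assigning to $coordinate changes $orf_left and $orf_right themselves. A minimal sketch with made-up test data (two doubles stored in reversed byte order, standing in for the real fields) shows the effect:

```perl
use strict;
use warnings;

# Demo data only: 981 and 1718 as doubles with their bytes reversed.
my $orf_left  = reverse(pack("d", 981));
my $orf_right = reverse(pack("d", 1718));

foreach my $coordinate ($orf_left, $orf_right) {
    # $coordinate aliases the current variable, so this assignment
    # overwrites $orf_left on the first pass and $orf_right on the second.
    $coordinate = unpack("d", reverse($coordinate));
}

print "$orf_left, $orf_right\n";   # prints "981, 1718"
```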

Here's the complete program to display the fields we're interested in:

#!/usr/bin/perl -w
use strict;

dump_data("7120DB.DAT");

sub dump_data {
   my ($orf_file) = @_;
   my $record_length = 230;
   open ORF_DATA, "<$orf_file" or die "Can't open $orf_file: $!\n"; 
   binmode ORF_DATA;  # Tell Perl this isn't a text file
   my $buffer;
   while (read ORF_DATA, $buffer, $record_length) {
      my ($orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr)
         = unpack("x5x4A11x4x4x1a8x1a8x4A1x4x14x4x5x1x8x4A50", $buffer);
      foreach my $coordinate ($orf_left, $orf_right) {
         $coordinate = unpack("d", reverse($coordinate));
      }
      print "$orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr\n";
   }
   close ORF_DATA;
}

If we run it, at the very end we'll see the following puzzling line:

x outside of string at print-db-7.pl line 13.

This is a symptom of a short record: Perl has tried to skip, with the x packing code, past the end of $buffer. That is, having read some number of 230-byte records from 7120DB.DAT, there's something left over; the length of the file is not evenly divisible by 230.
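
We can watch a short read happen without touching the real file. The sketch below is an illustration under stated assumptions: it reads from an in-memory string of two full records plus five stray bytes (invented data, not taken from 7120DB.DAT) and records how many bytes each call to read returns.

```perl
use strict;
use warnings;

my $record_length = 230;

# Invented data: two full 230-byte records followed by five leftover bytes.
my $data = "A" x (2 * $record_length) . "B" x 5;

# Open the string as if it were a file.
open my $fh, '<', \$data or die "Can't open in-memory file: $!\n";
binmode $fh;

my @lengths;
my $buffer;
while (my $got = read($fh, $buffer, $record_length)) {
    push @lengths, $got;   # read returns the number of bytes it actually got
}
close $fh;

print "read sizes: @lengths\n";   # prints "read sizes: 230 230 5"
```

The final read still returns a true value, so the loop body runs once more with a five-byte $buffer, and that is the buffer unpack complains about.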

The length of the file is 1425775. A small Perl program, or some work with a calculator, tells us that 230 times 6199 is 1425770; in other words, there are five extra bytes in the file.
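
The small Perl program might look something like this sketch. The file length is hard-coded from the figure given above so that the arithmetic stands on its own; on a real run, -s "7120DB.DAT" would supply it instead.

```perl
use strict;
use warnings;

my $file_length   = 1425775;   # length of 7120DB.DAT, as stated in the text
my $record_length = 230;

my $records  = int($file_length / $record_length);   # whole records
my $leftover = $file_length % $record_length;        # bytes that don't fit

print "$records full records, $leftover bytes left over\n";
# prints "6199 full records, 5 bytes left over"
```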

Hmmm... five extra bytes. We previously decided there was an extra five-byte field at the beginning of each 230-byte record. Maybe instead of five bytes at the beginning of each record there are five extra bytes at the beginning of the entire file; five extra bytes and no more.

We can add the line

   read ORF_DATA, $buffer, 5

to fix the problem.

SQ8: Where should we add the line?