A New Packing String

SQ2a: What character signals the start of strings?  What character signals the start of numbers?

It looks like an S comes in front of each string, since quite often we see an S followed a little later by some intelligible words (or at least printable letters).  There also seem to be lots of N's followed by eight pound signs, or by eight characters most of which are pound signs.  (By coincidence, a few bytes of an eight-byte number will sometimes turn out to be printable characters when looked at one by one.)
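To see why stray letters can show up inside a number, here is a minimal sketch of how such a printout might be produced. It assumes the pound signs stand for non-printable bytes; the helper name printable_view and the sample record are made up for illustration and are not taken from the real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of a binary string in which every
# byte outside the printable ASCII range is replaced by '#'.
sub printable_view {
    my ($bytes) = @_;
    (my $view = $bytes) =~ s/[^\x20-\x7E]/#/g;
    return $view;
}

# Made-up fragment: an 'S' marker, three placeholder length bytes, a
# string, then an 'N' marker followed by an eight-byte double.
my $sample = "S" . "\x00\x00\x07" . "all0001" . "N" . pack("d", 1234.5);

print printable_view($sample), "\n";
```

Any byte of the packed double that happens to fall in the printable range is shown as itself rather than as a pound sign, which is the coincidence described above.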

SQ2b: Make a hypothesis about where each field in the table starts and ends, and write the start and end points in the printout above.

Here are some field locations:

     <---OrfName---><Contig><OrfLeft><OrfRght><Dir><--OrfAccession--><OrfPct-><-
#X###S###all0001 S###C N#sp#####N########S###cS###sp|Q06852|SLP1S### 50 N#

Eval--><----------------------OrfDescr---------------------->
#######S##2unknown protein S###PM# #S###NPun64

7.032N##3#####N##7 ####N1#####BCS### N########N########N####

It looks like there is an additional five-byte field at the beginning of the record that wasn't listed; that strings start with an S and have three bytes for a length rather than four; and that numbers begin with an N rather than an S.

It also looks like the strings are all padded with blanks, so we may not really need to read in the string length.
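As a small sketch of what ignoring the length looks like in practice, the example below skips the marker and length bytes with x and reads a fixed-width string. The ten-byte field width and the sample data are assumptions chosen for illustration, not the real record layout.

```perl
use strict;
use warnings;

# Made-up fragment: 'S', three placeholder length bytes, then a
# ten-byte name padded on the right with blanks.
my $fragment = "S" . "\x00\x00\x0a" . sprintf("%-10s", "gene42");

# x4 skips the 'S' marker and the three length bytes.
my ($raw)     = unpack("x4 a10", $fragment);   # 'a' keeps the trailing blanks
my ($trimmed) = unpack("x4 A10", $fragment);   # 'A' strips the trailing blanks

print "[$raw]\n[$trimmed]\n";
```

With a, the padding blanks come back as part of the string; with A, unpack removes them, so the stored length is not needed to recover the text.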

SQ3: Here's a modified version of our original loop, with $name_length and $descr_length removed. Change the packing string to agree with the analysis of fields above. Check your answer on the next page.
   while (read ORF_DATA, $buffer, $record_length) {
       my ($orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr)
           = unpack("Va11x4x5x1dx1dx4a1x4x14x4x5x1x8Va50", $buffer);
       print "$orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr\n";
   }