A New Packing String
SQ2a: What character signals the start of strings? What character
signals the start of numbers?
It looks like S actually comes in front of strings, since quite often
we see an S and then, a little later, some intelligible words (or at
least printable letters). N seems to signal the start of numbers: there
are lots of N's followed by eight pound signs, or by eight characters
most of which are pound signs. (Sometimes by coincidence a few bytes of
an eight-byte number will turn out to be printable letters when looked
at one by one.)
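To see why an eight-byte number mostly shows up as pound signs, here is
a small Perl sketch. It assumes the printout replaces every byte outside
printable ASCII with #; the helper show() and the sample value 50 are
ours, not part of the original program.

```perl
use strict;
use warnings;

# Replace every byte outside printable ASCII (0x20-0x7E) with '#',
# mimicking a printout that shows unprintable bytes as pound signs.
sub show {
    (my $s = shift) =~ tr/\x20-\x7e/#/c;
    return $s;
}

# A hypothetical number field: an 'N' marker followed by an 8-byte
# native double (the same 'd' format the original packing string uses).
my $field = 'N' . pack('d', 50);
print show($field), "\n";
```

On a little-endian machine, pack('d', 50) produces the bytes
00 00 00 00 00 00 49 40, so the last two of the eight bytes print as
I and @: exactly the kind of coincidental letters mentioned above.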
SQ2b: Make a hypothesis about where each field in the table starts and
ends, and mark the start and end points on the printout above.
Here are some field locations:
<---OrfName---><Contig><OrfLeft><OrfRght><Dir><--OrfAccession--><OrfPct-><-
#X###S###all0001 S###C N#sp#####N########S###cS###sp|Q06852|SLP1S### 50 N#
Eval--><----------------------OrfDescr---------------------->
#######S##2unknown protein S###PM# #S###NPun64
7.032N##3#####N##7 ####N1#####BCS### N########N########N####
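Writing down start and end points is bookkeeping: each field starts
where the previous one ended. A minimal Perl sketch of that arithmetic
follows; the field names and widths in it are made-up placeholders, not
the answer to SQ3.

```perl
use strict;
use warnings;

# Placeholder layout: [field name, width in bytes]. Names and widths
# are invented for illustration only.
my @fields = ([ prefix => 5 ], [ name => 15 ], [ left => 9 ]);

# Each field starts where the previous one ended; record the inclusive
# start and end offsets, counting from byte 0.
my $offset = 0;
my @ranges;
for my $field (@fields) {
    my ($name, $width) = @$field;
    push @ranges, [ $name, $offset, $offset + $width - 1 ];
    $offset += $width;
}

printf "%-8s bytes %3d-%3d\n", @$_ for @ranges;
print "total so far: $offset bytes\n";
```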
It looks like there is an additional five-byte field at the beginning
of the record that wasn't listed; that strings start with an S and have
three bytes for a length rather than four; and that numbers begin with
an N rather than an S.
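We can check this hypothesis on a synthetic record before touching the
real file. The sketch below packs a record by hand and unpacks it, using
x to skip the markers and the three length bytes. The widths (5 and 11)
and the value 50 are hypothetical, and the number is assumed to be a
native double, as the d in the original packing string suggests.

```perl
use strict;
use warnings;

# Synthetic record, field widths hypothetical:
#   5-byte unknown field | 'S' + 3 length bytes + 11 bytes of text | 'N' + 8-byte double
my $record = "\x00X\x00\x00\x00"       # unknown leading field
           . 'S' . "\x00\x00\x07"      # string marker, 3 length bytes (byte order unknown)
           . pack('A11', 'all0001')    # text, blank-padded to 11 bytes
           . 'N' . pack('d', 50);      # number marker and native double

# Skip the leading field, the 'S' and its length bytes, read the text,
# skip the 'N', then read the double.
my ($name, $number) = unpack('x5 x1 x3 a11 x1 d', $record);
print "[$name] $number\n";
```

Because a11 returns the bytes unchanged, $name still carries its
trailing blanks.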
It also looks like the strings are all padded with blanks to a fixed
width, so we may not need to read in the string length at all.
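If the padding really is blanks, Perl's A template takes care of it: on
unpack, A strips trailing spaces and nulls, while a returns the bytes
unchanged. A small comparison, with a hypothetical width of 11:

```perl
use strict;
use warnings;

# A blank-padded, fixed-width string field (width 11 is hypothetical).
my $field = pack('A11', 'all0001');    # 'all0001' followed by 4 spaces

# 'a' returns the bytes as-is; 'A' strips trailing spaces and nulls.
my ($raw)     = unpack('a11', $field);
my ($trimmed) = unpack('A11', $field);
print "[$raw]\n[$trimmed]\n";
```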
SQ3: Here's a modified version of our original loop, with $name_length
and $descr_length removed. Change the packing string to agree with the
analysis of fields above. Check your answer on the next page.
while (read ORF_DATA, $buffer, $record_length) {
    my ($orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr)
        = unpack("Va11x4x5x1dx1dx4a1x4x14x4x5x1x8Va50", $buffer);
    print "$orf_name, $orf_left, $orf_right, $orf_direction, $orf_descr\n";
}
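One way to try a candidate packing string without the real ORF file is
to run the same kind of read loop over an in-memory filehandle (Perl 5.8
and later can open a filehandle on a scalar). In this sketch the record
layout, $record_length and the two sample records are all hypothetical;
it is a test harness, not the solution to SQ3.

```perl
use strict;
use warnings;

# Hypothetical fixed-length layout: 'S', 3 length bytes, 11 bytes of
# blank-padded text, 'N', an 8-byte native double.
my $record_length = 1 + 3 + 11 + 1 + 8;    # 24 bytes

my @samples = ([ 'all0001', 50 ], [ 'all0002', 64 ]);
my $data = join '', map {
    my ($name, $value) = @$_;
    'S' . "\x00\x00\x07" . pack('A11', $name) . 'N' . pack('d', $value);
} @samples;

# Open a read-only filehandle on the scalar instead of a real file.
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
binmode $fh;

# Same shape as the loop above: read one record at a time, then unpack.
my ($buffer, @rows);
while (read($fh, $buffer, $record_length)) {
    my ($orf_name, $value) = unpack('x1 x3 A11 x1 d', $buffer);
    push @rows, [ $orf_name, $value ];
    print "$orf_name, $value\n";
}
close $fh;
```

Swapping in a different template and sample layout lets you see whether
each field comes out where you expect before running against the data.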