Getting Closer
We need to change the packing string "Va11x4x5x1dx1dx4a1x4x14x4x5x1x8Va50".
First we change the two V's, which we had used to read in $name_length
and $descr_length. Replacing them with x4's gives
"x4a11x4x5x1dx1dx4a1x4x14x4x5x1x8x4a50". Since we're not going to use
the lengths, we might as well use capital A's to trim any blanks from
the end of each string. We also need to put an x5 at the beginning to
compensate for the apparent five-byte field at the beginning of the
record: "x5x4A11x4x5x1dx1dx4A1x4x14x4x5x1x8x4A50".
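To see what those two substitutions do in isolation, here is a minimal sketch on a made-up record. The record layout (a 4-byte length followed by an 8-byte blank-padded name) and the variable names are assumptions for illustration only; they are not fields from the real file.

```perl
use strict;
use warnings;

# Hypothetical toy record: a 4-byte little-endian length (V) followed
# by an 8-byte name field padded with blanks.
my $record = pack('V A8', 8, 'abc');

# Old style: read the length with V and keep the padding with a.
my ($length, $padded) = unpack('V a8', $record);

# New style: skip the length with x4 and trim the padding with A.
my ($trimmed) = unpack('x4 A8', $record);

printf "length=%d padded=[%s] trimmed=[%s]\n", $length, $padded, $trimmed;
# prints: length=8 padded=[abc     ] trimmed=[abc]
```

Both templates walk over the same twelve bytes; the second one simply returns fewer values and drops the trailing blanks.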
A typical line starts:
all0001, 5.39198933346459e+67, 6.51851512433577e+91, S, nknown protein
We've got two problems: the numbers are insane, and $orf_direction and
$orf_descr seem to be off by one. Let's concentrate first on the strings.
Looking again at our layout, it looks like we're reading the S
signalling the start of the OrfAccession string instead of the c
immediately to its left; and it looks like we're missing the first
character of the description -- surely it should be "unknown protein"
instead of "nknown protein". Comparing our layout to the table of
fields, we see that OrfContig is actually 4 characters long instead of
the 5 in the table, so the x5 that skipped over it becomes x4. Our
actual packing string should be "x5x4A11x4x4x1dx1dx4A1x4x14x4x5x1x8x4A50"
<---OrfName---><Contig><OrfLeft><OrfRght><Dir><--OrfAccession--><OrfPct-><-
#X###S###all0001 S###C N#sp#####N########S###cS###sp|Q06852|SLP1S### 50 N#
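The off-by-one symptom is easy to reproduce on a toy record. This is only a sketch: the field widths and contents below are invented to mimic the symptom and are not the real record layout.

```perl
use strict;
use warnings;

# Hypothetical toy record: 4-byte contig, 1-byte direction, 1 marker
# byte, 15-byte description, 1 byte of trailing padding.
my $record = 'CTG1' . 'c' . 'S' . 'unknown protein' . ' ';

# Skipping 5 bytes for the contig (one too many) shifts every later
# field one byte to the right.
my ($bad_dir, $bad_descr) = unpack('x5 A1 x1 A15', $record);

# Skipping the true width of 4 bytes lines everything up again.
my ($good_dir, $good_descr) = unpack('x4 A1 x1 A15', $record);

print "wrong skip: $bad_dir, $bad_descr\n";    # wrong skip: S, nknown protein
print "right skip: $good_dir, $good_descr\n";  # right skip: c, unknown protein
```

A single miscounted skip is enough to shift every field that follows it, which is why both strings were wrong in the same direction.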
That looks better:
all0001, 3.64109780379298e-317, 5.71645809813822e-317, c, unknown protein
all0002, 5.45767837042208e-317, 7.013392276047e-317, c, unknown protein
asl0003, 3.71199029518345e-317, 7.59760711589102e-317, c, unknown protein
arl5500, 2.93514518881368e-317, 3.06504097589309e-317, c, ssrA: 10Sa RNA
Well, except for the numbers. Now we're in a quandary. It doesn't seem
likely that the numbers are in the wrong location, since the strings
after them are printing out perfectly.
SQ4: Name two plausible reasons the numbers might be decoding incorrectly.
Go to the next page to find out which it is...