Getting Closer


We need to change the packing string "Va11x4x5x1dx1dx4a1x4x14x4x5x1x8Va50". First we change the two V's, which we had used to read in $name_length and $descr_length. Each V reads four bytes, so replacing them with x4's skips the same four bytes without returning a value, and gives "x4a11x4x5x1dx1dx4a1x4x14x4x5x1x8x4a50". Since we're not going to use the lengths, we might as well use capital A's, which trim any blanks from the end of each string. We also need to put an x5 at the beginning to compensate for the apparent five-byte field at the beginning of the record: "x5x4A11x4x5x1dx1dx4A1x4x14x4x5x1x8x4A50".
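The effect of swapping V for x4 and a for A can be tried on a small made-up record. This is only a sketch: the two-field record below is an assumption for illustration, not the layout of the real file, and the 20-byte description width is arbitrary.

```perl
use strict;
use warnings;

# Synthetic record, NOT the real file format: a 4-byte length,
# an 11-byte space-padded name, a second 4-byte length, and a
# 20-byte space-padded description (widths chosen for illustration).
my $record = pack("V A11 V A20", 7, "all0001", 15, "unknown protein");

# Original style: read the lengths with V, keep the padding with a.
my ($name_length, $name_a, $descr_length, $descr_a) =
    unpack("V a11 V a20", $record);

# Revised style: skip the lengths with x4, trim the padding with A.
my ($name_A, $descr_A) = unpack("x4 A11 x4 A20", $record);

# Brackets make any trailing blanks visible.
printf "a: [%s] [%s]\n", $name_a, $descr_a;
printf "A: [%s] [%s]\n", $name_A, $descr_A;
```

The first line of output keeps the trailing blanks inside the brackets and the second does not, while both templates consume the same number of bytes.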

A typical line starts:
all0001, 5.39198933346459e+67, 6.51851512433577e+91, S, nknown protein
We've got two problems: the numbers are insane, and $orf_direction and $orf_descr seem to be off by one. Let's concentrate first on the strings. Looking again at our layout, we seem to be reading the S signalling the start of the OrfAccession string instead of the c immediately to its left, and we're missing the first character of the description -- surely it should be "unknown protein" instead of "nknown protein". Both symptoms point to reading one byte too late. Comparing our layout to the table of fields, we see that OrfContig is actually 4 characters long instead of the 5 in the table. So our actual packing string should be "x5x4A11x4x4x1dx1dx4A1x4x14x4x5x1x8x4A50".
     <---OrfName---><Contig><OrfLeft><OrfRght><Dir><--OrfAccession--><OrfPct-><-
#X###S###all0001    S###C   N#sp#####N########S###cS###sp|Q06852|SLP1S### 50 N#
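The one-byte slip is easy to reproduce on a made-up record. In this sketch the record is an assumption for illustration only: a 4-byte contig, a 1-byte direction, a single stand-in marker byte "S", and a padded description. It is much simpler than the real layout.

```perl
use strict;
use warnings;

# Synthetic record, NOT the real file: 4-byte contig, 1-byte direction,
# one stand-in marker byte, and a 20-byte space-padded description.
my $record = pack("A4 A1 A1 A20", "C", "c", "S", "unknown protein");

# Skipping 5 bytes for the contig starts every later field one byte late.
my ($dir_bad, $descr_bad) = unpack("x5 A1 x1 A20", $record);

# Skipping the true width of 4 bytes lines the fields up again.
my ($dir_ok, $descr_ok) = unpack("x4 A1 x1 A20", $record);

print "x5: $dir_bad, $descr_bad\n";
print "x4: $dir_ok, $descr_ok\n";
```

With x5 the direction comes back as the marker byte and the description loses its first letter; with x4 both read correctly. Applying the corrected string to the real file gives the output below.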
That looks better:
all0001, 3.64109780379298e-317, 5.71645809813822e-317, c, unknown protein
all0002, 5.45767837042208e-317, 7.013392276047e-317, c, unknown protein
asl0003, 3.71199029518345e-317, 7.59760711589102e-317, c, unknown protein
arl5500, 2.93514518881368e-317, 3.06504097589309e-317, c, ssrA: 10Sa RNA
Well, except for the numbers. Now we're in a quandary. It doesn't seem likely that the numbers are in the wrong location, since the strings after them are printing out perfectly.

SQ4: Name two plausible reasons the numbers might be decoding incorrectly.

Go to the next page to find out which it is...