Reversal of Fortune
One possibility is that the numbers are really four-byte numbers, and the other four bytes of each field are somehow skipped (perhaps they hold the number 4, the length of the actual data in the field). This theory has the advantage that, if it's correct, Perl will do all the work for us, so it's easy to check. Changing the d packing codes to x4N (skip four bytes, then read a big-endian four-byte unsigned integer) gives
all0001, 0, 0, c, unknown protein
-- not as weird as the previous attempts, but surely an open reading
frame can't start and end at the same nucleotide.
Using x4V (the same, but little-endian) instead gives exactly the same result as x4N.
SQ5: Why?
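One way to explore SQ5 without the original file is to build a stand-in field with pack and unpack it both ways. The sketch below is hypothetical: it assumes, as the x4N output suggests, that the last four bytes of the field are zero, and it borrows 3228790784 (the Nx4 value reported for $orf_left) for the first four bytes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one 8-byte field: four "data" bytes taken from
# the Nx4 value for $orf_left, followed by four zero bytes.
my $field = pack('N', 3228790784) . ("\0" x 4);

# x4 skips the first four bytes; N then reads the remaining four as a
# big-endian 32-bit unsigned integer, V as a little-endian one.
my ($via_n) = unpack('x4N', $field);
my ($via_v) = unpack('x4V', $field);

print "x4N gives $via_n, x4V gives $via_v\n";    # prints "x4N gives 0, x4V gives 0"
```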
We could also try Nx4 and Vx4, which read the first four bytes of each field and skip the last four; they give
all0001, 3228790784, 1082961920, c, unknown protein
and
all0001, 7369664, 11570240, c, unknown protein
respectively.
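The two pairs of results look unrelated, but a short check suggests each pair is the same four bytes read in opposite orders. This sketch only re-packs each Nx4 value with N, decodes those bytes with V, and prints them in hex; the four numbers are the only data taken from the runs above.

```perl
use strict;
use warnings;

# [Nx4 result, Vx4 result] for $orf_left and $orf_right, from the runs above.
my @pairs = ([3228790784, 7369664], [1082961920, 11570240]);

for my $pair (@pairs) {
    my ($from_n, $from_v) = @$pair;
    # Recreate the four bytes that N decoded (big-endian)...
    my $bytes = pack('N', $from_n);
    # ...and decode those very same bytes little-endian, as V would.
    my $as_v = unpack('V', $bytes);
    printf "bytes %s: N=%u, V=%u (Vx4 reported %u)\n",
        unpack('H*', $bytes), $from_n, $as_v, $from_v;
}
```

For both fields the recomputed V value matches the reported Vx4 output, so the two runs were reading identical bytes, just in opposite directions.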
Let's abandon that idea and go back to thinking of $orf_left and $orf_right as eight-byte numbers. At this point we might hit on the idea of reversing the byte order of those numbers before decoding them, by one of two routes:
- Inspiration: Since x4V and x4N give zero, we know that the right-hand four bytes of each eight-byte number are zero, while the large numbers from Nx4 and Vx4 tell us that the left-hand four bytes do contain something. Perhaps these are big-endian numbers, as opposed to the little-endian numbers of our personal computers.
- Desperation: The reasoning above isn't really correct (SQ6: Why?), but otherwise we're at the end of our rope. We know that numbers can be written in two different byte orders, and the one we've tried doesn't work. So why not try the other one?
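Before reversing anything, it may help to confirm which byte order the local perl uses natively. A minimal check, assuming a standard perl build where the core Config module is available:

```perl
use strict;
use warnings;
use Config;

# $Config{byteorder} is '1234' or '12345678' on little-endian builds
# and '4321' or '87654321' on big-endian ones.
my $byteorder      = $Config{byteorder};
my $host_is_little = ($byteorder =~ /^1234/) ? 1 : 0;

# Cross-check without Config: pack a 16-bit 1 in native order ('S')
# and see whether the low-order byte comes first.
my $first_byte = unpack('C', pack('S', 1));

printf "byteorder=%s, little-endian host: %s, first byte of native 1: %d\n",
    $byteorder, ($host_is_little ? 'yes' : 'no'), $first_byte;
```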
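The fly in the ointment is that Perl doesn't have a "reverse d" (at least before version 5.10, which added the > and < byte-order modifiers to pack and unpack). However, we can do something equivalent with a little hand-written Perl code.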
SQ7: How?
Answer here.
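A minimal sketch of the hand-written approach, assuming the fields are big-endian IEEE doubles (the hypothesis being tested) and that the host stores doubles in the same byte order as its integers, as common platforms do. The helper name decode_big_endian_double is hypothetical; it reverses the eight bytes only on a little-endian host, so that Perl's native d code can decode them. The example field is rebuilt from the Nx4 value reported for $orf_right plus four zero bytes, not read from the real file.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: decode an 8-byte big-endian IEEE double using only
# Perl's native-order 'd' code.
sub decode_big_endian_double {
    my ($bytes) = @_;
    die "expected 8 bytes, got ", length($bytes), "\n" unless length($bytes) == 8;
    # On a little-endian host, reverse the byte string so that 'd' sees the
    # bytes in the order it expects; a big-endian host needs no change.
    my $native = ($Config{byteorder} =~ /^1234/) ? scalar reverse($bytes) : $bytes;
    return unpack('d', $native);
}

# Stand-in field: the four bytes behind $orf_right's Nx4 value, then four zeros.
my $field = pack('N', 1082961920) . ("\0" x 4);
printf "%g\n", decode_big_endian_double($field);
```

Reversing the whole eight-byte string is what makes this work: a little-endian double is exactly the big-endian byte sequence read back to front.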