In bioinformatics, the road to perdition generally goes right through a black box. In any field,
if you use a formula you don't understand or rely on a mysterious tool that "just works"
(except when it doesn't), you'll sometimes get burned. In bioinformatics, where big data sets
magnify the effects of small misunderstandings, you'll frequently find yourself in a conflagration.
Don't do it! Repent before it's too late!
Statistics are designed to help us keep our feet on firm ground, but they themselves are
black boxes to most, invitations to use formulas you don't understand. Some of us (not many, and
certainly not me!) gain insight into statistical formulas by deriving them from first
principles. If you're in the other camp, connecting them to concrete simulations may help you see what they
do and when to use them (and when not to).
This is why you've worked through the Mendel tour, building a simulation that cranks out a
chi-squared result, and why you've built an analogous simulation for a t-test.
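To make the simulation-versus-formula connection concrete, here is a minimal sketch of the Mendel idea. It assumes a single-trait cross with a 3:1 expected ratio and uses hypothetical counts (290 dominant, 110 recessive), not Mendel's data. It simulates many crosses under the null model, counts how often a simulated cross is at least as extreme as the observed one, and compares that fraction with the p-value from the chi-squared formula. The function names are illustrative, not part of any course code.

```python
import math
import random

def chi_squared_stat(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_null_stats(n, p_dominant, trials, seed=0):
    """Simulate `trials` crosses of `n` offspring each, assuming the null model
    (each offspring shows the dominant phenotype with probability p_dominant),
    and return the chi-squared statistic of every simulated cross."""
    rng = random.Random(seed)
    expected = [n * p_dominant, n * (1 - p_dominant)]
    stats = []
    for _ in range(trials):
        dominant = sum(1 for _ in range(n) if rng.random() < p_dominant)
        stats.append(chi_squared_stat([dominant, n - dominant], expected))
    return stats

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom.
    Uses the fact that chi-squared(1) is the square of a standard normal variable."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical counts for a single-trait cross: 290 dominant, 110 recessive.
observed = [290, 110]
n = sum(observed)
expected = [n * 0.75, n * 0.25]  # 3:1 Mendelian expectation
obs_stat = chi_squared_stat(observed, expected)

# Simulated p-value: fraction of null crosses at least as extreme as the observed one.
null_stats = simulate_null_stats(n, 0.75, trials=5000)
simulated_p = sum(s >= obs_stat for s in null_stats) / len(null_stats)

# Formula p-value: 2 categories give 2 - 1 = 1 degree of freedom.
formula_p = chi2_sf_df1(obs_stat)

print(f"chi-squared = {obs_stat:.3f}")
print(f"simulated p = {simulated_p:.3f}, formula p = {formula_p:.3f}")
```

The two p-values should land close together but not match exactly: simulated counts are whole numbers, while the formula is a large-sample approximation to that discrete distribution.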
- How far were you able to get in simulating Mendel's experiment?
- Have you compared the results of your simulation with the p-value from a chi-squared test computed by formula?
- How far were you able to get in simulating a t-test (either comparing the frequencies of
"ATG" in overlapping vs. nonoverlapping genes or analyzing coral reef phages in problem set 6)?
- Do you see how the two simulations are inherently different? (They differ because the
circumstances in which chi-squared tests are appropriate are inherently different from those in
which t-tests are appropriate.)
- Any other questions on your mind about the tour or statistics?
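One way to see the difference the questions above point at: in a chi-squared simulation like Mendel's, the null hypothesis fully specifies how the data are generated (a 3:1 ratio), so the simulation can manufacture new crosses from scratch. A two-group comparison such as "ATG" frequencies in overlapping vs. nonoverlapping genes has no such generative model; the null says only that both groups come from the same distribution. A permutation test, which is a resampling stand-in for the t-test rather than the t-test formula itself, simulates that null by shuffling group labels. The sketch below uses hypothetical per-gene frequencies and illustrative function names; your own t-test simulation may well have taken a different approach.

```python
import random

def mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

def permutation_p(a, b, trials=10000, seed=0):
    """Two-sided permutation p-value for a difference between two groups.

    Under the null hypothesis the group labels carry no information, so the
    observed values are pooled, shuffled, and re-split into groups of the
    original sizes; the p-value is the fraction of shuffles whose |t| is at
    least as large as the observed |t|."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return hits / trials

# Hypothetical per-gene "ATG" frequencies (occurrences per kb) for two gene groups.
overlapping = [12.1, 14.3, 13.8, 15.0, 14.6, 13.2]
nonoverlapping = [11.0, 12.4, 11.8, 12.9, 11.5, 12.2]

t_obs = welch_t(overlapping, nonoverlapping)
p = permutation_p(overlapping, nonoverlapping)
print(f"Welch t = {t_obs:.3f}, permutation p = {p:.4f}")
```

Every shuffle reuses the observed values; only their assignment to groups changes. That is the key contrast with the chi-squared simulation, which draws fresh counts from the hypothesized ratio.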