Biol 591 | Scenarios | Fall 2003
Scientific story (html)
In brief: You have in hand RNA from bone marrow samples from two classes of patients: those with acute lymphoblastic leukemia and those with acute myeloid leukemia. Superficially, the two classes of leukemia are very similar, but effective treatment of them differs markedly. How can you use the RNA to identify genes that are expressed differentially between the two classes of leukemia? How can you use this knowledge to build a tool that identifies which class a patient has, thereby pointing the way to effective treatment?
Bioinformatic tools: Statistical analysis of microarray data
How to find, from data exhibiting some degree of random fluctuation, genes whose expression levels can be used to distinguish between two classes of people.
Molecular biology concepts: Microarrays
Perl focus: Planning and writing a Perl program
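As a rough illustration of the statistical question above, the sketch below ranks genes by the signal-to-noise score described in Golub et al: the difference between the two class means divided by the sum of the two class standard deviations. It is a minimal sketch, not the course's program. The gene names and expression levels are made up, the subroutine names are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of a list of numbers.
sub mean {
    my @x = @_;
    my $n = scalar @x;
    return sum(@x) / $n;
}

# Sample standard deviation (n - 1 denominator; an assumption).
sub std_dev {
    my @x  = @_;
    my $n  = scalar @x;
    my $m  = mean(@x);
    my $ss = sum(map { ($_ - $m) ** 2 } @x);
    return sqrt($ss / ($n - 1));
}

# Signal-to-noise score: (mean1 - mean2) / (sd1 + sd2).
# Positive scores mean higher expression in class 1.
sub signal_to_noise {
    my ($class1, $class2) = @_;    # array refs of expression levels
    return (mean(@$class1) - mean(@$class2))
         / (std_dev(@$class1) + std_dev(@$class2));
}

# Made-up expression levels for three hypothetical genes.
my %genes = (
    gene_A => { ALL => [100, 110, 90],  AML => [200, 210, 190] },
    gene_B => { ALL => [200, 50, 350],  AML => [210, 40, 360]  },
    gene_C => { ALL => [500, 520, 480], AML => [100, 120, 80]  },
);

# Score every gene, treating ALL as class 1 and AML as class 2.
my %score;
for my $g (keys %genes) {
    $score{$g} = signal_to_noise($genes{$g}{ALL}, $genes{$g}{AML});
}

# Rank genes by absolute score, most informative first.
for my $g (sort { abs($score{$b}) <=> abs($score{$a}) } keys %score) {
    printf "%s\t%.2f\n", $g, $score{$g};
}
```

In this toy data, gene_B differs little between the classes relative to its scatter, so it falls to the bottom of the ranking, while gene_A and gene_C separate the classes cleanly.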
Paper
Golub et al (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531-537.
Data (Training set): Expression pattern of 6817 human genes in leukemia patients with identified class of leukemia (txt)
Samples 1-27 are from patients with acute lymphoblastic leukemia (ALL)
Samples 28-38 are from patients with acute myeloid leukemia (AML)
Data (Independent set): Expression pattern of 6817 human genes in leukemia patients with identified class of leukemia (txt)
Samples 39-72 are from patients with acute leukemia of an unknown class
See also web site for paper: http://www-genome.wi.mit.edu/MPR
Presentations and notes
Microarray technology: Presentation (ppt)
Distinguishing clinical subgroups using microarrays: Presentation (ppt) Notes (pdf)
A program to predict the ALL/AML class distinction: Notes (pdf)
Programs
Spreadsheet: Average difference (xls)
Spreadsheet: ALL vs AML example (xls)
Spreadsheet: Correlation (xls)
Class_predictor.pl - Shell of a program to calculate and sort correlation values a la Golub et al and to display the genes that best predict the ALL/AML class distinction.
Permute_training_set.pl - Program to calculate predicted curves for randomized data (as seen in Fig. 2 of Golub et al). Used in Problem Set 6.
Vote.pl - Shell of a program to predict ALL/AML status of patients in the independent set (see above). Used in Problem Set 6. Golub et al's own opinions on the matter can be found at the web site for the paper (see above).
data_set_ALL_AML_best.txt: Predictor set used by Vote.pl
Problem Set: (pdf) (uses programs with links above)
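For orientation, class prediction in Golub et al works by weighted voting: each predictor gene votes for ALL or AML according to how far the sample's expression level lies from the midpoint of the two class means, weighted by the gene's correlation score. The sketch below shows that idea only. It is not the contents of Vote.pl, and the predictor values, gene names, and the `vote` subroutine are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical predictor set. For each gene: the weight a (its
# signal-to-noise score; positive favours ALL) and the decision
# boundary b (midpoint of the two class means). Values are made up.
my %predictor = (
    gene_A => { a => -5.0, b => 150 },
    gene_C => { a => 10.0, b => 300 },
    gene_D => { a =>  2.0, b => 400 },
);

# Cast weighted votes for one sample. Returns the winning class and
# the prediction strength (V_win - V_lose) / (V_win + V_lose).
sub vote {
    my ($sample) = @_;    # hash ref: gene name => expression level
    my %total = (ALL => 0, AML => 0);
    for my $g (keys %predictor) {
        next unless exists $sample->{$g};
        # Vote of gene g: weight times distance from the boundary.
        my $v = $predictor{$g}{a} * ($sample->{$g} - $predictor{$g}{b});
        if ($v > 0) { $total{ALL} += $v }
        else        { $total{AML} -= $v }
    }
    my ($win, $lose) = $total{ALL} >= $total{AML} ? ('ALL', 'AML')
                                                  : ('AML', 'ALL');
    my $sum = $total{$win} + $total{$lose};
    my $ps  = $sum > 0 ? ($total{$win} - $total{$lose}) / $sum : 0;
    return ($win, $ps);
}

# Made-up expression levels for one hypothetical patient.
my %patient = (gene_A => 100, gene_C => 450, gene_D => 380);
my ($class, $ps) = vote(\%patient);
printf "Predicted class: %s (prediction strength %.2f)\n", $class, $ps;
```

A prediction strength near 1 means the votes were nearly unanimous; Golub et al left samples with low prediction strength uncalled rather than forcing a class assignment.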