High Throughput Data Analysis
BIOS 691-803; Winter/Spring 2010
Instructor: Dr. Mark Reimers, VCU Biostatistics
This course is designed to build competence in quantitative methods for the analysis of high-throughput molecular biology data. The emphasis will be on how to think about issues that arise in many kinds of high-throughput data, such as systematic and correlated technical errors, data normalization, correlated variability due to similar biological function, approximate significance estimation, and biologically meaningful testing. The technologies emphasized will be microarrays (using data from all major manufacturers and of all major types: CGH, ChIP, methylation, and SNP chips) and quantitative high-throughput sequencing. The course meetings will consist of short lectures, demonstrations, and discussions.
Displaying residuals for QA diagnostics
Comparing current normalization methods for expression data
Advantages and drawbacks of quantile normalization
Covariation of residuals with technical covariates
Systematic errors and correlated differences
Approaches to normalization by estimating technical distortion
Normalization by singular value decomposition
Linear models for oligonucleotide probe sets
CGH arrays -- normalization and segmentation
Normalization of immuno-precipitation (IP) arrays
ChIP and MeDIP arrays -- estimation by moving average
Biological issues in moving average techniques
Issues in mapping reads
Variability in representation of sequences
Normalization of raw count data
ChIP-Seq analysis
RNA-Seq analysis
Metagenomic data analysis
Simes’ Lemma and FDR theory
Permutation procedures for FDR
Stein’s Theorem and shrinkage procedures for reducing overall error
SAM and Empirical Bayes procedures
Power calculations
Multivariate analysis of pathways and GO functional groups
Tests for systematic (but modest) changes in groups of genes
Comparison of pathway configurations between control and disease/treatment groups