High Throughput Data Analysis Course

BIOS 691-803; Winter/Spring 2010
Instructor: Dr. Mark Reimers, VCU Biostatistics

This course is designed to build competence in quantitative methods for the analysis of high-throughput molecular biology data. The emphasis will be on how to think about issues that arise in many kinds of high-throughput data, such as systematic and correlated technical errors, data normalization, correlated variability due to similar biological function, approximate significance estimation, and biologically meaningful testing. The technologies emphasized will be microarrays (using data from all major manufacturers and of all major types: CGH, ChIP, methylation, and SNP chips) and quantitative high-throughput sequencing. Course meetings will consist of short lectures, demonstrations, and discussions.

Quality Assessment and Normalization of Arrays (6 classes)

Displaying residuals for QA diagnostics
Comparing current normalization methods for expression data
Advantages and drawbacks of quantile normalization
Covariation of residuals with technical covariates
Systematic errors and correlated differences
Approaches to normalization by estimating technical distortion
Normalization by singular value decomposition
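
As a rough illustration of one topic in this unit, quantile normalization can be sketched in a few lines. This is a minimal sketch, not course material: the function name `quantile_normalize` and the probes-by-arrays layout are assumptions made for the example, and ties are broken by sort order rather than averaged.

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (array) of x the same empirical distribution.

    x is assumed to be a probes-by-arrays matrix of intensities.
    Ties are broken by sort order, not averaged (a simplification).
    """
    x = np.asarray(x, dtype=float)
    # Row indices that sort each column independently.
    order = np.argsort(x, axis=0, kind="stable")
    sorted_x = np.take_along_axis(x, order, axis=0)
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = sorted_x.mean(axis=1)
    # Put the reference values back at each probe's original position.
    out = np.empty_like(x)
    np.put_along_axis(out, order, reference[:, None], axis=0)
    return out
```

After this step every array shares the same set of values, which illustrates both the appeal of the method and its main drawback: real between-array differences in distribution are removed along with technical ones.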

Summarization of various Array-Based Assays: CGH, ChIP and Methylation (5 classes)

Linear models for oligonucleotide probe sets
CGH arrays -- normalization and segmentation
Normalization of immuno-precipitation (IP) arrays
ChIP and MeDIP arrays -- estimation by moving average
Biological issues in moving average techniques
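
The moving-average idea for tiling-array data can be sketched as below. This is only an illustrative sketch under stated assumptions: the function name `moving_average`, the use of a fixed genomic half-width, and the requirement that probe positions are sorted are all choices made for this example.

```python
import numpy as np

def moving_average(positions, log_ratios, half_width):
    """Average log-ratios of all probes within half_width bases of each probe.

    positions must be sorted in increasing order (assumed, not checked).
    """
    positions = np.asarray(positions, dtype=float)
    log_ratios = np.asarray(log_ratios, dtype=float)
    # Index range [left, right) of probes falling inside each window.
    left = np.searchsorted(positions, positions - half_width, side="left")
    right = np.searchsorted(positions, positions + half_width, side="right")
    # Cumulative sums turn each window sum into a difference of two entries.
    csum = np.concatenate(([0.0], np.cumsum(log_ratios)))
    return (csum[right] - csum[left]) / (right - left)
```

Because windows are defined in bases rather than probe counts, sparsely tiled regions are averaged over fewer probes and are therefore noisier, which is one of the practical issues a window-based estimate raises.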

Issues for high-throughput sequencing data (6 classes)

Issues in mapping reads
Variability in representation of sequences
Normalization of raw count data
ChIP-Seq analysis
RNA-Seq analysis
Metagenomic data analysis

Tests of significance and multiple comparisons

Simes’ Lemma and FDR theory
Permutation procedures for FDR
Stein’s Theorem and shrinkage procedures for reducing overall error
SAM and Empirical Bayes procedures
Power Calculations
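
For orientation, the Benjamini-Hochberg step-up procedure, a standard FDR method whose original proof rests on Simes' inequality, can be sketched as follows. The function name `bh_adjust` is an assumption for this example; the syllabus does not say which FDR procedures are covered in detail.

```python
import numpy as np

def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted
```

Genes whose adjusted value falls below a chosen level q are declared significant, which controls the expected false discovery rate at q when the tests are independent.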

Multivariate analysis of pathways and GO functional groups

Tests for systematic (but modest) changes in groups of genes
Comparison of pathway configurations between control and disease/treatment groups