Virginia Institute for Psychiatric and Behavioral Genetics (cross-appointed to the Dept. of Biostatistics and the Center for the Study of Biological Complexity), at Virginia Commonwealth University
Today’s sophisticated biotechnologies and electronics enable researchers to gather data in quantities unimagined ten years ago. These data acquisition technologies are changing the nature of research in biology and are poised to revolutionize medical diagnosis and treatment. At the same time the infrastructure of knowledge is changing: a great deal of relevant information is stored in online databases, which may aid interpretation of experimental and clinical data.
Statisticians and mathematicians must accept the challenge of analyzing and integrating these new data sets. The first challenge is to extract a clear signal from the technologies; many confounding factors, such as technical or physiological artifacts, distort the signals. Then we may test hypotheses about biological organization or mechanisms against the data. Usually we are testing hypotheses of a common form for many specific items, such as genes or brain regions; these may be simple hypotheses (e.g. which genes change in expression) or more complex ones (e.g. which measures are correlated). Finally, we must take advantage of previous efforts, usually in the form of databases, to constrain and aid our analysis.
Recently, new technologies such as fMRI, calcium imaging, and voltage-sensitive dyes have enabled the collection of broad swathes of neural activity over time. This is the domain of multivariate analysis, but only recently have a few statisticians begun to develop multivariate methods specific to such data.
We who analyze such data are like the prisoners in Plato’s Cave: with our measures we perceive only a shadow of the reality, and we must infer the reality from the data using our imagination and logic. In my opinion, the best analytic approaches combine statistical subtlety with knowledge of the processes under study.
Many psychiatric diagnoses are highly heritable, but common genetic variants can have only very modest effects on such disorders. The problem then becomes how to identify these variants. Using a novel empirical Bayes approach, we are combining information from genetic association studies with the ENCODE, BrainSpan, and NIH Roadmap Epigenomics data, as well as genomic conservation, to identify variants that are more likely to have effects.
We are developing methods for analysis of genome-wide data from the brain, especially the BrainSpan data. We are addressing questions about development and homeostasis through integrative analysis of many different data sets. We are addressing issues in neurogenomics such as how to handle the varied proportions of different cell types in brain tissue, and how to normalize epigenetic data.
We are developing methods to model and analyze high-dimensional brain activity recorded by voltage-sensitive dyes, calcium imaging, and high-density electrode arrays. We have developed a model for cortical activity that integrates intrinsic dynamics and cross-regional connectivity. We have developed new methods for characterizing plasticity in cortex. We are developing methods for characterizing the common factors behind the activity of many individual neurons. We have found through analysis of many human neurogenomics data sets that there is extraordinary individual variation in the expression of the crucial GABA receptors. We are modeling the effect this would have on gamma rhythms.
This course discusses in depth bioinformatic methods for sequence analysis and integrates them with the new high-throughput epigenetic and chromatin data.
This course introduces major techniques for the analysis of multi-unit recordings, EEG and fMRI data.
This course for molecular biologists has been taught every year since 2005; in 2012 a version was taught in Moscow.
This course covers analysis of high-throughput sequencing data and advanced array and pathway analysis.
This course introduces analysis of high-throughput genomic assays using microarray data.