Iterative Denoising

Overview

Data mining in the contexts of text and network traffic analysis is beset by multiple difficulties when the datasets are large and in high dimension. From a performance perspective, it can be prohibitively expensive to search in a high dimensional space. Moreover, visualizing and comprehending such spaces can be difficult for the user. Also, complex datasets often have local relationships of interest, findings that might be missed with global searches. Finally, an unsupervised context limits the ability of a user to analyze the corpus without first applying some structure to the data. Traditional data mining approaches are limited due to algorithms that make data distributional assumptions, are not scalable due to the Curse of Dimensionality, and do not provide intuitive ways for a user to visualize high-dimensional data. Our approach to overcome these difficulties is called Iterative Denoising, a methodology that allows the user to explore the data and extract meaningful, implicit, and previously-unknown information from a large unstructured corpus.

Publications

"Iterative Denoising of Computer Network Application Traffic," Kendall Giles, David Marchette, Carey Priebe, Don Waagen, Technical Report No. 655, Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, November 2007.
"Iterative Denoising," Kendall Giles, Michael Trosset, David Marchette, Carey Priebe, Computational Statistics, Accepted for Publication, September 2007.
"Knowledge Discovery in Computer Network Data: A Security Perspective," Kendall Giles, Ph.D. Dissertation, Johns Hopkins University, Baltimore, Maryland, October 2006.
"Fast Iterative Denoising," Kendall Giles, Michael Trosset, David Marchette, Carey Priebe, Technical Report No. 653, Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, December, 2005.
"Integrated Sensing and Processing Decision Trees," Carey Priebe, David Marchette, and Dennis Healy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):699–708, 2004.
"Iterative Denoising for Cross-Corpus Discovery," Carey Priebe, David Marchette, Youngser Park, Ed Wegman, Jeff Solka, A. Socolinsky, Damianos Karakos, K. Church, R. Guglielmi, R. Coifman, D. Lin, Dennis Healy, M. Jacobs, and Anna Tsao, In J. Antoch, editor, COMPSTAT: Proceedings in Computational Statistics, 16th Symposium, pages 381–392, Physica-Verlag, Springer, 2004.

Presentations

"Knowledge Discovery with Iterative Denoising", Computer Science Department Seminar, Virginia Commonwealth University, Richmond, Virginia, October 8, 2008.
"Interactive Text Analysis with Iterative Denoising," Interface 2008, Text Data Analysis Session, Durham, North Carolina, May 23, 2008.
"Text Analysis with Iterative Denoising," International Biometric Society, Eastern North American Region, Arlington, Virginia, 19 March 2008.
"Iterative Denoising: A Flexible (and applicable to bioinformatics?) Knowledge Discovery Methodology," Bioinformatics Colloquium, George Mason University, 12 February 2008.
"Basics of Knowledge Discovery Engines," Mathematics of Knowledge and Search Engines research program, University of California, Los Angeles, Institute for Pure and Applied Mathematics, Los Angeles, California, September 2007.
"Towards the Use of Iterative Denoising for Biological Data Exploration," 2007 Summer Research Conference, Southern Regional Council on Statistics, Richmond, Virginia, June 2007.

Collaborators

Kendall Giles: Virginia Commonwealth University
David Marchette: Naval Surface Warfare Center
Youngser Park: Johns Hopkins
Carey Priebe: Johns Hopkins
Jeff Solka: Naval Surface Warfare Center
Michael Trosset: Indiana University
Don Waagen: Raytheon
Ed Wegman: George Mason

Disclaimer

This page does not reflect an official position of Virginia Commonwealth University.