Iterative Denoising


Overview

Data mining for text and network traffic analysis faces several difficulties when the datasets are large and high-dimensional. From a performance perspective, searching a high-dimensional space can be prohibitively expensive. Such spaces are also hard for a user to visualize and comprehend. Moreover, complex datasets often contain local relationships of interest, findings that a global search may miss. Finally, in an unsupervised setting the user cannot analyze the corpus without first imposing some structure on the data. Traditional data mining approaches are limited: their algorithms make distributional assumptions about the data, fail to scale because of the curse of dimensionality, and provide no intuitive way for a user to visualize high-dimensional data. Our approach to overcoming these difficulties is called Iterative Denoising, a methodology that allows the user to explore the data and extract meaningful, implicit, and previously unknown information from a large unstructured corpus.
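To make the idea concrete, one common reading of an iterative-denoising loop is: project the data into a low-dimensional space (the "denoising" step), partition it there, and then recurse on each partition so that later projections adapt to local structure. The sketch below is a hypothetical, minimal illustration of that project-cluster-recurse pattern, not the project's actual implementation; the function names (`pca_project`, `two_means`, `iterative_denoise`) and all parameter choices are assumptions made for this example.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def two_means(Y, iters=20, seed=0):
    """Plain Lloyd's algorithm with two centers; returns a 0/1 label per row."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=2, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in (0, 1):
            if (labels == c).any():
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

def iterative_denoise(X, idx=None, depth=0, max_depth=3, min_size=8):
    """Recursively project and split; returns the leaf index sets (clusters)."""
    if idx is None:
        idx = np.arange(len(X))
    # Stop when the recursion is deep enough or the node is too small to split.
    if depth >= max_depth or len(idx) < 2 * min_size:
        return [idx]
    Y = pca_project(X[idx], k=2)   # local low-dimensional ("denoised") view
    labels = two_means(Y)
    leaves = []
    for c in (0, 1):
        part = idx[labels == c]
        if len(part) < min_size:   # degenerate split: keep this node as a leaf
            return [idx]
        leaves.extend(iterative_denoise(X, part, depth + 1, max_depth, min_size))
    return leaves
```

Because each recursive call re-projects only its own subset, directions that are noise globally can become informative locally, which is the motivation for iterating rather than projecting once.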


Disclaimer

This page does not reflect an official position of Virginia Commonwealth University.