About
Word Sense Disambiguation (WSD) is the task of automatically identifying the intended sense (or concept) of an ambiguous word based on the context in which the word is used. In our work, the set of possible meanings for a word is defined by the Concept Unique Identifiers (CUIs) associated with a particular term in the Unified Medical Language System (UMLS). Thus, when performing WSD of biomedical terms, our more specific goal is to assign a term one of its possible CUIs based on its surrounding context. For example, the term cold could refer to the temperature (C0009264) or the common cold (C0009443), depending on the context in which it occurs.
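To make the task concrete, the sketch below frames WSD as choosing among a term's candidate CUIs by comparing each sense with the surrounding context. It scores senses with a simplified Lesk-style word overlap, which is only one illustrative choice of scoring function. The sense inventory `SENSES`, its short glosses, and the function `disambiguate` are hypothetical; the glosses are not UMLS definitions and the code is not taken from our software.

```python
# Hypothetical sense inventory: term -> {CUI: short gloss}. The CUIs are the two
# senses of "cold" from the example above; the glosses are illustrative only.
SENSES = {
    "cold": {
        "C0009264": "low temperature weather chill freezing ice",
        "C0009443": "common viral infection nose throat cough sneezing",
    }
}

def disambiguate(term, context):
    """Return the candidate CUI whose gloss shares the most words with the context
    (simplified Lesk overlap; ties go to the first candidate listed)."""
    context_words = set(context.lower().split())
    best_cui, best_overlap = None, -1
    for cui, gloss in SENSES[term].items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_cui, best_overlap = cui, overlap
    return best_cui

print(disambiguate("cold", "Patient presents with cough and sneezing after a cold"))  # C0009443
print(disambiguate("cold", "Skin exposed to freezing weather and ice"))  # C0009264
```

Word overlap is brittle (it misses synonyms and depends on tokenization), which is one motivation for the supervised, unsupervised, and relatedness-based approaches described below.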
Automatically identifying the intended concept of an ambiguous word improves
the performance of clinical and biomedical applications such as medical
coding and indexing for quality assessment, cohort discovery, and other
secondary uses of data. These applications are becoming increasingly important
given the growing amount of information available to researchers, the
transition of health care documentation toward electronic health records,
and the push for quality and efficiency in health care.
In this work, we are exploring three types of methods: supervised,
unsupervised, and knowledge-based. Supervised methods use machine learning
algorithms (e.g., SVMs, Naive Bayes) to learn from manually tagged training
data; unsupervised methods rely on the distributional characteristics of
the terms in large unannotated corpora; and knowledge-based methods
use information from an external knowledge source, such as the UMLS.
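The supervised setting can be illustrated with a minimal multinomial Naive Bayes classifier trained on a few manually tagged contexts for the term cold. This is a from-scratch sketch under simplifying assumptions (whitespace tokenization, add-one smoothing, words unseen in training ignored). The training sentences and the function names `train_naive_bayes` and `classify` are hypothetical, and this is not the implementation used in our experiments.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count senses and per-sense word frequencies from (context, cui) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, cui in examples:
        words = context.lower().split()
        sense_counts[cui] += 1
        word_counts[cui].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def classify(model, context):
    """Return the CUI maximizing log P(cui) + sum of log P(word | cui), add-one smoothed."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_cui, best_score = None, -math.inf
    for cui, n in sense_counts.items():
        denom = sum(word_counts[cui].values()) + len(vocab)
        score = math.log(n / total)
        for w in context.lower().split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[cui][w] + 1) / denom)
        if score > best_score:
            best_cui, best_score = cui, score
    return best_cui

# Hypothetical tagged training contexts for the term "cold".
training = [
    ("fever cough and runny nose", "C0009443"),
    ("viral infection with sore throat", "C0009443"),
    ("exposure to freezing temperature", "C0009264"),
    ("ice and snow in cold weather", "C0009264"),
]
model = train_naive_bayes(training)
print(classify(model, "patient has a cough and sore throat"))  # C0009443
```

The key limitation shown here is the dependence on tagged data: a separate training set is needed for every ambiguous term, which is what unsupervised and knowledge-based methods try to avoid.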
Software
CuiTools -- A freely available suite of Perl programs for supervised and unsupervised WSD experiments.
UMLS-SenseRelate -- A freely available suite of Perl programs for exploring the use of semantic similarity and relatedness between UMLS concepts to disambiguate terms in biomedical text.
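The relatedness-based idea can be sketched as follows: score each candidate CUI of the target term by its summed relatedness to the concepts found in the surrounding context, and choose the highest-scoring one. The relatedness table, the placeholder context identifiers `CUI_FEVER` and `CUI_COUGH`, and the function names are hypothetical. A real system would compute the scores from the UMLS with a similarity or relatedness measure rather than look them up in a hand-written dictionary, and this is not UMLS-SenseRelate's actual code or interface.

```python
# Hypothetical relatedness scores in [0, 1]. CUI_FEVER and CUI_COUGH are
# placeholder identifiers for context concepts, not real UMLS CUIs.
RELATEDNESS = {
    frozenset({"C0009443", "CUI_FEVER"}): 0.8,
    frozenset({"C0009443", "CUI_COUGH"}): 0.9,
    frozenset({"C0009264", "CUI_FEVER"}): 0.3,
    frozenset({"C0009264", "CUI_COUGH"}): 0.1,
}

def relatedness(c1, c2):
    """Symmetric lookup; pairs not in the table score 0.0."""
    return RELATEDNESS.get(frozenset({c1, c2}), 0.0)

def disambiguate_by_relatedness(candidates, context_cuis):
    """Pick the candidate CUI with the highest summed relatedness to the context."""
    return max(candidates, key=lambda c: sum(relatedness(c, x) for x in context_cuis))

# Target "cold" in a context mentioning fever and cough.
print(disambiguate_by_relatedness(["C0009264", "C0009443"], ["CUI_FEVER", "CUI_COUGH"]))  # C0009443
```

Summing over all context concepts is the simplest aggregation; restricting the context to a window around the target term is a common variation.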
Data
NLM-WSD dataset
MSH-WSD dataset
Abbrev dataset
Conflate dataset
Publications
Challenges and Practical Approaches with Word Sense Disambiguation of
Acronyms and Abbreviations in the Clinical Domain.
Sungrim Moon, Bridget T. McInnes, and Genevieve B. Melton.
Healthcare Informatics Research. 2015; 21(1):35-42.
Determining the Difficulty of Word Sense Disambiguation.
Bridget T. McInnes and Mark Stevenson.
Journal of Biomedical Informatics. 2014 Feb; 47:83-90.
Evaluating Measures of Semantic Similarity and Relatedness to Disambiguate
Terms in Biomedical Text.
Bridget T. McInnes and Ted Pedersen.
Journal of Biomedical Informatics. 2013 December; 46(6):1116-24.
Knowledge-based Method for Determining the Meaning of Ambiguous
Biomedical Terms Using Information Content Measures of Similarity.
Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and
Genevieve B. Melton.
In Proceedings of the Annual Symposium of the American
Medical Informatics Association (AMIA), Oct. 2011, Washington, DC.
Exploiting MeSH Indexing in MEDLINE to Generate a Data Set for
Word Sense Disambiguation.
Antonio Jimeno-Yepes, Bridget T. McInnes, and Alan R. Aronson.
BMC Bioinformatics. 2011 Jun 2;12(1):223.
Using Second-order Vectors in a Knowledge-based Method for Acronym
Disambiguation.
Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and
Genevieve B. Melton.
In Proceedings of the Fifteenth Conference on Computational Natural
Language Learning (CoNLL 2011), June 23-24, 2011, pp. 145-153, Portland, Oregon.
Collocation Analysis for UMLS Knowledge-based Word Sense Disambiguation.
Antonio Jimeno-Yepes, Bridget T. McInnes, and Alan R. Aronson.
BMC Bioinformatics. 2011, 12(Suppl 3):S4.
Supervised and Knowledge-based Methods for Disambiguating
Terms in Biomedical Text using the UMLS and MetaMap.
Bridget T. McInnes. Doctor of Philosophy Dissertation,
Department of Computer Science, University of Minnesota,
Twin Cities, September, 2009.
An Unsupervised Vector Approach to Biomedical Term Disambiguation:
Integrating UMLS and Medline.
Bridget T. McInnes.
In Proceedings of the Association for Computational Linguistics
Student Research Workshop (ACL-SRW) 2008.
Using UMLS Concept Unique Identifiers (CUIs) for Word Sense
Disambiguation in the Biomedical Domain.
Bridget T. McInnes, Ted Pedersen, and John Carlis.
In Proceedings of the Annual Symposium of the American Medical
Informatics Association (AMIA), pp. 533-537, Nov. 2007, Chicago, IL.
Presentations
Right Arm and Right Atrium: How to Distinguish Between the Two. Institute of Health Informatics Seminar Series, University of Minnesota, March 2011.
Representing Meaning in Unsupervised WSD. Bridget T. McInnes.
National Library of Medicine's Brown Bag Series. September 2008.
Last modified 25/08/2014