About
Semantic similarity and relatedness measures quantify the degree to
which two concepts are similar (e.g. liver-organ) or related (e.g.
headache-aspirin). The automated discovery of groups of semantically
similar or related concepts and terms is critical to improving the
retrieval and clustering of biomedical and clinical documents, and
the development of biomedical terminologies and ontologies.
Relatedness measures quantify the degree to which two words are
associated with each other (e.g. scissors-paper). Similarity is a
special case of relatedness and quantifies how alike two concepts are
based on their location within an is-a hierarchy (e.g. car-vehicle).
The score assigned to a term pair indicates the degree to which the terms
are connected through is-a relations. For example, "Lung Cancer" is a
type of "Disease", so that pair would receive a high similarity score,
whereas "Lung Cancer" and "Coughing" would not, although the two are
clearly related.
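To make the is-a idea concrete, here is a minimal sketch of a path-based similarity score in Python. It assumes one common formulation, the reciprocal of the number of nodes on the shortest is-a path through a shared ancestor. The toy hierarchy and the names IS_A, ancestor_distances and path_similarity are hypothetical illustrations; they are not taken from the UMLS or from the UMLS-Similarity package.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parents); not taken from the UMLS.
IS_A = {
    "Lung Cancer": ["Cancer"],
    "Cancer": ["Disease"],
    "Disease": ["Finding"],
    "Coughing": ["Symptom"],
    "Symptom": ["Finding"],
    "Finding": [],
}

def ancestor_distances(concept):
    """Map the concept and each of its ancestors to its distance in is-a links."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for parent in IS_A.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist

def path_similarity(a, b):
    """Return 1 / (nodes on the shortest is-a path), or 0.0 if no ancestor is shared."""
    up_a, up_b = ancestor_distances(a), ancestor_distances(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return 0.0
    edges = min(up_a[c] + up_b[c] for c in shared)
    return 1.0 / (edges + 1)

print(path_similarity("Lung Cancer", "Disease"))
print(path_similarity("Lung Cancer", "Coughing"))
```

In this toy hierarchy the "Lung Cancer"/"Disease" pair scores higher than "Lung Cancer"/"Coughing", because the latter pair is connected only through the distant shared ancestor "Finding".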
In this work, we are exploring taxonomy-based metrics, corpus-based
metrics, and hybrids of the two.
Software
UMLS-Similarity -- a suite of Perl modules that implement a number of semantic similarity and relatedness measures. The measures use the UMLS-Interface module to access the UMLS and generate scores between concepts. Currently, this package
includes programs that implement the similarity measures described by Leacock & Chodorow (1998), Wu & Palmer (1994), Nguyen & Al-Mubaid (2006), Rada et al. (1989), Jiang & Conrath (1997), Resnik (1995) and Lin (1998), and the
relatedness measures proposed by Banerjee & Pedersen (2002) and Patwardhan (2003).
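As an illustration of how one of the listed measures works, the following Python sketch computes the Wu & Palmer (1994) score in its usual textbook form, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest concept subsuming both inputs. This is not the package's Perl code: the single-parent toy hierarchy and the names PARENT, path_to_root, depth and wu_palmer are assumptions made for the example, and depth is counted in nodes so that the root has depth 1.

```python
# Hypothetical single-parent is-a hierarchy (child -> parent); the root maps to None.
PARENT = {
    "Finding": None,
    "Disease": "Finding",
    "Cancer": "Disease",
    "Lung Cancer": "Cancer",
    "Symptom": "Finding",
    "Coughing": "Symptom",
}

def path_to_root(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))

def wu_palmer(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b)), with LCS the deepest shared ancestor."""
    ancestors_b = set(path_to_root(b))
    # Walking upward from a, the first concept also above b is the deepest shared one.
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("Lung Cancer", "Disease"))
print(wu_palmer("Lung Cancer", "Coughing"))
```

Unlike a plain path count, this score rewards pairs whose shared ancestor sits deep in the hierarchy, so two specific concepts under a specific parent score higher than two concepts joined only at the root.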
Data
MiniMayoSRS Semantic Relatedness Reference Standard
MayoSRS Semantic Relatedness Reference Standard
UMNSRS Semantic Similarity Reference Standard
UMNSRS Semantic Relatedness Reference Standard
Publications
Evaluating Semantic Similarity and Relatedness over the Semantic
Grouping of Clinical Term Pairs. Bridget T. McInnes and Ted Pedersen.
Journal of Biomedical Informatics. 2015 Apr; 54:329-336.
U-path: An undirected path-based measure of semantic similarity.
Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and
Genevieve B. Melton. To Appear in the Proceedings of the Annual
Symposium of the American Medical Informatics Association
(AMIA), November 2014.
Determining the Difficulty of Word Sense Disambiguation.
Bridget T. McInnes and Mark Stevenson.
Journal of Biomedical Informatics. 2014 Feb; 47:83-90.
Evaluating Measures of Semantic Similarity and Relatedness to Disambiguate
Terms in Biomedical Text.
Bridget T. McInnes and Ted Pedersen.
Journal of Biomedical Informatics. 2013 December; 46(6):1116-24.
UMLS::Similarity: Measuring the Relatedness and Similarity of
Biomedical Concepts.
Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and
Genevieve B. Melton.
In the Proceedings of the North American Chapter of the Association
for Computational Linguistics: Demonstration Systems.
June 10-12, 2013, Atlanta, Georgia.
Evaluating Semantic Relatedness and Similarity Measures with Standardized
MedDRA Queries.
Robert W. Bill, Ying Liu, Bridget T. McInnes, Genevieve B. Melton, Ted
Pedersen, and Serguei Pakhomov. Appears in the Proceedings of the Annual
Symposium of the American Medical Informatics Association
(AMIA), November 2012.
Semantic Relatedness Study Using Second Order Co-occurrence Vectors
Computed from Biomedical Corpora, UMLS and WordNet. Ying Liu, Bridget
T. McInnes, Ted Pedersen, Serguei Pakhomov and Genevieve B. Melton.
Appears in the Proceedings of the 2nd ACM SIGHIT International Health
Informatics Symposium, January 28-30, 2012, Miami, FL.
Measuring the Similarity and Relatedness
of Concepts in the Medical Domain : IHI 2012 Tutorial Overview.
Ted Pedersen, Serguei Pakhomov, Bridget T. McInnes, and Ying Liu.
Appears in the Proceedings of the 2nd ACM SIGHIT International
Health Informatics Symposium, January 28-30, 2012, Miami, FL.
Towards a Framework for Developing Semantic Relatedness Reference
Standards.
Serguei V. Pakhomov, Ted Pedersen, Bridget T. McInnes, Genevieve B.
Melton, Alexander Ruggieri, and Christopher G. Chute.
Journal of Biomedical Informatics. 2011; 44(2):251-265.
Knowledge-based Method for Determining the Meaning of Ambiguous
Biomedical Terms Using Information Content Measures of Similarity.
Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and
Genevieve B. Melton.
Appears in the Proceedings of the Annual Symposium of the American
Medical Informatics Association (AMIA). Oct. 2011, Washington DC.
Semantic Similarity and Relatedness between Clinical Terms: An
Experimental Study. Serguei Pakhomov, Bridget T. McInnes, Terrence
Adam, Ying Liu, Ted Pedersen and Genevieve B. Melton. Appears
in the Proceedings of the Annual Symposium of the American Medical
Informatics Association. Nov 13-17, 2010, Washington, DC.
UMLS-Interface and UMLS-Similarity : Open Source Software for
Measuring Paths and Semantic Similarity. Bridget T. McInnes,
Ted Pedersen and Serguei V. Pakhomov. In the Proceedings of the Annual
Symposium of the American Medical Informatics Association,
Nov 14-18, 2009, San Francisco, CA.
Last modified 25/08/2014