Semantic Similarity and Relatedness



About

Semantic similarity and relatedness measures quantify the degree to which two concepts are similar (e.g. liver-organ) or related (e.g. headache-aspirin). The automated discovery of groups of semantically similar or related concepts and terms is critical to improving the retrieval and clustering of biomedical and clinical documents, and the development of biomedical terminologies and ontologies.

Relatedness measures quantify the degree to which two words are associated with each other (e.g. scissors-paper). Similarity is a special case of relatedness that quantifies how alike two concepts are based on their location within an is-a hierarchy (e.g. car-vehicle). The similarity score assigned to a term pair indicates how closely the terms are connected through is-a relations. For example, "Lung Cancer" is a type of "Disease", so that pair would receive a high similarity score. "Lung Cancer" and "Coughing" would not receive a high similarity score, although the two are clearly related.
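
To make the is-a example concrete, here is a minimal Python sketch of a path-based similarity score. It is an illustration only: the hierarchy below is a hypothetical toy tree, not the UMLS, and the function names are invented for this example. The score is taken to be 1 divided by the number of nodes on the shortest is-a path between two concepts, which is one common way to define a path measure.

```python
# Toy is-a hierarchy (child -> parent). Hypothetical; NOT taken from the UMLS.
IS_A = {
    "Lung Cancer": "Cancer",
    "Cancer": "Disease",
    "Disease": "Concept",
    "Coughing": "Symptom",
    "Symptom": "Concept",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root] by following is-a links."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def path_similarity(a, b):
    """Return 1 / (number of nodes on the shortest is-a path between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    position_in_b = {concept: i for i, concept in enumerate(chain_b)}
    # In a single-parent tree, the first shared ancestor is the lowest one.
    for i, concept in enumerate(chain_a):
        if concept in position_in_b:
            edges = i + position_in_b[concept]
            return 1.0 / (edges + 1)
    return 0.0  # no shared ancestor


if __name__ == "__main__":
    print(path_similarity("Lung Cancer", "Disease"))
    print(path_similarity("Lung Cancer", "Coughing"))
```

On this toy tree, "Lung Cancer" and "Disease" are joined by a three-node path (score 1/3), while "Lung Cancer" and "Coughing" meet only at the root via a six-node path (score 1/6). The clearly related pair therefore scores lower on similarity, as described above.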

In this work, we are exploring taxonomy-based metrics, corpus-based metrics, and hybrids of the two.

Software

  • UMLS-Similarity -- a suite of Perl modules that implement a number of semantic similarity measures. The measures use the UMLS-Interface module to access the UMLS and generate similarity scores between concepts. Currently, this package includes programs that implement the similarity measures described by Leacock & Chodorow (1998), Wu & Palmer (1994), Nguyen & Al-Mubaid (2006), Rada et al. (1989), Jiang & Conrath (1997), Resnik (1995) and Lin (1998), and the relatedness measures proposed by Banerjee & Pedersen (2002) and Patwardhan (2003).
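
As an illustration of two of the information-content measures named above, Resnik (1995) and Lin (1998), the following is a minimal Python sketch using their commonly stated definitions: IC(c) = -log p(c), Resnik similarity is the IC of the lowest common subsumer (LCS), and Lin similarity is 2 * IC(LCS) / (IC(a) + IC(b)). It does not call UMLS-Similarity or the UMLS. The hierarchy, the counts, and all function names are hypothetical and chosen only for this example.

```python
import math

# Hypothetical is-a links (child -> parent) and per-concept corpus counts.
IS_A = {
    "Lung Cancer": "Cancer",
    "Breast Cancer": "Cancer",
    "Cancer": "Disease",
    "Influenza": "Disease",
}
OWN_COUNTS = {
    "Lung Cancer": 10,
    "Breast Cancer": 10,
    "Cancer": 5,
    "Influenza": 25,
    "Disease": 50,
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root] by following is-a links."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


# A concept's frequency includes the counts of everything it subsumes.
FREQ = {concept: 0 for concept in OWN_COUNTS}
for concept, count in OWN_COUNTS.items():
    for node in ancestors(concept):
        FREQ[node] += count
TOTAL = max(FREQ.values())  # the root subsumes every concept


def information_content(concept):
    """IC(c) = -log p(c), written as log(TOTAL / freq) so the root gets 0.0."""
    return math.log(TOTAL / FREQ[concept])


def lowest_common_subsumer(a, b):
    """First ancestor of a that is also an ancestor of b (single-parent tree)."""
    ancestors_of_b = set(ancestors(b))
    for node in ancestors(a):
        if node in ancestors_of_b:
            return node
    return None


def resnik(a, b):
    """Resnik similarity: IC of the lowest common subsumer."""
    return information_content(lowest_common_subsumer(a, b))


def lin(a, b):
    """Lin similarity: 2 * IC(LCS) / (IC(a) + IC(b)); 0.0 if both ICs are 0."""
    denominator = information_content(a) + information_content(b)
    if denominator == 0:
        return 0.0  # simplifying assumption for the degenerate root-root case
    return 2 * resnik(a, b) / denominator


if __name__ == "__main__":
    print(resnik("Lung Cancer", "Breast Cancer"), lin("Lung Cancer", "Breast Cancer"))
    print(resnik("Lung Cancer", "Influenza"), lin("Lung Cancer", "Influenza"))
```

With these toy counts, the two cancers share the fairly specific subsumer "Cancer" and score above zero on both measures. "Lung Cancer" and "Influenza" share only the root "Disease", whose information content is 0, so both measures return 0.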

Data

  • MiniMayoSRS Semantic Relatedness Reference Standard
  • MayoSRS Semantic Relatedness Reference Standard
  • UMNSRS Semantic Similarity Reference Standard
  • UMNSRS Semantic Relatedness Reference Standard

Publications

  • Evaluating Semantic Similarity and Relatedness over the Semantic Grouping of Clinical Term Pairs. Bridget T. McInnes and Ted Pedersen. Journal of Biomedical Informatics. 2015 Apr; 54:329-336.
  • U-path: An undirected path-based measure of semantic similarity. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. To Appear in the Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA), November 2014.
  • Determining the Difficulty of Word Sense Disambiguation. Bridget T. McInnes and Mark Stevenson. Journal of Biomedical Informatics. 2014 Feb; 47:83-90.
  • Evaluating Measures of Semantic Similarity and Relatedness to Disambiguate Terms in Biomedical Text. Bridget T. McInnes and Ted Pedersen. Journal of Biomedical Informatics. 2013 Dec; 46(6):1116-24.
  • UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. In the Proceedings of the North American Association of Computational Linguistics Demonstration Systems. June 10-12, 2013, Atlanta, Georgia.
  • Evaluating Semantic Relatedness and Similarity Measures with Standardized MedDRA Queries. Robert W. Bill, Ying Liu, Bridget T. McInnes, Genevieve B. Melton, Ted Pedersen, and Serguei Pakhomov. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA), November 2012.
  • Semantic Relatedness Study Using Second Order Co-occurrence Vectors Computed from Biomedical Corpora, UMLS and WordNet. Ying Liu, Bridget T. McInnes, Ted Pedersen, Serguei Pakhomov and Genevieve B. Melton. Appears in the Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012, Miami, FL.
  • Measuring the Similarity and Relatedness of Concepts in the Medical Domain : IHI 2012 Tutorial Overview. Ted Pedersen, Serguei Pakhomov, Bridget T. McInnes, and Ying Liu. Appears in the Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012, Miami, FL.
  • Towards a Framework for Developing Semantic Relatedness Reference Standards. Serguei V. Pakhomov, Ted Pedersen, Bridget T. McInnes, Genevieve B. Melton, Alexander Ruggieri, and Christopher G. Chute. Journal of Biomedical Informatics. 2011; 44(2):251-265.
  • Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA). Oct. 2011, Washington DC.
  • Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. Serguei Pakhomov, Bridget T. McInnes, Terrence Adam, Ying Liu, Ted Pedersen and Genevieve B. Melton. Appears in the Proceedings of the Annual Symposium of the American Medical Informatics Association. Nov 13-17, 2010, Washington, DC.
  • UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity. Bridget T. McInnes, Ted Pedersen and Serguei V. Pakhomov. In the Proceedings of the Annual Symposium of the American Medical Informatics Association, Nov 14-18, 2009, San Francisco, CA.


Last modified 25/08/2014