Invited Speaker: Dominic Widdows.
Wednesday, May 10, 2006.
NOTE: Room change - Talk to be held in Room 5317
About the Author
Dominic Widdows is a Senior Research Engineer at MAYA Design, Inc., and author of Geometry and Meaning, a critically acclaimed introduction to text mining for the general reader.
Abstract
Automatically Adapting Lexical Resources to the Biomedical Domain
(work with Beate Dorow, Adil Toumouh and Ahmed Lehireche)
After a brief introduction to the combination of lexicosyntactic patterns and graph theory, as used in recent years for lexical acquisition from corpora, the talk will focus on recent experiments that use these techniques to adapt WordNet to the medical domain. Our basic technique is to extract relationships between terms from the Ohsumed corpus, a large collection of abstracts from PubMed, and to compare the extracted relationships with those that would be expected for medical terms, given the structure of the WordNet ontology. The linguistic methods use a variety of lexicosyntactic patterns that enable us to extract pairs of coordinate noun terms and, using Markov clustering, related groups of adjectives and nouns. In many cases this enables us to analyse ambiguous words and select the correct meaning for the biomedical domain. While results are often encouraging, the talk also highlights evident problems and drawbacks with the method, and outlines suggestions for future work. The work will be described as part of the ongoing challenge to produce lexical semantic language models that complement traditional n-gram and syntactic language models.
The recent results in this talk are drawn from our upcoming LREC paper, and from Beate Dorow's recent PhD thesis.