By Behrang Mohit
Abstract: I present a framework to train a named entity (NE) tagger from a limited amount of annotated lexical resources. My approach leverages other available resources such as syntactic and shallow semantic analyses. These resources are helpful in locating potential named entities that can be used to train a tagger with unsupervised approaches. My final goal was the development of the system for Arabic or other languages with limited resources. I first performed a proof-of-concept study on English. I report experimental results showing that there is a steady boost in classification accuracy when the extracted unlabeled data is used together with a small set of labeled training data. I also report the results of my effort on porting the system to the Arabic language. While the accuracy of the Arabic system is lower than that of the English system, my findings about the effects of different syntactic features hold for both languages.
Eugene Charniak
Brown University
Friday, January 13, 2006
10:30am - SENSQ 5317
Refreshments at 10:00am
Hosted by Jan Wiebe
Abstract
Parsing is the problem of mapping a sentence (in, say, English) to a phrase structure. It is important because it gives us a first rough cut at meaning. During the 1990s there was a flurry of new results using statistical techniques that gave us our first robust parsers ready for everyday use. While there have been continued results since then, the practical parsers at the start of 2005 were no better than what was available in 2000. The first part of the talk will recap this ancient history.
The last 12 months, however, have seen a dramatic turn-around, with error rates decreasing by 25%. The second and third parts of the talk describe the two techniques responsible for this state of affairs: discriminative reranking and self-training. We also show that the latest results seem to be less corpus-specific than the previous results. (That is, they carry over to text corpora reasonably different from those upon which they were trained.)
Finally, we discuss a new parsing paradigm, coarse-to-fine parsing, and present some starry-eyed proposals for radically different views of parsing.