February 11, 2009

[Using Zipf Frequencies as a Representativeness Measure in Statistical Active Learning of Natural Language] Onur Cobanoglu: Tuesday 2/17/09 @3-4 pm

Venue: Sennott Square Rm 6329

Active learning has proven to be a successful strategy for quickly developing the corpora used to train statistical natural language parsers.
The vast majority of studies in this field have focused on estimating the informativeness of samples; however, the representativeness of samples is another important criterion in active learning.

We present a novel metric for estimating the representativeness of sentences, based on a modification of Zipf's Principle of Least Effort. Experiments on the WSJ corpus with a wide-coverage parser show that our method always performs at least as well as, and generally significantly better than, alternative representativeness-based methods.

Posted by nlplab at 11:48 AM