Ed Hovy, visiting from ISI. Starting time moved to 1 (or later).
Named Entity Extraction from Arabic Text
Behrang Mohit
In this presentation, I will talk about our ongoing work on Named Entity (NE) extraction from Arabic text. NE extraction is challenging for Arabic because the language lacks word capitalization, an important feature: for English, capitalization alone can identify NEs with fairly high accuracy.
As a baseline, we consider porting to Arabic a system that learns to classify English NEs (Collins and Singer, 1999). Under this framework, the system first identifies NEs with a syntactic approach (using parse tree rules) and then performs an unsupervised classification of the names into different classes of named entities. Due to linguistic differences between English and Arabic, a direct application of this approach does not yield as good a result for Arabic as it did for English.
We are currently improving the coverage of this model by adding richer syntactic information. The early part of the talk will include an introduction to the structure of the Arabic language and some of the major challenges of working with it.
Magnitude estimation is a technique originally used in psychophysics to measure judgments of sensory stimulation, for instance, brightness and loudness. However, studies in linguistics have shown that magnitude estimation can reliably be used to make other judgments of scale, such as how grammatical a sentence is. Magnitude estimation has been used to judge document relevance, plausibility of adjective-noun pairs, and politeness of spoken Japanese. In my talk, I will give an introduction to magnitude estimation and discuss its possible usefulness for the task of annotating the intensity of opinions, emotions, and other private states in text.