Detecting Arguing and Sentiment in Meetings
This paper analyzes opinion categories like Sentiment and Arguing in meetings.
We first annotate the categories manually. We then develop genre-specific lexicons using interesting function word combinations for detecting the opinions. We analyze relations between dialog structure information and opinion expression in the context of multi-party discourse. Finally, we show that classifiers using lexical and discourse knowledge achieve significant improvements over the baseline.
A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply, and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. I describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps. I will also discuss some novel applications of these algorithms.
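The abstract mentions that phrase-based systems can index the training data with a suffix array and look up rules on the fly. As a minimal sketch, assuming a tokenized corpus held in memory and a naive sort-based construction (practical implementations use faster construction algorithms), the contiguous-phrase case reduces to two binary searches over the sorted suffixes. This covers only contiguous phrases; it is not the gapped-phrase algorithm described in the talk, and the function names are hypothetical.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return suffix start positions sorted lexicographically (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return sorted start positions of a contiguous phrase using two binary searches."""
    m = len(phrase)
    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat".split()
    index = build_suffix_array(corpus)
    print(find_phrase(corpus, index, ["the", "cat"]))  # [0, 6]
```

Each lookup costs O(m log n) token comparisons for an m-token phrase over an n-token corpus, which is what makes on-the-fly extraction cheap in the contiguous case.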
Adam Lopez is a Ph.D. candidate in computer science at the University of Maryland, expecting to graduate in August 2007. His dissertation work focuses on statistical machine translation and his interests are in large-scale natural language processing and algorithms. Prior to graduate school, he worked as a software engineer at the IBM Corporation, after receiving his bachelor's degree in computer science from Duke University.
NOTE - THIS TALK WILL BE AT 12 NOON!
I will give an overview of the research at Toyota Central Labs. We are developing dialogue systems for car navigation and for a home robot. I mainly work on affective dialogue for the home robot, which is closely connected to the Emotion Detection in Tutoring task and to opinion type analysis.
Speaker: Swapna Somasundaran
Room: Board room (6th floor, room 6329), Sennott Square
Time: 9:00 am
Practice talk for ICWSM-07.
Abstract
In this work, we explore the utility of attitude types for improving question answering (QA) on both web-based discussions and news data. We present a set of attitude types developed with an eye toward QA and show that they can be reliably annotated. Using the attitude annotations, we develop automatic classifiers for recognizing two main types of attitudes: sentiment and arguing. Finally, we exploit information about the attitude types of questions and answers for improving opinion QA with promising results.
Nigel G. Ward, Yaffa Al Bayyari, Rafael Escalante, Thamar Solorio
University of Texas at El Paso
12 noon, 5317 Sennott Square (ISP Forum)
Abstract: Good listeners generally produce back-channel feedback, and do so in a
language-appropriate way. Second language learners often lack this
skill. We present a training sequence which enables learners to
acquire a basic Arabic back-channel skill, namely, that of producing
feedback immediately after the speaker produces a sharp pitch
downslope. This training sequence includes an explanation, audio
examples, the use of visual signals to highlight occurrences of the
pitch downslope, auditory and visual feedback on learners' attempts to
produce the cue themselves, and feedback on the learners' performance
as they play the role of an attentive listener in response to one side
of a pre-recorded dialog. Preliminary experiments suggest that this
allows some learners to acquire this behavior.
The talk will also touch on the role of back-channels in various types
of dialog, methods for the discovery and quantification of
dialog-relevant prosodic cues, potential cross-cultural
misunderstandings of prosodic signals, the interplay between
meta-communication and the communication of content, and ways to
quantify the value of good turn-taking relative to other dialog skills.
Building an English-Iraqi Arabic Machine Translation System for Spoken Utterances with Limited Resources.
By: Behrang Mohit
This is joint work with Jason Riesa, Kevin Knight and Daniel Marcu.
Speaker: Amruta Purandare
Purpose: Prelim Exam
Abstract: We analyze humorous spoken conversations from a classic comedy television show, FRIENDS, by examining acoustic-prosodic and linguistic features and their utility in automatic humor recognition. Using a simple annotation scheme, we automatically label speaker turns in our corpus that are followed by "laughs" as Humorous, and the rest as Non-Humorous. Our humor-prosody analysis reveals significant differences in prosodic characteristics (such as pitch, tempo, and energy) of humorous and non-humorous speech. Humor recognition was carried out using standard supervised learning classifiers and shows promising results, significantly above the baseline.
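The labeling rule in the abstract is simple enough to state as code. A minimal sketch, assuming the transcript is already an ordered list of speaker turns and laugh-track markers (this event representation is hypothetical, not the authors' corpus format):

```python
from typing import List, Optional, Tuple

# Hypothetical transcript representation: ("turn", text) or ("laugh", None), in temporal order.
Event = Tuple[str, Optional[str]]


def label_turns(events: List[Event]) -> List[Tuple[Optional[str], str]]:
    """Label a turn Humorous if the very next event is a laugh, otherwise Non-Humorous."""
    labels = []
    for idx, (kind, text) in enumerate(events):
        if kind != "turn":
            continue  # laugh markers themselves receive no label
        followed_by_laugh = idx + 1 < len(events) and events[idx + 1][0] == "laugh"
        labels.append((text, "Humorous" if followed_by_laugh else "Non-Humorous"))
    return labels
```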
Speaker: Jan Wiebe
Title: Word Sense and Subjectivity
Speaker: Swapna Somasundaran
Practice talk for ACL 2006 Workshop on Frontiers in Annotation
Abstract
This paper applies the categories from an opinion annotation scheme developed for monologue text to the genre of multiparty meetings. We describe modifications to the coding guidelines that were required to extend the categories to the new type of data, and present the results of an inter-annotator agreement study. As researchers have found with other types of annotations in speech data, inter-annotator agreement is higher when the annotators both read and listen to the data than when they only read the transcripts. Previous work exploited combinations of prosodic and lexical clues to perform automatic detection of speaker emotion (Liscombe et al. 2003). Our findings suggest that exploiting such combinations to recognize opinion categories would be a promising line of work.
Practice talk for EMNLP 2006. Here is the paper abstract:
In this paper we study the utility of discourse structure for spoken dialogue performance modeling. We experiment with various ways of exploiting the discourse structure: in isolation, as context information for other factors (correctness and certainty) and through trajectories in the discourse structure hierarchy. Our correlation and PARADISE results show that, while the discourse structure is not useful in isolation, using the discourse structure as context information for other factors or via trajectories produces highly predictive parameters for performance analysis.
Invited Speaker: Dominic Widdows.
Wednesday, May 10, 2006.
NOTE: Room change - Talk to be held in Room 5317
About the Author
Dominic Widdows is a Senior Research Engineer at MAYA Design, Inc., and author of Geometry and Meaning, a critically acclaimed introduction to text mining for the general reader.
Abstract
Automatically Adapting Lexical Resources to the Biomedical Domain
(work with Beate Dorow, Adil Toumouh and Ahmed Lehireche)
After giving a brief introduction to the combination of lexicosyntactic patterns and graph theory, as used in recent years for lexical acquisition from corpora, the talk will focus on some recent experiments on using these techniques to adapt WordNet to the medical domain. Our basic technique is to extract relationships between terms using the Ohsumed corpus, a large collection of abstracts from PubMed, and to compare the relationships extracted with those that would be expected for medical terms, given the structure of the WordNet ontology. The linguistic methods involve the use of a variety of lexicosyntactic patterns that enable us to extract pairs of coordinate noun terms, and also related groups of adjectives and nouns, using Markov clustering. This enables us in many cases to analyse ambiguous words and select the correct meaning for the biomedical domain. While results are often encouraging, the paper also highlights evident problems and drawbacks with the method, and outlines suggestions for future work. This will be described as part of the ongoing challenge to produce lexical semantic language models to complement traditional n-gram and syntactic language models.
The recent results in this talk are drawn from our upcoming LREC paper, and from Beate Dorow's recent PhD thesis.
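The abstract mentions Markov clustering for grouping related adjectives and nouns. As a rough illustration of the generic Markov Cluster (MCL) algorithm only, assuming a small word graph given as a symmetric 0/1 adjacency matrix, MCL alternates expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing columns); after enough iterations, flow concentrates on attractor nodes whose rows identify the clusters. The inflation value, iteration count, and threshold are illustrative choices, not the authors' settings.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (in place)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adj: List[List[int]], inflation: float = 2.0,
                   iterations: int = 50, eps: float = 1e-6) -> List[List[int]]:
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                 # expansion
        m = [[v ** inflation for v in row] for row in m]  # inflation
        normalize_columns(m)
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:  # row i is an attractor
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return sorted(sorted(c) for c in clusters)
```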
Based on recent advancements in spoken dialogue technologies, researchers have begun implementing spoken dialogue systems in more complex domains. This work is part of our ongoing project that studies the challenges posed by the tutoring domain to spoken dialogue design. Our approach is to study dependencies between speech recognition problems and various dialogue factors. In our previous work, we found interesting results using this methodology: chaining effects for certain speech recognition problems (our Interspeech 2005 paper) and interactions with certainty, correctness and frustration/anger (paper submitted to ACL 2006).
In this presentation, I will talk about our preliminary results analyzing the role of dialogue structure in understanding several dialogue phenomena.
Presentation by: Behrang Mohit
I will talk about the challenge of speech translation and the ways in which we used a text translation system to build a speech translator. Specifically, our efforts were aimed at leveraging resources in Modern Standard Arabic (MSA) to enrich the translation and language models of a speech translation system for Iraqi Arabic and English.
This is joint work with Jason Riesa, Kevin Knight and Daniel Marcu.
By Behrang Mohit
Abstract: I present a framework to train a named entity (NE) tagger from a limited amount of annotated lexical resources. My approach leverages other available resources, such as syntactic and shallow semantic analyses. These resources are helpful in locating potential named entities that can be used to train a tagger with unsupervised approaches. My final goal was to develop the system for Arabic or other languages with limited resources, but I first performed a proof-of-concept study on English. I report experimental results showing a steady boost in classification accuracy when we use the extracted unlabeled data together with a small set of labeled training data. I also report the results of our effort to port the system to Arabic. While the accuracy of the Arabic system is lower than that of the English system, our findings about the effects of different syntactic features hold for both languages.
Eugene Charniak
Brown University
Friday, January 13, 2006
10:30am - SENSQ 5317
Refreshments at 10:00am
Hosted by Jan Wiebe
Abstract
Parsing is the problem of mapping a sentence (in, say, English) to a phrase structure. It is important because it gives us a first rough cut at meaning. During the 1990s there was a flurry of new results using statistical techniques that gave us our first robust parsers ready for everyday use. While there have been continued results since then, the practical parsers at the start of 2005 were no better than what was available in 2000. The first part of the talk will recap this ancient history.
The last 12 months, however, have seen a dramatic turnaround, with error rates decreasing by 25%. The second and third parts of the talk describe the two techniques responsible for this state of affairs: discriminative reranking and self-training. We also show that the latest results seem to be less corpus-specific than the previous results. (That is, they carry over to text corpora reasonably different from those upon which they were trained.)
Finally, we discuss a new parsing paradigm, coarse-to-fine parsing, and present some starry-eyed proposals for radically different views of parsing.
*****Jost is not able to visit us and give his talk due to Hurricane Wilma*****
Jost Schatzmann (http://mi.eng.cam.ac.uk/~js532/) is visiting on Nov 21. He
is going to give a talk on "Learning Dialogue Management Strategies with a
Simulated User" at 10:30 AM in SS5317. You can sign up to talk to him by commenting on this message.
(by Joel Tetreault, talk for Wed Nov 09)
I'll be giving a talk on the work I have been doing on using Markov Decision Processes (MDPs) to determine good policies for our ITSPOKE tutoring dialogues. The problem with dialogue systems, and especially tutoring ones, is that there are many possible actions a tutor can take depending on the student state. For example, if a student appears frustrated and uncertain when answering the tutor's last question, we may want to ask the student an easier question or give a hint in the next turn. Or if the student has been doing really well lately and is breezing through the tutoring session, we may want to give him or her a harder question and possibly also ease back on the amount of feedback. Given that there is a wide range of features describing the student state, hand-tuning a policy for every possible student state is simply too laborious a task to undertake. We propose instead to use MDPs to learn the best policy for the system to follow. In this talk I will present preliminary results of our research.
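A minimal value-iteration sketch of the idea, assuming a hypothetical two-state student model ("uncertain", "certain"), two tutor actions, and invented transition probabilities and rewards. In practice the states, actions, and rewards would be derived from the annotated tutoring corpus, and the talk does not specify which MDP solution method is used; value iteration is simply one standard choice.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

# Hypothetical toy model: all numbers are invented for illustration only.
TOY_MODEL: Transitions = {
    "uncertain": {
        "hint": [(0.7, "certain", 1.0), (0.3, "uncertain", 0.0)],
        "harder_question": [(0.2, "certain", 1.0), (0.8, "uncertain", 0.0)],
    },
    "certain": {
        "hint": [(1.0, "certain", 0.5)],
        "harder_question": [(0.6, "certain", 1.0), (0.4, "uncertain", 0.0)],
    },
}


def q_value(model: Transitions, values: Dict[str, float], state: str, action: str, gamma: float) -> float:
    """Expected discounted return of taking `action` in `state`, then acting per `values`."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in model[state][action])


def value_iteration(model: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Apply Bellman optimality updates in place until no value changes by more than tol."""
    values = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s in model:
            best = max(q_value(model, values, s, a, gamma) for a in model[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(model[s], key=lambda a: q_value(model, values, s, a, gamma)) for s in model}
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(TOY_MODEL)
    print(policy)  # {'uncertain': 'hint', 'certain': 'harder_question'}
```

With these made-up numbers the learned policy gives a hint to an uncertain student and a harder question to a certain one, mirroring the intuition in the abstract; different rewards would of course yield different policies.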
By Rebecca Hwa and Carol Nichols
ABSTRACT:
Parsing is an important component in many NLP systems. While
recent advances in statistical methods and machine learning have made
it possible to build highly accurate parsers, the success depends on
the quantity and quality of annotated training data, which may not
always be available. Arabic is an interesting case because it is
diglossic (i.e., the language exists in two forms: a "prestigious"
variety for formal writings (Modern Standard Arabic) and colloquial
varieties that are primarily spoken and are not standardized (Arabic
dialects)). There is much ongoing NLP work on building resources for
MSA, but resources and NLP research for Arabic dialects are still in their
infancy. Because there are no parallel written corpora between
any of the dialects and any other language, including MSA, most of the
techniques developed for parsing that exploit supervised machine
learning do not apply.
In this talk, we describe our framework for leveraging existing
resources and tools for MSA in order to parse Arabic dialects. In
particular, we focus on building a bilexicon between MSA and the
Levantine dialect and building a Levantine part-of-speech tagger by
adapting an MSA tagger. We will also present some preliminary
findings in building a Levantine parser from these resources.
This work was conducted as a part of the Parsing Arabic Dialect team
at the 2005 JHU Summer Workshop on Language Engineering.
Theresa Wilson
In this talk, I will present a new approach to phrase-level sentiment analysis that first determines whether an expression is neutral or polar and then disambiguates the polarity of the polar expressions. With this approach, our system is able to automatically identify the contextual polarity for a large subset of sentiment expressions, achieving results that are significantly better than baseline.
Practice talk for EMNLP
Chenhai Xi
The lack of annotated data is an obstacle to the development of many natural language processing applications; the problem is especially severe when the data is non-English. Previous studies suggested the possibility of acquiring resources for non-English languages by bootstrapping from high-quality English NLP tools and parallel corpora; however, the success of these approaches seems limited for dissimilar language pairs. In this paper, we propose a novel approach of combining bootstrapped resources with a small amount of manually annotated data. We compare the proposed approach with other bootstrapping methods in the context of training a Chinese part-of-speech tagger. Experimental results show that our proposed approach achieves a significant improvement over EM, self-training, and systems trained only on manual annotations.
This is a practice talk for EMNLP 2005.
Noah Smith
Johns Hopkins University
Title: Contrastive Estimation for Unsupervised Sequence Modeling
Abstract:
Conditional random fields (Lafferty, McCallum, and Pereira, 2001) are
quite effective at sequence labeling tasks like shallow parsing (Sha
and Pereira, 2003) and named-entity extraction (McCallum and Li,
2003). CRFs are *log-linear*, allowing the incorporation of arbitrary
features into the model. Clever new features are one way to improve
performance; clever objective functions are another (see, for
instance, recent work on max-margin parsing by Taskar, Klein, et al.,
2004).
We have developed a method to do both, in the unlabeled data
framework. That is, we use log-linear models capable of exploiting
new features, and a new class of objective functions: contrastive
estimation (CE). CE can be intuitively understood as exploiting
implicit negative evidence and is computationally efficient (unlike
log-linear EM). In fact, CE generalizes EM and a variety of other
objective functions. By engineering classes of implicit negative
evidence, CE can be adapted for specific applications.
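To make the objective concrete, here is a brute-force sketch for a single sentence x, assuming a toy log-linear tagging model with hypothetical tag-word and tag-bigram features, and one simple neighborhood N(x): the sentence itself plus every sequence obtained by swapping one pair of adjacent words. The quantity computed is the log of the total score of x (summed over tag sequences y) divided by the total score of all sequences in N(x). Real implementations compute these sums with dynamic programming rather than enumeration; this only illustrates what is being maximized.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Weights = Dict[Tuple[str, str, str], float]


def adjacent_transposition_neighborhood(sentence: Sequence[str]) -> List[Tuple[str, ...]]:
    """x itself plus each distinct sequence obtained by swapping one adjacent word pair."""
    neighbors = [tuple(sentence)]
    for i in range(len(sentence) - 1):
        s = list(sentence)
        s[i], s[i + 1] = s[i + 1], s[i]
        if tuple(s) not in neighbors:
            neighbors.append(tuple(s))
    return neighbors


def score(words: Sequence[str], tags: Sequence[str], weights: Weights) -> float:
    """theta . f(x, y) using hypothetical ("emit", tag, word) and ("trans", prev, tag) features."""
    total, prev = 0.0, "<s>"
    for w, t in zip(words, tags):
        total += weights.get(("emit", t, w), 0.0) + weights.get(("trans", prev, t), 0.0)
        prev = t
    return total


def logsumexp(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_total_over_tags(words: Sequence[str], tagset: List[str], weights: Weights) -> float:
    """log sum_y exp(score(x, y)) by brute-force enumeration (toy inputs only)."""
    return logsumexp([score(words, y, weights) for y in itertools.product(tagset, repeat=len(words))])


def ce_log_likelihood(sentence: Sequence[str], tagset: List[str], weights: Weights) -> float:
    """log [ sum_y u(x, y) / sum_{x' in N(x)} sum_y u(x', y) ]; always <= 0 since x is in N(x)."""
    numerator = log_total_over_tags(sentence, tagset, weights)
    denominator = logsumexp([log_total_over_tags(x, tagset, weights)
                             for x in adjacent_transposition_neighborhood(sentence)])
    return numerator - denominator
```

Raising this quantity pushes probability mass toward the observed word order and away from its perturbed neighbors, which is the "implicit negative evidence" the abstract refers to.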
We describe applications to two natural language learning
problems---POS tagging of unlabeled text with a dictionary (Merialdo,
1994) and dependency grammar induction (Klein and Manning, 2004)---and
show how contrastive estimation outperforms EM (with the same feature
sets), is more robust to loss of domain knowledge (dictionary
degradation or uninformative initialization), and can recover by
modeling additional, nonorthogonal features.
This is joint work with Jason Eisner and was presented at ACL 2005 and
the IJCAI 2005 Workshop on Grammatical Inference Applications.
Schedule:
10:15 -- 10:30 Rebecca SENSQ 5421
10:30 -- 11:30 Behrang, Carol, Chenhai
11:30 -- 12:30 Lunch (Rebecca, Diane, Jan, Mihai)
12:30 -- 2:00 Talk
2:00 -- 2:30 Mihai
2:30 -- 3:00 Theresa
3:00 -- 3:30 Amruta, Hua
3:30 -- 4:00 Swapna, Paul
4:00 -- 4:30 Rebecca
Mihai: This will be my practice talk for the paper I will present at INTERSPEECH/EUROSPEECH.
Title:
Interactions between Speech Recognition Problems and User Emotions
Abstract:
Understanding how speech recognition problems affect the interaction with the user is a topic of great interest for the spoken dialogue community. We examine the dependencies between speech recognition problems in adjacent turns. We also examine the dependencies between speech recognition problems and student emotions within a turn and in adjacent turns. We apply chi-square (χ²) analysis to a corpus of speech-based computer tutoring dialogues to discover these dependencies. We find that rejections are followed by more rejections than would be expected if there were no dependency between rejections, and that misrecognitions are followed by more misrecognitions than expected. We also find a strong dependency between recognition problems in the previous turn and user emotion in the current turn: after a system rejection there are more emotional user turns than expected. Surprisingly, in our data, we find no relationship between user emotions and recognition problems within a turn, nor between previous-turn user emotions and current-turn recognition problems.
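The adjacent-turn dependency test can be sketched for a single binary label, assuming each turn is marked as rejected or not (the label is a hypothetical stand-in for the corpus annotations): count (previous turn, current turn) pairs in a 2×2 table and compare the observed counts with those expected under independence. This sketch computes the plain Pearson statistic without continuity correction and assumes no row or column of the table is empty; with one degree of freedom, a statistic above about 3.84 is significant at p < 0.05.

```python
from typing import List, Sequence, Tuple

Table = List[List[int]]


def adjacent_pair_table(flags: Sequence[bool]) -> Table:
    """counts[prev][cur] over consecutive turns, where index 1 means the label is present."""
    counts = [[0, 0], [0, 0]]
    for prev, cur in zip(flags, flags[1:]):
        counts[int(prev)][int(cur)] += 1
    return counts


def chi_square_2x2(table: Table) -> Tuple[float, List[List[float]]]:
    """Pearson chi-square statistic and the expected counts under independence."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    return stat, expected


if __name__ == "__main__":
    rejected = [False, True, True, False, False, True, True, True, False, False]
    table = adjacent_pair_table(rejected)
    stat, expected = chi_square_2x2(table)
    # A "rejection followed by rejection" effect shows up as table[1][1] > expected[1][1].
    print(table, round(stat, 3))
```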
Diane: This will be my practice talk for SIGdial.
Title:
Using Bigrams to Identify Relationships Between Student Certainness States and Tutor Responses in a Spoken Dialogue Corpus
Abstract:
We use n-gram techniques to identify dependencies between student affective states of certainty and subsequent tutor dialogue acts, in an annotated corpus of human-human spoken tutoring dialogues. We first represent our dialogues as bigrams of annotated student and tutor turns. We next use chi-square analysis to identify dependent bigrams. Our results show dependencies between many student states and subsequent tutor dialogue acts. We then analyze the dependent bigrams and suggest ways that our current computer tutor can be enhanced to adapt its dialogue act generation based on these dependencies.
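The first steps, turning a dialogue into student-tutor bigrams and comparing each bigram's observed count with the count expected if student state and tutor act were independent, can be sketched as follows. The speaker and label names are hypothetical; a chi-square test on each bigram's counts would then decide which dependencies are significant.

```python
from collections import Counter
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, label), e.g. ("student", "uncertain")


def student_tutor_bigrams(turns: List[Turn]) -> List[Tuple[str, str]]:
    """Pairs (student label, tutor label) for each student turn immediately followed by a tutor turn."""
    return [(s_label, t_label)
            for (s_spk, s_label), (t_spk, t_label) in zip(turns, turns[1:])
            if s_spk == "student" and t_spk == "tutor"]


def observed_vs_expected(bigrams: List[Tuple[str, str]]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For every (state, act) combination: (observed count, count expected under independence)."""
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    first = Counter(a for a, _ in bigrams)
    second = Counter(b for _, b in bigrams)
    return {(a, b): (pair_counts[(a, b)], first[a] * second[b] / n)
            for a in first for b in second}
```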
Title: Natural Language Generation for Intelligent Tutoring Systems: A Case Study
Speaker: Dr. Barbara DiEugenio, University of Illinois at Chicago
When: Thursday, July 7, 10:00am
Where: Sennott Square 5317, University of Pittsburgh
Abstract:
---------
It is still an open question whether Natural Language (NL) interaction
between students and an Intelligent Tutoring System (ITS) improves
learning, and if yes, what specific features of the NL interaction are
responsible for the improvement. To investigate this issue, we developed
two different feedback generation engines for an ITS that teaches students
to troubleshoot complex systems. We systematically evaluated the two NL
interfaces in a three-way comparison that included the original ITS as
well. We found that the version of the ITS which intuitively produces the
best language does engender the most learning. Specifically, it appears
that presenting feedback at a more abstract level is responsible for the
improvement.
This will be my practice talk for the paper I will present at the AAAI Workshop on Question Answering in Restricted Domains.
Title:
Improving Question Answering for Reading Comprehension Tests by Combining Multiple Systems
Abstract:
Most work on reading comprehension question answering systems has focused on improving performance by adding complex natural language processing (NLP) components to such systems rather than by combining the output of multiple systems. Our paper empirically evaluates whether combining the outputs of seven such systems submitted as the final projects for a graduate-level class can improve over the performance of any individual system. We present several analyses of our combination experiments, including performance bounds, the impact of both tie-breaking methods and ensemble size on performance, and an error analysis. Our results, replicated using two different publicly available reading test corpora, demonstrate the utility of system combination via majority voting in our restricted-domain question answering task.
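Majority voting over system outputs can be sketched in a few lines, assuming each system returns a single answer (for example, the ID of the sentence it selects) and that ties are broken by a hypothetical ranking of the systems, such as their accuracy on held-out data. The paper compares several tie-breaking methods; this shows only one simple option.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(answers: Sequence[str], tie_break_order: List[int]) -> str:
    """answers[k] is system k's answer; among tied answers, prefer the highest-ranked system's."""
    counts = Counter(answers)
    top = max(counts.values())
    tied = {a for a, c in counts.items() if c == top}
    for k in tie_break_order:
        if answers[k] in tied:
            return answers[k]
    raise ValueError("tie_break_order must include every system index")
```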
Carol: abstract TBA
Theresa: Annotating Attributions and Private States
A Landscape Model analysis, adopted from the text processing
literature, was run on transcripts of tutoring sessions, and a
technique developed to count the occurrence of key physics points in
the resulting connection matrices. This point-count measure was found
to be well correlated with learning.
ABSTRACT
In this paper we take advantage of the availability of a large amount of manually annotated data to analyze the applicability of co-training (Blum and Mitchell, 1998) for predicting emotions in spoken dialogue data. The manual annotations yielded upper bounds for co-training that show the trade-off between the size of the initial training set and the accuracy obtained by the final training set generated with this method when adding examples based on agreements, disagreements, and confidence of the predictions. We found that in the ideal case, disagreements would lead to a more robust system, but there is a gap between the upper bounds and the behavior of co-training that lies in the number of incorrect examples added by the system. Our best results achieve a maximum accuracy improvement of 18.51% over the majority class baseline, and a 1.49% improvement over the accuracy of the initial training set.
We examine correlations between dialogue behaviors and learning in tutoring, using two corpora of spoken tutoring dialogues: a human-human corpus and a human-computer corpus. To formalize the notion of dialogue behavior, we manually annotate our data using a tagset of student and tutor dialogue acts relative to the tutoring domain. A unigram analysis of our annotated data shows that student learning is correlated both with the dialogue acts of the tutor and with the dialogue acts of the student. A bigram analysis of our data shows that learning is also correlated with joint patterns of tutor and student dialogue acts. Our results show that while the use of dialogue act n-grams is a promising method for examining correlations between dialogue behavior and learning, specific findings can differ in human versus computer tutoring, with the latter better motivating adaptive strategies for implementation. In addition, we also show that although many of our students experience problems with speech recognition, such problems do not negatively correlate with student learning.
Annotating clinical conditions in reports is necessary for compiling reference standards against which automated indexing systems are compared. However, the task is vague and produces substantial variation among annotators. For example, the sentence "Patient has severe left-sided chest pain" could result in several different annotations, including "pain," "chest pain," "left-sided chest pain," and "severe left-sided chest pain." We created guidelines detailing medical and linguistic instructions about what text to include in annotations of clinical concepts and measured agreement between two annotators using the guidelines. I will present our results and describe future plans for the guidelines.
Beatriz will talk about her research and results on applying co-training and self-training to spoken dialogue data.
Dr. Dan Gildea from the University of Rochester will be giving a talk on his recent work on March 18th at noon.
Syntactic Structure and Statistical Machine Translation
Given that statistical methods have revolutionized both
natural language parsing and machine translation, it may
seem surprising that most current statistically-based
translation systems make no use of syntactic structure.
I will describe work on models of translation that aim
to fill this gap, presenting results for models that
make use of syntactic information provided for one or
both languages, as well as models that infer structure
directly from parallel bilingual text. I will also
describe the use of syntactic information for the
automatic evaluation of machine-produced translations.
Please sign up for a slot to meet with Dan
9:45 -- 10:00 Rebecca (SENSQ 5421)
10:00 -- 10:30 Behrang (SENSQ 5503)
10:30 -- 11:00 Paul, Swapna, and Jason (SENSQ 5422)
11:00 -- 11:15 Rebecca Part Deux (SENSQ 5421)
11:15 -- 11:45 Daqing and Hua (Cheng) (SENSQ 5111)
11:45 -- 12:00 Talk prep
12:00 -- 1:15 Talk (SENSQ 5317)
1:15 -- 2:45 Lunch (with Rebecca, Lillian, Oren, Bo, Diane?, Mihai)
2:45 -- 3:15 Diane (SENSQ 5105)
3:15 -- 3:45 Amruta and Hua (Ai) (SENSQ 5108)
3:45 -- 4:15 Theresa (SENSQ 5422)
4:15 -- 4:45 Mihai and Beatriz (SENSQ 5420)
Dinner at 6pm (with Jan, Joel, Rebecca)
One of the purposes of our NLP meeting is to have an opportunity to read
and discuss new research. Art will briefly present, then lead discussion
of the paper: "Toward a mechanistic psychology of dialogue" by Martin
Pickering and Simon Garrod. This paper proposes a mechanism of automatic
alignment, by which two dialogue partners come to use similar semantic,
syntactic, lexical and phonological representations. The resulting
alignment simplifies the production and comprehension of dialogue.
Art will forward a PDF to the NLP mailing list. If you don't get your copy, please let him know.
December 13: NLP meeting.
I'll be using the NLP meeting to give a practice talk for my dissertation defense on the 17th. All are welcome!
Tatiana Gavrilova
Visiting Fulbright Scholar
Informal Ontology Design
Bo Pang
Cornell University
Title: A sentimental education: Sentiment analysis using
subjectivity summarization based on minimum cuts.
Abstract
Sentiment analysis, which seeks to identify the viewpoint(s)
underlying a text span, has recently attracted a great deal
of attention. Automatic analysis of such information can be
helpful for business intelligence applications, recommender systems,
and editorial sites. One example application is to determine
a review's sentiment polarity ("thumbs up" or "thumbs down").
In particular, we consider the domain of movie reviews, which was
shown to be difficult for the polarity classification task in
previous work. We propose a novel machine-learning method to
first extract the subjective portions of the documents and then
apply text-categorization techniques to the resulting extracts
rather than to the entire reviews. Discarding the objective
portions of the review helps prevent the polarity classifier
from considering irrelevant or even potentially misleading text;
in addition, subjective extracts created in this process can be
presented to users as summaries of subjective content.
Our results show that the subjective extracts we create compactly
and accurately represent sentiment information: they are as informative
as the original documents while at the same time being 40% shorter.
Depending on the choice of downstream polarity classifier, using these
extracts can even lead to highly statistically significant improvement
for the polarity classification task. Also, we explore extraction
methods based on a minimum cuts formulation, which provides an efficient
and effective means for integrating inter-sentence-level contextual
information with traditional bag-of-words features.
This is joint work with Lillian Lee.
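The minimum-cut idea can be sketched as follows, assuming the usual two-terminal construction: each sentence gets an edge from a source s weighted by a subjectivity score and an edge to a sink t weighted by one minus that score, plus undirected association edges between sentences that should preferably receive the same label. The scores here are hypothetical inputs (in the paper they come from classifiers and proximity), and Edmonds-Karp is used only because it is short; any max-flow algorithm yields the same minimum cut.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def min_cut_source_side(capacity: Graph, source: Hashable, sink: Hashable) -> Set[Hashable]:
    """Edmonds-Karp max flow; returns nodes reachable from the source in the final residual graph."""
    residual: Graph = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual edge
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # source side of a minimum s-t cut
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def subjectivity_min_cut(ind_sub: List[float], assoc: Dict[Tuple[int, int], float]) -> List[int]:
    """Indices of sentences on the subjective (source) side of the minimum cut."""
    cap: Graph = {"s": {}, "t": {}}
    for i, p in enumerate(ind_sub):
        cap["s"][i] = p                       # paid if sentence i is labeled objective
        cap.setdefault(i, {})["t"] = 1.0 - p  # paid if sentence i is labeled subjective
    for (i, k), w in assoc.items():           # paid if i and k end up on different sides
        cap[i][k] = cap[i].get(k, 0.0) + w
        cap[k][i] = cap[k].get(i, 0.0) + w
    side = min_cut_source_side(cap, "s", "t")
    return sorted(i for i in range(len(ind_sub)) if i in side)
```

For example, a borderline sentence with score 0.45 is labeled objective on its own, but a strong association with a clearly subjective neighbor pulls it into the subjective extract.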
Tessa Warren (Psychology and LRDC), syntactic complexity and reference, details TBA
Paul Hoffmann
Title: Polarity in Context
Abstract: This talk describes an annotation scheme for marking the polarity of ons and expressive subjective elements in context and presents results of an annotation study.
Oren Kurland
Cornell University
Title: Corpus structure, language models, and ad hoc information retrieval
Abstract:
The fundamental principle of the language-modeling approach to ad hoc
information retrieval is that given a query, documents will be ranked
according to their estimated language models' similarity to that of the
query.
Most previous work on the language-modeling approach to ad hoc information
retrieval, however, focuses on document-specific characteristics, and
therefore doesn't take into account the structure of the surrounding corpus.
We propose a novel algorithmic framework in which information provided by
document-based language models is enhanced by the incorporation of
information drawn from clusters of similar documents.
In this talk, we will first present the framework and describe a suite of new
algorithms that are natural instantiations of it. Even the simplest typically
outperforms the standard language-modeling approach. We will then discuss
connections to other work such as latent-variable models and present
experimental results which show that our best-performing algorithms post
improvements with respect to state-of-the-art language-modeling-based
algorithms over various data corpora.
This is joint work with Lillian Lee.
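As a simplified illustration of enhancing a document language model with cluster information, assume query-likelihood ranking in which each document's unigram model is linearly interpolated with the model of a cluster of similar documents and with the corpus model. The interpolation weights and the way clusters are formed are hypothetical here; this is not one of the specific algorithms presented in the talk.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def ml_model(tokens: Sequence[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def query_log_likelihood(query: Sequence[str], doc: Sequence[str],
                         cluster_docs: List[Sequence[str]], corpus_docs: List[Sequence[str]],
                         lam_cluster: float = 0.3, lam_corpus: float = 0.2) -> float:
    """log p(query | doc) under a document/cluster/corpus interpolated unigram model."""
    p_doc = ml_model(doc)
    p_cluster = ml_model([w for d in cluster_docs for w in d])
    p_corpus = ml_model([w for d in corpus_docs for w in d])
    lam_doc = 1.0 - lam_cluster - lam_corpus
    score = 0.0
    for w in query:
        p = (lam_doc * p_doc.get(w, 0.0) + lam_cluster * p_cluster.get(w, 0.0)
             + lam_corpus * p_corpus.get(w, 0.0))
        score += math.log(p)  # p > 0 whenever the query term occurs somewhere in the corpus
    return score
```

The effect is that a document missing a query term can still score well if documents similar to it use that term, which is one way corpus structure can enter the ranking.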
Title: Opinions In Question Answering: Current Research Directions.
Abstract: This talk will describe current research directions in the ARDA AQUAINT project "Opinions in Question Answering". We will focus on our current research in extracting "opinion frames" to represent subjective expressions in text.
Ed Hovy, visiting from ISI. Starting time moved to 1 (or later).
Named Entity Extraction from Arabic Text
Behrang Mohit
In this presentation, I will talk about our ongoing work on the task of Named Entity (NE) Extraction from Arabic text. NE Extraction is a challenging task for Arabic since the language lacks word capitalization, an important feature for this task (for English, word capitalization alone can identify NEs with fairly high accuracy).
As a baseline, we consider porting a system that learns to classify English NEs (Collins and Singer, 1999) to Arabic. Under this framework, the system first identifies candidate NEs syntactically (using parse tree rules) and then classifies them, without supervision, into different named entity classes. Due to linguistic differences between English and Arabic, a direct application of this approach does not yield as good a result for Arabic as it did for English.
We are currently improving the coverage of this model by adding richer syntactic information. The early part of the talk will include an introduction to the Arabic language structure and some of the major challenges that exist in working with this language.
Magnitude Estimation is a technique originally used in psychophysics to measure judgments of sensory stimulation, for instance, brightness and loudness. However, studies in linguistics have shown that magnitude estimation can reliably be used to make other judgments of scale, such as how grammatical a sentence is. Magnitude estimation has also been used to judge document relevance, the plausibility of adjective-noun pairs, and the politeness of spoken Japanese. In my talk, I will give an introduction to magnitude estimation and discuss its possible usefulness for the task of annotating the intensity of opinions, emotions, and other private states in text.
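Because each annotator chooses their own numeric range, magnitude-estimation judgments are usually normalized before they are compared. The sketch below assumes one common scheme: divide each judgment by the annotator's rating of the reference (modulus) item, then take logarithms. The function name is hypothetical, and the talk does not specify a normalization.

```python
import math

def normalize_magnitude_estimates(judgments, modulus):
    """Divide each judgment by the annotator's modulus judgment and take log10,
    so ratio judgments become differences on a shared scale."""
    if modulus <= 0 or any(j <= 0 for j in judgments):
        raise ValueError("magnitude estimates must be positive numbers")
    return [math.log10(j / modulus) for j in judgments]

# Two annotators using very different numeric ranges (made-up ratings).
a = normalize_magnitude_estimates([20, 5, 10], modulus=10)
b = normalize_magnitude_estimates([200, 50, 100], modulus=100)
print(a == b)  # True
```

After normalization, "twice as intense as the reference" maps to the same value for both annotators, whatever numbers they chose.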
Daqing He
In this talk I will describe our participation (as a joint University of Maryland and Johns Hopkins University team) in the TREC High Accuracy Retrieval from Documents (HARD) track in both 2003 and 2004. I will first introduce the HARD experimental setting, then talk about interactive relevance feedback in the HARD framework. I will also describe building a passage retrieval module for identifying sub-document units that are highly relevant to users' queries. Our passage retrieval module was among the best in the track last year, but it is still far from matching human performance. Finally, I will discuss some other interesting areas that HARD is trying to explore beyond plain batch retrieval of news articles.
Mihai Rotaru
In this talk I will present our ongoing work in developing features and models for detecting student emotional states, given only information available during a spoken tutoring dialogue. Prior research has primarily focused on the use of turn-level prosodic features as predictors. We extend the turn-level prosodic feature set used in our previous studies, and additionally apply the same set of features at the word level. Even under a simplifying word-level emotion model, our preliminary results show an improvement in prediction using word-level features compared to turn-level features.
In this meeting, I will talk about my Master's thesis, which I recently
completed at the University of Minnesota. This talk will essentially be
the same as my thesis defense. The thesis report and defense slides
are available online at http://www.cs.pitt.edu/~amruta/pubs.html
Title: "Unsupervised Word Sense Discrimination by Clustering Similar
Contexts."
Abstract: Word sense discrimination is the problem of identifying
different contexts that refer to the same meaning of an ambiguous
word. For example, given multiple contexts that include the word
'sharp', we would hope to discriminate between those that refer to an
intellectual sharpness versus those that refer to a cutting sharpness.
Our methodology is based on the strong contextual hypothesis of Miller
and Charles (1991), which states that "two words are semantically
related to the extent that their contextual representations are
similar."
This thesis presents corpus-based unsupervised solutions that
automatically group together contextually similar instances of a word
as observed in raw text. We do not use any manually created or
maintained knowledge-rich resources such as dictionaries, thesauri
or annotated corpora. As a result, our approach is well suited to the
fluid and dynamic nature of word meanings. It is also portable to
different domains and languages, and scales easily to larger samples
of text.
The overall objective of this thesis is to study the effect of various
feature types, context representations and clustering methods on the
accuracy of sense discrimination. We also apply dimensionality
reduction techniques to capture conceptual similarities among the
contexts, rather than relying only on the surface forms of words in the text.
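A toy Python sketch of the general idea follows. Each occurrence of the target word is represented by the words around it, similarity is measured with the cosine, and occurrences are grouped by average-link agglomerative clustering. The window size, the first-order bag-of-words features, the clustering method, and all function names are assumptions for illustration. The thesis compares several feature types, context representations and clustering methods, and this code does not reproduce any particular one of them.

```python
import math
from collections import Counter

def context_vector(context, target, window=3):
    """Bag-of-words vector of the words within `window` positions of the target."""
    i = context.index(target)
    return Counter(context[max(0, i - window):i] + context[i + 1:i + 1 + window])

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_contexts(vectors, k):
    """Average-link agglomerative clustering of context indices into k groups."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[x] for j in clusters[y]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (x, y)
        x, y = pair
        clusters[x].extend(clusters[y])  # merge the most similar pair
        del clusters[y]
    return clusters

contexts = [
    "cut the bread with a sharp knife".split(),
    "the sharp knife can cut bread".split(),
    "a sharp mind solves hard problems".split(),
    "hard problems need a sharp mind".split(),
]
vectors = [context_vector(c, "sharp", window=10) for c in contexts]
print(cluster_contexts(vectors, k=2))  # [[0, 1], [2, 3]]
```

The two groups correspond to the cutting and intellectual senses of "sharp" in the abstract's example. With realistic data, sparse surface overlap is exactly the weakness that the dimensionality-reduction step is meant to address.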
This is Diane's practice talk for the ACL conference.
Abstract: We examine the utility of speech and lexical features for automatically predicting student emotions in human-computer spoken tutoring dialogues. We first annotate student turns for negative, neutral, positive and mixed emotions.
We then extract acoustic-prosodic features from the speech signal, and lexical
items from the transcribed or recognized speech. We compare the results of
machine learning experiments using these features alone or in combination to
predict various categorizations of the annotated emotions. Our best results yield a 19-36% relative reduction in error over a baseline. Finally, we compare our results with results for predicting emotion in human-human dialogues.
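The relative error reduction figures quoted in these abstracts compare error rates (1 minus accuracy) rather than raw accuracies. A small helper with a hypothetical name makes the arithmetic explicit:

```python
def relative_error_reduction(baseline_acc, system_acc):
    """Share of the baseline's errors removed by the system (accuracies in [0, 1])."""
    baseline_err = 1.0 - baseline_acc
    if baseline_err <= 0:
        raise ValueError("baseline already has no errors")
    return (baseline_err - (1.0 - system_acc)) / baseline_err

# Hypothetical numbers: 60% -> 70% accuracy cuts the error rate from 40% to 30%.
print(round(relative_error_reduction(0.60, 0.70), 4))  # 0.25
```

A 10-point accuracy gain over a 60% baseline therefore counts as a 25% relative error reduction, which is why these figures look larger than the raw accuracy differences.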
Carol Nichols
Karina Ivanetich
We are sharing this date to present our DMP summer research.
Carol's project creates a test bed for collecting word alignment data from bilingual speakers of English and Chinese for use in machine translation. The program will also record how confident the annotators are in their alignments and how long each alignment took them; this information should be useful to researchers studying machine translation and word alignment.
Karina's Abstract:
Some languages (such as English) are rich in annotated resources, while many other languages have little or no annotated data. In addition, human annotation, although highly accurate, is costly in terms of both time and money. Researchers have created systems that use well-annotated languages to project POS tags onto other languages. However, the projected tags have often been inaccurate. David Yarowsky and Grace Ngai have extended traditional projection algorithms and, for English-to-French projections, have obtained much higher levels of accuracy.
In my work here this summer, I will attempt to replicate their results, this time for English-to-Chinese projections. Since translation issues differ between these two language pairs, I expect that I will need to improve the model to better serve English-to-Chinese projections. My presentation will discuss this proposal as well as current progress.
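The starting point for projection is copying each source word's tag onto the target words it is aligned to. The sketch below shows only this naive direct projection. It assumes word alignments are given as (source index, target index) pairs. The function name and the tie-breaking rule are hypothetical, and the improvements by Yarowsky and Ngai are not modeled.

```python
from collections import Counter, defaultdict

def project_tags(src_tags, tgt_len, alignment):
    """Directly project POS tags through (src_index, tgt_index) alignment links.
    Unaligned target words get None; a target word aligned to several source
    words takes their most frequent tag (ties: first encountered)."""
    votes = defaultdict(Counter)
    for s, t in alignment:
        votes[t][src_tags[s]] += 1
    return [votes[t].most_common(1)[0][0] if t in votes else None
            for t in range(tgt_len)]

src_tags = ["PRP", "VBD", "DT", "NN"]    # "I saw the dog"
alignment = [(0, 0), (1, 1), (3, 2)]      # hypothetical links to a 3-word target
print(project_tags(src_tags, 3, alignment))  # ['PRP', 'VBD', 'NN']
```

Unaligned words (here the English determiner, which has no counterpart in the target) and noisy alignments are the main reasons direct projection tends to be inaccurate.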
Theresa Wilson, presenting joint work with Janyce Wiebe and Rebecca Hwa.
This will be a practice talk for AAAI 2004.
Abstract: There has been a recent swell of interest in the automatic identification and extraction of opinions and emotions in text.
In this paper, we present the first experimental results classifying the strength of opinions and other types of subjectivity and classifying the subjectivity of deeply nested clauses. We use a wide range of features, including new syntactic features developed for opinion recognition. In 10-fold cross-validation experiments using support vector regression, we achieve improvements in mean-squared error over baseline ranging from 57% to 64%.
Kappa is the primary statistic used in NLP research to evaluate agreement among raters. However, the kappa statistic has many problems. In this talk I will discuss kappa and how other statistics can account for problems that kappa does not address. I will also describe how to calculate a generalizability coefficient that measures the reliability or reproducibility of a reference standard created from human raters. I will use data from a study we are currently evaluating to show how all of these agreement statistics can help answer the question, "How good is my reference standard?"
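For reference, Cohen's kappa for two raters corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The minimal Python version below uses a hypothetical function name. It also reproduces one well-known problem: when one label dominates, high raw agreement can still produce a kappa near zero.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))  # 0.5
# Prevalence problem: 90% raw agreement, yet kappa is 0.0 because "neg" dominates.
print(cohens_kappa(["neg"] * 9 + ["pos"], ["neg"] * 10))  # 0.0
```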
Rebecca Hwa
This is not so much a talk but a round-table discussion that I'd like to host. With the conference season fast approaching, it might be good for us to get together and trade ideas on giving presentations.
Diane Litman and Kate Forbes-Riley
We compare the learning gains from tutoring with spoken versus typed dialogue. In one experiment, the tutor was a human. In the other experiment, the tutor was a tutoring system. The main results of our study are that changing the modality from text to speech caused large differences in the learning gains, time and superficial dialogue characteristics of human tutoring, but for computer tutoring, it made less difference. (This is material that will be presented at the Intelligent Tutoring Systems Conference).
Jan Wiebe will describe a new project entitled Opinions in Question Answering. The project is part of the ARDA AQUAINT Question Answering program, and is joint with Claire Cardie at Cornell and Ellen Riloff at Utah. The goals of the project are to extract detailed information about opinions from text and then create summary representations of the opinions expressed about a topic in one or many documents.
Beatriz Maeireizo Tokeshi
On May 26th 2004, Beatriz will give a short talk about the poster submitted to ACL 2004, which resulted from research done in the CS PhD course 2002 (Research Experience in CS).
ABSTRACT
Natural Language Processing applications often require large amounts of annotated training data, which are expensive to obtain. In this paper we investigate the applicability of Co-training to train classifiers that predict emotions in spoken dialogues. To do so, we first applied the wrapper approach with Forward Selection and Naïve Bayes to reduce the dimensionality of our feature set. Our results show that Co-training can be highly effective when a good set of features is chosen.
The NLP Group will continue its weekly meetings throughout the summer. This week, we will meet to set up the talk schedule for the rest of the term.
Mark Core
University of Edinburgh
Monday May 10, 10:00
731 LRDC
This work is the first systematic investigation of initiative in
human-human tutorial dialogue. We studied initiative management in two
dialogue strategies: didactic tutoring and Socratic tutoring. We
hypothesized that didactic tutoring would be mostly tutor-initiative while
Socratic tutoring would be mixed-initiative, and that more student
initiative would lead to more learning (i.e., task success for the
tutor). Surprisingly, students had initiative more of the time in the
didactic dialogues (21% of the turns) than in the Socratic dialogues (10%
of the turns), and there was no direct relationship between student
initiative and learning. However, Socratic dialogues were more interactive
than didactic dialogues as measured by percentage of tutor utterances that
were questions and percentage of words in the dialogue uttered by the
student, and interactivity had a positive correlation with learning.
(The above is his EACL 2003 talk. Since that was a short talk,
if time permits he might also present some research that he is
presenting at HLT-NAACL...
Robustness versus Fidelity in Natural Language Understanding
A number of issues arise when trying to scale-up natural language
understanding (NLU) tools designed for relatively simple domains (e.g.,
flight information) to domains such as medical advising or tutoring where
deep understanding of user utterances is necessary. Because the subject
matter is richer, the range of vocabulary and grammatical structures is
larger, meaning NLU tools are more likely to encounter out-of-vocabulary
words or extra-grammatical utterances. This is especially true in medical
advising and tutoring where users may not know the correct vocabulary and
use common sense terms or descriptions instead. Techniques designed to
improve robustness (e.g., skipping unknown words, relaxing grammatical
constraints, mapping unknown words to known words) are effective at
increasing the number of utterances for which an NLU sub-system can produce
a semantic interpretation. However, such techniques introduce additional
ambiguity and can lead to a loss of fidelity (i.e., a mismatch between the
semantic interpretation and what the language producer meant). To control
this trade-off, we propose semantic interpretation confidence scores akin
to speech recognition confidence scores, and describe our initial attempt
to compute such a score in a modularized NLU sub-system.)
----
Short bio:
Mark received his Ph.D. from the University of Rochester under the supervision
of Len Schubert. The subject of his dissertation was dialog parsing; his
dialog parser identified speech repairs as well as the dialogue acts of
utterances. Since 2000, Mark has been a researcher at the University
of Edinburgh, working with Johanna Moore on the BEETLE tutorial dialogue
system. He built a natural language understanding module for BEETLE using
the CARMEL workbench, adding features such as unknown word handling and
semantic-confidence-score calculation. The second area of his research is
dialogue annotation and analysis, looking at phenomena such as initiative,
and dialogue acts and games.
Joel Tetreault
University of Rochester
Friday May 7, 1:30
731 LRDC
In a spoken dialog system, the job of a reference resolution module is to
identify noun phrases and resolve them to entities evoked in the dialogue.
This involves finding antecedents for pronouns such as "that" or "they" and
resolving definite noun phrases such as "the two hospitals" or "the ambulance
here." Though reference is just one part of the overall interpretation of
a sentence, it is a very important piece because failure to resolve the
entities in a sentence correctly can lead to an incorrect interpretation
of a sentence and thus an erroneous response to the user.
Many approaches to reference resolution, specifically pronoun resolution,
have relied heavily on syntactic and surface features. While these
methods are able to perform very well, such as resolving as much as 80% of
the pronouns in a large corpus correctly, the "20% gap" has been hard
to overcome because these pronouns require additional information beyond
syntactic features for resolution. In this talk I present work that
incorporates discourse structure and semantic features into a pronoun
resolution algorithm to improve performance over two types of corpora: a
newspaper domain (Penn Treebank) and human-human spoken dialogue.
Short Bio:
Joel Tetreault is in his final year of his PhD in Computer Science at the
University of Rochester. He received his bachelor's degree from Harvard
University in 1998 and Master's from Rochester in 2000. His main
interest is Natural Language Processing. He has done work in reference
resolution, discourse processing, spoken dialogue systems, and information
retrieval techniques for detecting affect.
Diane will give a practice talk (about 20 minutes) of our HLT-NAACL paper:
==================================================================
TITLE
-----
Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources
==================================================================
AUTHORS
-------
Kate Forbes-Riley and Diane Litman
==================================================================
ABSTRACT
--------
We examine the utility of multiple types of turn-level and contextual
linguistic features for automatically predicting student emotions in
human-human spoken tutoring dialogues. We first annotate student
turns in our corpus for negative, neutral and positive emotions. We
then automatically extract features representing acoustic-prosodic and
other linguistic information from the speech signal and associated
transcriptions. We compare the results of a variety of machine
learning experiments using different feature sets to predict the
annotated emotions. Our best performing feature set contains both
acoustic-prosodic and other types of linguistic features, extracted
from both the current turn and a context of previous student turns.
This feature set yields a prediction accuracy of 84.75%, which is a
44% relative reduction in error over a baseline. Our
results suggest that the intelligent tutoring spoken dialogue system
we are developing can be enhanced to automatically predict and adapt
to student emotions.
On April 7, Jan will summarize the AAAI Spring Symposium on
Exploring Attitude and Affect in Text: Theories and Applications
Regina Barzilay will be a guest speaker in the Department of Computer Science colloquium series. She will be here on both 4/1 and 4/2.
NOTE: The talk is on Thurs. afternoon (4/1), not Friday morning
>
> What: Learning to Model Text Structure
> When: 4/1 at 3:30pm, refreshments at 3
> Where: SENSQ 5317/9
>
> Talk abstract:
>
> The natural language processing community has struggled for years to
> develop computational models of text structure. Such models are essential
> both for interpretation of human-written text and for evaluation of
> machine-generated text. Applications such as text summarization and
> machine translation would greatly benefit from such models.
>
> In this talk, I will present our first steps towards learning to model
> text structure. I will describe two models that are induced from a large
> collection of unannotated texts. The first model captures the notion of
> text cohesion by considering connectivity patterns characteristic of
> well-formed texts. These patterns are inferred from a matrix that
> combines distributional and syntactic information about text entities. The
> second model captures the content structure of texts within a specific
> domain, in terms of the topics the texts address and the order in which
> these topics appear. I will present an effective method for learning
> content models, utilizing a novel adaptation of algorithms for Hidden
> Markov Models. To conclude my talk, I will show how these text models can
> be effectively integrated into natural language generation and
> summarization systems.
>
> This is joint work with Mirella Lapata and Lillian Lee.
>
>
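The "matrix" in the first model can be pictured in the spirit of an entity grid. Rows are sentences, columns are entities, and each cell records the entity's grammatical role (subject, object, other) or its absence. The minimal sketch below assumes entities and their roles have already been extracted (for instance by a parser and a coreference step, not shown). It turns a text into role-transition frequencies that a learner could use as coherence features. All names and the role inventory are hypothetical simplifications.

```python
from collections import Counter
from itertools import product

ROLES = "SOX-"  # subject, object, other role, absent

def entity_grid(sentences):
    """Each sentence is a dict entity -> role ('S', 'O' or 'X').
    Returns the sorted entity list and a grid with one row per sentence."""
    entities = sorted({e for sent in sentences for e in sent})
    grid = [[sent.get(e, "-") for e in entities] for sent in sentences]
    return entities, grid

def transition_features(grid):
    """Relative frequency of each two-sentence role transition over all entity columns."""
    counts = Counter()
    for column in zip(*grid):
        for a, b in zip(column, column[1:]):
            counts[a + b] += 1
    total = sum(counts.values())
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(ROLES, repeat=2)}

sentences = [{"Microsoft": "S", "market": "O"},
             {"Microsoft": "O", "earnings": "S"},
             {"Microsoft": "S"}]
entities, grid = entity_grid(sentences)
features = transition_features(grid)
print(features["SO"], features["--"])  # each transition occurs once out of six
```

Well-formed texts tend to keep salient entities in prominent roles across adjacent sentences, so distributions over transitions such as "SS" or "SO" carry information about cohesion.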
Theresa Wilson and Janyce Wiebe
We present work investigating the topic dependence of words and phrases that have been used in automatic opinion and sentiment recognition. This work is based on machine learning experiments in opinion recognition using topics for cross validation instead of random splits of the data. We find that the clues from previous work are very robust to changes in topic. Surprisingly, while bag-of-words features are not as robust, they do not degrade as much as expected. The best results are obtained when all clues are combined.
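Splitting cross-validation folds by topic, rather than randomly, keeps every document about the held-out topic out of training. A minimal sketch follows. It assumes one fold per topic and a mapping from document IDs to topic labels; both are hypothetical choices, and the experiments may have grouped topics differently.

```python
from collections import defaultdict

def topic_folds(doc_topics):
    """Yield (topic, train_ids, test_ids): one fold per topic, in which every
    document on that topic is held out for testing."""
    by_topic = defaultdict(list)
    for doc_id, topic in doc_topics.items():
        by_topic[topic].append(doc_id)
    for topic in sorted(by_topic):
        test_ids = by_topic[topic]
        train_ids = [d for t, ids in by_topic.items() if t != topic for d in ids]
        yield topic, train_ids, test_ids

doc_topics = {"d1": "economy", "d2": "economy", "d3": "sports", "d4": "elections"}
for topic, train, test in topic_folds(doc_topics):
    print(topic, train, test)
```

Unlike random splits, no fold can reward clues that merely memorize topic-specific vocabulary, which is what makes this setup a test of robustness to topic change.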
Kate Forbes-Riley and Diane Litman
We present an annotation scheme for student emotions in tutoring dialogues.
Analyses of our scheme with respect to interannotator agreement and predictive accuracy
indicate that our scheme is reliable in our domain, and that our emotion
labels can be predicted with a high degree of accuracy.
We discuss issues concerning the implementation of emotion
prediction and adaptation in the computer tutoring dialogue system we are developing.
Beatriz Maeireizo-Tokeshi presents the work she did while interning in Japan.
Sarah Kura, Jan Wiebe, and Theresa Wilson discuss the latest developments in their work on annotating attitude types.
Diane Litman, Mihai Rotaru, Behrang Mohit, Yanna Shen, Art Ward present the results for the reading comprehension question-answering projects for the Fall 2003 NLP Class.
Rebecca Crowley and Kevin Mitchell
Medical Reports are an important and fertile area for Natural Language
Processing. Information from these free-text documents would be extremely
valuable if it could be automatically extracted and combined with other
data. However, Information Extraction from medical text poses significant
challenges. We describe the early development of a system for Information
Extraction from Surgical Pathology Reports, documents that contain
essential data related to cancer diagnosis and prognosis. It includes a
GATE implementation of NegEx - Wendy Chapman's algorithm for negation
detection. We will spend the first half of the talk describing our system
and detailing an evaluation of the Negation tagger compared to a
human-annotated corpus of negations. In the second half of the talk -
we'll show you a set of human annotated examples of attribute:value pairs
and shamelessly solicit advice on how to best extract them.
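Below is a heavily simplified Python illustration of the NegEx idea. A finding is treated as negated if it starts within a few tokens after a negation trigger phrase. The trigger list and the five-token window are assumptions for this sketch. NegEx itself also uses post-negation triggers, pseudo-negation phrases and scope terminators, none of which are modeled here. This is not the GATE implementation described in the talk.

```python
import re

# Hypothetical, much-reduced trigger list (NegEx uses a larger curated lexicon).
PRE_NEGATION = ["no evidence of", "negative for", "without", "denies", "no"]
WINDOW = 5  # assumed maximum number of tokens between trigger and finding

def is_negated(sentence, finding):
    """True if `finding` starts within WINDOW tokens after a negation trigger."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = finding.lower().split()
    # Positions where the (possibly multi-word) finding begins.
    starts = [i for i in range(len(tokens) - len(target) + 1)
              if tokens[i:i + len(target)] == target]
    for trigger in PRE_NEGATION:
        trig = trigger.split()
        for i in range(len(tokens) - len(trig) + 1):
            if tokens[i:i + len(trig)] == trig:
                end = i + len(trig)  # first token after the trigger
                if any(end <= s <= end + WINDOW for s in starts):
                    return True
    return False

print(is_negated("Specimen is negative for malignancy.", "malignancy"))  # True
print(is_negated("Findings consistent with invasive carcinoma.", "carcinoma"))  # False
```

Evaluating even a rule set this small against a human-annotated corpus of negations quickly shows why NegEx needs pseudo-negations (for example, phrases that contain a trigger word without negating anything) and scope terminators.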
Speaker: Yanna Shen
Abstract:
Question answering has attracted growing interest in NLP in recent
years, but Chinese question answering systems still lag behind, so I am
interested in working on Chinese question answering.
This work was done with fellow students in the NLP Laboratory at
Northeastern University, China. We borrowed ideas from several
QA papers and built a small QA demo, then tried to apply these ideas
to a Chinese QA system.
In this talk, I will discuss the design of the demo and offer a few
observations on building a Chinese QA system.
Speaker: Behrang Mohit
Semantic Extraction is an NLP task that pertains to the assignment of
semantic bindings to short units of text (usually sentences). NLP problems
such as Information Extraction, Question Answering Systems and Text
Classification Systems could benefit from Semantic Extraction. We have
used two manually-built knowledge bases (WordNet and FrameNet) to automate
Semantic Extraction.
In my presentation, I will give an overview of the FrameNet project and
then talk about my work with Srini Narayanan on Semantic Extraction. I
presented this work last summer as a short paper at HLT-NAACL 2003. The
paper can be downloaded from:
http://www.cs.pitt.edu/~behrang/MohitNarayananHLT2003.pdf
Speakers: Diane Litman and Kate Forbes
Abstract:
We investigate the automatic classification of student emotional
states in a corpus of human-human spoken tutoring dialogues. We
first annotated student turns in this corpus for negative, neutral and
positive emotions. We then automatically extracted acoustic and
prosodic features from the student speech, and compared the results of
a variety of machine learning algorithms that use 8 different feature
sets to predict the annotated emotions. Our best results have an
accuracy of 80.53% and show 26.28% relative improvement over a
baseline. These results suggest that the intelligent tutoring spoken
dialogue system we are developing can be enhanced to automatically
predict and adapt to student emotional states.
This will be an early practice talk for a paper that
will be presented in December at ASRU.
Speaker: Wendy Chapman
Abstract
Biosurveillance systems use electronic patient medical information to
monitor for possible natural or bioterrorist outbreaks. Currently, the only
information used by these systems is a patient's triage chief complaint,
which is a short phrase describing the patient's reason for coming to an
emergency room. To monitor for specific diseases or syndromes like Severe
Acute Respiratory Syndrome (SARS) or pneumonia, more specific clinical
information needs to be gathered. That information is in free-text patient
reports.
I will describe a project I undertook this summer at the National Library
of Medicine, in which I applied MetaMap, an NLP indexing tool originally
created for the biomedical literature, to the task of identifying
respiratory findings in emergency department reports.
* 06/19: Kate Forbes (Annotating Emotion in Spoken Tutoring Dialogues: Working Session)
* 06/18: Lin Ma (Predicting Medical Reasoning Codings in Pathology Protocols using Natural Language Features: Master's Project Presentation)
* 04/28: Mihai Rotaru (Practice CoNLL03 talk)
* 04/14: Janyce Wiebe and Theresa Wilson (Learning Extraction Patterns for Subjective Expressions)
* 03/31: Kate Forbes (Preliminary Results from the ITSPOKE Spoken Tutorial Dialogue Corpus)
* 03/17: Mihai Rotaru (Comparing Command, Normal, and Hyperarticulated Speech)
* 03/10: Wendy Chapman
* 02/03: Janyce Wiebe (Improving Subjectivity Classification using Features Learned from Extraction Patterns)
* 11/14: Theresa Wilson (A First Exploration of Subjective Language in Spoken Dialogue)
* 10/31: Diane Litman and Scott Silliman (Spoken Dialogue for the Why2 Intelligent Tutoring System)
* 10/17: Mihai Rotaru (Typicality and Natural Language Learning)
* 10/03: Wendy Chapman (NLP in Medicine)
* 09/26: Theresa Wilson (Opinion Annotation in Newspaper Articles)