April 07, 2007

[NEWS] Best Paper Award

Congratulations to Kate Forbes-Riley, Mihai Rotaru, Diane Litman, and Joel Tetreault for receiving a Best Paper Award (Late-Breaking News category) at NAACL-HLT 2007 for "Exploring Affect-Context Dependencies for Adaptive System Development".
Posted by nlplab at 11:19 AM

November 30, 2006

[news] Congratulations to Greg Nicholas

Greg received an Honorable Mention for CRA's Outstanding Undergraduate Award for 2007!
Posted by nlplab at 02:01 PM

March 11, 2006

[News] Congratulations Art and Behrang

Congratulations to Art and Behrang, who were awarded Mellon Fellowships for the 2006-2007 academic year. Andrew Mellon Predoctoral Fellowships are awarded to students of exceptional ability and promise who are enrolled or wish to enroll at the University of Pittsburgh in programs leading to the Ph.D. in various fields of the humanities, the natural sciences and the social sciences.
Posted by nlplab at 03:44 PM

February 23, 2006

[News] Poster Award

Congratulations to Amruta for winning the Graduate People's Choice Poster Award at the 2006 CS Day! (The poster was called "Concept-Level Topic Analysis of Tutoring Dialogs".)
Posted by nlplab at 09:39 PM

June 04, 2005

[NEWS] Best Paper Award

This year's Speech Communication Best Paper Award goes to the paper:
Julia Hirschberg, Diane Litman, Marc Swerts, "Prosodic and Other Cues
to Speech Recognition Failures", Speech Communication, 43(1-2):155-176, 2004.

Posted by nlplab at 01:57 PM

May 19, 2005

[NEWS] Congratulations Mihai

Congratulations to Mihai on passing his comprehensive exam today!
His reading lists, writeups, and presentation can be found
online.

Posted by nlplab at 01:56 PM

March 23, 2005

[LDC] new corpora

In this month's newsletter, the LDC would like to announce the availability of a new LDC Online service and the release of three new corpora.

------------------------------------------------------------------------

The LDC is pleased to announce that an improved LDC Online service is now available. LDC Online can be accessed at the following URL:

https://online.ldc.upenn.edu/login.html

Organizations that hold 2005 Membership in the LDC will be able to perform text searches on our entire English Gigaword corpus. This corpus is a comprehensive archive of newswire text data that has been acquired over several years by the LDC. Current members will also be able to access the American English Spoken Lexicon (AESL). AESL contains pronunciations in individual audio files for more than 50,000 of the most common words in English.

Even if your organization is not a current member, you can access LDC Online through a guest account. As a guest, an LDC Online user will be able to access the American English Spoken Lexicon.

We will offer periodic updates to LDC Online to include new corpora and search functions. Please check in with us often as we anticipate this will be an exciting offering.

------------------------------------------------------------------------

ACE 2004 Multilingual Training Corpus contains the complete set of English, Arabic and Chinese training data for the 2004 Automatic Content Extraction (ACE) technology evaluation. The objective of the ACE program is to develop automatic content extraction technology to support automatic processing of human language in text form.
Sites were evaluated on system performance in six areas: Entity Detection and Recognition (EDR), Entity Mention Detection (EMD), EDR Co-reference, Relation Detection and Recognition (RDR), Relation Mention Detection (RMD), and RDR given reference entities. All tasks were evaluated in three languages: English, Chinese and Arabic. ACE 2004 Multilingual Training Corpus is distributed on one CD-ROM.
2005 Subscription Members will automatically receive two copies of this corpus. 2005 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$3000.

*

Chinese News Translation Text Part 1 supports the development of automatic machine translation systems. The LDC was sponsored to solicit English translations for a single set of Chinese source materials.

The source Chinese text and its English translations were selected and translated in different LDC projects. A total of about 474K Chinese characters were selected from two sources, namely Xinhua and AFP, and translation services were provided by seven translation agencies. Each Chinese news story was translated once. Chinese News Translation Text Part 1 is distributed via ftp.

2005 Subscription Members will automatically receive two copies of this corpus on CD-ROM. 2005 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$1500.

*

Discourse Graphbank aims to define a descriptively adequate data structure for representing discourse coherence structures. This project also investigates the impact of discourse coherence structures on other linguistic processes and natural language applications (e.g. anaphor resolution, summarization, information retrieval), and seeks to develop and test discourse parsing algorithms. The data consists of 135 texts from AP Newswire and Wall Street Journal, annotated with coherence relations. The source for the data is TIPSTER Complete (LDC93T3A). Discourse Graphbank is distributed via ftp.

2005 Subscription Members will automatically receive two copies of this corpus on CD-ROM. 2005 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$200.

------------------------------------------------------------------------

If you need further information, or would like to inquire about membership in the LDC, please email ldc@ldc.upenn.edu or call +1 215 573 1275.


Linguistic Data Consortium        Phone: (215) 573-1275
University of Pennsylvania        Fax: (215) 573-2175
3600 Market St., Suite 810        ldc@ldc.upenn.edu
Philadelphia, PA 19104            http://www.ldc.upenn.edu

Posted by hwa at 03:04 PM

March 13, 2005

[NEWS] Hot Article

Hirschberg, Litman and Swerts 2004 is currently the hottest article in Speech Communication.

Posted by nlplab at 09:22 PM

February 18, 2005

[NEWS] Congratulations Carol and Mihai

At CS Day on February 18, 2005, Carol Nichols won the best undergraduate award, and Mihai Rotaru won the graduate research competition award. Great job to both of you!

Posted by nlplab at 09:18 PM

December 01, 2004

[news] Congratulations to Carol Nichols

Carol received an Honorable Mention for CRA's Outstanding Undergraduate Award for 2005! Great job, Carol!

Posted by hwa at 01:13 PM

September 16, 2004

[news] Conference Info Page

We now have a conference page with basic information about important conference dates. The page is at: http://nlp.cs.pitt.edu/dates.htm

I have also marked the blog for two of the conference deadlines (ACL & AAAI). Please feel free to add information to the conference page and also the blog as new information about conferences arrives.

Posted by behrang at 03:29 PM

September 13, 2004

[news] new meeting time

The meeting time for the Fall term has changed to Mondays, 12:30 -- 2pm.
The next meeting will be on Sept. 13th.

Posted by nlplab at 01:05 PM

September 01, 2004

[NEWS] Fall 2004 Kickoff Meeting

We will have an organizational meeting Wednesday, September 1, 12:15, Room 6329.

Posted by litman at 10:00 AM

June 28, 2004

Presentations

I have created an internal page which will host presentations and related materials for our weekly meetings. The page address is:

http://nlp.cs.pitt.edu/presentations/

Posted by behrang at 11:56 AM

June 05, 2004

Conference Papers Repository

We now have a repository for NLP-related conference papers.

Please note that due to copyright restrictions, this page is only accessible within the relevant subdomains of pitt.edu (cs, isp, lrdc, etc.).

If you're not able to access the page from your domain (inside Pitt), please contact Behrang.

Posted by behrang at 04:20 PM

June 04, 2004

[NEWS] Natural Language Tutoring Article and Press Release

See the "Teaching Computers to Teach Like Humans" article in the June 7, 2004 Pitt Chronicle!

http://www.discover.pitt.edu/media/pcc/comps_like_humans.html

Also visit Pitt's website (www.pitt.edu) to see a press release from June 3, 2004 about our research.
The text of the release is also below.


FOR IMMEDIATE RELEASE

June 3, 2004

Contact: Patricia Lomando White

412-624-9101

laer@pitt.edu

Pitt Researchers Developing Computers That Teach Like Humans

Natural language recognition key to improved tutoring by machines

PITTSBURGH—While new federal education rules emphasizing testing and standards have fueled a tutoring boom, relatively few pupils enjoy access to effective but costly one-on-one teaching. In an effort to spread the intellectual wealth, scientists at the University of Pittsburgh’s Learning Research and Development Center (LRDC) are working to bring individual instruction to all students.
With $2.5 million from the National Science Foundation (NSF), principal investigator (PI) Kurt VanLehn, a Pitt computer science professor and LRDC senior scientist, is working to build less expensive computer tutors as good as their more expensive human counterparts. Looking specifically at the best ways to teach and learn physics, VanLehn and his colleagues are probing both tutor and student behavior.
“The computer tutors available in stores today just tell you if your answer is right or wrong,” VanLehn said. “With a human tutor, though, students can do much more,” including discussing their reading with the tutor and getting help solving longer, more complex problems.
A major difference between human and computer tutors has been that only human tutors understand unconstrained natural language—the conversational, open-ended give-and-take that can often flummox the smartest software.
Today, commercial educational technology involves two response formats: multiple choice and mathematical formulas. If all goes as planned, a tutoring program should be on the market in five to 10 years that can handle open-ended questions and analyze the students’ text or speech responses.

The LRDC team’s basic approach to improving computer tutoring is to simply study and learn from interactions between humans and computer tutors. As more effective dialogue strategies are identified, they will be incorporated into a natural language-based tutoring system.

LRDC’s new tutoring venture builds on a recently completed five-year, $5 million NSF-funded Center for Interdisciplinary Research on Constructive Learning Environments, led by VanLehn. The center developed several prototypes of natural language tutoring systems both at LRDC and at Carnegie Mellon University. The center also developed tools for building more such tutors.

Capitalizing on LRDC’s ability to attract and link researchers from a wide variety of disciplines, the computer tutor study includes researchers specializing in the cognitive psychology of human tutoring, the technology of natural language processing, and the design of effective tutoring systems.
The Co-PIs are Diane J. Litman, a Pitt computer science professor and LRDC research scientist; Michelene Chi, a Pitt psychology professor and LRDC senior scientist; Pamela W. Jordan, an LRDC research associate; and Carolyn P. Rose, a research scientist at Carnegie Mellon.
The group’s grant is administered under NSF’s Information Technology Research program, which supports innovative multidisciplinary research that extends the frontiers of information technology, leads to new and unanticipated technologies, creates revolutionary applications, or provides alternative approaches to complete important activities.
###
6/3/04/tmw

Posted by litman at 10:57 AM

May 26, 2004

[LDC] latest available corpora and news

Anyone interested in obtaining any of these corpora, please leave a comment.


Introducing: The LDC Institute

Membership Year 2004 in Review

LDC2004S04 2002 NIST Speaker Recognition Evaluation (SRE)
LDC2004T11 Arabic Treebank: Part 3 v 1.0
LDC2004S05 ISL Meeting Corpus Speech Part 1
LDC2004T10 ISL Meeting Corpus Transcripts Part 1

*

In this month's update, the Linguistic Data Consortium (LDC) would like to introduce the LDC Institute, review Membership Year 2004, and announce the availability of four new corpora.


*

(1) For the past two years, the LDC has hosted the LDC Institute, a seminar series on issues in language data and database creation. The goals of the series are to create a forum to communicate experience in data collection, standards, and annotation, and to work with researchers and others who may be interested in LDC data or who may wish to contribute new data to the archives. Past presentation topics have ranged from information extraction from biomedical texts to the Pennsylvania Sumerian Dictionary project to interfaces for parser and dictionary access.
We would like to invite the LDC community to learn more about this seminar series by visiting the LDC Institute project page. Future newsletters will contain additional information on specific presentations.

*

(2) Each year the LDC strives to provide a rich and diverse array of corpora for LDC members and nonmembers. Membership Year 2004 is shaping up to be no different! In the last few months, we have released 9 publications including treebanks in Arabic and Chinese, English meeting data, and Czech broadcast news. Namely, these corpora are:


LDC2004T02 Arabic Treebank: Part 2 v 2.0
LDC2004T05 Chinese Treebank Version 4.0
LDC2004S01 Czech Broadcast News Speech
LDC2004T01 Czech Broadcast News Transcripts
LDC2004S02 ICSI Meeting Speech
LDC2004T04 ICSI Meeting Transcripts
LDC2004L01 Klex: Finite-State Lexical Transducer for Korean
LDC2004T03 Morphologically Annotated Korean Text
LDC2004T09 TIDES Extraction (ACE) 2003 Multilingual Training Data

For further information on each of the above, please visit:

http://www.ldc.upenn.edu/Catalog/ByYear.jsp#2004

*


(3) The 2002 NIST Speaker Recognition Evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. They are intended to be of interest to all researchers working on the general problem of text independent speaker recognition.
The 2002 NIST Speaker Recognition Evaluation main data was extracted from Switchboard Cellular Part 2. The extended data task used two phases of Switchboard II, phases 2 and 3. This evaluation also included the first multi-modal task, using data from the FBI voice database. There are a total of 9153 speech files in SPHERE format, for a total of ~156 hours. 2002 NIST Speaker Recognition Evaluation is distributed on 2 DVDs.

For further information, including a link to the 2002 NIST Speaker Recognition Evaluation website, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S04

Institutions that have membership in the LDC for the 2004 Membership Year will be able to receive this corpus free of charge. Nonmembers may license this data for US$1000.

*

(4) Arabic Treebank: Part 3 v 1.0 is the third part of a corpus of 1,000,000 words of Arabic Treebank, designed to support language research and development of language technology for Modern Standard Arabic. This corpus includes 600 stories from the An Nahar News Agency. There are a total of 340,281 words (counting non-Arabic tokens such as numbers and punctuation) in the 600 files - one story per file. New features of annotation include complete vocalization (including case endings), lemma IDs, and more specific POS tags for verbs and particles.

The corpus contains 293,035 Arabic-only word tokens (prior to the separation of clitics), of which 290,842 (99.25%) were provided with an acceptable morphological analysis and POS tag by the morphological parser, and 2,193 (0.75%) were items that the morphological parser failed to analyze correctly. Arabic Treebank: Part 3 v 1.0 is distributed on 1 CD.

For further information, including online documentation, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T11

Institutions that have membership in the LDC for the 2004 Membership Year will be able to receive this corpus free of charge. Nonmembers may license this data for US$3000.


*

(5) ISL Meeting Speech Part 1 is the first subset of the ISL Meeting Corpus (112 meetings). It contains 18 meetings collected at the Interactive Systems Laboratories at Carnegie Mellon University. The recorded meetings were either natural meetings where participants needed to meet in the real world, or artificial meetings, which were designed explicitly for the purposes of data collection but still had real topics and tasks. The duration of the meetings in this corpus ranges from 8 to 64 minutes and averages 34 minutes. Word-level orthographic transcriptions are available as ISL Meeting Transcripts Part 1.

ISL Meeting Speech Part 1 includes 105 speech files, for a total of approximately 10 hours of meeting speech. There are a total of 31 unique speakers in the corpus. Meetings involved anywhere from 3 to 9 participants, averaging 5. The corpus contains a significant proportion of non-native English speakers, varying in fluency. ISL Meeting Speech Part 1 is distributed on 2 DVDs.

For further information, including a link to the ISL Meeting Room project page, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S05

Institutions that have membership in the LDC for the 2004 Membership Year will be able to receive this corpus free of charge. Nonmembers may license this data for US$1500.


*


(6) The ISL Meeting Transcripts Part 1 is the corresponding transcription for ISL Meeting Speech Part 1. This corpus consists of 19 word-level transcripts of 18 meetings, time synchronized to digitized audio recordings. There are approximately 116,200 word tokens and 5,850 unique word types in the transcripts.

Transcriptions were prepared by means of the TransEdit transcription application. This application was developed for the transcription of multi-channel recordings and displays a synchronized multi-track view for all channels of a meeting, with listening and segmentation functions for each channel separately. ISL Meeting Transcripts Part 1 is distributed by ftp transfer.

For further information, including a sample transcript, please visit:

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T10

Institutions that have membership in the LDC for the 2004 Membership Year will be able to receive this corpus free of charge. Nonmembers may license this data for US$500.


*


If you need additional information or would like to inquire about membership in the LDC, please send email to ldc@ldc.upenn.edu or call (215) 573-1275.

----------------------------------------------------------------------------------------------------
Linguistic Data Consortium        Phone: 1 (215) 573-1275
University of Pennsylvania        Fax: 1 (215) 573-2175
3600 Market St., Suite 810        email: ldc@ldc.upenn.edu
Philadelphia, PA 19104-2653       www: http://www.ldc.upenn.edu


Posted by hwa at 01:23 PM | Comments (1)

April 04, 2004

February 17, 2004

GSR openings for upcoming terms

The Natural Language Processing (NLP) group at the University of
Pittsburgh has several GSR positions to fill, beginning Summer or Fall
2004.

Interested graduate students in Computer Science or Intelligent
Systems are invited to peruse our webpages (nlp.cs.pitt.edu), and to
apply directly to one or more of the following NLP faculty members,
each of whom is hiring:

Professor Rebecca Hwa (hwa@cs.pitt.edu), for the project
    "Semi-supervised Learning for Multilingual Processing"
    (www.cs.pitt.edu/~hwa/semi.htm)
Professor Diane Litman (litman@cs.pitt.edu), for the projects
    "Monitoring Student State in Tutorial Spoken Dialogue" and
    "Adding Spoken Language to a Text-Based Dialogue Tutor"
    (www.cs.pitt.edu/~litman/itspoke.html)
Professor Janyce Wiebe (wiebe@cs.pitt.edu), for the projects
    "Improving Subjectivity Analysis to Achieve High-Precision Information
    Extraction" and "Opinions in Automatic Question Answering"
    (www.cs.pitt.edu/~wiebe/projects.html)

To apply, send a statement of interest and your vita. Please send a separate
application to each faculty member whose project(s) you are interested in.
For full consideration, applications should be received no later than
March 1, 2004.

Posted by hwa at 09:29 PM

New LDC corpora

I received announcements for two new LDC corpora (info below). If you would like the lab to get either one (or both), please post a comment to this message.

Posted by hwa at 09:12 PM