Sibyl, a factoid question answering system for spoken documents
Visualitza/Obre
Article principal (518,3Kb) (Accés restringit)
Sol·licita una còpia a l'autor
Què és aquest botó?
Aquest botó permet demanar una còpia d'un document restringit a l'autor. Es mostra quan:
- Disposem del correu electrònic de l'autor
- El document té una mida inferior a 20 Mb
- Es tracta d'un document d'accés restringit per decisió de l'autor o d'un document d'accés restringit per política de l'editorial
Cita com:
hdl:2117/17486
Tipus de documentArticle
Data publicació2012
Condicions d'accésAccés restringit per política de l'editorial
Tots els drets reservats. Aquesta obra està protegida pels drets de propietat intel·lectual i
industrial corresponents. Sense perjudici de les exempcions legals existents, queda prohibida la seva
reproducció, distribució, comunicació pública o transformació sense l'autorització del titular dels drets
Abstract
In this article, we present a factoid question-answering system, Sibyl, specifically tailored for question
answering (QA) on spoken-word documents. This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken document scenario.
More specifically, we study new information retrieval (IR) techniques designed or speech, and utilize several levels of linguistic information for the speech-based QA task. These include named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and the use of coreference resolution.
Sibyl is largely based on supervised machine-learning techniques, with special focus on the answer extraction step, and makes little use of handcrafted knowledge. Consequently, it should be easily adaptable to other
domains and languages. Sibyl and all its modules are extensively evaluated on the European Parliament Plenary Sessions English corpus, comparing manual with automatic transcripts obtained by three different
automatic speech recognition (ASR) systems that exhibit significantly different word error rates. This data belongs to the CLEF 2009 track for QA on speech transcripts. The main results confirm that syntactic
information is very useful for learning to rank question candidates, improving results on both manual and automatic transcripts, unless the ASR quality is very low. At the same time, our experiments on coreference
resolution reveal that the state-of-the-art technology is not mature enough to be effectively exploited for QA with spoken documents. Overall, the performance of Sibyl is comparable or better than the state-of-the-art on this corpus, confirming the validity of our approach.
CitacióComas, P.R.; Turmo, J.; Marquez, L. Sibyl, a factoid question answering system for spoken documents. "ACM transactions on information systems", 2012, vol. 30, núm. 3, p. 19:1-19:40.
ISSN1046-8188
Versió de l'editorhttp://dl.acm.org/citation.cfm?id=2328972&bnc=1
Fitxers | Descripció | Mida | Format | Visualitza |
---|---|---|---|---|
TOIS-ctm.pdf | Article principal | 518,3Kb | Accés restringit |