This paper presents a new approach to spoken document information retrieval for spontaneous speech corpora. The classical approach to this problem combines an automatic speech recognizer (ASR) with standard information retrieval techniques based on terms or n-grams. However, state-of-the-art large-vocabulary continuous ASRs produce transcripts of spontaneous speech with a word error rate of 25% or higher, which is a drawback for retrieval techniques based on terms or n-grams. To overcome this limitation, our method is based on a sequence alignment algorithm drawn from the field of bioinformatics to search
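To illustrate why sequence alignment can tolerate recognition errors that break exact term matching, the following is a minimal sketch of Smith-Waterman local alignment over character sequences. It is only an assumption-laden illustration: the abstract does not name the specific alignment algorithm, the units being aligned, or the scoring scheme, so the function name `local_alignment_score` and the `match`, `mismatch`, and `gap` values are hypothetical and are not taken from PHAST.

```python
def local_alignment_score(query, text, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment (illustrative, not the paper's method).

    Returns (best_score, end_index), where end_index is the 1-based position
    in `text` at which the best-scoring local alignment ends.
    The scoring values are arbitrary assumptions chosen for the example.
    """
    prev = [0] * (len(text) + 1)  # previous row of the dynamic-programming table
    best_score, best_end = 0, 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(text) + 1)
        for j in range(1, len(text) + 1):
            # Score for aligning query[i-1] with text[j-1]
            diag = prev[j - 1] + (match if query[i - 1] == text[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best_score:
                best_score, best_end = curr[j], j
        prev = curr
    return best_score, best_end


# Usage: a query still scores well against a transcript containing a
# misrecognized (here, misspelled) form that exact term matching would miss.
score, end = local_alignment_score("retrieval", "spoken document retreival system")
print(score, end)
```

Ranking transcript regions by such a score, rather than requiring identical terms, is one way an alignment-based search could degrade gracefully as the word error rate rises.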
Citation: Comas, P.R., Turmo, J. "PHAST: Spoken document retrieval based on sequence alignment". 2008.