NaniBD: a set of tools for transcribing and validating speech databases

Nogueiras Rodríguez, Albino; Moreno Bilbao, M. Asunción



Document type: Conference proceedings paper

Publication date: 1998

Publisher: European Language Resources Association (ELRA)

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Unless otherwise indicated, the contents of this work are subject to a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Abstract

This paper describes NaniBD, a set of tools designed for transcribing and validating speech databases, developed at the Signal Processing Group (GPS) of the Department of Signal Theory and Communications of the Polytechnic University of Catalonia (TSC/UPC). It was developed primarily to meet the need for a revision system to validate and annotate the Spanish corpus of SpeechDat (II) within the speech processing environment available at GPS. Nevertheless, NaniBD is designed as a general-purpose system that could fit any other database, language, or speech processing system. So far, the system has been used to revise some 200,000 speech files from three different corpora. In this paper we focus on the actual implementation used in the transcription of a Catalan corpus compatible with the SpeechDat (II) specifications. This corpus comprises 1000 speakers, each contributing 44 files. In this application, we use speech-noise detection, automatic recognition of spontaneous prompts, digit-to-text and letter-to-text translation, and access to an external database in order to minimise the time human operators spend on the revision procedure.

Citation: Nogueiras, A.; Moreno, M. NaniBD: a set of tools for transcribing and validating speech databases. In: International Conference on Language Resources and Evaluation. "LREC 1998: 1st International Conference on Language Resources and Evaluation: proceedings". Granada: European Language Resources Association (ELRA), 1998.

URI: http://hdl.handle.net/2117/22969

Publisher's version: http://www.coli.uni-saarland.de/~regneri/courses/res4cl-07/papers/Nog98b.pdf
