<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>DSpace Community:</title>
    <link>http://hdl.handle.net/2117/3746</link>
    <description />
    <pubDate>Thu, 23 May 2013 03:22:29 GMT</pubDate>
    <dc:date>2013-05-23T03:22:29Z</dc:date>
    <itunes:owner>
      <itunes:email>webmaster.bupc@upc.edu</itunes:email>
      <itunes:name>Universitat Politècnica de Catalunya. Servei de Biblioteques i Documentació</itunes:name>
    </itunes:owner>
    <itunes:explicit>no</itunes:explicit>
    <itunes:keywords />
    <item>
      <title>Multi-dialectal Spanish speech recognition</title>
      <link>http://hdl.handle.net/2117/19167</link>
      <description>Title: Multi-dialectal Spanish speech recognition
Authors: Nogueiras Rodríguez, Albino; Caballero Galeote, Mónica; Moreno Bilbao, M. Asunción
Abstract: Spanish is a global language, spoken in a large number of countries with great dialectal variability. This paper deals with the suitability of using a single multi-dialectal acoustic model for all the Spanish variants spoken in Europe and Latin America. The objective is twofold. First, it allows all the available databases to be used to jointly train and improve the same system. Second, it allows a single system to be used for all Spanish speakers. The paper describes the rule-based phonetic transcription used for each dialectal variant, the selection of the shared and the specific phonemes to be modeled in a multi-dialectal recognition system, and the results of a multi-dialectal system dealing with dialects in and out of the training set.</description>
      <pubDate>Fri, 10 May 2013 15:52:02 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/19167</guid>
      <dc:date>2013-05-10T15:52:02Z</dc:date>
      <itunes:author>Nogueiras Rodríguez, Albino; Caballero Galeote, Mónica; Moreno Bilbao, M. Asunción</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>Spanish is a global language, spoken in a large number of countries with great dialectal variability. This paper deals with the suitability of using a single multi-dialectal acoustic model for all the Spanish variants spoken in Europe and Latin America. The objective is twofold. First, it allows all the available databases to be used to jointly train and improve the same system. Second, it allows a single system to be used for all Spanish speakers. The paper describes the rule-based phonetic transcription used for each dialectal variant, the selection of the shared and the specific phonemes to be modeled in a multi-dialectal recognition system, and the results of a multi-dialectal system dealing with dialects in and out of the training set.</itunes:summary>
    </item>
    <item>
      <title>Multidialectal acoustic modeling: a comparative study</title>
      <link>http://hdl.handle.net/2117/19163</link>
      <description>Title: Multidialectal acoustic modeling: a comparative study
Authors: Caballero, Mónica; Moreno Bilbao, M. Asunción; Nogueiras Rodríguez, Albino
Abstract: In this paper, multidialectal acoustic modeling based on sharing data across dialects is addressed. A comparative study of different methods of combining data based on decision tree clustering algorithms is presented. The approaches involved differ in the way the similarity of sounds between dialects is evaluated and in the decision tree structure applied. The proposed systems are tested with Spanish dialects across Spain and Latin America. All the proposed multidialectal systems improve on monodialectal performance by using data from another dialect, but it is shown that the way data is shared is critical. The best combination of similarity measure and tree structure achieves an improvement of 7% over the results obtained with monodialectal systems.</description>
      <pubDate>Fri, 10 May 2013 13:49:17 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/19163</guid>
      <dc:date>2013-05-10T13:49:17Z</dc:date>
      <itunes:author>Caballero, Mónica; Moreno Bilbao, M. Asunción; Nogueiras Rodríguez, Albino</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>In this paper, multidialectal acoustic modeling based on sharing data across dialects is addressed. A comparative study of different methods of combining data based on decision tree clustering algorithms is presented. The approaches involved differ in the way the similarity of sounds between dialects is evaluated and in the decision tree structure applied. The proposed systems are tested with Spanish dialects across Spain and Latin America. All the proposed multidialectal systems improve on monodialectal performance by using data from another dialect, but it is shown that the way data is shared is critical. The best combination of similarity measure and tree structure achieves an improvement of 7% over the results obtained with monodialectal systems.</itunes:summary>
    </item>
    <item>
      <title>Les tecnologies de la parla: lloc de trobada, difícil però necessària, entre lingüística i tecnologia</title>
      <link>http://hdl.handle.net/2117/18708</link>
      <description>Title: Les tecnologies de la parla: lloc de trobada, difícil però necessària, entre lingüística i tecnologia
Authors: Nadeu Camprubí, Climent
Abstract: Research and application development in speech technologies have largely set linguistic knowledge aside. The reasons are various, as we will see, but the difficulty of joint work between linguists and technologists does not mean that such work is unnecessary for going further in achieving the goals of the technology itself.</description>
      <pubDate>Mon, 08 Apr 2013 14:39:43 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18708</guid>
      <dc:date>2013-04-08T14:39:43Z</dc:date>
      <itunes:author>Nadeu Camprubí, Climent</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>Research and application development in speech technologies have largely set linguistic knowledge aside. The reasons are various, as we will see, but the difficulty of joint work between linguists and technologists does not mean that such work is unnecessary for going further in achieving the goals of the technology itself.</itunes:summary>
    </item>
    <item>
      <title>Building synthetic voices in the META-NET framework</title>
      <link>http://hdl.handle.net/2117/18568</link>
      <description>Title: Building synthetic voices in the META-NET framework
Authors: Garcia Casademont, Emília; Bonafonte Cávez, Antonio; Moreno Bilbao, M. Asunción
Abstract: META-NET4U is a European project aimed at supporting language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects aimed at fostering the mission of META, the Multilingual Europe Technology Alliance, which is dedicated to building the technological foundations of a multilingual European information society. This paper describes the resources produced at our lab to provide synthetic voices. Using existing 10-hour corpora from a male and a female Spanish speaker, voices have been developed for use in Festival, with both unit-selection and statistical technologies. Furthermore, using data produced to support research on intra- and inter-lingual voice conversion, four bilingual voices (English/Spanish) have been developed. The paper describes these resources, which are available through META. In addition, an evaluation is presented comparing different synthesis techniques, the influence of the amount of data in statistical speech synthesis, and the effect of sharing data in bilingual voices.</description>
      <pubDate>Wed, 03 Apr 2013 11:02:54 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18568</guid>
      <dc:date>2013-04-03T11:02:54Z</dc:date>
      <itunes:author>Garcia Casademont, Emília; Bonafonte Cávez, Antonio; Moreno Bilbao, M. Asunción</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>META-NET4U is a European project aimed at supporting language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects aimed at fostering the mission of META, the Multilingual Europe Technology Alliance, which is dedicated to building the technological foundations of a multilingual European information society. This paper describes the resources produced at our lab to provide synthetic voices. Using existing 10-hour corpora from a male and a female Spanish speaker, voices have been developed for use in Festival, with both unit-selection and statistical technologies. Furthermore, using data produced to support research on intra- and inter-lingual voice conversion, four bilingual voices (English/Spanish) have been developed. The paper describes these resources, which are available through META. In addition, an evaluation is presented comparing different synthesis techniques, the influence of the amount of data in statistical speech synthesis, and the effect of sharing data in bilingual voices.</itunes:summary>
    </item>
    <item>
      <title>Accelerating boosting-based face detection on GPUs</title>
      <link>http://hdl.handle.net/2117/18498</link>
      <description>Title: Accelerating boosting-based face detection on GPUs
Authors: Oro, David; Fernández, Carles; Segura, Carlos; Martorell Bofill, Xavier; Hernando Pericás, Francisco Javier
Abstract: The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on large SIMD widths to maximize data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. In particular, the use of concurrent kernel execution in combination with cascades generated with the GentleBoost algorithm solves the problem of GPU underutilization and achieves an average 5X speedup on 1080p videos over the fastest known implementations, while slightly improving accuracy. Finally, we also studied the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task- and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.</description>
      <pubDate>Fri, 22 Mar 2013 13:12:26 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18498</guid>
      <dc:date>2013-03-22T13:12:26Z</dc:date>
      <itunes:author>Oro, David; Fernández, Carles; Segura, Carlos; Martorell Bofill, Xavier; Hernando Pericás, Francisco Javier</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>The goal of face detection is to determine the presence of faces in arbitrary images, along with their locations and dimensions. As with any graphics workload, these algorithms benefit from data-level parallelism. Existing parallelization efforts focus strictly on mapping different divide-and-conquer strategies onto multicore CPUs and GPUs. However, even the most advanced single-chip many-core processors to date still struggle to handle real-time face detection effectively under high-definition video workloads. To address this challenge, face detection algorithms typically avoid computations by dynamically evaluating a boosted cascade of classifiers. Unfortunately, this technique yields low ALU occupancy on architectures such as GPUs, which rely heavily on large SIMD widths to maximize data-level parallelism. In this paper we present several techniques to increase the performance of the cascade evaluation kernel, which is the most resource-intensive part of the face detection pipeline. In particular, the use of concurrent kernel execution in combination with cascades generated with the GentleBoost algorithm solves the problem of GPU underutilization and achieves an average 5X speedup on 1080p videos over the fastest known implementations, while slightly improving accuracy. Finally, we also studied the parallelization of the cascade training process and its scalability on SMP platforms. The proposed parallelization strategy exploits both task- and data-level parallelism and achieves a 3.5X speedup over single-threaded implementations.</itunes:summary>
    </item>
    <item>
      <title>La lengua española en la era digital : The Spanish language in the digital age</title>
      <link>http://hdl.handle.net/2117/18456</link>
      <description>Title: La lengua española en la era digital : The Spanish language in the digital age
Authors: Melero, Maite; Badía, Toni; Moreno Bilbao, M. Asunción</description>
      <pubDate>Thu, 21 Mar 2013 13:52:33 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18456</guid>
      <dc:date>2013-03-21T13:52:33Z</dc:date>
      <itunes:author>Melero, Maite; Badía, Toni; Moreno Bilbao, M. Asunción</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
    </item>
    <item>
      <title>La llengua catalana a l'era digital : The Catalan language in the digital age</title>
      <link>http://hdl.handle.net/2117/18454</link>
      <description>Title: La llengua catalana a l'era digital : The Catalan language in the digital age
Authors: Moreno Bilbao, M. Asunción; Bel, Núria; García, Emília; Vallverdú Bayés, Francesc; Revilla, Eva</description>
      <pubDate>Thu, 21 Mar 2013 13:19:10 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18454</guid>
      <dc:date>2013-03-21T13:19:10Z</dc:date>
      <itunes:author>Moreno Bilbao, M. Asunción; Bel, Núria; García, Emília; Vallverdú Bayés, Francesc; Revilla, Eva</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
    </item>
    <item>
      <title>The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation</title>
      <link>http://hdl.handle.net/2117/18301</link>
      <description>Title: The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
Authors: Formiga Fanals, Lluís; Henríquez Quintana, Carlos Alberto; Hernández Huerta, Adolfo; Mariño Acebal, José Bernardo; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián
Abstract: This paper describes the UPC participation in the WMT 12 evaluation campaign. All the systems presented are based on standard phrase-based Moses systems. The variations adopted several improvement techniques, such as morphology simplification and generation, and domain adaptation. Morphology simplification overcomes the data sparsity problem when translating into morphologically rich languages such as Spanish by first translating into a morphology-simplified language and then leaving morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from the MT output and the reference alignment. Results show an improvement in TER, METEOR, NIST and BLEU scores compared to our baseline system, with the domain adaptation approach yielding greater benefits on the official test set than the morphological generalization method.</description>
      <pubDate>Thu, 14 Mar 2013 14:52:57 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18301</guid>
      <dc:date>2013-03-14T14:52:57Z</dc:date>
      <itunes:author>Formiga Fanals, Lluís; Henríquez Quintana, Carlos Alberto; Hernández Huerta, Adolfo; Mariño Acebal, José Bernardo; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This paper describes the UPC participation in the WMT 12 evaluation campaign. All the systems presented are based on standard phrase-based Moses systems. The variations adopted several improvement techniques, such as morphology simplification and generation, and domain adaptation. Morphology simplification overcomes the data sparsity problem when translating into morphologically rich languages such as Spanish by first translating into a morphology-simplified language and then leaving morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from the MT output and the reference alignment. Results show an improvement in TER, METEOR, NIST and BLEU scores compared to our baseline system, with the domain adaptation approach yielding greater benefits on the official test set than the morphological generalization method.</itunes:summary>
    </item>
    <item>
      <title>Improving English to Spanish out-of-domain translations by morphology generalization and generation</title>
      <link>http://hdl.handle.net/2117/18280</link>
      <description>Title: Improving English to Spanish out-of-domain translations by morphology generalization and generation
Authors: Formiga Fanals, Lluís; Hernández Huerta, Adolfo; Mariño Acebal, José Bernardo; Monte Moreno, Enrique
Abstract: This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor-quality translation when translating out-of-domain. In detail, this approach first translates into simplified Spanish forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source- and target-language sentences. The results obtained highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.</description>
      <pubDate>Wed, 13 Mar 2013 16:37:55 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18280</guid>
      <dc:date>2013-03-13T16:37:55Z</dc:date>
      <itunes:author>Formiga Fanals, Lluís; Hernández Huerta, Adolfo; Mariño Acebal, José Bernardo; Monte Moreno, Enrique</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor-quality translation when translating out-of-domain. In detail, this approach first translates into simplified Spanish forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source- and target-language sentences. The results obtained highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.</itunes:summary>
    </item>
    <item>
      <title>Dealing with input noise in statistical machine translation</title>
      <link>http://hdl.handle.net/2117/18279</link>
      <description>Title: Dealing with input noise in statistical machine translation
Authors: Formiga Fanals, Lluís; Rodríguez Fonollosa, José Adrián
Abstract: Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems, as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting of a character-based translator (MT) from noisy into cleaned text. The use of a character-level translator allows us to provide various spelling alternatives in a lattice format to the final bilingual translator. The final MT system then decides the best path to be translated. The different hypotheses are obtained under the assumption of a noisy channel model for this task. This paper presents experiments with real-life noisy input and a standard phrase-based SMT system from English into Spanish.</description>
      <pubDate>Wed, 13 Mar 2013 16:29:05 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18279</guid>
      <dc:date>2013-03-13T16:29:05Z</dc:date>
      <itunes:author>Formiga Fanals, Lluís; Rodríguez Fonollosa, José Adrián</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems, as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting of a character-based translator (MT) from noisy into cleaned text. The use of a character-level translator allows us to provide various spelling alternatives in a lattice format to the final bilingual translator. The final MT system then decides the best path to be translated. The different hypotheses are obtained under the assumption of a noisy channel model for this task. This paper presents experiments with real-life noisy input and a standard phrase-based SMT system from English into Spanish.</itunes:summary>
    </item>
    <item>
      <title>Correcting input noise in SMT as a char-based translation problem</title>
      <link>http://hdl.handle.net/2117/18275</link>
      <description>Title: Correcting input noise in SMT as a char-based translation problem
Authors: Formiga Fanals, Lluís; Rodríguez Fonollosa, José Adrián
Abstract: Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems, as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting of a character-based translator.</description>
      <pubDate>Wed, 13 Mar 2013 15:57:38 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18275</guid>
      <dc:date>2013-03-13T15:57:38Z</dc:date>
      <itunes:author>Formiga Fanals, Lluís; Rodríguez Fonollosa, José Adrián</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems, as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting of a character-based translator.</itunes:summary>
    </item>
    <item>
      <title>On the use of agglomerative and spectral clustering in speaker diarization of meetings</title>
      <link>http://hdl.handle.net/2117/18147</link>
      <description>Title: On the use of agglomerative and spectral clustering in speaker diarization of meetings
Authors: Hernando Pericás, Francisco Javier
Abstract: In this paper, we present a clustering algorithm for speaker diarization based on spectral clustering. State-of-the-art diarization systems are based on agglomerative hierarchical clustering (AHC) using the Bayesian Information Criterion (BIC) and other statistical metrics among clusters, which results in a high computational cost and a time-consuming approach. Our proposal avoids the use of such metrics by applying Euclidean distances to the eigenvectors computed from the normalized graph Laplacian. A hybrid system is proposed in which HMM/GMM modelling and Viterbi alignment are still applied, but the BIC-based merging and stopping criteria are replaced by a spectral clustering algorithm. Once an initial segmentation is obtained and the clustering alignment is computed using the Viterbi algorithm, the remaining clusters are modeled by stacking the means of the Gaussians in a supervector. In this space, the singular value decomposition of the associated normalized graph Laplacian is computed. The most similar clusters are merged based on the Euclidean distances in the resulting eigenspace. The number of clusters is estimated by analyzing the eigenstructure of the similarity matrix and selecting a threshold on the eigenvalue gap. In experiments, this approach obtained performance comparable to the traditional AHC+BIC approach on the Rich Transcription conference evaluation data. Although it still relies on Gaussian modelling of clusters and Viterbi alignment, the proposed approach leads to a system that runs several times faster than the traditional one.</description>
      <pubDate>Fri, 08 Mar 2013 12:50:19 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18147</guid>
      <dc:date>2013-03-08T12:50:19Z</dc:date>
      <itunes:author>Hernando Pericás, Francisco Javier</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>In this paper, we present a clustering algorithm for speaker diarization based on spectral clustering. State-of-the-art diarization systems are based on agglomerative hierarchical clustering (AHC) using the Bayesian Information Criterion (BIC) and other statistical metrics among clusters, which results in a high computational cost and a time-consuming approach. Our proposal avoids the use of such metrics by applying Euclidean distances to the eigenvectors computed from the normalized graph Laplacian. A hybrid system is proposed in which HMM/GMM modelling and Viterbi alignment are still applied, but the BIC-based merging and stopping criteria are replaced by a spectral clustering algorithm. Once an initial segmentation is obtained and the clustering alignment is computed using the Viterbi algorithm, the remaining clusters are modeled by stacking the means of the Gaussians in a supervector. In this space, the singular value decomposition of the associated normalized graph Laplacian is computed. The most similar clusters are merged based on the Euclidean distances in the resulting eigenspace. The number of clusters is estimated by analyzing the eigenstructure of the similarity matrix and selecting a threshold on the eigenvalue gap. In experiments, this approach obtained performance comparable to the traditional AHC+BIC approach on the Rich Transcription conference evaluation data. Although it still relies on Gaussian modelling of clusters and Viterbi alignment, the proposed approach leads to a system that runs several times faster than the traditional one.</itunes:summary>
    </item>
    <item>
      <title>Measuring acoustic reduction in feature space</title>
      <link>http://hdl.handle.net/2117/18130</link>
      <description>Title: Measuring acoustic reduction in feature space
Authors: Rodríguez Fonollosa, José Adrián; Schulz, Henrik
Abstract: Modelling varying speaking styles remains a challenge for state-of-the-art speech recognition and synthesis systems. Vowel and consonant reduction have been identified as correlates of speaking style variation, but still lack a common measurement. The reduction phenomena are often observed without consideration of coarticulation and assimilation effects, and as a result of speaking rate variability. We present an analysis of acoustic reduction in the Mel Frequency Cepstral Coefficient (MFCC) feature space of phonemes, estimate duration, and determine the degree of correlation between duration reduction and feature space reduction for two different speaking styles present in broadcast news and conversational recordings. We analyse the feature space reduction of consonants and vowels in context in a syllable environment.</description>
      <pubDate>Thu, 07 Mar 2013 13:07:38 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18130</guid>
      <dc:date>2013-03-07T13:07:38Z</dc:date>
      <itunes:author>Rodríguez Fonollosa, José Adrián; Schulz, Henrik</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>Modelling varying speaking styles remains a challenge for state-of-the-art speech recognition and synthesis systems. Vowel and consonant reduction have been identified as correlates of speaking style variation, but still lack a common measurement. The reduction phenomena are often observed without consideration of coarticulation and assimilation effects, and as a result of speaking rate variability. We present an analysis of acoustic reduction in the Mel Frequency Cepstral Coefficient (MFCC) feature space of phonemes, estimate duration, and determine the degree of correlation between duration reduction and feature space reduction for two different speaking styles present in broadcast news and conversational recordings. We analyse the feature space reduction of consonants and vowels in context in a syllable environment.</itunes:summary>
    </item>
    <item>
      <title>GCC-PHAT based head orientation estimation</title>
      <link>http://hdl.handle.net/2117/18034</link>
      <description>Title: GCC-PHAT based head orientation estimation
Authors: Segura, Carlos; Hernando Pericás, Francisco Javier
Abstract: This work presents a novel two-step algorithm to estimate the orientation of speakers in a smart-room environment equipped with microphone arrays. First, the position of the speaker is estimated with the SRP-PHAT algorithm, and the time delay of arrival for each microphone pair with respect to the detected position is computed. In the second step, the value of the cross-correlation at the estimated time delay is used as the fundamental feature from which the speaker orientation is derived. The proposed method consistently outperforms other state-of-the-art acoustic techniques on both a purpose-recorded database and the CLEAR head pose database.</description>
      <pubDate>Fri, 01 Mar 2013 12:14:03 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18034</guid>
      <dc:date>2013-03-01T12:14:03Z</dc:date>
      <itunes:author>Segura, Carlos; Hernando Pericás, Francisco Javier</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This work presents a novel two-step algorithm to estimate the orientation of speakers in a smart-room environment equipped with microphone arrays. First, the position of the speaker is estimated with the SRP-PHAT algorithm, and the time delay of arrival for each microphone pair with respect to the detected position is computed. In the second step, the value of the cross-correlation at the estimated time delay is used as the fundamental feature from which the speaker orientation is derived. The proposed method consistently outperforms other state-of-the-art acoustic techniques on both a purpose-recorded database and the CLEAR head pose database.</itunes:summary>
    </item>
    <item>
      <title>Detection and handling of overlapping speech for speaker diarization</title>
      <link>http://hdl.handle.net/2117/18033</link>
      <description>Title: Detection and handling of overlapping speech for speaker diarization
Authors: Zelenak, Martin; Hernando Pericás, Francisco Javier
Abstract: This thesis concerns the detection of overlapping speech segments and their application to improving speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis or by an approach involving a multilayer perceptron. In addition, we investigate the possibility of employing long-term prosodic information. The most suitable subset of candidate prosodic features is determined by a two-step mRMR feature selection algorithm. For segments containing detected overlapping speech, the speaker diarization system assigns a second speaker label, and such segments are also excluded from model training. The proposed overlap labeling technique is integrated into the Viterbi decoding stage of the diarization algorithm.</description>
      <pubDate>Fri, 01 Mar 2013 11:47:44 GMT</pubDate>
      <guid isPermaLink="false">http://hdl.handle.net/2117/18033</guid>
      <dc:date>2013-03-01T11:47:44Z</dc:date>
      <itunes:author>Zelenak, Martin; Hernando Pericás, Francisco Javier</itunes:author>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords />
      <itunes:summary>This thesis concerns the detection of overlapping speech segments and their application to improving speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis or by an approach involving a multilayer perceptron. In addition, we investigate the possibility of employing long-term prosodic information. The most suitable subset of candidate prosodic features is determined by a two-step mRMR feature selection algorithm. For segments containing detected overlapping speech, the speaker diarization system assigns a second speaker label, and such segments are also excluded from model training. The proposed overlap labeling technique is integrated into the Viterbi decoding stage of the diarization algorithm.</itunes:summary>
    </item>
  </channel>
</rss>

