
    • Auto-encoding nearest neighbor i-vectors for speaker verification 

      Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
      Conference lecture
      Open Access
      In the last years, i-vectors followed by cosine or PLDA scoring techniques were the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap ...
    • DNN speaker embeddings using autoencoder pre-training 

      Khan, Umair; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2019)
      Conference lecture
      Restricted access - publisher's policy
      Over the last years, i-vectors have been the state-of-the-art approach in speaker recognition. Recent improvements in deep learning have increased the discriminative quality of i-vectors. However, deep learning architectures ...
    • I-vector transformation using k-nearest neighbors for speaker verification 

      Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Restricted access - publisher's policy
      Probabilistic Linear Discriminant Analysis (PLDA) is the most efficient backend for i-vectors. However, it requires labeled background data which can be difficult to access in practice. Unlike PLDA, cosine scoring avoids ...
    • Restricted Boltzmann Machine vectors for speaker clustering 

      Khan, Umair; Safari, Pooyan; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
      Conference lecture
      Open Access
      Restricted Boltzmann Machines (RBMs) have been used both in the front-end and backend of speaker verification systems. In this work, we apply RBMs as a front-end in the context of speaker clustering. Speakers' utterances ...
    • Restricted Boltzmann machine vectors for speaker clustering and tracking tasks in TV broadcast shows 

      Khan, Umair; Safari, Pooyan; Hernando Pericás, Francisco Javier (Multidisciplinary Digital Publishing Institute, 2019-07-09)
      Article
      Open Access
      Restricted Boltzmann Machines (RBMs) have shown success in both the front-end and backend of speaker verification systems. In this paper, we propose applying RBMs to the front-end for the tasks of speaker clustering and ...
    • Self-supervised deep learning approaches to speaker recognition 

      Khan, Umair (Universitat Politècnica de Catalunya, 2021-01-11)
      Doctoral thesis
      Open Access
      In speaker recognition, i-vectors have been the state-of-the-art unsupervised technique over the last few years, whereas x-vectors is becoming the state-of-the-art supervised technique, these days. Recent advances in Deep ...
    • Speaker tracking system using speaker boundary detection 

      Khan, Umair (Universitat Politècnica de Catalunya, 2016-11)
      Master thesis
      Open Access
      This thesis is about a research conducted in the area of Speaker Recognition. The application is concerned to the automatic detection and tracking of target speakers in meetings, conferences, telephone conversations and ...
    • The UPC speaker verification system submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) 

      Khan, Umair; Hernando Pericás, Francisco Javier (2020-10-27)
      External research report
      Open Access
      This report describes the submission from Technical University of Catalonia (UPC) to the VoxCeleb Speaker Recognition Challenge (VoxSRC-20) at Interspeech 2020. The final submission is a combination of three systems. ...
    • Unsupervised training of siamese networks for speaker verification 

      Khan, Umair; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2020)
      Conference report
      Open Access
      Speaker labeled background data is an essential requirement for most state-of-the-art approaches in speaker recognition, e.g., x-vectors and i-vector/PLDA. However, in reality it is difficult to access large amount of labeled ...