Show simple item record
Wav2Pix: speech-conditioned face generation using generative adversarial networks
dc.contributor.author | Cardoso Duarte, Amanda |
dc.contributor.author | Roldan, Francisco |
dc.contributor.author | Tubau, Miquel |
dc.contributor.author | Escur, Janna |
dc.contributor.author | Pascual de la Puente, Santiago |
dc.contributor.author | Salvador Aguilera, Amaia |
dc.contributor.author | Mohedano, Eva |
dc.contributor.author | McGuinness, Kevin |
dc.contributor.author | Torres Viñals, Jordi |
dc.contributor.author | Giró Nieto, Xavier |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2019-07-30T07:53:22Z |
dc.date.issued | 2019 |
dc.identifier.citation | Cardoso, A. [et al.]. Wav2Pix: speech-conditioned face generation using generative adversarial networks. A: IEEE International Conference on Acoustics, Speech, and Signal Processing. "2019 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: May 12-17, 2019: Brighton Conference Centre, Brighton, United Kingdom". Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 8633-8637. |
dc.identifier.isbn | 978-1-4799-8131-1 |
dc.identifier.other | https://imatge.upc.edu/web/publications/wav2pix-speech-conditioned-face-generation-using-generative-adversarial-networks |
dc.identifier.uri | http://hdl.handle.net/2117/167073 |
dc.description.abstract | Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) on raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or a one-hot encoding). Our model is trained in a self-supervised manner by exploiting the audio and visual signals naturally aligned in videos. To enable training from video data, we present a novel dataset collected for this work, consisting of high-quality videos of YouTubers with notable expressiveness in both the speech and visual signals. |
dc.format.extent | 5 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic |
dc.subject | Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Reconeixement de formes |
dc.subject.lcsh | Machine learning |
dc.subject.lcsh | Computer vision |
dc.subject.other | Face |
dc.subject.other | Videos |
dc.subject.other | Generators |
dc.subject.other | Visualization |
dc.subject.other | Feature extraction |
dc.subject.other | Generative adversarial networks |
dc.subject.other | Deep learning |
dc.subject.other | Adversarial learning |
dc.subject.other | Face synthesis |
dc.subject.other | Computer vision |
dc.title | Wav2Pix: speech-conditioned face generation using generative adversarial networks |
dc.type | Conference lecture |
dc.subject.lemac | Aprenentatge automàtic |
dc.subject.lemac | Visió per ordinador |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.contributor.group | Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo |
dc.identifier.doi | 10.1109/ICASSP.2019.8682970 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://ieeexplore.ieee.org/document/8682970 |
dc.rights.access | Restricted access - publisher's policy |
local.identifier.drac | 25136752 |
dc.description.version | Postprint (published version) |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/H2020/713673/EU/Innovative doctoral programme for talented early-stage researchers in Spanish host organisations excellent in the areas of Science, Technology, Engineering and Mathematics (STEM)./INPhINIT |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/ |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO/1PE/TEC2016-75976-R |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/ |
dc.date.lift | 10000-01-01 |
local.citation.author | Cardoso, A.; Roldan, F.; Tubau, M.; Escur, J.; Pascual, S.; Salvador, A.; Mohedano, E.; McGuinness, K.; Torres, J.; Giro, X. |
local.citation.contributor | IEEE International Conference on Acoustics, Speech, and Signal Processing |
local.citation.publicationName | 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: May 12-17, 2019: Brighton Conference Centre, Brighton, United Kingdom |
local.citation.startingPage | 8633 |
local.citation.endingPage | 8637 |