WAV2PIX: Speech-conditioned face generation using generative adversarial networks

Duarte, Amanda; Roldan, Francisco; Tubau, Miquel; Escur, Janna; Pascual, Santiago; Salvador, Amaia; Mohedano, Eva; McGuinness, Kevin; Torres, Jordi; Giró-i-Nieto, Xavier

Duarte, Amanda, Roldan, Francisco, Tubau, Miquel ORCID: 0000-0003-1971-5797, Escur, Janna, Pascual, Santiago, Salvador, Amaia ORCID: 0000-0002-9908-1685, Mohedano, Eva, McGuinness, Kevin ORCID: 0000-0003-1336-6477, Torres, Jordi ORCID: 0000-0003-1963-7418 and Giró-i-Nieto, Xavier ORCID: 0000-0002-9935-5332 (2019) WAV2PIX: Speech-conditioned face generation using generative adversarial networks. In: 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 12-17 May 2019, Brighton, UK.

Abstract

Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) on raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or a one-hot encoding). Our model is trained in a self-supervised fashion by exploiting the audio and visual signals naturally aligned in videos. To train from video data, we present a novel dataset collected for this work, with high-quality videos of ten YouTubers with notable expressiveness in both the speech and visual signals.
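
The self-supervised pairing mentioned in the abstract relies on audio and video sharing a timeline. As a minimal sketch of that idea only, and not the authors' data pipeline, the following pairs each video frame with a window of raw waveform samples centred on the frame's timestamp. The function name, the 25 fps frame rate, the 16 kHz sample rate and the 1-second window are all assumptions chosen for illustration; none of them is stated in this record.

```python
from typing import List, Tuple


def frame_audio_windows(
    num_frames: int,
    num_samples: int,
    fps: float = 25.0,          # assumed video frame rate
    sample_rate: int = 16000,   # assumed audio sample rate in Hz
    window_sec: float = 1.0,    # assumed length of speech paired with a frame
) -> List[Tuple[int, int, int]]:
    """Return (frame_index, start_sample, end_sample) for every frame whose
    centred audio window lies entirely inside the waveform."""
    half = int(round(window_sec * sample_rate / 2))
    pairs = []
    for i in range(num_frames):
        # Sample index that coincides in time with frame i.
        centre = int(round(i / fps * sample_rate))
        start, end = centre - half, centre + half
        # Skip frames too close to the clip boundaries for a full window.
        if start >= 0 and end <= num_samples:
            pairs.append((i, start, end))
    return pairs
```

Calling `frame_audio_windows(num_frames=75, num_samples=48000)` on a hypothetical 3-second clip yields one (frame, speech window) training pair per interior frame; frames near the start and end are dropped because their windows would run past the waveform.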

Metadata

Item Type:	Conference or Workshop Item (Poster)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	deep learning; adversarial learning; face synthesis; computer vision
Subjects:	Computer Science > Image processing; Computer Science > Machine learning; Computer Science > Digital video
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Published in:	ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Official URL:	https://doi.org/10.1109/ICASSP.2019.8682970
Copyright Information:	© 2019 The Authors
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	“la Caixa” Foundation funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 713673, Spanish Ministry of Economy and Competitivity and the European Regional Development fund under contracts TEC 2015-69266-P and TEC 2016-75976-R (MINECO/FEDER, UE), Science Foundation Ireland (SFI) under grant number SFI/15/SIRG/3283
ID Code:	23188
Deposited On:	16 May 2019 14:46 by Kevin McGuinness. Last Modified 01 Mar 2022 15:46

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
4MB


DORAS | DCU Research Repository
