Human speech processing is often multimodal, combining audio and
visual information. Eyes and Ears Together proposes two
benchmark multimodal speech processing tasks: (1) multimodal automatic speech recognition (ASR) and (2) multimodal co-reference
resolution on spoken multimedia. These tasks are motivated by
our desire to address the difficulties of ASR for multimedia spoken
content. We review prior work on the integration of multimodal
signals into speech processing for multimedia data, introduce a
multimedia dataset for the proposed tasks, and outline both tasks.
In: Larson, M., Arora, P., Demarty, C.-H., Riegler, M. (eds.): Working Notes Proceedings of the MediaEval 2018 Workshop. CEUR-WS, Vol. 2283.