DORAS | DCU Research Repository

MultiNews: a web collection of an aligned multimodal and multilingual corpus

Afli, Haithem (ORCID: 0000-0002-7449-4707), Lohar, Pintu (ORCID: 0000-0002-5328-1585) and Way, Andy (ORCID: 0000-0001-5736-5930) (2017) MultiNews: a web collection of an aligned multimodal and multilingual corpus. In: Workshop on Curation and Applications of Parallel and Comparable Corpora, 27 Nov - 1 Dec 2017, Taipei, Taiwan. ISBN 978-1-948087-05-6

Abstract
Integrating Natural Language Processing (NLP) and computer vision is a promising effort. However, the applicability of these methods depends directly on the availability of specific multimodal data that includes both images and texts. In this paper, we present a multimodal corpus of comparable documents and their images in 9 languages, collected from the web news articles of the Euronews website. This corpus has found widespread use in the NLP community in multilingual and multimodal tasks. Here, we focus on the acquisition of its image and text data and on their multilingual alignment.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Afli, Haithem and Liu, Chao-Hong, (eds.) Proceedings of the Workshop on Curation and Applications of Parallel and Comparable Corpora. Asian Federation of Natural Language Processing. ISBN 978-1-948087-05-6
Publisher: Asian Federation of Natural Language Processing
Official URL: https://www.aclweb.org/anthology/W17-56
Copyright Information: © 2017 AFNLP
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland through ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University
ID Code: 23356
Deposited On: 24 May 2019 15:12 by Thomas Murtagh. Last Modified: 05 May 2023 16:28
Documents

Full text available as: PDF, 780kB (MultiNews_-_A_Web_collection_of_an_Aligned_Multimodal_and_Multilingual_Corpus[1].pdf)