MultiNews: a web collection of an aligned multimodal and multilingual corpus
Afli, Haithem (ORCID: 0000-0002-7449-4707), Lohar, Pintu (ORCID: 0000-0002-5328-1585) and Way, Andy (ORCID: 0000-0001-5736-5930)
(2017)
In: Workshop on Curation and Applications of Parallel and Comparable Corpora, 27 Nov - 1 Dec 2017, Taipei, Taiwan.
ISBN 978-1-948087-05-6
Integrating Natural Language Processing (NLP) and computer vision is a promising effort. However, the applicability of these methods depends directly on the availability of specific multimodal data that includes both images and text. In this paper, we present a multimodal corpus of comparable documents and their images in 9 languages, collected from the web news articles of the Euronews website. This corpus has found widespread use in the NLP community in multilingual and multimodal tasks. Here, we focus on the acquisition of the image and text data and their multilingual alignment.
Afli, Haithem and Liu, Chao-Hong, (eds.)
Proceedings of the Workshop on Curation and Applications of Parallel and Comparable Corpora.
Asian Federation of Natural Language Processing. ISBN 978-1-948087-05-6