DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Developing a dataset for evaluating approaches for document expansion with images

Ganguly, Debasis (ORCID: 0000-0003-0050-7138), Calixto, Iacer and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) (2016) Developing a dataset for evaluating approaches for document expansion with images. In: Tenth International Conference on Language Resources and Evaluation (LREC 2016), 23-28 May 2016, Portorož, Slovenia. ISBN 978-2-9517408-9-1

Abstract
Motivated by the adage that a “picture is worth a thousand words”, it can be reasoned that automatically enriching the textual content of a document with relevant images can increase the readability of the document. Moreover, features extracted from the image data inserted into the textual content of a document may, in principle, also be used by a retrieval engine to better match the topic of the document with that of a given query. In this paper, we describe our approach to building a ground truth dataset to enable further research into the automatic addition of relevant images to text documents. The dataset comprises three parts: the official ImageCLEF 2010 collection (a collection of images with textual metadata), which serves as the pool of images available for automatic enrichment of text; a set of 25 benchmark documents to be enriched, which in this case are children’s short stories; and a set of manually judged relevant images for each query story, obtained by the standard procedure of depth pooling. We use this benchmark dataset to evaluate the effectiveness of standard information retrieval methods as simple baselines for this task. The results indicate that using the whole story as a weighted query, where the weight of each query term is its tf-idf value, achieves a precision of 0.1714 within the top 5 retrieved images on average.
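The three evaluation ingredients named in the abstract (a tf-idf weighted story query, depth pooling, and precision within the top 5) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the exact tf-idf formula, the retrieval engine, or the pool depth, so the `tf * log(N / df)` variant and all function and parameter names below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_query_weights(story_tokens, doc_freq, num_docs):
    """Weight each distinct story term by tf * log(N / df).

    Assumption: this is one common tf-idf variant; the paper may use another.
    """
    tf = Counter(story_tokens)
    weights = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term absent from the image metadata, so it cannot match
        weights[term] = count * math.log(num_docs / df)
    return weights


def depth_pool(runs, depth):
    """Union of the top-`depth` results of every run (the set shown to assessors)."""
    pool = set()
    for ranked_ids in runs:
        pool.update(ranked_ids[:depth])
    return pool


def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved images that were judged relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for image_id in top_k if image_id in relevant_ids) / k
```

Read this way, the reported precision of 0.1714 within the top 5 corresponds to slightly fewer than one relevant image (about 0.86) among the first five results per story on average.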
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Image Retrieval; Document Augmentation with Images
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Published in: Calzolari, Nicoletta, Choukri, Khalid and Declerck, Thierry (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1
Publisher: European Language Resources Association (ELRA)
Official URL: https://www.aclweb.org/anthology/L16-1299
Copyright Information: ©2016 LREC. Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland (SFI) as a part of the ADAPT Centre at DCU (Grant No: 13/RC/2106).
ID Code: 23381
Deposited On: 29 May 2019 15:52 by Thomas Murtagh. Last Modified 29 May 2019 15:52
Documents

Full text available as:

PDF (2MB): Developing_a_Dataset_for_Evaluating_Approaches_for_Document_Expansion_with_Images[1].pdf