DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Utilisation of metadata fields and query expansion in cross-lingual search of user-generated Internet video

Khwileh, Ahmad, Ganguly, Debasis (ORCID: 0000-0003-0050-7138) and Jones, Gareth J.F. (ORCID: 0000-0002-4033-9135) (2016) Utilisation of metadata fields and query expansion in cross-lingual search of user-generated Internet video. Journal of Artificial Intelligence Research, 55. pp. 249-281. ISSN 1943-5037

Abstract
Recent years have seen significant efforts in the area of Cross Language Information Retrieval (CLIR) for text retrieval. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has, to date, been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between such UGC and professionally produced content are the nature and structure of the textual UGC metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may not only suffer from translation errors common to all CLIR tasks, but also recognition errors associated with the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video and with the informality and inconsistency of the associated user-created metadata for each video. This work proposes and evaluates techniques to improve CLIR effectiveness of such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that each metadata field has a varying robustness to query expansion (QE) and hence can have a negative impact on the CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion and shows how this technique can be effective for improving CLIR effectiveness for UGC content.
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Cross-language information retrieval; Speech retrieval; Internet video
Subjects: Computer Science > Machine translating
Computer Science > Information storage and retrieval systems
Computer Science > Multimedia systems
Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Institutes and Centres > ADAPT
Publisher: AI Access Foundation Inc.
Official URL: http://dx.doi.org/10.1613/jair.4775
Copyright Information: © 2016 AI Access Foundation Inc.
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 21237
Deposited On: 22 Jun 2016 11:09 by Gareth Jones. Last Modified: 22 Jul 2019 15:13
Documents

Full text available as:

AK-DG-GJ-jair.pdf (PDF, 833kB)