Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

An LDA-smoothed relevance model for document expansion: a case study for spoken document retrieval

Ganguly, Debasis orcid logoORCID: 0000-0003-0050-7138, Leveling, Johannes orcid logoORCID: 0000-0003-0603-4191 and Jones, Gareth J.F. orcid logoORCID: 0000-0003-2923-8365 (2013) An LDA-smoothed relevance model for document expansion: a case study for spoken document retrieval. In: 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2013), 28 July - 1 Aug 2013, Dublin, Ireland.

Document expansion (DE) in information retrieval (IR) involves modifying each document in the collection by introducing additional terms into the document. It is particularly useful to improve retrieval of short and noisy documents where the additional terms can improve the description of the document content. Existing approaches to DE assume that documents to be expanded are from a single topic. In the case of multi-topic documents this can lead to a topic bias in terms selected for DE and hence may result in poor retrieval quality due to the lack of coverage of the original document topics in the expanded document. This paper proposes a new DE technique providing a more uniform selection and weighting of DE terms from all constituent topics. We show that our proposed method significantly outperforms the most recently reported relevance model based DE method on a spoken document retrieval task for both manual and automatic speech recognition transcripts.
Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Uncontrolled Keywords:Document Expansion; Topic Modelling
Subjects:Computer Science > Information retrieval
DCU Faculties and Centres:Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of ACM SIGIR 2013. .
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland
ID Code:20375
Deposited On:15 Jan 2015 14:48 by Gareth Jones . Last Modified 25 Oct 2018 09:46

Full text available as:

[thumbnail of sp033-ganguly.pdf]
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader


Downloads per month over past year

Archive Staff Only: edit this record