DORAS | DCU Research Repository

Utilizing sub-topical structure of documents for information retrieval.

Ganguly, Debasis (ORCID: 0000-0003-0050-7138), Leveling, Johannes (ORCID: 0000-0003-0603-4191) and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) (2011) Utilizing sub-topical structure of documents for information retrieval. In: 4th Workshop for Ph.D. Students in Information and Knowledge Management (PIKM 2011) at CIKM 2011, 28 Oct 2011, Glasgow, Scotland.

Abstract
Text segmentation in natural language processing typically refers to the process of decomposing a document into constituent subtopics. Our work centers on the application of text segmentation techniques within information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented as a single document to a searcher. Feedback in ad hoc IR tasks is shown to benefit from using sentences extracted from the pseudo-relevant documents, rather than terms, for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments that are more readable, with fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is, on the one hand, to aggregate relevant and novel information from multiple documents that satisfies the user's information need and, on the other, to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Uncontrolled Keywords: Document Segmentation; Query Segmentation
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 16518
Deposited On: 27 Oct 2011 10:22 by Shane Harper. Last Modified: 25 Oct 2018 10:26
Documents

Full text available as:

Utilizing_sub-topical_structure_of_documents_for_Information_Retrieval.pdf (PDF, 186kB)