Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents

Lam-Adesina, Adenike M. and Jones, Gareth J.F. (2006) Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents. Information Processing and Management, 42 (3). pp. 633-649. ISSN 0306-4573

Full text available as:

PDF (174Kb)

Abstract

Important legacy paper documents are digitized and collected in online accessible archives. This enables the preservation, sharing, and, significantly, the searching of these documents. The text contents of these document images can be transcribed automatically using OCR systems and then stored in an information retrieval system. However, OCR systems make errors in character recognition, which have previously been shown to affect document retrieval behaviour. In particular, relevance feedback query-expansion methods, which are often effective for improving electronic text retrieval, are observed to be less reliable for retrieval of scanned document images. Our experimental examination of the effects of character recognition errors on an ad hoc OCR retrieval task demonstrates that, while baseline information retrieval can remain relatively unaffected by transcription errors, relevance feedback via query expansion becomes highly unstable. This paper examines the reason for this behaviour and introduces novel modifications to standard relevance feedback methods. These methods are shown experimentally to improve the effectiveness of relevance feedback for errorful OCR transcriptions. The new methods combine similar recognised character strings based on term collection frequency and a string edit-distance measure. The techniques are domain independent and make no use of external resources such as dictionaries or training data.
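
The general idea of combining similar recognised strings by collection frequency and edit distance can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. The function names, the greedy most-frequent-first strategy, and the thresholds `max_distance` and `min_length` are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def merge_variants(term_freqs, max_distance=1, min_length=5):
    """Map each term to a canonical form (hypothetical helper).

    Terms are visited from most to least frequent, on the assumption that
    a frequent string is more likely to be a correctly recognised word and
    a rare near-duplicate is more likely to be an OCR variant of it.
    """
    ordered = sorted(term_freqs, key=lambda t: (-term_freqs[t], t))
    canonical = {}
    heads = []  # terms that currently stand for themselves
    for term in ordered:
        target = term
        # Short terms are never merged: one edit changes too much of them.
        if len(term) >= min_length:
            for head in heads:
                if (abs(len(head) - len(term)) <= max_distance
                        and edit_distance(head, term) <= max_distance):
                    target = head
                    break
        canonical[term] = target
        if target == term:
            heads.append(term)
    return canonical


def merged_frequencies(term_freqs, canonical):
    """Sum collection frequencies over each group of merged variants."""
    merged = Counter()
    for term, freq in term_freqs.items():
        merged[canonical[term]] += freq
    return merged


# Made-up collection frequencies, for illustration only.
freqs = {"retrieval": 120, "retrieva1": 3, "retrievol": 2,
         "feedback": 80, "feedbock": 1, "scan": 10, "scam": 4}
canonical = merge_variants(freqs)
merged = merged_frequencies(freqs, canonical)
```

In this toy vocabulary, the rare strings `retrieva1` and `retrievol` are folded into `retrieval`, while the short words `scan` and `scam` stay separate because of the assumed length threshold. Like the techniques described in the abstract, the sketch uses no dictionary or training data, only the statistics of the collection itself.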

Item Type: Article (Published)
Refereed: Yes
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Digital Video Processing (CDVP)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Elsevier
Official URL: http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6VC8-4HCMSWK-2-4&_cdi=5948&_user=78294&_pii=S0306457305000774&_origin=browse&_coverDate=05%2F31%2F2006&_sk=999579996&view=c&wchp=dGLbVlz-zSkWb&md5=7c1d9b5fa11bdeb3053e6bdc85db25ed&ie=/sdarticle.pdf
Copyright Information: © Elsevier Ltd. 2006.
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 16282
Deposited On: 17 May 2011 10:59 by Shane Harper. Last Modified 17 May 2011 10:59
