DORAS | DCU Research Repository

Spoken content retrieval beyond pipeline integration of automatic speech recognition and information retrieval

Racca, David (2018) Spoken content retrieval beyond pipeline integration of automatic speech recognition and information retrieval. PhD thesis, Dublin City University.

Abstract
The dramatic increase in the creation of multimedia content is leading to the development of large archives in which a substantial amount of the information is in spoken form. Efficient access to this information requires effective spoken content retrieval (SCR) methods. Traditionally, SCR systems have focused on a pipeline integration of two fundamental technologies: transcription using automatic speech recognition (ASR) and search supported by text-based information retrieval (IR). Existing SCR approaches estimate the relevance of a spoken retrieval item based on the lexical overlap between a user’s query and the textual transcription of the item. However, the speech signal contains other potentially valuable non-lexical information that remains largely unexploited by SCR approaches. In particular, acoustic correlates of speech prosody, which have been shown to be useful for identifying salient words and determining topic changes, have not been exploited by existing SCR approaches. In addition, the temporal nature of multimedia content means that accessing content is a user-intensive, time-consuming process. To minimise user effort in locating relevant content, SCR systems could suggest playback points in retrieved content indicating the locations where the system believes relevant information may be found. This typically requires adopting a segmentation mechanism for splitting documents into smaller “elements” to be ranked and from which suitable playback points could be selected. Existing segmentation approaches do not generalise well to every possible information need, nor do they provide robustness to ASR errors.
This thesis extends SCR beyond the standard ASR and IR pipeline approach by: (i) exploring the utilisation of prosodic information as complementary evidence of topical relevance to enhance current SCR approaches; (ii) determining elements of content that, when retrieved, minimise user search effort and provide increased robustness to ASR errors; and (iii) developing enhanced evaluation measures that could better capture the factors that affect user satisfaction in SCR.
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2018
Refereed: No
Supervisor(s): Jones, Gareth J.F.
Uncontrolled Keywords: spoken content retrieval; speech search; prosody; unstructured speech search; speech search evaluation
Subjects: Computer Science > Artificial intelligence; Computer Science > Information retrieval; Computer Science > Interactive computer systems; Computer Science > Machine learning; Computer Science > Information storage and retrieval systems
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 22473
Deposited On: 22 Nov 2018 12:01 by Gareth Jones. Last Modified 13 Dec 2019 15:28
Documents

Full text available as:

DRacca_Thesis_Final.pdf (PDF, 8MB)