Eskevich, Maria ORCID: 0000-0002-1242-0753 (2014) Towards effective retrieval of spontaneous conversational spoken content. PhD thesis, Dublin City University.
Abstract
The continuing development in the technologies available for recording and storage of multimedia content means that the volume of archived digital material is growing rapidly. While some of it is formally structured and edited, increasing amounts of it are user generated and informal.
We report an extensive investigation into the effectiveness of speech search for challenging, informally structured spoken content archives, and the development of methods that address the identified challenges. We explore the relationship between automatic speech recognition (ASR) accuracy, automated segmentation of informal content into semantically focused retrieval units, and retrieval behaviour. We introduce new evaluation metrics designed to assess retrieval results according to different aspects of the user experience.
Our studies concentrate on three types of data that contain natural conversations: lectures, meetings and Internet TV. Our experiments provide a deep understanding of the challenges and issues related to spoken content retrieval (SCR). For all these types of data, effective segmentation of the spoken content is demonstrated to significantly improve search effectiveness.
SCR output consists of audio or video files, even when the system operates on their textual representation. Such result lists are difficult to browse, since the user must listen to the audio content or watch the video segments. It is therefore important to start playback as close as possible to the beginning of the relevant content (the jump-in point) within a segment.
Based on our analysis of the issues relating to retrieval success and failure, we report a study of methods to improve retrieval effectiveness from the perspective of content ranking and of access to relevant content in retrieved materials. The methods explored in this thesis examine alternative segmentation strategies, content expansion based on internal and external information sources, and the use of acoustic information corresponding to the ASR transcripts.
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Date of Award: | November 2014 |
| Refereed: | No |
| Supervisor(s): | Jones, Gareth J.F. |
| Uncontrolled Keywords: | Speech retrieval; Spoken content retrieval |
| Subjects: | Computer Science > Information storage and retrieval systems; Computer Science > Multimedia systems; Computer Science > Information retrieval |
| DCU Faculties and Centres: | Research Institutes and Centres > Centre for Digital Video Processing (CDVP); Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
| Funders: | Science Foundation Ireland |
| ID Code: | 20197 |
| Deposited On: | 26 Nov 2014 10:50 by Gareth Jones. Last Modified 10 Oct 2018 09:11 |
Documents
Full text available as:
PDF (12MB)