DORAS | DCU Research Repository

Utilization of multimodal interaction signals for automatic summarisation of academic presentations

Curtis, Keith (2018) Utilization of multimodal interaction signals for automatic summarisation of academic presentations. PhD thesis, Dublin City University.

Abstract
Multimedia archives are expanding rapidly. For these archives there is a shortage of retrieval and summarisation techniques for accessing and browsing content whose main information exists in the audio stream. This thesis describes an investigation into the development of novel feature extraction and summarisation techniques for audio-visual recordings of academic presentations.

We report on the development of a multimodal dataset of academic presentations. This dataset is labelled by human annotators for presentation ratings, audience engagement levels, speaker emphasis, and audience comprehension. We investigate the automatic classification of speaker ratings and audience engagement by extracting audio-visual features from video of the presenter and audience and training classifiers to predict speaker ratings and engagement levels. Following this, we investigate the automatic identification of areas of emphasised speech. Analysis of all human-annotated areas of emphasised speech identifies minimum speech pitch and gesticulation, when they occur together, as indicators of emphasis. We then investigate the speaker's potential to be comprehended by the audience. Following crowdsourced annotation of comprehension levels during academic presentations, a set of audio-visual features considered most likely to affect comprehension levels is extracted. Classifiers trained on these features predict comprehension levels to an accuracy of 49% over a 7-class scale and 85% over a binary distribution.

Presentation summaries are built by segmenting speech transcripts into phrases and ranking segments using keywords extracted from the transcripts in conjunction with extracted paralinguistic features; the highest-ranking segments are then extracted to build the summary. Summaries are evaluated by performing eye-tracking experiments as participants watch presentation videos. Participants were found to be consistently more engaged for presentation summaries than for full presentations. Summaries were also found to contain a higher concentration of new information than full presentations.
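The summarisation approach described above (segmenting the transcript into phrases, then ranking segments by extracted keywords combined with paralinguistic features) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the term-frequency keyword scoring, the precomputed emphasis_scores input, and the mixing weight alpha are all illustrative stand-ins for the features the abstract describes.

    # Minimal sketch of keyword-plus-paralinguistic segment ranking for
    # extractive summarisation. All names and scoring choices here are
    # illustrative assumptions, not the method evaluated in the thesis.
    from collections import Counter

    def summarise(segments, emphasis_scores, top_k=5, alpha=0.5):
        # segments: transcript phrases (strings); emphasis_scores: one
        # paralinguistic score per segment in [0, 1], e.g. derived from
        # pitch and gesture cues (assumed to be precomputed).
        tf = Counter(w.lower() for s in segments for w in s.split())
        total = sum(tf.values()) or 1

        def keyword_score(seg):
            # Mean term frequency of the segment's words: a crude proxy
            # for "contains extracted keywords".
            words = seg.split()
            return sum(tf[w.lower()] for w in words) / (total * max(len(words), 1))

        # Blend lexical and paralinguistic evidence, rank, keep top_k,
        # and return the kept segments in presentation order.
        ranked = sorted(range(len(segments)),
                        key=lambda i: alpha * keyword_score(segments[i])
                                      + (1 - alpha) * emphasis_scores[i],
                        reverse=True)
        return [segments[i] for i in sorted(ranked[:top_k])]

For example, summarise(["intro", "the key contribution is X", "thanks"], [0.1, 0.9, 0.2], top_k=1) keeps only the emphasised middle phrase, since its paralinguistic score dominates when the keyword scores are comparable.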
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2018
Refereed: No
Supervisor(s): Jones, Gareth J.F. and Campbell, Nick
Uncontrolled Keywords: Video Summarisation, Feature Classification, Evaluation, Eye Tracking
Subjects: Computer Science > Interactive computer systems
    Computer Science > Multimedia systems
    Computer Science > Image processing
    Computer Science > Digital video
    Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 22411
Deposited On: 16 Nov 2018 16:23 by Gareth Jones. Last Modified: 13 Dec 2019 15:29
Documents

Full text available as:

PDF (Thesis.pdf) - 34MB. Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader.