DORAS | DCU Research Repository

Utilization of multimodal interaction signals for automatic summarisation of academic presentations

Curtis, Keith (2018) Utilization of multimodal interaction signals for automatic summarisation of academic presentations. PhD thesis, Dublin City University.

Abstract
Multimedia archives are expanding rapidly. For these archives there is a shortage of retrieval and summarisation techniques for accessing and browsing content whose main information exists in the audio stream. This thesis describes an investigation into the development of novel feature extraction and summarisation techniques for audio-visual recordings of academic presentations.

We report on the development of a multimodal dataset of academic presentations. This dataset is labelled by human annotators for presentation ratings, audience engagement levels, speaker emphasis, and audience comprehension. We investigate the automatic classification of speaker ratings and audience engagement by extracting audio-visual features from video of the presenter and audience and training classifiers to predict speaker ratings and engagement levels. Following this, we investigate the automatic identification of areas of emphasised speech. Analysis of all human-annotated areas of emphasised speech identifies minimum speech pitch and gesticulation, when they occur together, as indicators of emphasis. We then investigate the speaker's potential to be comprehended by the audience. Following crowdsourced annotation of comprehension levels during academic presentations, a set of audio-visual features considered most likely to affect comprehension levels is extracted. Classifiers trained on these features predict comprehension levels to an accuracy of 49% over a 7-class scale and 85% over a binary distribution.

Presentation summaries are built by segmenting speech transcripts into phrases and ranking segments using keywords extracted from the transcripts in conjunction with extracted paralinguistic features; the highest-ranking segments are then extracted to build the summary. Summaries are evaluated by performing eye-tracking experiments as participants watch presentation videos. Participants were found to be consistently more engaged for presentation summaries than for full presentations. Summaries were also found to contain a higher concentration of new information than full presentations.
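The summarisation approach described above (segmenting the transcript into phrases, then ranking segments by extracted keywords combined with paralinguistic features) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the term-frequency keyword scoring, the precomputed emphasis_scores input, and the mixing weight alpha are all illustrative stand-ins for the features the abstract describes.

    # Minimal sketch of keyword-plus-paralinguistic segment ranking for
    # extractive summarisation. All names and scoring choices here are
    # illustrative assumptions, not the method evaluated in the thesis.
    from collections import Counter

    def summarise(segments, emphasis_scores, top_k=5, alpha=0.5):
        # segments: transcript phrases (strings); emphasis_scores: one
        # paralinguistic score per segment in [0, 1], e.g. derived from
        # pitch and gesture cues (assumed to be precomputed).
        tf = Counter(w.lower() for s in segments for w in s.split())
        total = sum(tf.values()) or 1

        def keyword_score(seg):
            # Mean term frequency of the segment's words: a crude proxy
            # for "contains extracted keywords".
            words = seg.split()
            return sum(tf[w.lower()] for w in words) / (total * max(len(words), 1))

        # Blend lexical and paralinguistic evidence, rank, keep top_k,
        # and return the kept segments in presentation order.
        ranked = sorted(range(len(segments)),
                        key=lambda i: alpha * keyword_score(segments[i])
                                      + (1 - alpha) * emphasis_scores[i],
                        reverse=True)
        return [segments[i] for i in sorted(ranked[:top_k])]

For example, summarise(["intro", "the key contribution is X", "thanks"], [0.1, 0.9, 0.2], top_k=1) keeps only the emphasised middle phrase, since its paralinguistic score dominates when the keyword scores are comparable.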
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2018
Refereed: No
Supervisor(s): Jones, Gareth J.F. and Campbell, Nick
Uncontrolled Keywords: Video Summarisation, Feature Classification, Evaluation, Eye Tracking
Subjects: Computer Science > Interactive computer systems
    Computer Science > Multimedia systems
    Computer Science > Image processing
    Computer Science > Digital video
    Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 22411
Deposited On: 16 Nov 2018 16:23 by Gareth Jones. Last Modified: 13 Dec 2019 15:29
Documents

Full text available as:

PDF (Thesis.pdf) - 34MB. Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader.