
DORAS | DCU Research Repository

Multimodal Deep Learning for Driver Monitoring: Integrating EEG and Vision for Robust Drowsiness Detection and Safety Enhancement

Ur Rahman, Shams (2025) Multimodal Deep Learning for Driver Monitoring: Integrating EEG and Vision for Robust Drowsiness Detection and Safety Enhancement. PhD thesis, Dublin City University.

Abstract
Road accidents remain a major global concern, with driver drowsiness and delayed reaction times recognised as key contributing factors. This thesis advances driver-monitoring research by developing multimodal approaches that integrate electroencephalography (EEG) and vision data to predict reaction times and drowsiness. The investigation first demonstrates that pre-stimulus EEG signals—specifically spectral power in the alpha and theta bands—contain rich information for estimating reaction times to critical road events. Using subject-independent machine-learning pipelines, short EEG windows recorded before event onset effectively differentiate between fast and slow responses. The work then explores the benefits of incorporating vision data as a second modality by fusing EEG signals with camera-based observations of the driver. One branch converts EEG power-spectral-density features into image-like representations for analysis with deep convolutional neural networks and transformer models. Another branch directly integrates raw EEG signals with synchronised video frames through end-to-end multimodal transformer architectures. Results indicate that transformers equipped with cross-modal attention capture complex interdependencies between neural and visual cues, yielding significant improvements in driver-drowsiness detection over unimodal approaches. Real-time deployment is addressed by designing and optimising a lightweight pipeline for edge-based processing. This resource-efficient model enables rapid analysis of facial cues under diverse driving conditions, ensuring operation on embedded platforms such as smartphones and automotive edge devices. Extensive evaluations on large-scale simulated datasets confirm the generalisability of the proposed approaches across varied driving scenarios. Experiments reveal that transformer-based fusion significantly enhances predictive performance by effectively combining complementary neural and visual cues. Moreover, the lightweight pipeline maintains high accuracy under stringent computational constraints, enabling real-time, on-device deployment.
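The pre-stimulus alpha/theta band-power features described in the abstract can be illustrated with a minimal sketch (not the thesis code; the sampling rate, window length, and synthetic signal below are all assumptions): mean band power is computed from a single EEG window via a plain FFT periodogram.

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean periodogram power of `signal` within the frequency `band` (Hz)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * len(signal))
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

# Hypothetical 2-second pre-stimulus window sampled at 256 Hz:
# a 10 Hz (alpha-band) oscillation plus noise.
fs = 256
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

theta = band_power(eeg, fs, (4, 8))   # theta band: 4-8 Hz
alpha = band_power(eeg, fs, (8, 13))  # alpha band: 8-13 Hz
print(alpha > theta)
```

In a subject-independent pipeline such band powers, computed per channel and per window, would form the feature vector fed to the classifier distinguishing fast from slow responses.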
Metadata
Item Type: Thesis (PhD)
Date of Award: 1 September 2025
Refereed: No
Supervisor(s): O'Connor, Noel and Healy, Graham
Subjects: Computer Science > Artificial intelligence; Computer Science > Image processing; Computer Science > Machine learning; Computer Science > Digital video
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.
Funders: Science Foundation Ireland
ID Code: 31498
Deposited On: 21 Nov 2025 14:37 by Noel Edward O'Connor. Last Modified: 21 Nov 2025 14:37
Documents

Full text available as:

PDF (shams_thesis_final_version.pdf) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
3MB