
DORAS | DCU Research Repository

Multimodal Deep Learning for Driver Monitoring: Integrating EEG and Vision for Robust Drowsiness Detection and Safety Enhancement

Ur Rahman, Shams (2025) Multimodal Deep Learning for Driver Monitoring: Integrating EEG and Vision for Robust Drowsiness Detection and Safety Enhancement. PhD thesis, Dublin City University.

Abstract
Road accidents remain a major global concern, with driver drowsiness and delayed reaction times recognised as key contributing factors. This thesis advances driver-monitoring research by developing multimodal approaches that integrate electroencephalography (EEG) and vision data to predict reaction times and drowsiness. The investigation first demonstrates that pre-stimulus EEG signals—specifically spectral power in the alpha and theta bands—contain rich information for estimating reaction times to critical road events. Using subject-independent machine-learning pipelines, short EEG windows recorded before event onset effectively differentiate between fast and slow responses. The work then explores the benefits of incorporating vision data as a second modality by fusing EEG signals with camera-based observations of the driver. One branch converts EEG power-spectral-density features into image-like representations for analysis with deep convolutional neural networks and transformer models. Another branch directly integrates raw EEG signals with synchronised video frames through end-to-end multimodal transformer architectures. Results indicate that transformers equipped with cross-modal attention capture complex interdependencies between neural and visual cues, yielding significant improvements in driver-drowsiness detection over unimodal approaches. Real-time deployment is addressed by designing and optimising a lightweight pipeline for edge-based processing. This resource-efficient model enables rapid analysis of facial cues under diverse driving conditions, ensuring operation on embedded platforms such as smartphones and automotive edge devices. Extensive evaluations on large-scale simulated datasets confirm the generalisability of the proposed approaches across varied driving scenarios. Experiments reveal that transformer-based fusion significantly enhances predictive performance by effectively combining complementary neural and visual cues. Moreover, the lightweight pipeline maintains high accuracy under stringent computational constraints, enabling real-time, on-device deployment.
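The pre-stimulus alpha/theta band-power features described in the abstract can be illustrated with a minimal sketch (not the thesis code; the sampling rate, window length, and synthetic signal below are all assumptions): mean band power is computed from a single EEG window via a plain FFT periodogram.

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean periodogram power of `signal` within the frequency `band` (Hz)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * len(signal))
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

# Hypothetical 2-second pre-stimulus window sampled at 256 Hz:
# a 10 Hz (alpha-band) oscillation plus noise.
fs = 256
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

theta = band_power(eeg, fs, (4, 8))   # theta band: 4-8 Hz
alpha = band_power(eeg, fs, (8, 13))  # alpha band: 8-13 Hz
print(alpha > theta)
```

In a subject-independent pipeline such band powers, computed per channel and per window, would form the feature vector fed to the classifier distinguishing fast from slow responses.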
Metadata
Item Type: Thesis (PhD)
Date of Award: 1 September 2025
Refereed: No
Supervisor(s): O'Connor, Noel and Healy, Graham
Subjects: Computer Science > Artificial intelligence; Computer Science > Image processing; Computer Science > Machine learning; Computer Science > Digital video
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.
Funders: Science Foundation Ireland
ID Code: 31498
Deposited On: 21 Nov 2025 14:37 by Noel Edward O'Connor. Last Modified: 21 Nov 2025 14:37
Documents

Full text available as:

PDF (shams_thesis_final_version.pdf) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
3MB