Ur Rahman, Shams (2025) Multimodal Deep Learning for Driver Monitoring: Integrating EEG and Vision for Robust Drowsiness Detection and Safety Enhancement. PhD thesis, Dublin City University.
Abstract
Road accidents remain a major global concern, with driver drowsiness and delayed reaction times recognised as key contributing factors. This thesis advances driver-monitoring research by developing multimodal approaches that integrate electroencephalography (EEG) and vision data to predict reaction times and drowsiness. The investigation first demonstrates that pre-stimulus EEG signals, specifically spectral power in the alpha and theta bands, contain rich information for estimating reaction times to critical road events. Using subject-independent machine-learning pipelines, short EEG windows recorded before event onset effectively differentiate between fast and slow responses.

The work then explores the benefits of incorporating vision data as a second modality by fusing EEG signals with camera-based observations of the driver. One branch converts EEG power-spectral-density features into image-like representations for analysis with deep convolutional neural networks and transformer models. Another branch directly integrates raw EEG signals with synchronised video frames through end-to-end multimodal transformer architectures. Results indicate that transformers equipped with cross-modal attention capture complex interdependencies between neural and visual cues, yielding significant improvements in driver-drowsiness detection over unimodal approaches.

Real-time deployment is addressed by designing and optimising a lightweight pipeline for edge-based processing. This resource-efficient model enables rapid analysis of facial cues under diverse driving conditions, ensuring operation on embedded platforms such as smartphones and automotive edge devices. Extensive evaluations on large-scale simulated datasets confirm the generalisability of the proposed approaches across varied driving scenarios. Experiments reveal that transformer-based fusion significantly enhances predictive performance by effectively combining complementary neural and visual cues. Moreover, the lightweight pipeline maintains high accuracy under stringent computational constraints, enabling real-time, on-device deployment.
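The pre-stimulus band-power features central to the first contribution can be sketched in a few lines. This is an illustrative reconstruction, not the thesis's actual pipeline: the sampling rate, window length, and band edges are assumptions, and a plain periodogram stands in for whatever spectral estimator the thesis employs.

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean power of `signal` within the frequency `band` (lo, hi) in Hz,
    estimated from a simple periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].mean()

# Hypothetical 2-second pre-stimulus window at 256 Hz: a 10 Hz (alpha-range)
# oscillation plus weak noise, so alpha power should dominate theta power.
fs = 256
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
window = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

theta = band_power(window, fs, (4.0, 8.0))   # theta band, 4-8 Hz
alpha = band_power(window, fs, (8.0, 13.0))  # alpha band, 8-13 Hz
features = np.array([theta, alpha])          # per-channel feature vector
```

Features like these, computed per channel and per window, would feed the subject-independent fast-vs-slow classifier described in the abstract.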
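The cross-modal attention mechanism underpinning the fusion results can likewise be illustrated with a minimal single-head sketch in NumPy. The token counts, embedding dimension, and the direction of attention (EEG queries attending to vision keys and values) are assumptions for illustration; the thesis's actual architectures are multi-layer multimodal transformers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(eeg_tokens, vision_tokens, Wq, Wk, Wv):
    """EEG tokens attend to vision tokens: queries come from the EEG
    stream, keys/values from the vision stream, so each EEG time step
    is re-expressed as a weighted mixture of visual cues."""
    q = eeg_tokens @ Wq                       # (T_eeg, d)
    k = vision_tokens @ Wk                    # (T_vis, d)
    v = vision_tokens @ Wv                    # (T_vis, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T_eeg, T_vis)
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
d = 16
eeg = rng.standard_normal((8, d))      # 8 EEG time-step embeddings
vision = rng.standard_normal((4, d))   # 4 video-frame embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused, attn = cross_modal_attention(eeg, vision, Wq, Wk, Wv)
```

The attention matrix makes the "complex interdependencies" claim concrete: each row shows how strongly one EEG time step draws on each video frame when forming the fused representation.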
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Date of Award: | 1 September 2025 |
| Refereed: | No |
| Supervisor(s): | O'Connor, Noel and Healy, Graham |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Image processing; Computer Science > Machine learning; Computer Science > Digital video |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 License. |
| Funders: | Science Foundation Ireland |
| ID Code: | 31498 |
| Deposited On: | 21 Nov 2025 14:37 by Noel Edward O'Connor. Last Modified 21 Nov 2025 14:37 |
Documents
Full text available as:
PDF, 3MB (Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0)