Quach, Boi Mai
ORCID: 0000-0001-6429-6339
(2025)
Using EEG Signals to Examine Next-Word Predictability of Language Models in Reading.
PhD thesis, Dublin City University.
Abstract
Language Models (LMs) are primarily developed to address natural language processing (NLP) tasks and are not intended to reflect human reading comprehension. However, since these models are trained on the same written materials that humans process when reading, they should share many of the abilities involved in human reading comprehension, particularly next-word prediction. Comparing the accuracy of different types of LMs and humans in the next-word prediction task demonstrates their predictive capabilities but falls short of confirming whether LMs and the human brain process reading in a similar manner. To address this, the thesis uses electroencephalography (EEG) signals to examine the next-word predictability of different LMs during reading.
The main contributions of this thesis are as follows. First, we present a comprehensive resource for guiding EEG-based reading experiments, introduce a tailored preprocessing pipeline, and provide DERCo (Dublin EEG-based Reading Experiment Corpus), an openly accessible dataset combining EEG and next-word prediction data. Second, we use DERCo to analyse how the brain responds to various word categories (e.g., content words, function words, and grammatical classes), shedding light on the interaction between top-down and bottom-up processing during reading.
Third, we show that a decoding approach offers greater analytical power than traditional event-related potential (ERP) analysis of EEG data. Lastly, using surprisal, an information-theoretic metric, alongside accuracy, we build brain encoding models for different LMs and for human prediction production to capture neural responses at the word level. Our evaluation reveals that while more advanced language models align more closely with human prediction patterns, they fail to fully reflect the human-like reading processes observed in brain signals.
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Date of Award: | 2025 |
| Refereed: | No |
| Supervisor(s): | Healy, Graham and Gurrin, Cathal |
| Subjects: | Biological Sciences > Neuroscience; Humanities > Biological Sciences > Neuroscience; Computer Science > Artificial intelligence; Computer Science > Machine learning; Engineering > Biomedical engineering |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License. |
| Funders: | 18/CRT/6183 |
| ID Code: | 31135 |
| Deposited On: | 21 Nov 2025 14:10 by Graham Healy. Last Modified 21 Nov 2025 14:10 |
Documents
Full text available as:
PDF (14MB), Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0