Using EEG Signals to Examine Next-Word Predictability of Language Models in Reading

Quach, Boi Mai

Quach, Boi Mai ORCID: 0000-0001-6429-6339 (2025) Using EEG Signals to Examine Next-Word Predictability of Language Models in Reading. PhD thesis, Dublin City University.

Abstract

Language models (LMs) are primarily developed to address natural language processing (NLP) tasks and are not intended to reflect how humans process text during reading comprehension. However, since these models are trained on the same written materials that humans read, they should share many of the abilities involved in human reading comprehension, particularly next-word prediction. Comparing the accuracy of different types of LMs and humans on the next-word prediction task demonstrates their predictive capabilities, but falls short of confirming whether LMs and the human brain process reading in a similar manner. To address this, this thesis uses electroencephalography (EEG) signals to examine the next-word predictability of different LMs during reading. The main contributions of this thesis are as follows. First, we present a comprehensive resource for guiding EEG-based reading experiments, introduce a tailored preprocessing pipeline, and provide DERCo (Dublin EEG-based Reading Experiment Corpus), an openly accessible dataset combining EEG and next-word prediction data. Second, we use DERCo to analyse how the brain responds to various word categories (e.g., content words, function words, and grammatical classes), shedding light on the interaction between top-down and bottom-up processing during reading. In addition, we show that a decoding approach offers greater analytical power than traditional event-related potential (ERP) analysis of EEG data. Lastly, leveraging surprisal, an information-theoretic metric, alongside accuracy, we build brain encoding models for different LMs and for human-produced predictions to capture neural responses at the word level. Our evaluation reveals that while more advanced language models align more closely with human prediction patterns, they fail to fully reflect the human-like reading processes observed in brain signals.
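
For readers unfamiliar with the term, surprisal is conventionally defined as the negative log probability of a word given its preceding context. The sketch below only illustrates that standard definition; it is not the thesis's implementation, and the function name, example context, and probability values are hypothetical.

```python
import math


def surprisal_bits(prob: float) -> float:
    """Return the surprisal, in bits, of a word with conditional probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical next-word distribution for the context "The cat sat on the".
# These values are made up for illustration and come from no real model.
next_word_probs = {"mat": 0.5, "sofa": 0.25, "roof": 0.125, "moon": 0.125}

# Less probable continuations carry higher surprisal.
surprisals = {word: surprisal_bits(p) for word, p in next_word_probs.items()}
```

A word with probability 0.5 has a surprisal of 1 bit, and each halving of probability adds one more bit, so less expected words receive larger values.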

Metadata

Item Type:	Thesis (PhD)
Date of Award:	2025
Refereed:	No
Supervisor(s):	Healy, Graham and Gurrin, Cathal
Subjects:	Biological Sciences > Neuroscience; Humanities > Biological Sciences > Neuroscience; Computer Science > Artificial intelligence; Computer Science > Machine learning; Engineering > Biomedical engineering
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License.
Funders:	18/CRT/6183
ID Code:	31135
Deposited On:	21 Nov 2025 14:10 by Graham Healy. Last Modified 21 Nov 2025 14:10

Documents

Full text available as:

MaiBoiQuach_PhD_Thesis.pdf

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
14MB

DORAS | DCU Research Repository
