
DORAS | DCU Research Repository


Predicting Gestational Diabetes Mellitus from Routinely-Collected Data in Electronic Health Records

Germaine, Mark (2026) Predicting Gestational Diabetes Mellitus from Routinely-Collected Data in Electronic Health Records. PhD thesis, Dublin City University.

Abstract
Machine learning (ML) techniques are increasingly applied to electronic health records (EHRs) for earlier clinical insights. Gestational diabetes mellitus (GDM), currently screened for at 24–28 weeks, is an ideal candidate for these models because demographic and clinical data are available before screening takes place. Therefore, this thesis examined whether first-trimester and obstetric EHR data can identify women at elevated GDM risk before standard screening. A systematic review and meta-analysis of 38 studies (>2 million pregnancies) established the existing evidence base. Data from the Coombe Hospital EHRs were processed, including validation of GDM diagnoses against a clinical team database (CTD). ML and statistical models were developed and internally validated using first-trimester data (n=27,561) and data from previous pregnancies (n=4,005). A novel reciprocal external validation framework was implemented in collaboration with an Australian research group to assess model transportability without direct data sharing. Finally, a developed prognostic model was prospectively validated in a clinical setting at the Coombe Hospital. This structured progression from foundational data issues to clinical application reflects a deliberate effort to address the multifaceted challenges, beyond mere algorithmic performance, that often hinder the translation of prognostic models into practice. Findings are reported as a point estimate followed by its 95% confidence interval. Key findings revealed moderate yet heterogeneous discrimination in the published literature to date (pooled AUC 0.75, 0.71 to 0.78; I² ~99.6%), with complex algorithms offering no advantage over logistic regression (Chapter 2). Discrepancies in the recording of GDM were found between the EHRs and the CTD (false negative rate 14.3%, false positive rate 2.3%), though this had only a minor impact on model development (Chapter 4).
Incorporating data from previous pregnancies improved model performance relative to first-trimester data (AUC ~0.88 vs ~0.82; intercept ~0.040 vs ~0.035; slope ~1.032 vs 1.016), with previous-pregnancy data alone achieving good performance (AUC ~0.86; intercept 0.050; slope ~0.984) (Chapter 5). External validation highlighted transportability challenges: AUC declined for both the Irish and the Australian models, and calibration was impaired (Chapter 6). The prospective clinical validation showed that the prognostic model achieved moderate discrimination (AUC 0.762, 0.681 to 0.837) and acceptable calibration (intercept 0.21, -0.15 to 0.57; slope 0.808, 0.53 to 1.08) in real-world use, with 1 in 5 GDM cases identified 10-12 weeks earlier (Chapter 7). The steady decline in performance from internal to external and prospective validation is consistent with the optimism bias present in many prognostic modelling studies and underlines the necessity of rigorous, multi-stage testing. In conclusion, ML models, particularly those leveraging previous pregnancy data, show potential for early GDM risk prediction using EHRs. However, the successful clinical translation of these tools is critically dependent on data quality, multi-stage validation, and consideration of model transportability across populations. This thesis provides a comprehensive framework for developing and evaluating ML models in clinical settings incorporating EHR data, highlighting the path from code to clinic.
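For readers less familiar with the metrics quoted in the abstract, the following is a minimal illustrative sketch of how discrimination (AUC) and false negative and false positive rates can be computed. It is not the thesis's code: the function names and toy data are hypothetical, and treating one set of labels as the reference standard (for example, the clinical team database when checking EHR-recorded diagnoses) is an assumption made here for illustration.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case receives
    a higher predicted risk than a randomly chosen negative case.
    Ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def error_rates(reference, recorded):
    """Return (FNR, FPR) of `recorded` labels judged against `reference`.
    Which source is the reference standard is an assumption of this sketch."""
    n_pos = sum(reference)
    n_neg = len(reference) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("reference must contain both classes")
    fn = sum(1 for r, e in zip(reference, recorded) if r == 1 and e == 0)
    fp = sum(1 for r, e in zip(reference, recorded) if r == 0 and e == 1)
    return fn / n_pos, fp / n_neg


# Hypothetical toy data, not taken from the thesis.
toy_labels = [1, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.1, 0.5, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 8 of 9 positive/negative pairs are ranked correctly

toy_reference = [1, 1, 1, 1, 0, 0, 0, 0]
toy_recorded = [1, 1, 1, 0, 0, 0, 0, 1]
toy_fnr, toy_fpr = error_rates(toy_reference, toy_recorded)  # one miss and one false alarm out of four each
```

The pairwise AUC loop is quadratic in the number of cases and is shown only because it mirrors the definition directly; a rank-based implementation from a statistics library would be the usual choice on cohorts of the size described above.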
Metadata
Item Type:Thesis (PhD)
Date of Award:6 January 2026
Refereed:No
Supervisor(s):Healy, Graham, Egan, Brendan and Caton, Simon
Subjects:Computer Science > Machine learning
Medical Sciences > Health
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License.
ID Code:32152
Deposited On:14 Apr 2026 13:16 by Graham Healy. Last Modified 14 Apr 2026 13:16
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
10MB