DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Investigating multilingual approaches for parsing universal dependencies

Barry, James (ORCID: 0000-0003-3051-585X) (2022) Investigating multilingual approaches for parsing universal dependencies. PhD thesis, Dublin City University.

Abstract
Multilingual dependency parsing encompasses any attempt to parse multiple languages. It can involve parsing multiple languages in isolation (poly-monolingual), leveraging training data from multiple languages to process any of the included languages (polyglot), or training on one or multiple languages to process a low-resource language with no training data (zero-shot). In this thesis, we explore multilingual dependency parsing across all three paradigms. We first analyse whether polyglot training on a number of source languages is beneficial for processing a target language in a zero-shot cross-lingual dependency parsing experiment using annotation projection. The results of this experiment show that polyglot training produces an overall trend of better results on the target language, but a highly related single source language can still be better for transfer. We then look at the role of pretrained language models in processing a moderately low-resource language, Irish. Here, we develop our own monolingual Irish BERT model, gaBERT, from scratch and compare it to a number of multilingual baselines, showing that developing a monolingual language model for Irish is worthwhile. We then turn to the topic of parsing Enhanced Universal Dependencies (EUD) graphs, which are an extension of basic Universal Dependencies trees, and describe the DCU-EPFL submission to the 2021 IWPT shared task on EUD parsing. Here, we developed a multitask model to jointly learn the tasks of basic dependency parsing and EUD graph parsing, showing improvements over a single-task basic dependency parser. Lastly, we revisit the topic of polyglot parsing and investigate whether multiview learning can be applied to the problem of multilingual dependency parsing. Here, we learn different views based on the dataset source. We show that multiview learning can be used to train parsers with multiple datasets and that it yields a general improvement over single-view baselines.
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2022
Refereed: No
Supervisor(s): Foster, Jennifer and Wagner, Joachim
Subjects: Computer Science > Machine learning; Humanities > Linguistics
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Funders: Science Foundation Ireland
ID Code: 27698
Deposited On: 17 Nov 2022 13:08 by Jennifer Foster. Last Modified: 17 Nov 2022 13:08
Documents

Full text available as:

James_Barry_Thesis_Hardbound.pdf (PDF, 2MB)
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0