Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

Towards a machine-learning architecture for lexical functional grammar parsing

Chrupała, Grzegorz (2008) Towards a machine-learning architecture for lexical functional grammar parsing. PhD thesis, Dublin City University.

Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and find solutions which will generalize robustly across multiple languages. The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language -independence for LFG parsing systems. Function labels can often be relatively straightforwardly mapped to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing. In a lexicalized grammatical formalism such as LFG a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously lemmatize and morphologically analyze text and obtain competitive or improved results on a range of typologically diverse languages.
Item Type:Thesis (PhD)
Date of Award:November 2008
Supervisor(s):van Genabith, Josef
Subjects:Computer Science > Machine learning
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
Funders:SFI, GramLab PI Grant Prof. Josef van Genabith, Science Foundation Ireland
ID Code:550
Deposited On:10 Nov 2008 11:21 by Josef Vangenabith . Last Modified 19 Jul 2018 14:41

Full text available as:

[thumbnail of GrzegorzPhDFinal.pdf]
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader


Downloads per month over past year

Archive Staff Only: edit this record