Schluter, Natalie (2011) Treebank-Based Deep Grammar Acquisition for French Probabilistic Parsing Resources. PhD thesis, Dublin City University.
Abstract
Motivated by the time and other resources required to produce hand-crafted grammars, there has been increased interest in wide-coverage grammars automatically obtained from treebanks. In particular, recent years have seen a move towards acquiring deep (LFG, HPSG and CCG) resources that can represent information absent from treebanks with simple CFG-type structures, and which are considered to produce more language-neutral linguistic representations, such as syntactic dependency trees. As is often the case in early pioneering work in natural language processing, English has been the focus of attention in the first efforts towards acquiring treebank-based deep-grammar resources, followed by treatments of, for example, German, Japanese, Chinese and Spanish. However, to date no comparable large-scale automatically acquired deep-grammar resources have been obtained for French. The goal of the research presented in this thesis is to develop, implement, and evaluate treebank-based deep-grammar acquisition techniques for French. Along the way towards achieving this goal, this thesis presents the derivation of a new treebank for French from the Paris 7 Treebank: the Modified French Treebank, a cleaner, more coherent treebank with several transformed structures and new linguistic analyses. Statistical parsers trained on this data outperform those trained on the original Paris 7 Treebank, even though the latter contains five times as much data. The Modified French Treebank is the data source used to develop treebank-based automatic deep-grammar acquisition of LFG parsing resources for French, based on an f-structure annotation algorithm for this treebank. CFG-based LFG parsing architectures are then extended and tested, achieving a competitive best f-score of 86.73% for all features. The CFG-based parsing architectures are then complemented with an alternative dependency-based statistical parsing approach, which obviates the CFG-based parsing step and instead parses strings directly into f-structures.
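The abstract reports an f-score "for all features" without giving the evaluation details. A common way to score LFG f-structures is to flatten each one into a set of (relation, head, dependent) triples and compute precision, recall and f-score over the matching triples; the sketch below assumes that scheme. The triple encoding, the function names `prf` and `restrict`, the relation set `GRAMMATICAL_FUNCTIONS` and the toy French f-structure triples are all hypothetical illustrations, not the thesis's evaluation code or data.

```python
from collections import Counter

# Hypothetical encoding of one f-structure fact: (relation, head, dependent).
Triple = tuple[str, str, str]

# Hypothetical set of grammatical-function relations used for a "preds-only" view;
# atomic features such as tense or number are excluded from it.
GRAMMATICAL_FUNCTIONS = {"subj", "obj", "obl", "adjunct", "xcomp", "comp"}


def prf(gold: list[Triple], parsed: list[Triple]) -> tuple[float, float, float]:
    """Precision, recall and balanced f-score over multisets of triples."""
    gold_counts, parsed_counts = Counter(gold), Counter(parsed)
    # Counter intersection keeps the minimum count of each shared triple.
    matched = sum((gold_counts & parsed_counts).values())
    precision = matched / len(parsed) if parsed else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def restrict(triples: list[Triple], relations: set[str]) -> list[Triple]:
    """Keep only triples whose relation is in the given set."""
    return [t for t in triples if t[0] in relations]


# Toy example for "Jean dort": the parser gets the tense wrong and adds a feature.
gold = [("subj", "dormir", "Jean"), ("tense", "dormir", "pres"), ("num", "Jean", "sg")]
parsed = [("subj", "dormir", "Jean"), ("tense", "dormir", "past"),
          ("num", "Jean", "sg"), ("pers", "Jean", "3")]

p, r, f = prf(gold, parsed)  # all features: 2 of 4 parsed, 2 of 3 gold triples match
preds_p, preds_r, preds_f = prf(restrict(gold, GRAMMATICAL_FUNCTIONS),
                                restrict(parsed, GRAMMATICAL_FUNCTIONS))
print(f"all features: P={p:.3f} R={r:.3f} F={f:.3f}")
print(f"preds only:   P={preds_p:.3f} R={preds_r:.3f} F={preds_f:.3f}")
```

Scoring over multisets rather than plain sets is a design choice here, so that a triple duplicated in the parser output is not credited twice; restricting to grammatical functions shows why an "all features" score can differ from a score over predicate-argument relations alone.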
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | 19 January 2011 |
Refereed: | No |
Additional Information: | Computational Linguistics, Natural Language Processing |
Supervisor(s): | van Genabith, Josef |
Uncontrolled Keywords: | Treebank-Based Deep LFG Grammar Acquisition |
Subjects: | Computer Science > Computational linguistics |
DCU Faculties and Centres: | Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
Funders: | Science Foundation Ireland |
ID Code: | 16077 |
Deposited On: | 06 Apr 2011 15:58 by Josef Vangenabith. Last modified 19 Jul 2018 14:52 |
Documents
Full text available as:
PDF (659kB)