Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

Treebank-Based Deep Grammar Acquisition for French Probabilistic Parsing Resources

Schluter, Natalie (2011) Treebank-Based Deep Grammar Acquisition for French Probabilistic Parsing Resources. PhD thesis, Dublin City University.

Abstract
Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in wide-coverage grammars automatically obtained from treebanks. In particular, recent years have seen a move towards acquiring deep (LFG, HPSG and CCG) resources that can represent information absent from simple CFG-type structured treebanks and which are considered to produce more language-neutral linguistic representations, such as syntactic dependency trees. As is often the case in early pioneering work in natural language processing, English has been the focus of attention in the first efforts towards acquiring treebank-based deep-grammar resources, followed by treatments of, for example, German, Japanese, Chinese and Spanish. However, to date no comparable large-scale automatically acquired deep-grammar resources have been obtained for French. The goal of the research presented in this thesis is to develop, implement, and evaluate treebank-based deep-grammar acquisition techniques for French. Along the way towards achieving this goal, this thesis presents the derivation of a new treebank for French from the Paris 7 Treebank, the Modified French Treebank, a cleaner, more coherent treebank with several transformed structures and new linguistic analyses. Statistical parsers trained on this data outperform those trained on the original Paris 7 Treebank, which has five times the amount of data. The Modified French Treebank is the data source used for the development of treebank-based automatic deep-grammar acquisition for LFG parsing resources for French, based on an f-structure annotation algorithm for this treebank. LFG CFG-based parsing architectures are then extended and tested, achieving a competitive best f-score of 86.73% for all features. The CFG-based parsing architectures are then complemented with an alternative dependency-based statistical parsing approach, obviating the CFG-based parsing step, and instead directly parsing strings into f-structures.
Metadata
Item Type:Thesis (PhD)
Date of Award:19 January 2011
Refereed:No
Additional Information:Computational Lingusitics, Natural Language Processing
Supervisor(s):van Genabith, Josef
Uncontrolled Keywords:Treebank-Based Deep LFG Grammar Acquisition
Subjects:Computer Science > Computational linguistics
DCU Faculties and Centres:Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License
Funders:Science Foundation Ireland
ID Code:16077
Deposited On:06 Apr 2011 15:58 by Josef Vangenabith . Last Modified 19 Jul 2018 14:52
Documents

Full text available as:

[thumbnail of Thesis-NatalieSchluter2011.pdf]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
659kB
Downloads

Downloads

Downloads per month over past year

Archive Staff Only: edit this record