Oya, Masanori (2010) Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese. Master of Science thesis, Dublin City University.
Abstract
The objective f this thesis is to design, implement and evaluate a methodology for the automatic acquisition of wide-coverage treebank-based deep linguistic resources fr Japanese, as part of the GramLab project which focuses on the automatic treebank-based induction of multilingual resources in the framework of Lexical-Functional Grammar (LFG).
After introducing the basic framework of LFG in Chapter 2, I describe the core syntactic and morphological aspects of Japanese in Chapter 3: non-configurationality; the concept of "bunsetsu" r syntactic units and their dependency relationship represented in Directed Acyclic Graphs (DAGs); topicalisation by a particular particle; and frequent use of zero pronouns with or without over antecedents. Inflecting parts-of-speech and non-inflecting parts-of-speech of Japanese are also described with examples.
In Chapter 4, I provide the linguistic representation of core grammatical features and functions of Japanese in the framework of LFG.I use Directed Acyclic Graphs (DAG) as a framework for the unified representation f surface syntactic, morphological and lexical information in an LFG f-structure.
In Chapters 5 and 6, I describe the automatic annotation algorithm of LFG f-structure functional equations (i.e. labelled dependencies) to the Kyoto Text Corpus version 4.0 (KTC4) and the output of Kurohashi-Nagao Parser (KNP provide unlabelled dependencies only. The method presented in this dissertation also includes zero pronoun identification.
Finally in Chapter 7 I evaluate the performance of the f-structure annotation algorithm with zero-pronoun identification for KTC4 against a manually-corrected Gold Standard of 500 sentences randomly chosen from KTC4. Using KTC4 treebank trees, currently my method achieves a pred-only dependency f-score of 94.72%. The parsing experiments using KNP output yield a pred-only dependency f-score of 82.38%.
Metadata
Item Type: | Thesis (Master of Science) |
---|---|
Date of Award: | March 2010 |
Refereed: | No |
Supervisor(s): | van Genabith, Josef |
Uncontrolled Keywords: | LFG; lexical functional grammar; treebank; Japanese; zero-anaphors; |
Subjects: | Computer Science > Computational linguistics |
DCU Faculties and Centres: | Research Institutes and Centres > National Centre for Language Technology (NCLT) DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License |
Funders: | Science Foundation Ireland |
ID Code: | 15118 |
Deposited On: | 31 Mar 2010 13:28 by Josef Vangenabith . Last Modified 19 Jul 2018 14:49 |
Documents
Full text available as:
Preview |
PDF
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
3MB |
Preview |
Image (JPEG) (copyright declaration)
604kB |
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record