Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese

Oya, Masanori (2010) Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese. Master of Science thesis, Dublin City University.

Full text available as:

PDF (3076Kb) - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Image (JPEG) (copyright declaration) (590Kb)

Abstract

The objective of this thesis is to design, implement and evaluate a methodology for the automatic acquisition of wide-coverage treebank-based deep linguistic resources for Japanese, as part of the GramLab project, which focuses on the automatic treebank-based induction of multilingual resources in the framework of Lexical-Functional Grammar (LFG). After introducing the basic framework of LFG in Chapter 2, I describe the core syntactic and morphological aspects of Japanese in Chapter 3: non-configurationality; the concept of "bunsetsu" or syntactic units and their dependency relationships represented in Directed Acyclic Graphs (DAGs); topicalisation by a particular particle; and frequent use of zero pronouns with or without overt antecedents. Inflecting and non-inflecting parts-of-speech of Japanese are also described with examples. In Chapter 4, I provide the linguistic representation of core grammatical features and functions of Japanese in the framework of LFG. I use DAGs as a framework for the unified representation of surface syntactic, morphological and lexical information in an LFG f-structure. In Chapters 5 and 6, I describe the algorithm that automatically annotates the Kyoto Text Corpus version 4.0 (KTC4) and the output of the Kurohashi-Nagao Parser (KNP), both of which provide unlabelled dependencies only, with LFG f-structure functional equations (i.e. labelled dependencies). The method presented in this dissertation also includes zero pronoun identification. Finally, in Chapter 7, I evaluate the performance of the f-structure annotation algorithm with zero pronoun identification for KTC4 against a manually-corrected Gold Standard of 500 sentences randomly chosen from KTC4. Using KTC4 treebank trees, my method currently achieves a pred-only dependency f-score of 94.72%. The parsing experiments using KNP output yield a pred-only dependency f-score of 82.38%.

Item Type: Thesis (Master of Science)
Date of Award: March 2010
Refereed: No
Supervisor(s): van Genabith, Josef
Uncontrolled Keywords: LFG; lexical functional grammar; treebank; Japanese; zero-anaphors
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 15118
Deposited On: 31 Mar 2010 14:28 by Josef Vangenabith. Last Modified 15 Jun 2010 16:37
