Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese
Oya, Masanori
(2010)
Treebank-based automatic acquisition of wide coverage, deep linguistic resources for Japanese.
Master of Science thesis, Dublin City University.
The objective of this thesis is to design, implement and evaluate a methodology for the automatic acquisition of wide-coverage treebank-based deep linguistic resources for Japanese, as part of the GramLab project, which focuses on the automatic treebank-based induction of multilingual resources in the framework of Lexical-Functional Grammar (LFG).
After introducing the basic framework of LFG in Chapter 2, I describe the core syntactic and morphological aspects of Japanese in Chapter 3: non-configurationality; the concept of "bunsetsu", or syntactic units, and their dependency relationships represented in Directed Acyclic Graphs (DAGs); topicalisation by a particular particle; and frequent use of zero pronouns with or without overt antecedents. Inflecting and non-inflecting parts-of-speech of Japanese are also described with examples.
In Chapter 4, I provide the linguistic representation of core grammatical features and functions of Japanese in the framework of LFG. I use DAGs as a framework for the unified representation of surface syntactic, morphological and lexical information in an LFG f-structure.
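As an informal illustration of this kind of representation (the sentence, attribute names and values below are hypothetical examples, not taken from the thesis), an f-structure can be sketched as a nested attribute-value structure in which re-entrant values make the graph a DAG rather than a tree:

```python
# Hypothetical sketch: an LFG f-structure for "Taro-wa hon-o yonda"
# ('Taro read a book') as nested attribute-value pairs.
# The shared node below (TOPIC and SUBJ pointing to the same value)
# is what makes the structure a DAG rather than a tree.
taro = {"PRED": "Taro", "CASE": "nom"}
f_structure = {
    "PRED": "yomu<SUBJ, OBJ>",   # 'read', with its subcategorisation frame
    "TENSE": "past",
    "SUBJ": taro,
    "OBJ": {"PRED": "hon"},      # 'book'
    "TOPIC": taro,               # re-entrancy: TOPIC and SUBJ share one node
}
print(f_structure["TOPIC"] is f_structure["SUBJ"])  # True: one shared node
```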
In Chapters 5 and 6, I describe the algorithm for the automatic annotation of LFG f-structure functional equations (i.e. labelled dependencies) to the Kyoto Text Corpus version 4.0 (KTC4) and to the output of the Kurohashi-Nagao Parser (KNP), both of which provide unlabelled dependencies only. The method presented in this dissertation also includes zero pronoun identification.
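The core idea of turning unlabelled dependencies into labelled ones can be sketched roughly as follows; the particle-to-function table and function names here are illustrative assumptions, not the thesis's actual rules, which draw on richer morphological information:

```python
# Hypothetical sketch: assign an LFG grammatical function to each unlabelled
# bunsetsu dependency based on the dependent bunsetsu's final particle.
PARTICLE_TO_GF = {"ga": "SUBJ", "o": "OBJ", "ni": "OBL", "wa": "TOPIC"}

def label_dependency(head_idx, dep_idx, particle):
    """Map one unlabelled dependency to a labelled one (ADJUNCT as fallback)."""
    gf = PARTICLE_TO_GF.get(particle, "ADJUNCT")
    return (head_idx, gf, dep_idx)   # read as: f(head) GF = f(dep)

# Unlabelled dependencies for "Taro-ga hon-o yonda" ('Taro read a book'):
# both argument bunsetsu depend on the verb (index 2).
deps = [(2, 0, "ga"), (2, 1, "o")]
labelled = [label_dependency(h, d, p) for h, d, p in deps]
print(labelled)  # [(2, 'SUBJ', 0), (2, 'OBJ', 1)]
```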
Finally, in Chapter 7, I evaluate the performance of the f-structure annotation algorithm with zero-pronoun identification for KTC4 against a manually corrected Gold Standard of 500 sentences randomly chosen from KTC4. Using KTC4 treebank trees, my method currently achieves a pred-only dependency f-score of 94.72%. The parsing experiments using KNP output yield a pred-only dependency f-score of 82.38%.
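A dependency f-score of this kind is the harmonic mean of precision and recall over the sets of labelled dependency triples in the system output and the gold standard; the triples below are illustrative only, not the thesis's evaluation data:

```python
# Sketch of a labelled-dependency f-score over (head, function, dependent)
# triples. Example triples are hypothetical, not from the Gold Standard.
gold = {("yomu", "SUBJ", "Taro"), ("yomu", "OBJ", "hon"), ("yomu", "ADJUNCT", "kinou")}
system = {("yomu", "SUBJ", "Taro"), ("yomu", "OBJ", "hon")}

correct = len(gold & system)               # triples in both sets: 2
precision = correct / len(system)          # 2/2 = 1.0
recall = correct / len(gold)               # 2/3
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 2))                   # 0.8
```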