Cafferkey, Conor (2008) Exploiting multi-word units in statistical parsing and generation. Master of Science thesis, Dublin City University.
Abstract
Syntactic parsing is an important prerequisite for many natural language processing (NLP) applications. The task refers to the process of generating the tree of syntactic nodes with associated phrase category labels corresponding to a sentence.
Our objective is to improve upon statistical models for syntactic parsing by leveraging multi-word units (MWUs) such as named entities and other classes of multi-word expressions. Multi-word units are phrases that are lexically, syntactically and/or semantically idiosyncratic in that they are to at least some degree non-compositional. If such units are identified prior to, or as part of, the parsing process their boundaries can be exploited as islands of certainty within the very large (and often highly ambiguous) search space. Luckily, certain types of MWUs can be readily identified in an automatic fashion (using a variety of techniques) to a near-human level of accuracy.
We carry out a number of experiments which integrate knowledge about different classes of MWUs in several commonly deployed parsing architectures. In a supplementary set of experiments, we attempt to exploit these units in the converse operation to statistical parsing: statistical generation (in our case, surface realisation from Lexical-Functional Grammar f-structures). We show that, by exploiting knowledge about MWUs, certain classes of parsing and generation decisions are more accurately resolved. This translates to improvements in overall parsing and generation results which, although modest, are demonstrably significant.
Metadata
| Item Type: | Thesis (Master of Science) |
|---|---|
| Date of Award: | November 2008 |
| Refereed: | No |
| Supervisor(s): | van Genabith, Josef |
| Uncontrolled Keywords: | Statistical Parsing; Statistical Generation; Named Entities; Multi-Word Units |
| Subjects: | Computer Science > Computational linguistics |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
| Funders: | Irish Research Council for Science Engineering and Technology, Microsoft Research |
| ID Code: | 615 |
| Deposited On: | 10 Nov 2008 11:19 by Josef Vangenabith. Last Modified 16 Nov 2009 17:18 |
Documents
Full text available as: PDF (597kB)