Better training for function labeling

Chrupała, Grzegorz and Stroppa, Nicolas and van Genabith, Josef and Dinu, Georgiana (2007) Better training for function labeling. In: RANLP 2007 - Recent Advances in Natural Language Processing Conference, 27-29 September, 2007, Borovets, Bulgaria.


Function labels enrich constituency parse tree nodes with information about their abstract syntactic and semantic roles. A common way to obtain function-labeled trees is a two-stage architecture: a statistical parser first produces the constituent structure, and a second component, such as a classifier, then adds the missing function tags. For best results, training examples for machine-learning-based classifiers should be as similar as possible to the instances seen during prediction. However, the method used so far to obtain training examples for the function labeling classifier has a serious drawback: the training examples come from perfect treebank trees, whereas test examples are derived from imperfect, parser-produced trees. We show that extracting training instances from the reparsed training part of the treebank yields better training material, as measured by similarity to test instances. We also show that this training method achieves statistically significantly higher f-scores on the function labeling task for the English Penn Treebank. Our method currently achieves an f-score of 91.47% on section 23 of the WSJ, the highest score reported in the literature so far.
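
The idea of drawing training instances from reparsed trees can be illustrated with a minimal sketch. It is not the authors' implementation. The `Node` type, the `training_instances` helper, and the rule of matching parser nodes to gold nodes by identical category and token span are all assumptions made for illustration; the abstract does not say how function tags are transferred to parser output, nor what features the classifier uses.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """A constituent, reduced to the fields this sketch needs (hypothetical type)."""
    label: str                      # constituent category, e.g. "NP"
    start: int                      # index of the first token covered
    end: int                        # index one past the last token covered
    function: Optional[str] = None  # function tag such as "SBJ"; None if untagged


def training_instances(
    parsed_nodes: List[Node], gold_nodes: List[Node]
) -> List[Tuple[Node, Optional[str]]]:
    """Pair each parser-produced node with the function tag of the gold node
    that has the same category and span (assumed matching rule).

    Parser nodes with no gold counterpart are skipped, which is also an
    assumption of this sketch.
    """
    gold_tags = {(g.label, g.start, g.end): g.function for g in gold_nodes}
    instances = []
    for p in parsed_nodes:
        key = (p.label, p.start, p.end)
        if key in gold_tags:
            instances.append((p, gold_tags[key]))
    return instances


# Toy sentence of five tokens: the gold tree tags the subject NP,
# while the reparsed tree gets one constituent wrong (PP instead of NP).
gold = [Node("NP", 0, 2, "SBJ"), Node("VP", 2, 5), Node("NP", 3, 5)]
reparsed = [Node("NP", 0, 2), Node("VP", 2, 5), Node("PP", 3, 5)]

instances = training_instances(reparsed, gold)
for node, tag in instances:
    print(node.label, node.start, node.end, tag)
```

In this setup the classifier would see training nodes that come from parser output, mistakes included, which is the same kind of input it receives at prediction time.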

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland, SFI 04/IN/I527
ID Code: 15206
Deposited On: 17 Feb 2010 15:05 by DORAS Administrator. Last Modified: 27 Apr 2010 14:08
