Comparing the use of edited and unedited text in parser self-training

Foster, Jennifer and Cetinoglu, Ozlem and Wagner, Joachim and van Genabith, Josef (2011) Comparing the use of edited and unedited text in parser self-training. In: The 12th International Conference on Parsing Technologies (IWPT 2011), 05-07 Oct 2011, Dublin, Ireland. ISBN 978-1-932432-04-6

Full text available as: PDF (camera-ready submission), 194 KB

Abstract

We compare the use of edited text in the form of newswire and unedited text in the form of discussion forum posts as sources for training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. We find that grammars induced from the two automatically parsed corpora achieve similar Parseval f-scores, with the grammars induced from the discussion forum material being slightly superior. An error analysis reveals that the two types of grammars do behave differently.

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: parsing; self-training; domain adaptation; user-generated content
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL); Research Initiatives and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the 12th International Conference on Parsing Technologies. Association for Computational Linguistics. ISBN 978-1-932432-04-6
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology-new/sigparse
Copyright Information: ©2011 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, Science Foundation Ireland
ID Code: 16855
Deposited On: 28 Feb 2012 16:10 by Joachim Wagner. Last Modified 30 May 2012 10:29
