DORAS
DCU Online Research Access Service
Comparing the use of edited and unedited text in parser self-training

Foster, Jennifer ORCID: 0000-0002-7789-4853, Cetinoglu, Ozlem, Wagner, Joachim ORCID: 0000-0002-8290-3849 and van Genabith, Josef ORCID: 0000-0003-1322-7944 (2011) Comparing the use of edited and unedited text in parser self-training. In: The 12th International Conference on Parsing Technologies (IWPT 2011), 05-07 Oct 2011, Dublin, Ireland. ISBN 978-1-932432-04-6

Full text available as:

PDF (camera-ready submission), 198 kB

Abstract

We compare the use of edited text in the form of newswire and unedited text in the form of discussion forum posts as sources for training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. We find that grammars induced from the two automatically parsed corpora achieve similar Parseval f-scores, with the grammars induced from the discussion forum material being slightly superior. An error analysis reveals that the two types of grammars do behave differently.
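The grammars are compared by Parseval f-score. As a rough illustration only, the sketch below shows how labelled bracketing precision, recall and F1 are computed at corpus level. It assumes each parse has already been reduced to a list of hypothetical (label, start, end) constituent tuples. It is not the evaluation code used in the paper; standard evaluations use the evalb tool, whose conventions (for example, how punctuation is treated) this sketch ignores.

```python
from collections import Counter

# Hypothetical representation: a labelled constituent (label, start, end)
# spanning token positions [start, end).
Bracket = tuple[str, int, int]


def parseval(gold_trees: list[list[Bracket]],
             test_trees: list[list[Bracket]]) -> tuple[float, float, float]:
    """Corpus-level labelled bracketing precision, recall and F1.

    Counts are summed over all sentences before dividing (micro-averaging).
    Brackets are matched as multisets, so each gold copy can be credited once.
    """
    if len(gold_trees) != len(test_trees):
        raise ValueError("gold and test must contain the same number of sentences")
    matched = gold_total = test_total = 0
    for gold, test in zip(gold_trees, test_trees):
        # Counter intersection keeps the minimum count of each bracket.
        matched += sum((Counter(gold) & Counter(test)).values())
        gold_total += len(gold)
        test_total += len(test)
    precision = matched / test_total if test_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [[("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]]
test = [[("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]]
print(parseval(gold, test))  # (0.75, 0.75, 0.75)
```

Here three of the four test brackets match the gold tree (the mislabelled PP is the only error), giving precision, recall and F1 of 0.75 each.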

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: parsing; self-training; domain adaptation; user-generated content
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the 12th International Conference on Parsing Technologies. Association for Computational Linguistics. ISBN 978-1-932432-04-6
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology-new/sigparse
Copyright Information: ©2011 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, Science Foundation Ireland
ID Code: 16855
Deposited On: 28 Feb 2012 16:10 by Joachim Wagner. Last Modified: 19 Jan 2022 12:48
