DORAS | DCU Research Repository

Comparing the use of edited and unedited text in parser self-training

Foster, Jennifer (ORCID: 0000-0002-7789-4853), Cetinoglu, Ozlem, Wagner, Joachim (ORCID: 0000-0002-8290-3849) and van Genabith, Josef (ORCID: 0000-0003-1322-7944) (2011) Comparing the use of edited and unedited text in parser self-training. In: The 12th International Conference on Parsing Technologies (IWPT 2011), 05-07 Oct 2011, Dublin, Ireland. ISBN 978-1-932432-04-6

Abstract
We compare the use of edited text in the form of newswire and unedited text in the form of discussion forum posts as sources for training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. We find that grammars induced from the two automatically parsed corpora achieve similar Parseval f-scores, with the grammars induced from the discussion forum material being slightly superior. An error analysis reveals that the two types of grammars do behave differently.
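The Parseval f-score mentioned in the abstract is the harmonic mean of labelled bracket precision and recall. The following is a minimal sketch of that computation, assuming each constituent is represented as a (label, start, end) tuple. The function name and the toy brackets are illustrative and not taken from the paper, and standard evaluation tools apply further conventions (for example, punctuation handling) that are not modelled here.

```python
from collections import Counter


def parseval_f_score(gold_brackets, test_brackets):
    """Return (precision, recall, f_score) over labelled brackets.

    Each bracket is a hypothetical (label, start, end) tuple. Counters are
    used so that duplicate brackets are matched at most as often as they
    occur in both analyses.
    """
    gold = Counter(gold_brackets)
    test = Counter(test_brackets)

    # Multiset intersection: brackets present in both gold and test parses.
    matched = sum((gold & test).values())

    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: the test parse mislabels one constituent (PP instead of NP).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
precision, recall, f_score = parseval_f_score(gold, test)
```

In this toy case three of the four brackets match on both sides, so precision, recall and f-score all come out at 0.75.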
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: parsing; self-training; domain adaptation; user-generated content
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); Research Institutes and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the 12th International Conference on Parsing Technologies. Association for Computational Linguistics. ISBN 978-1-932432-04-6
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology-new/sigparse
Copyright Information: ©2011 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, Science Foundation Ireland
ID Code: 16855
Deposited On: 28 Feb 2012 16:10 by Joachim Wagner. Last Modified: 19 Jan 2022 12:48
Documents

Full text available as:

PDF (camera ready submission), 198kB