DORAS | DCU Research Repository

From news to comment: Resources and benchmarks for parsing the language of web 2.0

Foster, Jennifer (ORCID: 0000-0002-7789-4853), Cetinoglu, Ozlem, Wagner, Joachim (ORCID: 0000-0002-8290-3849), Le Roux, Joseph, Nivre, Joakim, Hogan, Deirdre and van Genabith, Josef (ORCID: 0000-0003-1322-7944) (2011) From news to comment: Resources and benchmarks for parsing the language of web 2.0. In: The 5th International Joint Conference on Natural Language Processing (IJCNLP), 08-13 Nov 2011, Chiang Mai, Thailand. ISBN 978-974-466-564-5

Abstract
We investigate the problem of parsing the noisy language of social media. We evaluate four Wall-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy. We attempt three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Parsing; domain adaptation; self-training; up-training; user-generated content; twitter
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL)
Research Institutes and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP). Asian Federation of Natural Language Processing. ISBN 978-974-466-564-5
Publisher: Asian Federation of Natural Language Processing
Official URL: http://www.ijcnlp2011.org/proceeding/IJCNLP2011-MA...
Copyright Information: ©2011 Asian Federation of Natural Language Processing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, Science Foundation Ireland, French Agence Nationale pour la Recherche
ID Code: 16854
Deposited On: 28 Feb 2012 15:43 by Joachim Wagner. Last Modified 19 Jan 2022 12:48
Documents

Full text available as:

PDF (camera ready submission), 253kB