#hardtoparse: POS tagging and parsing the twitterverse
Foster, Jennifer and Cetinoglu, Ozlem and Wagner, Joachim and Le Roux, Joseph and Hogan, Stephen and Nivre, Joakim and Hogan, Deirdre and van Genabith, Josef (2011) #hardtoparse: POS tagging and parsing the twitterverse. In: The AAAI-11 Workshop on Analyzing Microtext, 8 Aug 2011, San Francisco, CA.
We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
|Item Type:||Conference or Workshop Item (Paper)|
|Uncontrolled Keywords:||Malt; Twitter|
|Subjects:||Computer Science > Computational linguistics; Computer Science > Artificial intelligence|
|DCU Faculties and Centres:||UNSPECIFIED|
|Copyright Information:||© 2011 Association for the Advancement of Artificial
|Use License:||This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.|
|Funders:||Enterprise Ireland, Science Foundation Ireland, Centre for Next Generation Localisation, French Agence Nationale pour la Recherche|
|Deposited On:||09 Aug 2011 12:55 by Joachim Wagner. Last Modified 09 Aug 2011 13:04|