#hardtoparse: POS tagging and parsing the twitterverse
Foster, Jennifer (ORCID: 0000-0002-7789-4853), Cetinoglu, Ozlem, Wagner, Joachim (ORCID: 0000-0002-8290-3849), Le Roux, Joseph, Hogan, Stephen, Nivre, Joakim, Hogan, Deirdre and van Genabith, Josef (ORCID: 0000-0003-1322-7944)
(2011)
#hardtoparse: POS tagging and parsing the twitterverse.
In: The AAAI-11 Workshop on Analyzing Microtext, 8 Aug 2011, San Francisco, CA.
We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.