Comparing the use of edited and unedited text in parser self-training
Foster, Jennifer (ORCID: 0000-0002-7789-4853), Cetinoglu, Ozlem, Wagner, Joachim (ORCID: 0000-0002-8290-3849) and van Genabith, Josef (ORCID: 0000-0003-1322-7944)
(2011)
Comparing the use of edited and unedited text in parser self-training.
In: The 12th International Conference on Parsing Technologies (IWPT 2011), 05-07 Oct 2011, Dublin, Ireland.
ISBN 978-1-932432-04-6
We compare the use of edited text in the form of newswire and unedited text in the form of discussion forum posts as sources of training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. We find that grammars induced from the two automatically parsed corpora achieve similar Parseval F-scores, with the grammars induced from the discussion forum material being slightly superior. An error analysis reveals that the two types of grammars nonetheless behave differently.
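For readers unfamiliar with the metric, the following is a minimal sketch of how a Parseval-style labelled-bracket F-score can be computed for a single sentence. It is an illustration only, not the evaluation setup used in the paper: the function name `parseval_f1`, the `(label, start, end)` bracket representation and the toy bracket sets are all assumptions made for this example, and standard evaluation tools apply further conventions (for instance, aggregating counts over the whole test set) that are not reproduced here.

```python
from collections import Counter


def parseval_f1(gold_brackets, test_brackets):
    """Labelled-bracket precision, recall and F1 for one sentence.

    Each bracket is a hypothetical (label, start, end) tuple giving a
    constituent label and the token span it covers.
    """
    gold = Counter(gold_brackets)
    test = Counter(test_brackets)
    # A test bracket counts as correct if the same label and span occur
    # in the gold tree; Counter intersection handles duplicate brackets.
    matched = sum((gold & test).values())
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (invented data): the parser mislabels one constituent.
gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
test = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("PP", 2, 4)]
precision, recall, f1 = parseval_f1(gold, test)
print(precision, recall, f1)
```

In this toy case three of the four brackets match, so precision, recall and F1 are all 0.75.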