
Parser-based retraining for domain adaptation of probabilistic generators

Hogan, Deirdre and Foster, Jennifer and Wagner, Joachim and van Genabith, Josef (2008) Parser-based retraining for domain adaptation of probabilistic generators. In: INLG 08 - 5th International Natural Language Generation Conference, 12-14 June 2008, Salt Fork, Ohio, USA.



While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method in which the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data.
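The "substantial portion" in the abstract can be made concrete from the three BLEU scores it quotes. The following is a minimal sketch of that arithmetic only; the function name `recovered_fraction` is hypothetical and is not taken from the paper, and the calculation treats the WSJ and BNC scores as directly comparable, as the abstract does.

```python
def recovered_fraction(in_domain: float, out_of_domain: float, retrained: float) -> float:
    """Share of the in-domain/out-of-domain BLEU gap closed by retraining.

    Hypothetical helper for illustration; not part of the paper's method.
    """
    drop = in_domain - out_of_domain
    if drop <= 0:
        raise ValueError("expected the out-of-domain score to be lower than the in-domain score")
    return (retrained - out_of_domain) / drop


# BLEU scores quoted in the abstract: WSJ test set, BNC before and after retraining.
fraction = recovered_fraction(in_domain=0.66, out_of_domain=0.54, retrained=0.61)
print(f"{fraction:.0%} of the drop recovered")
```

On these figures the drop is 0.12 BLEU and retraining regains 0.07 of it, a little under three fifths.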

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Uncontrolled Keywords: Penn-Treebank-trained probabilistic generator
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computational Linguistics
Copyright Information: © 2008 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, EI CFTD/2007/229, Science Foundation Ireland, SFI 04/IN/I527, Irish Research Council for Science Engineering and Technology, IRCSET P/04/232
ID Code: 15194
Deposited On: 16 Feb 2010 14:59 by DORAS Administrator. Last Modified 27 Apr 2010 12:27
