Automatic generation of parallel treebanks
Zhechev, Ventsislav and Way, Andy (2008) Automatic generation of parallel treebanks. In: COLING 2008 - 22nd International Conference on Computational Linguistics, 18-22 August 2008, Manchester, UK.
The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this paper we introduce a novel platform for fast and robust automatic generation of parallel treebanks. The software we have developed based on this platform has been shown to handle large data sets. We also present evaluation results demonstrating the quality of the derived treebanks and discuss some possible modifications and improvements that can lead to even better results. We expect the presented platform to help boost research in the field of data-oriented machine translation and lead to advancements in other fields where parallel treebanks can be employed.