Browse DORAS
Browse Theses
Search
Latest Additions
Creative Commons License
Except where otherwise noted, content on this site is licensed for use under a:

Using percolated dependencies for phrase extraction in SMT

Srivastava, Ankit K. and Way, Andy (2009) Using percolated dependencies for phrase extraction in SMT. In: MT Summit XII - The twelfth Machine Translation Summit, 26-30 August 2009, Ottawa, Canada.

Full text available as:

[img]PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
90Kb

Abstract

Statistical Machine Translation (SMT) systems rely heavily on the quality of the phrase pairs induced from large amounts of training data. Apart from the widely used method of heuristic learning of n-gram phrase translations from word alignments, there are numerous methods for extracting these phrase pairs. One such class of approaches uses translation information encoded in parallel treebanks to extract phrase pairs. Work to date has demonstrated the usefulness of translation models induced from both constituency structure trees and dependency structure trees. Both syntactic annotations rely on the existence of natural language parsers for both the source and target languages. We depart from the norm by directly obtaining dependency parses from constituency structures using head percolation tables. The paper investigates the use of aligned chunks induced from percolated dependencies in French–English SMT and contrasts it with the aforementioned extracted phrases. We observe that adding phrase pairs from any other method improves translation performance over the baseline n-gram-based system, percolated dependencies are a good substitute for parsed dependencies, and that supplementing with our novel head percolation-induced chunks shows a general trend toward improving all system types across two data sets up to a 5.26% relative increase in BLEU.

Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Uncontrolled Keywords:Statistical machine translation;
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL:http://summitxii.amtaweb.org/
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland, SFI 07/CE/I1142
ID Code:15152
Deposited On:12 Feb 2010 16:01 by DORAS Administrator. Last Modified 10 Nov 2010 16:31

Download statistics

Archive Staff Only: edit this record