Using percolated dependencies for phrase extraction in SMT

Srivastava, Ankit Kumar; Way, Andy

Srivastava, Ankit Kumar and Way, Andy ORCID: 0000-0001-5736-5930 (2009) Using percolated dependencies for phrase extraction in SMT. In: MT Summit XII - The twelfth Machine Translation Summit, 26-30 August 2009, Ottawa, Canada.

Abstract
Metadata
Downloads
Documents

[+][-]

Abstract

Statistical Machine Translation (SMT) systems rely heavily on the quality of the phrase pairs induced from large amounts of training data. Apart from the widely used method of heuristic learning of n-gram phrase translations from word alignments, there are numerous methods for extracting these phrase pairs. One such class of approaches uses translation information encoded in parallel treebanks to extract phrase pairs. Work to date has demonstrated the usefulness of translation models induced from both constituency structure trees and dependency structure trees. Both syntactic annotations rely on the existence of natural language parsers for both the source and target languages. We depart from the norm by directly obtaining dependency parses from constituency structures using head percolation tables. The paper investigates the use of aligned chunks induced from percolated dependencies in French–English SMT and contrasts it with the aforementioned extracted phrases. We observe that adding phrase pairs from any other method improves translation performance over the baseline n-gram-based system, percolated dependencies are a good substitute for parsed dependencies, and that supplementing with our novel head percolation-induced chunks shows a general trend toward improving all system types across two data sets up to a 5.26% relative increase in BLEU.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	Statistical machine translation;
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	Research Institutes and Centres > Centre for Next Generation Localisation (CNGL) Research Institutes and Centres > National Centre for Language Technology (NCLT) DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL:	http://summitxii.amtaweb.org/
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:	Science Foundation Ireland, SFI 07/CE/I1142
ID Code:	15152
Deposited On:	12 Feb 2010 16:01 by DORAS Administrator . Last Modified 14 Nov 2018 16:33

Documents

Full text available as:

Preview

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
92kB

Downloads

Downloads per month over past year

Archive Staff Only: edit this record

DORAS | DCU Research Repository

Using percolated dependencies for phrase extraction in SMT

Downloads