
Using EBMT to produce foreign language subtitles

Armstrong, Stephen (2007) Using EBMT to produce foreign language subtitles. Master of Science thesis, Dublin City University.

Due to limited budgets and an ever-diminishing timeframe for the production of foreign language subtitles, pressure on subtitle companies is at an all-time high. Translation technologies are ubiquitous in other areas of translation, especially localisation, and have been helping translators work more efficiently for a number of years (Lagoudaki, 2006), yet subtitle companies have been slower to adopt them. Recent research from both academia and industry (O'Hagan, 2003; Carroll, 2004; Gambier, 2005) suggests that advances in natural language processing and machine translation could go a long way towards alleviating some of this pressure. In this thesis, we set out to establish how example-based machine translation (EBMT) can be used to speed up the subtitling process, thus improving the throughput of the subtitler, and also to automatically produce foreign language subtitles that subtitle companies may not normally provide, even though they would be extremely helpful for the viewing public. Through the development of the modular corpus-based MT engine MaTrEx (Stroppa et al., 2006) and the collection of a large amount of subtitle data extracted from over 50 full-length features (Armstrong et al., 2006a), we were able to apply a number of EBMT techniques to produce subtitles for the language directions German-English and English-German. These machine-produced subtitles were evaluated using both well-established automatic metrics common to machine translation and some novel manual evaluation strategies. Both the automatic metrics and the human evaluation were very useful in the development process, where they allowed us to isolate and fix errors made by our system.
In addition, by obtaining a human perspective on the subtitles produced by our system, we were able to gauge the acceptability of these subtitles for public viewing, and we have provided a solid grounding for future research into the acceptability of (semi-)automatically generated subtitles.

Item Type: Thesis (Master of Science)
Date of Award: 2007
Supervisor(s): Way, Andy
Uncontrolled Keywords: example based machine translating; EBMT; natural language processing; subtitling; films; machine produced subtitles
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code: 17019
Deposited On: 15 May 2012 15:45 by Fran Callaghan. Last Modified 15 May 2012 15:45
