Due to limited budgets and an ever-diminishing timeframe for the production of foreign language subtitles, pressure on subtitle companies is at an all-time high. Although translation technologies are ubiquitous in other areas of translation, especially localisation, and have been helping translators work more efficiently for a number of years (Lagoudaki, 2006), subtitle companies have been slower to adopt them. Recent research from both academia and industry (O'Hagan, 2003; Carroll, 2004; Gambier, 2005) suggests that the advances made in natural language processing and machine translation could go a long way towards alleviating some of this pressure.
In this thesis, we set out to establish how example-based machine translation (EBMT) can be used to speed up the subtitling process, thus improving the throughput of the subtitler, and how it can serve as a means of automatically producing foreign language subtitles which subtitle companies may not normally provide, even though they would be extremely helpful for the viewing public.
Through the development of the modular corpus-based MT engine MaTrEx (Stroppa et al., 2006) and the collection of a large amount of subtitle data extracted from over 50 full-length features (Armstrong et al., 2006a), we were able to apply a number of EBMT techniques to produce subtitles for the language directions German-English and English-German. These machine-produced subtitles were evaluated using both a range of well-established automatic metrics common to machine translation and some novel manual evaluation strategies. Both the automatic metrics and the human evaluation were very useful in the development process, where we were able to isolate and fix errors made by our system. In addition, by obtaining a human perspective on the subtitles produced by our system, we were able to gauge the acceptability of these subtitles for public viewing, and we have provided a solid grounding for future research into the acceptability of (semi-)automatically generated subtitles.
Metadata
Item Type: Thesis (Master of Science)
Date of Award: 2007
Refereed: No
Supervisor(s): Way, Andy
Uncontrolled Keywords: example based machine translating; EBMT; natural language processing; subtitling; films; machine produced subtitles