Haque, Rejwanul ORCID: 0000-0003-1680-0099, Hasanuzzaman, Mohammed ORCID: 0000-0003-1838-0091 and Way, Andy ORCID: 0000-0001-5736-5930 (2019) TermEval: an automatic metric for evaluating terminology translation in MT. In: CICLing 2019, the 20th International Conference on Computational Linguistics and Intelligent Text Processing, 07-13 Apr 2019, La Rochelle, France.
Abstract
Terminology translation plays a crucial role in domain-specific machine translation (MT). Preserving domain knowledge from source to target is arguably the greatest concern for customers in the translation industry, especially in critical domains such as medical, transportation, military, legal and aerospace. However, despite its importance to the translation industry, the evaluation of terminology translation has received little attention in MT research. Term translation quality in MT is usually assessed by domain experts, in either academia or industry. To the best of our knowledge, there is as yet no publicly available solution for automatically evaluating terminology translation in MT. Evaluating terminology translation in MT therefore usually requires manual intervention, which is by nature time-consuming and highly expensive. This makes it impractical in an industrial setting, where customised MT systems often need to be updated for many reasons (e.g. the availability of new training data or better MT techniques). Hence, there is a genuine need for a faster and less expensive solution that could help end-users instantly identify term translation problems in MT. In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, no gold-standard dataset is available for measuring terminology translation quality in MT. In the absence of such a test set, we semi-automatically create a gold-standard dataset from an English–Hindi judicial-domain parallel corpus.
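The abstract does not define how TermEval computes its score. Purely to illustrate the kind of input a gold-standard term test set provides, here is a minimal sketch of a hypothetical term-level exact-match rate. It assumes each test sentence is paired with its gold source terms, and each term with one or more accepted target translations. The `GoldTerm` schema, the `term_match_rate` function and the toy data are all hypothetical. This is not the TermEval formula.

```python
from dataclasses import dataclass, field


@dataclass
class GoldTerm:
    # Hypothetical schema: one source term plus its accepted target translations.
    source: str
    target_variants: list[str] = field(default_factory=list)


def _normalise(text: str) -> str:
    # Collapse whitespace and pad with spaces so matches respect token boundaries.
    return f" {' '.join(text.split())} "


def term_match_rate(hypotheses: list[str], gold: list[list[GoldTerm]]) -> float:
    """Hypothetical exact-match score, NOT the TermEval formula.

    Returns the fraction of gold terms for which at least one non-empty
    accepted target variant occurs as a whole-token span in the MT output.
    """
    if len(hypotheses) != len(gold):
        raise ValueError("need one list of gold terms per hypothesis")
    total = matched = 0
    for hyp, terms in zip(hypotheses, gold):
        padded_hyp = _normalise(hyp)
        for term in terms:
            total += 1
            if any(v.strip() and _normalise(v) in padded_hyp for v in term.target_variants):
                matched += 1
    return matched / total if total else 0.0


# Toy Hindi-to-English example (invented data, not from the paper's test set).
hypotheses = [
    "the high court dismissed the appeal",
    "the petition was filed in the district court",
]
gold = [
    [GoldTerm("उच्च न्यायालय", ["high court"]), GoldTerm("अपील", ["appeal"])],
    [
        GoldTerm("याचिका", ["petition", "writ petition"]),
        GoldTerm("जिला न्यायालय", ["district court"]),
        GoldTerm("साक्ष्य", ["evidence"]),
    ],
]
print(term_match_rate(hypotheses, gold))  # 0.8 (4 of 5 terms found)
```

Whitespace tokenisation is a simplification here. Punctuation attached to a word (e.g. "court.") would block a match, so a real pipeline would tokenise the hypotheses and the gold variants the same way before matching.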
We train state-of-the-art phrase-based statistical MT (PB-SMT) and neural MT (NMT) models in two translation directions, English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their terminology translation performance on the gold-standard test set we created. To measure the correlation between TermEval scores and human judgements, a human evaluator validates the translation of each source term in the gold-standard test set. The high correlation between TermEval and human judgements demonstrates the effectiveness of the proposed terminology translation evaluation metric. We also carry out a comprehensive manual evaluation of terminology translation and present our observations.
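The abstract reports a high correlation between TermEval scores and human judgements but does not name the coefficient or the unit of comparison. Assuming Pearson's r over paired scores (for example, one automatic score and one human-derived term accuracy per MT system), a minimal sketch of that computation follows. The function name `pearson` and the score values are invented for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either list is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented per-system scores: automatic metric vs. human-derived term accuracy.
metric_scores = [0.62, 0.71, 0.55, 0.80]
human_scores = [0.60, 0.74, 0.52, 0.83]
print(round(pearson(metric_scores, human_scores), 3))
```

Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient. The explicit version keeps the formula visible. A rank-based coefficient such as Spearman's would be the alternative if only the ordering of scores matters.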
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Terminology Translation; Machine Translation; Phrase-Based Statistical Machine Translation; Neural Machine Translation; Term Translation Evaluation |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Copyright Information: | © 2019 The Authors |
ID Code: | 24608 |
Deposited On: | 15 Jun 2020 17:24 by Vidatum Academic. Last Modified 06 Jan 2022 17:36 |
Documents
Full text available as: PDF (234kB)