Kandimalla, Akshara, Lohar, Pintu ORCID: 0000-0002-5328-1585, Maji, Souvik Kumar and Way, Andy ORCID: 0000-0001-5736-5930 (2022) Improving English-to-Indian language neural machine translation systems. Information, 13 (5). ISSN 2078-2489
Abstract
Most Indian languages lack sufficient parallel data for Machine Translation (MT) training. In this study, we build English-to-Indian language Neural Machine Translation (NMT) systems using the state-of-the-art transformer architecture. In addition, we investigate the utility of back-translation and its effect on system performance. Our experimental evaluation reveals that the back-translation method helps to improve the BLEU scores for both English-to-Hindi and English-to-Bengali NMT systems. We also observe that back-translation is more useful in improving the quality of weaker baseline MT systems. In addition, we perform a manual evaluation of the translation outputs and observe that the BLEU metric cannot always assess MT quality as well as humans can. Our analysis shows that MT outputs for the English–Bengali pair are actually better than the BLEU metric indicates.
Metadata
Item Type: | Article (Published) |
---|---|
Refereed: | Yes |
Uncontrolled Keywords: | machine translation; back-translation; parallel data |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT |
Publisher: | MDPI |
Official URL: | https://dx.doi.org/10.3390/info13050245 |
Copyright Information: | © 2022 The Authors. |
ID Code: | 27451 |
Deposited On: | 29 Jul 2022 09:22 by Thomas Murtagh. Last Modified 05 May 2023 16:40 |
Documents
Full text available as:
PDF (233kB), Creative Commons: Attribution 4.0