DORAS
DCU Online Research Access Service
Knowledge distillation: a method for making neural machine translation more efficient

Jooste, Wandri, Haque, Rejwanul ORCID: 0000-0003-1680-0099 and Way, Andy ORCID: 0000-0001-5736-5930 (2022) Knowledge distillation: a method for making neural machine translation more efficient. Information, 13 (2). ISSN 2078-2489

Full text available as: PDF (861 kB)

Abstract

Neural machine translation (NMT) systems have greatly improved the quality available from machine translation (MT) compared to statistical machine translation (SMT) systems. However, these state-of-the-art NMT models need much more computing power and data than SMT models, a requirement that is unsustainable in the long run and of very limited benefit in low-resource scenarios. To some extent, model compression, and more specifically state-of-the-art knowledge distillation techniques, can remedy this. In this work, we investigate knowledge distillation on a simulated low-resource German-to-English translation task. We show that sequence-level knowledge distillation can be used to train small student models on knowledge distilled from large teacher models. Part of this work examines the influence of hyperparameter tuning on model performance when lowering the number of Transformer heads or limiting the vocabulary size. Interestingly, in some cases the accuracy of these student models is higher than that of their teachers, even though the students also require less training time. In a novel contribution, we demonstrate for a specific MT service provider that, in the post-deployment phase, distilled student models can reduce emissions, as well as costs in purely monetary terms, by almost 50%.
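To make the sequence-level distillation step concrete, the sketch below shows the data-generation stage: a large teacher model translates the source side of the corpus with beam search, and those teacher hypotheses replace the reference targets when training the smaller student. This is a minimal illustration only; the Hugging Face MarianMT checkpoint (Helsinki-NLP/opus-mt-de-en), beam size, and example sentences are assumptions for demonstration, not the models or settings used in the paper.

import torch
from transformers import MarianMTModel, MarianTokenizer

# Assumed German-to-English teacher checkpoint (illustrative, not the paper's model).
TEACHER_NAME = "Helsinki-NLP/opus-mt-de-en"

tokenizer = MarianTokenizer.from_pretrained(TEACHER_NAME)
teacher = MarianMTModel.from_pretrained(TEACHER_NAME).eval()

def distil_targets(source_sentences, beam_size=5, max_length=128):
    """Translate the source side with the teacher's beam search and return
    the hypotheses, which become the training targets for the student."""
    with torch.no_grad():
        batch = tokenizer(source_sentences, return_tensors="pt",
                          padding=True, truncation=True)
        generated = teacher.generate(**batch, num_beams=beam_size,
                                     max_length=max_length)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Illustrative source sentences; in practice this is run over the whole
# training corpus, and a smaller Transformer student is then trained on
# (source, teacher_translation) pairs with ordinary cross-entropy loss.
sources = ["Das ist ein Beispielsatz.", "Maschinelle Uebersetzung spart Zeit."]
for src, hyp in zip(sources, distil_targets(sources)):
    print(src, "->", hyp)

The student itself can be any smaller NMT architecture (for example, one with fewer Transformer heads or a reduced vocabulary, as examined in the paper); sequence-level distillation changes only the data the student is trained on, not its training objective.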

Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: NMT; Green AI; knowledge distillation; CO2 savings
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Publisher: MDPI
Official URL: https://dx.doi.org/10.3390/info13020088
Copyright Information: © 2022 The Authors.
ID Code: 27452
Deposited On: 28 Jul 2022 16:46 by Thomas Murtagh. Last Modified: 23 Mar 2023 16:35
