DORAS
DCU Online Research Access Service
Using BabelNet to improve OOV coverage in SMT

Du, Jinhua ORCID: 0000-0002-3267-4881, Way, Andy ORCID: 0000-0001-5736-5930 and Zydron, Andrzej (2016) Using BabelNet to improve OOV coverage in SMT. In: 2016 International Conference on Language Resources and Evaluation, 23-28 May 2016, Portorož, Slovenia. ISBN 978-2-9517408-9-1

Abstract

Out-of-vocabulary words (OOVs) are a ubiquitous and difficult problem in statistical machine translation (SMT). This paper studies different strategies for using BabelNet to mitigate the negative impact of OOVs. BabelNet is a multilingual encyclopedic dictionary and semantic network that not only includes lexicographic and encyclopedic terms, but also connects concepts and named entities in a very large network of semantic relations. Taking advantage of the knowledge in BabelNet, the paper proposes three methods for obtaining translations of OOVs and thereby improving system performance: using direct training data, domain-adaptation techniques, and the BabelNet API. Experimental results on the English–Polish and English–Chinese language pairs show that domain adaptation makes better use of BabelNet knowledge and outperforms the other two methods. The results also demonstrate that BabelNet is a useful resource for improving the translation performance of SMT systems.
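
As an illustration of the OOV problem described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenisation and lowercasing, and it uses a small in-memory dictionary (`toy_lexicon`) as a stand-in for translations that would in practice come from a resource such as BabelNet. All function names and data here are hypothetical and invented for illustration; no BabelNet API call is made.

```python
from typing import Dict, Iterable, List, Set


def build_vocabulary(sentences: Iterable[str]) -> Set[str]:
    """Collect the lowercased source-side word types seen in the training data."""
    return {token for sentence in sentences for token in sentence.lower().split()}


def find_oovs(sentence: str, vocabulary: Set[str]) -> List[str]:
    """Return the tokens of `sentence` missing from `vocabulary`, in order of first occurrence."""
    seen: Set[str] = set()
    oovs: List[str] = []
    for token in sentence.lower().split():
        if token not in vocabulary and token not in seen:
            seen.add(token)
            oovs.append(token)
    return oovs


def lookup_oovs(oovs: Iterable[str], lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each OOV to its candidate translations (an empty list if the lexicon has none)."""
    return {word: lexicon.get(word, []) for word in oovs}


# Toy data, invented for illustration only.
training_source = ["the cat sat on the mat", "a dog chased the cat"]
toy_lexicon = {"ferret": ["fretka"]}  # hypothetical English -> Polish entry

vocabulary = build_vocabulary(training_source)
oovs = find_oovs("the ferret chased the axolotl", vocabulary)
candidates = lookup_oovs(oovs, toy_lexicon)
```

In this toy setup, words with a lexicon entry gain candidate translations while the rest remain untranslated. How such candidates are fed back into the SMT system (as extra training data, through domain adaptation, or at lookup time) is what the paper's three methods compare.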

Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: BabelNet; SMT; unknown words; OOVs; domain adaptation
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Published in: Calzolari, Nicoletta and Choukri, Khalid (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association. ISBN 978-2-9517408-9-1
Publisher: European Language Resources Association
Official URL: http://www.lrec-conf.org/proceedings/lrec2016/pdf/597_Paper.pdf
Copyright Information: © 2019 ELRA
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland through the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University and Trinity College Dublin; Grant 610879 for the Falcon project funded by the European Commission
ID Code: 23224
Deposited On: 01 May 2019 15:32 by Thomas Murtagh. Last Modified: 01 May 2019 15:32
