DORAS | DCU Research Repository

Building English-to-Serbian machine translation system for IMDb movie reviews

Way, Andy (ORCID: 0000-0001-5736-5930), Lohar, Pintu (ORCID: 0000-0002-5328-1585) and Popović, Maja (ORCID: 0000-0001-8234-8745) (2019) Building English-to-Serbian machine translation system for IMDb movie reviews. In: Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2 Aug 2019, Florence, Italy. ISBN 978-1-950737-41-3

Abstract
This paper reports the results of the first experiment dealing with the challenges of building a machine translation system for user-generated content involving a complex South Slavic language. We focus on translation of English IMDb user movie reviews into Serbian, in a low-resource scenario. We explore the potential and limits of (i) phrase-based and neural machine translation systems trained on out-of-domain clean parallel data from news articles, and (ii) creating an additional synthetic in-domain parallel corpus by machine-translating the English IMDb corpus into Serbian. Our main findings are that morphology and syntax are better handled by the neural approach than by the phrase-based approach, even in this low-resource, mismatched-domain scenario; however, the situation is different for the lexical aspect, especially for person names. This finding also indicates that, in general, machine translation of person names into Slavic languages (especially those which require or allow transcription) should be investigated more systematically.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Erjavec, Tomaž, Marcińczuk, Michał, Nakov, Preslav and Piskorski, Jakub (eds.) Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing (BSNLP'19). ACL Anthology. ISBN 978-1-950737-41-3
Publisher: ACL Anthology
Official URL: https://www.aclweb.org/anthology/W19-3715.pdf
Copyright Information: © 2019 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: ADAPT Centre for Digital Content Technology at Dublin City University, funded under the Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.
ID Code: 23862
Deposited On: 21 Oct 2019 12:07 by Andrew Way. Last modified 05 May 2023 16:32
Documents

Full text available as:

W19-3715.pdf (PDF, 202kB)