DORAS | DCU Research Repository

On machine translation of user reviews

Popović, Maja (ORCID: 0000-0001-8234-8745), Way, Andy (ORCID: 0000-0001-5736-5930), Poncelas, Alberto (ORCID: 0000-0002-5089-1687) and Brkić Bakarić, Marija (ORCID: 0000-0003-4079-4012) (2021) On machine translation of user reviews. In: Recent Advances in Natural Language Processing, 1-3 Sept 2021, Online. ISBN 978-954-452-072-4

Abstract
This work investigates neural machine translation (NMT) systems for translating English user reviews into Croatian and Serbian, two similar, morphologically complex languages. Two types of reviews are used for testing the systems: IMDb movie reviews and Amazon product reviews. Two types of training data are explored: large out-of-domain bilingual parallel corpora, and a small synthetic in-domain parallel corpus obtained by machine translation of monolingual English Amazon reviews into the target languages. Both automatic scores and human evaluation show that using the synthetic in-domain corpus together with a selected subset of out-of-domain data is the best option. Separate results on IMDb and Amazon reviews indicate that MT systems perform differently on different review types, so user reviews generally should not be considered a homogeneous genre. Nevertheless, more detailed research on a larger number of different reviews covering different domains and topics is needed to fully understand these differences.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: UNSPECIFIED
Published in: Proceedings of Recent Advances in Natural Language Processing. Association for Computational Linguistics (ACL). ISBN 978-954-452-072-4
Publisher: Association for Computational Linguistics (ACL)
Official URL: https://dx.doi.org/10.26615/978-954-452-072-4_124
Copyright Information: © 2020 The Authors.
ID Code: 27448
Deposited On: 28 Jul 2022 12:39 by Thomas Murtagh. Last Modified: 23 May 2023 15:26
Documents

Full text available as:

PDF: 2021.ranlp-1.124.pdf
Creative Commons: Attribution 4.0
283kB