Lohar, Pintu, Afli, Haithem ORCID: 0000-0002-7449-4707 and Way, Andy ORCID: 0000-0001-5736-5930 (2017) Maintaining sentiment polarity in translation of user-generated content. Prague Bulletin of Mathematical Linguistics (108). pp. 73-84. ISSN 1804-0462
Abstract
The advent of social media has shaken the very foundations of how we share information,
with Twitter, Facebook, and LinkedIn among many well-known social networking platforms
that facilitate information generation and distribution. However, the maximum 140-character
restriction in Twitter encourages users to (sometimes deliberately) write somewhat informally
in most cases. As a result, machine translation (MT) of user-generated content (UGC) becomes
much more difficult for such noisy texts. In addition to translation quality being affected, this
phenomenon may also negatively impact sentiment preservation in the translation process.
That is, a sentence with positive sentiment in the source language may be translated into a
sentence with negative or neutral sentiment in the target language. In this paper, we analyse
both sentiment preservation and MT quality per se in the context of UGC, focusing especially on
whether sentiment classification helps improve sentiment preservation in MT of UGC. We build
four different experimental setups for tweet translation: (i) using a single MT model trained on
the whole Twitter parallel corpus, (ii) using multiple MT models based on sentiment classification, (iii) using MT models including additional out-of-domain data, and (iv) adding MT
models based on the phrase-table fill-up method to accompany the sentiment translation models, with the aim of improving MT quality while at the same time preserving sentiment polarity.
Our empirical evaluation shows that despite a slight deterioration in MT quality,
our system significantly outperforms the Baseline MT system (without using sentiment classification) in terms of sentiment preservation. We also demonstrate that using an MT engine
that conveys a sentiment different from that of the UGC can even worsen both the translation
quality and sentiment preservation.
Metadata
Item Type: | Article (Published) |
---|---|
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Institutes and Centres > ADAPT |
Publisher: | De Gruyter Open |
Official URL: | http://dx.doi.org/10.1515/pralin-2017-0010 |
Copyright Information: | © 2017 PBML. Distributed under CC BY-NC-ND |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | This research is supported by Science Foundation Ireland in the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University. |
ID Code: | 23312 |
Deposited On: | 20 May 2019 08:54 by Thomas Murtagh. Last Modified 20 May 2019 08:54 |
Documents
Full text available as: PDF (235kB)