DORAS | DCU Research Repository

Building machine translation system for software product descriptions using domain-specific sub-corpora extraction

Lohar, Pintu (ORCID: 0000-0002-5328-1585), Popović, Maja (ORCID: 0000-0001-8234-8745) and Habruseva, Tanya (2022) Building machine translation system for software product descriptions using domain-specific sub-corpora extraction. In: 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), 12-16 Sept 2022, Orlando, FL, USA.

Abstract
Building Machine Translation systems for a specific domain requires a sufficiently large, good-quality parallel corpus in that domain. However, this is a challenging task because parallel data is lacking in many domains, such as economics, science and technology, and sports. In this work, we build English-to-French translation systems for software product descriptions scraped from the LinkedIn website. Moreover, we develop the first parallel test data set of product descriptions. We conduct experiments by building a baseline translation system trained on general-domain data and then domain-adapted systems using sentence-embedding-based corpus filtering and domain-specific sub-corpora extraction. All systems are tested on this newly developed data set. Our experimental evaluation reveals that the domain-adapted model based on our proposed approaches outperforms the baseline.
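
The abstract does not spell out how the sub-corpora are extracted, so the following is only a minimal sketch of the general idea of embedding-based filtering, not the authors' method. It assumes a toy bag-of-words vector as a stand-in for a real sentence encoder, and the names `embed`, `centroid`, `extract_subcorpus`, and the `threshold` value are all hypothetical. Sentence pairs from a general-domain parallel corpus are kept when their source side is similar enough to a small set of in-domain seed sentences.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence encoder (assumption): bag-of-words counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(sentences):
    """Sum of the seed vectors; scaling does not affect cosine similarity."""
    total = Counter()
    for sentence in sentences:
        total.update(embed(sentence))
    return total


def extract_subcorpus(parallel_pairs, in_domain_seed, threshold=0.3):
    """Keep (source, target) pairs whose source resembles the in-domain seed.

    The threshold is an illustrative value, not one taken from the paper.
    """
    domain_vector = centroid(in_domain_seed)
    return [
        (src, tgt)
        for src, tgt in parallel_pairs
        if cosine(embed(src), domain_vector) >= threshold
    ]


# Hypothetical data, invented purely for illustration.
seed = [
    "cloud software for team collaboration",
    "software platform for data analytics",
]
general_corpus = [
    ("a software platform for cloud analytics",
     "une plateforme logicielle pour l'analyse dans le cloud"),
    ("the football match ended in a draw",
     "le match de football s'est terminé par un match nul"),
]
selected = extract_subcorpus(general_corpus, seed)
```

In practice the toy `embed` function would be replaced by a multilingual sentence encoder, and the threshold would be tuned on held-out in-domain data; the selected pairs could then be used to fine-tune or retrain the baseline system.
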
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine learning; Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in: Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). Association for Machine Translation in the Americas.
Publisher: Association for Machine Translation in the Americas
Official URL: https://aclanthology.org/2022.amta-research.1
Copyright Information: © 2022 Association for Machine Translation in the Americas
Funders: LinkedIn, ADAPT Centre for Digital Content Technology, which is funded under the Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106).
ID Code: 28367
Deposited On: 25 May 2023 13:56 by Maja Popovic. Last Modified: 29 May 2023 13:02
Documents

Full text available as: PDF (2022.amta-research.1.pdf, 781kB)
Creative Commons: Attribution-Noncommercial 4.0