Building machine translation system for software product descriptions using domain-specific sub-corpora extraction

Lohar, Pintu; Popović, Maja; Habruseva, Tanya

Lohar, Pintu ORCID: 0000-0002-5328-1585, Popović, Maja ORCID: 0000-0001-8234-8745 and Habruseva, Tanya (2022) Building machine translation system for software product descriptions using domain-specific sub-corpora extraction. In: 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), 12-16 Sept 2022, Orlando, FL, USA.

Abstract

Building Machine Translation systems for a specific domain requires a sufficiently large, good-quality parallel corpus in that domain. However, obtaining one is challenging because parallel data are scarce in many domains, such as economics, science and technology, and sports. In this work, we build English-to-French translation systems for software product descriptions scraped from the LinkedIn website. We also develop the first parallel test set of product descriptions. We conduct experiments by building a baseline translation system trained on general-domain data and then domain-adapted systems using sentence-embedding-based corpus filtering and domain-specific sub-corpora extraction. All systems are tested on this newly developed test set. Our experimental evaluation shows that the domain-adapted model based on our proposed approaches outperforms the baseline.
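
To make the idea of similarity-based sub-corpus extraction concrete, the following is a minimal sketch and not the authors' implementation. This record does not state which embedding model, similarity measure, or threshold the paper uses, so everything here is an assumption: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, and `extract_sub_corpus`, the seed sentences, the sentence pairs, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence-embedding model (assumption):
    a sparse bag-of-words count vector over lowercased tokens."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(sentences):
    """Sum the toy embeddings of the in-domain seed sentences."""
    total = Counter()
    for s in sentences:
        total.update(embed(s))
    return total


def extract_sub_corpus(parallel_pairs, in_domain_seed, threshold=0.3):
    """Keep (source, target) pairs whose source side is close to the seed.
    The threshold is an illustrative value, not one taken from the paper."""
    domain_vec = centroid(in_domain_seed)
    return [
        (src, tgt)
        for src, tgt in parallel_pairs
        if cosine(embed(src), domain_vec) >= threshold
    ]


# Hypothetical in-domain seed and general-domain parallel data.
seed = [
    "cloud software platform for team collaboration",
    "analytics software for sales teams",
]
pairs = [
    ("A software platform for data analytics",
     "Une plateforme logicielle pour l'analyse de données"),
    ("The match ended in a draw",
     "Le match s'est terminé par un match nul"),
]

selected = extract_sub_corpus(pairs, seed)
for src, tgt in selected:
    print(src, "|||", tgt)
```

In this toy run only the software-related pair is kept. In practice the count vectors would be replaced by dense vectors from a trained sentence encoder, and the threshold would be tuned on held-out in-domain data.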

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Subjects:	Computer Science > Machine learning; Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > ADAPT
Published in:	Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). Association for Machine Translation in the Americas.
Publisher:	Association for Machine Translation in the Americas
Official URL:	https://aclanthology.org/2022.amta-research.1
Copyright Information:	© 2022 Association for Machine Translation in the Americas
Funders:	LinkedIn, ADAPT Centre for Digital Content Technology, which is funded under the Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106).
ID Code:	28367
Deposited On:	25 May 2023 13:56 by Maja Popovic. Last Modified: 29 May 2023 13:02

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial 4.0
781kB

DORAS | DCU Research Repository
