DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Breaking Language Barriers: Reimagining Machine Translation as Style Transfer

Dreano, Soren (2025) Breaking Language Barriers: Reimagining Machine Translation as Style Transfer. PhD thesis, Dublin City University.

Abstract
Machine Translation (MT) has traditionally relied on parallel corpora, posing significant challenges for low-resource languages. This thesis reimagines Machine Translation as a style transfer problem, introducing a novel framework, CycleGN, to enable both translation and style transfer without requiring parallel data sets. Using monolingual corpora, the proposed method seeks to broaden equitable access to language translation systems, while advancing theoretical insights into style transfer. Another contribution of this work is Tokengram_F, a novel metric that extends n-gram analysis to better capture linguistic and contextual nuances in translation evaluation and can estimate the quality of machine-generated sentences in more than 200 languages. Furthermore, Embed_llama leverages pre-trained Large Language Model (LLM) embeddings to enhance semantic alignment and evaluation accuracy, deepening the work on transfer learning. This thesis also explores text compression through the development of Llamazip, a lossless compression algorithm that uses the predictive capabilities of LLMs. Beyond achieving excellent compression ratios, Llamazip demonstrates innovative applications, such as identifying the training-set membership of a given target text and benchmarking predictive performance. The research presented in this thesis has led to four peer-reviewed publications and one further submission, with an additional paper in preparation. Ultimately, this work seeks to democratise access to translation technologies by broadening the scope of accessible training data. It aims to contribute to the evolution of language technologies in a multilingual world.
Metadata
Item Type: Thesis (PhD)
Date of Award: May 2025
Refereed: No
Supervisor(s): Molloy, Derek and Murphy, Noel
Subjects: Computer Science > Artificial intelligence; Computer Science > Computer security; Computer Science > Information retrieval; Computer Science > Machine learning; Computer Science > Machine translating; Engineering > Signal processing
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License.
Funders: Research Ireland - Centre for Research Training in Machine Learning
ID Code: 31116
Deposited On: 21 Nov 2025 14:47 by Derek Molloy. Last Modified 21 Nov 2025 14:47
Documents

Full text available as:

thesis_soren_dreano_no_signature.pdf (PDF, 12MB)
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0