Dreano, Soren (2025) Breaking Language Barriers: Reimagining Machine Translation as Style Transfer. PhD thesis, Dublin City University.
Abstract
Machine Translation (MT) has traditionally relied on parallel corpora, which poses significant challenges for low-resource languages. This thesis reimagines MT as a style transfer problem and introduces a novel framework, CycleGN, that enables both translation and style transfer without requiring parallel datasets. Because it relies only on monolingual corpora, the proposed method seeks to broaden equitable access to translation systems while advancing theoretical insights into style transfer. A further contribution is Tokengram_F, a novel metric that extends n-gram analysis to better capture linguistic and contextual nuances in translation evaluation. It can estimate the quality of machine-generated sentences in more than 200 languages. In addition, Embed_llama leverages pre-trained Large Language Model (LLM) embeddings to improve semantic alignment and evaluation accuracy, extending the thesis's work on transfer learning.
This thesis also explores text compression through Llamazip, a lossless compression algorithm that exploits the predictive capabilities of LLMs. Beyond achieving strong compression ratios, Llamazip supports further applications, such as determining whether a given text was part of a model's training set and benchmarking the predictive performance of language models.
The research presented in this thesis has resulted in four peer-reviewed publications and one further submission, and another paper is in preparation. Ultimately, this work seeks to democratise access to translation technologies by broadening the scope of accessible training data, and to contribute to the evolution of language technologies in a multilingual world.
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Date of Award: | May 2025 |
| Refereed: | No |
| Supervisor(s): | Molloy, Derek and Murphy, Noel |
| Subjects: | Computer Science > Artificial intelligence; Computer Science > Computer security; Computer Science > Information retrieval; Computer Science > Machine learning; Computer Science > Machine translating; Engineering > Signal processing |
| DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License. |
| Funders: | Research Ireland - Centre for Research Training in Machine Learning |
| ID Code: | 31116 |
| Deposited On: | 21 Nov 2025 14:47 by Derek Molloy. Last modified 21 Nov 2025 14:47 |
Documents
Full text available as: PDF (Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0, 12MB)