Towards an efficient synergistic paradigm for self-supervised visual representation learning

Krishna, Tarun

Krishna, Tarun (2024) Towards an efficient synergistic paradigm for self-supervised visual representation learning. PhD thesis, Dublin City University.

Abstract

This thesis investigates the latest developments in self-supervised representation learning, which enables learning from a large unlabelled data corpus. The overarching objective of this work is to comprehensively assess, devise and harness self-supervised models with efficiency and effectiveness at the forefront. As an initial step in this direction, this research begins by evaluating the efficacy of contrastive models for instance-based image retrieval, demonstrating their capability to encode semantic similarity among instances induced through discriminative learning. Through extensive evaluation on Oxford5k/rOxford5k, Paris6k/rParis6k and INSTRE, it is shown that these models perform comparably with, and in some cases outperform, pre-trained supervised baselines, highlighting their potential for building robust image retrieval engines without explicit supervision. Building upon this foundation, this work then turns to visual attention modeling for 360° images, a domain largely unexplored in the context of self-supervised representation learning. More importantly, the solutions proposed for learning have been validated on realistic benchmarks (Salient 360 [Rai et al., 2017], VR-Eye Tracking, Sitzmann) built with datasets gathered from the Web. Further, contributions are made towards optimizing self-supervised learning strategies, particularly addressing challenges such as redundant channel features and computational complexity. Dynamic channel selection methods originally developed for supervised learning are adapted to self-supervised networks, resulting in significant reductions in computation without compromising performance. Additionally, a novel perspective is introduced on the synergy between self-supervised learning and dynamic computation paradigms. Through simultaneous learning of dense and gated sub-networks, a generic and efficient architecture is proposed, achieving performance comparable to vanilla self-supervised settings at reduced computational cost. These approaches are rigorously benchmarked on the CIFAR-10/100, STL-10 and ImageNet-100 datasets. Finally, the conclusion of this thesis summarizes the contributions of this work and discusses directions for future research in this area.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	5 December 2024
Refereed:	No
Additional Information:	Industry collaboration with Xperi
Supervisor(s):	O'Connor, Noel and McGuinness, Kevin
Subjects:	Computer Science > Image processing; Computer Science > Machine learning; Computer Science > Digital video
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Institutes and Centres > INSIGHT Centre for Data Analytics
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License.
Funders:	Research Ireland
ID Code:	30567
Deposited On:	10 Mar 2025 15:07 by Noel Edward O'Connor. Last Modified 10 Mar 2025 15:07

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
24MB

DORAS | DCU Research Repository
