Le Khac, Phuc H ORCID: 0000-0002-0504-5844 (2024) Toward efficient learning of structured representations in computer vision. PhD thesis, Dublin City University.
Abstract
The ability to learn a hierarchical and compact representation from data stands as a fundamental principle behind the rapid growth of Deep Learning, particularly evident in Computer Vision. Despite significant progress on perception tasks such as recognition and detection, these models still fall short in terms of reasoning and planning capabilities, and cannot generalise systematically despite being trained with extensive amounts of data and compute resources.
How to effectively scale up a representation learning system in terms of computation and data, and extend the capabilities of the visual representations toward high-level tasks is the central research topic of this thesis.
First, we focus on contrastive representation learning, a general approach for learning representations by comparison. We survey and analyse more than 100 recent works and provide a framework to categorise and understand research in this direction, not only in the context of self-supervised visual learning but also for other domains and applications.
We then turn towards the problem of object-centric representation learning, a promising approach to learning structured representations of complex visual scenes for planning and reasoning tasks. We first explore using discrete representations for object-centric learning, motivated by the common goal of decomposing the continuous visual signal into individual discrete components.
Understanding the importance and challenges of scaling in learning representations from data, we propose an efficient architecture for decoding object-centric representations, a ubiquitous but memory-intensive component present in most object-centric learning methods.
Finally, to address the challenge of learning these object-centric representations in complex and realistic data, we capitalise on the advancements in pre-trained models for visual representations, enabling the learning of higher-level representations. Inspired by human cognitive development, we further study the effects of depth information and geometry contained in these representations, exploring their influence on the process of unsupervised object discovery.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | August 2024 |
Refereed: | No |
Supervisor(s): | Smeaton, Alan F. and Healy, Graham |
Subjects: | Computer Science > Artificial intelligence; Computer Science > Machine learning |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Institutes and Centres > INSIGHT Centre for Data Analytics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License. |
Funders: | Science Foundation Ireland |
ID Code: | 30242 |
Deposited On: | 18 Nov 2024 11:48 by Alan Smeaton. Last Modified 18 Nov 2024 11:48 |
Documents
Full text available as:
PDF (17MB), Creative Commons: Attribution-NonCommercial-No Derivative Works 4.0