Deep neural networks used in computer vision have been shown to exhibit many social biases, such as gender bias. Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification. However, because research on mitigating bias in computer vision has focused primarily on CNNs, it is important to evaluate how a different network architecture affects the potential for bias amplification. In this paper we therefore introduce a novel metric for measuring bias in architectures, Accuracy Difference. We examine bias amplification when models belonging to these two architectures are used as part of large multimodal models, evaluating the different image encoders of Contrastive Language-Image Pre-training (CLIP), an important model used in many generative models such as DALL-E and Stable Diffusion. Our experiments demonstrate that architecture can play a role in amplifying social biases, owing to the different techniques the models employ for feature extraction and embedding as well as their different learning properties. This research found that ViTs amplified gender bias to a greater extent than CNNs.
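The abstract names the Accuracy Difference metric but does not define it. As a minimal sketch, assuming the metric compares a classifier's accuracy on two demographic groups (for example, images of men and images of women) and subtracts one from the other, it could be computed as below. The function names, the group split, and the toy occupation labels are hypothetical illustrations, not the paper's definition.

```python
from typing import Sequence


def group_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth labels for one group."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def accuracy_difference(
    preds_a: Sequence[str],
    labels_a: Sequence[str],
    preds_b: Sequence[str],
    labels_b: Sequence[str],
) -> float:
    """Hypothetical Accuracy Difference: accuracy on group A minus accuracy on group B.

    A positive value means the model is more accurate on group A; a value
    near zero means the two groups are classified about equally well.
    """
    return group_accuracy(preds_a, labels_a) - group_accuracy(preds_b, labels_b)


if __name__ == "__main__":
    # Toy, made-up predictions for two hypothetical image groups.
    labels = ["doctor", "nurse", "doctor", "nurse"]
    preds_group_a = ["doctor", "nurse", "doctor", "doctor"]  # 3 of 4 correct
    preds_group_b = ["nurse", "nurse", "nurse", "nurse"]     # 2 of 4 correct
    print(accuracy_difference(preds_group_a, labels, preds_group_b, labels))
```

Under these assumptions, one would compute the gap separately for each image encoder being compared (for example, a ViT-based and a CNN-based encoder) and treat the encoder with the larger absolute gap as the one showing more bias on that task.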
Proceedings of the 34th British Machine Vision Conference 2023 (BMVC).
British Machine Vision Association (BMVA). https://proceedings.bmvc2023.org/629/
<A+> Alliance / Women at the Table, as an Inaugural Tech Fellow 2020/2021; Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_2; European Regional Development Fund
ID Code: 29469
Deposited On: 19 Jan 2024 11:39 by Abhishek Mandal. Last Modified: 19 Jan 2024 11:39