A recurring focus of the deep learning community is reducing the labeling effort. Gathering and annotating data with a search engine is a simple alternative to building a fully human-annotated and human-gathered dataset. Although web crawling is very time-efficient, some of the retrieved images are unavoidably noisy, i.e. incorrectly labeled. Designing robust algorithms for training on noisy data gathered from the web is an important research direction that would make building datasets easier. In this paper we conduct a study to understand the type of label noise to expect when building a dataset using a search engine. We review the current limitations of state-of-the-art methods for dealing with noisy labels in image classification under a web noise distribution. We propose a simple solution to bridge the gap with a fully clean dataset: Dynamic Softening of Out-of-distribution Samples (DSOS), which we design on corrupted versions of the CIFAR-100 dataset and compare against state-of-the-art algorithms on the web-noise-perturbed MiniImageNet and Stanford datasets, as well as on real label noise datasets: WebVision 1.0 and Clothing1M. Our work is fully reproducible: https://git.io/JKGcj.
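The abstract does not detail how DSOS softens labels; as a generic illustration of the label-softening idea (not the paper's actual method), one can interpolate a sample's one-hot label toward the model's current prediction, with a per-sample coefficient controlling how much the original label is trusted. The function and parameter names below are hypothetical.

```python
import numpy as np

def soften_labels(one_hot, model_probs, alpha):
    """Interpolate a one-hot label toward the model's predicted distribution.

    alpha in [0, 1]: 0 keeps the original (possibly noisy) label,
    1 fully trusts the model prediction. A noise-detection score could
    set alpha per sample (higher for suspected out-of-distribution images).
    """
    return (1.0 - alpha) * one_hot + alpha * model_probs

# Example: a sample labeled class 1, but the model favors class 0.
one_hot = np.array([0.0, 1.0, 0.0])
model_probs = np.array([0.7, 0.2, 0.1])
soft = soften_labels(one_hot, model_probs, alpha=0.5)
# soft is [0.35, 0.6, 0.05]; it remains a valid distribution (sums to 1)
```

Because both inputs are probability distributions and the interpolation weights sum to one, the softened target always remains a valid distribution, regardless of alpha.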