How hard are computer vision datasets? Calibrating dataset difficulty to viewing time

Humans outperform object recognizers despite the fact that models perform well
on current datasets. Numerous efforts exist to make more challenging datasets by
scaling up on the web, exploring distribution shift, or adding controls for biases.
The difficulty of each image in each dataset is not independently evaluated, nor
is the concept of dataset difficulty as a whole currently well defined. We develop
a new dataset difficulty metric based on how long humans must view an image
in order to classify a target object. Images whose objects can be recognized in
17ms are considered to be easier than those which require seconds of viewing time.
Using 133,588 judgments on two major datasets, ImageNet and ObjectNet, we
determine the distribution of image difficulties in those datasets, which we find
varies wildly, but significantly undersamples hard images. Rather than hoping
that distribution shift will lead to hard datasets, we should explicitly measure their
difficulty. Analyzing model performance guided by image difficulty reveals that
models tend to have lower performance and a larger generalization gap on harder
images. We release a dataset of difficulty judgments as a complementary metric
to raw performance and other behavioral/neural metrics. Such experiments with
humans allow us to create a metric for progress in object recognition datasets. This
metric can be used to both test the biological validity of models in a novel way, and
develop tools to fill out the missing class of hard examples as datasets are being collected.
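The paper's exact scoring procedure is not spelled out in this abstract, but the core idea — assigning each image the shortest viewing time at which humans can classify its target object — can be sketched as follows. This is a minimal illustration, assuming judgments are recorded as hypothetical `(image_id, viewing_time_ms, correct)` tuples and that an image counts as recognized at a given exposure once accuracy at that exposure reaches a chosen threshold:

```python
from collections import defaultdict

def image_difficulty(judgments, threshold=0.5):
    """Estimate per-image difficulty as the shortest viewing time (ms)
    at which human classification accuracy reaches `threshold`.

    `judgments`: iterable of (image_id, viewing_time_ms, correct) tuples.
    Returns {image_id: minimum solving time in ms, or None if never solved}.
    """
    # Group correctness outcomes by image and by presentation time.
    by_image = defaultdict(lambda: defaultdict(list))
    for image_id, t, correct in judgments:
        by_image[image_id][t].append(correct)

    difficulty = {}
    for image_id, times in by_image.items():
        # Times (shortest first) at which accuracy meets the threshold.
        solved = [t for t, outcomes in sorted(times.items())
                  if sum(outcomes) / len(outcomes) >= threshold]
        # Easy images resolve at brief exposures (e.g. 17 ms);
        # hard ones need seconds; unsolved images map to None.
        difficulty[image_id] = solved[0] if solved else None
    return difficulty
```

Under this sketch, an image recognized reliably at 17 ms scores 17, while one that only becomes recognizable after a long exposure scores correspondingly higher, matching the abstract's ordering of easy versus hard images.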

Please cite our work using the BibTeX below.

@inproceedings{
  title={Workshop version: How hard are computer vision datasets? Calibrating dataset difficulty to viewing time},
  author={David Mayo and Jesse Cummings and Xinyu Lin and Dan Gutfreund and Boris Katz and Andrei Barbu},
  booktitle={SVRHM 2022 Workshop @ NeurIPS},
}