Study Finds Image Recognition Datasets Skew Toward Simple Images, Inflating Performance Metrics

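Researchers have found that widely used image recognition datasets are skewed toward simple, easy-to-recognize images, which inflates reported model performance. To quantify image difficulty, they introduce a "minimum viewing time" (MVT) metric: how long a person needs to view an image before identifying it correctly.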
Harder images expose weaknesses in current models and amount to a distribution shift that standard evaluations do not account for. Tools for computing minimum viewing time make it possible to extend existing benchmarks with difficulty measurements.
Larger models improve on simple images but make less progress on complex ones. Multimodal models such as CLIP come closer to human-like recognition.
The study also explores the neural correlates of image difficulty, including whether complex images engage brain areas beyond those involved in visual processing.
The work addresses challenges in assessing progress toward human-level performance in object recognition and opens new possibilities for understanding and advancing the field.
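
As a rough illustration of how a minimum viewing time score could be derived and then used to break model accuracy down by difficulty, here is a minimal Python sketch. It is not the researchers' tooling. The function names, the trial format, the presentation durations, and the rule that an image's score is the shortest duration at which more than half of viewers answer correctly are all assumptions made for this example.

```python
from collections import defaultdict


def minimum_viewing_time(trials, threshold=0.5):
    """Estimate a minimum viewing time per image (hypothetical helper).

    trials: iterable of (image_id, duration_ms, correct) tuples, one per
    human response. Returns {image_id: duration_ms or None}, where the value
    is the shortest duration at which the fraction of correct responses
    exceeds `threshold` (an assumed majority rule), or None if no tested
    duration reaches it.
    """
    # image_id -> duration_ms -> [number correct, number of responses]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for image_id, duration_ms, correct in trials:
        cell = counts[image_id][duration_ms]
        cell[0] += int(correct)
        cell[1] += 1

    mvt = {}
    for image_id, by_duration in counts.items():
        mvt[image_id] = None
        for duration_ms in sorted(by_duration):
            n_correct, n_total = by_duration[duration_ms]
            if n_correct / n_total > threshold:
                mvt[image_id] = duration_ms
                break
    return mvt


def accuracy_by_difficulty(mvt, model_correct):
    """Group model accuracy by each image's minimum viewing time.

    model_correct: {image_id: bool} marking whether the model's prediction
    was right. Images without a model result are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # mvt value -> [correct, total]
    for image_id, difficulty in mvt.items():
        if image_id not in model_correct:
            continue
        totals[difficulty][0] += int(model_correct[image_id])
        totals[difficulty][1] += 1
    return {d: c / n for d, (c, n) in totals.items()}


if __name__ == "__main__":
    # Made-up responses for three images at two illustrative durations.
    trials = [
        ("cat", 17, True), ("cat", 17, True), ("cat", 17, False),
        ("chair", 17, False), ("chair", 17, False),
        ("chair", 150, True), ("chair", 150, True), ("chair", 150, False),
        ("lamp", 17, False), ("lamp", 150, False),
    ]
    scores = minimum_viewing_time(trials)
    print(scores)
    print(accuracy_by_difficulty(scores, {"cat": True, "chair": False, "lamp": False}))
```

Reporting accuracy per difficulty group, rather than as a single average, is one way an evaluation could show whether a model's gains are concentrated on images that people recognize almost instantly.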