Study Finds Image Recognition Datasets Skew Toward Simple Images, Inflating Performance Metrics
-
Researchers found that image recognition datasets are skewed toward less complex images, which inflates reported model performance. A new metric, "minimum viewing time" (MVT), quantifies image difficulty by how long a person needs to view an image before recognizing it correctly.
-
Harder images expose weaknesses in current models and amount to a distribution shift that standard evaluations do not account for. Tools for computing minimum viewing time make it possible to extend existing benchmarks.
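As a rough illustration of how a minimum viewing time could be derived from human trial data, here is a minimal sketch. It is not the researchers' tooling. The record layout, the image names, the presentation durations, and the majority-correct threshold are all assumptions made for this example.

```python
from collections import defaultdict


def minimum_viewing_time(trials, threshold=0.5):
    """Map each image to the shortest tested duration (ms) at which the
    fraction of correct human answers exceeds `threshold`.

    The majority-correct rule is an assumption for illustration only.
    """
    # counts[image_id][duration_ms] = [num_correct, num_trials]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for image_id, duration_ms, correct in trials:
        cell = counts[image_id][duration_ms]
        cell[0] += int(correct)
        cell[1] += 1

    mvt = {}
    for image_id, by_duration in counts.items():
        # None means the image was never recognized reliably at any tested duration.
        mvt[image_id] = None
        for duration_ms in sorted(by_duration):
            num_correct, num_trials = by_duration[duration_ms]
            if num_correct / num_trials > threshold:
                mvt[image_id] = duration_ms
                break
    return mvt


# Hypothetical trial records: (image_id, presentation duration in ms, answered correctly?)
trials = [
    ("mug_front", 17, True), ("mug_front", 17, True), ("mug_front", 17, False),
    ("mug_occluded", 17, False), ("mug_occluded", 17, False), ("mug_occluded", 17, True),
    ("mug_occluded", 150, True), ("mug_occluded", 150, True), ("mug_occluded", 150, False),
]

mvt = minimum_viewing_time(trials)
print(mvt)  # {'mug_front': 17, 'mug_occluded': 150}
```

Under these assumptions, the clearly visible mug is recognized at the shortest exposure, while the occluded one needs a longer look and therefore counts as the harder image.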
-
Larger models improve on simple images but make less progress on complex ones. Multimodal models such as CLIP move toward more human-like recognition.
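One way to make this kind of gap visible is to report model accuracy separately for each difficulty level instead of as a single aggregate number. The sketch below assumes each test image already carries a difficulty label (here a viewing time in milliseconds) and a flag for whether the model classified it correctly. The data is hypothetical and is not taken from the study.

```python
from collections import defaultdict


def accuracy_by_difficulty(records):
    """records: iterable of (difficulty_ms, model_correct) pairs.

    Returns {difficulty_ms: accuracy}, ordered from easiest to hardest.
    """
    totals = defaultdict(lambda: [0, 0])  # difficulty -> [num_correct, num_images]
    for difficulty_ms, model_correct in records:
        totals[difficulty_ms][0] += int(model_correct)
        totals[difficulty_ms][1] += 1
    return {d: correct / n for d, (correct, n) in sorted(totals.items())}


# Hypothetical per-image results for one model.
records = [
    (17, True), (17, True), (17, True), (17, False),
    (150, True), (150, False), (150, False), (150, False),
]

report = accuracy_by_difficulty(records)
print(report)  # {17: 0.75, 150: 0.25}
```

With this toy data the aggregate accuracy is 50%, which hides the fact that the model does well on easy images and poorly on hard ones.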
-
The study also explores the neural correlates of image difficulty, including whether recognizing complex images recruits brain areas beyond those involved in visual processing.
-
The work addresses the challenge of assessing progress toward human-level performance in object recognition and opens new possibilities for understanding and advancing the field.