AI Scientists Conduct Experiments to Explain Other AI Systems
- Researchers developed AI "agents" that can conduct experiments to explain the behavior of other AI systems and neural networks.
- The "automated interpretability agent" (AIA) forms hypotheses, runs experiments, and iteratively refines its understanding of the system it is studying (a minimal sketch of this loop appears after the list).
- The new "FIND" benchmark provides test functions resembling real neural networks, each paired with a ground-truth description, so interpretation methods can be scored against a known answer (see the evaluation sketch after the list).
- Initial results show AIAs can outperform other methods but still fail to accurately describe nearly half of the functions.
- The goal is to develop AIAs that can audit neural networks to diagnose issues before deployment, with human oversight.
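The hypothesize-test-refine loop is easiest to see in code. The sketch below is a purely numeric stand-in: the actual AIAs build on language models that write natural-language hypotheses, so every name here (`black_box`, `aia_explain`, the polynomial hypothesis class) is an illustrative assumption, not the researchers' implementation.

```python
# Toy illustration of an automated interpretability agent (AIA) loop:
# probe a black-box system, form a hypothesis about its behavior, test
# the hypothesis on fresh inputs, and refine it if the predictions fail.
# Real AIAs form hypotheses in natural language; this numeric stand-in
# only shows the structure of the loop.

import numpy as np

def black_box(x):
    """The system under study (its behavior is unknown to the agent)."""
    return np.maximum(0.0, 2.0 * x - 1.0)   # a ReLU-like unit

def aia_explain(system, rounds=5, tol=1e-6):
    rng = np.random.default_rng(0)
    observations_x, observations_y = [], []
    degree = 0                                # hypothesis: polynomial of this degree
    for _ in range(rounds):
        # Experiment: query the system on fresh probe inputs.
        xs = rng.uniform(-2, 2, size=20)
        observations_x.extend(xs)
        observations_y.extend(system(x) for x in xs)
        # Hypothesis: fit a polynomial of the current degree to all evidence.
        coeffs = np.polyfit(np.array(observations_x), np.array(observations_y), degree)
        # Test: does the hypothesis predict held-out behavior?
        x_test = rng.uniform(-2, 2, size=50)
        err = np.mean((np.polyval(coeffs, x_test) - system(x_test)) ** 2)
        if err < tol:
            return f"degree-{degree} polynomial, coeffs={np.round(coeffs, 3)}"
        degree += 1                           # refine: try a richer hypothesis
    return f"best guess: degree-{degree - 1} polynomial (residual error {err:.4f})"

print(aia_explain(black_box))
```

Because the hidden function is piecewise linear, no polynomial fits it exactly; the agent exhausts its budget and returns its best guess with the residual error, mirroring how an AIA's description can remain only approximately right.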
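For the benchmark, a FIND-style harness can be sketched under two assumptions: each entry pairs an opaque function with a ground-truth description, and a method is scored by how well its description matches. The entry, the `naive_judge`, and the scoring rule below are illustrative, not FIND's actual data or metric.

```python
# Sketch of a FIND-style evaluation harness (assumed structure, not the
# benchmark's real API): an interpretation method sees only the opaque
# function, produces a description, and a judge compares that description
# to the hidden ground truth.

from dataclasses import dataclass
from typing import Callable

@dataclass
class BenchmarkEntry:
    function: Callable[[float], float]   # opaque function the method must explain
    ground_truth: str                    # reference description, hidden from the method

def evaluate(interpreter: Callable[[Callable], str],
             entries: list[BenchmarkEntry],
             judge: Callable[[str, str], bool]) -> float:
    """Fraction of functions the interpreter describes correctly."""
    correct = 0
    for entry in entries:
        description = interpreter(entry.function)   # method sees only the function
        if judge(description, entry.ground_truth):  # e.g. a language-model comparison
            correct += 1
    return correct / len(entries)

# Illustrative usage with one toy entry and a naive keyword judge; a real
# judge would compare the description against the ground truth more carefully.
entries = [BenchmarkEntry(lambda x: abs(x), "returns the absolute value of its input")]
naive_judge = lambda desc, truth: "absolute value" in desc.lower()
score = evaluate(lambda f: "computes the absolute value", entries, naive_judge)
print(f"accuracy: {score:.0%}")
```

The ground-truth descriptions are what make the benchmark useful: because the correct answer for each function is known, different interpretation methods, AIAs included, can be compared on the same scale.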