• LAION-5B, a leading AI training data set, found to contain over 1,000 images of child sexual abuse
• Massive data sets needed to train AIs inevitably contain harmful content scraped from across the internet
• Researcher Abeba Birhane previously found sexist, pornographic and rape imagery in earlier LAION data sets
• Problematic data leads to problematic AI models; scale comes at the cost of quality and auditability
• Open-sourcing data sets, as horrible as they may be, is necessary to understand and improve them