Concerns Raised Over Potentially Harmful Content in AI Training Data
- Over 1,000 images of child sexual abuse material were found in LAION-5B, a dataset used to train AI image generators such as Stable Diffusion.
- The presence of this content could make it easier for AI models to generate new abusive images or deepfakes.
- The findings raise concerns about the lack of transparency around the training data behind new generative AI tools.
- LAION, the nonprofit behind the dataset, has taken it offline to review the findings and remove any additional abusive content.
- The report recommends restricting large web-scraped datasets to research settings and training publicly available AI models on more carefully curated data.