Concerns Raised Over Potentially Harmful Content in AI Training Data
- Over 1,000 images of child sexual abuse material were found in LAION-5B, a dataset used to train AI image generators such as Stable Diffusion.
- The presence of this content could make it easier for AI models to generate new abusive images or deepfakes.
- The findings raise concerns about the lack of transparency around the training data behind new generative AI tools.
- LAION, the nonprofit behind the dataset, has taken it offline to review the findings and remove any additional abusive content.
- The report recommends restricting large web-scraped datasets to research settings and training publicly available AI models on more carefully curated data.