AI Training Data Lacks Transparency, Raising Concerns Over Privacy and Bias

- AI models are trained on massive datasets scraped from the public internet, including copyrighted and private material, usually without disclosing what that material is.
- Web scrapers can collect not only public websites and profiles but also paywalled content, pirated material, and leaked personal data.
- This lack of transparency around training data raises concerns about copyright, privacy, and bias.
- Marginalized groups are underrepresented in web data, which can skew the behavior of AI models trained on it.
- There are currently few options for preventing personal data from being used to train AI systems.