Study Finds Ethical and Legal Issues in Many AI Data Sets
-
Researchers uncover ethical and legal risks in popular AI data sets, finding issues like improper licensing and lack of attribution.
-
The audit examined more than 1,800 specialized fine-tuning data sets hosted on sites such as Hugging Face and GitHub.
-
About 70% of the data sets either did not specify a license or were labeled with permissions more permissive than their creators intended.
-
Proper licensing matters because it tells developers which copyright restrictions and requirements apply to the data they use.
-
Languages from the Global South are often underrepresented in the data sets relative to English and Western European languages.