OpenAI's Use of YouTube to Train AI Fuels Questions on Data Sourcing
-
OpenAI likely uses vast numbers of YouTube videos to train AI models such as Sora, despite Google's restrictions on downloading videos at scale.
-
It's unclear how OpenAI accesses enough YouTube videos given Google's throttling and bans on commercial scraping.
-
Quality training data is crucial for developing powerful AI models, sparking a global race to amass text, images, and videos.
-
Accessing YouTube videos in ways that violate its policies may not be illegal, given fair use law and legal precedents.
-
OpenAI declines to disclose details on training data sources; an employee said its data team's methods are closely guarded.