Tech Firms Push Boundaries in Scramble for AI Training Data
-
OpenAI researchers created Whisper, a speech-recognition tool, to transcribe YouTube videos and obtain more conversational text for training AI systems, a practice that potentially violated YouTube's terms of service.
-
OpenAI president Greg Brockman personally helped collect more than 1 million hours of YouTube video to train GPT-4.
-
Meta executives discussed buying the publisher Simon & Schuster to obtain long-form text, as well as gathering copyrighted data from across the internet to train AI systems.
-
Tech companies are cutting corners and debating whether to bend the law to obtain the data needed to advance AI technology.
-
Because negotiating licenses with publishers and content creators would take too long, tech firms are finding other ways to obtain data.