AI Companies Search for New Data Sources as Models Grow Hungrier
-
AI companies are exhausting the supply of training data available on the internet as they build ever-larger models.
-
Companies such as OpenAI are exploring controversial alternative sources, including YouTube transcripts and synthetic (AI-generated) data.
-
Researchers warn that training models on synthetic data can degrade their performance, a failure mode known as "model collapse".
-
Companies such as Anthropic say they are developing higher-quality synthetic data, but details are scarce.
-
One proposed solution is for AI companies to stop pursuing ever-larger models that require massive datasets.