AI Companies Explore Potential of Synthetic Data, But Face Risks of Data Quality and Model Reliability

AI companies are looking into "synthetic data" to address a shortage of training data, but it's unclear whether it will work
Models trained on synthetic data can become "inbred" and develop defects ("Habsburg AI")
In one study, an AI model broke down after just five generations of training on synthetic data ("Model Autophagy Disorder")
OpenAI and Anthropic are trying a two-model system to check the accuracy of synthetic data
Anthropic admits that Claude 3 was trained on "data we generate internally", but the technique remains largely unproven