Researchers Build Large AI Models Without Copyrighted Data, Proving Feasible Alternative

• OpenAI claimed training AI without copyrighted data was "impossible," but new models show it can be done legally
• Nonprofit Fairly Trained certified the first large language model, KL3M, built without copyright infringement
• KL3M was trained on legal documents to create AI for summarizing and drafting contracts
• Researchers released Common Corpus, the largest public domain AI dataset yet at 500 million tokens
• While limited, these models demonstrate leading AI can be developed legally and ethically