Posted 2/14/2024, 2:00:00 PM
Websites Wrestle with Keeping Value from AI Crawlers That Don't Play by Old Rules
- robots.txt is a 30-year-old text file that lets websites signal which crawlers may access their pages, but it carries no legal authority
- It was created as a mutual agreement between sites and search engines to balance value and problems from crawling
- Recently, AI models have changed the equation by extracting huge value from sites' data with no reciprocity
- Many major sites, including the BBC and The New York Times, now block AI crawlers, viewing them as extracting value rather than trading it
- Because robots.txt relies on voluntary compliance, it lacks teeth against unscrupulous crawlers, leading some to call for stronger crawler governance
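
The blocking described above is typically done with a few lines in robots.txt. A minimal sketch of what such a file might look like, using the publicly documented user-agent tokens for OpenAI's and Common Crawl's bots (GPTBot and CCBot) and Google's AI-training opt-out token (Google-Extended), while still admitting everything else:

```
# Disallow known AI-training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers (e.g. ordinary search indexing) remain allowed
User-agent: *
Disallow:
```

As the post notes, nothing enforces this: a crawler that ignores the file faces no technical barrier, which is exactly why these directives amount to a request rather than access control.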