The main topic of the passage is the impact of OpenAI's ChatGPT on society, particularly in the context of education and homework. The key points are:
1. ChatGPT, a language model developed by OpenAI, has gained significant interest and usage since its launch.
2. ChatGPT's ability to generate text has implications for homework and education, as it can provide answers and content for students.
3. The use of AI-generated content raises questions about the nature of knowledge and the role of humans as editors rather than interrogators.
4. The impact of ChatGPT on platforms like Stack Overflow has led to temporary bans on using AI-generated text for posts.
5. The author suggests that the future of AI lies in the "sandwich" workflow, where humans prompt and edit AI-generated content to enhance creativity and productivity.
Main topic: The New York Times updates its terms of service to prohibit scraping its articles and images for AI training.
Key points:
1. The updated terms of service prohibit the use of Times content for training any AI model without express written permission.
2. The content is provided only for personal, non-commercial use, which does not include using it to train AI systems.
3. Prior written consent from the NYT is required to use the content for software program development, including training AI systems.
The main topic of the article is the backlash against AI companies that use unauthorized creative work to train their models.
Key points:
1. The controversy surrounding Prosecraft, a linguistic analysis site that used scraped data from pirated books without permission.
2. The debate over fair use and copyright infringement in relation to AI projects.
3. The growing concern among writers and artists about the use of generative AI tools to replace human creative work and the push for individual control over how their work is used.
Main Topic: The Associated Press (AP) has issued guidelines on artificial intelligence (AI) and its use in news content creation, while also encouraging staff members to become familiar with the technology.
Key Points:
1. AI cannot be used to create publishable content or images for AP.
2. Material produced by AI should be vetted carefully, just like material from any other news source.
3. AP's Stylebook chapter advises journalists on how to cover AI stories and includes a glossary of AI-related terminology.
Note: The article also mentions concerns about AI replacing human jobs, the licensing of AP's archive by OpenAI, and ongoing discussions between AP and its union regarding AI usage in journalism. However, these points are not the main focus and are only briefly mentioned.
### Summary
AI models like ChatGPT have advantages in terms of automation and productivity, but they also pose risks to content and data privacy. Content scraping, although beneficial for data aggregation and reducing bias, can be used for malicious purposes.
### Facts
- Content scraping, when combined with machine learning, can help reduce news bias and save costs through automation.
- However, there are risks associated with content scraping, such as data being sold on the Dark Web or used for fake identities and misinformation.
- Scraper bots, including fake "Googlebots," pose a significant threat by evading detection and carrying out malicious activities.
- ChatGPT and similar language models are trained on data scraped from the internet, which raises concerns about attribution and copyright issues.
- AI innovation is progressing faster than laws and regulations, making scraping activity fall into a gray area.
- To prevent AI models from training on your data, blocking Common Crawl's crawler (CCBot) is a starting point, but more sophisticated scraping methods exist.
- Putting content behind a paywall can prevent scraping but may limit organic views and annoy human readers.
- Companies may need to use advanced techniques to detect and block scrapers as scraper developers become more secretive about their crawlers' identities.
- OpenAI and Google could potentially build datasets using search engine scraper bots, making opting out of data collection more difficult.
- Companies should decide if they want their data to be scraped and define what is fair game for AI chatbots, while staying vigilant against evolving scraping technology.
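The "starting point" named above, blocking Common Crawl's crawler, is usually expressed in a site's robots.txt file. A minimal sketch, assuming you want to opt out of Common Crawl entirely (the `CCBot` user-agent token is the one Common Crawl documents for its crawler):

```
# robots.txt — ask Common Crawl's crawler (CCBot) not to fetch any pages,
# while leaving all other well-behaved crawlers unaffected
User-agent: CCBot
Disallow: /
```

Note that robots.txt is purely advisory: compliant crawlers honor it, but as the facts above point out, scrapers that disguise their identity will simply ignore it.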
### Emoji
- 💡 Content scraping has benefits and risks
- 🤖 Bot-generated traffic poses threats
- 🖇️ Attribution and copyright issues arise from scraping
- 🛡️ Companies need to defend against evolving scraping technology
Generative AI models like ChatGPT pose risks to content and data privacy, as they can use scraped content without attribution, potentially costing publishers traffic and revenue and fueling ethical debates about AI innovation. Blocking Common Crawl's crawler (CCBot) and implementing paywalls offer some protection, but as the technology evolves, companies must stay vigilant and adapt their defenses against content scraping.
Major media organizations are calling for new laws to protect their content from being used by AI tools without permission, expressing concerns over unauthorized scraping and the potential for AI to produce false or biased information.
The Associated Press has released guidance on the use of AI in journalism: while it will continue to experiment with the technology, it will not use it to create publishable content or images, a stance that raises questions about the trustworthiness of AI-generated news. Other news organizations have taken different approaches: some openly embrace AI and even advertise for AI-assisted reporters, while smaller newsrooms with limited resources see AI as an opportunity to produce more local stories.
Leading news organizations, including CNN, The New York Times, and Reuters, have blocked OpenAI's web crawler, GPTBot, from scanning their content, as they fear the potential impact of the company's artificial intelligence technology on the already struggling news industry. Other media giants, such as Disney, Bloomberg, and The Washington Post, have also taken this defensive measure to safeguard their intellectual property rights and prevent AI models, like ChatGPT, from using their content to train their bots.
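The blocking described above is also done through robots.txt, using the `GPTBot` user-agent token that OpenAI documents for its crawler. A minimal sketch of how such a rule behaves, using Python's standard `urllib.robotparser`; the robots.txt content here is a hypothetical example, not any publisher's actual file:

```python
from urllib import robotparser

# Hypothetical robots.txt that disallows OpenAI's GPTBot crawler
# while permitting all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is denied everywhere; an ordinary browser user agent is allowed.
print(parser.can_fetch("GPTBot", "https://example.com/article"))       # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/article"))  # True
```

As with any robots.txt rule, this relies on the crawler choosing to comply; OpenAI states that GPTBot honors these directives, but the rule imposes no technical barrier.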
The US Copyright Office has initiated a public comment period to explore the intersection of AI technology and copyright laws, including issues related to copyrighted materials used to train AI models, copyright protection for AI-generated content, liability for infringement, and the impact of AI mimicking human voices or styles. Comments can be submitted until November 15.
Dezeen, an online architecture and design resource, has outlined its policy on the use of artificial intelligence (AI) in text and image generation. While the publication embraces new technology, it does not publish stories that use AI-generated text unless the story is about AI and is clearly labeled as such, and it favors human-authored illustrations over AI-generated images.
Blogger Samantha North uses AI tools to generate ideas and elements of her blogs, but still values the importance of human expertise and experience in creating valuable content for her readers.
The Authors Guild, representing prominent fiction authors, has filed a lawsuit against OpenAI, alleging copyright infringement and the unauthorized use of their works to train AI models like ChatGPT, which generates summaries and analyses of their novels, interfering with their economic prospects. This case could determine the legality of using copyrighted material to train AI systems.
Amazon has introduced new guidelines requiring publishers to disclose the use of AI in content submitted to its Kindle Direct Publishing platform, in an effort to curb unauthorized AI-generated books and copyright infringement. Publishers must now inform Amazon about AI-generated content, though AI-assisted content does not need to be disclosed. High-profile authors have recently joined a class-action lawsuit against OpenAI, the creator of ChatGPT, for alleged copyright violations.
Big tech firms, including Google and Microsoft, are engaged in a competition to acquire content and data for training AI models, according to Microsoft CEO Satya Nadella, who testified in an antitrust trial against Google and highlighted the race for content among tech firms. Microsoft has committed to assuming copyright liability for users of its AI-powered Copilot, addressing concerns about the use of copyrighted materials in training AI models.
Newspapers and other data owners are demanding that AI companies like OpenAI, which have freely used news stories to train their generative AI models, pay for access to their content and help drive traffic back to their websites.
A group of prominent authors, including Douglas Preston, John Grisham, and George R.R. Martin, is suing OpenAI for copyright infringement over its AI system, ChatGPT, claiming it used their works without permission or compensation to produce derivative works that harm the market for their books. The publishing industry is increasingly concerned about the unchecked power of AI-generated content and is pushing for consent, credit, and fair compensation when authors' works are used to train AI models.
The impact of AI on publishing is causing concerns regarding copyright, the quality of content, and ownership of AI-generated works, although some authors and industry players feel the threat is currently minimal due to the low quality of AI-written books. However, concerns remain about legal issues, such as copyright ownership and AI-generated content in translation.
Writers and artists are filing lawsuits over the use of copyrighted work in training large AI models, raising concerns about data sources and privacy, and the potential for bias in the generated content.
Writers are seeking special status to protect their employment from technological progress, arguing that software creators should obtain permission and pay fees to train AI language models on their work, even when copyright law is not violated.