The legal battles over AI models' use of proprietary material reveal an industry operating in regulatory limbo, with companies pushing boundaries amid court disputes and controversial findings on data reproduction.
The question of how far “fair use” can stretch is now at the centre of the AI copyright fight. As large language models have moved from research labs into everyday use, one of the earliest and most persistent accusations against companies such as OpenAI and Anthropic has been that they built profitable systems on the back of other people’s creative work. Their defenders argue that training on books, articles and other protected material is transformative and therefore lawful; critics say the output can compete directly with the original works and, in some cases, reproduce them far too closely.
That legal argument has already produced mixed results in the United States. In June 2025, a federal judge in San Francisco ruled that Anthropic’s use of books to train Claude without permission fell within fair use, comparing the process to a reader learning from existing writing in order to create something new. But the same ruling also found that Anthropic’s storage of pirated books in a central library amounted to copyright infringement, leaving the company exposed to a separate damages trial. Reporting at the time suggested the decision strengthened the case for training itself, while still drawing a clear line around how the material was obtained.
The dispute is made more complex by the behaviour of the models themselves. Research reported in early 2026 by Stanford and Yale found that leading systems, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini and xAI’s Grok, can reproduce substantial stretches of copyrighted material almost verbatim. That finding undermines the repeated claim by AI firms that their models do not retain training data in a way that resembles copying. For authors and publishers, the worry is not only that their work may have been used without consent, but that the systems may be capable of generating close substitutes that compete with the originals.
Anthropic’s own recent troubles underline how unsettled the sector remains. The company said it had identified more than 24,000 fraudulent accounts involved in large-scale “distillation” attacks on Claude, with activity linked to Chinese AI firms including DeepSeek, Moonshot AI and MiniMax. Separately, it agreed to a $1.5 billion settlement in a class-action case over the use of pirated books in training its models, in what is being described as a landmark payout. The company has also said it does not, by default, use customer prompts and responses to train its systems unless users opt in, reflecting the wider pressure on AI developers to prove they can use data more responsibly than their critics allege.
What emerges is an industry still operating in a legal grey zone, even as courts begin to define its boundaries. For now, AI companies appear willing to push copyright law to its limits first and negotiate later, often only after being challenged in court. That sequence has sharpened the sense among writers, artists and publishers that the technology sector spent years treating protected material as a free resource, before turning to settlements and safety measures once the legal risks became impossible to ignore.
Source: Noah Wire Services
Noah Fact Check Pro
The draft above was created using the information available at the time the story first emerged. We've since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.
Freshness check
Score: 7
Notes:
The article references events from June 2025, including a federal judge's ruling on Anthropic's use of books for AI training and the identification of fraudulent accounts targeting Claude. These events are well documented in multiple reputable sources. However, the article was published on April 30, 2026, a lag of nearly eleven months that raises concerns about the freshness of the information presented. The article also appears to be a translation of content from other outlets, which may account for the delay, and it offers no new insights or developments since the original events. Given these factors, the freshness score is reduced.
Quotes check
Score: 6
Notes:
The article includes direct quotes attributed to various sources. A search for the earliest known usage of these quotes shows they appeared in earlier publications, suggesting content reuse. The absence of independently verifiable sources for some quotes raises concerns about their authenticity, and without access to the originals their credibility cannot be fully confirmed. The quotes check score is therefore moderate.
Source reliability
Score: 5
Notes:
The article originates from Capital.bg, a Bulgarian news outlet. While it may be reputable within its niche, its international reach and recognition are limited. The article leans heavily on secondary sources, including press releases and reports from other media outlets, which may carry their own biases or inaccuracies. The absence of original reporting or firsthand accounts further diminishes reliability, so the source reliability score is moderate.
Plausibility check
Score: 7
Notes:
The events described in the article, such as the legal ruling on Anthropic's use of books for AI training and the identification of fraudulent accounts targeting Claude, are plausible and have been reported by multiple reputable sources. However, the long delay in reporting and the reliance on secondary sources, with no new insights since the original events, raise questions about the article's originality and depth, and the relative scarcity of specific factual anchors weakens its credibility. While the claims themselves are plausible, the overall plausibility score is moderate.
Overall assessment
Verdict (FAIL, OPEN, PASS): FAIL
Confidence (LOW, MEDIUM, HIGH): MEDIUM
Summary:
The article covers events from June 2025, including a federal judge's ruling on Anthropic's use of books for AI training and the identification of fraudulent accounts targeting Claude, yet it was published on April 30, 2026, a significant reporting delay. The content appears to be recycled from other sources, with no original reporting or new insights, and some quotes cannot be independently verified. Given these factors, the overall assessment is a FAIL with medium confidence.
