Who Owns AI Training Data? The Copyright Fight Continues

AI Companies Say Fair Use. Creators Say Pay Us. The Fight Is Not Over.

The question of who owns AI training data is one of the most contentious issues in tech. AI companies argue that scraping publicly available content to train models falls under fair use. Creators, publishers, and their advocates disagree. Patreon CEO Jack Conte called the fair use argument “bogus” in March 2026, saying creators should be paid when their work trains AI models. A publisher pulled a horror novel over AI concerns. And the Trump administration’s AI framework uses “fair use” language that favors the companies doing the training. The copyright fight around AI is intensifying, not settling.

Where the Copyright Battle Stands

  • Patreon CEO Jack Conte called AI companies’ fair use defense “bogus”
  • A publisher pulled the horror novel “Shy Girl” over concerns about AI involvement
  • Trump’s AI framework uses “fair use” language that mirrors AI company defenses
  • Anthropic settled a $1.5 billion copyright case with writers in September 2025
  • A growing number of lawsuits target AI companies over their training data practices

The Fair Use Argument and Its Limits

Fair use is a legal doctrine that allows limited use of copyrighted material without permission for purposes like criticism, commentary, and research. AI companies argue that training models on copyrighted text is transformative enough to qualify. The models, they contend, do not reproduce the original works; they learn statistical patterns from them and generate new text.

Creators see it differently. When an AI model trained on your writing can produce text that competes with your writing, the “transformative” argument feels thin. If the model never needed your work, there would be no reason to train on it. The value flows from creator to company, and nothing flows back.

Conte's position is blunt: creators should be paid when their work is used to train AI models, and he has pushed for direct creator compensation rather than leaving the question to the courts.

The Publishing Industry Takes a Stand

The decision by a publisher to pull the horror novel “Shy Girl” over AI concerns signals that the publishing industry is increasingly wary. Whether the concern was about AI-generated content, AI-assisted writing, or training data sourced from copyrighted books, the result is the same: publishers are treating AI involvement as a risk factor that can stop a book from going to press.

This has a chilling effect on authors who use AI tools in any part of their workflow, even for brainstorming or editing. The line between “AI-assisted” and “AI-generated” is blurry, and publishers are erring on the side of caution.

Where Federal Policy Falls

The Trump administration’s AI framework attempts a middle ground. It says Congress should “protect the rights of creators” while also recognizing fair use for AI training. In practice, this means the federal government is unlikely to impose strict requirements on AI companies to license training data.

For creators, this is a problem. Without legislative mandates for compensation or consent, the only recourse is litigation. Lawsuits are expensive, slow, and uncertain. Anthropic’s $1.5 billion copyright settlement from September 2025 was notable, but not every creator has the resources to sue a well-funded AI lab.

What Would Fair Compensation Look Like?

Some proposals include collective licensing models, where AI companies pay into a pool distributed to creators whose work was used in training. Others suggest opt-in systems where creators choose to license their work and receive royalties based on usage. Neither approach has gained enough traction to become industry standard.

Until the courts or Congress settle the question, the copyright fight will continue to simmer. AI companies will keep training on available data. Creators will keep pushing back. And the value of the work that feeds these models will remain the central point of conflict.