Every executive whose company produces content — publishing, media, design, code, research — is watching the AI copyright battle with a mix of frustration and uncertainty. On one side: the scrapers. On the other: the lawsuits. Two years in, nobody has a framework that works.
A new paper from MIT Sloan and USC Marshall argues, through a formal economic model, that both positions fail on economic grounds, and proposes a market mechanism designed to address the problem without destroying either creators or AI progress.
The finding that matters most: strong copyright protection creates something the authors call an “originality penalty.” The more original and valuable your work, the more the AI learns from you, the more its output resembles yours, and the less your future work is worth. The creators who should benefit most from copyright are the ones hurt most by it.
Three Market Failures Copyright Cannot Fix
The AI copyright debate has been stuck on a single binary question: is training on public data fair use or is it infringement? Every lawsuit, every licensing deal, every legislative hearing comes back to that yes-or-no. This paper argues that the binary itself is the wrong frame.
The authors model the AI training data economy as a market with three structural failures that copyright law cannot fix, regardless of how the courts rule. The three failures are interlinked. Copyright law treats them as separate legal problems; in the paper's model they are a single market design problem.
The originality penalty. When an AI model trains on creative works, the value of future human-created content collapses because the model can now produce close substitutes. The most original creators — the ones whose work teaches the model the most — face the steepest value decline. Strong copyright protection does not help, because the damage is not unauthorized copying — it is market substitution.
The curse of precision. Better AI models produce more homogeneous output. As models improve, they converge on the same optimal answers. This makes all human output less differentiated, which lowers the total value paid to all creators. The paradox: the better AI gets, the harder it is for any creator to claim compensation, because the model’s output looks less and less like any single source.
The correlated content externality. One creator’s licensing decision affects every similar creator’s bargaining power. If the New York Times signs a deal with an AI company, that sets the terms for every other news organization. If one photographer licenses their portfolio, that creates a substitute for every other photographer’s work. Individual creators cannot solve this — their decisions are interdependent, but they negotiate alone.
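To make the interdependence concrete, here is a minimal sketch in Python. It is a toy illustration, not the paper's model: it assumes a creator's value to an AI company is simply the number of new subject areas their portfolio adds. The photographers, the subject labels, and the `coverage_value` and `marginal_value` functions are all hypothetical.

```python
# Toy illustration of the correlated content externality.
# Assumption: a portfolio's value is the number of distinct subjects it covers.

def coverage_value(portfolios: list[set[str]]) -> int:
    """Value of a licensed collection = count of distinct subjects covered."""
    return len(set().union(*portfolios))


def marginal_value(portfolio: set[str], already_licensed: list[set[str]]) -> int:
    """Extra value one portfolio adds on top of what is already licensed."""
    return coverage_value(already_licensed + [portfolio]) - coverage_value(already_licensed)


# Two hypothetical photographers with overlapping subject matter.
photographer_a = {"street", "portrait", "landscape"}
photographer_b = {"street", "portrait", "wildlife"}

# B's bargaining position if nobody else has licensed yet.
value_b_alone = marginal_value(photographer_b, [])

# B's bargaining position after A has already signed a deal.
value_b_after_a = marginal_value(photographer_b, [photographer_a])

print(value_b_alone, value_b_after_a)
```

In this toy setup, photographer B's portfolio adds three units of new coverage if B licenses first, but only one unit once A has already licensed. Neither photographer controls the other's decision, yet each one's bargaining position depends on it.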
The Proposed Solution: A Market, Not a Law
The authors propose a data intermediary — think of it as an exchange for AI training data, managed by a neutral market maker. Creators submit their content. AI companies bid for access. The intermediary selects an optimal set of training data that maximizes the value of the model while fairly compensating the creators whose work contributed.
The key innovation is that compensation is not based on individual content sales — which would collapse under the substitution problem — but on a market mechanism that prices the marginal contribution of each creator’s work to the final model’s performance. Creators whose work shifts the model’s capabilities get paid more. Creators whose work the model already absorbed get paid less. The market discovers the price, rather than a court or a regulator.
This mechanism — which the authors call optimal data inclusion — simultaneously solves all three market failures. It eliminates the originality penalty because compensation reflects contribution rather than exclusivity. It overcomes the curse of precision because the market prices differentiation. It resolves the correlated content externality because the intermediary aggregates decisions rather than leaving them to individual creators.
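The pricing idea can be sketched in a few lines of Python. This is a simplified stand-in, not the authors' mechanism: it uses a leave-one-out calculation (how much the model's value drops when one creator's work is removed) and splits a fixed budget in proportion to that drop. The value function, creator names, and budget are assumptions made up for illustration; the paper's optimal data inclusion mechanism is defined formally and may price contributions differently.

```python
# Hypothetical sketch: pay creators in proportion to their leave-one-out
# contribution to a toy "model value". Not the paper's actual mechanism.

def model_value(creators: dict[str, set[str]]) -> int:
    """Assumed value function: number of distinct skills the model learns."""
    return len(set().union(*creators.values()))


def leave_one_out_payouts(creators: dict[str, set[str]], budget: float) -> dict[str, float]:
    """Split `budget` in proportion to each creator's marginal contribution."""
    full_value = model_value(creators)
    marginals = {}
    for name in creators:
        # Value of the model if this one creator's work were excluded.
        without = {k: v for k, v in creators.items() if k != name}
        marginals[name] = full_value - model_value(without)

    total = sum(marginals.values())
    if total == 0:
        # Nobody is individually pivotal; this simple rule pays nothing.
        return {name: 0.0 for name in creators}
    return {name: budget * m / total for name, m in marginals.items()}


# Hypothetical creators, each teaching the model a set of skills.
creators = {
    "original_studio": {"a", "b", "c", "d"},
    "derivative_blog": {"a", "b"},   # fully covered by the studio
    "niche_journal": {"e"},
}

payouts = leave_one_out_payouts(creators, budget=100.0)
for name, amount in payouts.items():
    print(f"{name}: {amount:.2f}")
```

Here the derivative blog earns nothing because everything it covers is already supplied by the studio, while the studio and the niche journal split the budget two to one. A production design would likely need a richer contribution measure, since leave-one-out assigns zero to two creators whose work is identical.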
Creators submit content to the intermediary
Publishers, studios, design firms, and research institutions contribute their content portfolios. The market maker catalogs them with provenance and contribution tracking.
AI companies bid for access
Rather than bilateral deals with individual publishers, AI companies compete through the intermediary. The market aggregates demand and discovers the true price of different content types.
Marginal contribution is priced, not exclusivity
Compensation reflects how much each creator’s work actually shifted the model’s capabilities. Highly original work that teaches the model something new earns more. Work the model could already replicate earns less.
All three market failures resolved simultaneously
Optimal data inclusion eliminates the originality penalty, overcomes the curse of precision, and resolves the correlated content externality. The paper proves this with 7 theorems and 41 propositions — a mathematical proof, not a policy argument.
What This Means in Practice
For content-producing organizations — publishers, studios, design firms, research institutions — the originality penalty means that blocking AI crawlers and asserting copyright may be strategically self-defeating. If your content is valuable enough that models learn from it, the copyright protection that gives you short-term leverage may accelerate the long-term substitution effect that reduces your value. The better strategy is to participate in market-based mechanisms that price your contribution, not exclude it.
For AI companies, the paper provides a rigorous economic argument for why voluntary data licensing deals are structurally fragile. A one-time payment to a publisher for access to their archive does not solve the correlated content externality — the next publisher’s deal changes the economics of the first. The solution is not bilateral licensing but multilateral market mechanisms that aggregate supply and demand.
For regulators and policy makers, the paper offers a clear alternative to the binary copyright framework. Instead of deciding whether training is fair use, design a market where the price of training data is discovered through competition rather than litigation.
Key Takeaways for Strategy and Policy Leaders
The copyright binary is economically indefensible
Both “free access” and “strong protection” create worse outcomes than market-based alternatives. Every executive involved in AI policy discussions should shift the conversation from “is it legal?” to “how should training data markets work?”
Know whether your content is increasing or decreasing in AI substitutability
Any AI assessment of your content strategy should start with this question. If a model can produce close substitutes for your highest-value output, the originality penalty applies to you. In that case, participate in compensation mechanisms rather than blocking access.
The Chief AI Officer and General Counsel need a joint data intermediary strategy
If market-based data compensation becomes the regulatory direction, organizations need to prepare their content for participation in these markets. That means cataloging training-relevant content, establishing provenance tracking, and understanding the marginal contribution of different content types to model performance.
Bilateral licensing deals are structurally fragile
A one-time payment to a publisher does not solve the correlated content externality. The next publisher’s deal changes the economics of the first. AI companies that rely on bilateral deals are building on unstable ground. The solution is multilateral market infrastructure.
The window for building market infrastructure is open now
The first organizations to design and deploy data intermediaries — or to adapt their content strategies to anticipate them — will set the terms. Those who wait for the copyright cases to settle will find the market designed without them. The CAIERO-CP™ AI Governance certification covers data rights strategy as a core governance competency.
Frequently Asked Questions
What does this mean for a Chief AI Officer?
A Chief AI Officer at a content-producing organization faces a strategic decision this paper makes concrete: blocking AI training access may accelerate your organization’s competitive decline rather than protect it. The CAIO’s role is to help leadership understand the originality penalty and position the organization to participate in compensation markets rather than lose influence over them.
Does this apply to companies that don’t see themselves as content companies?
Yes. Any organization that produces proprietary research, internal documentation, code, or specialized knowledge is a content producer under this framework. The originality penalty applies wherever your organization’s output is distinctive enough that AI models would benefit from training on it. An AI Assessment for companies can help identify where this exposure is highest.
How does Silicon Valley Certification Hub approach AI governance for data rights?
Silicon Valley Certification Hub’s CAIERO-CP™ curriculum covers AI governance including data rights strategy, regulatory compliance mapping, and the emerging landscape of AI training data markets. For organizations assessing their current exposure to AI substitution effects, our AI Assessment for companies includes a structured content strategy review.
Is a data intermediary feasible, or is this purely theoretical?
The paper is a theoretical contribution — it proves the mechanism works mathematically but does not implement it. The practical race is just beginning. Several regulatory bodies are evaluating market-based alternatives to copyright litigation. Organizations that prepare their data portfolios for intermediary participation before regulation arrives will have a structural advantage over those who wait for the cases to settle.
What should strategy and legal teams do this quarter?
Start with a content audit: which of your organization’s outputs are highly original and potentially subject to the originality penalty? Map your current copyright and licensing posture against this audit. Identify whether your highest-value content is differentiated enough to command bilateral deals or whether you should position for intermediary participation. This quarter is not the time to wait for the courts — it is the time to build a strategy that does not depend on them.