Policy Bearish 7

Merriam-Webster and Britannica Sue OpenAI Over AI Training 'Theft'

· 3 min read · Verified by 2 sources ·
Key Takeaways

  • Merriam-Webster and Britannica have filed a lawsuit against OpenAI, alleging that ChatGPT was trained on their proprietary reference material without authorization.
  • The plaintiffs claim the AI system 'cannibalizes' their web traffic by providing direct answers that bypass the need for their traditional digital platforms.

Mentioned

Merriam-Webster (company), Britannica (company), OpenAI (company), ChatGPT (product), AI (technology)

Key Intelligence

Key Facts

  1. Merriam-Webster and Britannica filed a joint lawsuit against OpenAI on March 17, 2026.
  2. The plaintiffs allege OpenAI used their proprietary dictionary and encyclopedia content to train ChatGPT without permission.
  3. A core claim of the lawsuit is 'traffic cannibalization,' where AI answers prevent users from visiting the publishers' websites.
  4. The lawsuit seeks unspecified damages and a permanent injunction against the unauthorized use of their data.
  5. This case follows similar high-profile legal actions from The New York Times and the Authors Guild.

Who's Affected

  • OpenAI (company): Negative
  • Merriam-Webster (company): Positive
  • AI Startups (technology): Negative

[Chart: AI Legal Risk Outlook]

Analysis

The legal landscape for generative AI has shifted into a critical new phase as Merriam-Webster and Britannica, two of the world’s most venerable reference publishers, filed a joint lawsuit against OpenAI. The litigation alleges that the San Francisco-based AI giant "stole" proprietary material to train its ChatGPT models, effectively creating a competing product that directly undermines the plaintiffs' primary revenue streams. This case represents a significant escalation in the ongoing conflict between traditional content owners and the builders of large language models (LLMs), moving beyond creative prose into the realm of foundational, authoritative data.

At the heart of the complaint is the concept of "traffic cannibalization." Merriam-Webster and Britannica argue that by ingesting their entire corpora of definitions, etymologies, and encyclopedic entries, OpenAI has built a system that provides users with the same utility without the need to visit the original websites. For reference publishers who rely heavily on ad revenue driven by search engine traffic, this represents an existential threat. If a user asks ChatGPT for a complex definition or a historical summary, and the AI provides it using data derived from these sources, the original publisher loses the opportunity to monetize that inquiry. This "zero-click" reality is a growing concern for the entire digital publishing industry, which has already been reeling from changes in search engine algorithms.

This lawsuit follows similar actions brought by The New York Times and the Authors Guild, but it carries unique weight because of the nature of the content involved. Dictionaries and encyclopedias are often viewed as collections of facts, and facts themselves have traditionally been harder to protect under copyright than creative fiction. However, the specific selection, arrangement, and expression of those facts (the "editorial craft") can be legally protected. OpenAI is likely to lean on the "fair use" defense, arguing that its training process creates a transformative new tool rather than a derivative work. Yet the plaintiffs contend that ChatGPT’s output is often a near-verbatim or highly derivative reflection of their proprietary databases, which they have spent centuries refining and verifying.

What to Watch

For the venture capital community and the broader AI startup ecosystem, this litigation adds to a mounting pile of "legal debt" facing foundational model providers. If the courts rule in favor of the publishers, it could force a radical restructuring of the AI industry’s cost basis. Startups may be required to negotiate expensive licensing deals for high-quality training data, potentially creating a "moat" for well-funded incumbents like OpenAI and Google while pricing out smaller players who cannot afford such overhead. Conversely, a victory for OpenAI would solidify the "fair use" doctrine for AI training, providing a green light for continued aggressive data scraping across the open web.

Investors should watch for whether this case moves toward a settlement similar to those OpenAI has reached with other media outlets like Axel Springer or News Corp. A settlement would signal that OpenAI is willing to pay for "authoritative" data to ensure its models remain accurate and legally compliant. However, if the case goes to trial, it will serve as a bellwether for the future of intellectual property in the age of synthetic intelligence. The outcome will determine whether the "knowledge graph" of the internet remains a quasi-public good for AI training or becomes a strictly gated marketplace of licensed assets. As LLMs increasingly move toward providing factual answers rather than just creative assistance, the value of verified, high-fidelity data sources like Merriam-Webster has never been higher, nor has their business model been more vulnerable.
