Dictionary Giant Merriam-Webster Sues OpenAI for Using 100,000 Entries Without Permission

Boston, Thursday, 19 March 2026.
America’s most trusted dictionary publisher claims ChatGPT reproduces its definitions word for word, cannibalizing the website traffic and revenue that fund editorial work.

Encyclopedia Britannica and its subsidiary Merriam-Webster filed the lawsuit on Friday, March 15, 2026, in Manhattan federal court, marking the latest escalation in copyright disputes between traditional publishers and artificial intelligence companies [1][2]. The complaint alleges that OpenAI copied more than 100,000 articles, encyclopedia entries, and dictionary entries from online sources to train ChatGPT without authorization [1][3]. The lawsuit claims that OpenAI violates copyright in three distinct ways: large-scale copying of protected material, using that content to train its AI systems, and generating outputs that resemble the original content [1].

Evidence of Verbatim Reproduction

The complaint cites specific examples of alleged direct copying. ChatGPT’s responses often contain “verbatim or near-verbatim reproductions” of the dictionary’s content, according to the lawsuit [1]. One example cited in the complaint shows ChatGPT reproducing a definition of “plagiarize” identical to the one in the Merriam-Webster dictionary [3]. The AI system also reportedly reproduced the specific selection and ordering of quotes from a copyrighted Britannica article about the Hamilton-Burr duel [3]. According to Britannica’s allegations, “GPT-4 itself has ‘memorized’ much of Britannica’s copyrighted content and will output near-verbatim copies of significant portions on demand” [6].

Economic Impact on Traditional Publishers

The publishers’ economic argument centers on cannibalized traffic and lost revenue. The lawsuit states that “Defendants’ ChatGPT-based AI products free ride on Plaintiffs’ trusted, high-quality content… by cannibalizing traffic to Plaintiffs’ websites with AI-generated summaries of Plaintiffs’ own content” [1][5]. Rather than directing users to the publishers’ websites as traditional search engines would, ChatGPT provides summaries that “substitute, and directly compete with, the content from publishers like [Britannica]” [2][6]. The plaintiffs argue that this practice threatens the business model that supports editorial work: “OpenAI imperils the very market for the high-quality content that it copies and reproduces” [3]. The publishers attempted to negotiate a licensing agreement with OpenAI in November 2024, but the company rebuffed their overtures [3].

Broader Implications for AI Industry

This lawsuit is part of a growing wave of copyright litigation against AI companies, at least one case of which has already ended in a large settlement. The New York Times has made similar claims in its ongoing lawsuit against OpenAI, and more than a dozen newspapers across the U.S. and Canada have also sued the company [2]. In September 2025, Anthropic settled a class action lawsuit over its use of copyrighted books to train its AI models, agreeing to a $1.5 billion payout to authors [6]. Encyclopedia Britannica has also filed a separate lawsuit against AI search company Perplexity, which remains pending [2][4]. OpenAI, currently valued at $730 billion, defended its practices through a spokesperson, who said that “Our models empower innovation, and are trained on publicly available data and grounded in fair use” [3][4]. The publishers are seeking unspecified financial damages and a permanent injunction barring OpenAI from continuing the alleged practices [1][5].

Sources

