Danish national encyclopedia accuses ChatGPT of “the largest theft in world history”
Danmarks Nationalleksikon, the publicly funded Danish online encyclopedia, has accused OpenAI of systematically using its content to train ChatGPT, and of citing it as a source, without permission. It calls the practice an unprecedented violation of intellectual property rights.
In an interview with Danish broadcaster DR, Ole Kaag Mølgaard, head of secretariat for the encyclopedia—known as Lex—stated that ChatGPT extracts around 8 million articles per month from its database of 250,000 entries, all written by over 4,000 researchers and subject experts. “In the old world, this would be a massive breach of copyright law warranting legal action,” he said. “We are witnessing the largest theft in world history.”
Lex is now developing its own Danish-language chatbot as a direct response, aiming to provide users with verified, researcher-backed information. The project, launched in collaboration with Aarhus University’s Center for Humanities Computing, will initially run as a three-year research initiative with user testing. “We must find new ways to ensure society has access to trustworthy knowledge,” Mølgaard said, framing the chatbot as a “safe haven of reliable information” amid what he describes as an “information crisis.”
The move follows legal challenges against AI developers elsewhere. In September 2025, Encyclopaedia Britannica and Merriam-Webster sued Perplexity AI for allegedly scraping and reproducing their content without consent. Earlier this year, the Danish media coalition DPCMO also filed a lawsuit against OpenAI over unauthorized use of copyrighted news material.
Lex has formed a partnership with eight major Danish institutions—including the National Museum, Royal Library, and public broadcaster DR—to counter what they describe as systemic misuse of their data by AI models. “This crisis has united institutions that rarely collaborate,” Mølgaard noted. “It may be one positive outcome of the AI challenge we face.”
OpenAI has previously defended its practices, stating in response to the Britannica lawsuit that its language models rely on “publicly available data” under fair use principles, contributing to “human creativity, scientific discovery, and medical research.”
Lex, funded by government grants and private foundations, comprises 14 reference works, including Den Store Danske and Trap Danmark, which together contain more than 245,000 articles. Nearly 60% of Danes use the platform annually, according to Mølgaard.