BLOG

24 March 2025

Copyright Has No Moat? A Snapshot on the US, EU, and Beyond
[Image: a digital collage merging an ornate historical painting of a pioneer scene with glitched digital circuitry, data, and matrix-like imagery.]

Image credits: Hanna Barakat  & Archival Images of AI + AIxDESIGN / https://betterimagesofai.org / https://creativecommons.org/licenses/by/4.0/

Original article here.


In the spring of 2023, a leaked internal document from Google made waves across the tech industry with a stark declaration: “We Have No Moat, And Neither Does OpenAI.” The anonymous Google researcher argued that open-source AI models were rapidly catching up to proprietary ones, threatening the competitive advantages of tech giants. The document’s title referred to the medieval concept of a moat — the defensive barrier surrounding a castle that kept invaders at bay. Although many language models may be more accurately described as having “open weights” rather than being fully open source, in today’s technological landscape the metaphorical moats of proprietary AI seem to be draining faster than anyone anticipated.



But there’s another moat whose foundations are being undermined: copyright law. Just as open-source AI development has leaped over the moats surrounding proprietary models, generative AI has exposed the fundamental weakness of our intellectual property framework without yet triggering meaningful reform. In an era where AI systems can ingest, synthesise, and regurgitate creative works at scale, copyright’s protective walls haven’t fallen yet, but they appear increasingly porous.

Training on Pirated Libraries: Quis Custodiet Ipsos Copyright Custodes?


As part of an investigation into the Library Genesis (LibGen) dataset, Alex Reisner reports at The Atlantic that recently unsealed court documents reveal that Meta employees, with apparent approval from Mark Zuckerberg himself, downloaded millions of books from LibGen — a massive repository of pirated books and academic papers — to train their Llama models. According to internal communications, Meta employees acknowledged this carried “medium-high legal risk” but proceeded nonetheless, considering it a necessary step to remain competitive.



They weren’t alone. OpenAI has reportedly used LibGen in the past as well. The Atlantic’s reporting shows the scale of this pirated library: “LibGen is enormous, many times larger than Books3, another pirated book collection (…). It includes many millions of articles from top academic-journal publishers such as Elsevier and Sage Publications.”



Meanwhile, Anna’s Archive (the self-described “largest truly open library in human history”) revealed that numerous AI companies have approached them for high-speed access to their collection of 140 million copyrighted texts. While some US companies reconsidered the move after assessing the legal risks, Chinese firms “enthusiastically embraced” the collection. According to Anna and the team, “most of them are LLM companies, and some are data brokers, who will resell our collection. Most are Chinese, though we’ve also worked with companies from the US, Europe, Russia, South Korea, and Japan.”

The EU’s Attempt to Build a New Moat


Regulatory bodies around the world are scrambling to respond. In the EU, the third draft of the General-Purpose AI (GPAI) Code of Practice attempts to shore up copyright protections with Commitment I.2, which requires AI providers to “put in place a policy to comply with Union law on copyright and related rights”. The measures include:


1. Drawing up, keeping up to date, and implementing a copyright policy

2. Respecting the rights of copyright holders (by only reproducing and extracting lawfully accessible works)

3. Identifying and complying with copyright reservations


Yet, in light of Anna’s Archive’s revelations, these measures feel like building sandcastles against a rising tide. With regard to Meta and LibGen, Paul Keller has noted a blind spot in the Code of Practice, since “using bittorrent [what Meta allegedly did] is something different from web-crawling, and while this example might be extreme there are many other ways to obtain data online that are not web-crawling and thus fall outside of the commitments contained in the Code of Practice”.


The measures in the third version of the Code of Practice aim to balance the protection of copyright holders’ rights with the need for innovation in AI development. Yet the compromise fully satisfies few. Industry acknowledges some progress (e.g., no downstream filters, optional policy publication) but considers that “some provisions still go beyond the requirements of the AI Act”. In contrast, creators and civil society organisations keep advocating for stronger copyright safeguards. In this regard, while retaining core measures from earlier drafts, the third draft has clearly softened its language. Terms like “best efforts”, “reasonable efforts”, and “reasonable measures” dominate the commitments, which may offer leeway for providers to sidestep strict compliance. These changes fail to address longstanding issues flagged by rightsholder organisations and risk diluting fundamental rights protections.

Is Copyright’s Moat Failing?


Has generative AI disrupted copyright laws to the point of making them obsolete? The evidence suggests we could be approaching that tipping point. When the world’s most valuable AI companies are willing to risk major litigation to access training data, and when shadow libraries openly boast about providing illegal materials to AI developers, we’re clearly in uncharted territory.


The US debate around AI and copyright has largely played out in courtrooms through lawsuits against companies like OpenAI and Meta, but as Pamela Samuelson writes, “All but one of the generative AI copyright lawsuits is likely years away from being definitively resolved” (the one that has been resolved is Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc.). Europe has taken a more proactive regulatory approach, with the Commitment I.2 mentioned above requiring GPAI providers to “put in place a policy to comply with Union law on copyright and related rights,” using “state-of-the-art technologies” to identify and respect copyright reservations. The EU’s approach could potentially create a new international standard that balances innovation with creator rights. However, the fact that AI companies can access and train on massive libraries of copyrighted works via Anna’s Archive with minimal consequences means that, effectively, the horse will have bolted long before the AI Act’s rules on general-purpose AI become effective in August 2025.


Whatever solution emerges, it is clear that maintaining the status quo is untenable. When the Google memo declared “We Have No Moat”, it was unwittingly describing not just AI development but the entire landscape of intellectual property law. As Mark Lemley has put it, foundational models “will require us to fundamentally change how we think about creativity and, as a result, how we approach copyright” law.


The question, therefore, is not whether copyright will change — it’s how quickly it will adapt, and who will shape its evolution. Will it be reformed through deliberate policy choices, or will it simply collapse under the weight of widespread noncompliance and technological circumvention?

The Copyright Arms Race


We’re witnessing an IP arms race playing out in real time. OpenAI has even declared that the race is over if AI training is not considered fair use. Companies seem to face the dilemma of risking copyright infringement or falling behind competitors with fewer scruples. This dynamic creates perverse incentives that undermine the rule of law.


Along the lines of OpenAI, Anna’s Archive team suggests that this has even become a matter of national security: “All power blocs are building artificial super-scientists, super-hackers, and super-militaries. Freedom of information is becoming a matter of survival for these countries — even a matter of national security”. The team, citing a TorrentFreak report, notes that China and Japan have already introduced AI exceptions to their copyright laws, giving their domestic companies cover to train on copyrighted materials. This can create a significant competitive advantage for AI companies operating in these and other Asian jurisdictions.


Copyright’s moat hasn’t disappeared entirely, but the water level is dropping rapidly. The EU’s attempts to shore up protections represent important defensive measures, but they may not address the fundamental mismatch between copyright’s premises and AI’s capabilities. Rather than merely patching the existing system through regulation, we may need to reimagine intellectual property protections from first principles for the age of machine learning. And the task will require global cooperation, not just regional solutions.