
Courts Split Over Generative AI Copyright Claims

When copyright law collides with machine learning, the lawyer’s traditional role as guardian of intellectual property rights transforms into something more complex. Counsel must now navigate a legal landscape where the reproduction right confronts computational necessity, where authorship blurs between human and algorithm, and where a billion-dollar settlement signals that fair use doctrine alone cannot resolve the fundamental tension between content creators and AI developers.

From Google Books to ChatGPT

The current wave of AI copyright litigation builds on precedent established a decade ago. In Authors Guild v. Google, the Second Circuit held in 2015 that Google’s mass digitization of copyrighted books for its search index constituted fair use. The court found the purpose “highly transformative” because Google used the works to create a research tool rather than to substitute for the originals. That holding has become both shield and sword in current AI disputes.

AI companies invoke Google Books to argue that training models on copyrighted material is similarly transformative. The process extracts patterns and statistical relationships rather than reproducing expressive content, they contend. Authors and publishers counter that generative AI differs fundamentally from search indexing because the models can produce outputs that compete directly with the original works in the marketplace.

For law firms advising either side, the distinction matters. Discovery in AI cases now extends beyond proving copying to demonstrating the nature and purpose of that copying. Firms representing developers must document that training serves a transformative purpose and produces non-substitutive outputs. Firms representing rightsholders must show that AI-generated content displaces demand for the originals or creates derivative works without authorization.

The challenge for counsel is that training data questions are both legal and technical. What constitutes intermediate copying when a model stores weights and parameters rather than text? How should courts measure market harm when the substitution occurs at the level of information rather than expression? These questions do not yield to traditional IP analysis. Lawyers must work with technical experts to explain model architecture, training methodology, and output generation in terms that satisfy judicial scrutiny.
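
To make the intermediate-copying puzzle concrete, the toy sketch below (purely illustrative, not any party’s actual system) shows why the “copy” is hard to locate in a trained model: what persists after training is a table of numeric statistics about the text, not the text itself.

```python
# Illustrative sketch only: "training" here reduces text to co-occurrence
# counts, a crude stand-in for the floating-point weights a real model fits.
from collections import Counter

corpus = [
    "the court found the use transformative",
    "the court rejected the fair use defense",
]

weights = Counter()  # stand-in for a model's numeric parameters
for sentence in corpus:
    tokens = sentence.split()
    for pair in zip(tokens, tokens[1:]):
        weights[pair] += 1  # count adjacent word pairs

# What the "model" stores is statistics, not sentences:
print(weights[("the", "court")])   # -> 2
print(("fair", "use") in weights)  # -> True, but as a frequency, not prose
```

Real models are vastly larger, but the evidentiary question is the same: the training text is consumed during fitting rather than stored verbatim, even though traces of it may sometimes be recoverable from outputs.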

Discovery requests now reach deep into model development, seeking data sources, version histories, training logs, and filtering protocols. Firms representing AI developers must help preserve records that can prove licensed use, public-domain status, or compliance with opt-out mechanisms. Without such evidence, arguments grounded in fair use or technological necessity weaken quickly under cross-examination. The procedural demands of AI litigation require technology clients to maintain documentation practices that most companies do not yet have in place.
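
As a hypothetical sketch of what such documentation might look like, the record below captures the fields discovery requests typically target: source, licensing basis, acquisition date, filtering applied, and dataset version. The field names are illustrative assumptions, not drawn from any actual case or vendor system.

```python
# Hypothetical provenance record for one training work; field names are
# illustrative, not taken from any real system or litigation.
from dataclasses import dataclass, asdict
import json

@dataclass
class TrainingDataRecord:
    source_url: str       # where the work was obtained
    license_basis: str    # e.g., "licensed", "public_domain", "fair_use_claim"
    acquired_on: str      # ISO date of acquisition
    filter_applied: str   # e.g., opt-out lists honored at ingestion
    dataset_version: str  # ties the work to a specific training run

record = TrainingDataRecord(
    source_url="https://example.com/archive/novel-1923.txt",
    license_basis="public_domain",
    acquired_on="2025-01-15",
    filter_applied="robots.txt and publisher opt-out list",
    dataset_version="v3.2",
)

print(json.dumps(asdict(record), indent=2))
```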

A First Judicial Rejection

While AI companies have invoked Google Books as their shield, the first substantial judicial rejection of that defense came in February 2025. In Thomson Reuters v. Ross Intelligence, Judge Stephanos Bibas of the Third Circuit, sitting by designation in the District of Delaware, ruled that an AI legal research startup’s use of Westlaw headnotes to train its competing product infringed copyright and did not constitute fair use.

The case is particularly significant for legal practitioners because it involves the tools of their trade. Thomson Reuters owns Westlaw, the dominant legal research platform, which includes proprietary headnotes that summarize key points of law from judicial opinions. These headnotes are organized by the Key Number System, a taxonomy that allows lawyers to find related cases quickly. Ross Intelligence sought to build an AI-powered legal search engine that would let users ask questions in natural language and receive relevant case citations.

Ross initially asked to license Westlaw’s content, but Thomson Reuters refused because Ross was a direct competitor. Ross then commissioned a third party, LegalEase Solutions, to create training materials called Bulk Memos. These memos closely tracked Westlaw’s headnotes in structure and content. Ross used approximately 25,000 Bulk Memos to train its AI system.

Judge Bibas found that Ross had infringed more than 2,200 Westlaw headnotes and rejected the fair use defense on two key grounds. First, he found the use non-transformative because Ross created a competing legal research tool rather than something with a fundamentally different purpose. Second, he found market harm because Ross’s product would compete directly with Westlaw and undermine Thomson Reuters’s ability to license its headnotes for AI training.

The decision creates a stark contrast with Google Books. Where the Second Circuit found Google’s search index transformative because it served research rather than reading purposes, Judge Bibas found Ross’s use non-transformative because it served the same legal research purpose as Westlaw itself. The distinction matters for counsel: AI training may be fair use when it creates a different type of tool, but not when it replicates the original work’s core function in a competing product.

For law firms advising AI developers, Thomson Reuters stands as a warning that fair use analysis turns heavily on competitive positioning. Training data sourced from a direct competitor’s proprietary content presents far greater risk than training on publicly available material. The case also demonstrates that courts will examine whether developers sought licenses before resorting to unlicensed copying. Ross’s rejected license request became evidence of bad faith.

Ross Intelligence has appealed to the Third Circuit, which accepted the case for interlocutory review in June 2025. The appeal argues that Judge Bibas misapplied both the originality standard and the fair use factors. The Third Circuit’s decision will be the first appellate ruling on fair use in AI training and will carry substantial weight across jurisdictions.

Current Litigation: A Fractured Landscape

The New York Times lawsuit against OpenAI and Microsoft, allowed to proceed in March 2025, has become a bellwether for the publishing industry. Judge Sidney Stein of the Southern District of New York denied most of OpenAI’s motions to dismiss, permitting claims for direct copyright infringement, contributory infringement, and trademark dilution to advance. The case tests whether training on copyrighted material constitutes fair use and whether AI-generated outputs create market substitution.

Courts are reaching divergent conclusions on the fair use question. In June 2025, Judge Vince Chhabria of the Northern District of California ruled in favor of Meta in Kadrey v. Meta, a case brought by authors including Sarah Silverman and Ta-Nehisi Coates challenging Meta’s use of their books to train its LLaMA model. Judge Chhabria focused on market harm, finding that the plaintiffs failed to prove Meta’s training affected demand for their original works. Notably, he emphasized that the ruling bound only the thirteen plaintiffs before him and did not establish that Meta’s conduct was lawful, effectively inviting further litigation.

This creates a procedural puzzle for practitioners. The Meta decision turned on the plaintiffs’ failure to present sufficient evidence of market harm rather than a determination that AI training categorically qualifies as fair use. Counsel representing rightsholders must now prepare more robust economic analyses showing how AI outputs displace sales, subscriptions, or licensing revenue. Expert testimony on market substitution has become essential.

Visual artists have mounted parallel challenges. In Andersen v. Stability AI, filed in January 2023, artists alleged that Stability AI, Midjourney, and DeviantArt scraped billions of copyrighted images to train AI image generators including Stable Diffusion. The case remains active and presents questions about whether visual works receive different fair use treatment than textual works. Courts have traditionally afforded stronger protection to highly creative visual art than to factual compilations.

In the visual arts sector, Getty Images pursued claims against Stability AI in the United Kingdom. The case concluded in late 2025 with Getty dropping its main copyright infringement claims due to evidential challenges. Getty won a narrow trademark claim but lost on secondary copyright infringement. The outcome highlights how difficult it is for rightsholders to prove that training activities took place within a court’s jurisdiction, and how much turns on the records AI developers keep.

The entertainment industry has mounted its own challenge. Warner Bros. sued Midjourney in September 2025, joining earlier actions by Disney and Universal against the same AI image generator. The studios allege that Midjourney enables subscribers to create infringing images and videos of copyrighted characters including Superman, Batman, and Bugs Bunny. These cases center on output infringement rather than training data acquisition, presenting a different theory of liability that courts have yet to resolve.

The largest settlement to date came in September 2025, when Anthropic agreed to pay authors $1.5 billion to resolve claims that it downloaded millions of pirated books from shadow libraries to train its Claude chatbot. The settlement amounts to approximately $3,000 per work, implying roughly 500,000 covered works, and includes destruction of the disputed datasets. Significantly, Judge William Alsup had ruled in June 2025 that Anthropic’s training on legitimately obtained books constituted fair use, but that downloading from pirate sites potentially created liability. The settlement resolved only the piracy claims, leaving the fair use holding intact as persuasive authority.

For counsel, the divergent outcomes suggest that fair use may depend less on the act of training itself and more on how the copyrighted works were obtained, whether the AI developer sought licenses, whether the output competes with the original, and whether plaintiffs can demonstrate concrete market harm. These fact-intensive inquiries make early resolution unlikely and increase the importance of discovery strategy.

Advising Clients on Authorship and Ownership

Once a model produces new content, the question shifts from input to authorship. The U.S. Copyright Office has stated that works generated entirely by AI are not registrable and that applicants must disclose any nonhuman contribution. This guidance adds a layer of complexity for counsel drafting agreements in publishing, advertising, or software development. Clients want assurance that what they produce can be owned, registered, and defended.

Contracts should now include statements about human authorship, disclosure of AI assistance, and assignment of ownership for edited outputs. For law firms, this mirrors traditional intellectual property diligence but extends into technology oversight. The safest practice is to ensure that human creative control remains identifiable and documented throughout the process.

Licensing as Risk Management

Not all AI companies have chosen litigation as their path. OpenAI has signed licensing agreements with multiple publishers including the Associated Press, Axel Springer, the Financial Times, and the Washington Post. These arrangements permit model training on archived content and real-time display of summaries with attribution in ChatGPT responses. Publishers receive compensation and technology access in return.

For counsel advising AI companies, licensing offers a proactive alternative to litigation risk. For firms representing content owners, licensing agreements present an opportunity to negotiate terms that protect client interests while participating in the AI economy. Deal structures vary widely, from one-time archival access fees to ongoing revenue shares tied to usage metrics.

Law firms should guide clients through key licensing terms including scope of use, attribution requirements, indemnification provisions, audit rights, and termination clauses. The licensing landscape remains fluid, and early agreements may include renegotiation provisions if competitors secure more favorable terms.

Professional Duties in the AI Era

The duty of competence under Model Rule 1.1 and the obligation to protect confidentiality under Model Rule 1.6 now apply to AI use. Lawyers who rely on public generative platforms must understand how client information is handled. Uploading privileged material into a model that retains user inputs can create ethical exposure and data-security risk. The American Bar Association’s Task Force on Law and Artificial Intelligence has urged lawyers to maintain technological literacy and to verify how third-party systems process information.

Several state bar associations have issued or are drafting opinions on the ethical boundaries of generative tools. The shared message is that legal judgment cannot be delegated to algorithms and that due diligence must extend to the technology a lawyer uses. As AI becomes embedded in research and drafting, ethical compliance will depend on transparency and documented human oversight.

Due Diligence and Secondary Liability

Recent litigation involving generative platforms has revived questions about secondary liability. Courts continue to apply reasoning from Sony v. Universal and MGM Studios v. Grokster, which distinguish between tools capable of substantial noninfringing use and those designed to promote infringement. Plaintiffs in ongoing cases are pressing courts to extend those standards to models trained on copyrighted works.

For lawyers counseling clients in technology, media, or entertainment, vendor contracts deserve closer review. Indemnity clauses should specify who is responsible for infringing outputs and under what circumstances. Firms should also assist clients in establishing audit trails that show data provenance and licensing status. Documented oversight may become the most persuasive defense when liability questions arise.
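
One way such an audit trail can be made persuasive is to make it tamper-evident. The sketch below, a simple hash chain over assumed record formats, is one minimal approach; production systems would add digital signatures and trusted timestamps.

```python
# Minimal tamper-evident audit trail: each entry's hash incorporates the
# previous entry's hash, so retroactive edits break the chain. Illustrative only.
import hashlib
import json

def append_entry(log, entry):
    """Append an entry chained to the previous one."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"entry": entry, "hash": digest})

log = []
append_entry(log, {"work": "novel-1923.txt", "license": "public_domain"})
append_entry(log, {"work": "photo-0042.jpg", "license": "licensed",
                   "licensor": "ExampleCo"})

# Altering an earlier entry invalidates every later hash, which is what
# gives the trail evidentiary weight if provenance is ever challenged.
print(log[-1]["hash"])
```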

The Push for Transparency

Transparency requirements are advancing faster in Europe than in the United States. The European Union AI Act obliges developers of general-purpose models to publish summaries of their training data. The goal is to help rightsholders identify whether protected works were used. Although the Act does not require disclosure of entire datasets, it sets a standard for traceability that will influence other jurisdictions.

The World Intellectual Property Organization has begun a series of discussions among its member states on copyright and artificial intelligence. These talks focus on how existing international treaties might apply to machine-generated works and to the use of copyrighted materials in model training. Delegates are also examining whether a new global framework is needed to harmonize national approaches. For cross-border counsel, following WIPO’s work offers early insight into how future rules on authorship, data use, and transparency may develop worldwide.

Several U.S. states are considering parallel bills on AI transparency and disclosure. California introduced AB 412 in February 2025, which would require developers to disclose copyrighted materials used in training and provide a mechanism for copyright holders to verify whether their works appear in datasets. Even without federal adoption, these measures signal a policy direction that lawyers should monitor for clients with cross-border operations.

Practical Guidance for Counsel

Lawyers should advise clients to keep detailed records of training data, licensing agreements, and filtering methods. Vendors offering AI tools should verify the legality of their datasets and provide contractual indemnities. Firms should encourage written disclosure of AI assistance in creative or analytical work and require human review before publication. These measures reduce legal exposure and demonstrate good faith compliance.

Generative AI has made provenance as important as originality. For lawyers, this means that documentation is no longer optional. As courts define the boundaries of fair use in machine learning, professional competence will depend on evidence that clients know how their models learn, what they output, and where the data came from.

This article was prepared for educational and informational purposes only. It does not constitute legal advice and should not be relied upon as such. All cases, statutes, and sources cited are publicly available through official publications and reputable outlets. Readers should consult professional counsel for specific legal or compliance questions related to AI use.

See also: Beyond Human Hands: The Uncertain Copyright of AI-Drafted Law
