The Copyright Problem of AI, According to ChatGPT
As the generative AI arms race grows, many have opined on what a future AI-based world will look like, in terms of risks, opportunities, and other practical effects. Will AI make the world less human? Will it harm or improve our relationships? Should technology be able to make decisions about life and death? Or, you know, could AI destroy the world?
Less attention is paid (at least outside legal publications) to the ill-defined legal foundation on which this arms race rests. Indeed, the question of whether the use of copyrighted works to train generative AI infringes the copyrights in those works remains very much open.
As of the writing of this article, at least 40 federal lawsuits involving copyright claims related to AI have been filed since June 2019, and the number is growing quickly. Most have been brought by authors, publishers, artists, or other creators alleging that the use of their content in connection with AI infringes their copyrights. At least two cases have instead focused on copyright law's human authorship requirement, addressing whether a work of art (e.g., a painting) created by AI is protected by copyright at all, given that its authorship was not primarily human. This article focuses on the issue of possible infringement.
In terms of answers, the normally sleepy U.S. Copyright Office has been buzzing with activity, hosting a series of listening sessions in the Spring of 2023 on the use of artificial intelligence to generate works in creative fields and launching a series of reports on legal and policy issues related to artificial intelligence and copyright, the first report focused on digital replicas[1] and the second on the copyrightability of works generated by AI.[2] But these efforts have concentrated on how existing legal frameworks may be used to address these complicated questions; they have not provided a clear answer to whether the unauthorized use of copyrighted material to train AI constitutes infringement.
As for case law, some courts have clarified that, to sustain a claim that an AI-generated output itself constitutes copyright infringement, the output must be an exact copy of, or substantially similar to, the purportedly infringed work (e.g., Kadrey v. Meta Platforms and Andersen v. Stability AI, both from the Northern District of California). But there is still very little clarity on whether the use of copyrighted material as an input to train AI constitutes infringement.
In one recent case in the United States District Court for the District of Delaware, Thomson Reuters Enterprise Centre GmbH & West Publishing Corp. v. ROSS Intelligence Inc., a federal judge ruled as a matter of law that the use by ROSS Intelligence (an AI developer) of copyrighted material to train its AI-based legal research tool was not fair use and therefore constituted copyright infringement.
Admittedly, the fair use analysis at the heart of this case (and that will be at the heart of most, if not all, copyright infringement cases involving AI) is highly fact-dependent. That analysis is based on Section 107 of the U.S. Copyright Act, which considers the following four factors:
The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
The nature of the copyrighted work;
The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
The effect of the use upon the potential market for or value of the copyrighted work.
17 U.S. Code § 107
Since each case will be analyzed differently under fair use, it is not possible to categorically answer the question of whether training AI on copyrighted material is copyright infringement. But just as the Thomson Reuters case above does not likely mean that AI tools will always constitute copyright infringement, so too the holding in Authors Guild, Inc. v. Google, Inc., 804 F.3d 202 (2d Cir. 2015), which is sometimes cited to support the position that AI tools are defensible fair use, does not dictate that they always will be.[3] It will depend on the facts.[4] And in any event, it is critical to note that fair use is a defense to copyright infringement, meaning the burden is on technology companies to prove their use is fair in order to overcome claims of infringement.
All that to say, as AI tools and their adoption continue to grow, there is still little clarity on these important legal questions. So, one might well ask what AI itself thinks. I did. I asked ChatGPT this question multiple times over the past couple of years, and its answers have been illuminating.
In the Spring of 2023, as ChatGPT was first gaining prominence among the masses, my immediate instinct as an intellectual property lawyer was that AI technology might infringe IP rights. That is when I first approached ChatGPT about this question. Indeed, it was my very first topic of conversation with any AI model.
At that point, I did not understand the technology and had given very little consideration to the legal complexities, so I instinctively assumed that the use of copyrighted material to train ChatGPT would be encompassed by the exclusive rights granted to a copyright holder as set forth in 17 U.S. Code § 106. (Some argue that it is not.[5]) Thus, my question focused on who owned the source content, not on whether its use was infringement. The answer was insightful nonetheless.
ChatGPT dodged my primary question by stating that its source content came from “a variety of sources,” including publicly available sources and licensed content. More interestingly, ChatGPT plainly stated that the “content generated by ChatGPT belongs to the user who inputs the prompt and generates the output.”
April 23, 2023:
Really? That user need not concern himself or herself at all with what the owner of the source content might think about that? Or, better yet, ChatGPT has the rights necessary to grant that user those rights? And while ChatGPT mentioned licenses from “various providers,” it is interesting to note that it said nothing about the legal basis for using the content that was not licensed from copyright holders. In any event, when I asked the same question six months later, on September 26, 2023, I got a different response.
ChatGPT again noted that its source content came from publicly available information and, as in the first response, did not limit its answer to only publicly available sources. But this time it omitted any mention that content had been licensed, perhaps recognizing that such a fact amounts to a concession that a license may be required in some cases, thereby undermining the argument made by some that, as a rule, “[t]raining a machine learning model with [ ] copyrighted data does not infringe” the exclusive rights of the copyright holder.[6]
Moreover, regarding ownership of the outputs, much like the season, ChatGPT’s answer had changed. In Spring 2023, I was told I owned the output as the user, but in Fall 2023, ChatGPT clarified that it owned the model and punted on the output ownership question, instead pointing back to the fact that “responses are generated based on patterns and information . . . , which comes from various sources and [do] not have a single identifiable owner.”
September 26, 2023:
But this response also lasted for only a season. In March 2024, when I asked the very same question for the third time, ChatGPT provided a third substantively distinct response, this time claiming that OpenAI, the organization behind the development of the GPT models, owned the content generated by ChatGPT.
Regarding the real question I was asking—about source content—ChatGPT was totally silent this time but at least remembered to inform me that I was licensing to OpenAI any content I input into ChatGPT as outlined in its privacy policy and terms of use. Because, you know, they can do whatever they want with it.
March 12, 2024:
Finally, in February 2025, ChatGPT provided yet another answer, one that seemed to align most closely with the current views of legal prognosticators and to include an appropriate amount of legal hedging.
That is, ChatGPT uses some licensed data and some publicly available texts for training, but it “does not directly copy or store proprietary or copyrighted materials unless they have been licensed,” recognizing that such storage or copying could have legal implications. But really? The premise of this statement seems to be that OpenAI can freely use copyrighted material for the commercial purpose of training its AI model irrespective of a license and need only obtain a license if it takes the additional step to “copy or store” the materials. Talk about missing the forest for the trees.
And, of course, ChatGPT warns you to watch out. Anything that ChatGPT produces is owned by you and used at your peril, because it is your responsibility to ensure it “doesn’t infringe on third-party rights.”
February 21, 2025:
So again, the legal foundation surrounding AI is still quite unresolved and, as evidenced by AI itself, evolving, if not altogether murky.
Yet, in a rapacious quest to be first and conquer the competition, technology companies and those adopting technology tools may be blowing through these legally significant questions about ownership and rights.
More caution is warranted, both for technology companies creating AI tools and companies and individuals using them.
If you have any questions about this article or need counsel on the issues addressed in it, please contact Ryan June by emailing ryan@ch-llp.com.
[1] See Part I, available at https://copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-1-Digital-Replicas-Report.pdf.
[2] See Part II, available at https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf.
[3] The court’s holding in the Google Books case was based heavily on its finding that Google’s snippet-only use of the books it digitized did not serve as a substantial substitute for those works and thus did not seriously diminish the potential market for or value of the copyrighted works. Certain AI applications are arguably substantial substitutes for entire industries generating creative works. Whether that constitutes a substantial substitute for each individual work used to create such AI models has yet to be adjudicated, but it certainly would not be unreasonable to so find.
[4] Recent court precedent on fair use has increasingly focused on the commercial impact of the purported fair use of the underlying works. This is true of AI cases, such as Thomson Reuters Enterprise Centre GmbH & West Publishing Corp. v. ROSS Intelligence Inc., where the court called the fourth factor [the effect of the use upon the potential market for or value of the copyrighted work] the “single most important element of fair use.” 2025 U.S. Dist. LEXIS 24296, at *27 (D. Del. Feb. 11, 2025) (citation omitted). It is also true of landmark copyright cases unrelated to AI, such as the recent case finding that Andy Warhol’s use of a Prince portrait based on Lynn Goldsmith’s photograph was not fair use because it did not sufficiently transform the original photograph and competed in the same market. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023). In Warhol, the Supreme Court borderline conflated the first and fourth factors, finding that both disfavored fair use because the infringing work served a purpose similar to that of the original, which was a commercial and competitive purpose. This trajectory does not bode well for fair use in the AI context, particularly where AI displaces the market for the content used to train it.
[5] For example, Jenny Quang argues in the Berkeley Technology Law Journal as follows: “Training a machine learning model with [ ] copyrighted data does not infringe because the data are not redistributed or recommunicated to the public. Copyright protects creative expression, but model training extracts unprotectable ideas and patterns from data. Thus, data mining uses of copyrighted works need not even be subject to a fair use analysis.” Jenny Quang, Does Training AI Violate Copyright Law?, 36 Berkeley Tech. L.J. 1407, 1409 (2021).
[6] See id.
© 2025 Ryan June. All rights reserved.