technology that can benefit human society and
promote scientific and technological progress, and
therefore should be included in the fair use system
in pursuit of the public interest. He also
believes that the use of works in the training of AI
will not affect the normal use of the original works,
and that, as long as it is complemented by other
mechanisms of copyright protection, it can also take
account of both the original author's interests and the
interests of the legislation (Xu, 2024).
The fair use system was initially dominated by the
four-factor determination rule. The 19th-century case
Folsom v. Marsh is considered to be the first case on
fair use in the U.S. In this case, Judge Joseph Story
first proposed a four-factor analytical framework that
would become the foundational standard for
assessing fair use (Justia, 1841). This analytical
framework focuses on four key considerations: the
purpose and character of the use; the nature of the
original copyrighted work; the amount and
substantiality of the portion used; and the effect of the
use on the potential market or value of the work.
Then, in the 20th-century Campbell case, the judges
introduced the "transformative use" criterion,
elevating the fair use doctrine from its traditional
four-factor assessment framework into a new phase
of development (Justia, 1994). Wang Qian, a scholar
in China, holds that "transformative use" means
using a work in a way that adds new content and
ideas to it, so that the original work takes on
new value. It is not a simple copy of the
content of the original work, but a way of
converting the work's function or
purpose (Wang, 2021). Also, Chinese scholar Wu
Handong points out that within the criteria for
determining fair use, the rule of "transformative use"
emphasizes whether the new work’s utilization of the
original material demonstrates transformative
purpose and character, rather than rigidly adhering to
restrictions on the nature or quantity of the original
work’s usage (Wu, 2020). The perspectives of the
aforementioned two scholars focus more on the
transformation of subjective intention when using the
original work, and on creating new value based on the
factual content of the original work, which aligns
closely with the essence of artificial intelligence
corpus training. Artificial intelligence corpus training
does not use the original work directly; rather, it uses
data analysis to summarize the patterns behind the
original work, so as to form a generation logic
and generate new content for the user. This is
why such training can qualify as
"transformative use".
Furthermore, Chinese scholar Yi Jiming has
proposed a special version of the "transformative use"
criterion. Within this framework, he emphasizes that
such usage not only entails a subjective shift in
purpose but also involves objective technological
innovation, thereby achieving a complete
transformation and value-added enhancement of the
original work's significance (Yi, 2024). This
evaluative approach can help trainers and users pay
more attention to whether the utilization of a work
contributes to the advancement of science and
technology as well as the realization of societal public
interests, thereby fostering the creation and value
enhancement of new technologies. On this view, the
training of artificial intelligence, a new technology
that can promote scientific and technological progress,
is clearly justified.
Moreover, the author contends that the traditional
"transformative use" criterion and its new version need
not form successive phases but should coexist to
accommodate the widespread implementation of
artificial intelligence. For instance, the
traditional criterion could govern the training of most
generative AI systems that produce images, texts, and
similar outputs, while the new standard might apply
to the development of AI models centered on
technological breakthroughs.
Of course, there are different views in the
academic community about whether AI corpus
training can constitute fair use. Chinese scholar Wang
Xuelei acknowledges that the utilization of works
during artificial intelligence training constitutes an
exploration and analysis of the patterns underlying
the works. However, she argues that the object of
such use is the data extracted from original works
rather than the works themselves, thereby eliminating
the necessity for copyright regulation in this context
(Wang, 2025). This study, however, contends that the
data utilized for analysis originates from copyrighted
works rather than existing in isolation. The works
themselves therefore should not be categorically excluded
from consideration in such analysis. Furthermore, from the
perspective of balancing interests, reducing works to
mere data vessels through disaggregation undermines
the rights and benefits of human authors. Meanwhile,
it must be emphasized that the fair use system
is inherently a product of balancing
competing interests. A balance should be kept
between the interests of individual copyright holders and
the public interest of society as a whole. Some scholars,
motivated by concerns for copyright holders'
interests, argue that categorizing works created by
human intellectual labor as training materials for
artificial intelligence under fair use provisions would