explore the legal boundaries of TDM under China's current legal framework and the rationality of bringing it within the fair use system, in order to propose improvements to China's TDM fair use system.
2 INFRINGEMENT RISKS ASSOCIATED WITH TDM
TDM is a composite activity involving multiple processes, and it can be divided into three stages: data collection, data processing, and data aggregation and output (Fan, 2024).
2.1 Infringement Risks in the Data Collection Stage
The data collection stage of TDM carries a high risk of infringing the right of reproduction. At this stage, large volumes of text data are usually captured automatically by web crawlers and other technical means. Although authorized or unprotected content may be collected lawfully, the data actually collected is usually a mixture, because the algorithm does not distinguish between protected and unprotected material, and obtaining a license for each work individually is impractical. As a result, collection can easily infringe the right holder's right of reproduction (Fan, 2024). In particular, storing source text data over the long term so that it can be called repeatedly is even more clearly regarded as a violation of the right of reproduction. In addition, the collection process often requires circumventing technological protection measures of the "access control" and "use control" type, for example by bypassing access restrictions or traffic monitoring, which also constitutes a violation of the right of reproduction. Even short-term, indirect temporary copies are increasingly brought within the protection of the right of reproduction, because they may cause the loss of a work's data and potential economic harm (Ma & Zhao, 2021). Therefore, in the data collection stage, TDM faces a substantial legal risk of infringing the right of reproduction.
2.2 Infringement Risks in the Data Processing Stage
In the data processing stage of TDM, the raw data is transformed, through data cleaning, data labeling and data collation, into a structured form that algorithms can recognize and that serves the subsequent analysis. However, this processing may involve the adaptation, translation, modification and reproduction of protected works, and may therefore infringe copyright. On the one hand, data cleaning often removes non-target information such as advertisements, comments and code; because this deletes, translates and stores parts of the original work, it can easily infringe the rights of reproduction, translation and adaptation, as well as the right to protect the integrity of the work. On the other hand, data labeling may infringe the right to create derivative works, because adding labels or annotations changes the original form of expression (Fan, 2024). In addition, data collation generates structured data through "transcoding" and similar means, which closely resembles the translation and adaptation of works in both its external form and its internal mechanism; it may therefore infringe the rights of adaptation and translation (Ma & Zhao, 2021). In general, the automated and in-depth processing that characterizes the TDM data processing stage readily creates a risk of infringing derivative rights without authorization.
2.3 Infringement Risks in the Data Aggregation and Output Stage
In TDM, the data aggregation and output stage mainly involves collating the analysis results and releasing them externally, and it carries several copyright infringement risks. First, data aggregation usually does not constitute infringement if it involves only quantitative analysis of the relationships among the original data and independent expression of those relationships; but if the content of the original works is itself selected and arranged, it may infringe the copyright owner's right of compilation. Second, in the output stage, if results containing the content of the original works, or adaptations of that content, are disseminated to the public through online platforms or other means, this may infringe the right of communication through information networks or the right of broadcasting (Fan, 2024). In particular, if copyright-protected expression is embedded in the analysis results, publishing those results online can easily breach the provisions of the “Copyright Law” and the “Regulations on the Protection of the Right of Communication through Information Network” that protect the right of dissemination (Chinese Government Website, 2021; Chinese Government Website, 2013). In summary, in the
stage of TDM data collection, whether it is content