Scholarly citation practices are not immune to
author bias, which takes various forms. Excessive
self-citation, for example, occurs when authors cite
their own work disproportionately, potentially to
bolster their citation count or perceived impact.
Coercive citation involves reviewers or editors
pressuring authors to cite specific references,
sometimes including their own work or work from
particular journals.
Citation networks can exhibit phenomena like
citation rings, where groups of researchers
reciprocally inflate each other's citation counts, and
ghost citations, fabricated references to strengthen
arguments or create the illusion of broader support.
Data-related issues, such as inaccurate reference
formatting or ambiguity in author names, further
complicate citation tracking. External influences, such
as a bias towards citing studies from prestigious
journals or studies funded by particular entities, also
skew citation practices. Moreover, erratic citation
patterns may
emerge as scholars establish foundational works and
methodologies in emerging research fields.
Citation analysis is widely acknowledged to focus on
identifying various anomalies. This attention stems
from the concern, raised in the introduction, that
such anomalies may originate from inaccurate
references in a specific context. The
effectiveness of anomaly detection hinges on
selecting the most appropriate algorithm for the
specific data type and desired data-centric outcome.
Most studies in this field (see, e.g., Liu, 2022;
Liu, 2024) concentrate on recognizing anomalous
citations by examining a citation graph in its entirety,
thereby losing granularity. An anomaly paper
in a citation network is one whose citation patterns
deviate significantly from the norm for its field and
topic. These deviations can indicate various issues,
potentially impacting the integrity of the academic
record. Such an anomaly is associated with the paper's
position within the citation network. Unexpected
co-citation patterns can signal anomalies, such as a
highly cited paper that is co-cited only with
irrelevant works.
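As a minimal illustration of the co-citation signal described above, the following sketch counts how often pairs of papers appear together in the same reference list and then lists the co-citation partners of a given paper. The function names and the toy reference lists are hypothetical and serve only as an example; deciding whether the partners are topically irrelevant would require additional information (e.g., field or topic labels) that is not modeled here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each unordered pair of papers is cited together.

    reference_lists maps a citing paper to the papers it cites.
    """
    counts = Counter()
    for cited in reference_lists.values():
        # Every pair inside one reference list is one co-citation event.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


def cocited_partners(paper, counts):
    """Return {partner: count} for every paper co-cited with `paper`."""
    partners = {}
    for (a, b), n in counts.items():
        if a == paper:
            partners[b] = n
        elif b == paper:
            partners[a] = n
    return partners


# Toy data (hypothetical): citing paper -> papers it cites.
toy_refs = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["a", "x"],
}
toy_counts = cocitation_counts(toy_refs)
toy_partners = cocited_partners("a", toy_counts)
```

In this toy example, paper "a" is co-cited twice with "b" and once each with "c" and "x"; a paper whose partners all lie outside its own topic would be a candidate for closer inspection.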
The community structure is also important: one
should consider whether the paper belongs to a
cluster of highly interconnected papers that exhibit
unusual citation behavior. Anomalies can thus exhibit
a spectrum of deviations from the norm, with their
departure from typical patterns varying in severity
across instances, which creates a nested anomaly
structure.
Unlike traditional methods, this paper presents a
multi-level analysis for more comprehensive
detection, examining articles at various granularities
to uncover overlooked irregularities. The findings
of this research can be applied to various fields,
including citation analysis, software engineering, and
scholarly communication, to detect irregularities,
uncover emerging trends, and enhance the accuracy
of citation-based metrics and analyses, thereby
improving the quality and trustworthiness of
academic research evaluation.
To recognize anomaly papers at nested citation
levels, we build on the method proposed in
(Tang, 2022), which harnesses spectral information
within Graph Neural Networks (GNNs) to detect
anomalies and introduces a network architecture
called the Beta Wavelet Graph Neural Network
(BWGNN).
The proposed method examines each article's
location in the network in detail, starting from
broad structural attributes and narrowing down to
finer connection elements, to identify anomalies in
nested citation patterns.
To prepare the initial anomaly sets in the data
clusters, the citation graph is embedded using the
Node2Vec method (Grover, 2016), and the embedded
points are clustered in the resulting linear space.
Subsequently, the outer shell of each cluster (the
points most distant from the cluster center) is
identified as the initial set of anomalies.
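The outer-shell selection can be sketched as follows. This is a simplified illustration that assumes the node embeddings and cluster labels have already been computed (the Node2Vec embedding and the clustering step are not reproduced). The function name `outer_shell`, the `fraction` parameter, and the use of the cluster centroid with Euclidean distance are assumptions made for the example, not details given in the text.

```python
import math
from collections import defaultdict


def outer_shell(points, labels, fraction=0.1):
    """Return indices of the points farthest from their cluster centroid.

    points:   list of equal-length coordinate tuples (node embeddings).
    labels:   cluster label of each point, in the same order.
    fraction: share of each cluster treated as its outer shell
              (an assumed tuning knob, not taken from the paper).
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    shell = set()
    for idxs in members.values():
        dim = len(points[idxs[0]])
        # Centroid of the cluster, computed coordinate by coordinate.
        centroid = [sum(points[i][d] for i in idxs) / len(idxs)
                    for d in range(dim)]
        # Rank the members from farthest to nearest.
        ranked = sorted(idxs,
                        key=lambda i: math.dist(points[i], centroid),
                        reverse=True)
        keep = max(1, math.ceil(fraction * len(idxs)))
        shell.update(ranked[:keep])
    return shell


# Toy embeddings (hypothetical): two clusters, each with one distant point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
              (10, 10), (10, 11), (11, 10), (11, 11), (20, 20)]
toy_labels = [0] * 5 + [1] * 5
toy_shell = outer_shell(toy_points, toy_labels, fraction=0.2)  # flags indices 4 and 9
```

The indices returned this way would play the role of the initial anomaly labels; the remaining points would be treated as normal.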
To better understand the identified anomalies, we
employ a BWGNN trained on a dataset whose elements
are labeled as either anomalous or normal. After
categorizing the data points, the network identifies
anomalies and their connections within the citation
graph. These anomalies are then removed, yielding a
cleaner dataset, and the process is repeated on the
reduced graph to refine the detection and analysis of
anomalies at successively deeper levels.
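The detect-remove-repeat loop described above can be sketched in a few lines. The sketch is purely illustrative: the detector is passed in as a callable, and the degree-based `degree_outliers` function is a toy stand-in for the trained BWGNN, which is not implemented here. All names, the `max_levels` limit, and the stopping rule (stop when no anomalies are found) are assumptions.

```python
def build_graph(edges):
    """Build an undirected adjacency dict {node: set(neighbours)}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def peel_anomaly_levels(adjacency, detect, max_levels=5):
    """Repeatedly detect anomalies and remove them from the graph.

    detect: callable taking an adjacency dict and returning a set of
            nodes; a placeholder for the trained detector.
    Returns one set of removed nodes per level, outermost first.
    """
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}
    levels = []
    for _ in range(max_levels):
        found = {n for n in detect(graph) if n in graph}
        if not found:
            break
        levels.append(found)
        # Drop the anomalous nodes and every edge that touches them.
        graph = {n: nbrs - found
                 for n, nbrs in graph.items() if n not in found}
    return levels


def degree_outliers(graph):
    """Toy stand-in detector: flag nodes with degree above twice the mean."""
    if not graph:
        return set()
    mean = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
    return {n for n, nbrs in graph.items() if len(nbrs) > 2 * mean}


# Toy graph (hypothetical): a large hub "h" hides a smaller hub "k".
toy_edges = (
    [("h", leaf) for leaf in "abcdefg"] + [("h", "k")]
    + [("k", m) for m in ("m1", "m2", "m3")]
    + [("a", "b"), ("c", "d"), ("e", "f")]
)
toy_levels = peel_anomaly_levels(build_graph(toy_edges), degree_outliers)
```

In the toy graph, the first pass flags only the large hub "h"; once it is removed, the smaller hub "k" stands out in the reduced graph and is flagged at the second level, which mirrors the nested structure the procedure is meant to expose.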
2 MATHEMATICAL FRAMEWORKS
Subsections 2.1 and 2.2 establish the background for
the BWGNN employed throughout this study.
2.1 Signal on Graphs
An attributed graph, G = {V, E}, is characterized in
this study by a collection of nodes V and unweighted
edges E connecting the nodes. The degree matrix D is
a diagonal matrix where $D_{ii}$ denotes the degree, or
number of connections, of vertex i. The adjacency
matrix A is a square matrix where $A_{ij}$ signifies the
presence (with a 1) or absence (with a 0) of an edge
between vertices i and j. Let L = D - A be the regular