The vertical red lines mark the boundary
between clusters. The partitions are nearly
the same in all cases, which may indicate the
presence of two subpopulations in the edge
community.
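As an illustration of how such a boundary between two groups of edge scores can be located, the following minimal sketch applies one-dimensional two-means clustering (Lloyd's algorithm with k = 2). It is not necessarily the clustering procedure used in this study; the function name `two_means_boundary` and the toy `scores` list are hypothetical and serve only as an example.

```python
import statistics


def two_means_boundary(values, iterations=100):
    """Split 1-D values into two clusters with Lloyd's k-means (k=2).

    Returns the midpoint between the two final centroids, which acts as
    the boundary between the clusters.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(iterations):
        boundary = (lo + hi) / 2
        left = [v for v in values if v <= boundary]
        right = [v for v in values if v > boundary]
        new_lo, new_hi = statistics.fmean(left), statistics.fmean(right)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids no longer move: converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


# Toy, made-up scores with two visibly separated groups (illustration only).
scores = [0.10, 0.12, 0.15, 0.18, 0.80, 0.85, 0.90]
boundary = two_means_boundary(scores)
low_group = [s for s in scores if s <= boundary]
high_group = [s for s in scores if s > boundary]
print(f"boundary={boundary:.3f} low={len(low_group)} high={len(high_group)}")
```

If several clustering procedures place this boundary at nearly the same value, that agreement is what supports reading the data as two subpopulations.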
5  SUMMARY AND CONCLUSIONS
This  research  introduces  a  new  methodology  for 
detecting  suspicious  citations  in  scientific  literature 
using  the  GraphSAGE  algorithm  and  enhanced 
citation  graph  embeddings.  The  method  has  shown 
effectiveness  in  uncovering  citation  anomalies 
through extensive testing. However, challenges arise 
in  handling  interdisciplinary  research  and  "sleeping 
beauties"—articles  initially  overlooked  but  later 
recognized due to delayed breakthroughs—making it 
difficult for the model to differentiate genuine citation 
dynamics from anomalies. 
Approximately 80% of the citation edges examined in
this study are identified as vulnerable to distortion,
revealing their lack of robustness within the citation 
graph.  These  edges  are  flagged  as  potentially 
manipulated,  highlighting  the  fragile  nature  of 
citation  datasets  and  the  significant  impact  that 
individual  edges  can  have  on  network  stability  and 
reliability.  Despite  structural  differences  between 
datasets,  shared  characteristics  are  identified, 
suggesting  universal  tendencies  within  citation 
systems. The Cora dataset displays a homogeneous
structure with a higher proportion of suspicious
citations, while the larger and more
heterogeneous PubMed dataset reveals two distinct
citation groups: one associated with suspicious edges 
and  another  with  more  stable,  well-reconstructed 
citations. 
All  datasets  considered  exhibit  a  stable  core  of 
reliable  connections,  reflecting  the  gradual 
accumulation  of  trustworthy  citations  over  time. 
Nonetheless, even in datasets regularly updated with 
new publications, a substantial number of edges are 
found  to  be  unstable  or  irrelevant,  suggesting  that 
citation  datasets  inherently  include  connections 
prone to manipulation or unreliability.
Reconstruction score distributions show a
positively skewed, unimodal pattern: most
citations cluster around lower scores, and a long
right tail extends toward higher scores. This
distribution  implies  that  a  significant  portion  of 
citations may lack reliability, raising concerns about 
potential manipulation.  
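The skewness described above can be checked numerically. The sketch below computes the moment coefficient of skewness of a score sample and compares mean and median. The log-normal sample is a hypothetical stand-in for per-edge reconstruction scores, chosen only because it is unimodal with a long right tail. It is not data from this study, and the helper `sample_skewness` is illustrative.

```python
import random
import statistics


def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical scores (assumption): log-normal draws, not the paper's data.
rng = random.Random(0)
scores = [rng.lognormvariate(0.0, 0.75) for _ in range(5000)]

skew = sample_skewness(scores)
mean_score = statistics.fmean(scores)
median_score = statistics.median(scores)
print(f"skewness={skew:.2f} mean={mean_score:.2f} median={median_score:.2f}")
```

A positive coefficient together with a mean above the median is the numerical signature of a right tail: most of the mass sits at low scores and a few large values pull the mean upward.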
To validate the proposed approach, an experiment
is conducted on artificially augmented citation
graphs obtained by adding noise in the form of
random edges. The results confirm the model's
effectiveness in detecting such anomalies,
supporting its value as a tool for identifying
citation manipulation. The proposed method provides
a  framework  for  dynamically  monitoring  research 
trends  and  integrating  new  articles  into  citation 
graphs,  leveraging  a  stable  core  of  knowledge  to 
evaluate individual links. Examining an edge's position
within the recovery histogram offers insight into its
reliability and susceptibility to manipulation.
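The random-edge augmentation used for validation can be sketched as follows. This is a minimal illustration under stated assumptions, not the experimental code of this study: the graph is treated as undirected and free of self-loops, the names `inject_random_edges` and `detection_rate` are hypothetical, and the `scores` dictionary is a placeholder for the output of a trained model.

```python
import random


def inject_random_edges(edges, num_nodes, k, seed=0):
    """Return (augmented_edges, injected) after adding k new random edges.

    Edges are stored as frozensets, so (u, v) and (v, u) are the same edge.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    max_edges = num_nodes * (num_nodes - 1) // 2
    if k > max_edges - len(existing):
        raise ValueError("not enough free node pairs for k new edges")
    injected = set()
    while len(injected) < k:
        u, v = rng.sample(range(num_nodes), 2)  # distinct nodes, no self-loop
        pair = frozenset((u, v))
        if pair not in existing and pair not in injected:
            injected.add(pair)
    return list(existing | injected), injected


def detection_rate(scores, injected, threshold):
    """Share of injected edges whose score falls below the threshold."""
    flagged = [e for e in injected if scores[e] < threshold]
    return len(flagged) / len(injected)


# Toy ring graph on 20 nodes (illustration only).
ring = [(i, (i + 1) % 20) for i in range(20)]
augmented, injected = inject_random_edges(ring, num_nodes=20, k=5)

# Placeholder scores (assumption): a real run would take them from the model.
scores = {e: (0.1 if e in injected else 0.9) for e in augmented}
rate = detection_rate(scores, injected, threshold=0.5)
print(f"edges={len(augmented)} injected={len(injected)} rate={rate:.2f}")
```

Because the injected edges are known by construction, the share of them that falls in the low-score region gives a direct measure of how well the scoring separates artificial links from genuine ones.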
This research opens new avenues for
understanding  citation  dynamics,  emphasizing  the 
role  of  stable  reconstructed  edge  clusters  in 
maintaining  citation  network  integrity.  It  also 
highlights universal patterns within citation systems, 
offering valuable insights for developing robust tools 
for citation analysis and anomaly detection. 
REFERENCES 
Avros, R., Haim, M. B., Madar, A., Ravve, E., & 
Volkovich,  Z.  (2024).  Spotting  suspicious  academic 
citations  using  self-learning  graph  transformers. 
Mathematics,  12(6),  814. 
https://doi.org/10.3390/math12060814
Avros, R., Keshet, S., Kitai, D. T., Vexler, E., & Volkovich, 
Z.  (2023).  Detecting  pseudo-manipulated  citations  in 
scientific literature through perturbations of the citation 
graph.  Mathematics,  11(18),  3820. 
https://doi.org/10.3390/math11183820
Avros, R., Keshet, S., Kitai, D. T., Vexler, E., & Volkovich, 
Z.  (2023).  Detecting  manipulated  citations  through 
disturbed node2vec embedding. In Proceedings of the 
25th  International  Symposium  on  Symbolic  and 
Numeric  Algorithms  for  Scientific  Computing 
(SYNASC), Nancy, France, 2023 (pp. 274–278). IEEE. 
https://doi.org/10.1109/SYNASC61333.2023.00047 
Falagas, M. E., & Alexiou, V. G. (2008). The top-ten in
journal impact factor manipulation. Archivum
Immunologiae et Therapiae Experimentalis (Warsz),
56(4), 223–226. https://doi.org/10.1007/s00005-008-0024-5
Fong,  E.  A.,  &  Wilhite,  A.  W.  (2017).  Authorship  and 
citation  manipulation  in  academic  research.  PLOS 
ONE,  12(12),  e0187394. 
https://doi.org/10.1371/journal.pone.0187394
Grover,  A.,  &  Leskovec,  J.  (2016).  Node2vec:  Scalable 
feature  learning  for  networks.  In  Proceedings  of  the 
22nd  ACM  SIGKDD  International  Conference  on 
Knowledge  Discovery  and  Data  Mining  (KDD  '16), 
San Francisco, CA, USA, 13–17 August 2016 (pp. 855–864).
ACM. https://doi.org/10.1145/2939672.2939754
Hamilton, W., Ying, Z., & Leskovec, J. (2017). Inductive 
representation  learning  on  large  graphs.  Advances  in 
Neural Information Processing Systems, 30, 1024–1034.