Authors: Masanao Ochi 1; Masanori Shiro 2; Jun’ichiro Mori 1 and Ichiro Sakata 1

Affiliations:
1 Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo, Tokyo, Japan
2 HIRI, National Institute of Advanced Industrial Science and Technology, Umezono 1-1-1, Tsukuba, Ibaraki, Japan
Keyword(s):
Citation Analysis, Scientific Impact, Graph Neural Network, BERT.
Abstract:
The scientific literature contains a wide variety of data, including language, citations, and images of figures and tables. The Transformer model, introduced in 2017, was initially used in natural language processing but has since been adopted in many other fields, including image processing and network science. Many Transformer models pretrained on extensive datasets are available, and they can be fine-tuned on small amounts of new data for a task of interest. However, classification and regression studies on scholarly data have mostly processed each data type separately and then combined the extracted features, with insufficient consideration of the interactions among the data. In this paper, we propose an end2end fusion method for linguistic and citation information in scholarly literature data using the Transformer model. The proposed method shows the potential to efficiently improve the accuracy of various classification and prediction tasks through end2end fusion of the diverse data in the scholarly literature. Using a dataset from the Web of Science, we classified papers in the top 20% by citation count three years after publication. The results show that the proposed method improves the F-value by 2.65 to 6.08 percentage points compared with using only a single type of information.