
6 CONCLUSIONS
This paper presents a two-phase classification framework for identifying innovative sentences in scientific literature, integrating a Time Mixing Attention (TMA) mechanism and a Mixture of Experts (MoE) model. The first phase enhances long-range dependency modeling using TMA, while the second phase employs MoE to classify sentences into theoretical, methodological, and applied innovation categories. Additionally, a generative semantic data augmentation method is introduced to address class imbalance and improve model performance.
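To make the two-phase design concrete, the sketch below outlines one possible realization in PyTorch. It is a minimal illustration under stated assumptions: the TimeMixingAttention module, its learnable mixing ratio, the soft gating over three experts, and the mean-pooling step are hypothetical design choices introduced here for exposition, not the exact implementation evaluated in this paper; the encoder states would in practice come from SciBERT.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeMixingAttention(nn.Module):
    """Hypothetical time-mixing layer: each token state is blended with its
    predecessor before self-attention, which encourages longer-range
    dependency modeling across the token sequence (phase one)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.mix = nn.Parameter(torch.full((dim,), 0.5))   # learnable mixing ratio
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                   # x: (batch, seq, dim)
        shifted = F.pad(x, (0, 0, 1, 0))[:, :-1, :]         # previous-token states
        mixed = self.mix * x + (1 - self.mix) * shifted
        out, _ = self.attn(mixed, mixed, mixed)
        return out + x                                      # residual connection

class MoEClassifier(nn.Module):
    """Hypothetical mixture-of-experts head (phase two): a gate softly routes
    the pooled sentence vector over expert MLPs, whose outputs are combined
    into logits for the three innovation categories."""
    def __init__(self, dim, num_experts=3, num_classes=3):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, num_classes))
            for _ in range(num_experts)
        )

    def forward(self, h):                                   # h: (batch, dim)
        weights = F.softmax(self.gate(h), dim=-1)           # (batch, experts)
        expert_logits = torch.stack([e(h) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * expert_logits).sum(dim=1)

# Usage with any encoder producing (batch, seq, dim) hidden states,
# e.g. SciBERT; random tensors stand in for them here.
encoder_states = torch.randn(4, 128, 768)
tma = TimeMixingAttention(768)
moe = MoEClassifier(768)
pooled = tma(encoder_states).mean(dim=1)                    # pool refined states
logits = moe(pooled)                                        # (4, 3) class logits

The soft-gating variant shown here is one common MoE formulation; a hard or top-k routing scheme would follow the same interface.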
Experimental results demonstrate that the proposed two-phase SciBERT+TMA model achieves superior performance, with a macro-averaged F1-score of 90.8%, outperforming the one-phase approach and all LLM baselines. Specifically, the F1-scores for the theoretical, methodological, and applied innovation categories reach 95.1%, 90.8%, and 86.6%, respectively, highlighting the effectiveness of progressive classification refinement. Compared to direct classification, the MoE-based approach significantly improves precision and recall. Among LLMs, Ministral-8B achieves the best performance in prompt-based classification, with a macro-averaged F1-score of 85.2%, reinforcing the advantages of a domain-adapted framework over general-purpose LLM inference.
ACKNOWLEDGEMENTS
This study was funded by the National Social Science Foundation of China (Grant No. 21&ZD329).