Innovative Sentence Classification in Scientific Literature: A Two-Phase Approach with Time Mixing Attention and Mixture of Experts

Meng Wang, Mengting Zhang, Hanyu Li, Jing Xie, Zhixiong Zhang, Yang Li, Gaihong Yu

2025

Abstract

Accurately classifying innovative sentences in scientific literature is essential for understanding research contributions. This paper proposes a two-phase classification framework that integrates a Time Mixing Attention (TMA) mechanism and a Mixture of Experts (MoE) system to enhance multi-class innovation classification. In the first phase, TMA improves long-range dependency modeling through temporal shift padding and sequence slice reorganization. The second phase employs an MoE-based approach to classify theoretical, methodological, and applied innovations. To mitigate class imbalance, a generative semantic data augmentation method is introduced, improving model performance across different innovation categories. Experimental results demonstrate that the proposed two-phase SciBERT+TMA model achieves the highest performance, with a macro-averaged F1-score of 90.8%, including 95.1% for theoretical innovation, 90.8% for methodological innovation, and 86.6% for applied innovation. Compared to the one-phase SciBERT+TMA model, the two-phase approach significantly improves precision and recall, highlighting the benefits of progressive classification refinement. In contrast, the best-performing LLM baseline, Ministral-8B-Instruct, achieves a macro-averaged F1-score of 85.2%, demonstrating the limitations of prompt-based inference in structured classification tasks. The results underscore the advantage of a domain-adapted approach in capturing fine-grained distinctions in innovation classification. The proposed framework provides a scalable solution for multi-class sentence classification and can be extended to broader academic classification tasks. Model weights and details are available at https://huggingface.co/wmsr22/Research Value Generation/tree/main.
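As a quick consistency check on the reported scores, the sketch below recomputes the macro-averaged F1 from the three per-class values given in the abstract. It assumes the standard definition of macro-averaging (the unweighted mean of per-class F1 scores); the abstract does not state how the average was computed, and the authors may have averaged unrounded values. The names `per_class_f1` and `macro_average` are illustrative and do not come from the paper.

```python
# Per-class F1 scores (in percent) as reported in the abstract.
per_class_f1 = {
    "theoretical": 95.1,
    "methodological": 90.8,
    "applied": 86.6,
}


def macro_average(scores):
    """Unweighted mean of per-class scores, so every class counts equally.

    Assumption: this is the standard macro-average definition; the
    abstract itself does not spell out the formula.
    """
    return sum(scores.values()) / len(scores)


macro_f1 = macro_average(per_class_f1)
print(f"macro-averaged F1 = {macro_f1:.1f}%")  # prints "macro-averaged F1 = 90.8%"
```

Under this assumption the three per-class scores reproduce the reported 90.8%. Because macro-averaging weights each class equally, the weaker applied-innovation score lowers the average regardless of how many applied sentences the test set contains.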


Paper Citation


in Harvard Style

Wang M., Zhang M., Li H., Xie J., Zhang Z., Li Y. and Yu G. (2025). Innovative Sentence Classification in Scientific Literature: A Two-Phase Approach with Time Mixing Attention and Mixture of Experts. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 382-389. DOI: 10.5220/0013513200003967


in Bibtex Style

@conference{data25,
author={Meng Wang and Mengting Zhang and Hanyu Li and Jing Xie and Zhixiong Zhang and Yang Li and Gaihong Yu},
title={Innovative Sentence Classification in Scientific Literature: A Two-Phase Approach with Time Mixing Attention and Mixture of Experts},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={382-389},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013513200003967},
isbn={978-989-758-758-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Innovative Sentence Classification in Scientific Literature: A Two-Phase Approach with Time Mixing Attention and Mixture of Experts
SN - 978-989-758-758-0
AU - Wang M.
AU - Zhang M.
AU - Li H.
AU - Xie J.
AU - Zhang Z.
AU - Li Y.
AU - Yu G.
PY - 2025
SP - 382
EP - 389
DO - 10.5220/0013513200003967
PB - SciTePress