Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding

Meng Wang, Jing Xie, Yang Li, Zhixiong Zhang, Hanyu Li

2025

Abstract

Accurate identification of semantic features in scientific texts is crucial for text classification performance. This paper presents a text classification method for large language models (LLMs) that embeds semantic feature encodings, strengthening the model's understanding of textual semantics through a dual semantic feature encoding mechanism. The method uses a dynamic window-based local-global feature extraction strategy to capture topical semantic features and a hierarchical structural aggregation mechanism to extract organizational semantic information from texts. To make full use of the extracted features, we design a feature replacement encoding strategy that embeds topical semantic features at the [CLS] position and structural semantic features at the [SEP] position of the model input. This achieves deep fusion between the semantic features and the model's internal representations, improving the accuracy and robustness of text classification. Experimental results show that the proposed semantic feature encoding yields significant performance improvements. On the DBPedia dataset, the semantically encoded SciBERT model achieves an F1-score of 91.07%, a 5.26% improvement over the original encoding approach. In the task of identifying value sentences in scientific literature, Qwen3-14B combined with semantic feature encoding and QLoRA fine-tuning achieves an F1-score of 94.19%, a 14.64% improvement over the baseline model. Compared with traditional feature concatenation or simple fusion approaches, our feature replacement encoding strategy places semantic features at critical positions, significantly improving both classification precision and recall. Ablation experiments further confirm the synergistic effect of topical and structural semantic features, validating the effectiveness of the dual semantic feature encoding mechanism.
The research findings highlight the advantages of semantic feature encoding in text classification tasks, providing an effective technical solution for intelligent analysis of scientific texts.
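The feature replacement encoding described in the abstract can be illustrated with a minimal sketch. It assumes the topical and structural feature vectors have already been computed and projected to the model's embedding dimension, and that the tokenized input contains literal [CLS] and [SEP] positions (a decoder-only model such as Qwen3 would need designated placeholder tokens instead). The function names replace_special_embeddings and concatenate_features are hypothetical, and replacing every [SEP] occurrence is an assumption; this is an illustration, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def replace_special_embeddings(
    token_embeddings: List[Vector],
    tokens: List[str],
    topical_feature: Vector,
    structural_feature: Vector,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> List[Vector]:
    """Hypothetical sketch: return a copy of token_embeddings in which the
    [CLS] row is overwritten by the topical feature and every [SEP] row is
    overwritten by the structural feature (assumption: all [SEP] rows)."""
    if len(token_embeddings) != len(tokens):
        raise ValueError("one embedding row per token is required")
    dim = len(token_embeddings[0])
    if len(topical_feature) != dim or len(structural_feature) != dim:
        raise ValueError("features must match the embedding dimension")
    out = [list(row) for row in token_embeddings]  # leave the input untouched
    for i, tok in enumerate(tokens):
        if tok == cls_token:
            out[i] = list(topical_feature)
        elif tok == sep_token:
            out[i] = list(structural_feature)
    return out


def concatenate_features(
    token_embeddings: List[Vector],
    topical_feature: Vector,
    structural_feature: Vector,
) -> List[Vector]:
    """Hypothetical concatenation baseline: append both features as extra
    rows, so the sequence grows by two positions."""
    rows = [list(row) for row in token_embeddings]
    return rows + [list(topical_feature), list(structural_feature)]


# Toy usage with 2-dimensional embeddings (values are illustrative only).
tokens = ["[CLS]", "deep", "learning", "[SEP]"]
emb = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
topic = [0.5, 0.5]
struct = [0.9, 0.1]
fused = replace_special_embeddings(emb, tokens, topic, struct)
print(fused)  # [[0.5, 0.5], [1.0, 2.0], [3.0, 4.0], [0.9, 0.1]]
print(len(concatenate_features(emb, topic, struct)))  # 6
```

Because replacement overwrites positions that already exist, the sequence length and positional layout stay unchanged, whereas the concatenation baseline appends two extra rows that the model must learn to attend to.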



Paper Citation


in Harvard Style

Wang M., Xie J., Li Y., Zhang Z. and Li H. (2025). Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN , SciTePress, pages 87-97. DOI: 10.5220/0013697900004000


in Bibtex Style

@conference{kdir25,
author={Meng Wang and Jing Xie and Yang Li and Zhixiong Zhang and Hanyu Li},
title={Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={87-97},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013697900004000},
isbn={},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding
SN -
AU - Wang M.
AU - Xie J.
AU - Li Y.
AU - Zhang Z.
AU - Li H.
PY - 2025
SP - 87
EP - 97
DO - 10.5220/0013697900004000
PB - SciTePress