Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding
Meng Wang, Jing Xie, Yang Li, Zhixiong Zhang, Hanyu Li
2025
Abstract
Accurate identification of semantic features in scientific texts is crucial for enhancing text classification performance. This paper presents a large language model text classification method with embedded semantic feature encoding, which enhances the model's understanding of textual semantics through a dual semantic feature encoding mechanism. The method employs a dynamic window-based local-global feature extraction strategy to capture topical semantic features and uses hierarchical structural aggregation mechanisms to extract organizational semantic information from texts. To fully leverage the extracted semantic features, we design a feature replacement encoding strategy that embeds topical semantic features and structural semantic features into the [CLS] and [SEP] positions of large language models, respectively, achieving deep fusion between semantic features and internal model representations and thereby improving the accuracy and robustness of text classification. Experimental results demonstrate that the proposed semantic feature encoding enhancement method achieves significant performance improvements. On the DBPedia dataset, the semantically encoded SciBERT model achieves an F1-score of 91.07%, a 5.26% improvement over the original encoding approach. In the scientific literature value sentence identification task, Qwen3-14B combined with semantic feature encoding and QLoRA fine-tuning achieves an F1-score of 94.19%, a 14.64% improvement over the baseline model. Compared to traditional feature concatenation or simple fusion approaches, our feature replacement encoding strategy leverages semantic features at critical positions, significantly enhancing both classification precision and recall. Ablation experiments further validate the synergistic effects of topical semantic features and structural semantic features, confirming the effectiveness of the dual semantic feature encoding mechanism.
The research findings highlight the advantages of semantic feature encoding in text classification tasks, providing an effective technical solution for intelligent analysis of scientific texts.
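The feature replacement idea described in the abstract can be illustrated with a minimal sketch. It assumes the topical and structural feature vectors have already been extracted and share the dimensionality of the token embeddings. The function name `replace_special_positions`, the placeholder token IDs, and the toy vectors are all hypothetical; the abstract does not specify how the authors perform the replacement inside the model, so this is not their implementation.

```python
from typing import List, Sequence

Vector = List[float]

# Placeholder IDs for the [CLS] and [SEP] tokens (assumed values; the real
# IDs depend on the tokenizer vocabulary in use).
CLS_ID = 101
SEP_ID = 102


def replace_special_positions(
    token_ids: Sequence[int],
    token_embeddings: Sequence[Sequence[float]],
    topical: Sequence[float],
    structural: Sequence[float],
    cls_id: int = CLS_ID,
    sep_id: int = SEP_ID,
) -> List[Vector]:
    """Return a copy of the embeddings in which every [CLS] position holds
    the topical feature vector and every [SEP] position holds the
    structural feature vector. All other positions are left unchanged."""
    if len(token_ids) != len(token_embeddings):
        raise ValueError("token_ids and token_embeddings differ in length")
    dim = len(topical)
    if len(structural) != dim:
        raise ValueError("topical and structural features differ in size")

    fused: List[Vector] = []
    for tid, emb in zip(token_ids, token_embeddings):
        if len(emb) != dim:
            raise ValueError("feature size does not match embedding size")
        if tid == cls_id:
            fused.append(list(topical))      # replace, do not concatenate
        elif tid == sep_id:
            fused.append(list(structural))   # replace, do not concatenate
        else:
            fused.append(list(emb))
    return fused


# Toy input: [CLS] tok tok [SEP] with 2-dimensional embeddings.
token_ids = [CLS_ID, 11, 12, SEP_ID]
embeddings = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
fused = replace_special_positions(
    token_ids, embeddings, topical=[1.0, 2.0], structural=[3.0, 4.0]
)
print(fused)
```

Because the vectors are substituted rather than concatenated, the sequence length and embedding size stay the same, which is the property that distinguishes replacement from the concatenation-based fusion the abstract compares against.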
Paper Citation
in Harvard Style
Wang M., Xie J., Li Y., Zhang Z. and Li H. (2025). Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN , SciTePress, pages 87-97. DOI: 10.5220/0013697900004000
in Bibtex Style
@conference{kdir25,
author={Meng Wang and Jing Xie and Yang Li and Zhixiong Zhang and Hanyu Li},
title={Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={87-97},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013697900004000},
isbn={},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - Enhanced LLM Text Classification Method with Embedded Semantic Feature Encoding
SN -
AU - Wang M.
AU - Xie J.
AU - Li Y.
AU - Zhang Z.
AU - Li H.
PY - 2025
SP - 87
EP - 97
DO - 10.5220/0013697900004000
PB - SciTePress