CGNTM: Unsupervised Causal Topic Modeling with LLMs and Nonlinear Causal GNNs

Peixuan Men; Longchao Wang; Aihua Li; Xiaoli Tang

doi:10.5220/0013708200004000

2025

Abstract

We propose CGNTM, a fully unsupervised causal topic model that integrates large language models (LLMs) with neural causal inference. Unlike conventional and supervised topic models, CGNTM learns both hierarchical topics and their directed causal relations directly from raw text, without requiring labeled data. The framework leverages LLM-based prompt extraction to identify salient keywords and candidate causal pairs, which are refined through differentiable Directed Acyclic Graph (DAG) learning and modeled via a nonlinear structural causal model (SCM). A directionally masked graph neural network (GNN) propagates information strictly along causal edges, while a Wasserstein Generative Adversarial Network (GAN) enforces semantic consistency under counterfactual interventions via BERT-based regularization. This combination enables the model not only to discover coherent and diverse topics but also to uncover interpretable causal relationships among them. The architecture supports hierarchical topic organization by clustering fine-grained terms into broader themes and modeling cross-level dependencies through dual-layer message passing. Experimental results demonstrate that CGNTM outperforms state-of-the-art models in topic quality and causal interpretability. Ablation studies confirm the essential role of each component (LLM-guided extraction, nonlinear SCM, directional GNN propagation, and adversarial training) in contributing to both causal accuracy and topic coherence. The proposed framework opens new directions for unsupervised causal discovery in text, offering transformative potential in domains where understanding why certain topics co-occur is as crucial as identifying what they are.
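The abstract names two mechanisms, differentiable DAG learning and directionally masked propagation, without giving formulas. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes a polynomial acyclicity measure of the form h(A) = tr((I + A∘A/d)^d) - d, which is one common choice in the differentiable DAG learning literature; the abstract does not say which formulation CGNTM uses. The function names `acyclicity_penalty` and `masked_propagate`, the toy graphs, and the additive aggregation rule are all hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(adj):
    """Assumed penalty h(A) = tr((I + A*A/d)^d) - d (A*A is elementwise).

    The value is zero when the weighted graph has no directed cycle and
    positive otherwise, so it can be driven to zero during training.
    """
    d = len(adj)
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    m = [[identity[i][j] + adj[i][j] ** 2 / d for j in range(d)]
         for i in range(d)]
    power = identity
    for _ in range(d):
        power = matmul(power, m)
    return sum(power[i][i] for i in range(d)) - d


def masked_propagate(adj, features):
    """One message-passing round restricted to causal edges.

    adj[i][j] != 0 means topic i is a cause of topic j. Each node keeps its
    own state and adds weighted states of its parents only, so nothing flows
    from an effect back to its cause.
    """
    d = len(adj)
    dim = len(features[0])
    out = []
    for j in range(d):
        agg = list(features[j])
        for i in range(d):
            w = adj[i][j]
            if w != 0.0:
                for k in range(dim):
                    agg[k] += w * features[i][k]
        out.append(agg)
    return out


# Toy graphs (illustrative only): a chain 0 -> 1 -> 2 and a 2-cycle.
chain = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0],
         [1.0, 0.0]]

h_chain = acyclicity_penalty(chain)
h_cycle = acyclicity_penalty(cycle)
downstream = masked_propagate(chain, [[1.0], [0.0], [0.0]])
upstream = masked_propagate(chain, [[0.0], [0.0], [1.0]])
```

On the chain the penalty vanishes (up to floating-point error) and a signal placed on topic 0 reaches its child, while a signal placed on topic 2 stays where it is. That asymmetry is the point of masking by edge direction. A trainable version would replace the additive rule with learned, nonlinear update functions.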

Paper Citation

in Harvard Style

Men P., Wang L., Li A. and Tang X. (2025). CGNTM: Unsupervised Causal Topic Modeling with LLMs and Nonlinear Causal GNNs. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; SciTePress, pages 275-285. DOI: 10.5220/0013708200004000

in Bibtex Style

@conference{kdir25,
author={Peixuan Men and Longchao Wang and Aihua Li and Xiaoli Tang},
title={CGNTM: Unsupervised Causal Topic Modeling with LLMs and Nonlinear Causal GNNs},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={275-285},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013708200004000},
isbn={},
}

in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - CGNTM: Unsupervised Causal Topic Modeling with LLMs and Nonlinear Causal GNNs
SN -
AU - Men P.
AU - Wang L.
AU - Li A.
AU - Tang X.
PY - 2025
SP - 275
EP - 285
DO - 10.5220/0013708200004000
PB - SciTePress