Authors:
Claire Ponciano, Markus Schaffert and Jean-Jacques Ponciano
Affiliation:
i3mainz, University of Applied Sciences, Germany
Keyword(s):
Ontology-Grounded Language Modeling, GPT, Knowledge-Enhanced Text Generation, Retrieval-Augmented Generation, Spinoza, Linked Open Data, Historical Text Synthesis, Philosophical Language Modeling, BERTScore Evaluation, Structured Knowledge Integration, Latin Text Generation, Large Language Models, Text Style Transfer, Semantic Conditioning, Canonical Corpus Fine-Tuning.
Abstract:
We present an ontology-grounded approach to GPT-based text generation aimed at improving factual grounding, historical plausibility, and stylistic fidelity, with Baruch Spinoza’s Latin writings as a case study. We construct a compact ontology from Linked Open Data (Wikidata/DBpedia) augmented with expert-curated facts, serialize its triples into natural-language statements, and interleave these with a canonical Latin corpus during fine-tuning of a GPT-2 (124M) model. At inference, retrieval-augmented generation (RAG) prepends ontology-derived facts and lightweight stylistic instructions, guiding the model toward historically consistent continuations in Spinoza’s register. Evaluation follows an 80/20 paragraph split of the Ethica: the model generates continuations conditioned on the retained 80% of segments, and we measure their semantic similarity (BERTScore) against the held-out 20%. This evaluation is complemented by an expert assessment of historical plausibility and by cosine-similarity scores for stylistic authenticity. Relative to a GPT-2 baseline trained only on the Latin corpus, our ontology-grounded variant achieves a higher BERTScore and produces fewer factual and conceptual errors while preserving Latin rhetorical structure. These results indicate that structured knowledge integration is a feasible and effective way to make generative models more reliable for cultural-heritage text.
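The two knowledge-integration steps the abstract names can be sketched in a few lines. This is a minimal illustration under assumed names (`serialize_triple`, `build_prompt`) and an assumed verbalization template; it is not the authors' implementation, and the example triples are illustrative.

```python
# Sketch: (1) serialize ontology triples into natural-language statements,
# (2) prepend them, RAG-style, with a lightweight stylistic instruction
# before the text the model should continue. All names and the template
# are assumptions for illustration, not the paper's actual code.

TEMPLATE = "{subject} {predicate} {object}."

def serialize_triple(triple):
    """Turn an (s, p, o) triple into a plain-language statement."""
    s, p, o = triple
    return TEMPLATE.format(subject=s, predicate=p.replace("_", " "), object=o)

def build_prompt(triples, style_note, seed_text):
    """Prepend serialized facts and a style instruction to the seed text."""
    facts = "\n".join(serialize_triple(t) for t in triples)
    return f"{facts}\n{style_note}\n{seed_text}"

triples = [
    ("Baruch Spinoza", "was_born_in", "Amsterdam"),
    ("Ethica", "was_written_by", "Baruch Spinoza"),
]
prompt = build_prompt(
    triples,
    "Continue in Spinoza's Latin register.",
    "Deus sive Natura",
)
```

The resulting `prompt` string would then be fed to the fine-tuned model; in the paper's pipeline the triples come from the Wikidata/DBpedia-derived ontology rather than being hard-coded.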