
models. To address these, we proposed an ensemble approach that outperformed all individual models, including proprietary LLMs like GPT-4, highlighting its potential as a lightweight yet effective alternative for ontology curation tasks.
Despite remaining challenges, our work opens promising directions for automating knowledge base validation and enrichment. In future work, we aim to investigate fine-tuning strategies for LLMs to improve their reasoning on domain-specific tasks. Another direction is adapting our methods to other domains and ontological structures. We also see potential in integrating external knowledge sources, such as curated databases of occupations and skills, to enhance LLM interpretability and decision-making. Finally, assessing the impact of these methods on real-world applications, such as recommendation systems or career guidance platforms, would be an essential step toward validating their practical value.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge InterMEDIUS
for the partial funding of this work.