
ACKNOWLEDGMENTS
We thank the National Council for Scientific and Technological Development (CNPq), Brazil, for grant #301337/2025-0.
REFERENCES
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.
Bonatti, P. A., Decker, S., Polleres, A., and Presutti, V. (2019). Knowledge graphs: New directions for knowledge representation on the semantic web (Dagstuhl Seminar 18371). Dagstuhl Reports, 8(9):29–111.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
Elshin, D., Karpachev, N., Gruzdev, B., Golovanov, I., Ivanov, G., Antonov, A., Skachkov, N., Latypova, E., Layner, V., Enikeeva, E., et al. (2024). From general LLM to translation: How we dramatically improve translation quality using human evaluation data for LLM finetuning. In Proceedings of the Ninth Conference on Machine Translation, pages 247–252.
Guerdan, L., Barocas, S., Holstein, K., Wallach, H. M., Wu, Z. S., and Chouldechova, A. (2025). Validating LLM-as-a-judge systems in the absence of gold labels. CoRR.
Huang, H., Chen, C., He, C., Li, Y., Jiang, J., and Zhang, W. (2024). Can LLMs be good graph judger for knowledge graph construction? arXiv preprint arXiv:2411.17388.
Kendall, M. G. (1938). A new measure of rank correlation.
Biometrika, 30(1-2):81–93.
Khorashadizadeh, H., Amara, F. Z., Ezzabady, M., Ieng,
F., Tiwari, S., Mihindukulasooriya, N., Groppe, J.,
Sahri, S., Benamara, F., and Groppe, S. (2024). Re-
search trends for the interplay between large lan-
guage models and knowledge graphs. arXiv preprint
arXiv:2406.08223.
Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., et al. (2024). DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437.
Racharak, T., Wang, T., and Jearanaiwongkul, W. (2024). An automated medical RDF knowledge graph construction from text using in-context learning. In 2024 16th International Conference on Knowledge and System Engineering (KSE), pages 465–471. IEEE.
Regino, A. and dos Reis, J. C. (2025). Can LLMs be knowledge graph curators for validating triple insertions? In Gesese, G. A., Sack, H., Paulheim, H., Meroño-Peñuela, A., and Chen, L., editors, Proceedings of the Workshop on Generative AI and Knowledge Graphs (GenAIK) co-located with the 31st International Conference on Computational Linguistics (COLING 2025), pages 87–99, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Regino, A. G., Caus, R. O., Hochgreb, V., and dos Reis, J. C. (2023). From natural language texts to RDF triples: A novel approach to generating e-commerce knowledge graphs. In Coenen, F., Fred, A., Aveiro, D., Dietz, J., Bernardino, J., Masciari, E., and Filipe, J., editors, Knowledge Discovery, Knowledge Engineering and Knowledge Management, pages 149–174. Springer Nature Switzerland.
So, C. C., Sun, Y., Wang, J.-M., Yung, S. P., Loh, A. W. K., and Chau, C. P. (2025). Are large language models capable of deep relational reasoning? Insights from DeepSeek-R1 and benchmark comparisons. arXiv preprint arXiv:2506.23128.
Spearman, C. (1987). The proof and measurement of association between two things. The American Journal of Psychology, 100(3/4):441–471.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
Zhang, H., Yu, P. S., and Zhang, J. (2025). A systematic sur-
vey of text summarization: From statistical methods
to large language models. ACM Computing Surveys,
57(11):1–41.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623.
Zhuang, Y., Yu, Y., Wang, K., Sun, H., and Zhang, C. (2023). ToolQA: A dataset for LLM question answering with external tools. Advances in Neural Information Processing Systems, 36:50117–50143.