Authors:
André Regino¹; Fernando Rezende Zagatti¹,²; Rodrigo Bonacin¹; Victor Jesus Sotelo Chico³; Victor Hochgreb³ and Julio Reis⁴
Affiliations:
¹Center for Information Technology Renato Archer, Campinas, São Paulo, Brazil; ²Department of Computing, UFSCar, São Carlos, Brazil; ³GoBots, Campinas, São Paulo, Brazil; ⁴Institute of Computing, University of Campinas, Campinas, São Paulo, Brazil
Keyword(s):
LLM as a Judge, RDF Triple Generation, RDF Triple Validation.
Abstract:
Knowledge Graphs (KGs) depend on accurate RDF triples, making the quality assurance of these triples a significant challenge. Large Language Models (LLMs) can serve as graders for RDF data, providing scalable alternatives to human validation. This study evaluates the feasibility of using LLMs to assess the quality of RDF triples derived from natural language sentences in the e-commerce sector. We analyze 12 LLM configurations by comparing their Likert-scale ratings of triple quality with human evaluations, considering both complete triples and their individual components (subject, predicate, object). We employ statistical correlation measures (Spearman and Kendall Tau) to quantify the alignment between LLM and expert assessments. We also examine whether justifications generated by LLMs can indicate higher-quality grading. Our findings reveal that some LLMs demonstrate moderate agreement with human annotators, but none achieves full alignment. This study presents a replicable evaluation framework and highlights the current limitations and potential of LLMs as semantic validators. These results support efforts to incorporate LLM-based validation into KG construction pipelines and suggest avenues for prompt engineering and hybrid human-AI validation systems.
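As an illustrative sketch (not taken from the paper), the alignment measurement described in the abstract can be reproduced with SciPy, assuming two parallel lists of Likert-scale ratings, one score per RDF triple from a human expert and one from an LLM grader; the rating values below are invented for illustration:

```python
# Hypothetical sketch: quantifying LLM-human rating alignment with
# Spearman and Kendall Tau rank correlations, as in the abstract.
from scipy.stats import spearmanr, kendalltau

# Likert-scale (1-5) quality ratings for the same set of RDF triples.
human_ratings = [5, 4, 4, 2, 1, 3, 5, 2]
llm_ratings   = [4, 4, 5, 2, 2, 3, 5, 1]

# Each call returns the correlation coefficient and its p-value.
rho, rho_p = spearmanr(human_ratings, llm_ratings)
tau, tau_p = kendalltau(human_ratings, llm_ratings)

print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")
print(f"Kendall tau  = {tau:.3f} (p = {tau_p:.3f})")
```

Both measures operate on ranks rather than raw scores, which suits ordinal Likert data; values near 1 would indicate strong agreement between the LLM grader and the human annotator.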