Authors:
Nacir Bouali; Marcus Gerhold; Tosif Ul Rehman and Faizan Ahmed
Affiliation:
Department of Computer Science, University of Twente, The Netherlands
Keyword(s):
AI-Assisted Grading, Autograding, Large Language Models, GPT, Llama, Claude, UML.
Abstract:
This paper investigates the feasibility of using Large Language Models (LLMs) to automate the grading of Unified Modeling Language (UML) class diagrams in a software design course. Our method involves carefully designing case studies with constraints that guide students’ design choices, converting visual diagrams to textual descriptions, and leveraging LLMs’ natural language processing capabilities to evaluate submissions. We evaluated our approach using 92 student submissions, comparing grades assigned by three teaching assistants with those generated by three LLMs (Llama, GPT o1-mini, and Claude). Our results show that GPT o1-mini and Claude Sonnet achieved strong alignment with human graders, reaching correlation coefficients above 0.76 and Mean Absolute Errors below 4 points on a 40-point scale. The findings suggest that LLM-based grading can provide consistent, scalable assessment of UML diagrams while matching the grading quality of human assessors. This approach offers a promising solution for managing growing student numbers while ensuring fair and timely feedback.
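The two agreement measures named in the abstract, Mean Absolute Error and a correlation coefficient, can be illustrated with a minimal Python sketch. It assumes the correlation is Pearson's r, which the abstract does not specify; the function names and the four sample scores are hypothetical placeholders, not data from the study.

```python
import math


def mean_absolute_error(human, model):
    """Average absolute gap between human and model grades."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(h - m) for h, m in zip(human, model)) / len(human)


def pearson_r(x, y):
    """Pearson correlation coefficient (an assumed choice of metric)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical grades on a 40-point scale, for illustration only.
    human_scores = [30, 25, 38, 20]
    model_scores = [32, 24, 35, 22]
    print("MAE:", mean_absolute_error(human_scores, model_scores))
    print("Pearson r:", pearson_r(human_scores, model_scores))
```

On a 40-point scale, an MAE below 4 means the model's grade differs from the human grade by less than 10% of the available points on average.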