Authors:
Miguel Da Corte 1,2 and Jorge Baptista 2,1
Affiliations:
1 University of Algarve, Faro, Portugal; 2 INESC-ID Lisboa, Lisbon, Portugal
Keyword(s):
Developmental Education (DevEd), Automatic Writing Assessment Systems, English (L1) Writing Proficiency Assessment, Natural Language Processing (NLP), Machine-Learning (ML) Models.
Abstract:
This study investigates the adequacy of Machine Learning (ML)-based systems, specifically ACCUPLACER, compared to human rater classifications within U.S. Developmental Education. A corpus of 100 essays was assessed by human raters using 6 linguistic descriptors, with each essay receiving a skill-level classification. These classifications were compared to those automatically generated by ACCUPLACER. Disagreements among raters were analyzed and resolved, producing a gold standard used as a benchmark for modeling ACCUPLACER's classification task. A comparison of the skill levels assigned by ACCUPLACER and by the human raters revealed a "weak" Pearson correlation (ρ = 0.22), indicating a significant misplacement rate and raising important pedagogical and institutional concerns. Several ML algorithms were tested to replicate ACCUPLACER's classification approach. Using the Chi-square (χ²) method to rank the most predictive linguistic descriptors, Naïve Bayes achieved 81.1% accuracy with the top-four ranked features. These findings emphasize the importance of refining descriptors and incorporating human input into the training of automated ML systems. Additionally, the gold standard developed for the 6 linguistic descriptors and overall skill levels can be used to (i) assess and classify students' English (L1) writing proficiency more holistically and equitably; (ii) support future ML modeling tasks; and (iii) enhance both student outcomes and higher education efficiency.
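For illustration only, the sketch below shows one way the reported pipeline could be set up in Python with scikit-learn and SciPy: a Pearson correlation between two sets of skill-level classifications, Chi-square ranking of the 6 linguistic descriptors keeping the top four, and a Naïve Bayes classifier. The data arrays, the MultinomialNB variant, and the cross-validation setup are assumptions; the abstract does not specify the authors' actual implementation.

# Hypothetical sketch, not the authors' code. Assumes a feature matrix X of
# descriptor scores (100 essays x 6 descriptors) and gold-standard labels y.
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 6))        # placeholder descriptor scores
y = rng.integers(0, 3, size=100)             # placeholder skill-level labels
acc_labels = rng.integers(0, 3, size=100)    # placeholder ACCUPLACER labels

# Pearson correlation between human and automatic skill-level assignments.
r, p = pearsonr(y, acc_labels)
print(f"Pearson correlation: r = {r:.2f} (p = {p:.3f})")

# Chi-square ranking of descriptors (keep the top four), then Naive Bayes.
model = make_pipeline(SelectKBest(chi2, k=4), MultinomialNB())
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.1%}")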