
6 CONCLUSIONS
We presented a novel local search method for readability assessment that leverages models fine-tuned from a selected PLM for various tasks. Our experiments demonstrated that the proposed local search method significantly improves ARA accuracy over the leading method. Further accuracy gains could be pursued along the following lines: (1) Construct a dataset that is larger and more balanced than CLDL; specifically, for each genre, collect a sufficient number of written works evenly distributed across all grade levels. This would eliminate the need to partition the dataset by similar genres and enable fairer comparisons between genre-agnostic and genre-aware grade assessment and readability evaluation methods. (2) Explore alternative black-box LLMs with stronger fine-tuning capabilities to improve accuracy across tasks. (3) Investigate white-box LLMs, such as the LLaMA models, to optimize fine-tuning for specific tasks.
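As a rough illustration of the idea of local search over discrete grade levels, the following minimal sketch performs greedy hill climbing between neighboring levels. All names here are hypothetical, and the toy scoring function merely stands in for the confidence a fine-tuned model would assign to a text at a given grade level; it is not the paper's actual method.

```python
def local_search(score, start, lo=1, hi=12):
    """Greedily move to the neighboring grade level with the higher
    score until neither neighbor improves on the current level."""
    current = start
    while True:
        neighbors = [l for l in (current - 1, current + 1) if lo <= l <= hi]
        best = max(neighbors, key=score)
        if score(best) > score(current):
            current = best  # climb toward a higher-scoring level
        else:
            return current  # local optimum reached

# Toy stand-in for a model's per-level confidence, peaking at grade 7.
toy_score = lambda level: -abs(level - 7)

print(local_search(toy_score, start=3))   # climbs upward from grade 3
print(local_search(toy_score, start=12))  # climbs downward from grade 12
```

Because grade levels form a small ordered set, such a search needs only a handful of model evaluations per text, which is what makes refining an initial prediction cheap.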
REFERENCES
Collins-Thompson, K. (2014). Computational assessment of text readability: A survey of current and future research.
Deutsch, T., Jasbi, M., and Shieber, S. (2020). Linguistic features for readability assessment. In Burstein, J., Kochmar, E., Leacock, C., Madnani, N., Pilán, I., Yannakoudakis, H., and Zesch, T., editors, Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 1–17.
Engelmann, B., Kreutz, C. K., Haak, F., and Schaer, P. (2024). ARTS: Assessing readability and text simplicity. In Proceedings of EMNLP.
Feng, L., Elhadad, N., and Huenerfauth, M. (2009). Cog-
nitively motivated features for readability assessment.
In Lascarides, A., Gardent, C., and Nivre, J., editors,
Proceedings of the 12th Conference of the European
Chapter of the ACL (EACL 2009), pages 229–237.
Filighera, A., Steuer, T., and Rensing, C. (2019). Auto-
matic Text Difficulty Estimation Using Embeddings
and Neural Networks, pages 335–348.
Gunning, R. (1969). The fog index after twenty years. Journal of Business Communication, 6:3–13.
Hale, J. (2016). Information-theoretical complexity metrics. Language and Linguistics Compass, 10:397–412.
Heilman, M., Collins-Thompson, K., and Eskenazi, M. (2008). An analysis of statistical models and features for reading difficulty prediction. In Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications.
Holtgraves, T. (1999). Comprehending indirect replies:
When and how are their conveyed meanings acti-
vated? Journal of Memory and Language, 41(4):519–
540.
Chall, J. S. and Dale, E. (1995). Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books.
Lee, B. W., Jang, Y. S., and Lee, J. (2021). Pushing on text readability assessment: A transformer meets handcrafted linguistic features. In Moens, M.-F., Huang, X., Specia, L., and Yih, S. W.-t., editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10669–10686.
Lee, B. W. and Lee, J. (2020). LXPER Index 2.0: Improving text readability assessment for L2 English learners in South Korea.
Lu, X. (2010). Automatic analysis of syntactic complexity
in second language writing. International Journal of
Corpus Linguistics, 15:474–496.
Peabody, M. A. and Schaefer, C. (2016). Towards semantic
clarity in play therapy. International Journal of Play
Therapy, 25:197–202.
Schwarm, S. and Ostendorf, M. (2005). Reading level as-
sessment using support vector machines and statisti-
cal language models. In Knight, K., Ng, H. T., and
Oflazer, K., editors, Proceedings of the 43rd Annual
Meeting of the Association for Computational Lin-
guistics (ACL’05), pages 523–530.
Tonelli, S., Tran Manh, K., and Pianta, E. (2012). Making readability indices readable. In Williams, S., Siddharthan, A., and Nenkova, A., editors, Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 40–48.
Trott, S. and Rivière, P. (2024). Measuring and modifying the readability of English texts with GPT-4. In Shardlow, M., Saggion, H., Alva-Manchego, F., Zampieri, M., North, K., Štajner, S., and Stodden, R., editors, Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), pages 126–134.
Vajjala, S. and Meurers, D. (2012). On improving the accuracy of readability classification using insights from second language acquisition. pages 163–173.
Wang, J. (2025). AI-oracle machines for intelligent computing. AI Matters, 11:8–11.
Xia, M., Kochmar, E., and Briscoe, T. (2016). Text
readability assessment for second language learn-
ers. In Tetreault, J., Burstein, J., Leacock, C., and
Yannakoudakis, H., editors, Proceedings of the 11th
Workshop on Innovative Use of NLP for Building Ed-
ucational Applications, pages 12–22.
Assessing Grade Levels of Texts via Local Search over Fine-Tuned LLMs