
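Paper: IMMBA: Integrated Mixed Models with Bootstrap Analysis - A Statistical Framework for Robust LLM Evaluation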

Authors: Vinícius Di Oliveira (1, 2), Pedro Brom (2, 3) and Li Weigang (2)

Affiliations: (1) Secretary of Economy, Brasilia, Federal District, Brazil; (2) TransLab, University of Brasilia, Brasilia, Federal District, Brazil; (3) Federal Institute of Brasilia, Brasilia, Federal District, Brazil

Keyword(s): Large Language Models, Statistical Evaluation, Linear Mixed Models, Bootstrap Resampling, Variance Decomposition, Retrieval-Augmented Generation, LLM Evaluation.

Abstract: Large Language Models (LLMs) have advanced natural language processing across diverse applications, yet their evaluation remains methodologically limited. Standard metrics such as accuracy or BLEU offer aggregate performance snapshots but fail to capture the inherent variability of LLM outputs under prompt changes and decoding parameters like temperature and top-p. This limitation is particularly critical in high-stakes domains, such as legal, fiscal, or healthcare contexts, where output consistency and interpretability are essential. To address this gap, we propose IMMBA: Integrated Mixed Models with Bootstrap Analysis, a statistically principled framework for robust LLM evaluation. IMMBA combines Linear Mixed Models (LMMs) with bootstrap resampling to decompose output variability into fixed effects (e.g., retrieval method, decoding configuration) and random effects (e.g., prompt phrasing), while improving estimation reliability under relaxed distributional assumptions. We validate IMMBA in a Retrieval-Augmented Generation (RAG) scenario involving structured commodity classification under the Mercosur Common Nomenclature (NCM). Our results demonstrate that IMMBA isolates meaningful performance factors and detects significant interaction effects across configurations. By integrating hierarchical modelling and resampling-based inference, IMMBA offers a reproducible and scalable foundation for evaluating LLMs in sensitive, variance-prone settings.
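
The general idea of pairing a variance decomposition with bootstrap resampling can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the model specification, so the example assumes a much simpler balanced design with one fixed factor (retrieval on or off) and one random factor (prompt), uses method-of-moments estimates in place of a fitted Linear Mixed Model, and runs on synthetic scores. All function names, parameter values, and the simulated data are hypothetical.

```python
import random
import statistics


def simulate_scores(n_prompts=30, n_reps=8, seed=0):
    """Synthetic data (hypothetical): one list of (use_rag, score) rows per prompt."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_prompts):
        prompt_effect = rng.gauss(0.0, 0.5)  # random effect of prompt phrasing
        rows = []
        for r in range(n_reps):
            use_rag = r % 2  # fixed factor: 0 = baseline, 1 = retrieval
            noise = rng.gauss(0.0, 1.0)
            rows.append((use_rag, 5.0 + 1.0 * use_rag + prompt_effect + noise))
        data.append(rows)
    return data


def decompose(data):
    """Moment estimates for a balanced design.

    Returns (fixed effect, prompt variance, residual variance).
    """
    diffs, prompt_means = [], []
    ss_resid, df_resid = 0.0, 0
    for rows in data:
        cells = {0: [], 1: []}
        for use_rag, score in rows:
            cells[use_rag].append(score)
        means = {k: statistics.fmean(v) for k, v in cells.items()}
        diffs.append(means[1] - means[0])
        prompt_means.append(statistics.fmean(s for _, s in rows))
        # Pool squared deviations from each prompt-by-condition cell mean.
        for k, values in cells.items():
            ss_resid += sum((s - means[k]) ** 2 for s in values)
            df_resid += len(values) - 1
    fixed = statistics.fmean(diffs)
    var_resid = ss_resid / df_resid
    n_reps = len(data[0])
    # Variance of prompt means = prompt variance + residual variance / n_reps.
    var_prompt = max(0.0, statistics.variance(prompt_means) - var_resid / n_reps)
    return fixed, var_prompt, var_resid


def bootstrap_ci(data, n_boot=500, alpha=0.05, seed=1):
    """Percentile intervals from a cluster bootstrap that resamples whole prompts."""
    rng = random.Random(seed)
    draws = [decompose([rng.choice(data) for _ in data]) for _ in range(n_boot)]
    intervals = []
    for k in range(3):
        values = sorted(d[k] for d in draws)
        lo = values[int((alpha / 2) * n_boot)]
        hi = values[int((1 - alpha / 2) * n_boot) - 1]
        intervals.append((lo, hi))
    return intervals


if __name__ == "__main__":
    scores = simulate_scores()
    names = ("fixed effect", "prompt variance", "residual variance")
    for name, est, ci in zip(names, decompose(scores), bootstrap_ci(scores)):
        print(f"{name}: {est:.3f}  95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Resampling whole prompts, instead of individual scores, keeps each prompt's repeated measurements together, so the dependence that the random effect is meant to capture is preserved. A full analysis along the lines the abstract describes would fit an actual mixed model with interaction terms in place of these moment estimates.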

CC BY-NC-ND 4.0

Paper citation in several formats:
Di Oliveira, V., Brom, P. and Weigang, L. (2025). IMMBA: Integrated Mixed Models with Bootstrap Analysis - A Statistical Framework for Robust LLM Evaluation. In Proceedings of the 21st International Conference on Web Information Systems and Technologies - WEBIST; ISBN 978-989-758-772-6; ISSN 2184-3252, SciTePress, pages 92-102. DOI: 10.5220/0013819400003985

@conference{webist25,
author={Vinícius {Di Oliveira} and Pedro Brom and Li Weigang},
title={IMMBA: Integrated Mixed Models with Bootstrap Analysis - A Statistical Framework for Robust LLM Evaluation},
booktitle={Proceedings of the 21st International Conference on Web Information Systems and Technologies - WEBIST},
year={2025},
pages={92-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013819400003985},
isbn={978-989-758-772-6},
issn={2184-3252},
}

TY - CONF

JO - Proceedings of the 21st International Conference on Web Information Systems and Technologies - WEBIST
TI - IMMBA: Integrated Mixed Models with Bootstrap Analysis - A Statistical Framework for Robust LLM Evaluation
SN - 978-989-758-772-6
IS - 2184-3252
AU - Di Oliveira, V.
AU - Brom, P.
AU - Weigang, L.
PY - 2025
SP - 92
EP - 102
DO - 10.5220/0013819400003985
PB - SciTePress