Authors:
Vinícius Di Oliveira 1,2; Pedro Brom 2,3; and Li Weigang 2

Affiliations:
1 Secretary of Economy, Brasilia, Federal District, Brazil
2 TransLab, University of Brasilia, Brasilia, Federal District, Brazil
3 Federal Institute of Brasilia, Brasilia, Federal District, Brazil
Keyword(s):
Large Language Models, Statistical Evaluation, Linear Mixed Models, Bootstrap Resampling, Variance Decomposition, Retrieval-Augmented Generation, LLM Evaluation.
Abstract:
Large Language Models (LLMs) have advanced natural language processing across diverse applications, yet their evaluation remains methodologically limited. Standard metrics such as accuracy or BLEU offer aggregate performance snapshots but fail to capture the inherent variability of LLM outputs under changes in prompt phrasing and in decoding parameters such as temperature and top-p. This limitation is particularly critical in high-stakes domains such as legal, fiscal, or healthcare contexts, where output consistency and interpretability are essential. To address this gap, we propose IMMBA: Integrated Mixed Models with Bootstrap Analysis, a statistically principled framework for robust LLM evaluation. IMMBA combines Linear Mixed Models (LMMs) with bootstrap resampling to decompose output variability into fixed effects (e.g., retrieval method, decoding configuration) and random effects (e.g., prompt phrasing), while improving estimation reliability under relaxed distributional assumptions. We validate IMMBA in a Retrieval-Augmented Generation (RAG) scenario involving structured commodity classification under the Mercosur Common Nomenclature (NCM). Our results demonstrate that IMMBA isolates meaningful performance factors and detects significant interaction effects across configurations. By integrating hierarchical modelling and resampling-based inference, IMMBA offers a reproducible and scalable foundation for evaluating LLMs in sensitive, variance-prone settings.
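
To make the LMM-plus-bootstrap idea concrete, the following Python sketch fits a mixed model with retrieval method and temperature as fixed effects and prompt phrasing as a random intercept, then applies a cluster bootstrap over prompts. This is a minimal illustration using statsmodels, not the authors' released implementation; the column names (score, method, temperature, prompt) are hypothetical placeholders for the evaluation data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_lmm(df):
    # Fixed effects: retrieval method, temperature, and their interaction;
    # random intercept for each prompt phrasing.
    model = smf.mixedlm("score ~ method * temperature", df, groups=df["prompt"])
    return model.fit()

def bootstrap_fixed_effects(df, n_boot=1000, seed=0):
    # Cluster bootstrap: resample prompts (the random-effect units) with
    # replacement, relabel duplicates so they stay distinct groups, and refit.
    rng = np.random.default_rng(seed)
    prompts = df["prompt"].unique()
    estimates = []
    for _ in range(n_boot):
        parts = []
        for i, p in enumerate(rng.choice(prompts, size=len(prompts), replace=True)):
            part = df[df["prompt"] == p].copy()
            part["prompt"] = f"boot_{i}"  # keep resampled copies as separate clusters
            parts.append(part)
        boot_df = pd.concat(parts, ignore_index=True)
        estimates.append(fit_lmm(boot_df).fe_params)
    return pd.DataFrame(estimates)

# Percentile confidence intervals for each fixed effect:
# bootstrap_fixed_effects(df).quantile([0.025, 0.975])

Resampling whole prompts, rather than individual rows, respects the hierarchical structure: rows sharing a prompt are correlated, so the prompt is the appropriate exchangeable unit for the bootstrap.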