Beyond Parameter Counts: Benchmarking Similar-Sized Large Language Models for Next-Item Recommendation
Kavach Dheer, Peter Corcoran, Josephine Griffith
2025
Abstract
Large language models (LLMs) are rapidly being integrated into recommender systems. New LLMs are released frequently, and many architectures share the same parameter size within their class, giving practitioners numerous options to choose from. While existing benchmarks evaluate LLM-powered recommender systems on various tasks, none has examined how same-sized LLMs perform as recommender systems under identical experimental conditions. Additionally, these benchmarks do not verify whether the evaluation datasets were part of the LLMs' pre-training data. This research evaluates five open-source 7–8B-parameter models (Gemma, Deepseek, Qwen, Llama-3.1, and Mistral) with a fixed A-LLMRec architecture for next-item prediction on the Amazon Luxury-Beauty dataset. We measure top-1 accuracy (Hit@1) and assess dataset leakage through reference-model membership-inference attacks to ensure no model gains an advantage from pre-training exposure. Although all models show negligible dataset leakage rates $(<0.2\%)$, Hit@1 varies dramatically, spanning 20 percentage points from 44\% for Gemma to 64\% for Mistral, despite identical parameter counts and evaluation conditions. These findings demonstrate that selecting the most appropriate LLM is a crucial design decision in LLM-based recommender systems.
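The Hit@1 metric reported in the abstract can be sketched as follows. This is a minimal illustration of top-1 accuracy for next-item recommendation; the function name and toy data are hypothetical, not taken from the paper's code.

```python
# Illustrative sketch of Hit@1 (top-1 accuracy) for next-item recommendation.
# A user counts as a "hit" when the top-ranked recommended item equals the
# held-out ground-truth next item; Hit@1 is the fraction of such users.

def hit_at_1(predictions, ground_truth):
    """predictions: one ranked list of item ids per user (best first).
    ground_truth: the true next item id for each user."""
    hits = sum(1 for ranked, true_item in zip(predictions, ground_truth)
               if ranked and ranked[0] == true_item)
    return hits / len(ground_truth)

# Toy example: 2 of 3 users have the correct item ranked first.
preds = [["lipstick", "serum"], ["perfume", "cream"], ["mask", "toner"]]
truth = ["lipstick", "perfume", "toner"]
print(round(hit_at_1(preds, truth), 2))  # → 0.67
```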
Paper Citation
in Harvard Style
Dheer K., Corcoran P. and Griffith J. (2025). Beyond Parameter Counts: Benchmarking Similar-Sized Large Language Models for Next-Item Recommendation. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, SciTePress, pages 364-371. DOI: 10.5220/0013736800004000
in Bibtex Style
@conference{kdir25,
author={Kavach Dheer and Peter Corcoran and Josephine Griffith},
title={Beyond Parameter Counts: Benchmarking Similar-Sized Large Language Models for Next-Item Recommendation},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={364-371},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013736800004000},
isbn={},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - Beyond Parameter Counts: Benchmarking Similar-Sized Large Language Models for Next-Item Recommendation
SN -
AU - Dheer K.
AU - Corcoran P.
AU - Griffith J.
PY - 2025
SP - 364
EP - 371
DO - 10.5220/0013736800004000
PB - SciTePress