Evaluation of LLM-Based Strategies for the Extraction of Food Product Information from Online Shops

Christoph Brosch; Sian Brumm; Rolf Krieger; Jonas Scheffler

Topics: Data in Finance, Healthcare, Retail & E-commerce, Manufacturing, Marketing; Data Pipelines & ETL; Natural Language Processing; Open Data and Linked Data; Pattern Recognition

In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA, 709-715, 2025, Bilbao, Spain

Authors: Christoph Brosch; Sian Brumm; Rolf Krieger and Jonas Scheffler

Affiliation: Institute for Software Systems, University of Applied Sciences Trier, Birkenfeld, Germany

Keyword(s): Web Scraping, Information Extraction, Large Language Models, Schema-Constrained Output, Product Pages, Pydantic Models, Code Generation.

Abstract: Generative AI and large language models (LLMs) offer significant potential for automating the extraction of structured information from web pages. In this work, we focus on food product pages from online retailers and explore schema-constrained extraction approaches to retrieve key product attributes, such as ingredient lists and nutrition tables. We compare two LLM-based approaches, direct extraction and indirect extraction via generated functions, evaluating them in terms of accuracy, efficiency, and cost on a curated dataset of 3,000 food product pages from three different online shops. Our results show that although the indirect approach achieves slightly lower accuracy (96.48%, −2.27% compared to direct extraction), it reduces the number of required LLM calls by 95.82%, leading to substantial efficiency gains and lower operational costs. These findings suggest that indirect extraction approaches can provide scalable and cost-effective solutions for large-scale information extraction tasks from template-based web pages using LLMs.

CC BY-NC-ND 4.0

Paper citation in several formats:

Brosch, C., Brumm, S., Krieger, R. and Scheffler, J. (2025). Evaluation of LLM-Based Strategies for the Extraction of Food Product Information from Online Shops. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0; ISSN 2184-285X, SciTePress, pages 709-715. DOI: 10.5220/0013647300003967

@conference{data25,
author={Christoph Brosch and Sian Brumm and Rolf Krieger and Jonas Scheffler},
title={Evaluation of LLM-Based Strategies for the Extraction of Food Product Information from Online Shops},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={709-715},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013647300003967},
isbn={978-989-758-758-0},
issn={2184-285X},
}

TY - CONF

JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Evaluation of LLM-Based Strategies for the Extraction of Food Product Information from Online Shops
SN - 978-989-758-758-0
IS - 2184-285X
AU - Brosch, C.
AU - Brumm, S.
AU - Krieger, R.
AU - Scheffler, J.
PY - 2025
SP - 709
EP - 715
DO - 10.5220/0013647300003967
PB - SciTePress