
Paper

Authors: Andrea Riquelme-García, Juan Mulero-Hernández and Jesualdo Fernández-Breis

Affiliation: Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Pascual Parrilla, Murcia, 30100, Spain

Keyword(s): Large Language Models, Ontologies, Data Interoperability, Bioinformatics.

Abstract: Integrating biological data remains a significant challenge due to heterogeneous sources, inconsistent formats, and the evolving landscape of biomedical ontologies. Standardized annotation of biological entities with ontology terms is crucial for interoperability and machine-readability, in line with the FAIR principles. This study compares three approaches for automatic ontology-based annotation of biomedical labels: a base GPT-4o-mini model, a fine-tuned variant of the same model, and a Retrieval-Augmented Generation (RAG) approach. The aim is to assess whether RAG can serve as a cost-effective alternative to fine-tuning for semantic annotation tasks. The evaluation focuses on annotating cell lines, cell types, and anatomical structures using four ontologies: CLO, CL, BTO, and UBERON. Performance was measured using precision, recall, and F1-score, complemented by an error analysis. The results indicate that RAG performs best when label phrasing aligns closely with external sources, achieving high precision particularly with CLO (cell lines) and UBERON/BTO (anatomical structures). The fine-tuned model performs better in cases requiring semantic inference, notably for CL and UBERON, but struggles with lexically diverse inputs. The base model consistently underperforms. These findings suggest that RAG is a promising and cost-effective alternative to fine-tuning. Future work will investigate ontology-aware retrieval using embeddings.
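The abstract reports precision, recall and F1-score but does not say how a predicted annotation is matched against the reference. The following is a minimal sketch that assumes an exact match on (label, ontology term ID) pairs and uses micro-averaging. Both choices are assumptions, not details taken from the paper. The function name and the `ONT:` identifiers are hypothetical placeholders rather than real ontology terms.

```python
"""Micro-averaged precision, recall and F1 for ontology-based annotation.

Hypothetical sketch: each annotation is a (label, ontology term ID) pair, and a
prediction counts as correct only if the identical pair appears in the gold set.
"""
from typing import Iterable, Tuple

Annotation = Tuple[str, str]  # (biomedical label, ontology term ID)


def precision_recall_f1(
    predicted: Iterable[Annotation], gold: Iterable[Annotation]
) -> Tuple[float, float, float]:
    pred_set = set(predicted)
    gold_set = set(gold)
    # True positives: predicted pairs that exactly match a gold pair.
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder identifiers for illustration only, not real ontology terms.
    gold = [("liver", "ONT:0001"), ("B cell", "ONT:0002"), ("HeLa", "ONT:0003")]
    predicted = [("liver", "ONT:0001"), ("B cell", "ONT:0009")]
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy run, one of the two predictions is correct and one of the three gold labels is recovered. The sketch therefore prints `precision=0.50 recall=0.33 f1=0.40`. This reflects the trade-off the abstract describes, where a method can be precise while still missing labels.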

CC BY-NC-ND 4.0


Paper citation in several formats:
Riquelme-García, A., Mulero-Hernández, J. and Fernández-Breis, J. (2025). Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KEOD; ISBN 978-989-758-769-6; ISSN 2184-3228, SciTePress, pages 128-135. DOI: 10.5220/0013740700004000

@conference{keod25,
author={Andrea Riquelme{-}García and Juan Mulero{-}Hernández and Jesualdo Fernández{-}Breis},
title={Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KEOD},
year={2025},
pages={128-135},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013740700004000},
isbn={978-989-758-769-6},
issn={2184-3228},
}

TY  - CONF
JO  - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KEOD
TI  - Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation
SN  - 978-989-758-769-6
IS  - 2184-3228
AU  - Riquelme-García, A.
AU  - Mulero-Hernández, J.
AU  - Fernández-Breis, J.
PY  - 2025
SP  - 128
EP  - 135
DO  - 10.5220/0013740700004000
PB  - SciTePress
ER  - 