Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation

Andrea Riquelme-García, Juan Mulero-Hernández, Jesualdo Fernández-Breis

2025

Abstract

Integrating biological data remains a significant challenge due to heterogeneous sources, inconsistent formats, and the evolving landscape of biomedical ontologies. Standardized annotation of biological entities with ontology terms is crucial for interoperability and machine-readability in line with the FAIR principles. This study compares three approaches for automatic ontology-based annotation of biomedical labels: a base GPT-4o-mini model, a fine-tuned variant of the same model, and a Retrieval-Augmented Generation (RAG) approach. The aim is to assess whether RAG can serve as a cost-effective alternative to fine-tuning for semantic annotation tasks. The evaluation focuses on annotating cell lines, cell types, and anatomical structures using four ontologies: CLO, CL, BTO, and UBERON. Performance was measured using precision, recall, and F1-score, complemented by an error analysis. The results indicate that RAG performs best when label phrasing aligns closely with external sources, achieving high precision, particularly with CLO (cell lines) and UBERON/BTO (anatomical structures). The fine-tuned model performs better in cases requiring semantic inference, notably for CL and UBERON, but struggles with lexically diverse inputs. The base model consistently underperforms. These findings suggest that RAG is a promising and cost-effective alternative to fine-tuning. Future work will investigate ontology-aware retrieval using embeddings.
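
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1-score could be computed for label-to-term annotation. It is not the authors' evaluation code. The function name `annotation_scores`, the input layout (one gold term per label), and the counting convention (a wrong term counts as both a false positive and a false negative, a missing annotation only as a false negative) are assumptions made for illustration, and the labels and term identifiers in the example are illustrative only.

```python
from typing import Dict, Optional, Tuple


def annotation_scores(
    gold: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for single-term annotations.

    Assumed convention (not taken from the paper):
      - correct term            -> true positive
      - wrong term              -> false positive and false negative
      - no term (None/missing)  -> false negative
    """
    tp = fp = fn = 0
    for label, gold_id in gold.items():
        pred_id = predicted.get(label)
        if pred_id is None:
            fn += 1          # nothing was proposed for this label
        elif pred_id == gold_id:
            tp += 1          # proposed term matches the gold term
        else:
            fp += 1          # a wrong term was proposed...
            fn += 1          # ...and the gold term was missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative gold standard and model output (hypothetical data).
    gold = {
        "hepatocyte": "CL:0000182",
        "neuron": "CL:0000540",
        "liver": "UBERON:0002107",
        "heart": "UBERON:0000948",
    }
    predicted = {
        "hepatocyte": "CL:0000182",   # correct
        "neuron": "CL:0000540",       # correct
        "liver": None,                # no annotation returned
        "heart": "UBERON:0002107",    # wrong term
    }
    p, r, f = annotation_scores(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Under this convention, a system that abstains when unsure keeps its precision but loses recall, which is one way the trade-off between approaches described in the abstract could show up in the scores.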

Paper Citation


in Harvard Style

Riquelme-García A., Mulero-Hernández J. and Fernández-Breis J. (2025). Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD; ISBN 978-989-758-769-6, SciTePress, pages 128-135. DOI: 10.5220/0013740700004000


in Bibtex Style

@conference{keod25,
author={Andrea Riquelme-García and Juan Mulero-Hernández and Jesualdo Fernández-Breis},
title={Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD},
year={2025},
pages={128-135},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013740700004000},
isbn={978-989-758-769-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD
TI - Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation
SN - 978-989-758-769-6
AU - Riquelme-García A.
AU - Mulero-Hernández J.
AU - Fernández-Breis J.
PY - 2025
SP - 128
EP - 135
DO - 10.5220/0013740700004000
PB - SciTePress