Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation
Andrea Riquelme-García, Juan Mulero-Hernández, Jesualdo Fernández-Breis
2025
Abstract
Integrating biological data remains a significant challenge due to heterogeneous sources, inconsistent formats, and the evolving landscape of biomedical ontologies. Standardized annotation of biological entities with ontology terms is crucial for interoperability and machine-readability in line with FAIR principles. This study compares three approaches for automatic ontology-based annotation of biomedical labels: a base GPT-4o-mini model, a fine-tuned variant of the same model, and a Retrieval-Augmented Generation (RAG) approach. The aim is to assess whether RAG can serve as a cost-effective alternative to fine-tuning for semantic annotation tasks. The evaluation focuses on annotating cell lines, cell types, and anatomical structures using four ontologies: CLO, CL, BTO, and UBERON. Performance was measured using precision, recall, and F1-score, complemented by an error analysis. The results indicate that RAG performs best when label phrasing aligns closely with external sources, achieving high precision, particularly with CLO (cell lines) and UBERON/BTO (anatomical structures). The fine-tuned model performs better in cases requiring semantic inference, notably for CL and UBERON, but struggles with lexically diverse inputs. The base model consistently underperforms. These findings suggest that RAG is a promising and cost-effective alternative to fine-tuning. Future work will investigate ontology-aware retrieval using embeddings.
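The abstract reports precision, recall, and F1-score but does not state the scoring protocol. As a minimal sketch only, the Python below assumes exact matching of ontology term identifiers and micro-averaging over all labels. The function name `score_annotations`, the dictionary layout, and the toy gold and predicted annotations are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Dict, Set, Tuple


def score_annotations(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact term-ID matches.

    Assumption: a predicted term counts as correct only if its identifier
    is identical to a gold identifier for the same input label.
    """
    tp = fp = fn = 0
    # Visit every label seen in either mapping so that spurious
    # predictions for unlisted labels still count as false positives.
    for label in gold.keys() | predicted.keys():
        gold_ids = gold.get(label, set())
        pred_ids = predicted.get(label, set())
        tp += len(gold_ids & pred_ids)   # terms found in both sets
        fp += len(pred_ids - gold_ids)   # predicted but not in gold
        fn += len(gold_ids - pred_ids)   # in gold but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only (not taken from the paper).
    gold = {
        "liver": {"UBERON:0002107"},
        "B cell": {"CL:0000236"},
        "heart": {"UBERON:0000948"},
    }
    predicted = {
        "liver": {"UBERON:0002107"},   # correct
        "B cell": {"CL:0000084"},      # wrong term
        # "heart" receives no annotation at all
    }
    p, r, f = score_annotations(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With this toy input there is one true positive, one false positive, and two false negatives, so precision is 0.50, recall is 0.33, and F1 is 0.40. A scheme that gives partial credit for related terms, such as a parent or sibling in the ontology, would need a different matching rule.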
Paper Citation
in Harvard Style
Riquelme-García A., Mulero-Hernández J. and Fernández-Breis J. (2025). Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD; ISBN 978-989-758-769-6, SciTePress, pages 128-135. DOI: 10.5220/0013740700004000
in Bibtex Style
@conference{keod25,
author={Andrea Riquelme-García and Juan Mulero-Hernández and Jesualdo Fernández-Breis},
title={Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD},
year={2025},
pages={128-135},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013740700004000},
isbn={978-989-758-769-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 2: KEOD
TI - Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation
SN - 978-989-758-769-6
AU - Riquelme-García A.
AU - Mulero-Hernández J.
AU - Fernández-Breis J.
PY - 2025
SP - 128
EP - 135
DO - 10.5220/0013740700004000
PB - SciTePress