
tion from labels, our work was limited in its effectiveness in
retrieving appropriate identifiers from certain ontologies. We
propose exploring alternative approaches within the RAG framework,
in particular replacing the BioPortal Annotator with a method that
interacts directly with the ontology structure, for example by
leveraging ontology graphs or embeddings, which may offer a more
effective and flexible way of identifying relevant terms.
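As an illustration of the embedding-style retrieval proposed above, the following is a minimal sketch and not the method evaluated in this work. It ranks ontology terms by cosine similarity between a sample label and each term's names. Character trigram counts stand in for a learned embedding, and the term identifiers (`EX:0001`, etc.), the `TERMS` index, and the function names are hypothetical placeholders rather than real ontology entries or an existing API.

```python
import math
from collections import Counter


def ngram_vector(text, n=3):
    """Bag of character n-grams: a crude stand-in for a learned embedding."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical index: placeholder identifiers mapped to a preferred name
# and a synonym. A real index would be built from the ontology itself.
TERMS = {
    "EX:0001": ["liver", "hepatic tissue"],
    "EX:0002": ["heart", "cardiac tissue"],
    "EX:0003": ["lung", "pulmonary tissue"],
}


def retrieve(label, terms=TERMS, top_k=2):
    """Return the top_k (term_id, score) pairs for a sample label."""
    query = ngram_vector(label)
    scored = []
    for term_id, names in terms.items():
        # Score a term by its best-matching name (preferred name or synonym).
        best = max(cosine(query, ngram_vector(name)) for name in names)
        scored.append((best, term_id))
    scored.sort(reverse=True)
    return [(term_id, round(score, 3)) for score, term_id in scored[:top_k]]


if __name__ == "__main__":
    print(retrieve("liver tissue"))
```

In a RAG setting, the ranked candidates would be passed to the language model as context in place of the annotator output. Surface similarity of this kind does not capture semantic relations, so a learned embedding model or graph traversal would be needed for labels that do not lexically resemble any term name.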
5 CONCLUSIONS
This study presented a comparative evaluation of three methods for
the automatic annotation of biomedical labels using four widely
adopted ontologies: the base GPT-4o-mini model, a fine-tuned version
of the same model, and a RAG-based approach. The results show that
the effectiveness of each method depends on the ontology and on the
nature of the labels. The fine-tuned model performs strongly when
domain-specific training supports semantic inference, particularly
for CL and UBERON. Conversely, the RAG approach is more effective
where label phrasing closely corresponds to external knowledge
sources, as observed with CLO and BTO for cell lines, and with
UBERON and BTO for anatomical structures. The limitations of tools
like BioPortal, which lack semantic inference capabilities,
highlight the need for more flexible, ontology-aware approaches
within the RAG method. Future work will focus on improving the
integration of ontological knowledge within RAG frameworks to
enhance the accuracy and generalizability of automated annotation.
ACKNOWLEDGEMENTS
This research has been funded by MICIU/AEI/10.13039/501100011033/
[grant numbers PID2020-113723RB-C22, PID2024-155257OB-I00].
REFERENCES
Bernabé, C. H., Queralt-Rosinach, N., Silva Souza, V. E.,
Bonino da Silva Santos, L. O., Mons, B., Jacobsen,
A., and Roos, M. (2023). The use of foundational
ontologies in biomedical research. Journal of
Biomedical Semantics, 14(1):21.
Chaudhari, J. K., Pant, S., Jha, R., Pathak, R. K., and Singh,
D. B. (2024). Biological big-data sources, problems
of storage, computational issues, and applications: a
comprehensive review. Knowledge and Information
Systems, pages 1–51.
Gonçalves, R. S., Payne, J., Tan, A., Benitez, C., Haddock,
J., and Gentleman, R. (2024). The text2term tool to
map free-text descriptions of biomedical terms to
ontologies. Database, 2024:baae119.
Jahan, I., Laskar, M. T. R., Peng, C., and Huang, J. X.
(2024). A comprehensive evaluation of large language
models on benchmark biomedical text processing
tasks. Computers in Biology and Medicine,
171:108189.
Jonquet, C., Shah, N. H., Youn, C. H., Musen, M. A.,
Callendar, C., and Storey, M.-A. (2009). NCBO
Annotator: semantic annotation of biomedical data. In
ISWC 2009-8th International Semantic Web
Conference, Poster and Demo Session.
Morris, J. H., Soman, K., Akbas, R. E., Zhou, X., Smith, B.,
Meng, E. C., Huang, C. C., Cerono, G., Schenk, G.,
Rizk-Jackson, A., Harroud, A., Sanders, L., Costes,
S. V., Bharat, K., Chakraborty, A., Pico, A. R.,
Mardirossian, T., Keiser, M., Tang, A., and Baranzini,
S. E. (2023). The scalable precision medicine open
knowledge engine (SPOKE): A massive knowledge
graph of biomedical information. Bioinformatics,
39(2):btad080.
Mulero-Hernández, J. and Fernández-Breis, J. T. (2022).
Analysis of the landscape of human enhancer
sequences in biological databases. Computational and
Structural Biotechnology Journal, 20:2728–2744.
Mulero-Hernández, J., Mironov, V., Miñarro-Giménez,
J. A., Kuiper, M., and Fernández-Breis, J. T. (2024).
Integration of chromosome locations and functional
aspects of enhancers and topologically associating
domains in knowledge graphs enables versatile queries
about gene regulation. Nucleic Acids Research,
52(15):e69–e69.
Ng, K. K. Y., Matsuba, I., and Zhang, P. C. (2025). RAG in
health care: A novel framework for improving
communication and decision-making by addressing LLM
limitations. NEJM AI, 2(1):AIra2400380.
Riquelme-García, A., Mulero-Hernández, J., and
Fernández-Breis, J. T. (2025). Annotation of
biological samples data to standard ontologies with
support from large language models. Computational
and Structural Biotechnology Journal, 27:2155–2167.
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W.,
Ceusters, W., Goldberg, L. J., Eilbeck, K., Ireland, A.,
Mungall, C. J., et al. (2007). The OBO Foundry:
coordinated evolution of ontologies to support biomedical
data integration. Nature Biotechnology, 25(11):1251–1255.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J.,
Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten,
J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman,
J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I.,
Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., and
Mons, B. (2016). The FAIR guiding principles for
scientific data management and stewardship. Scientific
Data, 3(1):160018.
Integrating Retrieval-Augmented Generation with the BioPortal Annotator for Biological Sample Annotation