Towards More Robust Transcription Factor Binding Site Classifiers Using Out-of-Distribution Data

István Megyeri, Gergely Pap

2025

Abstract

The use of deep learning methods for solving tasks in computational biology has increased in recent years. Many challenging problems, such as gene expression prediction, splicing pattern identification, and DNA-protein binding site classification, are now addressed with novel deep learning architectures, training strategies, and techniques. Moreover, interpretability has become a key component of the methods used to solve computational biology tasks. Gaining novel insight by analyzing the learners is a key factor. However, most deep learning models are hard to interpret, and they are prone to learning features that generalize poorly. In this study, we examine the robustness of high-performing neural networks using in-distribution (ID) and out-of-distribution (OOD) examples. We demonstrate our findings on two different tasks taken from the domain of DNA-protein binding site classification and show that overconfident and incorrect predictions result from training data built exclusively from ID samples. Adding OOD data to the training process enhances the reliability of the networks and improves performance on the ID tasks.


Paper Citation


in Harvard Style

Megyeri I. and Pap G. (2025). Towards More Robust Transcription Factor Binding Site Classifiers Using Out-of-Distribution Data. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 40-47. DOI: 10.5220/0013076600003890


in Bibtex Style

@conference{icaart25,
author={István Megyeri and Gergely Pap},
title={Towards More Robust Transcription Factor Binding Site Classifiers Using Out-of-Distribution Data},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={40--47},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013076600003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Towards More Robust Transcription Factor Binding Site Classifiers Using Out-of-Distribution Data
SN - 978-989-758-737-5
AU - Megyeri I.
AU - Pap G.
PY - 2025
SP - 40
EP - 47
DO - 10.5220/0013076600003890
PB - SciTePress