Efficient Representation of Biochemical Structures for Supervised and Unsupervised Machine Learning Models Using Multi-Sensoric Embeddings

Katrin Bohnsack, Alexander Engelsberger, Marika Kaden, Thomas Villmann

2023

Abstract

We present an approach to efficiently embed complex data objects from the chem- and bioinformatics domain, such as graph structures, into Euclidean vector spaces so that such data sets can be handled by machine learning models. The method is denoted as the sensoric response principle (SRP). It uses a small subset of objects serving as so-called sensors. The computationally demanding dissimilarity calculations, e.g. graph kernel computations, have to be executed only against these sensors, and the resulting response values are used to generate the object embedding into a Euclidean representation space. Thus, the SRP avoids calculating all pairwise object dissimilarities for the embedding, which is usually computationally costly due to the complex proximity measures in use. In particular, we consider strategies to determine the number of sensors needed for an appropriate embedding as well as sensor selection strategies for the SRP. Finally, the quality of the embedding is evaluated with respect to the preservation of the original object relations in the embedding space. The SRP can be used for unsupervised and supervised machine learning. We demonstrate the ability of the approach for classification learning in the context of an interpretable machine learning classifier.
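The core idea of the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each object is represented by its vector of dissimilarities to the sensors, that sensors are drawn uniformly at random (the paper studies selection strategies that the abstract does not detail), and that a cheap string mismatch count stands in for an expensive graph kernel. The names `srp_embed`, `select_sensors_random`, and `toy_dissimilarity` are hypothetical.

```python
import numpy as np


def toy_dissimilarity(a, b):
    """Stand-in for a costly proximity measure (assumption, not a graph kernel):
    position-wise mismatches plus the length difference of two strings."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def select_sensors_random(objects, n_sensors, seed=0):
    """Pick sensors uniformly at random without replacement (assumed strategy)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(objects), size=n_sensors, replace=False)
    return [objects[i] for i in idx]


def srp_embed(objects, sensors, dissimilarity):
    """Map each object to its response vector: one dissimilarity per sensor.
    This needs len(objects) * len(sensors) evaluations instead of all pairs."""
    return np.array(
        [[dissimilarity(obj, s) for s in sensors] for obj in objects],
        dtype=float,
    )


objects = ["ACGT", "ACGA", "TTTT", "ACCT", "GGGT"]
sensors = select_sensors_random(objects, n_sensors=2)
embedding = srp_embed(objects, sensors, toy_dissimilarity)

print(embedding.shape)  # one row per object, one column per sensor
```

With 5 objects and 2 sensors, the sketch evaluates the dissimilarity 10 times rather than the 25 entries of a full pairwise matrix. The resulting rows are ordinary Euclidean vectors and can be passed to any vector-based learner.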


Paper Citation


in Harvard Style

Bohnsack K., Engelsberger A., Kaden M. and Villmann T. (2023). Efficient Representation of Biochemical Structures for Supervised and Unsupervised Machine Learning Models Using Multi-Sensoric Embeddings. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-631-6, SciTePress, pages 59-69. DOI: 10.5220/0011644000003414


in Bibtex Style

@conference{bioinformatics23,
author={Katrin Bohnsack and Alexander Engelsberger and Marika Kaden and Thomas Villmann},
title={Efficient Representation of Biochemical Structures for Supervised and Unsupervised Machine Learning Models Using Multi-Sensoric Embeddings},
booktitle={Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 3: BIOINFORMATICS},
year={2023},
pages={59-69},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011644000003414},
isbn={978-989-758-631-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 3: BIOINFORMATICS
TI - Efficient Representation of Biochemical Structures for Supervised and Unsupervised Machine Learning Models Using Multi-Sensoric Embeddings
SN - 978-989-758-631-6
AU - Bohnsack K.
AU - Engelsberger A.
AU - Kaden M.
AU - Villmann T.
PY - 2023
SP - 59
EP - 69
DO - 10.5220/0011644000003414
PB - SciTePress