Biomedical Dataset Recommendation

Xu Wang, Frank van Harmelen, Zhisheng Huang

2021

Abstract

Dataset search is a special application of information retrieval that aims to help scientists find the datasets they need. Current dataset search engines are query-driven, so the results are limited by the user's ability to formulate an appropriate query. In this paper we address this limitation by framing dataset search as a recommendation task: given a dataset supplied by the user, the search engine recommends similar datasets. We solve this dataset recommendation task using a similarity approach, and we provide a simple benchmark task for evaluating different approaches to it. We evaluate the task in the biomedical domain, benchmarking 8 different similarity metrics between datasets, including both ontology-based techniques and techniques from machine learning (ML). Our results show that recommending scientific datasets based on metadata as it occurs in realistic dataset collections is a hard task. None of the ontology-based methods performs well on this task, and they are outscored by the majority of the ML methods. Of the ML methods, only one performs reasonably well, and even then it reaches only 70% accuracy.
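
The idea of recommending by similarity can be illustrated with a minimal sketch. It is not one of the eight metrics benchmarked in the paper: it assumes each dataset is described by a short free-text metadata string and swaps in a plain bag-of-words cosine similarity. The function names (`tokenize`, `cosine_similarity`, `recommend`) and the records in `METADATA` are hypothetical and invented for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id, metadata, top_k=3):
    """Rank all other datasets by metadata similarity to the query dataset."""
    vectors = {ds_id: Counter(tokenize(text)) for ds_id, text in metadata.items()}
    query_vec = vectors[query_id]
    scored = [
        (ds_id, cosine_similarity(query_vec, vec))
        for ds_id, vec in vectors.items()
        if ds_id != query_id  # never recommend the query dataset itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical metadata records, invented for illustration only.
METADATA = {
    "ds1": "gene expression profiles in breast cancer tissue",
    "ds2": "breast cancer gene expression microarray study",
    "ds3": "air quality measurements in urban areas",
}

if __name__ == "__main__":
    for ds_id, score in recommend("ds1", METADATA, top_k=2):
        print(ds_id, round(score, 3))
```

Under these assumptions the user supplies a dataset identifier instead of a query string, and the ranking is driven entirely by token overlap in the metadata. An ontology-based or learned similarity could replace `cosine_similarity` without changing `recommend`.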

Paper Citation


in Harvard Style

Wang X., van Harmelen F. and Huang Z. (2021). Biomedical Dataset Recommendation. In Proceedings of the 10th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-521-0, pages 192-199. DOI: 10.5220/0010521801920199


in Bibtex Style

@conference{data21,
author={Xu Wang and Frank van Harmelen and Zhisheng Huang},
title={Biomedical Dataset Recommendation},
booktitle={Proceedings of the 10th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2021},
pages={192-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010521801920199},
isbn={978-989-758-521-0},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Biomedical Dataset Recommendation
SN - 978-989-758-521-0
AU - Wang X.
AU - van Harmelen F.
AU - Huang Z.
PY - 2021
SP - 192
EP - 199
DO - 10.5220/0010521801920199