
Authors: Xu Wang; Frank van Harmelen and Zhisheng Huang

Affiliation: Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands

Keyword(s): Dataset Recommendation, Scientific Datasets.

Abstract: Dataset search is a special application of information retrieval that aims to help scientists find the datasets they need. Current dataset search engines are query-driven, so their results are limited by the user's ability to formulate an appropriate query. In this paper we aim to overcome this limitation by framing dataset search as a recommendation task: given a dataset supplied by the user, the search engine recommends similar datasets. We solve this dataset recommendation task using a similarity approach. We provide a simple benchmark task for evaluating different approaches to dataset recommendation, and we evaluate several similarity approaches on it in the biomedical domain. We benchmark 8 different similarity metrics between datasets, including both ontology-based techniques and machine-learning techniques. Our results show that recommending scientific datasets based on metadata as it occurs in realistic dataset collections is a hard task. None of the ontology-based methods performs well on this task, and they are outscored by the majority of the machine-learning methods. Of these ML methods, only one performs reasonably well, and even then it reaches only 70% accuracy.
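
The recommendation framing described in the abstract (given one dataset, rank the others by similarity) can be illustrated with a minimal sketch. It is not the authors' method and does not reproduce any of the 8 metrics they benchmark: it assumes each dataset is represented by a single metadata string and swaps in Jaccard overlap of word tokens as a stand-in similarity. The function names, dataset identifiers and metadata strings are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase a metadata string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(query_id: str, metadata: Dict[str, str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank every other dataset by token overlap with the query dataset's metadata."""
    query_tokens = tokenize(metadata[query_id])
    scored = [
        (other_id, jaccard(query_tokens, tokenize(text)))
        for other_id, text in metadata.items()
        if other_id != query_id
    ]
    # Highest similarity first; ties broken by identifier for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical toy metadata, invented for illustration only.
metadata = {
    "ds1": "Gene expression in human breast cancer tissue",
    "ds2": "Breast cancer gene expression profiles",
    "ds3": "Rainfall measurements in Amsterdam",
}

for dataset_id, score in recommend("ds1", metadata, top_k=2):
    print(dataset_id, round(score, 2))
```

Any other similarity function over metadata, whether ontology-based or learned, could replace `jaccard` without changing the ranking loop, which is what makes a shared benchmark task for comparing metrics possible.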



Paper citation in several formats:
Wang, X.; van Harmelen, F. and Huang, Z. (2021). Biomedical Dataset Recommendation. In Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-521-0; ISSN 2184-285X, SciTePress, pages 192-199. DOI: 10.5220/0010521801920199

author={Xu Wang and Frank {van Harmelen} and Zhisheng Huang},
title={Biomedical Dataset Recommendation},
booktitle={Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA},


JO - Proceedings of the 10th International Conference on Data Science, Technology and Applications - DATA
TI - Biomedical Dataset Recommendation
SN - 978-989-758-521-0
IS - 2184-285X
AU - Wang, X.
AU - van Harmelen, F.
AU - Huang, Z.
PY - 2021
SP - 192
EP - 199
DO - 10.5220/0010521801920199
PB - SciTePress