Authors: Xu Wang ; Frank Van Harmelen and Zhisheng Huang

Affiliation: Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands

Keyword(s): Ontology Classification, Domain Classification, Semantic Similarity, Data Science, Google Distance.

Abstract: Scientific datasets are increasingly stored, published, and re-used online. This has prompted major search engines to start services dedicated to finding research datasets online. However, to date such services are limited to keyword search and provide little or no semantic guidance. Determining the scientific domain of a given dataset is a crucial part of dataset recommendation and search: "Which research domain does this dataset belong to?" In this paper we investigate and compare a number of novel ontology-based methods to answer that question, using the distance between a domain ontology and a dataset as an estimator for the domain(s) into which the dataset should be classified. We also define a simple keyword-based classifier based on the Normalized Google Distance, and we evaluate all classifiers on a hand-constructed gold standard. Our two main findings are that the seemingly simple task of determining the domain(s) of a dataset is much harder than expected (even when performed under highly simplified circumstances), and that, again surprisingly, the use of ontologies seems to be of little help in this task, with the simple keyword-based classifier outperforming every ontology-based classifier. We constructed a gold-standard benchmark for our experiments, which we make available online for others to use.
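
The abstract mentions a keyword-based classifier built on the Normalized Google Distance (NGD) but does not spell out how it is computed. As a rough illustration only, the sketch below implements the standard NGD formula from page-hit counts and picks the domain label with the smallest distance to a dataset keyword. The function names `ngd` and `closest_domain`, the count dictionaries, and all numbers are hypothetical and made up for this example; this is not the authors' classifier, and their actual keyword selection and scoring may differ.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance between two terms.

    fx, fy: number of pages containing each term on its own
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def closest_domain(term, domains, counts, pair_counts, n):
    """Return the domain label with the smallest NGD to `term`, plus all scores."""
    scores = {
        d: ngd(counts[term], counts[d], pair_counts.get((term, d), 0), n)
        for d in domains
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up hit counts, for illustration only.
    counts = {"genome": 1000, "biology": 2000, "economics": 2000}
    pair_counts = {("genome", "biology"): 500, ("genome", "economics"): 5}
    best, scores = closest_domain(
        "genome", ["biology", "economics"], counts, pair_counts, n=1_000_000
    )
    print(best, scores)
```

With these made-up counts, "genome" co-occurs far more often with "biology" than with "economics", so "biology" receives the smaller distance. A real classifier would need hit counts from an actual search index and some way of aggregating over several keywords per dataset, neither of which is specified here.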

CC BY-NC-ND 4.0

Paper citation in several formats:
Wang, X.; Van Harmelen, F. and Huang, Z. (2020). Ontology-based Methods for Classifying Scientific Datasets into Research Domains: Much Harder than Expected. In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - KDIR; ISBN 978-989-758-474-9; ISSN 2184-3228, SciTePress, pages 153-160. DOI: 10.5220/0010056101530160

@conference{kdir20,
author={Xu Wang and Frank {Van Harmelen} and Zhisheng Huang},
title={Ontology-based Methods for Classifying Scientific Datasets into Research Domains: Much Harder than Expected},
booktitle={Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - KDIR},
year={2020},
pages={153-160},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010056101530160},
isbn={978-989-758-474-9},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - KDIR
TI - Ontology-based Methods for Classifying Scientific Datasets into Research Domains: Much Harder than Expected
SN - 978-989-758-474-9
IS - 2184-3228
AU - Wang, X.
AU - Van Harmelen, F.
AU - Huang, Z.
PY - 2020
SP - 153
EP - 160
DO - 10.5220/0010056101530160
PB - SciTePress