Authors:
Muhammad Asif Suryani; Saurav Karmakar; Brigitte Mathiak and Philipp Mayr
Affiliation:
Knowledge Technologies for the Social Sciences, GESIS – Leibniz-Institut für Sozialwissenschaften, Köln, Germany
Keyword(s):
Hugging Face, Metadata Exploration, Metadata Collection, Large Language Models, Research Data Management, Multidisciplinary Research, Dataset.
Abstract:
Metadata features carry valuable information that can support researchers in their tasks. Several studies have incorporated scholarly metadata, highlighting its usefulness at a certain level of granularity for numerous research tasks. The emergence of Large Language Models (LLMs) has brought an exciting change to the fields of Artificial Intelligence (AI) and Machine Learning (ML), a change equally supported by the Open Science initiative and the FAIR principles. One of the prominent platforms that ensures the availability of these models to research communities is Hugging Face. It provides democratized access to models while experiencing rapid growth as a repository. As of March 2025, Hugging Face hosts more than 1.4 million models, up from approximately 0.5 million in February 2024. In this dataset paper, we provide information on a large fraction of Hugging Face model cards. Our dataset comprises a wide range of metadata features that describe each model card. In this work, we aim to provide democratized access to a collection of diverse metadata features from Hugging Face model cards and to present an insightful overview of these cards, leveraging the metadata to support research communities by facilitating model adoption.