
be shared on Zenodo to facilitate the adoption of AI
across research communities.
This work opens exciting future directions, such as a
model recommender system and Hugging Face model
card leaderboards that provide users with valuable
recommendations based on the metadata features of model
cards, alongside the technical descriptions available on
the landing page of each model card. Moreover,
incorporating the current development status of model cards
into the recommendations would be worthwhile, as it would
reflect the current state of development of the models.
Furthermore, an exciting direction is the harmonization
of research artifacts across scholarly repositories,
which would enrich the research ecosystem
with more linked information for researchers, such as
“Authors with Models” and “Organization with Models”.
In addition, exploiting model card provenance is another
promising direction that may yield valuable insights.
ACKNOWLEDGEMENTS
This work has been partially funded by the Deutsche
Forschungsgemeinschaft (DFG, German Research
Foundation), NFDI4DS (Grant number 460234259).
The authors also acknowledge Hugging Face as the data
source and thank the individuals involved in
this research.
REFERENCES
Castaño, J., Martínez-Fernández, S., Franch, X., and Bogner, J. (2023). Exploring the carbon footprint of Hugging Face's ML models: A repository mining study. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–12. IEEE.
Culbert, J. H. (2023). 4tct, a 4chan text collection tool.
arXiv preprint arXiv:2307.03556.
Dang, V.-N., Aussenac-Gilles, N., Megdiche, I., and Ravat,
F. (2023). Interoperability of open science metadata:
What about the reality? In International Conference
on Research Challenges in Information Science, pages
467–482. Springer.
Di Sipio, C., Rubei, R., Di Rocco, J., Di Ruscio, D., and Nguyen, P. T. (2024). Automated categorization of pre-trained models for software engineering: A case study with a Hugging Face dataset. arXiv preprint arXiv:2405.13185.
Du, N., Guo, J., Wu, C. Q., Hou, A., Zhao, Z., and Gan, D. (2020). Recommendation of academic papers based on heterogeneous information networks. In 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA), pages 1–6. IEEE.
Hugging Face (2025). Hugging Face APIs. https://huggingface.co. Accessed: 2025-02-20.
Hulsebos, M., Lin, W., Shankar, S., and Parameswaran, A. (2024). It took longer than I was expecting: Why is dataset search still so hard? In Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, pages 1–4.
Liu, J., Tang, T., Wang, W., Xu, B., Kong, X., and Xia, F. (2018). A survey of scholarly data visualization. IEEE Access, 6:19205–19221.
McMillan-Major, A., Osei, S., Rodriguez, J. D., Ammanamanchi, P. S., Gehrmann, S., and Jernite, Y. (2021). Reusable templates and guides for documenting datasets and models for natural language processing and generation: A case study of the HuggingFace and GEM data and model cards. arXiv preprint arXiv:2108.07374.
Pepe, F., Nardone, V., Mastropaolo, A., Bavota, G., Canfora, G., and Di Penta, M. (2024). How do Hugging Face models document datasets, bias, and licenses? An empirical study. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, pages 370–381.
Research Organization Registry (2025). ROR data.
Suryani, M. A., Karmakar, S., and Mathiak, B. (2024). Exploration of Hugging Face models by heterogeneous information network and linking across scholarly repositories. In International Conference on Advances in Social Networks Analysis and Mining, pages 371–386. Springer.
Suryani, M. A., Karmakar, S., Mathiak, B., Mutschke, P., and Mayr, P. (2025). Hugging Face model cards metadata dataset.
Warzel, D., Fitzmartin, R., Zhou, F., et al. (2020). FAIR data sharing: The roles of common data elements and harmonization. Journal of Biomedical Informatics, 107:103421.
Yang, Z., Shi, J., Devanbu, P., and Lo, D. (2024). Ecosystem of large language models for code. arXiv preprint arXiv:2405.16746.
Yang, Z., Wang, C., Shi, J., Hoang, T., Kochhar, P., Lu, Q., Xing, Z., and Lo, D. (2023). What do users ask in open-source AI repositories? An empirical study of GitHub issues. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), pages 79–91. IEEE.
DATA 2025 - 14th International Conference on Data Science, Technology and Applications
590