
extracting keyphrases and relations from scientific
publications. arXiv preprint arXiv:1704.02853.
Chalkidis, I., Fergadiotis, M., Malakasiotis, P., and
Androutsopoulos, I. (2019). Large-scale multi-label
text classification on EU legislation. arXiv preprint
arXiv:1906.02192.
Lin, C.-Y. (2004). ROUGE: A package for automatic
evaluation of summaries. In Proceedings of the Workshop
on Text Summarization Branches Out, 2004.
Cohan, A., Dernoncourt, F., Kim, D. S., Bui, T., Kim, S.,
Chang, W., and Goharian, N. (2018). A Discourse-
Aware Attention Model for Abstractive Summariza-
tion of Long Documents. In Proceedings of the 2018
Conference of the North American Chapter of the As-
sociation for Computational Linguistics: Human Lan-
guage Technologies, Volume 2 (Short Papers), pages
615–621, New Orleans, Louisiana. Association for
Computational Linguistics.
Dalianis, H. (2018). Clinical Text Mining: Secondary Use
of Electronic Patient Records. Springer International
Publishing, Cham, 1st edition.
Deng, D., Fernandez, R. C., Abedjan, Z., Wang, S., Stone-
braker, M., Elmagarmid, A. K., Ilyas, I. F., Madden,
S., Ouzzani, M., and Tang, N. (2017). The Data
Civilizer system. In 8th Biennial Conference on
Innovative Data Systems Research, CIDR 2017, Chaminade,
CA, USA, January 8-11, 2017, Online Proceedings.
www.cidrdb.org.
Ehrlinger, L., Schrott, J., Melichar, M., Kirchmayr, N.,
and Wöß, W. (2021). Data Catalogs: A Systematic
Literature Review and Guidelines to Implementation.
In Kotsis, G., Tjoa, A. M., Khalil, I., Moser, B.,
Mashkoor, A., Sametinger, J., Fensel, A., Martinez-
Gil, J., Fischer, L., Czech, G., Sobieczky, F., and
Khan, S., editors, Database and Expert Systems Appli-
cations - DEXA 2021 Workshops, volume 1479, pages
148–158. Springer International Publishing, Cham.
Eichler, R., Giebler, C., Gröger, C., Hoos, E., Schwarz, H.,
and Mitschang, B. (2021). Enterprise-Wide Metadata
Management: An Industry Case on the Current State
and Challenges. Business Information Systems, pages
269–279.
Eichler, R., Gröger, C., and Hoos, E. (2022). Data shopping
— how an enterprise data marketplace supports data
democratization in companies. Lecture Notes in
Business Information Processing, 452:19–26.
Google LLC (2024). Text embeddings API documentation.
Online documentation. Retrieved November 13, 2024, from
https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api.
Goyal, N., Gao, C., Chaudhary, V., Chen, P.-J., Wenzek, G.,
Ju, D., Krishnan, S., Ranzato, M., Guzmán, F., and
Fan, A. (2022). The FLORES-101 evaluation benchmark
for low-resource and multilingual machine translation.
Transactions of the Association for Computational
Linguistics, 10:522–538.
Grandini, M., Bagli, E., and Visani, G. (2020). Metrics for
multi-class classification: an overview. arXiv preprint
arXiv:2008.05756.
Guo, M., Ainslie, J., Uthus, D., Ontanon, S., Ni, J., Sung,
Y.-H., and Yang, Y. (2021). LongT5: Efficient
text-to-text transformer for long sequences. arXiv preprint
arXiv:2112.07916.
Hadi, M. U., Qureshi, R., Shah, A., Irfan, M., Zafar, A.,
Shaikh, M. B., Akhtar, N., Wu, J., and Mirjalili,
S. (2023). Large language models: A comprehen-
sive survey of its applications, challenges, limitations,
and future prospects. Authorea Preprints, 1:1–26.
Preprint.
Hasselaar, E., Silva, A., Zahidi, S., Decety, N., Daugh-
erty, P., Espinosa, H., Horn, A., Ryan, M., Nanan, C.,
O’Reilly, K., and Yosef, L. (2023). Jobs of Tomorrow:
Large Language Models and Jobs – A Business
Toolkit. White Paper, World Economic Forum.
Hattori, T., Takahashi, K., and Tamura, K. (2022). IGES
NDC Database.
Horodyski, J. (2022). Metadata matters. Taylor and Fran-
cis, Boca Raton.
Hulth, A. (2003). Improved automatic keyword extraction
given more linguistic knowledge. In Proceedings of
the 2003 Conference on Empirical Methods in Natural
Language Processing, pages 216–223.
Jahnke, N. and Otto, B. (2023). Data catalogs in the en-
terprise: applications and integration. Datenbank-
Spektrum, 23(2):89–96.
Jeffery, K. (2020). Data curation and preservation. In To-
wards Interoperable Research Infrastructures for En-
vironmental and Earth Sciences: A Reference Model
Guided Approach for Common Challenges, pages
123–139. Springer.
Jenkins, C., Jackson, M., Burden, P., and Wallis, J. (1999).
Automatic RDF metadata generation for resource
discovery. Computer Networks, 31(11–16):1305–1320.
Jurafsky, D. and Martin, J. H. (2024). Speech and Language
Processing. https://web.stanford.edu/, 3rd edition.
Kalyanathaya, K. P., Akila, D., and Suseendren, G. (2019).
A fuzzy approach to approximate string matching
for text retrieval in NLP. J. Comput. Inf. Syst. USA,
15(3):26–32.
Kim, S. N., Medelyan, O., Kan, M.-Y., and Baldwin, T.
(2010). SemEval-2010 task 5: Automatic keyphrase
extraction from scientific articles. In Proceedings of
the 5th International Workshop on Semantic Evalu-
ation, SemEval ’10, pages 21–26, USA. Association
for Computational Linguistics.
Kong, A., Zhao, S., Chen, H., Li, Q., Qin, Y., Sun,
R., and Bai, X. (2023). PromptRank: Unsupervised
keyphrase extraction using prompt. arXiv preprint
arXiv:2305.04490.
Labadie, C., Legner, C., Eurich, M., and Fadler, M. (2020).
FAIR enough? Enhancing the usage of enterprise data
with data catalogs. In 2020 IEEE 22nd Conference
on Business Informatics (CBI), volume 1, pages 201–
210. IEEE.
M-Files (2019). The 2019 intelligent information manage-
ment benchmark report. White paper, M-Files. Ac-
cessed: July 7, 2024.
DATA 2025 - 14th International Conference on Data Science, Technology and Applications