A Hybrid Approach for Mining the Organizational Structure from University Websites
Arman Arzani, Theodor Josef Vogl, Marcus Handte, Pedro José Marrón
2025
Abstract
To support innovation coaches in scouting activities such as discovering expertise and trends inside a university and finding potential innovators, we designed INSE, an innovation search engine that automates data gathering and analysis. The primary goal of INSE is to provide comprehensive system support across all stages of innovation scouting, reducing the need for manual data collection and aggregation. To provide innovation coaches with the necessary information on individuals, INSE must first establish the structure of the organization. This includes identifying the associated staff and researchers in order to assess their academic activities. While this could in theory be done manually, the task is error-prone and virtually impossible for large organizations. In this paper, we propose a generic organization mining approach for university websites that combines a rule-based algorithm, LLMs, and a fine-tuned sequence-to-sequence classifier, independent of web technologies, content management systems, or website layout. We implement the approach and evaluate it on the websites of four universities, namely Duisburg-Essen, Münster, Dortmund, and Wuppertal. The evaluation indicates that our approach is generic and identifies university aggregator pages with an F1 score above 85% and landing pages of entities with F1 scores of 100% for faculties and above 78% for institutes and chairs.
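For readers unfamiliar with the metric, the F1 score reported in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed for a set of predicted pages against a manually labeled gold standard. The function name `f1_score` and the example URLs are hypothetical and are not taken from the paper, and the authors' actual evaluation procedure may differ.

```python
def f1_score(predicted: set[str], expected: set[str]) -> float:
    """Return the F1 score of predicted page URLs against a gold-standard set."""
    true_positives = len(predicted & expected)
    if true_positives == 0:
        # No correct prediction: precision and recall are both zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical URLs, for illustration only (not data from the paper).
expected = {
    "https://example.edu/faculty-a",
    "https://example.edu/faculty-b",
    "https://example.edu/institute-c",
}
predicted = {
    "https://example.edu/faculty-a",
    "https://example.edu/faculty-b",
    "https://example.edu/news",
}
score = f1_score(predicted, expected)
```

In this toy case two of the three predictions are correct and two of the three expected pages are found, so precision and recall are both 2/3.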
Paper Citation
in Harvard Style
Arzani A., Vogl T., Handte M. and Marrón P. (2025). A Hybrid Approach for Mining the Organizational Structure from University Websites. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; SciTePress, pages 188-199. DOI: 10.5220/0013658600004000
in Bibtex Style
@conference{kdir25,
author={Arman Arzani and Theodor Vogl and Marcus Handte and Pedro Marrón},
title={A Hybrid Approach for Mining the Organizational Structure from University Websites},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={188-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013658600004000},
isbn={},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - A Hybrid Approach for Mining the Organizational Structure from University Websites
SN -
AU - Arzani A.
AU - Vogl T.
AU - Handte M.
AU - Marrón P.
PY - 2025
SP - 188
EP - 199
DO - 10.5220/0013658600004000
PB - SciTePress