The WikiWooW Dataset: Harnessing Semantic Similarity and Clickstream-Data for Serendipitous Hyperlinked-Paths Mining in Wikipedia

Cosimo Palma, Bence Molnár

2025

Abstract

This paper introduces WikiWooW, a dataset generator designed for distilling a formal model of serendipity for Wikipedia entity pairs. The task, foundational to mining serendipitous hyperlinked paths, builds upon cognitive theory and exploits sub-components of serendipity: graph centrality, popularity, clickstream, corpus-based similarity, and knowledge-based similarity. Two proof-of-concept experiments were conducted, each on a different dataset. The first uses a single Wikipedia entity linked through the DBpedia dbo:wikiPageWikiLink property to 413 other entities. These pairs are looked up in Wikimedia clickstream data and scored for interestingness according to a principled mathematical model, which is validated against Amazon Mechanical Turk annotations and the authors' own annotations. The second dataset contains 146 random Wikipedia entity pairs annotated by 10 postgraduate students following detailed guidelines. Average serendipity scores are then correlated with dataset features using the original model and four alternatives. The proposed dataset generator aims to support Serendipity Mining for Computational Creativity, particularly Knowledge-based Automatic Story Generation, where serendipity matters more than similarity-based interestingness metrics. First results, despite their limitations, confirm the principles initially deduced for modelling serendipity, showing that serendipity can be effectively modelled through comprehensive parameter optimization.
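As an illustration only, the two steps named in the abstract (looking up entity pairs in clickstream data, then correlating averaged annotator scores with a dataset feature) might be sketched as follows. This is not the authors' model or code: the function names, the toy entity names, the toy counts, and the toy annotator scores are all hypothetical, and Pearson correlation is an assumption, since the abstract does not say which coefficient is used. The only external fact relied on is that Wikimedia clickstream dumps are tab-separated rows of the form prev, curr, type, n.

```python
import csv
import io
import math
from statistics import mean


def clickstream_counts(tsv_text, pairs):
    """Sum the click counts n for each requested (prev, curr) pair.

    Assumes the four-column clickstream layout: prev, curr, type, n.
    Pairs that never occur in the data are simply absent from the result.
    """
    wanted = set(pairs)
    counts = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip malformed rows
        prev, curr, _link_type, n = row
        if (prev, curr) in wanted:
            counts[(prev, curr)] = counts.get((prev, curr), 0) + int(n)
    return counts


def mean_scores(annotations):
    """Average the per-annotator scores of each entity pair."""
    return {pair: mean(scores) for pair, scores in annotations.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def feature_correlation(feature, target):
    """Correlate a feature with a target over the pairs present in both."""
    shared = sorted(set(feature) & set(target))
    return pearson([feature[p] for p in shared], [target[p] for p in shared])


# Hypothetical toy data, invented for this sketch (not taken from the paper).
TOY_TSV = "A\tB\tlink\t120\nA\tC\tlink\t15\nA\tD\tlink\t40\nB\tC\tlink\t7\n"
TOY_PAIRS = [("A", "B"), ("A", "C"), ("A", "D")]
TOY_ANNOTATIONS = {
    ("A", "B"): [1, 2, 1],
    ("A", "C"): [4, 5, 4],
    ("A", "D"): [3, 3, 2],
}

counts = clickstream_counts(TOY_TSV, TOY_PAIRS)
means = mean_scores(TOY_ANNOTATIONS)
r = feature_correlation(counts, means)
print("correlation between click count and mean score:", round(r, 3))
```

In this toy setup the pair with the most clicks receives the lowest average score, so the coefficient comes out negative. That outcome is a property of the invented numbers only and says nothing about the paper's findings.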

Paper Citation


in Harvard Style

Palma C. and Molnár B. (2025). The WikiWooW Dataset: Harnessing Semantic Similarity and Clickstream-Data for Serendipitous Hyperlinked-Paths Mining in Wikipedia. In Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN , SciTePress, pages 415-425. DOI: 10.5220/0013747000004000


in Bibtex Style

@conference{kdir25,
author={Cosimo Palma and Bence Molnár},
title={The WikiWooW Dataset: Harnessing Semantic Similarity and Clickstream-Data for Serendipitous Hyperlinked-Paths Mining in Wikipedia},
booktitle={Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2025},
pages={415-425},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013747000004000},
isbn={},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI - The WikiWooW Dataset: Harnessing Semantic Similarity and Clickstream-Data for Serendipitous Hyperlinked-Paths Mining in Wikipedia
SN -
AU - Palma C.
AU - Molnár B.
PY - 2025
SP - 415
EP - 425
DO - 10.5220/0013747000004000
PB - SciTePress