# A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph

### Andrea Calabrese, Lorenzo Cardone, Salvatore Licata, Marco Porro, Stefano Quer

#### 2023

#### Abstract

The Maximum Common Subgraph, a generalization of subgraph isomorphism, is a well-known problem in computer science. Although the problem is NP-complete, finding Maximum Common Subgraphs has many practical applications, and researchers are continuously exploring scalable heuristic approaches. One of the state-of-the-art algorithms to solve this problem is a recursive branch-and-bound procedure called McSplit. The algorithm exploits an intelligent invariant to pair vertices with the same label and adopts an effective bound prediction to prune the search space. However, the original version of McSplit uses a simple heuristic to pair vertices and to build larger subgraphs. As a consequence, a few researchers have already focused on improving the sorting heuristics to converge faster. This paper concentrates on these aspects and presents a collection of heuristics to improve McSplit and its state-of-the-art variants. We present a sorting strategy based on the well-known PageRank algorithm, and then we combine it with other approaches. We compare all the heuristics with the original McSplit procedure and against each other. In particular, we distinguish the heuristics based on the node degree from the novel ones based on the PageRank algorithm. Our experiments show that PageRank can significantly improve both McSplit and its variants in terms of convergence speed and solution size.
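To make the idea of a PageRank-based vertex ordering concrete, the following is a minimal sketch and not the authors' implementation. It assumes an undirected graph stored as adjacency lists, a damping factor of 0.85 (a common default, not a value taken from the paper), and the hypothetical helper names `pagerank` and `order_by_pagerank`. How the resulting order is plugged into McSplit's vertex selection is not shown here.

```python
def pagerank(adj, damping=0.85, iterations=100, tol=1e-10):
    """Approximate PageRank scores by power iteration.

    adj[v] is the list of neighbours of vertex v (assumed representation).
    """
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by vertices without neighbours is spread uniformly.
        dangling = sum(rank[v] for v in range(n) if not adj[v])
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for v in range(n):
            if adj[v]:
                # Each vertex passes an equal share of its rank to every neighbour.
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new_rank[w] += share
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def order_by_pagerank(adj):
    """Return vertex indices sorted by decreasing PageRank (ties by index)."""
    rank = pagerank(adj)
    return sorted(range(len(adj)), key=lambda v: (-rank[v], v))


if __name__ == "__main__":
    # Star graph: vertex 0 is connected to vertices 1, 2, and 3.
    star = [[1, 2, 3], [0], [0], [0]]
    print(order_by_pagerank(star))
```

A degree-based heuristic would sort by `len(adj[v])` instead of `rank[v]`. On small symmetric graphs such as the star above the two orders coincide, and any difference only shows up on graphs where a vertex's importance depends on the importance of its neighbours.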

#### Citation in Harvard Style

Calabrese A., Cardone L., Licata S., Porro M. and Quer S. (2023). **A Web Scraping Algorithm to Improve the Computation of the Maximum Common Subgraph**. In *Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT*; ISBN 978-989-758-665-1, SciTePress, pages 197-206. DOI: 10.5220/0012130800003538

