A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements

Julián Grigera, Juan Gardey, Alejandra Garrido, Gustavo Rossi

2021

Abstract

Most documents on the Web are generated from templates that represent user interface (UI) elements and are later filled with content. In the field of information extraction, many approaches have emerged that analyze the documents’ structure, identify features they share, and generate wrappers to extract the raw content from such documents. Consequently, most techniques documented in the literature are optimized to compare full documents, but other application areas, such as web augmentation or transcoding, require analyzing structural similarity on smaller UI components. In this paper we present two flexible algorithms to measure similarity between DOM elements, using a mixed approach that considers both the elements’ location and their inner structure. The proposed algorithms were used in two projects: an approach for automatic usability refactoring and a web accessibility helper. We also present a wrapper induction technique based on these algorithms. Additionally, we present a precision and recall evaluation that compares our algorithms with other known approaches on DOM elements of different sizes, all smaller than full-scale documents. The proposed algorithms run in linear time, so they are faster than most approaches that analyze structural similarity.


Paper Citation


in Harvard Style

Grigera J., Gardey J., Garrido A. and Rossi G. (2021). A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements. In Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-536-4, pages 174-185. DOI: 10.5220/0010716300003058


in Bibtex Style

@conference{webist21,
author={Julián Grigera and Juan Gardey and Alejandra Garrido and Gustavo Rossi},
title={A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements},
booktitle={Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2021},
pages={174-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010716300003058},
isbn={978-989-758-536-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements
SN - 978-989-758-536-4
AU - Grigera J.
AU - Gardey J.
AU - Garrido A.
AU - Rossi G.
PY - 2021
SP - 174
EP - 185
DO - 10.5220/0010716300003058