Authors: Laura Fontán; Rafael López-García; Manuel Álvarez and Alberto Pan

Affiliation: University of A Coruña, Spain

ISBN: 978-989-8565-29-7

Keyword(s): Information Retrieval, Automatic Web Data Extraction.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Clustering and Classification Methods; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems; Web Mining

Abstract: This paper presents a new technique for detecting and extracting lists of structured records from Web pages. Unlike most state-of-the-art systems, our approach can detect nested data structures (sublists), and it incorporates heuristics to remove unwanted content, such as banners and navigation menus, from the data region. The paper also describes the experiments we performed to validate the system. The precision and recall obtained in our tests both surpass 90%.
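
The precision and recall figures quoted in the abstract can be read as follows. This is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name `precision_recall` and the sample records are hypothetical and chosen only for illustration.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for extracted records against a gold set.

    precision = correct extracted records / all extracted records
    recall    = correct extracted records / all expected records
    """
    extracted_set = set(extracted)
    expected_set = set(expected)
    correct = len(extracted_set & expected_set)
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(expected_set) if expected_set else 0.0
    return precision, recall


# Hypothetical gold-standard records for one page (title, price).
gold = {
    ("Book A", "10.00"),
    ("Book B", "12.50"),
    ("Book C", "8.99"),
    ("Book D", "15.00"),
}

# Hypothetical extractor output: three correct records, one missed record,
# and one navigation menu wrongly extracted as a record.
found = {
    ("Book A", "10.00"),
    ("Book B", "12.50"),
    ("Book C", "8.99"),
    ("Home | Login", ""),
}

p, r = precision_recall(found, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.75 recall=0.75"
```

In this toy case the menu entry lowers precision and the missed record lowers recall, which is why removing banners and menus from the data region matters for both figures.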

Paper citation in several formats:
Fontán, L.; López-García, R.; Álvarez, M. and Pan, A. (2012). Automatically Extracting Complex Data Structures from the Web. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 246-251. DOI: 10.5220/0004140802460251

@conference{kdir12,
author={Laura Fontán and Rafael López{-}García and Manuel Álvarez and Alberto Pan},
title={Automatically Extracting Complex Data Structures from the Web},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={246--251},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004140802460251},
isbn={978-989-8565-29-7},
}

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI  - Automatically Extracting Complex Data Structures from the Web
SN  - 978-989-8565-29-7
AU  - Fontán, L.
AU  - López-García, R.
AU  - Álvarez, M.
AU  - Pan, A.
PY  - 2012
SP  - 246
EP  - 251
DO  - 10.5220/0004140802460251
ER  - 
