Authors: Hieu Quang Le and Stefan Conrad

Affiliation: Heinrich-Heine University, Germany

Keyword(s): Deep web, Classification, Database, Feature selection, Gaussian processes.

Related Ontology Subjects/Areas/Topics: Databases and Datawarehouses ; e-Business and e-Commerce ; Internet Technology ; Society, e-Business and e-Government ; System Integration ; Web Information Systems and Technologies

Abstract: This paper studies the problem of classifying structured data sources on the Web. Whereas prior work uses all features extracted from search interfaces, we further refine the feature set. In our research, each search interface is treated simply as a bag of words. We choose a subset of words suited to classifying web sources by means of our feature selection methods, which use new metrics and a novel, simple ranking scheme. Using an aggressive feature selection approach together with a Gaussian process classifier, we obtained high classification performance in an evaluation over real web data.
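
The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper's new metrics and ranking scheme are not given on this page, so a plain chi-square score and a top-k cut are used as stand-ins. All function names, the toy search interfaces, and the class labels are hypothetical. The classification step is left out; the resulting reduced vectors would be the input to a classifier such as a Gaussian process classifier.

```python
from collections import Counter


def bag_of_words(interface_text):
    """Treat one search interface as a bag (here: a set) of lower-cased words."""
    return set(interface_text.lower().split())


def chi_square_scores(bags, labels, target):
    """Chi-square score of every word with respect to the class `target`.

    Assumption: chi-square is a stand-in metric, not one of the paper's metrics.
    """
    n = len(bags)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    in_pos, in_neg = Counter(), Counter()
    for words, y in zip(bags, labels):
        (in_pos if y == target else in_neg).update(words)
    scores = {}
    for w in set(in_pos) | set(in_neg):
        a, b = in_pos[w], in_neg[w]    # sources containing w: in class / outside
        c, d = n_pos - a, n_neg - b    # sources lacking w: in class / outside
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[w] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(bags, labels, k):
    """Rank words by their best per-class score and keep the k highest."""
    best = Counter()
    for target in set(labels):
        for w, s in chi_square_scores(bags, labels, target).items():
            best[w] = max(best[w], s)
    ranked = sorted(best, key=lambda w: (-best[w], w))  # ties broken alphabetically
    return ranked[:k]


def to_vector(bag, vocabulary):
    """Binary vector over the selected words only."""
    return [1 if w in bag else 0 for w in vocabulary]


# Toy data (hypothetical): words taken from four search interfaces.
interfaces = [
    "title author isbn publisher",
    "author title keyword search",
    "make model year price search",
    "make model mileage price",
]
labels = ["books", "books", "autos", "autos"]

bags = [bag_of_words(text) for text in interfaces]
vocabulary = select_top_k(bags, labels, k=5)
vectors = [to_vector(bag, vocabulary) for bag in bags]
print(vocabulary)
print(vectors)
```

Keeping only five of the eleven distinct words mirrors the idea of aggressive selection: words such as "search" that occur equally often in both classes score zero and are dropped.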

CC BY-NC-ND 4.0


Paper citation in several formats:
Quang Le, H. and Conrad, S. (2009). CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST; ISBN 978-989-8111-81-4; ISSN 2184-3252, SciTePress, pages 613-620. DOI: 10.5220/0001824706130620

@conference{webist09,
author={Hieu {Quang Le} and Stefan Conrad},
title={CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST},
year={2009},
pages={613-620},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001824706130620},
isbn={978-989-8111-81-4},
issn={2184-3252},
}

TY - CONF

JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST
TI - CLASSIFYING STRUCTURED WEB SOURCES USING AGGRESSIVE FEATURE SELECTION
SN - 978-989-8111-81-4
IS - 2184-3252
AU - Quang Le, H.
AU - Conrad, S.
PY - 2009
SP - 613
EP - 620
DO - 10.5220/0001824706130620
PB - SciTePress