Authors: António Videira and Nuno Goncalves

Affiliation: University of Coimbra, Portugal

ISBN: 978-989-758-024-6

Keyword(s): Web Page Classification, Feature Extraction, Feature Selection, Machine Learning.

Related Ontology Subjects/Areas/Topics: Searching and Browsing; Web Information Systems and Technologies; Web Interfaces and Applications

Abstract: Demand for automatic classification techniques with higher accuracy keeps growing. Current systems classify and process web pages automatically using the text content of those pages, while little work has been done on using a page's visual content. Our work therefore focuses on classifying web pages using only their visual content. First, a descriptor is constructed by extracting different features from each page: simple color and edge histograms, and Gabor and Tamura features. Two feature selection methods, one based on the Chi-Square criterion and the other on Principal Component Analysis, are then applied to that descriptor to select the most discriminative attributes. A second approach uses the Bag of Words (BoW) model, treating the SIFT local features extracted from each image as words from which a dictionary is built. We then classify web pages by their aesthetic value, their recency, and their type of content. The machine learning methods used in this work are Naive Bayes, Support Vector Machine, Decision Tree, and AdaBoost. Different tests are performed to evaluate the performance of each classifier. Finally, we show that the visual appearance of a web page carries rich content that is not exploited by current web crawlers based only on text content.
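The first two steps described in the abstract (building a color-histogram descriptor, then ranking its attributes with a Chi-Square criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 4-bin histogram, the representation of a page as a list of (r, g, b) tuples, and the toy data are all assumptions made for illustration, and the edge, Gabor, Tamura, and SIFT features are not reproduced.

```python
from collections import defaultdict


def color_histogram(pixels, bins=4):
    """Concatenated per-channel histogram of (r, g, b) pixels.

    Each channel's histogram is normalised to sum to 1, so the full
    descriptor has 3 * bins non-negative entries (hypothetical layout).
    """
    hist = [0.0] * (3 * bins)
    width = 256 / bins
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            bin_index = min(int(value / width), bins - 1)
            hist[channel * bins + bin_index] += 1.0
    n = len(pixels)
    return [count / n for count in hist]


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the labels."""
    n_samples = len(X)
    n_features = len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for label, count in class_counts.items():
            # Expected mass if the feature were independent of the class.
            expected = totals[j] * count / n_samples
            if expected > 0:
                score += (observed[label][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (invented): two "warm" pages and two "cool" pages.
warm_page = [(250, 30, 120)] * 10
cool_page = [(20, 40, 120)] * 10
X = [color_histogram(p) for p in (warm_page, warm_page, cool_page, cool_page)]
y = [0, 0, 1, 1]

top_features = select_top_k(chi2_scores(X, y), k=2)
print(top_features)
```

In this toy setting only the red-channel bins differ between the two classes, so they receive the highest scores; the selected columns would then be passed to a classifier such as Naive Bayes or an SVM.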

CC BY-NC-ND 4.0


Paper citation in several formats:
Videira, A. and Goncalves, N. (2014). Automatic Web Page Classification Using Visual Content. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-024-6, pages 193-204. DOI: 10.5220/0004856201930204

@conference{webist14,
author={António Videira and Nuno Goncalves},
title={Automatic Web Page Classification Using Visual Content},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2014},
pages={193-204},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004856201930204},
isbn={978-989-758-024-6},
}

TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Automatic Web Page Classification Using Visual Content
SN - 978-989-758-024-6
AU - Videira, A.
AU - Goncalves, N.
PY - 2014
SP - 193
EP - 204
DO - 10.5220/0004856201930204
