
Authors: Shuhua Liu and Thomas Forss

Affiliation: Arcada University of Applied Sciences, Finland

ISBN: 978-989-758-048-2

Keyword(s): Web Content Classification, Text Summarization, Topic Similarity, Sentiment Analysis, Online Safety Solutions.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Mining Text and Semi-Structured Data ; Soft Computing ; Symbolic Systems ; Web Mining

Abstract: Automatic classification of web content has been studied extensively, using different learning methods and tools and investigating different datasets to serve different purposes. Most studies have made use of the content and structural features of web pages. However, previous experience has shown that certain groups of web pages, such as those that contain hatred and violence, are much harder to classify with good accuracy even when both content and structural features are taken into consideration. In this study we present a new approach for automatically classifying web pages into pre-defined topic categories. We apply text summarization and sentiment analysis techniques to extract topic and sentiment indicators of web pages. We then build classifiers based on combined topic and sentiment features. A large number of experiments were carried out. Our results suggest that incorporating the sentiment dimension can indeed add considerable value to web content classification. Classifiers based solely on topic similarity did not perform well, but when topic similarity and sentiment features are combined, classification performance improves significantly for many web categories. Our study offers valuable insights and inputs to the development of web detection systems and Internet safety solutions.
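The idea of combining topic and sentiment features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the topic profile, the two word lists, the cosine similarity measure, and the nearest-centroid classifier are all hypothetical stand-ins chosen for brevity, and the abstract does not specify which summarization method, sentiment lexicon, or learning algorithm the paper uses.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources (assumptions for illustration only).
TOPIC_TERMS = {"violence": Counter({"attack": 3, "kill": 2, "weapon": 2})}
NEGATIVE = {"hate", "kill", "destroy", "attack"}
POSITIVE = {"love", "peace", "help", "enjoy"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def combined_features(text, topic_profile):
    """Return [topic similarity, positive word ratio, negative word ratio]."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens) or 1  # avoid division by zero on empty input
    topic_sim = cosine(counts, topic_profile)
    pos = sum(counts[w] for w in POSITIVE) / n
    neg = sum(counts[w] for w in NEGATIVE) / n
    return [topic_sim, pos, neg]


def train_centroids(examples, topic_profile):
    """Average the feature vectors of each label (nearest-centroid model)."""
    sums, counts = {}, Counter()
    for text, label in examples:
        feats = combined_features(text, topic_profile)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(text, centroids, topic_profile):
    """Assign the label whose centroid is closest in Euclidean distance."""
    feats = combined_features(text, topic_profile)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


if __name__ == "__main__":
    profile = TOPIC_TERMS["violence"]
    training = [
        ("they attack and kill with hate", "violence"),
        ("we love peace and enjoy helping", "other"),
    ]
    model = train_centroids(training, profile)
    print(predict("a weapon to destroy and kill", model, profile))
```

In this sketch the sentiment ratios sit next to the topic similarity score in one feature vector, so a page with weak topical overlap but strongly negative wording can still land near the "violence" centroid. That is only a toy analogue of the combination the abstract describes.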

CC BY-NC-ND 4.0

Paper citation in several formats:
Liu, S. and Forss, T. (2014). Web Content Classification based on Topic and Sentiment Analysis of Text. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014), ISBN 978-989-758-048-2, pages 300-307. DOI: 10.5220/0005101803000307

@conference{kdir14,
author={Shuhua Liu and Thomas Forss},
title={Web Content Classification based on Topic and Sentiment Analysis of Text},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={300--307},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005101803000307},
isbn={978-989-758-048-2},
}

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI  - Web Content Classification based on Topic and Sentiment Analysis of Text
SN  - 978-989-758-048-2
AU  - Liu, S.
AU  - Forss, T.
PY  - 2014
SP  - 300
EP  - 307
DO  - 10.5220/0005101803000307
ER  - 
