Authors: Rüdiger Gleim ; Alexander Mehler ; Matthias Dehmer and Olga Pustylnikov

Affiliation: Bielefeld University, Germany

ISBN: 978-972-8865-78-8

Keyword(s): Social Tagging, Wikipedia, Category System, Corpus Construction.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Multimedia and User Interfaces ; Ontology and the Semantic Web ; Searching and Browsing ; Soft Computing ; Symbolic Systems ; Usability and Ergonomics ; Web Information Systems and Technologies ; Web Interfaces and Applications ; Web Mining

Abstract: The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods developed in order to tackle the problem of finding and organising relevant information. It has often been argued that semantic classifications of input documents help to solve this task. But while approaches to supervised text categorisation perform quite well on genres found in written text, newly evolved genres on the web are much more demanding. In order to successfully develop approaches to web mining, suitable corpora are needed. However, the composition of genre- or domain-specific web corpora is still an unsolved problem. It is time-consuming to build large corpora of good quality because web pages typically lack reliable meta information. Wikipedia, along with similar approaches to collaborative text production, offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki software, can be utilised as a source for corpus building. Further, we describe a representation format for social ontologies and present the Wikipedia Category Explorer, a tool which supports categorical views for browsing Wikipedia and constructing domain-specific corpora for machine learning.

CC BY-NC-ND 4.0

Paper citation in several formats:
Gleim R.; Mehler A.; Dehmer M. and Pustylnikov O. (2007). AISLES THROUGH THE CATEGORY FOREST - Utilising the Wikipedia Category System for Corpus Building in Machine Learning. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-972-8865-78-8, pages 142-149. DOI: 10.5220/0001267101420149

@conference{webist07,
author={Rüdiger Gleim and Alexander Mehler and Matthias Dehmer and Olga Pustylnikov},
title={AISLES THROUGH THE CATEGORY FOREST - Utilising the Wikipedia Category System for Corpus Building in Machine Learning},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2007},
pages={142-149},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001267101420149},
isbn={978-972-8865-78-8},
}

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - AISLES THROUGH THE CATEGORY FOREST - Utilising the Wikipedia Category System for Corpus Building in Machine Learning
SN - 978-972-8865-78-8
AU - Gleim, R.
AU - Mehler, A.
AU - Dehmer, M.
AU - Pustylnikov, O.
PY - 2007
SP - 142
EP - 149
DO - 10.5220/0001267101420149
