Authors: Rüdiger Gleim; Alexander Mehler; Matthias Dehmer and Olga Pustylnikov
        
        
Affiliation: Bielefeld University, Germany
        
        
        
        
        
             Keyword(s):
            Social Tagging, Wikipedia, Category System, Corpus Construction.
        
        
            
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Multimedia and User Interfaces; Ontology and the Semantic Web; Searching and Browsing; Soft Computing; Symbolic Systems; Usability and Ergonomics; Web Information Systems and Technologies; Web Interfaces and Applications; Web Mining
            
        
        
            
Abstract: The World Wide Web is a continuous challenge to machine learning. Established approaches have to be enhanced and new methods developed in order to tackle the problem of finding and organising relevant
information. It has often been argued that semantic classification of input documents helps to solve this
task. But while approaches to supervised text categorisation perform quite well on genres found in written
text, newly evolved genres on the web are much more demanding. In order to successfully develop approaches
to web mining, corresponding corpora are needed. However, the composition of genre- or domain-specific web
corpora is still an unsolved problem. It is time-consuming to build large corpora of good quality because web
pages typically lack reliable meta information. Wikipedia, along with similar approaches to collaborative text
production, offers a way out of this dilemma. We examine how social tagging, as supported by the MediaWiki
software, can be utilised as a source for corpus building. Further, we describe a representation format for social
ontologies and present the Wikipedia Category Explorer, a tool which supports categorical views for browsing
Wikipedia and for constructing domain-specific corpora for machine learning.