Authors: Ferdaous Jenhani; Mohamed Salah Gouider and Lamjed Bensaid
        
        
Affiliation: Institut Superieur de Gestion de Tunis, Tunisia
        
        
        
        
        
Keyword(s): Hadoop, Social Data, Twitter, Information Extraction, Drug Abuse.
        
        
            
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems
            
        
        
            
Abstract: Social data analysis has become a real business requirement, given the frequent use of social media as a new business strategy. However, the volume, velocity and variety of social data make it challenging to store and process. In a previous contribution [11, 12], we proposed an event extraction system that focused only on data variety and did not handle the volume and velocity dimensions. That solution therefore cannot be considered a big data system.
In this work, we port the previously proposed system to a parallel and distributed framework in order to reduce the complexity of the task and to scale to larger, continuously growing volumes of data. We propose two loosely coupled Hadoop clusters for entity recognition and event extraction. In experiments, we carried out time and accuracy tests to check the performance of the system on extracting drug abuse behavioral events from 1,000,000 tweets. The Hadoop-based system achieves better performance than the old system.