Authors: Montserrat Batet; Arnau Erola; David Sánchez and Jordi Castellà-Roca
        
        
Affiliation: Universitat Rovira i Virgili, Spain
        
        
        
        
        
Keyword(s): Data Semantics, Set-valued Data, Privacy, Microaggregation, Knowledge Bases.
        
        
            
Related Ontology Subjects/Areas/Topics: Agents; Applications; Artificial Intelligence; Data Mining; Databases and Information Systems Integration; e-Business; Enterprise Engineering; Enterprise Information Systems; Enterprise Ontologies; Formal Methods; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Ontologies; Pattern Recognition; Privacy, Safety and Security; Sensor Networks; Signal Processing; Simulation and Modeling; Soft Computing; Symbolic Systems
            
        
        
            
Abstract: Companies and organisations commonly need to release and exchange information related to individuals. Because these data are usually sensitive, appropriate measures should be applied to reduce the risk of re-identifying individuals while retaining as much data utility as possible. Many anonymisation mechanisms have been developed to date, but most of them focus on structured/relational databases containing numerical or categorical data. The anonymisation of transactional data, also known as set-valued data, has received much less attention. Managing and transforming these data presents additional challenges because of their variable cardinality and their usually textual and unbounded nature. Current approaches to set-valued data are based on generalising the original values; however, this incurs a high information loss because of the reduced granularity of the output values. To tackle this problem, in this paper we adapt a well-known microaggregation anonymisation mechanism so that it can be applied to textual set-valued data. Moreover, since the utility of textual data is closely related to their meaning, special care has been taken to preserve data semantics. To do so, appropriate semantic similarity and aggregation functions are proposed. Experiments conducted on a real set-valued data set show that our proposal preserves data utility better than non-semantic approaches.