Authors:
Ahmet Cumhur Öztürk and Belgin Ergenç Bostanoğlu
Affiliation:
Izmir Institute of Technology, Turkey
Keyword(s):
Privacy Preserving Association Rule Mining, Itemset Hiding, Multiple Sensitive Support Thresholds.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Communication, Collaboration and Information Sharing; Data Reduction and Quality Assessment; Foundations of Knowledge Discovery in Databases; Information Extraction; Information Security; Knowledge Discovery and Information Retrieval; Knowledge Management and Information Sharing; Knowledge-Based Systems; Symbolic Systems
Abstract:
Itemset mining is the challenging step of association rule mining; it aims to extract patterns among items
from transactional databases. When itemset mining is applied to data shared between organizations, each
party needs to hide its sensitive knowledge before global knowledge is extracted for mutual benefit. Ensuring
the privacy of the sensitive itemsets is not the only challenge in the itemset hiding process: the distortion
introduced into the non-sensitive knowledge and the data should also be kept to a minimum. Most previous work
on itemset hiding allows the database owner to assign only one sensitive threshold that applies to every
sensitive itemset; however, itemsets may differ in count and utility. In this paper we propose a new
heuristic-based hiding algorithm that 1) allows the database owner to assign multiple sensitive threshold
values to sensitive itemsets, 2) hides all user-defined sensitive itemsets, and 3) uses heuristics that
minimize loss of information and distortion of the shared database. To speed up the hiding steps, we
represent the database as a Pseudo Graph and perform scan operations on this data structure rather than on
the actual database. The performance of our algorithm, Pseudo Graph Based Sanitization (PGBS), is evaluated
on 4 real databases. We measure the distortion of the non-sensitive itemsets (information loss), the
distortion of the shared data (distance), and the execution time, in comparison with three similar
algorithms. Experimental results show that PGBS is competitive in terms of execution time and distortion and
achieves reasonable performance in terms of information loss compared with the other algorithms.
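
To make the idea of multiple sensitive support thresholds concrete, the following is a minimal sketch in Python. It is not the PGBS algorithm and does not use a Pseudo Graph: it scans the transactions directly, and the function names (`support`, `hide_itemsets`), the victim-selection rule, and the toy data are all assumptions introduced here for illustration. It assumes an itemset counts as hidden once its support count falls below its own threshold.

```python
def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def hide_itemsets(transactions, thresholds):
    """Return a sanitized copy of `transactions`.

    thresholds: dict mapping frozenset -> int, one threshold (>= 1) per
    sensitive itemset. Afterwards each sensitive itemset has a support
    count strictly below its own threshold.

    Hypothetical greedy heuristic for illustration only, NOT PGBS.
    """
    sanitized = [set(t) for t in transactions]  # leave the input untouched
    for itemset, threshold in thresholds.items():
        supporting = [t for t in sanitized if itemset <= t]
        # Assumed heuristic: modify the shortest supporting transactions
        # first, since they tend to support fewer other itemsets.
        supporting.sort(key=len)
        # Number of supporting transactions that must lose the itemset.
        excess = len(supporting) - threshold + 1
        for t in supporting[:max(excess, 0)]:
            # Assumed heuristic: delete the item of the itemset that is
            # most frequent overall; ties are broken alphabetically.
            victim = max(
                sorted(itemset),
                key=lambda i: sum(1 for u in sanitized if i in u),
            )
            t.discard(victim)
    return sanitized


# Toy data (invented for this sketch).
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}]
sensitive = {frozenset({"a", "b"}): 2, frozenset({"b", "c"}): 2}
clean = hide_itemsets(db, sensitive)
for s, th in sensitive.items():
    print(sorted(s), "support:", support(clean, s), "threshold:", th)
```

Because items are only ever deleted, the support of an itemset that is already hidden cannot rise again while later itemsets are processed. A real sanitization algorithm would additionally weigh the effect of each deletion on the non-sensitive itemsets, which this sketch ignores.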