Authors:
Florian Thaeter
and
Rüdiger Reischuk
Affiliation:
Institut für Theoretische Informatik, Universität zu Lübeck, Ratzeburger Allee 160, Lübeck, Germany
Keyword(s):
Microaggregation, k-anonymity, Data Clustering.
Abstract:
k-anonymous microaggregation is a standard technique for improving the privacy of individuals whose personal data is stored in microdata databases. Unlike semantic privacy requirements such as differential privacy, k-anonymity allows the unrestricted publication of data that is suitable for all kinds of analysis, since every individual is hidden in a cluster of size at least k. Compared to other anonymization techniques such as generalization or suppression, microaggregation can preserve a high level of utility, that is, a small information loss caused by the aggregation procedure. Minimizing the information loss in k-anonymous microaggregation is an NP-hard clustering problem for k ≥ 3. Moreover, no efficient approximation algorithms with a nontrivial approximation ratio are known. Therefore, a number of heuristics have been developed to retain high utility, all with at least quadratic time complexity in the size of the database.
We improve this situation in several respects, providing a tradeoff between computational effort and utility. First, a quadratic time algorithm ONA* is presented that achieves significantly better utility for standard benchmarks. Next, an almost linear time algorithm is developed that gives worse, but still acceptable utility. This is achieved by a suitable adaptation of the Mondrian clustering algorithm. Finally, by combining both techniques, a new class MONA of parameterized algorithms is designed that delivers competitive utility for user-specified time constraints between almost linear and quadratic.
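To illustrate the kind of clustering the abstract refers to, the following is a minimal sketch of a Mondrian-style median split under a k-anonymity constraint. It is not the authors' adapted algorithm, nor ONA* or MONA. The function names, the choice of the widest attribute as split dimension, and the SSE/SST ratio as information loss measure are assumptions made for illustration.

```python
from statistics import mean

def mondrian_clusters(records, k):
    """Recursively split at the median of the widest attribute
    while both halves keep at least k records (hypothetical helper)."""
    if len(records) < 2 * k:
        return [records]
    dims = range(len(records[0]))
    # Assumption: split along the attribute with the largest value range.
    d = max(dims, key=lambda j: max(r[j] for r in records) - min(r[j] for r in records))
    ordered = sorted(records, key=lambda r: r[d])
    mid = len(ordered) // 2  # both halves have at least k records
    return mondrian_clusters(ordered[:mid], k) + mondrian_clusters(ordered[mid:], k)

def centroid(cluster):
    """Attribute-wise mean; microaggregation publishes this in place of each record."""
    return tuple(mean(col) for col in zip(*cluster))

def information_loss(clusters):
    """Assumed measure: within-cluster squared error (SSE) over total squared error (SST)."""
    all_records = [r for c in clusters for r in c]
    g = centroid(all_records)
    sse = sum(
        sum((x - m) ** 2 for x, m in zip(r, centroid(c)))
        for c in clusters
        for r in c
    )
    sst = sum(sum((x - m) ** 2 for x, m in zip(r, g)) for r in all_records)
    return sse / sst if sst else 0.0

# Small made-up data set with two numeric attributes per record.
data = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (1.5, 1.5), (8.5, 8.5)]
clusters = mondrian_clusters(data, k=3)
loss = information_loss(clusters)
```

Each resulting cluster holds between k and 2k - 1 records whenever the input has at least k records, so publishing the centroid in place of every member hides each individual among at least k records. The median splits keep the recursion depth logarithmic, which is one way to see why a Mondrian-style approach can run in almost linear time.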