Towards Practical k-Anonymization: Correlation-based Construction of Generalization Hierarchy

Tomoaki Mimoto, Anirban Basu, Shinsaku Kiyomoto


When sensitive datasets are published, the privacy of the individuals they describe must be preserved. Anonymization algorithms such as k-anonymization have been proposed to reduce the risk that individuals in a dataset are identified. k-anonymization, the most common technique, modifies attribute values in a dataset until every record is identical to at least k-1 other records. Many algorithms can be used to achieve k-anonymity. However, existing algorithms suffer from information loss because of the tradeoff between data quality and anonymity. In this paper, we propose a novel method of constructing a generalization hierarchy for k-anonymization algorithms. Our method analyzes the correlation between attributes and generates an optimal hierarchy according to that correlation. The effect of the proposed scheme has been verified on real data: the average k of the datasets is 83.14, around 1/3 of the value obtained by conventional methods.
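The k-anonymity condition described in the abstract can be illustrated with a minimal sketch. This is not the paper's hierarchy-construction method; it only shows how the k of a table might be measured and how one generalization step can raise it. The function names, the toy records, and the fixed-width age ranges are assumptions made for illustration.

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    This is the size of the smallest group of records that share the same
    values on every quasi-identifier attribute (0 for an empty dataset).
    """
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)


def generalize_age(record, width):
    """Hypothetical one-level generalization: replace an age by a range."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy data (assumed, not taken from the paper).
records = [
    {"age": 23, "zip": "100", "disease": "flu"},
    {"age": 27, "zip": "100", "disease": "cold"},
    {"age": 31, "zip": "100", "disease": "flu"},
    {"age": 38, "zip": "100", "disease": "asthma"},
]
qi = ["age", "zip"]

raw_k = anonymity_level(records, qi)              # every age is unique
generalized = [generalize_age(r, 10) for r in records]
generalized_k = anonymity_level(generalized, qi)  # ages grouped by decade
```

In this toy table the raw data has k = 1 because every age is unique, while generalizing ages to ten-year ranges gives k = 2. The coarser the ranges, the larger k becomes and the more detail is lost, which is the tradeoff between data quality and anonymity that the abstract refers to.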


Paper Citation

in Harvard Style

Mimoto T., Basu A. and Kiyomoto S. (2016). Towards Practical k-Anonymization: Correlation-based Construction of Generalization Hierarchy. In Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 4: SECRYPT, (ICETE 2016) ISBN 978-989-758-196-0, pages 411-418. DOI: 10.5220/0005963804110418

in Bibtex Style

author={Tomoaki Mimoto and Anirban Basu and Shinsaku Kiyomoto},
title={Towards Practical k-Anonymization: Correlation-based Construction of Generalization Hierarchy},
booktitle={Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 4: SECRYPT, (ICETE 2016)},

in EndNote Style

JO - Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 4: SECRYPT, (ICETE 2016)
TI - Towards Practical k-Anonymization: Correlation-based Construction of Generalization Hierarchy
SN - 978-989-758-196-0
AU - Mimoto T.
AU - Basu A.
AU - Kiyomoto S.
PY - 2016
SP - 411
EP - 418
DO - 10.5220/0005963804110418