Investigations in Privacy Preserving Data Mining

Kshitij Pathak; Sanjay Silakari; Narendra S. Chaudhari

Authors

Kshitij Pathak, Department of CSE, University Institute of Technology, RGPV Bhopal (M.P.), India
Sanjay Silakari, Professor, Department of CSE, University Institute of Technology, RGPV Bhopal (M.P.), India
Narendra S. Chaudhari, Dean, Research and Development, IIT Indore, India

Keywords:

Privacy Preserving Data Mining, Association Rule Hiding, Data Hiding in Database, Knowledge Hiding in Database.

Abstract

Data mining, data sharing, and privacy preservation are fast emerging as an active field of research. A close review of the research on Privacy Preserving Data Mining reveals two problems: the protection of private data (Data Hiding in Database) and the protection of sensitive rules (knowledge) embedded in the data (Knowledge Hiding in Database). The first problem concerns how to obtain accurate mining results even when private data is concealed. The second concerns how to prevent sensitive association rules contained in the database from being discovered while non-sensitive association rules can still be mined with traditional data mining techniques. Performance is a major concern with knowledge hiding techniques. This paper describes approaches for Knowledge Hiding in Database and discusses the issues and challenges in developing an integrated solution for Data Hiding in Database and Knowledge Hiding in Database. It also highlights directions for future studies so that pragmatic measures can be incorporated into ongoing research on hiding sensitive association rules.
