# IMPROVING QUALITY OF RULE SETS BY INCREASING INCOMPLETENESS OF DATA SETS - A Rough Set Approach

### Jerzy W. Grzymala-Busse, Witold J. Grzymala-Busse

#### 2008

#### Abstract

This paper presents a new methodology to improve the quality of rule sets. We performed a series of data mining experiments on completely specified data sets. In these experiments we removed some specified attribute values, that is, replaced them with symbols of missing attribute values, and used the resulting incomplete data for rule induction, while the original, complete data sets were used for testing. In our experiments we used the MLEM2 rule induction algorithm of the LERS data mining system, which is based on rough sets. Our approach to missing attribute values was based on rough set theory as well. Results of our experiments show that, for some data sets and some interpretations of missing attribute values, the error rate was smaller than for the original, complete data sets. Thus, rule sets induced from some data sets may be improved by increasing the incompleteness of those data sets. It appears that, by removing some attribute values, the rule induction system, forced to induce rules from the remaining information, may induce better rule sets.
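The step of replacing specified values with missing-value symbols can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `increase_incompleteness`, the `"?"` symbol, the uniform random choice of cells, the convention that the last column holds the decision, and the toy table are all assumptions made for illustration. Rule induction with MLEM2 and testing on the complete data are outside its scope.

```python
import random


def increase_incompleteness(rows, fraction, missing_symbol="?", seed=0):
    """Return a copy of `rows` with a share of attribute values replaced.

    Assumptions (not taken from the paper): the last column of each row is
    the decision and is never replaced; cells are chosen uniformly at random;
    `missing_symbol` is a hypothetical marker for a missing attribute value.
    """
    rng = random.Random(seed)
    # Every (row, column) position that holds an attribute value.
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row) - 1)]
    how_many = round(fraction * len(cells))
    masked = [list(row) for row in rows]  # leave the complete data untouched
    for i, j in rng.sample(cells, how_many):
        masked[i][j] = missing_symbol
    return masked


# Hypothetical toy table: two attributes followed by a decision.
complete = [
    ["high", "yes", "flu"],
    ["normal", "no", "healthy"],
    ["high", "no", "flu"],
    ["normal", "yes", "healthy"],
]

# Training copy with a quarter of the attribute values removed;
# `complete` would be kept unchanged for testing.
incomplete = increase_incompleteness(complete, fraction=0.25)
```

Keeping the complete table unmodified mirrors the setup in the abstract, where rules are induced from the incomplete data and tested on the original data.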

#### References

- Booker, L. B., Goldberg, D. E., and Holland, J. H. (1990). Classifier systems and genetic algorithms. In Carbonell, J. G., editor, Machine Learning. Paradigms and Methods, pages 235-282. MIT Press, Boston.
- Chan, C. C. and Grzymala-Busse, J. W. (1991). On the attribute redundancy and the learning programs ID3, PRISM, and LEM2. Technical report, Department of Computer Science, University of Kansas.
- Grzymala-Busse, J. W. (1988). Knowledge acquisition under uncertainty - A rough set approach. Journal of Intelligent & Robotic Systems, 1:3-16.
- Grzymala-Busse, J. W. (1991). On the unknown attribute values in learning from examples. In Proceedings of the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems, pages 368-377.
- Grzymala-Busse, J. W. (1997). A new version of the rule induction system LERS. Fundamenta Informaticae, 31:27-39.
- Grzymala-Busse, J. W. (2002). MLEM2: A new algorithm for rule induction from imperfect data. In Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2002), pages 243-250.
- Grzymala-Busse, J. W. (2003). Rough set strategies to data with missing attribute values. In Workshop Notes, Foundations and New Directions of Data Mining, in conjunction with the 3-rd International Conference on Data Mining, pages 56-63.
- Grzymala-Busse, J. W. (2004). Three approaches to missing attribute values - A rough set perspective. In Proceedings of the Workshop on Foundation of Data Mining, in conjunction with the Fourth IEEE International Conference on Data Mining, pages 55-62.
- Grzymala-Busse, J. W. and Grzymala-Busse, W. J. (2007). An experimental comparison of three rough set approaches to missing attribute values. In Peters, J. F. and Skowron, A., editors, Transactions on Rough Sets, pages 31-50. Springer-Verlag, Berlin, Heidelberg.
- Grzymala-Busse, J. W. and Hu, M. (2000). A comparison of several approaches to missing attribute values in data mining. In Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing, pages 340-347.
- Grzymala-Busse, J. W. and Rzasa, W. (2006). Local and global approximations for incomplete data. In Proceedings of the RSCTC 2006, the Fifth International Conference on Rough Sets and Current Trends in Computing, pages 244-253.
- Grzymala-Busse, J. W. and Rzasa, W. (2007). Definability of approximations for a generalization of the indiscernibility relation. In Proceedings of the 2007 IEEE Symposium on Foundations of Computational Intelligence (IEEE FOCI 2007), pages 65-72.
- Grzymala-Busse, J. W. and Wang, A. Y. (1997). Modified algorithms LEM1 and LEM2 for rule induction from data with missing attribute values. In Proceedings of the Fifth International Workshop on Rough Sets and Soft Computing (RSSC'97) at the Third Joint Conference on Information Sciences (JCIS'97), pages 69-72.
- Holland, J. H., Holyoak, K. J., and Nisbett, R. E. (1986). Induction. Processes of Inference, Learning, and Discovery. MIT Press, Boston.
- Kryszkiewicz, M. (1995). Rough set approach to incomplete information systems. In Proceedings of the Second Annual Joint Conference on Information Sciences, pages 194-197.
- Kryszkiewicz, M. (1999). Rules in incomplete information systems. Information Sciences, 113:271-292.
- Lin, T. Y. (1992). Topological and fuzzy rough sets. In Slowinski, R., editor, Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory, pages 287-304. Kluwer Academic Publishers, Dordrecht, Boston, London.
- Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11:341-356.
- Pawlak, Z. (1991). Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht, Boston, London.
- Slowinski, R. and Vanderpooten, D. (2000). A generalized definition of rough approximations based on similarity. IEEE Transactions on Knowledge and Data Engineering, 12:331-336.
- Stefanowski, J. and Tsoukias, A. (1999). On the extension of rough sets under incomplete information. In Proceedings of the RSFDGrC'1999, 7th International Workshop on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, pages 73-81.
- Stefanowski, J. and Tsoukias, A. (2001). Incomplete information tables and rough classification. Computational Intelligence, 17:545-566.
- Wang, G. (2002). Extension of rough set under incomplete information systems. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ IEEE'2002), pages 1098-1103.
- Yao, Y. Y. (1998). Relational interpretations of neighborhood operators and rough set approximation operators. Information Sciences, 111:239-259.

#### Paper Citation

#### in Harvard Style

Grzymala-Busse, J. W. and Grzymala-Busse, W. J. (2008). **IMPROVING QUALITY OF RULE SETS BY INCREASING INCOMPLETENESS OF DATA SETS - A Rough Set Approach**. In *Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT*, ISBN 978-989-8111-51-7, pages 241-248. DOI: 10.5220/0001881902410248

#### in Bibtex Style

@conference{icsoft08,

author={Jerzy W. Grzymala-Busse and Witold J. Grzymala-Busse},

title={IMPROVING QUALITY OF RULE SETS BY INCREASING INCOMPLETENESS OF DATA SETS - A Rough Set Approach},

booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT},

year={2008},

pages={241-248},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0001881902410248},

isbn={978-989-8111-51-7},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the Third International Conference on Software and Data Technologies - Volume 1: ICSOFT

TI - IMPROVING QUALITY OF RULE SETS BY INCREASING INCOMPLETENESS OF DATA SETS - A Rough Set Approach

SN - 978-989-8111-51-7

AU - Grzymala-Busse, Jerzy W.

AU - Grzymala-Busse, Witold J.

PY - 2008

SP - 241

EP - 248

DO - 10.5220/0001881902410248