ON THE EVALUATION OF MINED FREQUENT SEQUENCES - An Evidence Theory-based Method

Francisco Guil, Francisco Palacios, Manuel Campos, Roque Marín

2010

Abstract

Mining frequent sequences (or temporal associations) is an important topic within the temporal data mining area. The syntactic simplicity of the mined temporal patterns, combined with their dual character (descriptive and predictive), allows the extraction of useful knowledge from dynamic domains, which are time-varying in nature. Some of the most representative algorithms for mining sequential patterns or frequent associations are Apriori-like algorithms and, therefore, cannot handle numeric attributes or items. This limitation makes it necessary to add a new process to the data preparation step: discretization. Importantly, the number and type of discovered temporal patterns change dramatically depending on the discretization technique used. In this paper, we propose a method based on Shafer's Theory of Evidence that uses two information measures proposed by Yager to evaluate the quality of the extracted sets of temporal patterns. From a practical point of view, the main goal is to select, for a given dataset, the discretization technique that best leads to the discovery of useful knowledge. Nevertheless, the underlying idea is to propose a formal method for assessing the mined patterns, seen as a belief structure, in terms of the certainty of the information they represent. We also present a practical example that describes an application of this proposal in the Intensive Care Burn Unit domain.
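
As a rough illustration of the idea in the abstract, the sketch below computes the two measures usually attributed to Yager (1983), entropy and specificity, over a belief structure. The abstract does not say how the belief structure is built from the mined patterns, so the construction here is an assumption made only for illustration: each pattern's item set is taken as a focal element, and its mass is its support normalized over all patterns. The function names and the toy supports are hypothetical and are not taken from the paper.

```python
import math


def belief_structure_from_supports(supports):
    """Build a mass assignment from pattern supports.

    ASSUMPTION (not from the paper): the focal element of a pattern is the
    set of its items, and its mass is the pattern's share of total support.
    Patterns with the same item set are merged by adding their masses.
    """
    total = sum(supports.values())
    mass = {}
    for pattern, support in supports.items():
        focal = frozenset(pattern)
        mass[focal] = mass.get(focal, 0.0) + support / total
    return mass


def plausibility(mass, subset):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(m for focal, m in mass.items() if focal & subset)


def yager_entropy(mass):
    """Yager's entropy: E(m) = -sum_A m(A) * ln Pl(A)."""
    return -sum(
        m * math.log(plausibility(mass, focal))
        for focal, m in mass.items()
        if m > 0
    )


def yager_specificity(mass):
    """Yager's specificity: S(m) = sum_A m(A) / |A|."""
    return sum(m / len(focal) for focal, m in mass.items())


# Hypothetical supports for three mined patterns (toy data).
toy_supports = {("a",): 2, ("a", "b"): 1, ("c",): 1}
toy_mass = belief_structure_from_supports(toy_supports)
toy_entropy = yager_entropy(toy_mass)
toy_specificity = yager_specificity(toy_mass)
```

Under this reading, a set of patterns with low entropy and high specificity would be the more certain one, so the two values could be compared across the pattern sets produced by different discretization techniques. How the paper actually combines the two measures is not stated in the abstract.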

References

  1. Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 207-216.
  2. Dubois, D. and Prade, H. (1985). A note on measures of specificity for fuzzy sets. International Journal of General Systems, 10:279-283.
  3. Dubois, D. and Prade, H. (1999). Properties of measures of information in evidence and possibility theories. Fuzzy Sets and Systems, 100:35-49.
  4. Guil, F., Bosch, A., and Marín, R. (2004). Tset: An algorithm for mining frequent temporal patterns. In Proc. of the 1st Int. Workshop on Knowledge Discovery in Data Streams, in conjunction with ECML/PKDD 2004, pages 65-74.
  5. Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
  6. Srikant, R. and Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. In Proc. of the 5th Int. Conf. on Extending Database Technology, pages 3-17.
  7. Yager, R. (1981). Measurement of properties of fuzzy sets and possibility distributions. In Proc. of the Third Int. Seminar on Fuzzy Sets, pages 211-222.
  8. Yager, R. (1983). Entropy and specificity in a mathematical theory of evidence. International Journal of General Systems, 9:249-260.
  9. Zaki, M. (2001). Spade: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2):31-60.


Paper Citation


in Harvard Style

Guil F., Palacios F., Campos M. and Marín R. (2010). ON THE EVALUATION OF MINED FREQUENT SEQUENCES - An Evidence Theory-based Method. In Proceedings of the Third International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2010), ISBN 978-989-674-016-0, pages 263-268. DOI: 10.5220/0002736202630268


in Bibtex Style

@conference{healthinf10,
author={Francisco Guil and Francisco Palacios and Manuel Campos and Roque Marín},
title={ON THE EVALUATION OF MINED FREQUENT SEQUENCES - An Evidence Theory-based Method},
booktitle={Proceedings of the Third International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2010)},
year={2010},
pages={263-268},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002736202630268},
isbn={978-989-674-016-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2010)
TI - ON THE EVALUATION OF MINED FREQUENT SEQUENCES - An Evidence Theory-based Method
SN - 978-989-674-016-0
AU - Guil F.
AU - Palacios F.
AU - Campos M.
AU - Marín R.
PY - 2010
SP - 263
EP - 268
DO - 10.5220/0002736202630268