Generative Modeling of Itemset Sequences Derived from Real Databases

Rui Henriques, Cláudia Antunes

2014

Abstract

The increasingly studied problem of discovering temporal and attribute dependencies from multi-sets of events derived from real-world databases can be mapped to a sequential pattern mining task over itemset sequences. Still, the length and local nature of pattern-based models have limited their application. Although generative approaches can offer a compact and probabilistic view of sequential patterns, existing contributions are only prepared to deal with sequences of single elements. This work targets the task of modeling itemset sequences under a Markov assumption using models centered on sequential patterns. Experimental results provide evidence for the ability to model sequential patterns with acceptable completeness and precision levels, and with superior efficiency for dense or large datasets. We show that the proposed learning setting allows: i) compact representations; ii) the probabilistic decoding of patterns; iii) the inclusion of user-driven constraints through simple parameterizations; and iv) the use of the generative pattern-centered models to support key tasks such as classification. Relevance is demonstrated on retail and administrative databases.
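To make the notion of an itemset sequence concrete, the following is a minimal sketch of the standard sequential pattern mining setting that the abstract refers to, not the generative model proposed in the paper. It assumes that each sequence is an ordered list of itemsets and that a pattern is supported by a sequence when its itemsets are contained, in order, in itemsets of that sequence. The names `contains`, `support`, and `EXAMPLE_DB`, and the toy items, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]
ItemsetSequence = Sequence[Itemset]


def contains(sequence: ItemsetSequence, pattern: ItemsetSequence) -> bool:
    """Return True if every itemset of `pattern` is a subset of some itemset
    of `sequence`, with the matches occurring in the same order."""
    pos = 0  # index of the next pattern itemset still to be matched
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1  # greedy left-to-right matching is enough for containment
    return pos == len(pattern)


def support(database: Sequence[ItemsetSequence], pattern: ItemsetSequence) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    return sum(contains(seq, pattern) for seq in database) / len(database)


# Hypothetical toy database: three sequences of co-occurring events (itemsets).
EXAMPLE_DB = [
    [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"a", "d"})],
    [frozenset({"a"}), frozenset({"c", "d"})],
    [frozenset({"b"}), frozenset({"a", "d"})],
]

# Pattern: an itemset containing "a", later followed by one containing "d".
EXAMPLE_PATTERN = [frozenset({"a"}), frozenset({"d"})]

if __name__ == "__main__":
    print(support(EXAMPLE_DB, EXAMPLE_PATTERN))
```

In this toy database the pattern is contained in the first two sequences but not in the third, where "a" and "d" occur only in the same itemset rather than one after the other.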



Paper Citation


in Harvard Style

Henriques R. and Antunes C. (2014). Generative Modeling of Itemset Sequences Derived from Real Databases. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-027-7, pages 264-272. DOI: 10.5220/0004898302640272


in Bibtex Style

@conference{iceis14,
author={Rui Henriques and Cláudia Antunes},
title={Generative Modeling of Itemset Sequences Derived from Real Databases},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2014},
pages={264-272},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004898302640272},
isbn={978-989-758-027-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Generative Modeling of Itemset Sequences Derived from Real Databases
SN - 978-989-758-027-7
AU - Henriques R.
AU - Antunes C.
PY - 2014
SP - 264
EP - 272
DO - 10.5220/0004898302640272