INTEGRATED CANDIDATE GENERATION IN PROCESSING BATCHES OF FREQUENT ITEMSET QUERIES USING APRIORI

Piotr Jedrzejczak, Marek Wojciechowski

2010

Abstract

Frequent itemset mining can be regarded as advanced database querying in which a user specifies constraints on both the source dataset and the patterns to be discovered. Since such frequent itemset queries can be submitted to the data mining system in batches, a natural question arises: can a batch of queries be processed more efficiently than by executing each query individually? So far, two methods of processing batches of frequent itemset queries have been proposed for the Apriori algorithm: Common Counting, which integrates only the database scans required to process the queries, and Common Candidate Tree, which extends this concept by also allowing the queries to share their main-memory structures. In this paper, we propose a new method called Common Candidates, which integrates the processing of the queries in a batch even further by performing integrated candidate generation.
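The idea of integrated candidate generation can be illustrated with a small sketch. This is not the authors' Common Candidates procedure (the abstract gives no implementation details); it is an illustration under stated assumptions: each query's frequent (k-1)-itemsets are already known, itemsets are sorted tuples, and the function name `integrated_candidate_gen` and the query ids are hypothetical. The Apriori join is run once over the union of all queries' frequent itemsets, and each resulting candidate is then tagged with the queries whose own frequent (k-1)-itemsets contain all of its (k-1)-subsets (the per-query Apriori prune). The support-counting phase is not shown.

```python
from itertools import combinations


def integrated_candidate_gen(frequent_by_query):
    """Sketch of one-pass candidate generation for a batch of queries.

    Assumption: frequent_by_query maps a hypothetical query id to a set of
    frequent (k-1)-itemsets, each a sorted tuple of the same length >= 1.
    Returns a dict mapping each candidate k-itemset to the set of query ids
    for which standalone Apriori would also generate that candidate.
    """
    # Union of all queries' frequent (k-1)-itemsets, in lexicographic order.
    union = set()
    for itemsets in frequent_by_query.values():
        union |= itemsets
    ordered = sorted(union)

    result = {}
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Apriori join: both itemsets must share their first k-2 items.
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares a's prefix
            cand = a + (b[-1],)  # still sorted, since b[-1] > a[-1]
            subsets = list(combinations(cand, len(cand) - 1))
            # Per-query Apriori prune: keep the queries for which every
            # (k-1)-subset of the candidate is frequent.
            owners = {q for q, itemsets in frequent_by_query.items()
                      if all(s in itemsets for s in subsets)}
            if owners:
                result[cand] = owners
    return result


if __name__ == "__main__":
    batch = {
        "q1": {(1, 2), (1, 3), (2, 3), (2, 4)},
        "q2": {(1, 2), (1, 3), (2, 3), (1, 4), (3, 4)},
    }
    for cand, owners in sorted(integrated_candidate_gen(batch).items()):
        print(cand, sorted(owners))
```

Because a k-itemset is generated by Apriori for a query exactly when all of its (k-1)-subsets are frequent for that query, joining over the union and then pruning per query gives each query the same candidate set it would get on its own. The join work, however, is done once for the whole batch, and candidates shared by several queries are stored only once.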



Paper Citation


in Harvard Style

Jedrzejczak P. and Wojciechowski M. (2010). INTEGRATED CANDIDATE GENERATION IN PROCESSING BATCHES OF FREQUENT ITEMSET QUERIES USING APRIORI. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010), ISBN 978-989-8425-28-7, pages 487-490. DOI: 10.5220/0003099704870490


in Bibtex Style

@conference{kdir10,
author={Piotr Jedrzejczak and Marek Wojciechowski},
title={INTEGRATED CANDIDATE GENERATION IN PROCESSING BATCHES OF FREQUENT ITEMSET QUERIES USING APRIORI},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={487-490},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003099704870490},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - INTEGRATED CANDIDATE GENERATION IN PROCESSING BATCHES OF FREQUENT ITEMSET QUERIES USING APRIORI
SN - 978-989-8425-28-7
AU - Jedrzejczak P.
AU - Wojciechowski M.
PY - 2010
SP - 487
EP - 490
DO - 10.5220/0003099704870490