# TO AGGREGATE OR NOT TO AGGREGATE: THAT IS THE QUESTION

### Eric Paquet, Herna L. Viktor, Hongyu Guo

#### Abstract

Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable to neither noise nor outliers. This may be the case, for instance, when examining supermarket ketchup sales, predicting earthquakes, or conducting financial data analysis. In such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we argue that, in many cases, wrongly assuming that the theorem holds leads to misleading data mining results. We illustrate this argument on synthetic data and show some results on stock market data.
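The point about aggregation can be made concrete with a small simulation. The sketch below is not taken from the paper: it assumes a standard Cauchy distribution as a stand-in for heavy-tailed data (it is a stable law with infinite variance), and the helper names `cauchy`, `gaussian`, and `iqr_of_means` are hypothetical. It compares how the spread of a sample mean behaves as more observations are aggregated.

```python
import math
import random
import statistics


def cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variate (assumed heavy-tailed example).
    return math.tan(math.pi * (rng.random() - 0.5))


def gaussian(rng):
    # Standard normal variate, the case covered by the classical central limit theorem.
    return rng.gauss(0.0, 1.0)


def iqr_of_means(sampler, n, trials, rng):
    """Interquartile range of `trials` sample means, each computed over `n` draws."""
    means = [sum(sampler(rng) for _ in range(n)) / n for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 100):
    g = iqr_of_means(gaussian, n, 2000, rng)
    c = iqr_of_means(cauchy, n, 2000, rng)
    print(f"n={n:>3}  Gaussian IQR={g:.3f}  Cauchy IQR={c:.3f}")
```

For Gaussian draws the spread of the mean shrinks roughly as 1/sqrt(n), so aggregation smooths the data. The mean of n independent standard Cauchy draws is again standard Cauchy, so its spread does not shrink, and large fluctuations survive aggregation. The interquartile range is used here instead of the standard deviation because the Cauchy variance is not finite.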

#### References

- Groot, R. D. (2005). Lévy distribution and long correlation times in supermarket sales. Physica A: Statistical Mechanics and its Applications, 353:501-514.
- Han, J., Kamber, M., and Pei, J. (2006). Data Mining: Concepts and Techniques (2nd edition). Morgan Kaufmann.
- Samorodnitsky, G. and Taqqu, M. (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, New York.
- Véhel, J. L. and Walter, C. (2002). Les marchés fractals (The fractal markets). Presses Universitaires de France, Paris.
- Walter, C. (1999). Lévy-stability-under-addition and fractal structure of markets: implications for the investment management industry and emphasized examination of MATIF notional contract. Mathematical and Computer Modelling, 29(10-12):37-56.

#### Paper Citation

#### in Harvard Style

Paquet E., Viktor H. L. and Guo H. (2011). **TO AGGREGATE OR NOT TO AGGREGATE: THAT IS THE QUESTION**. In *Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)*, ISBN 978-989-8425-79-9, pages 346-349. DOI: 10.5220/0003686903540357

#### in Bibtex Style

@conference{kdir11,

author={Eric Paquet and Herna L. Viktor and Hongyu Guo},

title={TO AGGREGATE OR NOT TO AGGREGATE: THAT IS THE QUESTION},

booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},

year={2011},

pages={346-349},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003686903540357},

isbn={978-989-8425-79-9},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)

TI - TO AGGREGATE OR NOT TO AGGREGATE: THAT IS THE QUESTION

SN - 978-989-8425-79-9

AU - Paquet E.

AU - Viktor H. L.

AU - Guo H.

PY - 2011

SP - 346

EP - 349

DO - 10.5220/0003686903540357