# A Formal Approach to Anomaly Detection

### André Eriksson, Hedvig Kjellström

#### Abstract

While many advances towards effective anomaly detection techniques targeting specific applications have been made in recent years, little work has been done to develop application-agnostic approaches to the subject. In this article, we present such an approach, in which anomaly detection methods are treated as formal, structured objects. We consider a general class of methods, with an emphasis on methods that utilize structural properties of the data they operate on. For this class of methods, we develop a decomposition into sub-methods: simple, restricted objects that may be reasoned about independently and combined to form methods. As we show, this formalism enables the construction of software that facilitates formulating, implementing, and evaluating anomaly detection methods, as well as algorithmically finding and calibrating them.

#### Paper Citation

#### in Harvard Style

Eriksson A. and Kjellström H. (2016). **A Formal Approach to Anomaly Detection**. In *Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM*, ISBN 978-989-758-173-1, pages 317-326. DOI: 10.5220/0005710803170326

#### in BibTeX Style

@conference{icpram16,

author={André Eriksson and Hedvig Kjellström},

title={A Formal Approach to Anomaly Detection},

booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

year={2016},

pages={317-326},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005710803170326},

isbn={978-989-758-173-1},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM

TI - A Formal Approach to Anomaly Detection

SN - 978-989-758-173-1

AU - Eriksson A.

AU - Kjellström H.

PY - 2016

SP - 317

EP - 326

DO - 10.5220/0005710803170326