A Comparative Study on Outlier Removal from a Large-scale Dataset
using Unsupervised Anomaly Detection
Markus Goldstein and Seiichi Uchida
Department of Advanced Information Technology, Kyushu University,
744 Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
Keywords:
Outlier Removal, Unsupervised Anomaly Detection, Handwritten Digit Recognition, Large-scale Dataset,
Data Cleansing, Influence of Outliers.
Abstract:
Outlier removal from training data is a classical problem in pattern recognition. Nowadays, this problem
has become more important for large-scale datasets for two reasons: First, there is a higher
risk of “unexpected” outliers, such as mislabeled training data. Second, a large-scale dataset makes it more
difficult to grasp the distribution of outliers. On the other hand, many unsupervised anomaly detection methods
have been proposed, which can also be used for outlier removal. In this paper, we present a comparative study
of nine different anomaly detection methods in the scenario of outlier removal from a large-scale dataset.
For accurate performance observation, we need a simple and describable recognition procedure and
thus utilize a nearest neighbor-based classifier. As an adequate large-scale dataset, we prepared a handwritten
digit dataset comprising more than 800,000 manually labeled samples. With a data dimensionality of
16×16 = 256, it is ensured that each digit class has at least 100 times more instances than the data dimensionality.
The experimental results show that the common understanding that outlier removal improves classification
performance, which holds for small datasets, is not true for high-dimensional large-scale datasets. Additionally, it was found
that local anomaly detection algorithms perform better on this data than their global equivalents.
1 INTRODUCTION
Outliers are instances in a dataset that deviate
clearly from the norm. It seems logical to elimi-
nate outliers before classification takes place. Indeed,
this was the main motivation of Grubbs (Grubbs,
1969), when he developed his first outlier test. At that
time, parametric classification models like a simple
Gaussian fit were very sensitive to outliers. With the
development of more sophisticated classification al-
gorithms, for example the Support Vector Machines
(SVM) (Schölkopf and Smola, 2002) or Artificial
Neural Networks (ANN) (Mehrotra et al., 1997), the
need for outlier removal decreased. The reason for
this trend was that these classifiers are not very sensi-
tive to outliers in the dataset any more, or have even
built-in outlier suppression techniques. However,
research on the detection of outliers experienced a re-
vival from the year 2000 onwards, when many new
methods were developed for anomaly detection.
In this research area, one is typically interested in the
anomalies (the outliers) themselves, not primarily in their
removal. Anomalies can carry important information
for a variety of applications and are therefore of inter-
est in intrusion detection (Portnoy et al., 2001), medi-
cal diagnosis (Lin et al., 2005), fraud detection (Geb-
hardt et al., 2013) and surveillance (Basharat et al.,
2008).
Today, the terms outlier and anomaly are mostly
used synonymously, whereas the removal of outliers
from a dataset is also often referred to as data cleans-
ing and the search for the outliers as anomaly detec-
tion. In the anomaly detection research domain, three
different learning modes based on the availability of
labels exist (Chandola et al., 2009; Goldstein, 2014).
In the case of having a fully labeled dataset with the
labels normal and anomalous, supervised anomaly
detection algorithms are used, which is very similar
to a standard classification task. Second, if a dataset
contains only normal data and no anomalies, semi-
supervised anomaly detection algorithms could be ap-
plied. In this setup, typically a model of the norm is
learned and the deviation of the test data to that model
is used as an indicator for abnormality. A well-known
semi-supervised anomaly detection algorithm is the
One-class SVM (Schölkopf et al., 1999). The third
setup is unsupervised anomaly detection. Here, no
assumption about the data is made and it is only ana-
lyzed using its internal structure. The result of today's
unsupervised anomaly detection algorithms is often a
score instead of a binary label, such that the results
can be ranked and further processing can draw more
sophisticated conclusions.
Unsupervised anomaly detection is in general a
challenging task since it is solely based on intrinsic
information and does not have a ground truth to opti-
mize a decision boundary. In this context, it is also of-
ten hard to decide what actually should be considered
as an anomaly and what not. An important concept is
the differentiation between global and local anoma-
lies. Global anomalies are suspicious instances with
respect to the whole dataset whereas local anomalies
are only noticeable with respect to their immediate
neighborhood. More information and detailed exam-
ples can be found in (Goldstein, 2014). Please note
that anomaly detection algorithms focus on the detec-
tion of either global or local anomalies.
Of course, unsupervised anomaly detection algo-
rithms can also be used for data cleansing by remov-
ing the top anomalies from the training data. In this
work, we utilize a variety of unsupervised anomaly
detection algorithms in order to study the effect of
outlier removal on handwritten character recognition.
The goal of this research is to gain a deeper under-
standing of the importance of anomalies in a dataset,
not the improvement of classification accuracy. The
use of a large-scale dataset is of particular interest to
us in order to learn whether anomalies have signifi-
cant influence at all in this situation.
2 RELATED WORK
Outlier detection and removal for improving accu-
racy has been studied extensively (Barnett and Lewis,
1994). In this context, it is important to stress that
there exist multiple views of what an outlier is. Es-
pecially in research focusing on classification perfor-
mance improvement, data instances close to the deci-
sion boundary as well as misclassifications are called
outliers. These are indeed outliers with respect to a
classification problem, but they are not necessarily
outliers in a statistical sense. This inter-class view of
anomaly detection is often understood as a preprocess-
ing step for classification (Sharma et al., 2015).
This research focuses on an intra-class outlier def-
inition, which is a more statistical perspective. This
more general view can also be used to boost clas-
sification, but it also detects outliers far away from
decision boundaries. Although these anomalies will
very likely have no influence on classification per-
formance, they might still be of particular interest.
In the application scenario of handwritten character
recognition, this could be mislabeled data, errors in
scanning, segmentation and binarization as well as
strong image distortions. Prior experiments with un-
supervised anomaly detection for outlier removal,
similar to this work, were performed by (Smith and
Martinez, 2011), but the evaluation was based on only
two unsupervised anomaly detection algorithms (k-NN
and LOF) and was restricted to small datasets due to
implementation limitations, as stated by the authors.
Concerning handwritten digits, it was found (Guyon
et al., 1996) that outlier removal improves recognition
performance for a small dataset with fewer than 8,000
instances. In this work, we want to verify whether
this still holds for a large-scale dataset or whether
nowadays the huge amount of data compensates for
outlier elimination.
3 METHODOLOGY
3.1 Anomaly Detection
A huge variety of unsupervised anomaly detection
algorithms exists today. A comprehensive overview
as well as a categorization is presented by (Chan-
dola et al., 2009). The vast majority of the differ-
ent approaches are very resource-demanding in terms
of time and memory. For our primary goal, the
analysis of a large-scale dataset, only a small sub-
set of algorithms can be utilized. In this work, the
algorithms will not be described in detail. Instead,
we only briefly summarize their main characteristics.
As a categorization attempt, the algorithms might be
classified into three main groups: (1) Nearest-neighbor
based methods, (2) Clustering-based methods and (3)
Statistical methods. Statistical methods can again
be sub-classified into parametric or non-parametric
methods such as histograms (Goldstein and Dengel,
2012), Kernel-density estimation (Turlach, 1993) or
Gaussian Mixture Models (Lindsay, 1995). Besides
that, other methods based on classification techniques
exist, such as Support Vector Machines (Amer et al.,
2013) or autoencoders (Hawkins et al., 2000). Due
to their complexity, most of these methods are not suit-
able for large-scale datasets. For that reason, from this
group we only use the histogram-based HBOS (Gold-
stein and Dengel, 2012) algorithm.
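To make the scoring idea concrete, the following is a minimal sketch of histogram-based scoring in the spirit of HBOS, written in Python. The equal-width binning, the parameter names and the plain sum of negative log densities are our illustrative assumptions; the published HBOS additionally offers dynamic bin widths and score normalization.

```python
import numpy as np

def hbos_score(X, n_bins=10):
    """Sketch of a histogram-based outlier score (HBOS spirit).

    One equal-width histogram is built per feature; an instance's
    score is the sum of negative log bin densities, so instances
    falling into rare bins receive high scores."""
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):
        hist, edges = np.histogram(X[:, j], bins=n_bins, density=True)
        # Assign each value to its bin; using interior edges only makes
        # the maximum fall into the last bin instead of an overflow bin.
        idx = np.digitize(X[:, j], edges[1:-1])
        scores += -np.log(np.maximum(hist[idx], 1e-12))  # avoid log(0)
    return scores
```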
For the nearest-neighbor based approaches, the
global k-NN algorithm (Ramaswamy et al., 2000;
Angiulli and Pizzuti, 2002), the well-known Lo-
cal Outlier Factor (LOF) (Breunig et al., 2000), the
ICPRAM 2016 - International Conference on Pattern Recognition Applications and Methods
264
Connectivity-based Outlier Factor (COF) (Tang et al.,
2002), the Local Outlier Probability (LoOP) (Kriegel
et al., 2009) as well as the Influenced Outlierness
(INFLO) (Jin et al., 2006) were selected. Please note
that the k-NN algorithm is global and all the remain-
ing ones focus on detecting local anomalies. The idea
of LOF is to estimate the local density of a data instance
and then to compare it with the local densities of its k
neighbors. This procedure results in a spherical den-
sity estimation. COF works similarly to LOF, but the
density estimation uses a minimum spanning tree ap-
proach instead. INFLO addresses a problem arising
in LOF when clusters of different densities are close
to each other. In contrast, LoOP follows a different ba-
sic approach and uses probabilities to identify anomalies.
Here, the local density is estimated by a half-Gaussian
distribution.
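To contrast the global and local scoring principles, here is a minimal sketch of the global k-NN score (mean distance to the k nearest neighbors) next to a simplified LOF-style density ratio. The function names are illustrative, and the published LOF uses reachability distances rather than the plain average k-NN distances assumed here:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_score(X, k=10):
    """Global k-NN anomaly score: mean distance of each instance
    to its k nearest neighbors."""
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return dist[:, 1:].mean(axis=1)  # column 0 is the point itself

def lof_like_score(X, k=10):
    """Simplified local score in the spirit of LOF: ratio of the
    neighbors' average local density to the instance's own density.
    Values clearly above 1 indicate a locally sparse instance."""
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)
    return density[idx[:, 1:]].mean(axis=1) / density
```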
For the clustering-based approaches, the
Clustering-based Local Outlier Factor (CBLOF) (He
et al., 2003) and a modified version uCBLOF (Amer
and Goldstein, 2012) are the representative candi-
dates. The basic idea is to cluster the data using
k-means, remove clusters that are too small and after-
wards use the distance of each instance to its centroid
as an anomaly score. In CBLOF, a weighting factor
according to the cluster's size is additionally applied.
The Local Density Cluster-based Outlier
Factor (LDCOF) (Amer and Goldstein, 2012) also
uses k-means clustering as a basis, but additionally
estimates each cluster's local density for computing the
anomaly score. In contrast to the CBLOF variants,
this procedure takes local cluster densities better into
account. The Clustering-based Multivariate Gaussian
Outlier Score (CMGOS) (Goldstein, 2014) carries this
idea further and uses the Mahalanobis distance for
computing the anomaly score. Since k-means clus-
tering is not deterministic, multiple runs might lead
to different anomaly detection results, making it hard
to compare the different algorithms. For that reason,
the k-means clustering was performed 10 times and
the most stable result was chosen as a basis
for all the clustering-based algorithms.
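The cluster-based scoring can be sketched as follows; the cluster count, the minimum-size threshold and the function name are illustrative assumptions, and the real uCBLOF/LDCOF implementations include further details such as LDCOF's local density normalization:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def ucblof_score(X, n_clusters=8, min_frac=0.01, seed=0):
    """Sketch of an unweighted cluster-based outlier score (uCBLOF
    spirit): cluster with k-means, discard clusters that are too
    small, and score each instance by its distance to the nearest
    remaining centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    large = km.cluster_centers_[sizes >= min_frac * len(X)]
    # Distance of every instance to its closest large-cluster centroid.
    return cdist(X, large).min(axis=1)
```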
All used algorithms are available within an open-
source anomaly detection plug-in for the Rapid-
Miner (Mierswa et al., 2006) data mining software
(more information and download at http://git.io/vnD6n).
One goal of this implementation is its focus on large-
scale dataset processing.
3.2 Classification
After utilizing the anomaly detection algorithms on
the training data, our goal is to remove anomalies
from the training set and study the effect on the clas-
sification results using the test set for evaluation. Our
hypothesis is that removing strong anomalies should
increase the classification performance.
As already stated in the introduction, our focus is
not to tweak recognition rates, but to gain insight into
the internal structure of the large-scale data. For this
reason, we explicitly chose a classifier that is sensitive
to anomalies in order to study their effect directly.
To this end, we chose a one-nearest-neighbor
classifier for evaluation. This has the big advantage
that a single removed outlier might directly influence
the classification result. Of course, we are aware that
a k-NN approach would in general be much bet-
ter and more robust with respect to maximizing clas-
sification performance.
As a distance measure, the Hamming distance was
used. It is intuitively interpretable and, on binarized
images, equals the squared Euclidean distance, thus
yielding the same nearest-neighbor ranking as the
Euclidean distance.
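A minimal sketch of this evaluation step, assuming the images are flattened to 256-dimensional 0/1 vectors; the use of scikit-learn here is our choice for illustration, not the implementation used in the study:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def correctly_classified(train_X, train_y, test_X, test_y):
    """1-NN with the Hamming distance; returns the absolute number
    of correctly classified test instances, matching how results
    are reported below."""
    clf = KNeighborsClassifier(n_neighbors=1, metric="hamming")
    clf.fit(train_X, train_y)
    return int((clf.predict(test_X) == np.asarray(test_y)).sum())
```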
4 EVALUATION
4.1 Dataset
Our large-scale character image dataset comprises in
total 819,725 handwritten digit images, randomly sep-
arated into 614,794 instances for anomaly detection
and training as well as 204,931 instances for the test
set. The size of each character image is 16 × 16 pixels,
resulting in a 256-dimensional feature vector. The
distribution of the digits is not balanced since the data
was pulled from a real-world environment, and the
digit “0” occurs approximately three times more often
than the other digits. The data has been binarized and
was labeled manually. It is unknown whether the la-
beling is absolutely accurate. Also, the number of
different writers in the dataset is unknown.
4.2 Experimental Setup
First, each unsupervised anomaly detection algorithm
is applied separately to each of the 10 classes in the
training set. This results in 10 different lists with
scores describing the “outlierness” of each instance.
These lists are then merged together and sorted by the
anomaly score. This ensures that the statistically most
significant anomaly ranks top in this list, regardless
of its class. All algorithms presented in Sec-
tion 3.1 were used for evaluation, with the exception
of CBLOF. The reason why CBLOF was excluded is
that it weights the resulting score with the cluster size.
Since the digit “0” occurs more often in the data, all
outliers from this class are ranked first. However, the
A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection
265
unweighted version uCBLOF (Amer and Goldstein,
2012) was used instead. In a second step, the top N
outliers are removed and the performance of the one-
nearest-neighbor classifier using the reduced training
set is evaluated on the test set. For N, the following
numbers were selected: 4, 8, 16, 32, 64, 128, 256,
512, 600, 1200, 1800, 2400, 3000, 3600, 4200, 4800,
5400 and 6000. The last value of N corresponds to
removing approximately 1% of the training data. The
reason for a denser evaluation at small N is the
assumption that the classification performance will
increase when removing the most obvious anomalies.
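Putting the setup together, the per-class scoring and top-N removal might be sketched as follows; score_fn stands for any of the detectors from Section 3.1, and the helper names are hypothetical:

```python
import numpy as np

def remove_top_outliers(train_X, train_y, score_fn, n_remove):
    """Score each class separately with an unsupervised detector,
    merge the scores, and drop the n_remove globally highest-scoring
    instances, regardless of their class."""
    scores = np.empty(len(train_X))
    for label in np.unique(train_y):
        mask = train_y == label
        scores[mask] = score_fn(train_X[mask])  # intra-class anomaly scores
    keep = np.argsort(scores)[:len(train_X) - n_remove]
    return train_X[keep], train_y[keep]

# Usage, e.g. with the knn_score and correctly_classified sketches above:
# for n in (4, 8, 16, 32, 64, 128, 256, 512, 600, 1200, 1800, 2400,
#           3000, 3600, 4200, 4800, 5400, 6000):
#     Xr, yr = remove_top_outliers(train_X, train_y, knn_score, n)
#     print(n, correctly_classified(Xr, yr, test_X, test_y))
```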
4.3 Anomaly Detection Results
The results of the anomaly detection algorithms can-
not be evaluated quantitatively in a direct manner
since there is no ground truth. Nevertheless, we can
show the top detected anomalies by plotting the im-
ages with the highest scores. As an example, Figure 1
shows the top-10 results of the global k-NN anomaly
detection algorithm for all classes. The anomalies are
ordered according to their score, with the highest score
in the left column. It can be seen that the anomaly
detection results are reasonable. For the class “8”, the
two top anomalies seem to be mislabeled instances.
Also, the top-3 anomalies of the digit “5” might well
be mislabeled. The results of the
local outlier factor (LOF) algorithm are illustrated in
Figure 2 as a representative of a local anomaly de-
tector. As mentioned in the introduction, global and
local anomalies may differ a lot. It can be seen nicely
that some global anomalies cannot be detected by
LOF, for example for the digit “8”, but for the dig-
its “1”, “2” and “4” new interesting anomalies show
up. The results are also remarkable since it has been
shown that local anomaly detection algorithms tend to
perform worse than global algorithms on large-scale
datasets (Goldstein, 2014).
4.4 Classification Results
First, the results of the unsupervised anomaly detec-
tion algorithms were sorted according to their outlier
score. Then, the top N anomalies of that list were re-
moved from the training data and the performance of
the 1-NN classifier was measured. Since the dataset
is very large, removing a few instances does not lead
to a huge change in the classification accuracy mea-
sured as a percentage. For this reason, absolute num-
bers were used in the following plots. Please keep in
mind that classification improvement is not our goal
at all, and that the presented insignificant changes in
accuracy should only be interpreted as a trend answer-
ing the question of the effect of outlier removal. Fig-
ures 4 and 5 show the classification results of all al-
gorithms, where the latter is a magnified view to verify
our hypothesis that removing the most obvious anoma-
lies should increase recognition performance. The
plots also show a baseline, for which no anomalies are
removed, as well as a random strategy, in which N in-
stances are removed from the training data by chance.

Figure 1: The top-10 anomalies of the large-scale dataset
for every digit. The results have been computed using the
global k-NN method.

Figure 2: The top-10 anomalies determined using the LOF
algorithm. Some global anomalies are not detected.
The results were very surprising to us. First of
all, it can be seen that the performances of the differ-
ent anomaly detection algorithms differ a lot. While
the global k-NN and LOF typically deliver good
results, the former performs very poorly
on our large-scale dataset.
Figure 4: Results of the one-nearest-neighbor classifier after removing the top-N anomalies using nine different unsupervised
anomaly detection algorithms (x-axis: removed anomalies from training data, 0 to 6000; y-axis: correctly classified test data;
curves: baseline, random, LOF, k-NN, INFLO, COF, LoOP, HBOS, uCBLOF, CMGOS, LDCOF).
Figure 3: For comparison, the 10 most normal digit images
of each digit class, determined using the global k-NN.
The results also show that INFLO performs best on
our dataset, being at the same time the only algorithm
that improves the overall classification accuracy. Fig-
ure 4 also illustrates that the local anomaly detection
algorithms perform much better than the global algo-
rithms (the four lowest curves).
The most important result to us is that we could
not verify our initial hypothesis. Removing anoma-
lies, even only the most prominent ones, does not
guarantee an improvement of classification accuracy.
On the contrary, chances are high that removing
anomalies will decrease recognition performance
if too many of them are removed.
Table 1 shows the percentage of each digit class
among the top-1000 anomalies for each of the evalu-
ated algorithms. Some of the algorithms have a strong
bias towards detecting anomalies of a specific class,
and the digits “0” and “1” seem to have, on average,
more detected anomalies than the other digits.
5 CONCLUSIONS
In this paper we evaluated the effect of removing
intra-class anomalies from a large-scale handwritten
digit dataset. Nine different unsupervised anomaly
detection algorithms have been used in order to cover
a wide range, taking global and local approaches
into account as well as covering all the basic under-
lying mathematical methodologies. A one-nearest-
neighbor classifier was then used to evaluate the ef-
fect of anomaly removal from the training data with
respect to classification accuracy. The goal was not to
tweak the accuracy but to derive a general statement
about the usefulness of anomaly removal. For smaller
datasets, it was shown previously that outlier removal
is beneficial. Our experiments showed that remov-
ing anomalies from large-scale character datasets is in
general not a good idea. Summarizing our results, the
benefit from removing the obvious anomalies is very
low compared to the risk of dropping performance
Figure 5: Magnified view of Figure 4 showing the effect of the most important anomalies (x-axis: removed anomalies from
training data, 4 to 512; y-axis: correctly classified test data).
Table 1: Outlier class distribution. The percentage of anomalies among the top-1000 for each anomaly detection algorithm.
Digit      0     1     2     3     4     5     6     7     8     9
LOF      5.2  63.0   5.0   3.4   1.1   1.4   5.6   5.8   2.1   7.4
k-NN     3.9   0.1   8.1   9.4  16.5  11.5   4.1   1.5  42.1   2.7
INFLO    3.8  71.3   3.6   2.4   0.7   0.7   5.2   4.9   1.5   5.9
COF      2.2  71.9   4.3   1.4   2.4   1.3   4.2   3.0   1.6   7.7
LoOP    15.2  14.7  16.3   7.3   8.2   6.0   7.6  10.2   7.6   6.9
HBOS    21.7   6.2   0.1  11.4   0.9   5.6  18.7  18.3   3.3  13.8
uCBLOF  35.4   1.1   3.6   7.5   6.5   7.3  11.5   9.0  11.1   7.0
CMGOS   28.4   6.1   5.2   8.5   3.7  10.1   8.8   7.2  12.7   9.3
LDCOF   11.4  53.4   0.3   5.7   0.1   1.0   8.8   7.1   0.3  11.9
due to removing too many important instances. When
comparing our anomaly removal with a random re-
moval strategy, it is even possible to state that anoma-
lies are very important for the classification accuracy
and should remain in the large-scale dataset.
However, our experiments additionally showed
that unsupervised anomaly detection algorithms can
be used to manually review the top anomalies: on our
dataset, we gained insight into incorrectly labeled in-
stances, found upside-down images, and discovered
images which can be considered as noise.
ACKNOWLEDGEMENTS
This research is supported by The Japan Science and
Technology Agency (JST) through its “Center of In-
novation Science and Technology based Radical In-
novation and Entrepreneurship Program (COI Pro-
gram)”.
REFERENCES
Amer, M. and Goldstein, M. (2012). Nearest-neighbor
and clustering based anomaly detection algorithms for
rapidminer. In Simon Fischer, I. M., editor, Pro-
ceedings of the 3rd RapidMiner Community Meeting
and Conference (RCOMM 2012), pages 1–12. Shaker
Verlag GmbH.
Amer, M., Goldstein, M., and Abdennadher, S. (2013). En-
hancing one-class support vector machines for unsu-
pervised anomaly detection. In Proceedings of the
ACM SIGKDD Workshop on Outlier Detection and
Description (ODD ’13), pages 8–15, New York, NY,
USA. ACM Press.
Angiulli, F. and Pizzuti, C. (2002). Fast outlier detection in
high dimensional spaces. In Elomaa, T., Mannila, H.,
and Toivonen, H., editors, Principles of Data Mining
and Knowledge Discovery, volume 2431 of Lecture
Notes in Computer Science, pages 43–78. Springer
Berlin / Heidelberg.
Barnett, V. and Lewis, T. (1994). Outliers in Statistical
Data. Wiley Series in Probability & Statistics. Wiley.
Basharat, A., Gritai, A., and Shah, M. (2008). Learning
object motion patterns for anomaly detection and im-
proved object detection. In Computer Vision and Pat-
tern Recognition. (CVPR 2008). IEEE Conference on,
pages 1–8. IEEE Computer Society Press.
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J.
(2000). LOF: Identifying density-based local outliers.
In Proceedings of the 2000 ACM SIGMOD Interna-
tional Conference on Management of Data, pages 93–
104, Dallas, Texas, USA. ACM Press.
Chandola, V., Banerjee, A., and Kumar, V. (2009).
Anomaly detection: A survey. ACM Computing Sur-
veys, 41(3):1–58.
Gebhardt, J., Goldstein, M., Shafait, F., and Dengel, A.
(2013). Document authentication using printing tech-
nique features and unsupervised anomaly detection. In
Proceedings of the 12th International Conference on
Document Analysis and Recognition (ICDAR 2013),
pages 479–483. IEEE Computer Society Press.
Goldstein, M. (2014). Anomaly Detection in Large
Datasets. Phd-thesis, University of Kaiserslautern,
Germany.
Goldstein, M. and Dengel, A. (2012). Histogram-based out-
lier score (hbos): A fast unsupervised anomaly detec-
tion algorithm. In Wölfl, S., editor, KI-2012: Poster
and Demo Track, pages 59–63. Online.
Grubbs, F. E. (1969). Procedures for Detecting Outlying
Observations in Samples. Technometrics, 11(1):1–21.
Guyon, I., Matic, N., and Vapnik, V. (1996). Discovering
informative patterns and data cleaning. Advances in
Knowledge Discovery and Data Mining, pages 181–
203.
Hawkins, S., He, H., Williams, G. J., and Baxter, R. A.
(2000). Outlier detection using replicator neural net-
works. In Proceedings of the 4th International Con-
ference on Data Warehousing and Knowledge Dis-
covery (DaWaK 2000), pages 170–180, London, UK.
Springer-Verlag.
He, Z., Xu, X., and Deng, S. (2003). Discovering cluster-
based local outliers. Pattern Recognition Letters,
24(9-10):1641–1650.
Jin, W., Tung, A., Han, J., and Wang, W. (2006). Ranking
outliers using symmetric neighborhood relationship.
In Ng, W.-K., Kitsuregawa, M., Li, J., and Chang, K.,
editors, Advances in Knowledge Discovery and Data
Mining, volume 3918 of Lecture Notes in Computer
Science, pages 577–593. Springer Berlin / Heidelberg.
Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A.
(2009). Loop: Local outlier probabilities. In Proceed-
ing of the 18th ACM Conference on Information and
Knowledge Management (CIKM ’09), pages 1649–
1652, New York, NY, USA. ACM Press.
Lin, J., Keogh, E., Fu, A., and Herle, H. V. (2005). Approx-
imations to magic: Finding unusual medical time se-
ries. In 18th IEEE Symposium on Computer-Based
Medical Systems (CBMS), pages 23–24. IEEE Com-
puter Society Press.
Lindsay, B. (1995). Mixture Models: Theory, Geometry,
and Applications. NSF-CBMS Regional Conference
Series in Probability and Statistics. Institute of Math-
ematical Statistics, Penn. State University.
Mehrotra, K., Mohan, C. K., and Ranka, S. (1997). Ele-
ments of Artificial Neural Networks. MIT Press, Cam-
bridge, MA, USA.
Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M.,
and Euler, T. (2006). Yale: Rapid prototyping for
complex data mining tasks. In Proceedings of the
ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining (KDD 2006), pages
935–940, New York, NY, USA. ACM Press.
Portnoy, L., Eskin, E., and Stolfo, S. (2001). Intrusion de-
tection with unlabeled data using clustering. In Pro-
ceedings of ACM CSS Workshop on Data Mining Ap-
plied to Security (DMSA-2001), pages 5–8.
Ramaswamy, S., Rastogi, R., and Shim, K. (2000). Efficient
algorithms for mining outliers from large data sets. In
Proceedings of the 2000 ACM SIGMOD International
Conference on Management of Data (SIGMOD ’00),
pages 427–438, New York, NY, USA. ACM Press.
Schölkopf, B. and Smola, A. J. (2002). Learning with Ker-
nels: Support Vector Machines, Regularization, Op-
timization, and Beyond. Adaptive Computation and
Machine Learning. MIT Press, Cambridge, MA.
Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-
Taylor, J., and Platt, J. C. (1999). Support vector
method for novelty detection. In Advances in Neu-
ral Information Processing Systems 12 (NIPS), pages
582–588. The MIT Press.
Sharma, P. K., Haleem, H., and Ahmad, T. (2015). Improv-
ing classification by outlier detection and removal. In
Emerging ICT for Bridging the Future - Proceedings
of the 49th Annual Convention of the Computer So-
ciety of India CSI Volume 2, volume 338 of Advances
in Intelligent Systems and Computing, pages 621–628.
Springer International Publishing.
Smith, M. and Martinez, T. (2011). Improving classification
accuracy by identifying and removing instances that
should be misclassified. In Neural Networks (IJCNN),
The 2011 International Joint Conference on, pages
2690–2697.
Tang, J., Chen, Z., Fu, A., and Cheung, D. (2002). Enhanc-
ing effectiveness of outlier detections for low density
patterns. In Chen, M.-S., Yu, P., and Liu, B., editors,
Advances in Knowledge Discovery and Data Mining,
volume 2336 of Lecture Notes in Computer Science,
pages 535–548. Springer Berlin / Heidelberg.
Turlach, B. A. (1993). Bandwidth selection in kernel den-
sity estimation: A review.