Authors: Markus Goldstein and Seiichi Uchida

Affiliation: Kyushu University, Japan

ISBN: 978-989-758-173-1

Keyword(s): Outlier Removal, Unsupervised Anomaly Detection, Handwritten Digit Recognition, Large-scale Dataset, Data Cleansing, Influence of Outliers.

Related Ontology Subjects/Areas/Topics: Applications ; Classification ; Clustering ; Computer Vision, Visualization and Computer Graphics ; Density Estimation ; Image Understanding ; Pattern Recognition ; Theory and Methods

Abstract: Outlier removal from training data is a classical problem in pattern recognition. This problem has become more important for large-scale datasets for two reasons: first, there is a higher risk of “unexpected” outliers, such as mislabeled training data; second, a large-scale dataset makes it more difficult to grasp the distribution of outliers. At the same time, many unsupervised anomaly detection methods have been proposed that can also be used for outlier removal. In this paper, we present a comparative study of nine different anomaly detection methods in the scenario of outlier removal from a large-scale dataset. To observe performance accurately, we need a simple and describable recognition procedure and therefore use a nearest neighbor-based classifier. As an adequate large-scale dataset, we prepared a handwritten digit dataset comprising more than 800,000 manually labeled samples. With a data dimensionality of 16×16 = 256, each digit class is ensured to have at least 100 times more instances than the data dimensionality. The experimental results show that the common understanding that outlier removal improves classification performance on small datasets does not hold for high-dimensional large-scale datasets. Additionally, local anomaly detection algorithms were found to perform better on this data than their global equivalents.
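
The pipeline the abstract describes (score training samples with an unsupervised anomaly detector, discard the highest-scoring ones, then classify with a nearest-neighbor rule) can be sketched on toy data. This is a minimal illustration and not the authors' implementation: the mean k-NN distance score stands in for the nine detectors compared in the paper, and the per-class scoring, the 2-D grid data, the 5% removal fraction, and all function names are assumptions made for this example. On such a tiny dataset removal helps, which matches the common understanding for small datasets; it says nothing about the paper's large-scale result.

```python
import math


def knn_scores(points, k):
    """Global k-NN anomaly score: mean distance to the k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def remove_outliers(samples, labels, k=3, fraction=0.05):
    """Drop the highest-scoring fraction of each class (assumed per-class scoring)."""
    keep = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        scores = knn_scores([samples[i] for i in idx], k)
        n_drop = int(len(idx) * fraction)
        ranked = sorted(zip(scores, idx))  # ascending anomaly score
        keep.extend(i for _, i in ranked[:len(idx) - n_drop])
    keep.sort()
    return [samples[i] for i in keep], [labels[i] for i in keep]


def nn_classify(train_x, train_y, x):
    """1-nearest-neighbor classification with Euclidean distance."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


# Toy data: two well-separated 5x4 grids, one per class.
samples = [(0.1 * a, 0.1 * b) for a in range(5) for b in range(4)]
labels = [0] * len(samples)
samples += [(5 + 0.1 * a, 5 + 0.1 * b) for a in range(5) for b in range(4)]
labels += [1] * 20

# One "unexpected" outlier: a point inside the class-1 region mislabeled as 0.
samples.append((5.15, 5.15))
labels.append(0)

clean_x, clean_y = remove_outliers(samples, labels)

query = (5.16, 5.14)  # lies in the class-1 region
print("raw training set:    ", nn_classify(samples, labels, query))
print("cleaned training set:", nn_classify(clean_x, clean_y, query))
```

Because the mislabeled point is far from every other class-0 sample, it receives the largest class-0 score and is removed, so the query is no longer attracted to it. A local detector would instead compare each point's score with the scores of its neighbors; that variant is not shown here.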

CC BY-NC-ND 4.0

Paper citation in several formats:
Goldstein, M. and Uchida, S. (2016). A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 263-269. DOI: 10.5220/0005701302630269

@conference{icpram16,
author={Markus Goldstein and Seiichi Uchida},
title={A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={263-269},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005701302630269},
isbn={978-989-758-173-1},
}

TY  - CONF
JO  - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI  - A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection
SN  - 978-989-758-173-1
AU  - Goldstein, M.
AU  - Uchida, S.
PY  - 2016
SP  - 263
EP  - 269
DO  - 10.5220/0005701302630269
ER  - 