Paper

Authors: Philipp Baumann, Dorit S. Hochbaum and Quico Spaen

Affiliation: University of California, United States

ISBN: 978-989-758-173-1

Keyword(s): Large-Scale Data Mining, Classification, Data Reduction, Supervised Normalized Cut.

Related Ontology Subjects/Areas/Topics: Classification; Embedding and Manifold Learning; ICA, PCA, CCA and other Linear Models; Pattern Recognition; Sparsity; Theory and Methods

Abstract: Machine learning techniques that rely on pairwise similarities have proven to be leading algorithms for classification. Despite their good and robust performance, similarity-based techniques are rarely chosen for large-scale data mining because the time required to compute all pairwise similarities grows quadratically with the size of the data set. To address this issue of scalability, we introduced a method called sparse computation, which efficiently generates a sparse similarity matrix that contains only significant similarities. Sparse computation achieves significant reductions in running time with minimal and often no loss in accuracy. However, for massively-large data sets even such a sparse similarity matrix may lead to considerable running times. In this paper, we propose an extension of sparse computation called sparse-reduced computation that not only avoids computing very low similarities but also avoids computing similarities between highly-similar or identical objects by compressing them to a single object. Our computational results show that sparse-reduced computation allows highly-accurate classification of data sets with millions of objects in seconds.
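
The abstract names two ideas: merging highly similar or identical objects into a single object, and skipping pairs whose similarity would be very low. It does not spell out how either is done, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the objects are already points in a low-dimensional space scaled to [0, 1], uses a uniform grid to decide which objects count as highly similar, and uses exp(-distance) as the similarity. The function name `sparse_reduced_pairs` and the parameter `resolution` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def sparse_reduced_pairs(points, resolution):
    """Illustrative sketch only; names and similarity choice are assumptions.

    points: list of equal-length tuples with coordinates scaled to [0, 1].
    resolution: number of grid intervals per dimension (hypothetical).
    Returns (representatives, weights, similarities), each keyed by grid cell.
    """
    # Data reduction (assumed scheme): objects that fall into the same grid
    # cell are treated as highly similar and merged into one representative.
    cells = defaultdict(list)
    for p in points:
        key = tuple(min(int(x * resolution), resolution - 1) for x in p)
        cells[key].append(p)

    # Each representative is the centroid of its cell; the weight records
    # how many original objects it stands for.
    reps = {
        key: tuple(sum(coord) / len(members) for coord in zip(*members))
        for key, members in cells.items()
    }
    weights = {key: len(members) for key, members in cells.items()}

    # Sparsification (assumed scheme): only representatives of adjacent cells
    # get a similarity; every other pair is skipped as negligible.
    dim = len(points[0])
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    similarities = {}
    for key, rep in reps.items():
        for off in offsets:
            other = tuple(k + d for k, d in zip(key, off))
            # key < other ensures each unordered pair is stored once.
            if other in reps and key < other:
                dist = math.dist(rep, reps[other])
                similarities[(key, other)] = math.exp(-dist)
    return reps, weights, similarities
```

With this sketch, the number of similarities depends on the number of occupied grid cells rather than on the number of objects, which is the kind of saving the abstract describes. The neighbor enumeration grows as 3^d with the dimension d, so it is only practical in low dimension.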

Paper citation in several formats:
Baumann P., Hochbaum D. and Spaen Q. (2016). Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 224-231. DOI: 10.5220/0005690402240231

@conference{icpram16,
author={Philipp Baumann and Dorit S. Hochbaum and Quico Spaen},
title={Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2016},
pages={224-231},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005690402240231},
isbn={978-989-758-173-1},
}

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Sparse-Reduced Computation - Enabling Mining of Massively-large Data Sets
SN - 978-989-758-173-1
AU - Baumann P.
AU - Hochbaum D.
AU - Spaen Q.
PY - 2016
SP - 224
EP - 231
DO - 10.5220/0005690402240231
