Authors:
Youngjoo Kim 1; Mateus Espadoto 2; Scott C. Trager 3; Jos B. T. M. Roerdink 1 and Alexandru C. Telea 4
Affiliations:
1 Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, The Netherlands
2 Institute of Mathematics and Statistics, University of São Paulo, Brazil
3 Kapteyn Astronomical Institute, University of Groningen, The Netherlands
4 Department of Information and Computing Sciences, Utrecht University, The Netherlands
Keyword(s):
High-dimensional Visualization, Dimensionality Reduction, Mean Shift, Neural Networks.
Abstract:
Dimensionality reduction (DR) methods aim to map high-dimensional datasets to 2D scatterplots for visual exploration. Such scatterplots are used to reason about the cluster structure of the data, so creating well-separated visual clusters from existing data clusters is an important requirement of DR methods. Many DR methods excel in speed, implementation simplicity, ease of use, stability, and out-of-sample capabilities, but produce suboptimal cluster separation. Recently, Sharpened DR (SDR) was proposed to generically help such methods by sharpening the data distribution prior to the DR step. However, SDR has prohibitive computational costs for large datasets. We present SDR-NNP, a method that uses deep learning to keep the attractive sharpening property of SDR while making it scalable, easy to use, and capable of out-of-sample projection. We demonstrate SDR-NNP on seven datasets, applied to three DR methods, using an extensive exploration of its parameter space. Our results show that SDR-NNP consistently produces projections with clear cluster separation, assessed both visually and by four quality metrics, at a fraction of the computational cost of SDR. We show the added value of SDR-NNP in a concrete use case involving the labeling of astronomical data.