7 CONCLUSION
The primary goal of this research was to implement a
novel approach to EDA scenarios for practical use by
data scientists and analysts. Existing query methods
require ongoing access to the underlying data. Once
training is complete, our proposed method can explore
the data without an online connection to it. This
opens up a variety of potential use cases for big
data analytics.