

Authors: Andrew H. Sung (1), Bernardete Ribeiro (2) and Qingzhong Liu (3)

Affiliations: (1) The University of Southern Mississippi, United States; (2) University of Coimbra, Portugal; (3) Sam Houston State University, United States

Keyword(s): Data Analytics, Knowledge Discovery, Sampling Methods, Quality of Datasets.

Abstract: The era of the Internet of Things and big data has seen individuals, businesses, and organizations increasingly rely on data for routine operations, decision making, intelligence gathering, and knowledge discovery. As big data is generated by all sorts of sources at accelerating velocity, in increasing volumes, and with unprecedented variety, it is also increasingly traded as a commodity in the new “data economy”. With regard to data analytics for knowledge discovery, this raises the question, among various others, of how much data is really necessary and/or sufficient to obtain analytic results that reasonably satisfy the requirements of an application. In this work-in-progress paper, we address the sampling problem in big data analytics and propose that (1) the problem of sampling big data for analytics is “hard”: specifically, it is a theoretically intractable problem when formal measures are incorporated into performance evaluation; therefore, (2) heuristic, rather than algorithmic, methods are necessarily needed in data sampling, and a plausible heuristic method is proposed; and (3) a measure of dataset quality is proposed to facilitate evaluating the worthiness of datasets with respect to model building and knowledge discovery in big data analytics.

CC BY-NC-ND 4.0


Paper citation in several formats:
Sung, A.; Ribeiro, B. and Liu, Q. (2016). Sampling and Evaluating the Big Data for Knowledge Discovery. In Proceedings of the International Conference on Internet of Things and Big Data - IoTBD; ISBN 978-989-758-183-0, SciTePress, pages 378-382. DOI: 10.5220/0005932703780382

@conference{iotbd16,
author={Andrew H. Sung and Bernardete Ribeiro and Qingzhong Liu},
title={Sampling and Evaluating the Big Data for Knowledge Discovery},
booktitle={Proceedings of the International Conference on Internet of Things and Big Data - IoTBD},
year={2016},
pages={378-382},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005932703780382},
isbn={978-989-758-183-0},
}

TY - CONF
JO - Proceedings of the International Conference on Internet of Things and Big Data - IoTBD
TI - Sampling and Evaluating the Big Data for Knowledge Discovery
SN - 978-989-758-183-0
AU - Sung, A.
AU - Ribeiro, B.
AU - Liu, Q.
PY - 2016
SP - 378
EP - 382
DO - 10.5220/0005932703780382
PB - SciTePress