Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries

Diego García-Gil; Alejandro Alcalde-Barros; Julián Luengo; Salvador García; Francisco Herrera

Topics: Data Management for Large Data; Software Frameworks (MapReduce, Spark etc) and Simulations; Volume, Velocity, Variety, Veracity and Value

In Proceedings of the 4th International Conference on Internet of Things, Big Data and Security IoTBDS - Volume 1, pages 324-331, 2019, Heraklion, Crete, Greece

Authors: Diego García-Gil; Alejandro Alcalde-Barros; Julián Luengo; Salvador García; Francisco Herrera

Affiliation: Departamento de Ciencias de la Computación e Inteligencia Artificial, Universidad de Granada, Granada, 18071, Spain

Keyword(s): Big Data, Apache Spark, Data Preprocessing, Smart Data, Imbalanced, Classification.

Abstract: With the advent of Big Data, terabytes of data are generated and stored every second. This raw data is far from perfect: it contains many imperfections (noise, missing values, etc.) and is not suitable for analysis, as it will lead to wrong conclusions. Data preprocessing is the set of techniques devoted to polishing, cleaning, fixing, and improving that raw data. With preprocessed data, we are able to find more patterns and to better explain the underlying distribution of the data. This is what is called Smart Data: raw data that has been preprocessed and is ready to be analyzed, data that contains valuable information that will lead to knowledge. In this work, we present two Big Data libraries for achieving Smart Data from Big Data, BigDaPSpark and BigDaPFlink. They are built on top of two Big Data frameworks, Apache Spark and Apache Flink. Both libraries contain a series of algorithms for Big Data preprocessing, ranging from noise cleaning to discretization and data reduction, among many others. Additionally, we illustrate the usage of the libraries with two use cases.

CC BY-NC-ND 4.0

Paper citation in several formats:

García-Gil, D., Alcalde-Barros, A., Luengo, J., García, S. and Herrera, F. (2019). Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries. In Proceedings of the 4th International Conference on Internet of Things, Big Data and Security - IoTBDS; ISBN 978-989-758-369-8; ISSN 2184-4976, SciTePress, pages 324-331. DOI: 10.5220/0007738503240331

@conference{iotbds19,
author={Diego García{-}Gil and Alejandro Alcalde{-}Barros and Julián Luengo and Salvador García and Francisco Herrera},
title={Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries},
booktitle={Proceedings of the 4th International Conference on Internet of Things, Big Data and Security - IoTBDS},
year={2019},
pages={324-331},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007738503240331},
isbn={978-989-758-369-8},
issn={2184-4976},
}

TY - CONF

JO - Proceedings of the 4th International Conference on Internet of Things, Big Data and Security - IoTBDS
TI - Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries
SN - 978-989-758-369-8
IS - 2184-4976
AU - García-Gil, D.
AU - Alcalde-Barros, A.
AU - Luengo, J.
AU - García, S.
AU - Herrera, F.
PY - 2019
SP - 324
EP - 331
DO - 10.5220/0007738503240331
PB - SciTePress