Paper

Authors: Hossein Haeri 1 ; Niket Kathiriya 2 ; Cindy Chen 2 and Kshitij Jerath 1

Affiliations: 1 Department of Mechanical Engineering, University of Massachusetts Lowell, Lowell, MA, U.S.A. ; 2 Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, U.S.A.

Keyword(s): Data Granulation, Data Reduction, Data Aggregation, Training Set Size Reduction.

Abstract: In an era where data volume is growing exponentially, effective data management techniques are more crucial than ever. Traditional methods typically manage the size of large datasets by reducing or aggregating data using a pre-specified granularity. However, these methods often face challenges in retaining vital information when dealing with large and complex datasets, especially when such datasets reside in databases. We propose a novel and innovative approach called Adaptive Granulation that addresses this issue by performing data reduction or aggregation at the database level itself. A key concern that arises in the data reduction process is the potential trade-off between the reduction of data volume and the preservation of prediction accuracy. This is particularly relevant in scenarios where the primary goal is to leverage the reduced dataset for predictive modeling. Our method employs Allan variance, originally developed for frequency stability analysis of atomic clocks, to dynamically adjust the granularity of data aggregation based on the inherent structure and characteristics of the dataset. By minimizing bias across different scales, Adaptive Granulation effectively manages trade-offs between diverse aspects of the data such as underlying patterns, noise levels, and sampling density. This paper outlines the algorithmic strategies for implementing Adaptive Granulation at the database level and assesses its performance through the reduction of the training set size for a downstream regression task on a variety of real-world and synthetic datasets. The results indicate that our method can adaptively optimize granule sizes to effectively balance data patterns, noise levels, and sample densities across the entire data space. Adaptive Granulation thus represents a significant advancement for efficient data management and reduction in the big data era.
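
The abstract gives no algorithmic detail, so the following is only a minimal illustrative sketch of the general idea of using Allan variance to choose an aggregation scale. It is not the authors' implementation. The function names (`allan_variance`, `pick_window`, `granulate`), the use of the non-overlapping Allan variance estimator, and the choice of a single global window size are all assumptions made for illustration; the paper describes granule sizes that adapt across the data space.

```python
from statistics import fmean


def allan_variance(values, m):
    """Non-overlapping Allan variance of `values` for windows of m samples.

    Hypothetical helper: averages consecutive windows of length m, then
    takes half the mean squared difference between adjacent window means.
    """
    n_bins = len(values) // m
    if n_bins < 2:
        raise ValueError("need at least two full windows")
    means = [fmean(values[k * m:(k + 1) * m]) for k in range(n_bins)]
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return sum(sq_diffs) / (2 * (n_bins - 1))


def pick_window(values, candidates):
    """Return the candidate window size with the smallest Allan variance.

    Assumption: one global window is chosen; candidates that do not yield
    at least two full windows are skipped.
    """
    usable = [m for m in candidates if len(values) // m >= 2]
    return min(usable, key=lambda m: allan_variance(values, m))


def granulate(values, m):
    """Replace each full window of m samples by its mean (one granule)."""
    return [fmean(values[k * m:(k + 1) * m]) for k in range(len(values) // m)]
```

For a series that alternates between 0 and 2, windows of two samples average the alternation away, so `pick_window` prefers a window of 2 over a window of 1, and `granulate` then halves the number of stored values.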

CC BY-NC-ND 4.0

Paper citation in several formats:
Haeri, H.; Kathiriya, N.; Chen, C. and Jerath, K. (2023). Adaptive Granulation: Data Reduction at the Database Level. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS; ISBN 978-989-758-671-2; ISSN 2184-3228, SciTePress, pages 29-39. DOI: 10.5220/0012190700003598

@conference{kmis23,
author={Hossein Haeri and Niket Kathiriya and Cindy Chen and Kshitij Jerath},
title={Adaptive Granulation: Data Reduction at the Database Level},
booktitle={Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS},
year={2023},
pages={29-39},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012190700003598},
isbn={978-989-758-671-2},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS
TI - Adaptive Granulation: Data Reduction at the Database Level
SN - 978-989-758-671-2
IS - 2184-3228
AU - Haeri, H.
AU - Kathiriya, N.
AU - Chen, C.
AU - Jerath, K.
PY - 2023
SP - 29
EP - 39
DO - 10.5220/0012190700003598
PB - SciTePress