Comparative Study of Query Performance in a Remote Health Framework using Cassandra and Hadoop

Himadri Sekhar Ray, Kausik Naguri, Poly Sil Sen, Nandini Mukherjee

Abstract

Recent advances in distributed processing, sensor networks, cloud computing and related technologies have made big data increasingly important, and a number of big data applications can now be envisaged that could not be conceptualised earlier. As technologists have focused on storing, processing and managing big data, a number of big data solutions have emerged. The objective of this paper is to study two such solutions, Hadoop and Cassandra, and assess their suitability for healthcare applications. The paper considers a data model for a remote health framework and demonstrates mappings of the data model onto Hadoop and Cassandra. The data model follows popular national and international standards for Electronic Health Records. The paper shows that, to obtain an efficient mapping of a given data model onto a big data solution such as Cassandra, sample queries must be considered. Health data is also stored in Hadoop as XML files, with the same set of queries in mind. The performance of these queries in Hadoop is observed first, and then the performance of executing them on the same experimental setup using Hadoop and Cassandra is compared. YCSB guidelines are followed to design the experiments. The study provides insight into the applicability of big data solutions in the healthcare domain.
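The abstract's point that sample queries must drive the mapping can be illustrated with a small, library-free sketch. It is not the authors' schema: the record fields (`patient_id`, `taken_at`, `kind`, `value`) and the helper names `build_query_table` and `lookup` are hypothetical, and plain Python dictionaries stand in for Cassandra tables. Each "table" is built for one query by choosing a partition key (what the query filters on) and a clustering key (the order in which rows are returned).

```python
from collections import defaultdict

# Hypothetical sensor readings; field names are illustrative assumptions,
# not taken from the paper's data model.
readings = [
    {"patient_id": "p1", "taken_at": "2016-01-02T10:00", "kind": "pulse", "value": 72},
    {"patient_id": "p2", "taken_at": "2016-01-01T09:00", "kind": "pulse", "value": 80},
    {"patient_id": "p1", "taken_at": "2016-01-01T08:00", "kind": "temperature", "value": 36.9},
]


def build_query_table(records, partition_key, clustering_key):
    """Group records by the partition key and sort each group by the clustering key.

    This mimics, in memory, the idea of one denormalised table per query.
    """
    table = defaultdict(list)
    for record in records:
        table[record[partition_key]].append(record)
    for rows in table.values():
        rows.sort(key=lambda row: row[clustering_key])
    return dict(table)


def lookup(table, key):
    """Return all rows stored under one partition key, or an empty list."""
    return table.get(key, [])


# Sample query 1: all readings of one patient, in time order.
by_patient = build_query_table(readings, "patient_id", "taken_at")

# Sample query 2: all readings of one kind, in time order.
by_kind = build_query_table(readings, "kind", "taken_at")

if __name__ == "__main__":
    for row in lookup(by_patient, "p1"):
        print(row["taken_at"], row["kind"], row["value"])
```

The same readings are stored twice, once per query. That duplication is the usual trade-off of a query-driven design: each lookup touches a single partition, at the cost of extra storage and extra writes.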



Paper Citation


in Harvard Style

Ray H., Naguri K., Sil Sen P. and Mukherjee N. (2016). Comparative Study of Query Performance in a Remote Health Framework using Cassandra and Hadoop. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 330-337. DOI: 10.5220/0005706803300337


in Bibtex Style

@conference{healthinf16,
author={Himadri Sekhar Ray and Kausik Naguri and Poly Sil Sen and Nandini Mukherjee},
title={Comparative Study of Query Performance in a Remote Health Framework using Cassandra and Hadoop},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)},
year={2016},
pages={330-337},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005706803300337},
isbn={978-989-758-170-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)
TI - Comparative Study of Query Performance in a Remote Health Framework using Cassandra and Hadoop
SN - 978-989-758-170-0
AU - Ray H.
AU - Naguri K.
AU - Sil Sen P.
AU - Mukherjee N.
PY - 2016
SP - 330
EP - 337
DO - 10.5220/0005706803300337