Fault Tolerance Logging-based Model for Deterministic Systems

Óscar Mortágua Pereira, David Simões, Rui L. Aguiar

2016

Abstract

Fault tolerance allows a system to remain operational, to some degree, when some of its components fail. One of the most common fault tolerance mechanisms consists of periodically logging the system state and, in the event of a failure, recovering the system to a consistent state. This paper describes a general logging-based fault tolerance mechanism that can be layered over deterministic systems. Our proposal describes how a logging mechanism can recover the underlying system to a consistent state, even if an action or set of actions was interrupted midway by a server crash. We also propose different methods of storing the logging information, and describe how to deploy a fault-tolerant master-slave cluster for information replication. Finally, we adapt our model to a previously proposed framework that brings common relational features to NoSQL database management systems, such as transactions with atomicity, consistency, isolation and durability (ACID) properties.
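The abstract describes the recovery idea only at a high level. The block below is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: a hypothetical key-value system whose only action is a deterministic increment (`apply`), an in-memory `DurableStore` standing in for stable storage, and a redo-only log in which each group of actions is bracketed by `begin` and `commit` records. All names (`apply`, `DurableStore`, `recover`, `LoggedSystem`) are illustrative. On restart, state is rebuilt from the last checkpoint plus the actions of committed groups only, so a group interrupted midway leaves no trace.

```python
# Minimal sketch (assumption): a redo-only action log with checkpoints over a
# hypothetical deterministic system; not the paper's actual implementation.
from dataclasses import dataclass, field


def apply(state: dict, key: str, delta: int) -> None:
    """A deterministic action: the same state and action always give the same result."""
    state[key] = state.get(key, 0) + delta


@dataclass
class DurableStore:
    """Stands in for stable storage that is assumed to survive a crash."""
    log: list = field(default_factory=list)       # append-only action log
    snapshot: dict = field(default_factory=dict)  # last checkpointed state
    snapshot_pos: int = 0                         # log length covered by the snapshot


def recover(store: DurableStore) -> dict:
    """Rebuild a consistent state: checkpoint plus actions of committed groups only."""
    tail = store.log[store.snapshot_pos:]
    committed = {rec[1] for rec in tail if rec[0] == "commit"}
    state = dict(store.snapshot)
    for rec in tail:  # replay in log order; determinism makes this reproducible
        if rec[0] == "action" and rec[1] in committed:
            apply(state, rec[2], rec[3])
    return state


class LoggedSystem:
    """Hypothetical logging layer wrapped around the deterministic system."""

    def __init__(self, store: DurableStore):
        self.store = store
        self.state = recover(store)  # volatile state is rebuilt on every restart

    def run(self, gid: str, actions: list, crash_after: int = -1) -> None:
        """Log each action before applying it; a commit record closes the group.

        crash_after simulates a server crash after that many actions.
        """
        self.store.log.append(("begin", gid))
        for i, (key, delta) in enumerate(actions):
            if i == crash_after:
                raise RuntimeError("simulated crash")
            self.store.log.append(("action", gid, key, delta))  # log first
            apply(self.state, key, delta)                       # then apply
        self.store.log.append(("commit", gid))

    def checkpoint(self) -> None:
        """Persist the current state; call only between groups, never mid-group."""
        self.store.snapshot = dict(self.state)
        self.store.snapshot_pos = len(self.store.log)


# Usage: a committed group, a checkpoint, another committed group, then a crash.
store = DurableStore()
node = LoggedSystem(store)
node.run("g1", [("a", 10), ("b", 5)])
node.checkpoint()
node.run("g2", [("a", 1)])
try:
    node.run("g3", [("a", 100), ("b", 100)], crash_after=1)  # dies midway
except RuntimeError:
    node = LoggedSystem(store)  # restart: volatile state lost, rebuilt from store
print(node.state)  # {'a': 11, 'b': 5}
```

Redo-only replay is enough here because the sketch keeps the live state in memory, so a crash discards any partial updates; a system that writes partial updates to stable storage in place would also need undo information to roll back the interrupted group.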



Paper Citation


in Harvard Style

Pereira Ó., Simões D. and Aguiar R. (2016). Fault Tolerance Logging-based Model for Deterministic Systems. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 119-126. DOI: 10.5220/0005979101190126


in Bibtex Style

@conference{data16,
author={Óscar Mortágua Pereira and David Simões and Rui L. Aguiar},
title={Fault Tolerance Logging-based Model for Deterministic Systems},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={119-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005979101190126},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Fault Tolerance Logging-based Model for Deterministic Systems
SN - 978-989-758-193-9
AU - Pereira Ó.
AU - Simões D.
AU - Aguiar R.
PY - 2016
SP - 119
EP - 126
DO - 10.5220/0005979101190126