THE OVERHEAD OF SAFE BROADCAST PERSISTENCY

Rubén de Juan-Marín, Francesc D. Muñoz-Escoí, J. Enrique Armendáriz-Íñigo, J. R. González de Mendívil

2010

Abstract

Although several papers that assumed a recoverable failure model have stated the need to log messages to secondary storage once they have been received, none of them has analysed the overhead that such logging implies when reliable broadcasts are used in a group communication system guaranteeing virtual synchrony. At first glance, this seems an excessive cost given its apparently limited advantages, but several scenarios contradict this intuition. This paper surveys some of these configurations and outlines the benefits of this persistence-based approach.
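The mechanism whose cost the abstract questions is writing each received broadcast message to secondary storage before acknowledging it. The following minimal Python sketch illustrates that mechanism. The class `PersistentReceiver` and its methods are hypothetical and are not taken from the paper. The sketch assumes a JSON-lines log file and calls `os.fsync` after every record, which is the kind of synchronous write whose overhead is at stake. It does not implement the broadcast protocol, view changes, or virtual synchrony.

```python
import json
import os
import tempfile


class PersistentReceiver:
    """Hypothetical receiver that logs each broadcast message to stable
    storage before acknowledging it, so the message survives a crash."""

    def __init__(self, log_path):
        self.log_path = log_path
        self._seen = set()  # ids of messages already persisted

    def on_receive(self, msg_id, payload):
        """Persist the message, then signal that it may be acknowledged.
        Returns False for a duplicate (e.g. a retransmission after recovery)."""
        if msg_id in self._seen:
            return False
        record = json.dumps({"id": msg_id, "payload": payload})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to disk: the costly step
        self._seen.add(msg_id)
        return True  # only now is it safe to send the acknowledgement

    def recover(self):
        """After a crash, rebuild the received messages from the log, in order."""
        messages = []
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    if line.strip():
                        rec = json.loads(line)
                        messages.append((rec["id"], rec["payload"]))
                        self._seen.add(rec["id"])
        return messages


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "broadcast.log")
    receiver = PersistentReceiver(path)
    receiver.on_receive(1, "debit A 10")
    receiver.on_receive(2, "credit B 10")
    # Simulate a crash: a fresh instance recovers from the same log file.
    restarted = PersistentReceiver(path)
    print(restarted.recover())  # [(1, 'debit A 10'), (2, 'credit B 10')]
```

Recovery also repopulates the set of seen identifiers, so a message retransmitted by another group member after the restart is recognised as a duplicate rather than logged twice.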



Paper Citation


in Harvard Style

de Juan-Marín R., Muñoz-Escoí F. D., Armendáriz-Íñigo J. E. and González de Mendívil J. R. (2010). THE OVERHEAD OF SAFE BROADCAST PERSISTENCY. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT, ISBN 978-989-8425-22-5, pages 111-120. DOI: 10.5220/0002915201110120


in Bibtex Style

@conference{icsoft10,
author={Rubén de Juan-Marín and Francesc D. Muñoz-Escoí and J. Enrique Armendáriz-Íñigo and J. R. González de Mendívil},
title={THE OVERHEAD OF SAFE BROADCAST PERSISTENCY},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT},
year={2010},
pages={111-120},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002915201110120},
isbn={978-989-8425-22-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 1: ICSOFT
TI - THE OVERHEAD OF SAFE BROADCAST PERSISTENCY
SN - 978-989-8425-22-5
AU - de Juan-Marín R.
AU - Muñoz-Escoí F. D.
AU - Armendáriz-Íñigo J. E.
AU - González de Mendívil J. R.
PY - 2010
SP - 111
EP - 120
DO - 10.5220/0002915201110120