ISE: A High Performance System for Processing Data Streams

Paolo Cappellari, Soon Ae Chun, Mark Roantree

Abstract

Many organizations need to manage high-volume, high-speed streaming data in order to perform analysis and other tasks in real time. In this work, we present the Information Streaming Engine, a high-performance data stream processing system that scales to high data volumes while maintaining very low latency. The Information Streaming Engine adopts a declarative approach, which makes it simple to process and manipulate data streams. Our evaluation demonstrates the high levels of performance it achieves compared to existing systems.



Paper Citation


in Harvard Style

Cappellari P., Chun S. and Roantree M. (2016). ISE: A High Performance System for Processing Data Streams. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 13-24. DOI: 10.5220/0005938000130024


in Bibtex Style

@conference{data16,
author={Paolo Cappellari and Soon Ae Chun and Mark Roantree},
title={ISE: A High Performance System for Processing Data Streams},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={13-24},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005938000130024},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - ISE: A High Performance System for Processing Data Streams
SN - 978-989-758-193-9
AU - Cappellari P.
AU - Chun S.
AU - Roantree M.
PY - 2016
SP - 13
EP - 24
DO - 10.5220/0005938000130024