Authors: Juliana Hildebrandt; Dirk Habich and Wolfgang Lehner

Affiliation: Technische Universität Dresden, Database Systems Group, Dresden, Germany

Keyword(s): Columnar Data, Data Formats, Apache Arrow, Lightweight Compression, Integration.
            
Abstract:
With the ongoing shift to a data-driven world in almost all application domains, the management and, in particular, the analysis of large amounts of data are gaining importance. For that reason, a variety of new big data systems has been developed in recent years. In addition, data organization and data formats have been revisited as a foundation for these systems. In this context, Apache Arrow is a novel cross-language development platform for in-memory data with a standardized, language-independent columnar memory format. The data is organized for efficient analytic operations on modern hardware; however, Apache Arrow supports only dictionary encoding as a compression approach. At the same time, there is a large corpus of lightweight compression algorithms for columnar data that help both to reduce the required memory space and to increase processing performance. In this paper, we therefore present a flexible, language-independent approach for integrating lightweight compression algorithms into the Apache Arrow framework. Our approach, called ArrowComp, preserves the unique properties of Apache Arrow while extending the platform with a wide variety of lightweight compression capabilities.
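To make "lightweight compression for columnar data" concrete, here is a minimal Python sketch of one such scheme, frame-of-reference (FOR) encoding combined with bit packing, applied to an integer column. It is an illustration under our own assumptions: the function names `for_bitpack_encode` and `for_bitpack_decode` are hypothetical, and the code is neither the ArrowComp implementation nor an Apache Arrow API. Unlike dictionary encoding, which replaces values with indices into a table of distinct values, FOR stores a single reference value plus small offsets that need only a few bits each.

```python
from typing import List, Tuple


def for_bitpack_encode(values: List[int]) -> Tuple[int, int, bytes]:
    """Hypothetical helper: frame-of-reference + bit packing (not an Arrow API).

    Subtracts the column minimum (the reference) from every value and packs
    the resulting non-negative offsets with the minimal number of bits each.
    Returns (reference, bit_width, packed_bytes).
    """
    if not values:
        return 0, 0, b""
    reference = min(values)
    offsets = [v - reference for v in values]
    # Bits needed for the largest offset; 0 if all values are equal.
    bit_width = max(offsets).bit_length()
    packed = 0
    for i, off in enumerate(offsets):
        # Place offset i at bit position i * bit_width (little-endian order).
        packed |= off << (i * bit_width)
    n_bytes = (len(values) * bit_width + 7) // 8  # round up to whole bytes
    return reference, bit_width, packed.to_bytes(n_bytes, "little")


def for_bitpack_decode(reference: int, bit_width: int,
                       data: bytes, count: int) -> List[int]:
    """Hypothetical inverse of for_bitpack_encode: unpack and add the reference."""
    packed = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [reference + ((packed >> (i * bit_width)) & mask) for i in range(count)]


if __name__ == "__main__":
    column = [1000, 1003, 1001, 1007, 1002]
    ref, width, payload = for_bitpack_encode(column)
    print(ref, width, len(payload))  # reference 1000, 3 bits per value, 2 bytes
    assert for_bitpack_decode(ref, width, payload, len(column)) == column
```

For this column the offsets are 0, 3, 1, 7 and 2, so 3 bits per value suffice and five values fit into 2 bytes instead of 40 bytes as 64-bit integers. The sketch uses Python's arbitrary-precision integers for clarity; production implementations typically pack fixed-size blocks into machine words so that decompression stays cheap.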