
 
approximately 54% of the difference in the average 
runtimes is due to the DBMS optimizer choosing to 
use index with the star schema where table scan is 
faster. We let the DBMS re-analyze the star schema 
using a large sample (20%) of the data but, the 
behaviour of the optimizer remained unaffected. We 
don’t have explanation for it, but even discounting 
these 54% as a result of some inconsistency within 
the Oracle 10g optimizer (e.g. wrong selectivity 
estimates), the remaining advantage of the 
fragmented schema over the star schema is still 
impressive.  
 
Comparing the update performance of the Star 
schema  and the Fragmented Schema 
 
We updated 10% of the tuples in the LOPS table of 
the star schema and the L1, L2 and L3 fragments of 
the fragmented schema. Again, the fragmented 
schema offered better performance, however the 
difference was negligible – only 8% faster than the 
star scheme. 
4 RELATED LITERATURE 
Related literature includes works on data warehouse 
modelling - (Golfarelli et al., 1998; Kimball et al., 
1998; Bizarro et al., 2002), vertical partitioning - 
(Papadomano-lakis et al., 2004; Agrawal et al., 
2004) and new storage models – (Stonebraker et al., 
2005). Our work differs from them through one or 
more of the following: 
•  we use query model as input; 
•  we use overlapping partitioning instead of disjoint 
partitioning; 
•  we focus on reducing the overhead I/O, and not 
on minimizing the joins; 
•  our method allows for storage constraint; 
•  we focus on the read performance; 
•  our method is accommodated within the existing 
database technology. 
We will treat the relevant literature in more detail in 
a full text variant of this paper. 
5 CONCLUSIONS AND 
OUTLOOK 
We think that the current paradigm of data 
warehouse modelling commits the mistake of 
ignoring important information about the future 
workload. In this way many opportunities for 
performance improvement are wasted. We propose a 
simple, yet effective algorithm to derive a more 
query-responsive data warehouse schema. The 
schema created in this way was found to offer more 
than 3 times better read performance using the same 
storage as a star scheme.  
 
We are at the start of our research and many 
questions are open – “Are there algorithms which 
construct even more effective schema (e.g. merge 
not just two fragments at a time but more or merge 
unrelated fragments)?”, “Can any performance 
guarantees be established using reasonable 
assumptions (e.g. self-similarity of the attribute 
network)?” and others. We hope that other 
researchers will find these questions as interesting as 
we do. 
ACKNOWLEDGEMENTS 
I would like to thank Peter Stoehr for his support. 
REFERENCES 
Agrawal, S., et al., 2004: Integrating Vertical and Hori-
zontal Partitioning into Automated Physical Database 
Design. Proc. 2004 SIGMOD Int. Conf. on Manag. of 
Data. 
Bizarro, P., Madeira, H., 2002: Adding a Performance-
Oriented Perspective to Data Warehouse Design.  
Proc. of 4
th
 Int. Conf. on Data Warehousing and 
Knowledge Discovery (DaWaK). 
Golfarelli, M. et al., 1998 Conceptual Design of Data 
Warehouses from E/R Schemes. In Proc. 32th HICSS. 
Inmon, W., 1996. Building the data warehouse, John 
Wiley & Sons, Inc. New York, NY, USA. 
Kimball, R., 1996. The data warehouse toolkit: practical 
techniques for building dimensional data warehouses, 
John Wiley & Sons, Inc. New York, NY, USA. 
Kimball, R. et al., 1998. The data warehouse lifecycle 
toolkit, John Wiley & Sons, Inc. New York, NY, USA. 
Martello, S., Toth, P., 1990. Knapsack problems: 
algorithms and computer implementations, John 
Wiley & Sons, Inc. New York, NY, USA. 
Papadomanolakis E., Ailamaki, A., 2004: AutoPart: 
Automating Schema Design for Large Scientific 
Databases Using Data Partitioning. Proc. 16
th
 Int. 
Conf. on Scient. and Stat. Datab. Manag. (SSDBM). 
Stonebraker, M. et al., 2005. C-Store: A Column-oriented 
DBMS.  Proc of the 31st Int. Conf. on Very Large 
Databases (VLDB). 
TPC-H Standard Specification Revision 2.1.0, 2002. 
http://www.tpc.org 
A NEW LOOK INTO DATA WAREHOUSE MODELLING
543