
 
records satisfying the query predicates.  
Joshi and Jermaine (2008) introduced the ACE 
Tree which is a binary tree index structure for 
efficiently drawing samples for processing database 
queries. They demonstrated the effectiveness of this 
structure for single and two attribute database 
queries, but did not deal with multi-attribute 
aggregate queries. For extending the ACE Tree to k 
key attributes, Joshi and Jermaine proposed binary 
splitting of one attribute range after another at 
consecutive levels of the binary tree starting from 
the root; from level k+1, the process is repeated with 
each attribute in the same sequence as before. This 
process could lead to an index tree of very large 
height for a data warehouse even if only a relatively 
small number of attributes are considered. 
Li et al. (2008) proposed a sampling cube 
framework for answering analytical queries on a 
data warehouse which calculates confidence 
intervals for any multidimensional query. The 
sampling cube is constructed from a random sample 
of the data warehouse. After building the sampling 
cube, there is no further access to the original data 
records should a query require a different sample 
from the one already drawn. If a query has too few 
sample records in the sampling cube, they expand 
the query to gather more sample records from the 
sampling cube itself in an attempt to improve the 
quality of the query result. 
In this paper, we propose the k-MDI Tree which 
extends the ACE Tree structure to deal with multi-
dimensional data warehouse queries. Unlike the 
ACE Tree, the k-MDI tree allows non-binary splits 
of data ranges for key values that do not split evenly 
into 2
n
 distinct ranges. The number of levels in the k-
MDI tree can be limited to the number of key 
attributes. The shallow tree structure resulting from 
multi-way branching also facilitates quicker retrieval 
of leaf nodes from disk storage. Unlike the sampling 
cube of Li et al. (2008), new samples that contain 
relevant records are drawn for each query. These 
records can be considered as drawn from a subset of 
the data warehouse that satisfies the query 
predicates. In estimating the query results, we take 
into account the proportion of relevant records for 
the query in the whole data warehouse. The 
sampling and estimation methods are evaluated 
experimentally using a real life data set.   
The rest of the paper is organized as follows:  In 
Section 2, we define some relevant terms and briefly 
describe the ACE Tree structure. In Section 3, our k-
way multi-dimensional (k-MDI) indexing structure 
is described in detail. We also introduce the concept 
of relevancy ratios, both for the database and a 
specific sample. Section 4 reports the experimental 
results that evaluate the efficacy of our scheme. 
Section 5 is the conclusion of the paper. 
2 TERMS, DEFINITIONS AND 
ACE TREE STRUCTURE 
In this section, we define some terms pertaining to 
data warehousing, define confidence interval and 
then review briefly the ACE Tree structure (Joshi 
and Jermaine, 2008) that has preceded the k-MDI 
tree we propose in Section 3.  
2.1  Dimensions and Measure 
To support decision support queries, data is usually 
structured in large databases called data warehouses. 
Typically, data warehouses are relational databases 
with a large table in the middle called the fact table 
connected to other tables called dimensions. For 
example, consider the fact table Sales shown as 
Table 1. A dimension table Store linked to StoreNo 
in this fact table will contain more information on 
each of the stores such as store name, location, state, 
and country (Kimball and Moss, 2002). Other 
dimension tables could exist for items and date. The 
remaining attributes like quantity and amount are 
typically, but not necessarily, numerical and are 
termed  measures. A typical decision support query 
aggregates a measure using functions such as Sum(), 
Avg() or Count().  The fact table Sales along with all 
its dimension tables form a star schema. 
Table 1: Fact table Sales. 
  
SALES 
  
Store
No 
Date Item Quantity Amount 
  21  12-Jan-11  iPad      223  123,455 
  21  12-Jan-11  PC        20    24,800 
  24  11-Jan-11  iMac        11      9,990 
  77  25-Jan-11  PC        10    12,600 
 
In decision support queries a measure is of 
interest for calculation of averages, totals and 
counts. For example, a sales manager may like to 
know the total sales quantity and amount for certain 
item(s) in a certain period of time for a particular 
store or even all (or some) stores in a region. This 
may then allow her to make decisions to order more 
or less stocks as appropriate at a point in time. 
AnEfficientSamplingSchemeforApproximateProcessingofDecisionSupportQueries
17