
Table 1: FR
Aj
 ratios for 1GB TPC-H data warehouse. 
Attribute (A
j
) Table (T
k
) 
FR
Aj   
(ΣAIR
QiTkAj
) 
C_MktSegment Customer  0,0004 
C_Phone Customer 0,0000 
L_ShipDate LineItem 0,0740 
L_ShipMode LineItem 0,0438 
L_ShipInstruct LineItem 0,0384 
L_Quantity LineItem 0,0284 
L_ReturnFlag LineItem 0,0142 
L_ReceiptDate LineItem 0,0125 
L_Discount LineItem 0,0106 
O_OrderDate Orders 0,0738 
O_OrderStatus Orders 0,0213 
O_Comment Orders 0,0000 
P_Brand Part 0,0020 
P_Container Part 0,0017 
P_Size Part 0,0014 
P_Name Part 0,0009 
P_Type Part 0,0008 
S_Comment Supplier 0,0000 
Table 2: Partitioning attributes cardinality and range 
values. 
Partitioning 
Attribute (PA
j
) 
Cardinality of 
PA
j
 in T
k
 (#) 
Range Values of  
PA
j
 in T
k
 
C_MktSegment 5 
‘AUTOMOBILE’, 
FURNITURE’, ’MACHINERY’, 
’HOUSEHOLD’, ’BUILDING’ 
L_ShipDate 2526 02-01-1992…01-12-1998 
O_OrderDate 2406 01-01-1992…02-08-1998 
P_Brand 25 ‘Brand#11’…‘Brand#55’ 
S_Comment 9999 ‘Customer%’…‘water%’ 
Table 3: Partitioning schema for 1GB TPC-H using PIN. 
Table (Tk)  Partitions 
Customer 
Create partition by List on C_MktSegment ( 
   Partition 1 with values 'BUILDING', 
   Partition 2 with values 'AUTOMOBILE', 
   Partition 3 with values 'FURNITURE', 
   Partition 4 with values 'MACHINERY', 
   Partition 5 with values 'HOUSEHOLD') 
LineItem 
Create partition by Range on L_ShipDate ( 
   Partition 1 with values between [01-01-92; 31-01-92], 
   Partition 2 with values between [01-02-92; 28-02-92],  
   … 
   Partition 84 with values between [01-12-98; 31-12-98]) 
Orders 
Create partition by Range on L_ShipDate ( 
   Partition 1 with values between [01-01-92; 31-03-92], 
   Partition 2 with values between [01-04-92; 30-06-92],  
   … 
    Partition 27 with values between [01-07-98; 30-09-98]) 
Part 
Create partition by List on P_Brand ( 
   Partition 1 with values 'Brand#11', 
   Partition 2 with values 'Brand#12',  
   … 
   Partition 25 with values 'Brand#55') 
Supplier 
Create partition by List on S_Comment ( 
   Partition 1 with values '%Customer%Complaints%', 
   Partition 2 with all other values) 
4.2  Indexing the Data Warehouse 
As stated previously, primary keys and referential 
integrity constraints will be maintained, so we will 
only refer to other indexing for our method. This 
means that all attributes used as restriction 
predicates in the OLAP queries will be indexed 
according to the rule defined in section 3.3 of this 
paper. The resulting indexing schema is presented in 
Table 4. 
Table 4: Indexing schema for TPC-H using PIN. 
Attribute (A
j
) Table (T
k
) Index Type 
C_MktSegment Customer Bitmap 
C_Phone Customer B*Tree 
L_ShipDate, L_ShipMode, 
L_ShipInstruct, L_Quantity, 
L_ReturnFlag, L_ReceiptDate, 
L_Discount 
LineItem Bitmap 
O_OrderDate, O_OrderStatus  Orders  Bitmap 
O_Comment Orders B*Tree 
P_Brand, P_Container, P_Size, _Type  Part  Bitmap 
P_Name Part B*Tree 
S_Comment Supplier B*Tree 
4.3 Results 
To analyze the performance of our method, we have 
conducted four series of experiments for each setup: 
(1) implemented without optimization techniques, 
just maintaining its standard primary keys and 
integrity constraints; (2) optimized using standard 
indexing techniques, such as join indexes; (3) 
optimized only by standard partitioning techniques 
using our fragmentation rules; and (4) with full 
optimization using our method with partitioning and 
indexing. Figures 1 and 2 present the results relating 
the execution of the workload for size 1GB and 8GB 
data warehouses, while Figure 3 presents the results 
according to predefined scenarios on 8 simultaneous 
users executing the workload. 
For performance optimization of the fourth series 
of experiments (our method), we executed many 
performance optimization tests that included bitmap 
join indexes, which usually improve OLAP query 
performance (Bellatreche, 2004), and in this case it 
did not improve. Contrarily, performance degraded. 
We observed that, for our method, only the 
restriction predicates should be indexed. This 
enforces the validity for our rule defining the 
indexing schema, only building indexes for all the 
attributes that appear in the OLAP queries WHERE 
clause after adequately partitioning the tables 
according to our method. 
Figures 1 to 3 and Tables 5 and 6 show that PIN 
has best scalability features and is the most efficient 
technique, outstanding the others in every execution, 
both relating to DW size and number of 
simultaneous users. Analyzing average gains from a 
DB size point of view, the gain for standard 
indexing is around 25%, while for standard 
partitioning the results show an average gain of 
48%. PIN has an average gain of 55%. In 
simultaneous users execution, PIN’s advantage is 
also evident, with an average gain of 56% against 
49% for standard partitioning and 25% for standard 
indexing. It also increases the gain compared with 
other methods as the DB grows in size and/or 
PIN: A PARTITIONING & INDEXING OPTIMIZATION METHOD FOR OLAP
175