
 
2 PRELIMINARIES  
In this section we present five formal definitions of the basic
concepts required to understand the foundations of Skyline and
Skyline metrics. For these definitions we assume a space $S$ on a set
of $n$ dimensions $\{d_1, \dots, d_n\}$, a subspace $S'$ (a non-empty
subset of the space $S$), and a dataset $DS$ on $S$. We also suppose
that a tuple $t \in DS$ is represented as $t = (t_1, \dots, t_n)$,
where $t_i$ is a real number on dimension $d_i$. For simplicity, we
suppose that higher values are preferred on every dimension
(maximization).
Definition 1 (Dominance). A tuple $t = (t_1, \dots, t_n) \in DS$
dominates another tuple $u = (u_1, \dots, u_n) \in DS$ if
$(\forall i \mid 1 \le i \le n : t_i \ge u_i \land (\exists j \mid 1 \le j \le n : t_j > u_j))$.
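
As an illustration of Definition 1, the following is a minimal Python sketch of the dominance test under the maximization assumption stated above. The function name `dominates` and the sample tuples are ours and serve only as an example.

```python
from typing import Sequence


def dominates(t: Sequence[float], u: Sequence[float]) -> bool:
    """Return True if t dominates u under maximization (Definition 1)."""
    if len(t) != len(u):
        raise ValueError("tuples must have the same number of dimensions")
    # t must be at least as good as u on every dimension ...
    at_least_as_good = all(ti >= ui for ti, ui in zip(t, u))
    # ... and strictly better on at least one dimension.
    strictly_better = any(ti > ui for ti, ui in zip(t, u))
    return at_least_as_good and strictly_better


print(dominates((3, 2), (1, 2)))  # better on d1, equal on d2
print(dominates((3, 1), (1, 3)))  # incomparable pair
print(dominates((2, 2), (2, 2)))  # identical tuples do not dominate each other
```

Note that dominance is a strict partial order: two tuples may be incomparable, which is why the Skyline can contain many tuples.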
Definition 2 (Skyline). The Skyline of a space $S$, denoted as
$SKY_S$, is the set of the non-dominated tuples on $S$.
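
Definition 2 can be read directly as a filter over the dataset. The sketch below uses a naive quadratic scan, chosen for clarity rather than efficiency; it is not one of the Skyline algorithms from the literature, and the names `skyline` and `DS` are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t: Point, u: Point) -> bool:
    """Dominance under maximization (Definition 1)."""
    return all(a >= b for a, b in zip(t, u)) and any(a > b for a, b in zip(t, u))


def skyline(ds: Sequence[Point]) -> List[Point]:
    """Naive O(|DS|^2) scan: keep each tuple that no other tuple dominates."""
    return [t for t in ds if not any(dominates(u, t) for u in ds)]


# Hypothetical two-dimensional dataset.
DS = [(3, 1), (1, 3), (2, 2), (1, 1)]
print(skyline(DS))  # (1, 1) is dominated, the other three are incomparable
```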
Definition 3 (Skycube). The Skycube, or lattice, is the set of all
Skylines over the subspaces $S'$ of $S$, i.e.,
$Skycube = \{SKY_{S'} \mid S' \subseteq S\}$.
 
Definition 4 (Skyline Frequency). The Skyline Frequency of a tuple
$t \in DS$, denoted by $sf(t)$, is the number of subspaces $S'$ of $S$
in which $t$ is a Skyline tuple, that is,
$sf(t) = (\sum S' \mid S' \subseteq S \land t \in SKY_{S'} : 1)$.
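
To make Definitions 2 to 4 concrete, consider a small worked example. The dataset and the tuple names $p$, $q$, $r$, $w$ are hypothetical and chosen only for illustration; maximization is assumed on both dimensions.

```latex
\begin{align*}
DS &= \{p = (3,1),\; q = (1,3),\; r = (2,2),\; w = (1,1)\},
   \qquad S = \{d_1, d_2\} \\
SKY_{\{d_1\}} &= \{p\}, \qquad
SKY_{\{d_2\}} = \{q\}, \qquad
SKY_{\{d_1, d_2\}} = \{p, q, r\} \\
sf(p) &= 2, \qquad sf(q) = 2, \qquad sf(r) = 1, \qquad sf(w) = 0
\end{align*}
```

The three Skylines listed on the second line form the Skycube of this dataset, and each Skyline Frequency is the number of those Skylines that contain the tuple.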
Since the Skyline can be huge (Chan et al., 2006a), it needs to be
ranked by a score function in order to distinguish the top-k tuples
within a set of incomparable ones. A score function of a tuple $t$,
denoted as $f(t)$, is a function that ranks the tuple $t$, inducing a
total order on the input dataset $DS$.
Definition 5 (Top-k Skyline). The Top-k Skyline tuples of a space
$S$, denoted by $TKS_S$, are the $k$ Skyline tuples on $S$ such that no
other Skyline tuple on $S$ has a higher score function value:
$TKS_S = \{t \mid t \in SKY_S \land (\exists_{k-|SKY_S|}\, u \mid u \in SKY_S : f(u) > f(t))\}$,
where $\exists_x$ means that there exist at most $x$ elements in the set.
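
The prose statement of Definition 5 can be sketched as a sort followed by a cut. The snippet assumes the Skyline has already been computed and is passed in as a list; the function name `top_k_skyline`, the sample Skyline, and the score function (the sum of the coordinates) are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def top_k_skyline(sky: Sequence[Point], k: int,
                  f: Callable[[Point], float]) -> List[Point]:
    """Return the k tuples of an already computed Skyline with the highest f.

    Ties at the k-th position are broken by input order, an arbitrary
    choice that Definition 5 leaves open.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # sorted() is stable, so tuples with equal scores keep their input order.
    return sorted(sky, key=f, reverse=True)[:k]


# Hypothetical Skyline and score function (sum of the coordinates).
SKY = [(6, 1), (1, 4), (3, 3)]
top2 = top_k_skyline(SKY, 2, sum)
print(top2)

# Check the defining property: no Skyline tuple that was left out
# scores higher than a tuple that was returned.
rest = [u for u in SKY if u not in top2]
print(all(sum(u) <= sum(t) for u in rest for t in top2))
```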
The Skyline Frequency may be used as the score function to rank the
Skyline. In (Chan et al., 2006a), the Top-k Frequent Skyline tuples,
denoted here by $TKFS$, are defined as the $k$ tuples in $DS$ such that
no other tuple in $DS$ has a larger Skyline Frequency:
$TKFS = \{t \mid t \in SKY_S \land (\exists_{k-|SKY_S|}\, u \mid u \in SKY_S : sf(u) > sf(t))\}$.
 
 
3 SKYLINE METRICS 
The three steps to compute the SFM metric are: 1) the Skyline of each
subspace of the multidimensional criteria is computed; 2) the SFM of
each tuple $t$ is calculated by counting the subspaces for which $t$
is a Skyline tuple; 3) the Skyline is sorted by SFM values and the
best $k$ tuples are returned.
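
The three steps above can be sketched in Python as follows. This is a naive illustration that enumerates every non-empty subspace, not the approximate or shared-computation algorithms cited in this paper; all function names and the sample dataset are ours, and tuples are assumed to be hashable.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t: Point, u: Point, dims: Sequence[int]) -> bool:
    """Dominance restricted to the dimensions in dims (maximization)."""
    return all(t[i] >= u[i] for i in dims) and any(t[i] > u[i] for i in dims)


def subspace_skyline(ds: Sequence[Point], dims: Sequence[int]) -> List[Point]:
    """Tuples of ds that no other tuple dominates on the subspace dims."""
    return [t for t in ds if not any(dominates(u, t, dims) for u in ds)]


def skyline_frequency(ds: Sequence[Point]) -> Dict[Point, int]:
    """Count, for each tuple, the subspaces in which it is a Skyline tuple."""
    if not ds:
        return {}
    n = len(ds[0])
    sf = {t: 0 for t in ds}
    # Step 1: one Skyline per non-empty subspace (2^n - 1 of them).
    for size in range(1, n + 1):
        for dims in combinations(range(n), size):
            # Step 2: add one for every tuple in this subspace Skyline.
            # set() avoids double counting if ds contains duplicates.
            for t in set(subspace_skyline(ds, dims)):
                sf[t] += 1
    return sf


def top_k_frequent_skyline(ds: Sequence[Point], k: int) -> List[Point]:
    """Step 3: sort the full-space Skyline by frequency and keep the best k."""
    if not ds:
        return []
    sf = skyline_frequency(ds)
    full = subspace_skyline(ds, range(len(ds[0])))
    return sorted(full, key=lambda t: sf[t], reverse=True)[:k]


# Hypothetical two-dimensional dataset.
DS = [(3, 1), (1, 3), (2, 2), (1, 1)]
print(skyline_frequency(DS))
print(top_k_frequent_skyline(DS, 2))
```

The nested loop over subspaces makes the exponential cost of the lattice explicit, which is the first disadvantage discussed next.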
Unfortunately, Skyline Frequency has two disadvantages. On the one
hand, it may require building a lattice with one Skyline for each
non-empty subset of the multidimensional criteria, that is,
$2^d - 1$ Skylines for $d$ dimensions (Chan et al., 2006a). Several
solutions have been introduced to reduce the cost of the lattice
computation. In (Chan et al., 2006a), the authors propose estimating
the Skyline Frequency values with efficient approximate algorithms.
(Yuan et al., 2005; Pei et al., 2006) define algorithms that
efficiently calculate the Skycube, or lattice of Skylines, by sharing
computation among multiple related Skyline subspaces.
On the other hand, Skyline Frequency favors those tuples that have
the best value in at least one dimension. Any tuple with this
characteristic will have a Skyline Frequency of at least
$1 + \sum_{i=1}^{d-1} \binom{d-1}{i}$ when data are not duplicated.
According to Corollary 1 in (Yuan et al., 2005), a Skyline tuple of a
subspace $s$ is also a Skyline tuple of every subspace that contains
$s$. For this reason, all of these tuples could have the same Skyline
Frequency value (little variability).
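
The lower bound can be expanded with the binomial theorem. A tuple $t$ with the unique best value on one dimension is a Skyline tuple of that one-dimensional subspace and, by the corollary, of every subspace obtained by adding $i$ of the remaining $d-1$ dimensions. The case $d = 4$ is included only as a numerical illustration.

```latex
\begin{align*}
sf(t) \;\ge\; 1 + \sum_{i=1}^{d-1} \binom{d-1}{i}
      \;=\; \sum_{i=0}^{d-1} \binom{d-1}{i}
      \;=\; 2^{d-1},
\qquad \text{e.g. } d = 4:\; 1 + 3 + 3 + 1 = 8
\text{ of the } 2^{4} - 1 = 15 \text{ subspaces.}
\end{align*}
```

Such a tuple therefore appears in more than half of the Skylines of the lattice, regardless of its values on the other dimensions.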
To introduce variability into SFM, we propose a new metric called
Top-k Skyline Frequency Metric (TKSFM). The base of the lattice for
TKSFM consists of the two-dimensional Skylines. Therefore, it does not
favor those tuples with the best value in at least one dimension, as
SFM does. Additionally, our experimental study shows that our metric
is less expensive than SFM because it does not need to build the
whole Skyline for each subspace.
To exemplify the difference between TKSFM and SFM, consider a lattice
for four dimensions, A, B, C, and D, as shown in Figure 1. The SFM
value of a tuple $t$ is the number of subspaces of the lattice in
whose Skyline $t$ appears. Since the Skyline of each subspace must be
calculated, computing the Skyline Frequency is very costly (Chan et
al., 2006a). Instead of the Skylines of each subspace of the lattice,
the lattice of TKSFM is based on Top-k Skyline subspaces. Thus, the
evaluation cost of the metric may be reduced because the Top-k
Skyline is computed instead of the whole Skyline set (Goncalves and
Vidal, 2009).
ICEIS 2010 - 12th International Conference on Enterprise Information Systems