Fast Multi-View Evaluation of Data

Represented by Symmetric Clusters

Alexander Vinogradov

A. A. Dorodnicyn Computing Centre, Russian Academy of Sciences

Vavilov str. 40, 119333 Moscow, Russian Federation

Abstract. A new framework is proposed for a fast calculation of linear scalings

posed on structured data. Several widely used types of data representation based

on clusters with intrinsic features of local simmetry are taken into account. Pa-

per presents some Image Mining technologies that are used for improvement of

abstract data multi-view evaluation procedures.

1 Introduction

On-line estimation of standard parameters concerning to big samples is usually per-

formed on the basis of preliminary calculation and accumulation of frequently used

intermediate values. The greater the costs of time expenditures, the more numerous in-

termediate data become. A characteristic example is given by systems of class OLAP

(On-Line Analytical Processing) in which ﬁles of aggregates prepared in advance can

surpass in size initial sample in tens and hundreds times [5], [12]. Nevertheless, mul-

tiple view is quite necessary for many types of data, in particular, for various types of

spatially distributed data, 3D scenes and images. Recently hypercubes and other ad-

vanced architectures and methods quickly spread and beneﬁt in many applications of

Image Processing and Image Mining technologies, for instance, in GIS [9], [10], [11].

And, vice-versa, some technologies of IP and IM turn out to be useful in processing

abstract samples in structured data spaces [4], [7]. A subject matter of the paper be-

longs to this last class of methods: we develop a framework providing fast evaluation

of density distributions for arbitrary linear scalings posed on a data sample. Techniques

described here use certain properties of spatial symmetry that are speciﬁc to clusters of

some widely used types of data representation.

1.1 Multiple View on a Data Sample

Existing software products and systems of the speciﬁed type are focused, mainly, on

calculation of several key parameters of data. At the same time, inquiries of reﬁned

analysts concern often also to various non-standard relations of parameters for each of

which aggregated scales can’t be prepared in advance [2], [4], [6]. Such inquiries can

concern to spatiotemporal relations, indirect or hidden indicators, actual and presup-

posed dependencies, and so on [1], [3], [8]. In some cases that we address to in what

Vinogradov A. (2008).

Fast Multi-View Evaluation of Data Represented by Symmetric Clusters.

In Image Mining Theory and Applications, pages 53-57

DOI: 10.5220/0002338600530057

 SciTePress

follows, it is possible to perform fast estimations of corresponding distributions with-

out direct use of bulky ﬁles of data. By the equation of arbitrary relation A(x) = a

an a-parametric scaling is determined in the data space X. Projections of clusters of

X onto the parameterization scale automatically form sampling estimates of the den-

sity distribution for a. If the form of clusters possesses properties of isotropy, i.e., to

some extent it is invariant with respect to linear transformations belonging to the group

, in some cases it is possible to replace all procedures of calculation and gathering

projections by simple summation of values that are independent on actual direction of

a-layers in X.

2 Local Linear Transforms

We assume that for a sample X = {x

} ⊆ R

, i ∈ I, of great volume |I|, some kind

of preliminary segmentation or structuring was made that can be permanently reﬁned

with new data. The situation is widespread when such pre-segmentation produces an

estimation of empirical density f

(x) =

|I|

δ(x −x

)dx in the form of ﬁnite sum

g(x) =

(x), in which every component g

(x) represents in the vicinity of its

center x

a limited function h of some positively determined 2-form B

of coordinates:

(x) = h((x − x

)

(x − x

)), where λ

are aprioristic weights. Examples are:

normal mixtures, ﬁnal sets of ellipsoidal clusters, of ”thick” spheres, etc. The last type

of vicinities is speciﬁc to data of the highest dimension, ﬁrst of all, for long time series

(for example, the sphere with ”thickness” 0.01R contains more than 90% of the volume

of 1000-dimensional sphere).

2.1 Advantages of Clusters with Symmetry

The common for these representations (see Fig. 1) is that in the vicinity of the center x

each component of the sum

(x) is resulted by some invertible linear transform

to spherically symmetric form g

) = h

(

)

)dy

= 1.

Fig.1. Clusters of initial representation are rebuilt to isotropic ones via local transforms. Linear

layer of a scaling (dotted line, left) is replaced with its transformed images (right).

5454

If the integral

)

)dy

on the image L

) of a layer S

is easily calculable

(or can be represented by table values prepared in advance along with permanent reﬁne-

ment of the representation

(x)), an estimation of aprioristic distribution of scal-

ing parameter p(a) =

)

)dy

is reduced to an arrangement of trans-

formed layers L

) in R

, and also to calculation of factors µ

), adhering coordi-

nates y

to uniform scale of scalar parameter a. In a linear case, when A(x) =

all layers are hyper-planes of dimension N − 1, and the value

)

)dy

de-

pends only on the distance from component’s center x

to the image L

) of a layer

whatever direction of vector c

, j ∈ {1, ..., N }, taken in the space R

3 Fast Calculations on Isotropic Clusters

The form of each linear transform L

is uniquely (and simply) determined by the equa-

tion A(x) =

and by the main axes of the initial shape of cluster k. Let matrixes

of these transforms are constructed for all components. In a linear case factors µ

)

are determined by the formula

) =

)

−1

)

(1)

If we exclude from the analysis exotic types of distributions g

(x) used in integrals on

transformed layers L

), then obvious formulas can be written out. We show here as

examples the two corresponding estimates of density p(a) for cases of a normal mixture

and a set of uniformly ﬁlled ellipsoids.

For a normal mixture:

p(a) =

√

2π

)exp

−

(a −

)

−1

)

(2)

For a set of ellipsoids:

p(a) =

√

Γ (

+ 1)

Γ (

N−1

+ 1)

)

1 −

(a −

)

−1

)

N −1

(3)

So, if a data representation of speciﬁed type is used, then for evaluation of distributions

on linear scales A(x) =

it is enough to make weighted by factors (1) transver-

sal summations on always the same set of k one-dimensional ﬁles of size n

ﬁlled

with predeﬁned constants

)

)dy

, where n

depends only on a necessary

detailing of the density p(a).

3.1 Possible Generalizations

Certainly, mixed combinations are admissible, when the sum

(x) contains com-

ponents g

(x) of different types. In general case a critical point is a choice of suitable

5555

piece-wise approximationof the scaling A(x) = a. For a single linear scaling each clus-

ter intersects with entire (N −1)-hyper-plane, but the calculation of its partial crossings

with linear pieces proves to be a separate hard analytical and computational problem

even for simple equations of kind A(x) = a, especially when N grows. Approximate,

but yet computationally efﬁcient solutions can be found in cases when actual repre-

sentations

(x) consist of numerous compact clusters that are small enough to

embody close to exactly plain fragments of a smooth non-linear a-layer (see Fig. 2).

This condition can be directly satisﬁed for diminishing ellipsoids in sparse clusterings,

Fig.2. Plain piece-wise approximation of intersections of smaller clusters with layers of a smooth

non-linear scaling A(x) = a.

but for kernels with inﬁnite support (such as Gaussian ones) some additional assump-

tions are necessary, for instance, usage of kernels cut within ellipsoids that contain main

share of kernels’ volume. In any case, computation of values of type (2), (3) is a prob-

lem essentially less complicated, than pre-calculation and warehousing of numerous

data aggregations or a direct calculation on the sample X = {x

4 Conclusions

A framework was proposed for efﬁcient estimation of a distribution taking place on

parameter a scale of arbitrary linear scaling A(x) = a posed on a big data sample. It

was shown that in special cases it is possible to replace all procedures of calculation

of a sample share represented in any global a-layer by simple weighted summation of

highly limited number of predeﬁned values. Besides exactly computational advantages,

basic improvements of estimates p(a) are provided on this way in the case when it’s

ensured a priori, that not only approximation, but also the true distribution of the sample

5656

X = {x

} is a mixture of the form

(x). Then formulas of type (2) or (3)

containing estimates of parameters x

, c

, B

, µ

obtained on the basis of searching all

the set of initial data, will be more exact, than nonparametric density estimates provided

by restricted fragments situated in layers of scaling A(x) = a, in particular,more exact,

than similar evaluations produced from standard aggregates or other partial histograms.

The framework could be useful in applied tasks concerning to mining and analytical

processing images and other big data sets represented by clusters with described above

features of symmetry.

Acknowledgements

This work was supported by the Russian Foundation for Basic Research (05-01-00332,

05-07-90333,06-01-00492, 06-01-08045, 07-01-12020), and the Grant of Presidium of

RAS ”Scientiﬁc Schools” No 5833.2006.1.

References

1. Alejandro A. Vaisman, Alberto O. Mendelzon, Walter Ruaro and Sergio G. Cymerman: Sup-

porting dimension updates in an OLAP server. Information Systems, Volume 29, Issue 2

(2004) 165-185

2. Alfredo Cuzzocrea: Improving range-sum query evaluation on data cubes via polynomial

approximation. Data & Knowledge Engineering, Volume 56, Issue 2 (2006) 85-121

3. Chun-Che Huang, Tzu-Liang (Bill) Tseng, Ming-Zhong Li and Roger R. Gung: Models

of multi-dimensional analysis for qualitative data and its application. European Journal of

Operational Research, Volume 174, Issue 2 (2006) 983-1008

4. Damianos Chatziantoniou: Using grouping variables to express complex decision support

queries. Data & Knowledge Engineering, Volume 61, Issue 1 (2007) 114-136

5. J. Mundy: Smarter data warehouses. Intelligent Entherprise v.4 No 3 (2001) 100-120

6. Ki Yong Lee, Jin Hyun Son and Myoung Ho Kim: Reducing the cost of accessing relations

in incremental view maintenance. Decision Support Systems, Volume 43, Issue 2 (2007)

512-526

7. Ming-Chuan Hung, Man-Lin Huang, Don-Lin Yang and Nien-Lin Hsueh: Efﬁcient ap-

proaches for materialized views selection in a data warehouse. Information Sciences, Volume

177, Issue 6 (2007) 1333-1348

8. Navin Kumar, Aryya Gangopadhyay and George Karabatis: Supporting mobile decision

making with association rules and multi-layered caching. Decision Support Systems, Vol-

ume 43, Issue 1 (2007) 16-30

9. Owen Kaser and Daniel Lemire: Attribute value reordering for efﬁcient hybrid OLAP. Infor-

mation Sciences, Volume 176, Issue 16 (2006) 2304-2336

10. Pierre Marchand, Alexandre Brisebois, Yvan Bedard and Geoffrey Edwards: Implementation

and evaluation of a hypercube-based method for spatiotemporal exploration and analysis.

ISPRS Journal of Photogrammetry and Remote Sensing, Volume 59, Issues 1-2 (2004) 6-20

11. Sonia Rivest, Yvan Bedard, Marie-Jose Proulx, Martin Nadeau, Frederic Hubert and Julien

Pastor: SOLAP technology: Merging business intelligence with geospatial technology for

interactive spatio-temporal exploration and analysis of data. ISPRS Journal of Photogram-

metry and Remote Sensing, Volume 60, Issue 1 (2005) 17-33

12. Young-Koo Lee, Kyu-Young Whang, Yang-Sae Moon and Il-Yeol Song: An aggregation

algorithm using a multidimensional ﬁle in multidimensional OLAP. Information Sciences,

Volume 152 (2003) 121-138

5757