
3  RELATED WORKS  
In the last decade, multi-dimensional and high-dimensional
indexing in decentralized peer-to-peer (P2P) networks has
received extensive research attention. A distributed k-d
tree based on the MapReduce framework (Dean, 2008) is
proposed in (Aly, 2011). In such index structures, queries
are processed as in the centralized approach, i.e., the
query starts at the root node and traverses the tree. These
methods exhibit logarithmic search cost, but face a
serious limitation: peers that correspond to nodes
high in the tree can quickly become overloaded, because
query processing must pass through them. In
centralized indexes this is a desirable property,
because keeping these nodes in main memory
minimizes the number of I/O
operations. In distributed indexes it is a limiting factor
that leads to bottlenecks. Moreover, it causes an
imbalance in fault tolerance: if a peer high in the tree
fails, then the system requires a significant amount of
effort to recover. MIDAS (Tsatsanifos, 2013) is
similar to these works; in particular, it
implements a distributed k-d tree in which leaves
correspond to peers and internal nodes dictate
message routing. MIDAS distinguishes between
physical and virtual peers. A physical peer is an
actual machine that may be responsible for several virtual
peers, whether due to node departures or failures, or for
load balancing and fault tolerance purposes. A virtual peer
in MIDAS corresponds to a leaf of the k-d tree and
stores and indexes all key-value tuples whose keys reside
in the leaf's rectangle. For any point in the space,
exactly one peer in MIDAS is responsible
for it. Two algorithms for nearest-neighbour queries
are described: the first has low expected
latency but involves a large number of peers; the
second has higher expected latency but
involves far fewer peers. The proposed algorithms
process point and range queries over the
multidimensional indexed space in a logarithmic number
of hops in expectation.
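
The root bottleneck described in this section can be made concrete with a small simulation. The following is a minimal sketch, not the implementation of MIDAS or of (Aly, 2011): it assumes a single-process k-d tree in which each leaf plays the role of a virtual peer, and every point query is routed from the root. The names Node, build, route and the leaf_size parameter are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    node_id: int
    axis: int = 0
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    points: Optional[List[Point]] = None  # set only on leaves ("virtual peers")

    @property
    def is_leaf(self) -> bool:
        return self.points is not None


def build(points: List[Point], leaf_size: int = 4, depth: int = 0,
          counter: Optional[List[int]] = None) -> Node:
    """Build a k-d tree; node ids are assigned in pre-order, so the root is 0."""
    if counter is None:
        counter = [0]
    node_id = counter[0]
    counter[0] += 1
    if len(points) <= leaf_size:
        return Node(node_id, points=list(points))
    axis = depth % len(points[0])
    # Median coordinate along the current axis is the splitting value.
    split = sorted(p[axis] for p in points)[len(points) // 2]
    left = [p for p in points if p[axis] < split]
    right = [p for p in points if p[axis] >= split]
    if not left:  # all remaining values equal on this axis: stop splitting
        return Node(node_id, points=list(points))
    node = Node(node_id, axis=axis, split=split)
    node.left = build(left, leaf_size, depth + 1, counter)
    node.right = build(right, leaf_size, depth + 1, counter)
    return node


def route(root: Node, query: Point, load: Counter) -> Tuple[Node, int]:
    """Route a point query from the root to the responsible leaf.

    Returns the leaf and the number of hops; `load` counts how many
    queries touched each node.
    """
    node, hops = root, 0
    while True:
        load[node.node_id] += 1
        if node.is_leaf:
            return node, hops
        node = node.left if query[node.axis] < node.split else node.right
        hops += 1


random.seed(0)
data = [(random.random(), random.random()) for _ in range(256)]
root = build(data)
load = Counter()
for p in data:
    leaf, hops = route(root, p, load)
    assert p in leaf.points  # exactly one leaf is responsible for each point

print("queries through root:", load[root.node_id])
print("max queries through any leaf:",
      max(load[i] for i in load if i != root.node_id and load[i] <= 4))
```

In this sketch every lookup touches the root, whereas each leaf only sees the queries for its own region, which is the load imbalance discussed above.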
4  CONCLUSIONS  
The main objective of this work is to propose an
index with the following characteristics: 1) It must be
usable on a large amount of data, under the assumption that
it is not possible or convenient to use a single
workstation to host all the data; 2) It is distributed
over a computer network and ensures the greatest
possible benefits in terms of efficiency (search, insert,
delete), i.e., its performance is close to that of
traditional indexes that use a single workstation. The
basic ideas behind it are a data structure called
Decentralized Random Trees (DRT), based on the k-d
tree, and a novel k-nearest neighbour algorithm,
named the random k-nearest neighbour algorithm. The
Decentralized Random Trees represent the main
contribution of this work. With a DRT distributed
over a network of peers, a randomly chosen peer can
start the propagation of a query in the network
without involving the peer containing the root of the
tree in about 65% of cases. Furthermore, the first peer
that determines that the search is complete returns
the result. With high probability (in more than 98% of
cases), that peer is not the peer containing the root. Of
course, due to the distributed nature of the DRT, more
than one query can be running at the same time. The
number of initiated queries is potentially limitless,
although the number of peers limits the number of
concurrently running queries.
REFERENCES 
Abele,  A.,  McCrae,  J.P.,  Buitelaar,  P.,  Jentzsch,  A., 
Cyganiak, R., 2017. Linking Open Data cloud diagram 
2017. http://lod-cloud.net/ 
Corley, C.,  Mihalcea,  R.,  2005.  Measuring  the  semantic 
similarity of texts. In Proceedings of the ACL workshop 
on  empirical  modeling  of  semantic  equivalence  and 
entailment (pp. 13-18). Association for Computational 
Linguistics. 
Faloutsos, C., Lin, K., 1995. FastMap: A fast algorithm for
indexing, data-mining and visualization of traditional 
and multimedia datasets, volume 24. ACM. 
Kruskal, J.B., Wish, M., 1978. Multidimensional scaling, 
volume 11. Sage. 
Gargiulo,  F.,  Gigante,  G.,  Ficco,  M.,  2015.  A  semantic 
driven  approach  for  requirements  consistency 
verification. International Journal of High Performance
Computing and Networking, 8(3):201–211. 
Basile,  P.,  De  Gemmis,  M.,  Gentile,  A.L.,  Lops,  P., 
Semeraro, G., 2007. Uniba: Jigsaw algorithm for word 
sense  disambiguation.  In  Proceedings  of  the  4th 
International  Workshop  on  Semantic  Evaluations, 
pages  398–401.  Association  for  Computational 
Linguistics. 
Samet,  H.,  2006.  Foundations  of  multidimensional  and 
metric data structures. Morgan Kaufmann. 
Aly,  M.,  Munich,  M.,  Perona,  P.,  2011.  Distributed  k-d 
trees for retrieval from very large image collections. In 
British Machine Vision Conference, Dundee, Scotland. 
Dean, J., Ghemawat, S., 2008. MapReduce: simplified data 
processing on  large clusters. Communications of the 
ACM, 51(1):107–113. 
Tsatsanifos, G., Sacharidis, D., Sellis, T., 2013.
Index-based query processing on distributed
multidimensional data. GeoInformatica, 17(3):489–519.
DATA 2018 - 7th International Conference on Data Science, Technology and Applications