difference was 0.016. In the Single-Link algorithm (Figure 12), we did not observe a relevant difference among the quality results.
5  RELATED WORK 
Recent research has focused on the use of queries, indexing techniques, or both to reduce the volume of data to be processed (Bhattacharya and Getoor, 2007; Altwaijry et al., 2013; Christen, 2012a; Ramadan et al., 2015; Vieira, 2016). Different indexing techniques are summarized in (Christen, 2012a). However, most of these techniques target the traditional ER process, with batch algorithms, and only a few studies address incremental ER (Gruenheid et al., 2014; Whang et al., 2013; Altowim et al., 2014; Whang and Garcia-Molina, 2014).
In (Bhattacharya and Getoor, 2007; Altwaijry et al., 2013), query-time ER approaches are proposed, but indexing to reuse previous classifications is not considered. In (Whang et al., 2013; Gruenheid et al., 2014), incremental ER approaches are proposed, but their indexing is static and the ER is not query-driven.
In (Ramadan et al., 2015), dynamic indexes are proposed. That work focuses on information retrieval rather than on the data integration process (Christen, 2012). Moreover, only attribute and similarity values are indexed, not clusters of tuples that refer to the same real-world entity.
Our indexes differ from these approaches in three aspects. First, our focus is the data integration process and incremental ER over query results. Second, we index tuple identifiers rather than attribute values; in scenarios with a large volume of data, using multiple attributes in similarity index functions can be very costly and time-consuming (Christen, 2012; Ribeiro et al., 2016). Third, we index both similarity values and previous ER results for tuples from multiple data sources.
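To make the contrast with attribute-value indexing concrete, the following is a minimal Python sketch, not the implementation evaluated in this paper. It assumes that the Cluster Index maps tuple identifiers to clusters (here a union-find structure) and that the Similarity Index caches pairwise similarity values keyed by pairs of tuple identifiers under a capacity limit. The class and function names, the FIFO eviction policy, the token-based Jaccard similarity, and the single-link-style merging threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

# Hypothetical sketch: names, eviction policy, similarity function, and the
# merging rule are illustrative assumptions, not the paper's implementation.


class SimilarityIndex:
    """Caches similarity values keyed by unordered pairs of tuple identifiers."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size  # assumed capacity limit on cached pairs
        self._sims: Dict[FrozenSet[Hashable], float] = {}

    def get_or_compute(self, a: Hashable, b: Hashable,
                       compute: Callable[[Hashable, Hashable], float]) -> Tuple[float, bool]:
        """Return (similarity, cache_hit); compute and store the value on a miss."""
        key = frozenset((a, b))
        if key in self._sims:
            return self._sims[key], True
        value = compute(a, b)
        if self.max_size > 0:
            if len(self._sims) >= self.max_size:
                # FIFO eviction: dicts keep insertion order, so drop the oldest pair.
                del self._sims[next(iter(self._sims))]
            self._sims[key] = value
        return value, False


class ClusterIndex:
    """Maps each tuple identifier to its cluster through union-find parent links."""

    def __init__(self) -> None:
        self._parent: Dict[Hashable, Hashable] = {}

    def find(self, t: Hashable) -> Hashable:
        self._parent.setdefault(t, t)
        while self._parent[t] != t:
            self._parent[t] = self._parent[self._parent[t]]  # path halving
            t = self._parent[t]
        return t

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[rb] = ra


def jaccard(x: str, y: str) -> float:
    """Token-set Jaccard similarity (an illustrative similarity function)."""
    sx, sy = set(x.lower().split()), set(y.lower().split())
    return len(sx & sy) / len(sx | sy) if sx | sy else 1.0


def resolve_query_result(ids: Iterable[Hashable], records: Dict[Hashable, str],
                         threshold: float, sim_index: SimilarityIndex,
                         cluster_index: ClusterIndex) -> Tuple[List[Set[Hashable]], int]:
    """Single-link style ER over one query result, reusing both indexes.

    Returns the clusters restricted to `ids` and the number of similarity
    values that had to be computed (Similarity Index misses).
    """
    ids = sorted(ids)
    computed = 0
    for a, b in combinations(ids, 2):
        if cluster_index.find(a) == cluster_index.find(b):
            continue  # already linked, possibly while answering an earlier query
        value, hit = sim_index.get_or_compute(
            a, b, lambda x, y: jaccard(records[x], records[y]))
        computed += 0 if hit else 1
        if value >= threshold:
            cluster_index.union(a, b)
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for t in ids:
        clusters.setdefault(cluster_index.find(t), set()).add(t)
    return list(clusters.values()), computed


if __name__ == "__main__":
    records = {
        1: "abbey road the beatles",
        2: "the beatles abbey road",
        3: "thriller michael jackson",
        4: "michael jackson thriller remastered",
    }
    sims, clusters = SimilarityIndex(max_size=100), ClusterIndex()
    print(resolve_query_result([1, 2, 3], records, 0.5, sims, clusters))
    print(resolve_query_result([2, 3, 4], records, 0.5, sims, clusters))
```

In the demo, the second query computes only two of its three candidate pairs, because the similarity of tuples 2 and 3 was already stored while answering the first query. Skipping pairs that are already in the same cluster is valid only for single-link (connected-component) merging; other clustering algorithms would need the cached similarity values themselves.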
6  CONCLUSIONS 
In this paper, we presented two indexes for incremental ER over query results: the Cluster Index and the Similarity Index. We evaluated the quality and the efficiency of the ER process and investigated the impact of the Similarity Index size on the incremental ER process. We showed, on a real dataset, that our indexes are suitable for the incremental ER process: the incremental ER achieved the same quality as the traditional process without indexes, but was more efficient. As future work, we intend to analyze the indexes with other datasets and to evaluate other incremental ER algorithms.
REFERENCES 
Altowim, Y.; Kalashnikov, D. V.; Mehrotra, S. (2014).
Progressive Approach to Relational Entity Resolution.
In: VLDB. Hangzhou, China.
Altwaijry, H.; Kalashnikov, D. V.; Mehrotra, S. (2013).
Query-Driven Approach to Entity Resolution. In:
VLDB. Trento, Italy.
Bhattacharya, I.; Getoor, L. (2007). Query-time Entity
Resolution. Journal of Artificial Intelligence Research.
V 30, issue 1, pp 621-657.
Bhattacharya, I.; Getoor, L. (2007a). Entity Resolution in
Graphs. In: Mining Graph Data. John Wiley & Sons,
Inc.
CDDB (2016). Available at:
http://hpi.de/naumann/projects/repeatability/datasets/cd-datasets.html.
Christen, P. (2008). Febrl – An Open Source Data Cleaning, 
Deduplication and Record Linkage System with a
Graphical User Interface. In: KDD. Las Vegas, USA. 
Christen, P. (2012). Data Matching: Concepts and
Techniques for Record Linkage, Entity Resolution, and 
Duplicate Detection. Springer.  
Christen, P. (2012a). A Survey of Indexing Techniques for 
Scalable Record Linkage and Deduplication. In: TKDE. 
V 24, issue 9, pp 1537-1555. 
FreeDB (2016). Available at: http://www.freedb.org/
Gruenheid, A.; Dong, X. L.; Srivastava, D. (2014).
Incremental Record Linkage. In: VLDB. Hangzhou,
China.
Guo, S.; Dong, X.; Srivastava, D.; Zajac, R. (2010). Record
linkage with uniqueness constraints and erroneous 
values. In: PVLDB. Singapore. 
Ramadan, B. et al. (2015). Dynamic Sorted Neighbourhood 
Indexing for Real-Time Entity Resolution. In: Journal 
of Data and Information Quality. V 6, issue 4, no. 15.
Ribeiro, L. A. et al. (2016). SJClust: Towards a Framework 
for Integrating Similarity Join Algorithms and
Clustering. In: ICEIS. Rome, Italy. 
Su, W.; Wang, J.; Lochovsky, F. H. (2010). Record
Matching Over Query Results from Multiple Web
Databases. In: TKDE. V 22, issue 4, pp 578-589.
Tan, P.; Steinbach, M.; Kumar, V. (2006). Introduction to 
Data Mining. Pearson.  
Vieira, P. K. M.; Salgado, A. C.; Lóscio, B. F. (2016). A
Query-driven and Incremental Process for Entity
Resolution. In: AMW. Panama City, Panama.
Whang, S. E.; Marmaros, D.; Garcia-Molina, H. (2013).
Pay-As-You-Go Entity Resolution. In: TKDE. V 25,
issue 5, pp 1111-1124.
Whang, S. E.; Garcia-Molina, H. (2014). Incremental entity
resolution on rules and data. In: VLDB Journal. V 23,
issue 1, pp 77-102.