strategy narrows the target area, while the relocation
strategy provides more accurate locations by reducing
the loss of information in the neural network. As the
example in the last row of Figure 7 shows, the proposed
method excludes interference from objects of the same
category with a similar appearance, whereas the other
methods are more susceptible to shallow features and
more likely to find a false target. Likewise, as the
second and second-to-last examples show, the other
methods do not work as well as ours when the
illumination is stronger or weaker than usual. In the
first row of Figure 7, even when there is only one
salient target in the image (simple background and low
interference), our method still achieves a higher
location accuracy than the others.
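As a rough illustration of the relocation idea (expand the coarse
region of interest, locate again inside the smaller crop, then map
the result back to image coordinates), a minimal Python sketch
follows. The helper names `expand_roi` and `relocate`, the expansion
factor of 1.5, and the `locate` callback are assumptions made for
this example; they are not the implementation used in this paper.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x1, y1, x2, y2) in pixels
Image = Sequence[Sequence[float]]    # rows of pixel values (hypothetical format)


def expand_roi(box: Box, scale: float, width: int, height: int) -> Box:
    """Grow a box around its centre by `scale` and clip it to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (
        max(0, int(round(cx - half_w))),
        max(0, int(round(cy - half_h))),
        min(width, int(round(cx + half_w))),
        min(height, int(round(cy + half_h))),
    )


def relocate(image: Image, coarse_box: Box,
             locate: Callable[[List[Sequence[float]]], Box],
             scale: float = 1.5) -> Box:
    """Re-run a locator on the expanded ROI and return full-image coordinates."""
    height, width = len(image), len(image[0])
    ex1, ey1, ex2, ey2 = expand_roi(coarse_box, scale, width, height)
    # Crop the expanded region; the locator only sees this smaller input.
    crop = [row[ex1:ex2] for row in image[ey1:ey2]]
    lx1, ly1, lx2, ly2 = locate(crop)
    # Shift the crop-relative box back by the crop's top-left corner.
    return lx1 + ex1, ly1 + ey1, lx2 + ex1, ly2 + ey1


# Usage with a stand-in locator that always returns the same crop-relative box.
image = [[0.0] * 200 for _ in range(100)]
print(relocate(image, (40, 30, 80, 50), lambda crop: (5, 5, 45, 25)))
# (35, 30, 75, 50)
```

Because the locator only receives the cropped region, the same network
input size covers a smaller part of the scene, which is the property
the relocation strategy relies on.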
Table 4: Speed test on MOTB datasets. The NCC and SSD
methods can only use the CPU, while QATM and the method
in this paper can use the GPU to accelerate positioning.
Methods        SSD    NCC    QATM    Ours
Backend        CPU    CPU    GPU     GPU
Average (ms)   296    321    1780    90
Finally, matching speed is also an important criterion
for measuring the performance of an algorithm in
practical applications. Table 4 compares the average
time consumed by the different matching and locating
methods on MOTB datasets. At 90 ms on average, the
method proposed in this paper is clearly faster than
the traditional sliding-window methods SSD (296 ms)
and NCC (321 ms), and faster than QATM (1780 ms) even
with GPU acceleration.
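The relative speed can be read directly off the averages in the
speed table. The short Python snippet below only divides those
reported timings by the timing of the proposed method; the variable
and function names are chosen for this example.

```python
# Average matching time in milliseconds, copied from the speed table.
timings_ms = {"SSD": 296, "NCC": 321, "QATM": 1780, "Ours": 90}


def slowdown_vs(timings: dict, reference: str = "Ours") -> dict:
    """Return how many times slower each method is than the reference."""
    ref = timings[reference]
    return {name: t / ref for name, t in timings.items() if name != reference}


ratios = slowdown_vs(timings_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the time of Ours")
# SSD: 3.3x the time of Ours
# NCC: 3.6x the time of Ours
# QATM: 19.8x the time of Ours
```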
 
4  CONCLUSIONS 
We introduced a novel target matching framework, which
mainly comprises a Coverage-IoU based feature
extractor, a verification process, and relocation after
expanding the regions of interest. The Coverage-IoU
loss in this framework is motivated by the fact that
the existing IoU loss cannot meet the coverage
requirement in some scenes. The coverage, shape
restriction, and corner distance terms of the loss
function better describe the regression process of the
bounding box and yield more accurate position
regression. Moreover, the verification strategy
presented here reduces false-positive results without
an instance-level template, so as to guide the regions
of interest to the target area. Finally, the relocation
strategy is inspired by the location errors that result
from information loss in pooling and other operations
in the neural network; narrowing the input size and
relocating within this area reduces the position errors
and improves location accuracy. In addition, the
relocation strategy and the Coverage-IoU loss proposed
in this paper can be easily ported to other common
tasks such as target detection and instance
segmentation.
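To make the coverage argument concrete, the following Python sketch
compares plain IoU with a simple coverage ratio for two predicted
boxes. It does not reproduce the Coverage-IoU loss of this paper:
coverage is assumed here to mean the fraction of the ground-truth box
that lies inside the prediction, and the box coordinates are made up
for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def iou(pred: Box, gt: Box) -> float:
    """Standard intersection over union."""
    inter = intersection(pred, gt)
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0


def coverage(pred: Box, gt: Box) -> float:
    """Assumed definition: share of the ground truth covered by the prediction."""
    gt_area = area(gt)
    return intersection(pred, gt) / gt_area if gt_area > 0 else 0.0


gt = (0.0, 0.0, 10.0, 10.0)
shifted = (5.0, 0.0, 15.0, 10.0)      # misses half of the target
enclosing = (-10.0, 0.0, 20.0, 10.0)  # contains the whole target

for name, pred in (("shifted", shifted), ("enclosing", enclosing)):
    print(f"{name}: IoU={iou(pred, gt):.3f}, coverage={coverage(pred, gt):.2f}")
# shifted: IoU=0.333, coverage=0.50
# enclosing: IoU=0.333, coverage=1.00
```

Both predictions receive the same IoU, yet only the second one
contains the whole target, which is the kind of case an IoU-only loss
cannot distinguish.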
  
REFERENCES 
James, Alex Pappachen, and Belur V. Dasarathy. (2014).
Medical Image Fusion: A Survey of the State of the
Art. CoRR abs/1401.0.
Hashemi, Nazanin Sadat, Roya Babaie Aghdam, Atieh
Sadat Bayat Ghiasi, and Parastoo Fatemi. (2016).
Template Matching Advances and Applications in
Image Analysis. arXiv preprint arXiv:1610.07231.
Nan, Junyu, and David Held. (2019). Combining Deep
Learning and Verification for Precise Object Instance
Detection, no. CoRL: 1–20.
Perveen, Nazil, Darshan Kumar, and Ishan Bhardwaj.
(2013). An Overview on Template Matching
Methodologies and Its Applications 2 (10): 988–995.
Dekel, Tali, Shaul Oron, Michael Rubinstein, Shai Avidan,
and William T. Freeman. (2015). Best-Buddies
Similarity for Robust Template Matching. Proceedings of the
IEEE Computer Society Conference on Computer Vision
and Pattern Recognition 07-12-June. IEEE: 2021–2029.
Kat, Rotal, and Shai Avidan. (2018). Matching Pixels Using
Co-Occurrence Statistics. Proceedings of the IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition, 1751–1759.
Cheng, Jiaxin, Yue Wu, and Premkumar Natarajan.
(2019). QATM: Quality-Aware Template Matching for
Deep Learning. Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition 2019-June: 11545–11554.
Ren, Shaoqing, and Kaiming He. (2017). Faster R-CNN:
Towards Real-Time Object Detection with Region
Proposal Networks. IEEE Transactions on Pattern
Analysis and Machine Intelligence 39 (6): 1137–1149.
Ammirato, Phil, Cheng-Yang Fu, Mykhailo Shvets, Jana
Kosecka, and Alexander C. Berg. (2018). Target Driven
Instance Detection. arXiv preprint arXiv:1803.04610.
Girshick, Ross. (2015). Fast R-CNN. Proceedings of the
IEEE International Conference on Computer Vision
2015: 1440–1448.
Rezatofighi, Hamid, Nathan Tsoi, JunYoung Gwak, Amir
Sadeghian, Ian Reid, and Silvio Savarese. (2019).
Generalized Intersection over Union: A Metric and A
Loss for Bounding Box Regression. Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition 2019: 658–666.
Redmon, Joseph. (2013–2016). Darknet: Open Source
Neural Networks in C. http://pjreddie.com/darknet/,
[Accessed 23-August-2020].