Long-term Correlation Tracking using Multi-layer Hybrid Features in Dense Environments

Nathanael L. Baisa, Deepayan Bhowmik, Andrew Wallace

Abstract

Tracking a target of interest in crowded environments is a challenging problem that has not yet been fully addressed in the literature. In this paper, we propose a new long-term algorithm that learns a discriminative correlation filter and an online classifier to track a target of interest in dense video sequences. First, we learn a translational correlation filter using a multi-layer hybrid of convolutional neural network (CNN) features and traditional hand-crafted features. We combine the advantages of both a lower convolutional layer, which retains better spatial detail for precise localization, and a higher convolutional layer, which encodes semantic information for handling appearance variations. These are integrated with traditional features formed from a histogram of oriented gradients (HOG) and color-naming. Second, we include a re-detection module to overcome tracking failures caused by long-term occlusions, training an incremental (online) SVM on the most confident frames using hand-engineered features. This re-detection module is activated only when the correlation response of the object falls below a pre-defined threshold, and it then generates high-score detection proposals. Finally, we incorporate a Gaussian mixture probability hypothesis density (GM-PHD) filter to temporally filter the high-score detection proposals generated by the online SVM: the proposal with the maximum weight is taken as the target position estimate, while the remaining proposals are discarded as clutter. Extensive experiments on dense datasets show that our method significantly outperforms state-of-the-art methods.
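The decision logic described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the correlation response is computed KCF-style as an element-wise product in the Fourier domain, and `track_step`, `redetect_threshold`, and the proposal `weight` field are hypothetical names standing in for the learned filter's response, the paper's activation threshold, and the GM-PHD component weights, respectively.

```python
import numpy as np

def correlation_response(filt_fft, feat_fft):
    """Correlation response map: element-wise product in the Fourier
    domain, then inverse FFT back to the spatial domain."""
    return np.real(np.fft.ifft2(filt_fft * np.conj(feat_fft)))

def track_step(response, redetect_threshold, proposals=None):
    """Return (position, used_redetection).

    When the correlation peak is confident, the target is placed at the
    peak of the response map. When the peak falls below the threshold,
    fall back to the re-detection proposals and keep only the one with
    the maximum weight, treating the rest as clutter.
    """
    peak = response.max()
    if peak >= redetect_threshold or not proposals:
        # Confident case: target position is the correlation peak.
        return np.unravel_index(response.argmax(), response.shape), False
    # Low-confidence case: pick the maximum-weight detection proposal.
    best = max(proposals, key=lambda p: p["weight"])
    return best["pos"], True
```

In a full tracker this step would run once per frame, with the filter and the online SVM updated only on confident frames; the sketch isolates the switch between the correlation peak and the filtered re-detection proposals.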



Paper Citation


in Harvard Style

Baisa, N. L., Bhowmik, D. and Wallace, A. (2017). Long-term Correlation Tracking using Multi-layer Hybrid Features in Dense Environments. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 6: VISAPP, (VISIGRAPP 2017), ISBN 978-989-758-227-1, pages 192-203. DOI: 10.5220/0006117301920203

