
 
Table 1: Average computation time (ms) and percentage of total time (%)
for each step of the ORB and DARP methods.

                            ORB              DARP
Step                      ms       %      ms       %
Keypoint detection     16.11   80.89    2.63    9.40
Normal estimation          –       –   14.99   53.56
Patch rectification        –       –    8.40   30.01
Orientation estimation  0.14    0.71    0.20    0.72
Patch description       3.67   18.40    1.77    6.31
Total                  19.92  100.00   27.99  100.00
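
As a quick check on the figures in Table 1, the following sketch recomputes each step's share of the total and the frame rate that each total implies. The timings are copied from the table; the dictionary layout and the function names are illustrative assumptions and are not part of the original evaluation.

```python
# Per-step timings in milliseconds, copied from Table 1.
# None marks a step that the method does not perform.
timings_ms = {
    "Keypoint detection":     {"ORB": 16.11, "DARP": 2.63},
    "Normal estimation":      {"ORB": None,  "DARP": 14.99},
    "Patch rectification":    {"ORB": None,  "DARP": 8.40},
    "Orientation estimation": {"ORB": 0.14,  "DARP": 0.20},
    "Patch description":      {"ORB": 3.67,  "DARP": 1.77},
}


def total_ms(method):
    """Sum of all steps that the given method actually performs."""
    return sum(t[method] for t in timings_ms.values() if t[method] is not None)


def share(step, method):
    """Percentage of the method's total time spent in one step."""
    return 100.0 * timings_ms[step][method] / total_ms(method)


def frames_per_second(method):
    """Frame rate implied by the per-frame total, ignoring any other overhead."""
    return 1000.0 / total_ms(method)


for method in ("ORB", "DARP"):
    print(f"{method}: {total_ms(method):.2f} ms per frame, "
          f"about {frames_per_second(method):.1f} frames per second")
```

Under these numbers, the two DARP-only steps (normal estimation and patch rectification) together account for roughly 84% of the DARP total, and the total of 27.99 ms corresponds to roughly 36 frames per second.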
4  CONCLUSIONS 
This paper introduced the DARP method, which
exploits depth information to improve keypoint
matching. It rectifies the patches using 3D
information in order to remove perspective
distortions, and it also uses the depth information
to obtain a scale-invariant representation of the
patches. DARP was shown to work together with
existing keypoint matching methods, helping them
handle situations such as poses that are oblique
with respect to the viewing direction. It supports
both planar and non-planar objects and runs in
real time.
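
The two uses of depth summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the normal is obtained here from a simple cross product of two neighbouring 3D points, which stands in for a more robust estimator, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) as well as all function names are assumptions made for the example.

```python
import math


def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to a 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("degenerate neighbourhood: cannot normalize a zero vector")
    return tuple(x / norm for x in v)


def estimate_normal(center, right, down):
    """Unit surface normal from a 3D point and its right/down neighbours.

    The result is flipped, if needed, so that it faces the camera,
    which is assumed to sit at the origin looking along +z.
    """
    n = normalize(cross(sub(right, center), sub(down, center)))
    if sum(a * b for a, b in zip(n, center)) > 0.0:
        n = tuple(-x for x in n)
    return n


def patch_size_pixels(physical_size, depth, focal_length):
    """Image-space side length of a patch with a fixed physical size."""
    return focal_length * physical_size / depth


# Hypothetical intrinsics and a fronto-parallel surface one metre away.
fx = fy = 525.0
cx, cy = 320.0, 240.0
center = back_project(320, 240, 1.0, fx, fy, cx, cy)
right = back_project(321, 240, 1.0, fx, fy, cx, cy)
down = back_project(320, 241, 1.0, fx, fy, cx, cy)
normal = estimate_normal(center, right, down)
side = patch_size_pixels(0.05, center[2], fx)
```

The normal gives the local surface orientation needed to warp a patch to a fronto-parallel view, while `patch_size_pixels` shows why depth yields scale invariance: a patch of fixed physical size covers half as many pixels when the depth doubles, so sampling it at a depth-dependent size normalizes its scale.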
As future work, DARP will be tested with other
keypoint detectors and patch descriptors.
Optimizations of normal estimation and patch
rectification are also planned, since these proved to
be the most time-consuming steps of the technique
(Table 1).
REFERENCES 
Berkmann, J., Caelli, T., 1994. Computation of surface 
geometry and segmentation using covariance 
techniques. In IEEE Transactions on Pattern Analysis 
and Machine Intelligence, volume 16, issue 11, pages 
1114–1116. 
Del Bimbo, A., Franco, F., Pernici, F., 2010. Local 
homography estimation using keypoint descriptors. In 
WIAMIS’10, 11th International Workshop on Image 
Analysis for Multimedia Interactive Services, 4 pages. 
Eyjolfsdottir, E., Turk, M., 2011. Multisensory embedded 
pose estimation. In WACV’11, IEEE Workshop on 
Applications of Computer Vision, pages 23–30. 
Hinterstoisser, S., Benhimane, S., Navab, N., Fua, P., 
Lepetit, V., 2008. Online learning of patch perspective 
rectification for efficient object detection. In 
CVPR’08, 21st IEEE Conference on Computer Vision 
and Pattern Recognition, 8 pages. 
Hinterstoisser, S., Kutter, O., Navab, N., Fua, P., Lepetit, 
V., 2009. Real-time learning of accurate patch 
rectification. In CVPR’09, 22nd IEEE Conference on 
Computer Vision and Pattern Recognition, pages 
2945–2952. 
Koser, K., Koch, R., 2007. Perspectively invariant normal 
features. In ICCV’07, 11th IEEE International 
Conference on Computer Vision, 8 pages. 
Kurz, D., Benhimane, S., 2011. Gravity-aware handheld 
augmented reality. In ISMAR’11, 10th IEEE 
International Symposium on Mixed and Augmented 
Reality, pages 111–120. 
Lai, K., Bo, L., Ren, X., Fox, D., 2011. A large-scale 
hierarchical multi-view RGB-D object dataset. In 
ICRA’11, IEEE International Conference on Robotics 
and Automation, pages 1817–1824. 
Marcon, M., Frigerio, E., Sarti, A., Tubaro, S., 2012. 3D 
wide baseline correspondences using depth-maps. In 
Signal Processing: Image Communication, volume 27, 
issue 8, pages 849–855. 
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, 
A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, 
L., 2005. A comparison of affine region detectors. In 
International Journal of Computer Vision, volume 65, 
issue 1–2, pages 43–72. 
Morel, J., Yu, G., 2009. ASIFT: A new framework for 
fully affine invariant image comparison. In SIAM 
Journal on Imaging Sciences, volume 2, issue 2, pages 
438–469. 
Moreno-Noguer, F., Lepetit, V., Fua, P., 2007. Accurate 
non-iterative O(n) solution to the PnP problem. In 
ICCV’07, 11th IEEE International Conference on 
Computer Vision, 8 pages. 
Pagani, A., Stricker, D., 2009. Learning local patch 
orientation with a cascade of sparse regressors. In 
BMVC’09, 20th British Machine Vision Conference, 
pages 86.1–86.11. 
Rosten, E., Drummond, T., 2006. Machine learning for 
high-speed corner detection. In ECCV’06, 9th 
European Conference on Computer Vision, pages 
430–443. 
Rublee, E., Rabaud, V., Konolige, K., Bradski, G., 2011. 
ORB: an efficient alternative to SIFT or SURF. In 
ICCV’11, 13th IEEE International Conference on 
Computer Vision, pages 2564–2571. 
Wu, C., Clipp, B., Li, X., Frahm, J.-M., Pollefeys, M., 
2008. 3D model matching with viewpoint invariant 
patches (VIPs). In CVPR’08, IEEE Conference on 
Computer Vision and Pattern Recognition, 8 pages. 
Yang, M., Cao, Y., Förstner, W., McDonald, J., 2010. 
Robust wide baseline scene alignment based on 3D 
viewpoint normalization. In ISVC’10, 6th 
International Symposium on Visual Computing, 
Lecture Notes in Computer Science, volume 6453, 
pages 654–665. 
VISAPP 2013 - International Conference on Computer Vision Theory and Applications