Markerless Augmented Reality based on Local Binary Pattern

Youssef Hbali, Mohammed Sadgal and Abdelaziz EL Fazziki

Faculty of Sciences SEMLALIA, Cadi Ayyad University, Prince Moulay Abdellah Avenue, Marrakech, Morocco

Keywords:

Local Binary Pattern, Augmented Reality, Machine Learning, Face Detection, Eyes Detection.

Abstract:

Augmented reality is becoming the future of e-commerce. Through their mobile devices, customers have access to all kinds of information, ranging from weather and newspapers to shops. Today's mobile devices are powerful enough to serve as a platform for virtual try-on systems. In this paper we present a virtual eyeglasses try-on system based on augmented reality and LBP for face and eye detection. The well-known AdaBoost machine learning algorithm is used for real-time eye tracking; the resulting face and eye positions are continuously used to overlay the glasses model on the face. The system helps users evaluate glasses before trying them on in the store and lets them design their own style.

1 INTRODUCTION

Object detection is a fundamental part of many virtual try-on systems. A flexible eyeglasses try-on system that can run on mobile devices requires efficient and robust face and eye detection. Object detection techniques can be divided into two main categories (Hjelmsa and Lowb, 2001). The first is the feature-based approach, where human knowledge is used to extract explicit object features such as the nose, mouth and ears for face detection. The second is the image-based approach, in which object detection is treated as a binary pattern recognition problem: distinguishing face from non-face images, eye from non-eye images, and so on. This approach is holistic and uses machine learning to capture unique and implicit object features. Based on the classification strategy used in the design process, the image-based approach is divided into two subcategories: the appearance-based approach and the boosting-based approach.

The appearance-based category covers any image-based face detector that does not employ boosting in its classification stage. Instead, other classification schemes are used, such as neural networks (Rowley et al., 1998) (Roth et al., 2000), Support Vector Machines (SVM) (Osuna et al., 1997), Bayesian classifiers (Cootesa et al., 2002) (Jin et al., 2004), and so forth. Techniques in the appearance-based category generally cannot run in real time, taking on the order of seconds to process an image.

The other image-based subcategory is the boosting-based approach. It started with the successful work of Viola and Jones (Viola and Jones, 2001), which achieved a high detection rate and a high processing speed (15 frames/second) using the AdaBoost (Adaptive Boosting) algorithm (Freund and Schapire, 1996) and a cascade of classifiers. The boosting-based category covers any image-based approach that uses a boosting algorithm in the classification stage.

Augmented reality is a term for a live direct or indirect view of a physical, real-world environment whose elements are augmented by computer-generated sensory input, such as sound or graphics (Lu et al., 1999) (Shen et al., 2010). Augmentation is conventionally performed in real time, hence the need for a robust eye detection system that can process images rapidly and detect eyes accurately in an arbitrary face image, with invariance to pose, scale and lighting.

2 LOCAL BINARY PATTERN

The local binary pattern (LBP) is defined as a gray-scale invariant texture measure, derived from a general definition of texture in a local neighborhood. The original LBP operator labels the pixels of an image by thresholding the 3-by-3 neighborhood of each pixel with the center pixel value and reading the result as a binary number. The decimal result is the sum of the thresholded values multiplied by their weights, as can be seen in Fig. 1.

Hbali Y., Sadgal M. and EL Fazziki A.
Markerless Augmented Reality based on Local Binary Pattern.
DOI: 10.5220/0004531201370141
In Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems (SIGMAP-2013), pages 137-141
ISBN: 978-989-8565-74-7
© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: LBP Calculation.

In other words, given a pixel position $(x_c, y_c)$, LBP is defined as an ordered set of binary comparisons of pixel intensities between the central pixel and its surrounding pixels. The resulting label value of the 8-bit word can be expressed as follows:

$$\mathrm{LBP}(x_c, y_c) = \sum_{n=0}^{7} t(l_n - l_c)\, 2^n \qquad (1)$$

where $l_c$ corresponds to the gray value of the central pixel, $l_n$ to the gray value of the neighbor pixel $n$, and the function $t(k)$ is defined as follows:

$$t(k) = \begin{cases} 1, & \text{for } k \ge 0 \quad (2) \\ 0, & \text{for } k < 0 \quad (3) \end{cases}$$

According to (2) and (3), $t$ depends only on the sign of the difference $l_n - l_c$, so the LBP code is invariant to monotonic gray-scale transformations; hence the LBP representation may be less sensitive to illumination changes.
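The 3-by-3 operator of equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `lbp_3x3`, the neighbor ordering in `OFFSETS` and the sample patch are assumptions made for the example.

```python
import numpy as np

# Neighbour offsets (dy, dx), clockwise from the top-left pixel.
# The ordering is an assumption; any fixed ordering gives a valid LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_3x3(image):
    """Return the 8-bit LBP code of every interior pixel of a 2-D gray image."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for n, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # t(l_n - l_c) is 1 when the neighbour is >= the centre; weight it by 2^n.
        codes += (neighbour >= center).astype(np.int32) << n
    return codes

# Hypothetical 3x3 patch with centre value 6.
patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
code = lbp_3x3(patch)[0, 0]

# A monotonic gray-scale change (here 2*l + 10) leaves the code unchanged.
code_brighter = lbp_3x3(2 * patch + 10)[0, 0]
print(code, code_brighter)
```

With this ordering the thresholded bits are 1, 0, 0, 0, 1, 1, 1, 1, so both codes equal 1 + 16 + 32 + 64 + 128 = 241.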

The 256-bin histogram of the labels computed over an image can be used as a texture descriptor. Each bin of the histogram (LBP code) can be regarded as a micro-texton, and the histogram characterizes the occurrence statistics of simple texture primitives.

The histogram of the labeled image $f_l(x, y)$ can be defined as:

$$H_i = \sum_{x,y} I(f_l(x, y) = i), \quad i = 0, \ldots, L-1 \qquad (4)$$

where $L$ is the number of different labels produced by the LBP operator and $I(A)$ is 1 if $A$ is true and 0 otherwise.
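As an illustration of equation (4), the label histogram can be computed with a single counting pass. The sketch below assumes the labels are already available as a non-negative integer array; the function name `lbp_histogram` and the tiny label image are hypothetical.

```python
import numpy as np

def lbp_histogram(labels, num_labels=256):
    """H_i = number of pixels whose label equals i, for i = 0..num_labels-1."""
    flat = np.asarray(labels).ravel()
    # bincount counts occurrences of each non-negative integer label.
    return np.bincount(flat, minlength=num_labels)[:num_labels]

# Hypothetical labeled image with three distinct LBP codes.
labels = np.array([[0, 3, 3],
                   [255, 3, 0]])
hist = lbp_histogram(labels)
print(hist[0], hist[3], hist[255])
```

Here label 0 occurs twice, label 3 three times and label 255 once; all other bins are zero.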

2.1 Multi-scale LBP

The LBP operator has been extended to consider different neighborhood sizes in order to deal with various scales (Ojala et al., 2002). The local neighborhood of the LBP operator is defined as a set of sampling points equally spaced on a circle of radius $R$ centered on the pixel to be labeled. Sampling points that do not fall exactly on pixel centers are estimated using bilinear interpolation, which allows any radius and any number of points in the neighborhood. Fig. 2 shows different LBP neighborhoods.

The notation $\mathrm{LBP}_{P,R}$ denotes a neighborhood of $P$ sampling points on a circle of radius $R$.

Figure 2: LBP operator examples: circular (8,1), (16,2) and (8,2).
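The circular neighborhood with bilinear interpolation can be sketched as follows. This is an illustrative version for a single pixel, assuming the sampling starts to the right of the center and proceeds counter-clockwise; the names `bilinear` and `circular_lbp` and the test image are not from the original work.

```python
import math
import numpy as np

def bilinear(img, y, x):
    """Bilinearly interpolate img at the real-valued position (y, x)."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)
    x1 = min(x0 + 1, img.shape[1] - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

def circular_lbp(image, yc, xc, P=8, R=1.0):
    """LBP code of pixel (yc, xc) using P points on a circle of radius R."""
    img = np.asarray(image, dtype=float)
    center = img[yc, xc]
    code = 0
    for n in range(P):
        angle = 2.0 * math.pi * n / P
        # Rounding removes tiny floating-point offsets at axis-aligned points.
        y = round(yc - R * math.sin(angle), 6)
        x = round(xc + R * math.cos(angle), 6)
        if bilinear(img, y, x) >= center:
            code |= 1 << n
    return code

# Hypothetical image whose gray value grows from left to right.
img = np.tile(np.arange(5) * 10, (5, 1))
code = circular_lbp(img, 2, 2, P=8, R=1.0)
print(code)
```

For this horizontal ramp, the sampling points on or to the right of the center column are at least as bright as the center, which sets bits 0, 1, 2, 6 and 7 and gives the code 199. The caller is assumed to keep the circle inside the image.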

2.2 Uniform Patterns

$\mathrm{LBP}_{P,R}$ produces $2^P$ different binary patterns that can be formed by the $P$ pixels in the neighbor set. Ojala et al. (Ojala et al., 2002) noticed that most of the texture information is contained in a small subset of LBP patterns. Therefore, it is possible to use only a subset of the $2^P$ LBPs to describe textured images. They defined these fundamental (uniform) patterns as those with at most 2 bitwise transitions from 0 to 1 or vice versa when the pattern is read circularly. For example, 00000000 and 11111111 contain 0 transitions, while 01100000 and 01111110 contain 2 transitions. In the computation of the LBP labels, uniform patterns are used so that there is a separate label for each uniform pattern and all the non-uniform patterns share a single label. For example, with an (8,R) neighborhood there are 256 patterns in total, 58 of which are uniform, which yields 59 different labels.
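The count of 58 uniform patterns and 59 labels can be checked by enumeration. The following sketch counts circular bit transitions for every 8-bit pattern and builds a lookup table; the function names and the labeling order are assumptions for illustration.

```python
def transitions(pattern, P=8):
    """Count circular 0/1 transitions in a P-bit pattern."""
    count = 0
    for n in range(P):
        bit = (pattern >> n) & 1
        next_bit = (pattern >> ((n + 1) % P)) & 1
        count += bit != next_bit
    return count

def uniform_lookup(P=8):
    """Map each P-bit code to a label: one per uniform pattern, one shared."""
    uniform = [p for p in range(2 ** P) if transitions(p, P) <= 2]
    table = {p: label for label, p in enumerate(uniform)}
    shared = len(uniform)  # single label for all non-uniform patterns
    lookup = [table.get(p, shared) for p in range(2 ** P)]
    return lookup, len(uniform)

lookup, n_uniform = uniform_lookup(8)
print(n_uniform, len(set(lookup)))
```

For P = 8 this prints 58 and 59, in agreement with the figures quoted above.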

3 LBP FACIAL REPRESENTATION

Hadid et al. (Hadid et al., 2004) introduced a face representation based on LBP for face recognition. To take the shape information of faces into account, face images are divided into $M$ small non-overlapping regions $R_0, \ldots, R_{M-1}$ (as shown in Fig. 3). The LBP histograms extracted from each sub-region are then concatenated into a single, spatially enhanced feature histogram defined as:

$$H_{i,j} = \sum_{x,y} I(f_l(x, y) = i)\, I((x, y) \in R_j) \qquad (5)$$

where $i = 0, \ldots, L-1$ and $j = 0, \ldots, M-1$. The extracted feature histogram describes the local texture and global shape of face images.

SIGMAP2013-InternationalConferenceonSignalProcessingandMultimediaApplications

138

Figure 3: A face image is divided into sub-regions from

which LBP histograms are extracted and concatenated into

a single, spatially enhanced feature histogram.
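A minimal sketch of the spatially enhanced histogram is given below. It assumes a regular grid of rectangular regions, which is one possible choice of the regions R_j; the function name `spatial_histogram`, the grid size and the toy label image are illustrative only.

```python
import numpy as np

def spatial_histogram(labels, grid=(2, 2), num_labels=256):
    """Concatenate the label histograms of non-overlapping grid regions."""
    labels = np.asarray(labels)
    row_groups = np.array_split(np.arange(labels.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(labels.shape[1]), grid[1])
    parts = []
    for rows in row_groups:
        for cols in col_groups:
            region = labels[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
            hist = np.bincount(region.ravel(), minlength=num_labels)
            parts.append(hist[:num_labels])
    return np.concatenate(parts)

# Hypothetical 4x4 label image: each quadrant holds a single label 0..3.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
feature = spatial_histogram(labels, grid=(2, 2), num_labels=4)
print(feature)
```

The resulting vector has M x L = 4 x 4 = 16 entries. Two images with the same global histogram but a different spatial arrangement of labels produce different feature vectors, which is the point of the region decomposition.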

4 AUGMENTED REALITY

Augmented Reality (AR) employs computer vision, image processing and computer graphics techniques to merge digital content into the real world. It enables real-time interaction between the user, real objects and virtual objects. AR can, for example, be used to embed 3D graphics into a video so that the virtual elements appear to be part of the real environment.

Model-based tracking approaches (Reitmayr and Drummond, 2006) appear to be the most promising among the standard vision techniques currently applied in AR applications. While marker-based approaches such as ARToolkit (Kato and Billinghurst, 1999) or commercial tracking systems such as ART provide a robust and stable solution for controlled environments, it is not feasible to equip a large outdoor space with fiducial markers. Hence, any such system has to rely on models of natural features, such as architectural lines or feature points extracted from reference images.

For facial accessories such as eyeglasses, it would be awkward to place such markers on or in front of a user who is trying on the glasses. Markerless AR systems use natural features instead of fiducial markers to perform tracking, so no intrusive markers that are not really part of the scene are needed. In return, markerless AR relies on specialized and robust trackers.

The first step in building the learning-based tracking system is to produce training data. For the proposed system we used two face databases, BioID and CIE. From these images we extract facial models under different rotations, as well as both the left and right eyes. Then, by applying boosting-based classification (Friedman et al., 2000), we produce two classifiers, one for faces and one for eyes. In this system, a variant of AdaBoost, Gentle AdaBoost, is used to select the features and to train the classifier. The formal guarantees provided by the AdaBoost learning procedure are quite strong: it has been proved in (Freund and Schapire, 1996) that the training error approaches zero exponentially in the number of rounds. Gentle AdaBoost uses Newton steps for the optimization.

The weak classifier is designed to select the single LBP histogram bin which best separates the positive and negative examples. Similar to (Viola and Jones, 2001), a weak classifier $h_j(x)$ consists of a feature $f_j$, which corresponds to one LBP histogram bin, a threshold $\theta_j$ and a parity $p_j$ indicating the direction of the inequality sign:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) \le p_j \theta_j \quad (6) \\ 0 & \text{otherwise} \quad (7) \end{cases}$$

The boosted classifier is a weighted combination of weak classifiers.
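To make the role of the threshold and parity concrete, the sketch below trains threshold stumps on histogram bins with plain discrete AdaBoost. Note that this is a simplification: the system described here uses Gentle AdaBoost, and the function names, the toy histograms and the number of rounds are assumptions made for the example.

```python
import numpy as np

def stump_predict(X, j, theta, p):
    """Weak classifier: 1 if p * f_j(x) <= p * theta, else 0."""
    return (p * X[:, j] <= p * theta).astype(int)

def best_stump(X, y, w):
    """Pick the (bin j, threshold, parity) with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            for p in (1, -1):
                err = np.sum(w[stump_predict(X, j, theta, p) != y])
                if err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost (not Gentle AdaBoost); returns (alpha, j, theta, p)."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, theta, p = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, theta, p)
        # Raise the weight of misclassified samples, lower the others.
        w = w * np.exp(np.where(pred == y, -alpha, alpha))
        w = w / w.sum()
        model.append((alpha, j, theta, p))
    return model

def predict(model, X):
    """Weighted vote of the weak classifiers (votes mapped to -1 / +1)."""
    score = sum(a * (2 * stump_predict(X, j, t, p) - 1) for a, j, t, p in model)
    return (score >= 0).astype(int)

# Hypothetical 2-bin histograms: positives have a large first bin.
X = np.array([[5, 1], [6, 2], [1, 7], [0, 9]])
y = np.array([1, 1, 0, 0])
model = adaboost(X, y, rounds=3)
print(predict(model, X))
```

On this toy set a single bin already separates the classes, so the boosted classifier reproduces the training labels. A real detector would train on many thousands of face and non-face histograms and arrange the boosted classifiers in a cascade.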

Figure 4: Off-line learning stage.

MarkerlessAugmentedRealitybasedonLocalBinaryPattern

139

5 EXPERIMENTAL RESULTS

In this section, we measure the performance of the different classifiers on the BioID database. The BioID dataset consists of 1521 gray-level images with a resolution of 384x286 pixels. Each one shows the frontal view of the face of one of 23 different test persons. For comparison purposes, the set also contains manually set eye positions. The evaluation takes a collection of marked-up test images, applies the classifier and outputs the performance, i.e. the number of found objects, the number of missed objects and the number of false alarms, which are defined as follows:

• Hits: the number of correctly found objects.
• Missed: the number of missed objects (objects that exist but are not found, also known as false negatives).
• False alarms: the number of false alarms (objects that do not exist but are reported, also known as false positives).
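The three counts above translate directly into the usual summary rates. The sketch below is a generic illustration with made-up counts; the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(hits, missed, false_alarms):
    """Return (detection rate, precision) from the three raw counts."""
    total_objects = hits + missed            # objects that really exist
    detection_rate = hits / total_objects    # also called recall or sensitivity
    precision = hits / (hits + false_alarms) # share of reported objects that are real
    return detection_rate, precision

# Hypothetical counts, not taken from the experiments.
rate, precision = detection_metrics(hits=90, missed=10, false_alarms=30)
print(rate, precision)
```

With 90 hits, 10 misses and 30 false alarms, the detection rate is 0.9 and the precision is 0.75.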

The performance of the classifiers is illustrated by receiver operating characteristic (ROC) curves (Laskoa et al., 2005), which are frequently used in biomedical informatics research to evaluate classification and prediction models for decision support, diagnosis, and prognosis. ROC analysis investigates the accuracy of a model's ability to separate positive from negative cases (such as predicting the presence or absence of disease), and the results are independent of the prevalence of positive cases in the study population. It is especially useful for evaluating predictive models or other tests that produce output values over a continuous range, since it captures the trade-off between sensitivity and specificity over that range.
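An ROC curve is obtained by sweeping a decision threshold over the classifier scores and recording the false positive and true positive rates at each step. The following sketch assumes one score and one binary label per sample; the function name `roc_points` and the sample data are invented for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    # Lowering the threshold accepts more samples as positive.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical classifier scores with their true labels (1 = object present).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
print(points)
```

Each point trades sensitivity (true positive rate) against 1 minus specificity (false positive rate); the last point, where every sample is accepted, is always (1.0, 1.0).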

5.1 Test Performance for the Face Classifiers

To test performance, we compare the trained LBP classifiers with Haar feature classifiers. We apply both face classifiers to the BioID database; the results are shown in Table 1.

Figure 5 gives the ROC curves comparing the performance of the two face classifiers. It shows that there is little difference between the two classifiers in terms of accuracy.

Table 1 shows that the LBP classifier is much faster than the Haar classifier (6.34 seconds versus 157 seconds in total). The LBP detector also finds more faces than the Haar detector (1392 versus 1377 hits), and the Haar detector raises far more false alarms (547 versus 28).

Table 1: LBP and Haar face classifiers: performance evaluation and comparison.

Features   Hits   Missed   False alarms   Total time (seconds)
LBP        1392   129      28             6.34
Haar       1377   144      547            157
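The figures in Table 1 can be turned into per-image quantities with a short calculation. The sketch assumes that the total time covers all 1521 BioID images, which the text does not state explicitly; the helper name `per_image_ms` is hypothetical.

```python
def per_image_ms(total_seconds, n_images):
    """Average processing time per image, in milliseconds."""
    return 1000.0 * total_seconds / n_images

N_IMAGES = 1521  # BioID images; assumes the total time covers the whole set

lbp_rate = 1392 / N_IMAGES            # LBP detection rate
haar_rate = 1377 / N_IMAGES           # Haar detection rate
lbp_ms = per_image_ms(6.34, N_IMAGES)
haar_ms = per_image_ms(157, N_IMAGES)
speedup = 157 / 6.34                  # Haar total time over LBP total time

print(f"detection rate: LBP {lbp_rate:.3f}, Haar {haar_rate:.3f}")
print(f"time per image: LBP {lbp_ms:.1f} ms, Haar {haar_ms:.1f} ms")
print(f"speed-up: {speedup:.1f}x")
```

Under that assumption, the LBP face detector needs roughly 4 ms per image against roughly 103 ms for the Haar detector, a factor of about 25, while the detection rates are about 91.5% and 90.5% respectively.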

Figure 5: ROC curves of the LBP and Haar face classifiers applied to the BioID database (horizontal axis: false acceptance rate; vertical axis: detection rate).

5.2 Test Performance for the Eyes Classifiers

Table 2 shows the results of applying the eye classifiers to the BioID dataset.

The LBP eye classifier is clearly less accurate than the Haar classifier, with fewer hits and more false alarms; however, its response time remains better than that of the Haar detector (43.6 seconds versus 73 seconds).

Table 2: LBP and Haar eye classifiers: performance evaluation and comparison.

Features   Hits   Missed   False alarms   Total time (seconds)
LBP        966    555      1993           43.6
Haar       1986   1056     630            73

6 PROPOSED ARCHITECTURE

For the augmented reality part, we first apply the face classifier to detect the face (see Figure 6.a). Within the detected face, we use the region of interest marked by the rectangle (see Figure 6.b) to look for the eyes. Applying the eye classifier only to this region of interest detects the eyes with better accuracy and a better response time than searching the whole image.
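The two-stage search can be sketched independently of any particular detector. In the example below, `detect_faces` and `detect_eyes` stand for the trained face and eye classifiers and are passed in as plain functions; the proportions used in `eye_region` are an assumption for illustration, not values from this work.

```python
def eye_region(face):
    """Sub-rectangle of a face (x, y, w, h) where the eyes are expected."""
    x, y, w, h = face
    # Assumed proportions: skip the top fifth, keep the next third of the face.
    return (x, y + h // 5, w, h // 3)

def detect_eyes_in_frame(frame, detect_faces, detect_eyes):
    """Run the eye detector only inside each detected face's eye region."""
    results = []
    for face in detect_faces(frame):
        rx, ry, rw, rh = eye_region(face)
        roi = [row[rx:rx + rw] for row in frame[ry:ry + rh]]
        # Eye rectangles are relative to the ROI; shift them to frame coordinates.
        eyes = [(ex + rx, ey + ry, ew, eh) for ex, ey, ew, eh in detect_eyes(roi)]
        results.append((face, eyes))
    return results

# Stand-in detectors returning fixed rectangles (hypothetical values).
frame = [[0] * 100 for _ in range(100)]
fake_faces = lambda image: [(20, 10, 50, 60)]
fake_eyes = lambda roi: [(5, 4, 10, 6), (30, 4, 10, 6)]

results = detect_eyes_in_frame(frame, fake_faces, fake_eyes)
print(results)
```

Restricting the eye classifier to the region of interest shrinks the search area, which reduces both the processing time and the number of candidate windows that could produce false alarms.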

Finally, our system uses ARToolkit to overlay a VRML (Virtual Reality Modeling Language) glasses model on the image, giving the user the ability to try many glasses without putting any kind of marker on his or her face.

Figure 6: a) Frame image captured by the webcam, b) the detected face and the eye region of interest, c) the detected eyes, d) the glasses model overlaid on the detected eyes.
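How the detected eye positions could drive the overlay can be illustrated with a simple 2D computation. This sketch is not the ARToolkit/VRML rendering used by the system: it only derives a position, scale and in-plane rotation from two eye centers, and the function name `glasses_pose` and the parameter `model_eye_distance` are assumptions.

```python
import math

def glasses_pose(left_eye, right_eye, model_eye_distance=1.0):
    """Return (center, scale, angle in degrees) for placing a glasses model.

    left_eye and right_eye are (x, y) eye centres in image coordinates;
    model_eye_distance is the distance between the lens centres in the
    model's own units (a hypothetical parameter).
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    dx, dy = rx - lx, ry - ly
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)       # bridge of the glasses
    scale = math.hypot(dx, dy) / model_eye_distance   # image units per model unit
    angle = math.degrees(math.atan2(dy, dx))          # in-plane head tilt
    return center, scale, angle

# Hypothetical eye centres for a level face and for a tilted face.
level = glasses_pose((100, 120), (160, 120), model_eye_distance=30.0)
tilted = glasses_pose((0, 0), (3, 4), model_eye_distance=1.0)
print(level, tilted)
```

For the level face the model is centered at (130, 120), scaled by 2 and not rotated. Recomputing this pose on every frame keeps the glasses attached to the tracked eyes.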

7 CONCLUSIONS

In this paper, we introduced and implemented markerless augmented reality based on local binary patterns for eye and face detection. The LBP features proved accurate for face detection; for small regions such as the eyes, LBP still needs improvement. Due to the computational simplicity and speed of LBP, virtual try-on systems can readily be implemented on mobile devices. This approach can substantially improve electronic commerce and change customers' shopping habits.

REFERENCES

Cootesa, T., Wheelera, G., Walkerb, K., and Taylora, C. (2002). View-based active appearance models. Image and Vision Computing, 20(9-10):657–664.

Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. Proceedings of International Conference on Machine Learning, pages 148–156.

Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407.

Hadid, A., Pietikainen, M., and Ahonen, T. (2004). A discriminative feature space for detecting and recognizing faces. Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, 2:797–804.

Hjelmsa, E. and Lowb, B. K. (2001). Face detection: A survey. Computer Vision and Image Understanding, 83(3):236–274.

Jin, H., Liu, Q., Lu, H., and Tong, X. (2004). Face detection using improved LBP under Bayesian framework. Multi-Agent Security and Survivability, 2004 IEEE First Symposium on, pages 306–309.

Kato, H. and Billinghurst, M. (1999). Marker tracking and HMD calibration for a video-based augmented reality conferencing system. Augmented Reality, 1999. (IWAR '99) Proceedings. 2nd IEEE and ACM International Workshop on, pages 85–94.

Laskoa, T., Bhagwatc, J., Kelly, H., and Ohno-Machado, L. (2005). The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics, 38(5):404–415.

Lu, S., Shpitalni, M., and Gadh, R. (1999). Virtual and augmented reality technologies for product realization. CIRP Annals - Manufacturing Technology, 2(2):471–495.

Ojala, T., Pietikainen, M., and Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(7):971–987.

Osuna, E., Freund, R., and Girosit, F. (1997). Training support vector machines: an application to face detection. Computer Vision and Pattern Recognition, 1997. Proceedings IEEE Computer Society Conference on, pages 130–136.

Reitmayr, G. and Drummond, T. (2006). Going out: robust model-based tracking for outdoor augmented reality. Mixed and Augmented Reality, 2006. ISMAR 2006. IEEE/ACM International Symposium on, pages 109–118.

Roth, D., Yang, M., and Ahuja, N. (2000). A SNoW-based face detector. Advances in Neural Information Processing Systems, pages 855–861.

Rowley, H. A., Baluja, S., and Kanade, T. (1998). Neural network-based face detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(1):23–38.

Shen, Y., Ong, S., and Nee, A. (2010). Augmented reality for collaborative product design and development. Design Studies, 31(2):118–145.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1:511–518.

MarkerlessAugmentedRealitybasedonLocalBinaryPattern

141