Fast In-the-Wild Hair Segmentation and Color Classification
Tudor Alexandru Ileni¹, Diana Laura Borza² and Adrian Sergiu Darabant¹
¹Department of Computer Science, Faculty of Mathematics and Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
²Computer Science Department, Technical University of Cluj-Napoca, Romania
Keywords:
Hair Segmentation, Hair Color, Fully Convolutional Neural Network, Histograms, Neural Network.
Abstract:
In this paper we address the problem of hair segmentation and hair color classification in facial images using a machine learning approach based on both convolutional neural networks and classical neural networks. Hair, with its color shades, shape and length, represents an important feature of the human face and is used in domains like biometrics, visagisme (the art of aesthetically matching fashion and medical accessories to the face region), hair styling, fashion, etc. We propose a deep learning method for accurate and fast hair segmentation, followed by a histogram-feature-based classification of the obtained hair region into five color classes. We developed a hair and face annotation tool to enrich the training data. The proposed solutions are trained on publicly available databases as well as on our own annotated data. The proposed method attained a hair segmentation accuracy of 91.61% and a hair color classification accuracy of 89.6%.
1 INTRODUCTION
Face analysis has received great interest from the computer vision community due to its applications in various domains: behavioral psychology, human-computer interaction, biometrics, etc. However, most of the research conducted in this area has focused mainly on internal face features (eyes, eyebrows, lips, etc.), while external features (hair, chin contour) were somewhat neglected.
The hair plays an important role in human face recognition: (Sinha and Poggio, 2002) showed that internal face features are ignored in favor of external ones and the overall head structure. Other studies showed that facial features are perceived holistically (Sinha et al., 2006) and that the hair line and color are important recognition cues when the shape features are distorted. In the field of soft biometrics, the hair style is one of the most effective biometric traits (Proença and Neves, 2017).
Nowadays, e-commerce and digital interaction with clients play an important role in modern optometry. Several virtual try-on technologies have been developed, which allow customers to experiment with frames and glasses from the comfort of their homes, with an experience similar to that of an optical shop. These systems are based on 3D models of real glasses and frames. As the worldwide offer of frames and glasses is very large, users (buyers) are often overwhelmed by the multitude of choices: one would have to physically try thousands of frames to see which one fits best, both medically and aesthetically. In this context, a new approach to aesthetics has emerged: visagisme, a discipline which helps people enhance their appearance by choosing accessories that are in harmony with their face. It defines a complex set of rules taking into account facial features such as hair texture and color, face shape, skin tone and texture, location of the lips and eyes, facial proportions, etc. Among these, hair color is one of the decisive factors when choosing eyeglasses, as the hair covers a major part of the upper side of the head.
However, automatic hair analysis has not been intensively studied. First of all, the hairstyle and its color can be easily changed, although in practice most people keep the same hairstyle for a long period of time. Also, numerous hairstyles exist (symmetrical, asymmetrical, curly, bald, etc.), so, unlike for other face features, it is hard to establish the areas where hair is likely to be present. Finally, the hair's color distribution is not uniform (different colors for the roots and locks, highlights), making it more difficult to detect.
This paper proposes a hair segmentation and color recognition method targeted for visagisme applications.
The hair area is determined using a state-of-the-art fully convolutional neural network (CNN); the detected "hair" pixels are used to construct a color histogram which is further analyzed by an artificial neural network (ANN) to decide on the hair tone.
The remainder of this paper is organized as follows: in Section 2 we review recent advances in the field of automatic hair analysis. The proposed solution is detailed in Section 3, and the experimental results are reported in Section 4. The work is concluded in Section 5.
2 RELATED WORK
Automatic hair color analysis was pioneered by (Yacoob and Davis, 2005): the authors proposed a method for hair segmentation in frontal facial images. The hair area is established using facial proportions, color information and region growing. The work also defined several metrics to describe the hair's properties: length, dominant color, volume, symmetry, etc. (Rousset and Coulon, 2008) introduces a hair segmentation method based on intersecting two image masks computed by frequency and color analysis, respectively. In (Julian et al., 2010) the hair region is segmented in two steps: first, a simple hair shape model is fitted to the upper hair region using active shape models; next, a pixel-wise segmentation is performed based on the appearance parameters (texture, color) learned from the first region. A hair segmentation method tuned for automatic caricature synthesis is described in (Shen et al., 2014): in an off-line training phase, the prior distribution of the hair's position and the color likelihood are estimated from a labeled dataset of images; based on this information, the hair is localized through graph-cuts and k-means clustering.
Recently, more robust hair segmentation algorithms were proposed, which perform well on images captured in unconstrained environments. In (Proença and Neves, 2017), a two-layer Markov Random Field architecture is proposed: one layer works at pixel level, while the second one operates at object level and guides the algorithm towards plausible solutions. The method presented in (Muhammad et al., 2018) constructs a hair probability map from overlapping image patches using a Random Forest classifier and features extracted by a CNN; this rough segmentation is refined using local ternary patterns and support vector machines to classify hair at pixel level.
Other works tackled the problem of hair color classification as a soft biometric trait. The works (Krupka et al., 2014) and (Prinosil et al., 2015) propose a hair color analysis method for video sequences: the head area is estimated through background subtraction and face detection, and a face skin mask is computed using flood-fill. The hair area is simply determined as the difference between the head and the skin. The hair color is classified into five distinct tones: white/gray, black, brown, red and blond. In (Sarraf, 2016), the hair color is distinguished only between "black" and "non-black" tones: the values, mean and variance of each channel of the RGB and HSV representations of the image are combined into a feature vector, and a machine learning classifier (kNN or SVM) is used to decide on the hair color.
3 PROPOSED SOLUTION
The problem of hair color classification involves two main steps: hair segmentation and color analysis. The segmentation module detects all the pixels of the input image which belong to the "hair" class; this module has a great impact on the color recognition module, as an incorrect segmentation affects the reliability of the color features.
A general outline of the proposed method is depicted in Fig. 1. First, the hair area is extracted using a CNN; as the hair has a uniform structure, an additional post-processing step is applied in order to fill in the (eventual) gaps among the hair pixels. The hair color classification module then analyzes the detected hair pixels to decide on the hair tone. The classification is performed by an artificial neural network which operates on normalized color histograms.
3.1 Hair Segmentation
Segmentation is the process of detecting and highlighting one or more objects of interest in an image. It can also be viewed as a classification problem, in which a label is assigned to each pixel of the image.
We used a variant of the U-Net fully convolutional network (U-Net FCN) (Ronneberger et al., 2015) to detect the hair pixels. The architecture comprises two symmetric parts.
The first part, the contraction path, iteratively down-samples the original image: at each step a 3 × 3 pooling operation is applied and the number of output channels is doubled. Unlike the implementation in (Ronneberger et al., 2015), the classical convolutions are replaced by depthwise separable convolutions, in order to reduce the computational cost while still benefiting from spatial and depthwise information. Such layers are created from a pipeline of operations.
Figure 1: Outline of the proposed hair segmentation and color recognition module (face image → U-Net FCN segmentation module → hair color feature extraction → normalized histograms → artificial neural network → hair color).
First, a convolution kernel is applied separately to each input channel, and then a pointwise convolution is performed on the resulting maps.
The input layer has shape 224 × 224 × 3, corresponding to a 3-channel image. The first layer is a classical convolution that outputs 32 filters; in addition, it performs batch normalization (Ioffe and Szegedy, 2015) and a ReLU activation.
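For illustration, below is a minimal Keras sketch of such a depthwise separable convolution block, assuming 3 × 3 kernels and a Conv-BN-ReLU ordering (the paper does not fully specify these details):

```python
from tensorflow.keras import layers

def depthwise_block(x, out_channels):
    # Depthwise convolution: one 3x3 kernel applied per input channel
    x = layers.DepthwiseConv2D(kernel_size=3, padding='same')(x)
    # Pointwise (1x1) convolution mixes information across channels
    x = layers.Conv2D(out_channels, kernel_size=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
```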
The second part, the expansive path, is symmetrical to the contraction path and reverts the downsampling operations by using transposed (or fractionally strided) convolution layers. At each step, a 2 × 2 deconvolution is applied to increase the feature map size and halve the number of output channels.
In addition, as described in (Ronneberger et al., 2015), a cropped part of the corresponding contraction path layer is concatenated to each deconvolution layer. This way, the architecture benefits from a mixture of low- and high-level features, similar to the skip layers introduced in (Long et al., 2015). Each such concatenation is followed by a depthwise convolutional block, as previously detailed. Finally, a 2D convolution with a kernel size of one and an upsampling layer are added; the upsampling layer is initialized with bilinear interpolation.
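Under the same assumptions, a corresponding expansive-path step could be sketched as follows (cropping is omitted, assuming 'same' padding keeps the contraction and expansion feature maps aligned; depthwise_block is the helper sketched above):

```python
def up_block(x, skip, out_channels):
    # 2x2 transposed convolution doubles the spatial size
    # and halves the number of channels
    x = layers.Conv2DTranspose(out_channels, kernel_size=2,
                               strides=2, padding='same')(x)
    # Concatenate the matching contraction-path feature map (skip link)
    x = layers.Concatenate()([x, skip])
    return depthwise_block(x, out_channels)
```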
As the proposed method is intended for visagisme applications, where the user is cooperative and has a (near-)frontal pose, the network operates on face images. The face is detected in the input image using an off-the-shelf face detector (King, 2009) and the face area is enlarged (both in width and height) by a factor of 1.5; this region of interest is cropped from the input image and used as the network input.
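A minimal sketch of this detection-and-cropping step using dlib (the clamping to the image borders is our addition; the 224 × 224 target matches the network input size):

```python
import dlib
import cv2

detector = dlib.get_frontal_face_detector()

def crop_face(image, scale=1.5):
    # dlib expects RGB; OpenCV images are BGR
    faces = detector(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not faces:
        return None
    f = faces[0]
    cx, cy = (f.left() + f.right()) // 2, (f.top() + f.bottom()) // 2
    # Enlarge the detected box by the given factor in both dimensions
    half_w, half_h = int(f.width() * scale / 2), int(f.height() * scale / 2)
    x0, y0 = max(cx - half_w, 0), max(cy - half_h, 0)
    x1 = min(cx + half_w, image.shape[1])
    y1 = min(cy + half_h, image.shape[0])
    return cv2.resize(image[y0:y1, x0:x1], (224, 224))
```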
3.1.1 Segmentation Post-processing
Human hair has a uniform pattern; therefore, we apply an additional post-processing step on the hair segmentation mask in order to fill in the eventual gaps within the detected hair area. We use a simple algorithm based on the flood-fill operation, described in Algorithm 1.
Data: HM - binary hair segmentation mask
Result: RES - hair segmentation mask with gaps filled
Apply a black border of size 5 to HM
Select a pixel (s_x, s_y) outside the hair area
FLOOD_MASK ← floodFill(HM, (s_x, s_y))
FLOOD_MASK ← ¬FLOOD_MASK
RES ← HM ∨ FLOOD_MASK
Algorithm 1: Hair area post-processing.
In the above algorithm, ¬ and ∨ are the bitwise not and or operators, and the floodFill(I, (x, y)) function performs a flood fill on the input image I starting from the seed point (x, y), returning a binary mask which highlights the pixels modified by this operation.
The algorithm works with a binary mask (0 - background pixel, 255 - hair pixel). It first applies a border to this image in order to handle the cases where the hair area reaches the borders of the image. Next, a background pixel is selected and flood fill is applied starting from this pixel; as a result, we obtain a mask that marks all the background pixels which are not inside the hair contour. This mask is inverted, so that all the background pixels within the hair area become white. Finally, a bitwise or operation between the original hair mask and the inverted flood fill mask is performed in order to fill in the gaps of the initial segmentation.
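A possible OpenCV implementation of Algorithm 1 (function and variable names are ours):

```python
import cv2
import numpy as np

def fill_hair_gaps(hair_mask):
    # hair_mask: binary uint8 mask (0 = background, 255 = hair)
    # Add a black border so the flood can reach around hair touching edges
    bordered = cv2.copyMakeBorder(hair_mask, 5, 5, 5, 5,
                                  cv2.BORDER_CONSTANT, value=0)
    h, w = bordered.shape
    # cv2.floodFill requires a mask 2 px larger than the image
    ff_mask = np.zeros((h + 2, w + 2), np.uint8)
    flood = bordered.copy()
    cv2.floodFill(flood, ff_mask, (0, 0), 255)  # seed in the background
    # Pixels NOT reached by the flood are the holes inside the hair region
    holes = cv2.bitwise_not(flood)
    filled = cv2.bitwise_or(bordered, holes)
    return filled[5:-5, 5:-5]  # strip the added border
```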
In the case of bald individuals, detecting the hair color does not make sense. Therefore, we need a rule to decide whether the subject has hair before feeding the input image to the hair tone recognition module. We propose a simple, yet efficient method for this task, based on the proportion and position of the hair area relative to the face area.
First, an off-the-shelf facial landmark detector (King, 2009) is applied to find 68 facial landmarks. Only the external face landmarks are used to compute the face area; note that this detector only provides the face contour for the lower face part. In order to estimate the upper region of the face, we scan it in polar coordinates, starting from the middle eyebrow point $(e_x, e_y)$, with a radius $R$ and the angle $\theta \in [0°, 180°]$. For each angle, we mark the first pixel on that radius labelled as "hair" as a face contour pixel; if no such pixel exists, we consider that the face boundary for the current angle $\theta$ is $R$ pixels away from $(e_x, e_y)$. We heuristically determined that $R = 0.7 \cdot f_w$, where $f_w$ is the width of the face, is sufficient for most human face shapes. An overview of this process and its result are depicted in Fig. 2: the detected landmarks are represented in the lower part of the figure with yellow circles, while the estimated upper face contour is drawn with a yellow curve.
Figure 2: Face area estimation based on hair segmentation and facial landmarks. (a) Hair segmentation mask; (b) Estimated face area.
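A sketch of this polar scan (in image coordinates, where the y axis points down; variable names are ours):

```python
import numpy as np

def upper_face_contour(hair_mask, ex, ey, face_width):
    # Scan upwards in polar coordinates from the mid-eyebrow point
    # (ex, ey); R = 0.7 * face_width, as heuristically chosen above.
    R = int(0.7 * face_width)
    contour = []
    for theta in range(0, 181):
        rad = np.deg2rad(theta)
        # Default: boundary lies R pixels away on this ray
        boundary = (int(ex + R * np.cos(rad)), int(ey - R * np.sin(rad)))
        for r in range(1, R):
            x = int(ex + r * np.cos(rad))
            y = int(ey - r * np.sin(rad))
            if (0 <= y < hair_mask.shape[0] and
                    0 <= x < hair_mask.shape[1] and hair_mask[y, x] > 0):
                boundary = (x, y)  # first "hair" pixel on this ray
                break
        contour.append(boundary)
    return contour
```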
To determine if the person pictured in the input image has hair, we compute the ratio $b_r$ between the hair area and the face area. If the value of $b_r$ is less than 0.15, it is possible that the subject is bald; this threshold value was determined through trial-and-error experiments. In order not to label persons with (very) short hair as bald, we add an additional rule regarding the position of the detected hair: the subject is declared bald only if the detected hair area is also split into multiple parts on the sides of the face (see the sketch below). We made this assumption based on the fact that human hair loss (androgenic alopecia) follows a similar pattern: the hair starts to fall out above the temples and on the calvaria of the scalp (skullcap) and progressively extends to the sides and rear of the head (Asgari and Sinclair, 2011).
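As a sketch, the decision rule described above might be expressed as follows (the explicit counting of hair components on the sides of the face is our simplification):

```python
def is_bald(hair_area, face_area, num_hair_parts_on_sides):
    # b_r < 0.15 suggests baldness (threshold from the text); the subject
    # is labelled bald only if the remaining hair is also fragmented
    # into multiple parts on the sides of the face.
    b_r = hair_area / face_area
    return b_r < 0.15 and num_hair_parts_on_sides > 1
```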
3.2 Hair Color Classification
We propose a hair color taxonomy consistent with natural hair colors (five classes): black, blond, brown, grey/white and red.
To recognize the hair color, only the pixels classified as belonging to the "hair" class are analyzed. We compute a normalized color histogram from all the "hair" pixels and feed this feature vector to an artificial neural network with two hidden layers; each hidden layer contains 4096 neurons. Naturally, the input layer has the size of the feature vector, while the output layer has 5 neurons (the number of classes).
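A minimal Keras sketch of this classifier; the activation functions, optimizer and loss are our assumptions, as they are not specified in the text:

```python
from tensorflow.keras import layers, models

def hair_color_net(num_bins):
    # num_bins: length of the flattened histogram feature vector,
    # e.g. 8*8*8 = 512 for an 8-bin-per-channel 3D color histogram
    model = models.Sequential([
        layers.Dense(4096, activation='relu', input_shape=(num_bins,)),
        layers.Dense(4096, activation='relu'),
        layers.Dense(5, activation='softmax'),  # black/blond/brown/grey/red
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```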
Multiple colorspaces have been proposed to encode colors, but none of them can be considered the "best" representation: each colorspace encodes the color information differently, so that colors are more intuitively distinguished or certain computations are more convenient. We tested our method by representing the input image in the most commonly used colorspaces, RGB, HSV and Lab; all these experiments are detailed in Section 4.
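As an illustration, the histogram feature for the Lab colorspace could be computed as follows (the flattening of the 3D histogram into a vector is our assumption):

```python
import cv2
import numpy as np

def hair_histogram(image_bgr, hair_mask, bins=8):
    # Normalized 3D color histogram computed over the "hair" pixels only;
    # Lab gave the best results in the experiments reported below.
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2Lab)
    hist = cv2.calcHist([lab], [0, 1, 2], hair_mask,
                        [bins, bins, bins],
                        [0, 256, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-8)  # normalize to sum to 1
```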
4 EXPERIMENTAL RESULTS
4.1 Databases
Training data is a crucial aspect of (deep) learning, as it determines what the classifier learns before being applied to unseen data.
For the segmentation part, we gathered images from two publicly available datasets: Labeled Faces in the Wild (LFW) (Huang et al., 2007) and CelebA (Liu et al., 2015). The LFW dataset contains more than 13000 celebrity images captured in uncontrolled scenarios; the only restriction imposed on an image is that the face can be detected using the Viola-Jones face detector. We used an extension of this database, the Part Labels Database (Kae et al., 2013), which contains the semantic labelling into Hair-Skin-Background of 2927 images from LFW.
CelebA is a multi-attribute face dataset, containing more than 200k images with large pose variations. The database comprises more than 10000 identities and is annotated with 40 binary attributes, such as Wearing Hat, Wavy Hair and Mustache, just to name a few. The annotations also contain information about the hair color: Black Hair, Blond Hair, Brown Hair and Gray Hair. However, they do not include a red hair class and we noticed some inconsistencies in the ground truth annotations for these attributes; therefore, this labelling cannot be used as-is for hair color classification.
We also developed an application which can be used to manually mark the skin and hair areas in facial images, i.e., to create skin and hair masks. We used this application to manually mark the hair area in 2188 additional images from the CelebA dataset.
The total size of the dataset used to train the segmentation network is 5115 images; 20% of these images were used for validation. Because of the small size of the dataset, data augmentation was also performed: we applied random rotations of at most 20°, a shear deformation of 0.2 and a zoom range of 0.2.
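Such an augmentation setup could be expressed, for instance, with the Keras ImageDataGenerator; only the rotation, shear and zoom parameters come from the text, the rest is our assumption:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random rotations up to 20 degrees, shear 0.2, zoom 0.2,
# with 20% of the data held out for validation.
augmenter = ImageDataGenerator(rotation_range=20,
                               shear_range=0.2,
                               zoom_range=0.2,
                               validation_split=0.2)
```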
For the hair color classification module we used images from the CelebA dataset. First, a raw classification was performed based on the binary attributes (Black Hair, Blond Hair, Brown Hair and Gray Hair) provided by the dataset. Starting from this coarse classification, three human labellers classified each image into the following classes: black, blond, brown, grey and red. These annotations are made publicly available. We used more than 20000 images to train the network and 2000 images to evaluate its performance.
4.2 Hair Segmentation
To train the segmentation network we used a stochastic gradient descent optimizer with a learning rate of 0.0001. The training took more than 14 hours running on two NVIDIA Tesla K40m 12GB GPUs. We ran the training for 250 epochs, but no major improvement occurred after epoch 200, as can be seen in Fig. 3.
Figure 3: Training (blue curve) and validation (orange curve) loss for the hair segmentation task, over 250 epochs.
We evaluated the hair segmentation model using the Intersection over Union (IoU) metric. It is a scale invariant measure which computes the similarity between two finite sets by dividing the size of their intersection by the size of their union. More formally, the metric is defined as

$$J(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}. \qquad (1)$$
We also report two variants of this metric: the mean IoU and the frequency weighted IoU. The first one is defined as

$$\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}} \qquad (2)$$

and the latter as

$$\Big(\sum_k t_k\Big)^{-1} \sum_i \frac{t_i \, n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}, \qquad (3)$$

where $n_{cl}$ is the number of segmentation classes, $n_{ij}$ is the number of pixels of class $i$ predicted to be in class $j$, and $t_i = \sum_j n_{ij}$ is the total number of pixels of class $i$ in the ground truth segmentation.

We also compute the pixel accuracy (or precision)

$$\mathrm{pixelAcc} = \sum_i n_{ii} \Big/ \sum_i t_i \qquad (4)$$

and the mean pixel accuracy

$$\mathrm{meanPixelAcc} = \frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i}. \qquad (5)$$
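For completeness, a small NumPy sketch computing these four metrics from a confusion matrix (our own helper, not part of the paper):

```python
import numpy as np

def segmentation_metrics(conf):
    # conf[i, j] = number of pixels of ground-truth class i predicted as j
    n_ii = np.diag(conf).astype(float)
    t_i = conf.sum(axis=1)      # ground-truth pixels per class
    pred_i = conf.sum(axis=0)   # predicted pixels per class
    iou = n_ii / (t_i + pred_i - n_ii)
    return {
        'pixel_acc': n_ii.sum() / t_i.sum(),
        'mean_pixel_acc': np.mean(n_ii / t_i),
        'mean_iou': iou.mean(),
        'freq_weighted_iou': (t_i * iou).sum() / t_i.sum(),
    }
```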
When training the FCN, we tested two configurations of the network's hyper-parameters:
c1: momentum = 0.9 and batch size = 16 samples
c2: momentum = 0.98 and batch size = 2 samples
In Table 1 we report the performance of the segmentation module, for both training configurations (c1 and c2), on the CelebA and Figaro1k databases.
Figaro1k contains 1050 unconstrained images labelled with seven different hair styles: straight, wavy, curly, kinky, braids, dreadlocks and short. However, not all the images in this dataset can be used to test the proposed solution, as the face is not always visible in the samples, while the hair segmentation module works with facial images (the first step of the algorithm is to detect and crop the face area). Therefore, we first apply a face detector (King, 2009) and compute the hair segmentation mask only on the samples in which a face was detected: in total, 171 images out of 1050. Although the Figaro1k dataset was envisioned for other purposes (hair texture and hairstyle classification), we tested our solution on it in order to be able to compare with other works published in the literature.
On both datasets, better results are obtained using the second configuration of the network's hyper-parameters. As expected, the segmentation performance decreases on the Figaro1k database: the images in this dataset contain unusual hairstyles and shapes, and in some images the face is not fully visible, as parts of it are cropped. On average (over both datasets), the mean pixel accuracy of the hair segmentation module is 91.61%.
Some hair segmentation results are depicted in Fig. 4.
Table 1: Hair segmentation performance on the CelebA and Figaro-1k databases.

Metric               | CelebA c1 | CelebA c2 | Figaro-1k c1 | Figaro-1k c2
mean pixel accuracy  |  93.84%   |  94.76%   |   83.42%     |   88.46%
mean IoU             |  88.34%   |  90.35%   |   76.28%     |   81.96%
weighted freq. IoU   |  92.59%   |  93.89%   |   81.91%     |   86.01%
pixel accuracy       |  95.99%   |  96.76%   |   89.55%     |   92.13%
Figure 4: Some examples of hair segmentation results.
To evaluate the proposed algorithm for hair vs. no-hair detection (i.e., baldness detection), we selected 100 images (50 bald, 50 non-bald) from the CelebA dataset. The confusion matrix for this test scenario is shown in Table 2; the ground truth labels are represented on the rows. The accuracy of the proposed algorithm is 91%.
Table 2: Confusion matrix for bald vs hair classification.
Bald Hair
Bald 44 6
Hair 3 47
4.3 Hair Color Classification
The hair color is classified based on the normalized color histogram of the hair pixels using a classical artificial neural network. To evaluate the classifier performance we selected 2000 images (400 for each hair color class) from the CelebA dataset. The ground truth was obtained by merging the classifications performed by three independent human labellers; in cases of disagreement we used simple voting. We observed that the majority of confusions occurred between the red-brown and blond-brown classes.
In Table 3 we report the performance of the hair color classification module for different colorspaces and feature vector sizes. The test samples are balanced, i.e., 400 images belong to each hair color class.
Table 3: Hair color classification performance for different
colorspaces.
Colorspace Bin size Accuracy
RGB 1,1,1 0.878
HSV 1,1,1 0.881
LAB 1,1,1 0.883
RGB 8,8,8 0.881
HSV 8,8,8 0.889
LAB 8,8,8 0.896
The best results are obtained using the LAB colorspace with a bin size of 8; Table 4 shows the confusion matrix for this configuration (the rows contain the ground truth classes).
Table 4: Confusion matrix for the LAB colorspace with bin
size of 8.
Black Blond Brown Grey Red
Black 398 0 1 0 1
Blond 0 398 2 0 0
Brown 0 0 397 0 3
Grey 1 3 2 394 0
Red 0 1 4 0 395
The execution time of the hair color classification method is on average $8 \cdot 10^{-4}$ seconds for a batch of 32 samples, run on the GPU device.
Figure 5: Correct and incorrect hair color classification. Top row (correct): blond, red, grey, black, brown. Bottom row (misclassified): brown as grey, brown as red, blond as grey.
Fig. 5 shows some correct and incorrect hair color
classification results.
4.4 Comparison with State of the Art
As discussed in Section 2, several works have addressed the problem of hair segmentation. However, there is no standardized benchmark for this task and some methods were only tested on internal, non-public datasets. The method of (Muhammad et al., 2018) was evaluated on all 1050 images of the Figaro1k dataset and its best configuration attained 91.5% segmentation accuracy; the algorithm uses features extracted by a CNN, local ternary patterns, super-pixels and a random forest classifier to segment the hair pixels.
The FCN for hair segmentation proposed in this paper obtained a pixel accuracy of 92.13% on the subset of the Figaro1k database that meets the requirements of our application, i.e., the face must be detectable in the input image. As we could not use all the images from the dataset, a direct numerical comparison is not entirely relevant; however, our average pixel accuracy (on all the available test data) is 91.61%, so we can conclude that our method is at least comparable with (Muhammad et al., 2018).
To the best of our knowledge, only two other papers have addressed the problem of hair color classification: (Sarraf, 2016) and (Krupka et al., 2014). In (Sarraf, 2016) the hair tone is distinguished between only two classes, black and non-black, so a direct comparison with this work is not possible; the authors report an accuracy score of 97% in the best case and 55% in the worst scenario. However, we can extrapolate the results from Table 4 and compute the accuracy score for the black vs. non-black scenario: 99.85% (it should be noted that the classes are unbalanced in this scenario: 400 black and 1600 non-black samples).
The work (Krupka et al., 2014) uses the same hair color taxonomy as the one presented in this paper. Its accuracy is 88.66% (value computed from the published confusion matrix). However, the test data in (Krupka et al., 2014) is not balanced: the red hair class is represented by only 3 samples, while the black hair class contains 30 samples.
Our method attains a hair color classification
accuracy of 89.6%, so it can be concluded that the
proposed classification module achieves better results
than the other works presented in the literature.
5 CONCLUSION
This paper presented an automatic hair color analysis system targeted at (on-line) eyeglasses virtual try-on applications. Using a simple consumer camera and a virtual reality application, the user can perceive his/her appearance with different types of eyeglasses. Our method intervenes in the virtual eyeglasses display strategy: as the available dataset of 3D glasses is large, the assets should be displayed to the user such that the most suitable glasses for his/her appearance show up first. For this purpose, a new field of study, visagisme, was developed to help users choose the appropriate accessories based on their physical appearance. Hair color is one of the most important visagisme attributes in the choice of eyeglasses. Our method analyses an input image and outputs the hair color of the user: black, blond, brown, grey or red.
The proposed method involves two main steps: segmentation and color analysis. First, the hair area is determined using a state-of-the-art fully convolutional network; an additional post-processing step is applied to the hair mask in order to fill in the eventual gaps in the hair area. The hair pixels are further analysed by a classical artificial neural network in order to determine the hair color. To train and test the proposed algorithm, we annotated more than 4000 images from an existing database with the hair color.
The experiments we performed and the reported results (a hair segmentation accuracy of 91.61% and a hair color classification accuracy of 89.6%) demonstrate the effectiveness of the proposed solution.
As future work, we plan to add more classes to the hair tone taxonomy, in order to also recognize unnatural, dyed hair colors (blue, violet, green) and hair with highlights.
REFERENCES
Asgari, A. and Sinclair, R. (2011). Male pattern androgenetic alopecia.
Huang, G. B., Ramesh, M., Berg, T., and Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
Julian, P., Dehais, C., Lauze, F., Charvillat, V., Bartoli, A., and Choukroun, A. (2010). Automatic hair detection in the wild. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 4617–4620. IEEE.
Kae, A., Sohn, K., Lee, H., and Learned-Miller, E. (2013). Augmenting CRFs with Boltzmann machine shape priors for image labeling. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 2019–2026.
King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10(Jul):1755–1758.
Krupka, A., Prinosil, J., Riha, K., Minar, J., and Dutta, M. (2014). Hair segmentation for color estimation in surveillance systems. In Proc. 6th Int. Conf. Adv. Multimedia, pages 102–107.
Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV).
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.
Muhammad, U. R., Svanera, M., Leonardi, R., and Benini, S. (2018). Hair detection, segmentation, and hairstyle classification in the wild. Image and Vision Computing, 71:25–37.
Prinosil, J., Krupka, A., Riha, K., Dutta, M. K., and Singh, A. (2015). Automatic hair color de-identification. In Green Computing and Internet of Things (ICGCIoT), 2015 International Conference on, pages 732–736. IEEE.
Proença, H. and Neves, J. C. (2017). Soft biometrics: Globally coherent solutions for hair segmentation and style recognition based on hierarchical MRFs. IEEE Transactions on Information Forensics and Security, 12(7):1637–1645.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer.
Rousset, C. and Coulon, P.-Y. (2008). Frequential and color analysis for hair mask segmentation. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, pages 2276–2279. IEEE.
Sarraf, S. (2016). Hair color classification in face recognition using machine learning algorithms. American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS), 26(3):317–334.
Shen, Y., Peng, Z., and Zhang, Y. (2014). Image based hair segmentation algorithm for the application of automatic facial caricature synthesis. The Scientific World Journal, 2014.
Sinha, P., Balas, B., Ostrovsky, Y., and Russell, R. (2006). Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, 94(11):1948–1962.
Sinha, P. and Poggio, T. (2002). 'United' we stand. Perception, 31(1):133.
Yacoob, Y. and Davis, L. (2005). Detection, analysis and matching of hair. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 1, pages 741–748. IEEE.