Overcome Ethnic Discrimination with Unbiased Machine Learning for
Facial Data Sets
Michael Danner 1,a, Bakir Hadžić 2, Robert Radloff 2, Xueping Su 3,b, Leping Peng 4, Thomas Weber 2 and Matthias Rätsch 2,c
1 CVSSP, University of Surrey, Guildford, U.K.
2 ViSiR, Reutlingen University, Germany
3 School of Electronics and Information, Xi’an Polytechnic University, China
4 Hunan University of Science and Technology, China
Keywords:
Unbiased Machine Learning, Fairness, Trustworthy AI, Acceptance Research, Debiasing Training Data,
Facial Data Sets, AI-Acceptance Analysis.
Abstract:
AI-based prediction and recommender systems are widely used in various industry sectors. However, general
acceptance of AI-enabled systems is still widely uninvestigated. Therefore, firstly we conducted a survey with
559 respondents. Findings suggested that AI-enabled systems should be fair, transparent, consider person-
ality traits and perform tasks efficiently. Secondly, we developed a system for the Facial Beauty Prediction
(FBP) benchmark that automatically evaluates facial attractiveness. As our previous experiments have proven, these results are usually highly correlated with human ratings. Consequently, they also reflect human bias in annotations. An upcoming challenge for scientists is to provide training data and AI algorithms that can withstand distorted information. In this work, we introduce AntiDiscriminationNet (ADN), a superior attractiveness prediction network. We propose a new method to generate an unbiased convolutional neural network (CNN) to improve the fairness of machine learning on facial data sets. To train unbiased networks, we generate synthetic images and weight the training data for anti-discrimination assessments towards different ethnicities. Additionally, we introduce an approach with entropy penalty terms to reduce the bias of our CNN. Our research provides insights into how to train and build fair machine learning models for facial image analysis by minimising implicit biases. Our AntiDiscriminationNet finally outperforms all competitors in the FBP benchmark by achieving a Pearson correlation coefficient of PCC = 0.9601.
1 INTRODUCTION
In recent years, the use of artificial intelligence has
proven to solve a wide spectrum of technical prob-
lems. Especially in the high-tech sector and in knowl-
edge intensive industries, machines and intelligent al-
gorithms turned from clunky tools to sophisticated
systems performing various complex tasks today (Ar-
slan et al., 2021). In today’s global war of talents,
companies are hunting for the best employees with
specific requirements of skills and personal traits to
achieve competitive advantage in their field (Grant,
1991). In this context, a wide range of research has
a https://orcid.org/0000-0002-8652-6905
b https://orcid.org/0000-0003-1306-8453
c https://orcid.org/0000-0002-8254-8293
Both authors contributed equally.
This work is partially supported by a grant of the BMWi ZIM-FuE programs, no. KK5007201LB0.
been conducted to understand the evolutionary basis
of beauty and determine the bias of attractiveness in
the job hiring process (Little et al., 2011; Chiang and
Saw, 2018). Companies desire an efficient and objective recruitment process, with the preferred outcome of finding the best job candidates while staying compliant with regulations and ethical standards. Artificial intelligence has the potential to support these goals by minimising the risk of bias in decision making, in order to become a relevant and trustworthy partner for humans in the future.
1.1 Motivation
In 2016 Beauty.AI, a Hong-Kong based technology
company, hosted the first international beauty contest
judged by artificial intelligence (beauty.ai, 2016) but
the results were heavily biased, for example, against
dark-skinned subjects (Levin, 2016). “Machine learning models are prone to biased decisions, due to biases in data-sets” (Sharma et al., 2020). Biased training data potentially leads to discriminatory models, as
the data sets are created by humans or derived from
human activities in the past, for example in hiring algorithms (Bogen, 2019). The purpose of Facial Beauty
Prediction (FBP) research is to classify images mim-
icking subjective human judgements. Investigations of machine perception in a ground-truth-free setting show that the measurement of human perception depends on the data source (Prijatelj et al., 2020). Therefore, artificial networks need a process to determine labels that reflect the average person’s judgement. Our
data analysis has already proven that people consider
their own ethnicity to be more attractive than oth-
ers (Gerlach et al., 2020). With this tendency, it be-
comes difficult to generate input data to train a ma-
chine learning algorithm, which assesses attractive-
ness without bias. This is a highly relevant topic in machine learning: it has a technical component to solve, and it triggers ethical debates about discrimination. To tackle this issue, we used our recently published balanced training data set (Gerlach et al., 2020) with minimal bias between European and Asian aesthetic predictions from a convolutional neural network (CNN). Based on this training data, we created a model capable of achieving an equally distributed performance over all classes for these different ethnicities. For the first time, we could prove that this resampled and balanced training data leads to a debiased AI model for fair facial aesthetic prediction among different ethnicities. The main contribution of this work is a novel approach towards unbiased machine learning across different ethnicities to build a fair and trustworthy AI model, by applying a mixed data set that consists of real images together with synthetic data.
1.2 Acceptance of AI
AI-based technologies are promising tools to opti-
mise the process of recruiting, assisting recruiters in
their routine work and thus increasing the efficiency
of the whole recruiting process (Ahmed, 2018; Reilly,
2018). However, the extent of willingness to accept
and use AI-enabled recruiting among the actual ap-
plicants is widely uninvestigated until now (Laurim
et al., 2021). Consequently, one of the goals of the
present study was to examine acceptance factors of AI
use in the recruiting process among university students.
A total of 559 participants (49% female) completed
the online survey. The results showed that AI accep-
tance is dependent on the following: a contact person
should be available (91%), process should be trans-
parent (86%) and ensure data protection (83%). Although only a minority (35%) of participants is generally worried about AI utilisation, just 52% stated that they would support AI-enabled recruiting procedures (Schlick and Reich, 2021). Our survey
demonstrated that the acceptance of AI-enabled sys-
tems heavily depends on the features of the provided
AI. The AI system should make fair decisions that are
transparent and consider individual personality traits
while keeping their data safe and protected. Potential
sources of worries and doubts regarding the AI among
the applicants must be adequately addressed in future
studies. The first prerequisite for the usage of AI systems is that they are accepted by their users. Therefore, one of
the central goals of the following study is to design a
fair and unbiased AI.
2 STATE OF THE ART
While research on the estimation of images or por-
traits is not a new trend, it has gained increasing
attention since the emergence of artificial intelli-
gence (Zhang and Kreiman, 2021). Although AI is undoubtedly the best solution for many applications like autonomous driving or image classification, applications that are affected by unconscious bias, like beauty prediction (Dornaika et al., 2020), tend to reflect a bias that is likely to be prevalent within the given data sets. Especially when people’s subjective preferences play a role, such as in attractiveness judgement (Shank and DeSanti, 2018) or human resource evaluation (Lloyd, 2018), bias is almost certain to occur. Carrera (2020) conducted research on the implications of racism in image databases, analysing the association of aggressiveness, kindness, beauty and ugliness with different images, and found that the decisions of many people are affected by subconscious racism. Since researchers are aware of such effects, they have found different ways to reduce subconscious bias in machine learning: if the problem originates from the given databases, either the databases or the training needs to be changed.
3 BIASED AI
3.1 Bias from Human Indications
Figure 1: AntiDiscriminationNet is trained on annotations from students at a German and at a Chinese university. The trained network follows the bias from the annotations.

Our latest data set included 12,034 images of people from different social and ethnic backgrounds, collected over an eight-year period of working on this topic. One part of the data was collected from the students of our partner university in China. These images received 5.4 million annotations, where different annotators evaluated the attractiveness of the persons in the presented pictures.
Our aim in that study (Gerlach et al., 2020) was to
empirically test if data manifest any implicit bias re-
garding ethnicity or some other relevant characteris-
tics of evaluated pictures. Based on the literature re-
search that we conducted, we hypothesised that the
evaluated attractiveness of facial pictures in the Asia-
Europe data set by annotators in China and Germany
is implicitly biased. More precisely, we assumed that annotators rate images of persons who share their own ethnic background as more attractive. To test this hypothesis, we separated the results
from annotators, divided them based on their racial
background, and then compared their results. Ger-
mans represented European annotators, while Chi-
nese represented Asian annotators. The results of the
conducted analysis indicated that our hypothesis was
supported. As we can see in Figure 1, European annotators evaluated European faces as more attractive, while Asian annotators evaluated Asian faces as more attractive.
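To make the comparison concrete, the following minimal sketch shows how such annotations could be split by annotator group and compared, here with Welch's t-test; the file name and the column names (annotator_origin, face_ethnicity, score) are illustrative, not our actual schema.

```python
# Hypothetical sketch of the group comparison described above: split the
# ratings by annotator origin and compare mean scores per face ethnicity.
# File and column names are illustrative placeholders.
import pandas as pd
from scipy import stats

ratings = pd.read_csv("annotations.csv")   # one row per (annotator, image) rating

# Mean attractiveness per annotator group and face ethnicity (cf. Figure 1).
print(ratings.groupby(["annotator_origin", "face_ethnicity"])["score"]
             .mean().unstack())

# Do German annotators rate European faces higher than Asian faces?
german = ratings[ratings.annotator_origin == "German"]
t, p = stats.ttest_ind(
    german.loc[german.face_ethnicity == "European", "score"],
    german.loc[german.face_ethnicity == "Asian", "score"],
    equal_var=False,                       # Welch's t-test: unequal variances
)
print(f"t = {t:.3f}, p = {p:.4f}")
```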
3.2 Artificial Intelligence as Facial
Aesthetics Predictor
Current state-of-the-art results of facial aesthetics predictor systems are presented in this subsection. Afterwards, we introduce our AntiDiscriminationNet predictor. For the prediction of facial aesthetics, we used convolutional neural networks (CNN).
Related Work. With the introduction of CNNs
and large-scale image repositories, facial image and
video tasks get more powerful (Krizhevsky et al.,
2017; Zeiler and Fergus, 2013; Deng et al., 2009).
Xie et al. (2015) present the SCUT-FBP dataset, containing 500 subjects with attractiveness
ratings. Since “FBP is a multi-paradigm computation problem”, the successor SCUT-FBP5500 (Liang et al., 2018) was introduced in 2018, comprising an enlarged database of 5500 frontal faces with multiple attributes: male/female, Asian/Caucasian, age and beauty score. Liang et al. (2018) evaluated their database “using different combinations of feature and predictor, and various deep learning methods” on AlexNet (Krizhevsky et al., 2017), ResNet-18 (He et al., 2015) and ResNeXt-50, and achieved a Pearson correlation coefficient of PCC = 0.8777 (statistically significant at p < 0.01), a mean absolute error of MAE = 0.2518 and a root-mean-square error of RMSE = 0.3325 as a benchmark. In summary, all deep CNN models are superior to shallow predictors with hand-crafted geometric or appearance features (Liang et al., 2018).
Benchmark Data Set. The SCUT-FBP5500 data set is a small data set for deep learning tasks. Therefore, it is an even greater challenge to train soft features like aesthetics or beauty. In order to measure the accuracy of the network and to be comparable with recent experiments in facial beauty prediction, we calculated the Pearson correlation coefficient (PCC), mean absolute error (MAE) and root mean square error (RMSE).
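As a minimal illustration, these benchmark metrics can be computed as follows; y_true and y_pred are illustrative stand-ins for the human scores and the network's predictions.

```python
# Minimal sketch of the benchmark metrics used throughout this paper.
import numpy as np
from scipy.stats import pearsonr

def fbp_metrics(y_true, y_pred):
    pcc, p_value = pearsonr(y_true, y_pred)          # Pearson correlation
    mae = np.mean(np.abs(y_true - y_pred))           # mean absolute error
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # root mean square error
    return pcc, p_value, mae, rmse

y_true = np.array([3.2, 2.5, 4.1, 3.8])              # toy example values
y_pred = np.array([3.0, 2.7, 4.0, 3.9])
print(fbp_metrics(y_true, y_pred))
```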
Figure 2: The architecture of AntiDiscriminationNet is
based on the VGG Face architecture and is expanded by
two separate skip connections. At the end, the predictions
of the differently convoluted feature vectors are merged.
AntiDiscriminationNet Predictor Architecture. The
VGG Face architecture (Simonyan and Zisserman,
2015) was the basis of our AntiDiscriminationNet.
Inspired by an idea from (Shelhamer et al., 2017), we added modifications to the network by exploiting feature maps from the third and
fourth convolution blocks. Since the size of the fea-
ture maps differed from the size of the resulting fea-
ture vector, we implemented an additional max pool-
ing layer to achieve the desired output. For the pre-
dictions of the network, we concatenated the softmax
results into a single feature vector as shown in Fig-
ure 2. Our proposed network achieved the Pearson
correlation coefficient PCC = 0.9601; with p-value
being statistically significant at p < 0.01, which in-
dicated an almost linear correspondence between an-
notations and predictions. Our training results have
very high accuracy and outperform state-of-the-art results. More detailed results and comparisons with
other networks are presented in our previous work
(Danner et al., 2021). The normalised mean square er-
ror was nMSE = 3.896 and the normalised root mean
square error was nRMSE = 5.580. These are mea-
surements of the average error of the predicted labels,
which were used to evaluate the accuracy of the net-
work. The results are normalised because there are
different data sets with different score ranges.
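The following hedged Keras sketch outlines the topology described above. The ImageNet VGG16 backbone stands in for the VGG Face weights (which are not bundled with TensorFlow), and the ten score bins per branch are an assumption made for illustration, not the exact ADN configuration.

```python
# Hedged sketch of the AntiDiscriminationNet topology (cf. Figure 2):
# a VGG backbone with two skip connections, extra max pooling to match
# feature-map sizes, and merged per-branch softmax predictions.
import tensorflow as tf
from tensorflow.keras import layers, Model

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(256, 256, 3))

# Skip connections: feature maps from the third and fourth convolution blocks.
f3 = base.get_layer("block3_pool").output            # 32 x 32 x 256
f4 = base.get_layer("block4_pool").output            # 16 x 16 x 512
f5 = base.output                                     #  8 x  8 x 512

# Additional max pooling so the skip branches match the final spatial size.
f3 = layers.MaxPooling2D(pool_size=4)(f3)            # ->  8 x 8 x 256
f4 = layers.MaxPooling2D(pool_size=2)(f4)            # ->  8 x 8 x 512

def branch_head(x, name):
    """Per-branch softmax prediction that is later merged."""
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Dense(10, activation="softmax", name=name)(x)

# Merge the predictions of the differently convoluted feature vectors.
merged = layers.Concatenate()([branch_head(f3, "skip3"),
                               branch_head(f4, "skip4"),
                               branch_head(f5, "deep")])
score = layers.Dense(1, name="beauty_score")(merged)

adn = Model(inputs=base.input, outputs=score)
adn.compile(optimizer="adam", loss="mse")
```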
Re-annotation of SCUT-FBP5500 Data Set. Since
2013, for our study of facial aesthetics, we conducted
online surveys on multiple image data sets where
thousands of students and their relatives participated
(Gerlach et al., 2020). With this process we have been
able to gather enough data to train a convolutional
neural network with the goal to improve facial beauty
prediction. While training convolutional neural networks (CNN) on this data, we recognised a large bias in it. This led us to evaluate the annotations
from Chinese and German universities and take a
closer look at the bias. We conducted statistical anal-
ysis to see if the trained network would also produce
bias during aesthetic prediction. The results of the aforementioned paper (Gerlach et al., 2020) indicated a statis-
tically significant difference between CNN aesthetic
prediction based on Chinese and German annotations.
Results are presented in Figure 1. When trained on Chinese annotations, the CNN predicted higher aesthetic scores for Asian faces; when trained on German annotations, it predicted higher aesthetic scores for European faces. These results empirically revealed that the trained network reflects the same bias as its human annotators. After this revelation, our next step was to train an unbiased AI.
4 TRAINING OF UNBIASED AI
In general, there are three main paths to reach the
goal of unbiased predictions: fair pre-processing, fair
in-processing, and fair post-processing (Bellamy and
et. al., 2018). Within this paper, we present two ap-
proaches based on those paths to train an unbiased
network with biased data for FBP. The first approach
relies on data pre-processing before training to intro-
duce fairness - we call it ”balanced training”. The
second approach relies on a categorical cross-entropy loss function that enables the network to learn the bias and decrease it. Both processes are explained in the following sections.
4.1 Data Set and GAN-Images
We analysed the data that we gathered with our Analysis Toolbox and measured a significant bias in the prediction of aesthetics across different ethnicities. Therefore, training a network with the goal of creating unbiased results is still a challenge in deep learning tasks. In the following, we first describe our data set blend and the accompanying Analysis Toolbox, and we explain how we used a GAN to create artificial portraits with European and Asian ethnicities.
Starting in 2017, we used the Asian-European
data set SCUT-FBP (Xie et al., 2015; Liang et al.,
2018) to evaluate biased annotations from Chinese
and German universities. Since the SCUT-FBP 5500
dataset is a small dataset for deep learning tasks, we
used data augmentation methods to enlarge the sam-
ple size of the training set by generating GAN im-
ages with either Asian or European or mixed images
as input and new synthesised images as output. This
augmentation method proves superior to geometric transformations like cropping and rotating. All images are pre-processed by normalisation methods to harmonise face pose, facial landmark positions and image size; a sketch of such an alignment step follows below.
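A minimal sketch of such a normalisation step, assuming eye landmarks have already been detected by a detector of your choice, could look as follows; the canonical target positions (0.35 and 0.4 of the crop) are illustrative, not our exact configuration.

```python
# Illustrative pre-processing sketch: rotate and scale a portrait so that the
# eye centres land on canonical positions, harmonising pose, landmark
# positions and image size.
import cv2
import numpy as np

def align_face(image, left_eye, right_eye, size=256):
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))        # in-plane tilt of the eye line
    centre = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0) # midpoint between the eyes
    scale = (0.35 * size) / np.hypot(dx, dy)      # fix the inter-ocular distance
    M = cv2.getRotationMatrix2D(centre, angle, scale)
    M[0, 2] += size / 2.0 - centre[0]             # move eye midpoint to the
    M[1, 2] += 0.4 * size - centre[1]             # canonical crop position
    return cv2.warpAffine(image, M, (size, size))
```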
For the purpose of a detailed analysis, we blended
multiple data sets in the domain of facial aesthetics to-
gether. In total, this data set included 12,034 portrait
images from persons of different ethnicities with indi-
vidual social backgrounds. These images are labelled
and annotated in surveys over a period of eight years
with a total number of 5.4 million annotations. Besides that, we recently added the FairFace (Kärkkäinen and Joo, 2019) database, which includes male and female
portraits of seven different ethnic groups. The synthe-
sised Eurasian images are artificially generated with
StarGAN v2 (Choi et al., 2020) to determine the in-
fluence of the biased view of annotators on aesthetics
of persons from different ethnicities.
After annotating the data set, the unconscious
bias in the annotations can be uncovered. Figure 3
shows the biased average score of our networks on
the SCUT-FBP data set and the Eurasian data set. Fig-
ure 4 illustrates the analysis of the distribution of aesthetic score and age for Asians, Europeans and three mixed-racial subgroups. The different group annotation points are displayed in different colours. We calculated the following metrics for each group cluster i: horizontal dashed lines mark the average attractiveness values a_i, and vertical dashed lines mark the average age values y_i. As can be seen, the interval of the a_i has a small span, whereas the interval of the y_i has a significantly larger span. Each intersection point of the a_i and y_i values forms a per-group attractiveness-age factor AAF_i = a_i / y_i. In a fair machine, these AAF_i points would be closer together, as the a_i span is small. This idea is further elaborated in the following sections.

Figure 3: Unconscious bias towards ethnic aesthetics of either German or Chinese annotators. Left: average aesthetic score on SCUT-FBP by German annotators; middle: average aesthetic score labelled by Chinese students; right: aesthetic scores on the Eurasian data set annotated by German students.

Figure 4: Biased correlation between attractiveness, age and ethnicity by German annotators. In an ethical, fair network the attractiveness for equal age groups would be the same. This would be represented in the figure by the same height of the lines for equal age groups.
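As a small illustration, the per-cluster values and the resulting AAF_i can be computed as follows; the file name and the column names (group, score, age) are hypothetical.

```python
# Pandas sketch of the per-cluster attractiveness-age factor
# AAF_i = a_i / y_i from Figure 4 (hypothetical file and column names).
import pandas as pd

df = pd.read_csv("annotations.csv")
clusters = df.groupby("group").agg(a_i=("score", "mean"), y_i=("age", "mean"))
clusters["AAF_i"] = clusters["a_i"] / clusters["y_i"]
# In a fair model the AAF_i values of all clusters would lie close together.
print(clusters)
```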
4.2 Training and Data Pre-Processing
In our first approach of training the network we ap-
plied pre-processing and resampling to the input data,
which is explained in the following paragraphs.
This paper proposes a way to create a fair network from this biased data. To this end, the bias must be identified in the ground-truth labels, and the data set divided into two subsets. The first subset (German annotations) confirms and increases the existing bias, whereas the second subset (Chinese annotations) contains the contrary prejudices (annotation bias). Afterwards, a GAN generates synthetic images that form a gradation of mixtures of the first and second subset. This selection leads to the best-balanced results for the generated images. This knowledge can then be applied back to the original data set.

Figure 5: Correlation of the bias over the ratio of German and Chinese annotations. The least bias here is at the ratio of 1:1.9.
In our training process, we have a clear bias in the
annotations, as shown in Figure 1 and measured in
the analysis of the data. If we train our network based
only on these labels, it follows the data and replicates
the bias from the annotations. In the next training,
we added the annotations from the Chinese and Ger-
man annotators and trained the network on an equal
distribution of those annotations (Ratio: 1.0). The av-
erage aesthetic rating of European and Asian faces is
still biased, however not as strongly as in the previous experiment.
In this experiment, balancing the training data meant finding the minimum by concatenating the German-annotated subset g with the Chinese-annotated subset c weighted by ω. The goal of this approach was to level the average aesthetic scores \bar{g} and \bar{c} over the generated predictions g_i and c_i. The network bias B is then defined by

B = \frac{1}{2n+1} \sum_{i=0}^{n} \left( \left|\bar{g} - g_i\right| + \omega \left|\bar{c} - c_i\right| \right). \quad (1)
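A sketch of Eq. (1) and of the ratio sweep described below is given next; train_and_predict is a hypothetical stand-in for one full training run that returns the network's predictions on the German (g) and Chinese (c) test subsets, here assumed to be equally sized.

```python
# Sketch of Eq. (1) and the weight sweep over the annotation ratio.
import numpy as np

def network_bias(g_pred, c_pred, w):
    """Eq. (1): weighted deviations of the predictions from the subset means."""
    g_bar, c_bar = g_pred.mean(), c_pred.mean()
    n = len(g_pred) - 1                    # Eq. (1) sums over i = 0..n
    dev = np.abs(g_bar - g_pred) + w * np.abs(c_bar - c_pred)
    return dev.sum() / (2 * n + 1)

ratios = np.arange(0.5, 3.3, 0.1)          # 2:1 ... 1:3.2 as in the experiment
biases = []
for w in ratios:
    g_pred, c_pred = train_and_predict(weight_chinese=w)  # hypothetical helper
    biases.append(network_bias(g_pred, c_pred, w))
best = ratios[int(np.argmin(biases))]      # ~1.9 in our experiment (Figure 5)
```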
Starting from a ratio of 1:1, in which German and Chi-
nese annotations are distributed equally, we gradually
increased the weight of the Chinese annotations. In
our experiment we varied the ratio from 2:1 to 1:3.2
for German annotations to Chinese annotations. Each
training step and the corresponding bias over the ratio
is shown in Figure 5. Determining the minimum in
Figure 5 is equal to finding the least biased network.
It is visible that a ratio of 1:1.9 produced the least biased network for this experiment; its results are shown in Figure 6. This means the Chinese annotations are weighted nearly twice as strongly as the European annotations. Limitations of this approach are that information about the structure of the underlying latent features is unknown and that balancing the network requires a lot of time and work. Therefore, we additionally propose another approach, described in the following section.

Figure 6: CNN aesthetic prediction with an equalised distribution of training data. The charts on the left show the prediction of the network when it is trained only on Chinese or German annotations. On the right, the prediction of the network trained on the balanced data is shown. All bars have roughly the same height and differ only minimally. This means that we could eliminate most of the bias in the training data by balancing, and we can assume that this trained network is fair.
4.3 Debiasing Neural Network
4.3.1 Training Network Features
To achieve the first results on unbiased aesthetic es-
timation, we used the existing VGG-Face framework
in Keras with TensorFlow and adjusted it. The net-
work consists of 11 blocks, each containing a linear
operator and followed by one or more non-linearities
such as ReLU and max pooling (Parkhi et al., 2015).
We applied transfer learning here and used the pre-
trained model for Face Recognition (Parkhi et al.,
2015). Building on face recognition, attractiveness estimation is, similar to age estimation (Gyawali et al., 2020), performed by observing the facial features of portraits. The convolutional layers in the
network are followed by a rectification layer (ReLU)
as in (Krizhevsky et al., 2017). We used the Adam
optimiser (Kingma and Ba, 2017). The input to our
network is a face image of size 256×256×3, and the network uses zero-padding around the edges to ensure that image information at the border is not lost. Our input data is split into 60% training and 40% test data. The parameters of the VGG-Face convolutional layers are not changed and are kept frozen during training. We used a dropout of 50%, and as this is a regression problem, our final layer has a size of 1. To classify the aesthetic score, a softmax activation function is used in the final layer. As a loss metric we used the mean squared error, and to compare our networks we also calculated the Pearson correlation and the root mean squared error. A minimal sketch of this training set-up follows below.
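In this minimal sketch, the ImageNet VGG16 base acts as a stand-in for the VGG-Face weights (which are not bundled with TensorFlow), and the 512-unit head is an assumption for illustration.

```python
# Minimal training sketch of the set-up above: frozen convolutional base,
# 50% dropout, one regression output, Adam optimiser and MSE loss.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(256, 256, 3))
base.trainable = False                      # convolutional layers stay frozen

model = tf.keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),   # head size is an assumption
    layers.Dropout(0.5),                    # 50% dropout as described above
    layers.Dense(1),                        # regression: one aesthetic score
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

# 60% train / 40% test split of images x and aesthetic labels y:
# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=30)
```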
4.3.2 Balanced Training
The process and the effect of the ratio on the average
aesthetic rating are shown in Figure 5. By modifying
the ratio of the annotations, a minimum is determined
that illustrates the lowest difference between the aver-
age aesthetic prediction of Asian and European faces.
This represents a specific loss function for our net-
work that maps bias onto measurable values. To re-
move bias from our network, we calculated the differ-
ence between European and Asian aesthetic predic-
tions and found the global minimum. The minimum
of the average aesthetic score between Asian and Eu-
ropean faces is located at a ratio of 1:1.9 where the av-
erage aesthetic score differs by about 5%. We created a model with a fair performance over all classes of different ethnicities, as shown in Figure 6. This proves that re-sampling and balancing the training data creates a less biased AI for FBP tasks.
Our results are displayed in Figure 6 where all bar
charts have a similar height and the FBP score is con-
siderably less biased. Not all bars have exactly the same height; this is due to some background noise. Real
world data usually contains noise which affects tasks
such as classification in machine learning (Gupta and
Gupta, 2019). This noise also affects our aesthetic
prediction. However, with those minor differences,
we can consider our network as unbiased and there-
fore fair. As we used a factor-based approach to mul-
tiply the annotation data, this noise is present over all
ratios. Only the difference of the averages increases
or decreases within the variations of the ratio.
4.3.3 Removing Bias Using Clustered Labels
Our second approach is a more sophisticated way of removing the bias from the training data. It consists of a deep learning network that is trained on the original learning task of the data set and then minimises the bias inside the learned latent distributions using a specially adapted loss function.
Each data record contains a list of labels a = (a_1, ..., a_n), which are to be debiased, and a further list of labels b = (b_1, ..., b_n). In this example, we remove the bias from the ethnic label a_1 and preserve the age, profession, hair colour and skin complexion labels. The network evaluates all attributes of the data set during training and groups all objects into clusters according to the attributes b.

The difference between the subgroup means of the ethnic label a_1 represents the bias. A non-linear operation, similar to the gamma correction in imaging systems, is then applied to the ethnic label to preserve the range of the values and bring the differences closer together. These differences over all clusters are the measure of the loss function, which is implemented as a categorical cross-entropy loss and should be minimised during training. With this, we present a universally adaptable method to make any network fairer with respect to given labels, as sketched below.
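The following hedged TensorFlow sketch shows one way such a cluster-based penalty could be implemented; it penalises the variance of the per-cluster means rather than reproducing our exact cross-entropy formulation, and the names and the weighting factor lam are illustrative.

```python
# Hedged sketch of a cluster-based debiasing penalty: records are grouped by
# the preserved attributes b (encoded as integer cluster ids), and the spread
# of the per-cluster mean prediction for the ethnic label a_1 is added to the
# task loss. One possible reading of the method, not the exact implementation.
import tensorflow as tf

def debias_penalty(ethnic_pred, cluster_ids, num_clusters):
    """Spread of the per-cluster means of the label to be debiased (a_1)."""
    sums = tf.math.unsorted_segment_sum(ethnic_pred, cluster_ids, num_clusters)
    counts = tf.math.unsorted_segment_sum(
        tf.ones_like(ethnic_pred), cluster_ids, num_clusters)
    means = sums / tf.maximum(counts, 1.0)   # mean a_1 prediction per cluster
    return tf.math.reduce_variance(means)    # zero when all cluster means agree

def total_loss(y_true, y_pred, ethnic_pred, cluster_ids, num_clusters, lam=0.1):
    """Original task loss plus the weighted cluster-spread penalty."""
    task = tf.reduce_mean(tf.keras.losses.mean_squared_error(y_true, y_pred))
    return task + lam * debias_penalty(ethnic_pred, cluster_ids, num_clusters)
```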
5 FURTHER ETHICAL
CONSIDERATIONS
Societal benefits can arise from aesthetic prediction
when trustworthy AI models are used. We presented
a method to eliminate bias in facial attractiveness pre-
diction and this method can be transferred to other
similar networks and use cases. For example, in the future, a possible application of our unbiased AntiDiscriminationNet is supporting recruitment processes or aesthetic plastic surgery in the medical domain. Applying those AI models needs to be dis-
cussed in the light of a benefit and risk assessment.
The implementation of machine learning models in
various future tasks must be accompanied by effective
measures, long-term ethical considerations and trans-
parency. Training of an unbiased model on biased
data will be a constant challenge in machine learn-
ing, especially in the field of aesthetic judgement and
other fields where underrepresented ethnic groups are
common. Pragmatic regulations and an open-source mindset will reduce the risk that algorithms become a major source of discrimination along the lines of gender, sex, and ethnicity. Applying a debiased prediction model like AntiDiscriminationNet is a starting point for future research and continuous ethical evaluation. The implications and future research in this domain are driven by the adoption of AI models that allow us to screen facial images in a high-throughput manner when needed and within regulations. Beauty and aesthetic attractiveness predictions raise ethical questions and concerns. This debate needs to be held on an ongoing and open basis, ideally with a diverse group of stakeholders. Ultimately, the well-known phrase ‘beauty is in the eye of the beholder’ will remain a universal rule for machines and humans in attractiveness judgement.
6 CONCLUSION
In this research, we used experimental methods to systematically demonstrate how human implicit bias affects the decision-making of artificial intelligence, and we found a way to eliminate this implicit bias. Additionally, we improved the fair-
ness of the prediction towards an equally distributed
prediction between different ethnicities. Moreover,
the Pearson correlation coefficient of PCC = 0.9601,
which denotes a nearly linear correspondence be-
tween annotations and predictions, was reached by
our trained network. Our training results are more
accurate than those obtained in recent studies in this
area. In summary, two main contributions of this pa-
per are AntiDiscriminationNet for facial image anal-
ysis and a new approach towards bias-free machine
learning models. Bias-free decision making is a chal-
lenging problem in machine learning tasks, yet it has great potential to become one of the most significant strengths of AI. Future work on this topic should focus on scaling our approach to larger, more diverse data sets and to other use cases.
REFERENCES
Ahmed, O. (2018). Artificial intelligence in HR. International Journal of Research and Analytical Reviews, 5(4):971–978.
Arslan, A., Ruman, A., Naughton, S., and Tarba, S. Y.
(2021). Human dynamics of automation and digitali-
sation of economies: Discussion on the challenges and
opportunities. In The Palgrave handbook of corpo-
rate sustainability in the digital era, pages 613–629.
Springer.
beauty.ai (2016). The First International Beauty Contest
Judged by Artificial Intelligence.
Bellamy, R. K. E. et al. (2018). AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias. arXiv:1810.01943 [cs].
Bogen, M. (2019). All the Ways Hiring Algorithms Can Introduce Bias. Harvard Business Review.
Carrera, F. (2020). Race and gender of aesthetics and
affections: algorithmization of racism and sexism
in contemporary digital image databases. Matrizes,
14(2):217–240.
Chiang, C. and Saw, Y. (2018). Do good looks matter when
applying for jobs in the hospitality industry? Interna-
tional Journal of Hospitality Management.
Choi, Y., Uh, Y., Yoo, J., and Ha, J.-W. (2020). StarGAN
v2: Diverse Image Synthesis for Multiple Domains.
arXiv:1912.01865 [cs].
Danner, M., Weber, T., Peng, L. P., Gerlach, T., Su, X., and Rätsch, M. (2021). Ethically aligned deep learning: Unbiased facial aesthetic prediction. CoRR, abs/2111.05149.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.
Dornaika, F., Moujahid, A., Wang, K., and Feng, X. (2020).
Efficient deep discriminant embedding: Application
to face beauty prediction and classification. Engineer-
ing Applications of Artificial Intelligence, 95:103831.
Gerlach, T., Danner, M., Peng, L., Kaminickas, A., Fei, W., and Rätsch, M. (2020). Who Loves Virtue as much as He Loves Beauty? Deep Learning based Estimator for Aesthetics of Portraits. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pages 521–528, Valletta, Malta. SCITEPRESS - Science and Technology Publications.
Grant, R. M. (1991). The resource-based theory of compet-
itive advantage: Implications for strategy formulation.
California Management Review.
Gupta, S. and Gupta, A. (2019). Dealing with Noise Prob-
lem in Machine Learning Data-sets: A Systematic Re-
view. Procedia Computer Science, 161:466–474.
Gyawali, D., Pokharel, P., Chauhan, A., and Shakya, S. C.
(2020). Age Range Estimation Using MTCNN and
VGG-Face Model. In 2020 11th International Confer-
ence on Computing, Communication and Networking
Technologies (ICCCNT), pages 1–6.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep
residual learning for image recognition. CoRR,
abs/1512.03385.
Kingma, D. P. and Ba, J. (2017). Adam: A Method
for Stochastic Optimization. arXiv:1412.6980 [cs].
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Im-
ageNet classification with deep convolutional neural
networks. Communications of the ACM, 60(6):84–90.
Kärkkäinen, K. and Joo, J. (2019). FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age. arXiv:1908.04913 [cs].
Laurim, V., Arpaci, S., Prommegger, B., and Krcmar, H. (2021). Computer, whom should I hire? Acceptance criteria for artificial intelligence in the recruitment process. In Proceedings of the 54th Hawaii International Conference on System Sciences, page 5495.
Levin, S. (2016). A beauty contest was judged by AI and the robots didn’t like dark skin. The Guardian.
Liang, L., Lin, L., Jin, L., Xie, D., and Li, M.
(2018). SCUT-FBP5500: A Diverse Benchmark
Dataset for Multi-Paradigm Facial Beauty Prediction.
arXiv:1801.06345 [cs].
Little, A. C., Jones, B. C., and DeBruine, L. M. (2011).
Facial attractiveness: evolutionary based research.
Philosophical Transactions of the Royal Society B: Bi-
ological Sciences.
Lloyd, K. (2018). Bias amplification in artificial intelli-
gence systems.
Parkhi, O. M., Vedaldi, A., and Zisserman, A. (2015).
Deep Face Recognition. In Procedings of the British
Machine Vision Conference 2015, pages 41.1–41.12,
Swansea. British Machine Vision Association.
Prijatelj, D. S., McCurrie, M., and Scheirer, W. J. (2020).
A Bayesian Evaluation Framework for Ground Truth-
Free Visual Recognition Tasks. arXiv:2007.06711 [cs, stat].
Reilly, P. (2018). The impact of artificial intelligence on the
hr function.
Schlick, A. M. and Reich, K. (2021). Vereinbarkeit von
akzeptanzfaktoren beim einsatz von ki in der bewer-
berauswahl. Master Thesis, Reutlingen University.
Shank, D. B. and DeSanti, A. (2018). Attributions of moral-
ity and mind to artificial intelligence after real-world
moral violations. Computers in Human Behavior,
86:401–411.
Sharma, S., Zhang, Y., Ríos Aliaga, J. M., Bouneffouf, D., Muthusamy, V., and Varshney, K. R. (2020). Data
Muthusamy, V., and Varshney, K. R. (2020). Data
Augmentation for Discrimination Prevention and Bias
Disambiguation. In Proceedings of the AAAI/ACM
Conference on AI, Ethics, and Society, AIES ’20,
pages 358–364, New York, NY, USA. Association for
Computing Machinery.
Shelhamer, E., Long, J., and Darrell, T. (2017). Fully con-
volutional networks for semantic segmentation. IEEE
Trans. Pattern Anal. Mach. Intell., 39(4):640–651.
Simonyan, K. and Zisserman, A. (2015). Very deep con-
volutional networks for large-scale image recognition.
In Bengio, Y. and LeCun, Y., editors, 3rd Interna-
tional Conference on Learning Representations, ICLR
2015, San Diego, CA, USA, May 7-9, 2015, Confer-
ence Track Proceedings.
Xie, D., Liang, L., Jin, L., Xu, J., and Li, M. (2015). SCUT-
FBP: A benchmark dataset for facial beauty percep-
tion. In 2015 IEEE International Conference on Sys-
tems, Man, and Cybernetics, Kowloon Tong, Hong
Kong, October 9-12, 2015, pages 1821–1826. IEEE.
Zeiler, M. D. and Fergus, R. (2013). Visualizing and Under-
standing Convolutional Networks. arXiv:1311.2901 [cs].
Zhang, M. and Kreiman, G. (2021). Beauty is in the eye of the machine. Nature Human Behaviour, 5:675–676.