Automated Deep Learning Based Answer Generation to Psychometric

Questionnaire: Mimicking Personality Traits

Anirban Lahiri

, Shivam Raj

, Utanko Mitra

, Sunreeta Sen

, Rajlakshmi Guha

, Pabitra Mitra

Partha P. Chakrabarti

and Anupam Basu

Kainos Ltd., London, U.K.

Indian Institute of Technology, Kharagpur, India

Arndit Ltd., Cambridge, U.K.

Keywords:

Deep Learning, Natural Language Processing, Psychometric Testing, Personality Trait Emulation.

Abstract:

Questionnaires have traditionally been used for psychometric testing and evaluation of personality traits. This

work explores if personality traits or characteristics can be emulated by a computer through the responses to a

questionnaire. A state-of-art Deep Learning model using natural language processing techniques coupled to a

personality prediction model has been exploited. A standard OCEAN – Five-Factor evaluation questionnaire

was used as the test bench for this novel study combining psychometry and machine learning. This article ex-

plains the design details of the emulation framework, the obtained results and their signiﬁcance. The obtained

results look promising and the framework can potentially ﬁnd commercial or academic application in the near

future.

1 INTRODUCTION AND

BACKGROUND

The use of Artiﬁcial Intelligence (AI), especially

Deep Learning(DL) for personality detection has

been researched only very recently(Mehta et al.,

2020). It has been decades since (Mcculloch and Pitts,

1943) proposed the Perceptron and implemented by

(Rosenblatt, 1958) as a mechanism mimicking hu-

man neurological processes. The science and tech-

nology for neural networks has progressed ever since

and we are now in the era of Deep-Learning. How-

ever, there is still much work remaining in terms of

modelling human behaviour through neural networks

and deep-learning. Recent work in modelling hu-

man behaviours and traits through Artiﬁcial Intelli-

gence(AI)/ Machine Learning (ML) include the fol-

lowing. An experiment on face based personality de-

tection of OCEAN - Big Five Factors was done by

(Al Moubayed et al., 2014). Predicting personal-

ity traits from physical behaviour like everyday eye

movements has been researched (Hoppe et al., 2018).

It has also been possible to predict private traits and

attributes from people’s online presence (e.g. Face-

book Likes) using AI / Machine Learning (ML) as

outlined by (Stillwell et al., 2013). Computer based

games for psychometric analysis have been proposed

by (Lahiri et al., 2020). Work has also been done on

modelling behaviour of groups through inverse plan-

ning (Shum et al., 2019) and predicting group be-

haviour.

Along with rapid advancements in Artiﬁcial In-

telligence, Machine Learning, and Deep Learning,

computers have become increasingly more efﬁcient

at tasks that were formerly considered to be the forte

of humans. These tasks even involve artistic skills,

(Wang et al., 2020) reviews image synthesis using

generative adversarial networks. Recent advances

in semantic capture of natural languages using DL

(Wu et al., 2016) have paved the way for this work.

Deep-learning has been proven to capture not only

the syntax but also the semantics of sentences through

sentence-embeddings (Reimers and Gurevych, 2019).

Transformers have proved the ability to capture the

human attention mechanism for language translation

tasks (Vaswani et al., 2017).

However, not much work has been done towards

psychometric emulation using AI especially the ad-

vance in DL and the authors are unaware of any stud-

ies using Deep Learning for emulating personality

traits or characteristics through questionnaires. Very

little literature exists in this domain and it can become

key area for nascent research in the near future.

Lahiri, A., Raj, S., Mitra, U., Sen, S., Guha, R., Mitra, P., Chakrabarti, P. and Basu, A.

Automated Deep Learning Based Answer Generation to Psychometric Questionnaire: Mimicking Personality Traits.

DOI: 10.5220/0011772200003393

In Proceedings of the 15th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2023) - Volume 3, pages 737-743

ISBN: 978-989-758-623-1; ISSN: 2184-433X

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

737

Figure 1: Neural Network training duration plots corresponding to various emulators.

2 OCEAN FIVE FACTOR

Alan Turing once conjectured that at some point

in time machines would be able to mimic human

thought and interaction manifested through an “imita-

tion game” (Turing, 1950). This work attempts to take

one more step in that direction by exploring whether

it is possible for computers to imitate human person-

alities.

The OCEAN Five-Factor model is well known to

psychologists(Goldberg, 1992) and will be described

brieﬂy later in this section. This work explores

whether it is possible for a computer to impersonate

a particular combination of the ﬁve factors deﬁning a

person’s characteristics. A set of 50 questions (in the

form of a questionnaire) is answered by the computer

while imitating that particular set of OCEAN values.

Although it is theoretically possible for a com-

puter to mimic a particular personality and answer the

questionnaire using an exhaustive set of look-up ta-

bles it is not an elegant solution due to the following

pitfalls:

• If there are 50 questions and the value of each

of the 5 personality traits ranges from 1 to 40,

the number of rows on the look-up table would

be 50 × 40

=512 × 10

. Though this technically

achievable it is not a prudent solution for a much

higher number of questions.

• If the questions which are part of the question-

naire were rephrased, then the computer would be

clueless about how to answer the questions to re-

ﬂect a particular personality.

Therefore, an approach using look-up tables

though theoretically possible is not a recommended

solution for practical purposes. Deep-Neural Net-

works(DNN) have shown promise in being able to

generalize their learning and make effective decisions

(Mehta et al., 2020). The minimum theoretical neu-

ral network for answering each question for a given

combination of OCEAN personality traits is shown in

Figure 2. This work investigates whether it is possible

for such a network to impersonate a set of personality

traits while answering a psychometric questionnaire.

The next paragraph gives brief outline of the OCEAN

ﬁve factor model which is often used for psychome-

tric evaluation. The OCEAN model has been chosen

since it freely available and does not require any li-

censing.

For evaluating the OCEAN factors, a question-

naire/form with 50 questions (Goldberg, 1992) needs

to be answered by the test subject. Each question has

5 possible answers numbered 1 to 5, out of which the

subject needs to choose one. Finally, the subject for

the personality test is given a score between 0 to 40

(total of 41 values) for each of the 5 OCEAN person-

ality traits.This questionnaire was used as a basis for

building the DNN framework.

OCEAN - Five Factors: The Big Five personality

Figure 2: Quinary emulator.

ICAART 2023 - 15th International Conference on Agents and Artiﬁcial Intelligence

738

Figure 3: Deep Neural Network framework Personality emulation using BERT.

traits, also known as the ﬁve-factor model (FFM), is

a taxonomy for personality traits (Goldberg, 1992). It

is based on common language descriptors. The ﬁve

factors have been deﬁned as follows:

• Openness to experience (inventive/curious vs.

consistent/cautious).

• Conscientiousness (efﬁcient/organized vs. easy-

going/careless).

• Extroversion: (outgoing/energetic vs. soli-

tary/reserved)

• Agreeableness (friendly/compassionate vs. chal-

lenging/detached)

• Neuroticism (sensitive/nervous vs. se-

cure/conﬁdent).

The objective was to train a Deep Neural Network

which would be able to answer the questions in the

questionnaire so as to reﬂect a particular set of of

OCEAN characteristics( E.g set of OCEAN scores :

20, 30, 35, 15, 20). A set of minimalistic Neural

networks(NN) were created, where each NN corre-

sponded to one question in the questionnaire. These

were then trained through rote learning and mini-

mized in size. The summarized results are shown

in Figure 1. The NNs were ﬁrst tried with binary

response to the questions (either ’yes’ or ’no’) and

each output for the 5 factors would correspond to a

high or a low. It was then progressively made more

rich to capture the nuances of the answers (ternary,

quinary - 5 levels) as per the standard set by (Gold-

berg, 1992). The generic structure of the quinary em-

ulator is shown in Figure 2).

The results summarized in Figure 1 indicate the

order of the resources required by the NNs for cap-

turing the spectrum of answers. The training time in

terms of epochs varied between 3 and 110 for each

Figure 4: BERT language representation model for ques-

tions.

question with an average of 5.9 as shown in Figure

1. Validation accuracy of 100% was required for the

training to be completed. It was found that a single

hidden layer with 4 nodes was able to produce the

correct output to 100% accuracy and the number of

epochs required for training were fairly low. An NN

any smaller than 4 nodes in the hidden layer failed to

reach 100% accuracy even after 1000 epochs. More

nodes in the hidden layer did not contribute much

both in terms of the output spectrum or learning time.

However, as mentioned before such simplistic NN

models would not be able to capture the nuances of

the language therefore a robust language model could

be necessary, otherwise the results obtained would

seem meaningless. The next section describes the in-

corporation of a language model to the DNN and the

corresponding validation mechanism.

3 LANGUAGE MODEL AND

SUITABILITY OF BERT

A language model is required for capturing the se-

mantics of the questions in the questionnaire. The

BERT model (Devlin et al., 2018) claims improve-

Automated Deep Learning Based Answer Generation to Psychometric Questionnaire: Mimicking Personality Traits

739

Figure 5: Examples of Rephrased questions.

Figure 6: Examples of Negative Rephrased questions.

ments in natural language inferencing, paraphrasing

and enhanced ability for question-answer tests. This

study leverages the capabilities of such a pre-trained

language representation model to answer psychome-

tric questions while mimicking a particular personal-

ity.

In this case the BERT(Bidirectional Encoder Rep-

resentations from Transformers) model was pre-

trained and only required minor ﬁne-tuning (Hug-

gingFace, 2018). The BERT model is capable of

mapping sentences (questions in this case) to a set

of embeddings capturing both the semantics of the

words and their positions in the sentence as illustrated

in Figure 4. Additional Neural Network layers were

added to the output layer of the BERT model to con-

vert the positional embeddings from the BERT model

to the ﬁnal question response. The resulting DNN as

shown in Figure 3 was then trained iteratively. The

input to the DNN is the question text. The pre-trained

BERT model outputs a set of 1024 embeddings which

are then provided as input to the second phase of the

DNN. The second phase of the DNN accepts the em-

beddings corresponding to a question from the BERT

models as well the combination of the OCEAN scores

to emulate and produces as output one of the follow-

ing ﬁve answers:

• disagree

• slightly disagree

• neutral

• slightly agree

• agree

The Personality Emulator section of the NN has

5 dense layers in between its input and output layers.

Note, that this size-optimized NN was arrived at em-

pirically after rigorous experimentation with NNs of

varying depths and number of nodes in each layer.

BERT is fundamentally a transformer language

model with a number of encoder layers that may be

changed. A transformer is a deep learning model that

uses the self-attention process and weighs the impor-

tance of each component of the input data differently.

It has been used in effectively in the past for natural

language processing (NLP) (Devlin et al., 2018). It

captures the concept of attention in neural networks

since it tries to mimic cognitive attention in humans.

It does so by enhancing some parts of the input data

while diminishing other parts which insigniﬁcant to

the semantics.

Every input embedding is a combination of three

embeddings as depicted in Figure. 4. Positional em-

ICAART 2023 - 15th International Conference on Agents and Artiﬁcial Intelligence

740

Figure 7: Mean accuracy of the pair of questions with corresponding cosine similarity.

bedding which is learned by BERT to express the po-

sition of the words in a sentence which overcomes

the limitation of transformer, unlike recurrent neu-

ral network(RNN) where it is unable to capture ”se-

quence” or ”order” information in case of NLP. Seg-

ment embedding is able to take a pair of sentences or

phrases of the same sentence as inputs for tasks(E.g.

Question-Answering) and can differentiate between

them. Lastly, there is the token embeddings learned

for the speciﬁc token from the WordPiece token vo-

cabulary (Wu et al., 2016). The input representation

for a particular question is represented by the relevant

token, segment, and position embeddings. Therefore,

it is an effective tool for deriving the meanings of

the questions in a psychometric test and ﬁnding suit-

able answers. An additional spelling checker which

uses Levenshtein distance (Levenshtein et al., 1966)

for checking and correcting spellings can be used if

needed but is not necessary for the current scope.

It is desirable to generalize the emulation frame-

work such that the results can be reproduced even if

the questions in the questionnaire are rephrased. In

order to achieve this a few variants of the original set

of questions were created as shown in Figures 5 and

6. Set 1 refers to the original set of questions from

(Goldberg, 1992). Set 2 is semantically identical to

Set 1, i.e. each questions have been rephrased while

keeping the meaning the same. This was done to vali-

date whether the DNN is able to answer the questions

correctly though text of the question has changed.

The cosine similarities (Li et al., 2004) between Set

1 and Set 2 as output by the BERT network are shown

in ﬁgure 8. Set 3 represents the semantic negative of

the corresponding question in Set 1 without using a

strong negative word like“not’ or “don’t”. Therefore

it is expected to be more complicated for the DNN

to interpret it as the semantically opposite to the cor-

responding question in Set 1. In contrast, Figure 6

shows two additional sets of questions which are se-

mantically opposite to the corresponding question in

Set 1, but use a strong negative word like “not” or

“don’t”. Set 1 was used for training the DNN while

all the other sets were used for test and validation.

The computational platform used for this work

was a server with 40 virtual Xeon processor cores,

96GB RAM, and an Nvidia K80 GPGPU (Nvidia,

2014) for accelerating the DNN computations. For

each iteration, a total of approximately 10 hours of

machine time was required for training the DNNs us-

ing BERT for all 50 questions.

4 RESULTS AND DISCUSSION

Figure 7 shows the average accuracy of the emulator

for all questions across all possible personality types

which is numerically (41)

. The mean accuracy for

the questions ranges from 31.4% to 100% with a com-

Automated Deep Learning Based Answer Generation to Psychometric Questionnaire: Mimicking Personality Traits

741

Figure 8: Distribution of cosine similarities between Set 1

and Set 2.

bined average of 88.55%. On the other hand the Co-

sine similarity of the pairwise question embeddings

(taken from combinations of 2 sets) range from 42.9%

to 93.8% with a combined average of 72.1%. This

refers to cosine similarity between the original ques-

tion in the personality questionnaire and its rephrased

version from another set.

Although the cosine similarity between the em-

beddings of the same question number from different

sets might not be an accurate measure, it gives an indi-

cation that the BERT model is able to ﬁnd a semantic

similarity between the two questions. It is observed

that apart from a couple of outliers most of the ques-

tions are answered accurately. Interestingly, we found

that some of the questions which seem to have been

answered incorrectly have actually been answered us-

ing a very close alternative choice. For example in-

stead of answering with the choice 5 the personality

emulator answered with the choice 4. This is mani-

fested in the heatmap plot in Figure 9. The Heatmap

in Figure 9 shows a distribution of the expected results

vs the obtained result given by the DNN. Given that

answers to personality questionnaires are not always

exact and self-assessments can vary between persons,

a tolerance of +/-1 is often considered by psycholo-

gists. Therefore, if the DNN comes up with a result

that only differs by +/- 1 it can be considered to be

correct.

It is worthwhile to note that the NN has been able

to answer correctly even those questions which have

moderate cosine similarities. This indicates that the

personality emulation section of the DNN correctly

ampliﬁes or attenuates the signiﬁcance of various em-

beddings.

Figure 9: Distribution of Obtained values vs Expected Val-

ues.

5 CONCLUSIONS AND FUTURE

WORK

This study explored how personality traits can po-

tentially be emulated by computers using deep neu-

ral networks. The results seem promising and con-

ﬁrm that the framework can emulate human personal-

ity through answering a psychometric questionnaire .

A potential future work might include extending the

framework to also answer various other psychomet-

ric questionnaires for example, MBTI(Myers Briggs

Type Indicator)(Myers, 1962). In the future, such per-

sonality emulation system can ﬁnd commercial ap-

plications in chatbots(Lokman and Ameedeen, 2018)

which can optimize their communication styles based

on customer preferences.

ACKNOWLEDGEMENTS

The authors would like to thank Arndit Ltd.for the

computing resources especially GPGPU infrastruc-

ture provided for this work.

REFERENCES

Al Moubayed, N., Vazquez-Alvarez, Y., McKay, A., and

Vinciarelli, A. (2014). Face-based automatic person-

ality perception. In Proceedings of the 22nd ACM

International Conference on Multimedia, MM ’14,

page 1153–1156, New York, NY, USA. Association

for Computing Machinery.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.

(2018). Bert: Pre-training of deep bidirectional trans-

formers for language understanding. arXiv preprint

arXiv:1810.04805.

Goldberg, L. R. (1992). The development of markers for the

big-ﬁve factor structure. Psychological assessment,

4(1):26.

Hoppe, S., Loetscher, T., Morey, S. A., and Bulling, A.

(2018). Eye movements during everyday behavior

predict personality traits. Frontiers in human neuro-

science, page 105.

ICAART 2023 - 15th International Conference on Agents and Artiﬁcial Intelligence

742

HuggingFace (2018). RoBERTa.

Lahiri, A., Mitra, U., Sen, S., Chakraborty, M., Kleiman-

Weiner, M., Guha, R., Mitra, P., Basu, A., and

Chakraborty, P. P. (2020). Antarjami: Exploring psy-

chometric evaluation through a computer-based game.

arXiv preprint arXiv:2007.10089.

Levenshtein, V. I. et al. (1966). Binary codes capable of cor-

recting deletions, insertions, and reversals. In Soviet

physics doklady, volume 10, pages 707–710. Soviet

Union.

Li, M., Chen, X., Li, X., Ma, B., and Vit

anyi, P. M. (2004).

The similarity metric. IEEE transactions on Informa-

tion Theory, 50(12):3250–3264.

Lokman, A. S. and Ameedeen, M. A. (2018). Modern chat-

bot systems: A technical review. In Proceedings of

the future technologies conference, pages 1012–1023.

Springer.

Mcculloch, W. and Pitts, W. (1943). A logical calculus of

ideas immanent in nervous activity. Bulletin of Math-

ematical Biophysics, 5:127–147.

Mehta, Y., Majumder, N., Gelbukh, A., and Cambria,

E. (2020). Recent trends in deep learning based

personality detection. Artiﬁcial Intelligence Review,

53(4):2313–2339.

Myers, I. B. (1962). The myers-briggs type indicator: Man-

ual (1962).

Nvidia (2014). Nvidia K80 Tesla.

Reimers, N. and Gurevych, I. (2019). Sentence-bert:

Sentence embeddings using siamese bert-networks.

CoRR, abs/1908.10084.

Rosenblatt, F. (1958). Rosenblatt solved the problem with

his perceptron. Psychological Review, 65(6):386–408.

Shum, M., Kleiman-Weiner, M., Littman, M. L., and Tenen-

baum, J. B. (2019). Theory of minds: Understanding

behavior in groups through inverse planning. Proceed-

ings of the AAAI Conference on Artiﬁcial Intelligence,

33(01):6163–6170.

Stillwell, D., Kosinski, M., and Graepel, T. (2013). Pri-

vate traits and attributes are predictable from digital

records of human behavior. Proceedings of the Na-

tional Academy of Sciences.

Turing, A. M. (1950). I.—COMPUTING MACHINERY

AND INTELLIGENCE. Mind, LIX(236):433–460.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,

L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.

(2017). Attention is all you need. Advances in neural

information processing systems, 30.

Wang, L., Chen, W., Yang, W., Bi, F., and Yu, F. R. (2020).

A state-of-the-art review on image synthesis with gen-

erative adversarial networks. IEEE Access, 8:63514–

63537.

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi,

M., Macherey, W., Krikun, M., Cao, Y., Gao, Q.,

Macherey, K., et al. (2016). Google’s neural ma-

chine translation system: Bridging the gap between

human and machine translation. arXiv preprint

arXiv:1609.08144.

Automated Deep Learning Based Answer Generation to Psychometric Questionnaire: Mimicking Personality Traits

743