Social Implications of OCEAN Personality: An Automated BERT-Based

Approach

Chaitanya Chakka, Bathini Sai Akash and Aruna Malapati

Department of Computer Science & Information Systems, BITS Pilani Hyderabad Campus, Hyderabad, India

Keywords:

Personality Prediction, Machine Learning, OCEAN, BERT.

Abstract:

The importance of personality within society is paramount, as it profoundly inﬂuences individual and col-

lective behaviors, interpersonal interactions, and the overall functionality of societies. However, for a long

time, personality detection from online social texts has been lacking in performance. This is due to the limited

data availability and constrained supervised learning frameworks over small labeled datasets. In this work, we

present a novel approach to personality prediction utilizing BERT in conjunction with two notable datasets,

achieving proﬁcient accuracy across the OCEAN traits. The research also extracts linguistic cues that do not

require supervision. Finally, we perform extensive empirical analysis to conclude over four research questions

that deal with the social implications of personality. The approach provides pragmatic results, making use of

the designed automatic personality prediction pipeline. The code has also been made open source to facilitate

enhanced innovation and research beneﬁts (https://github.com/LearningLeopard/personality-prediction).

1 INTRODUCTION

A person’s personality substantially impacts how he

or she makes life decisions, desires, mental and phys-

ical health, and/or way of life. Due to this, various

academic ﬁelds have taken a keen interest in ana-

lyzing and predicting one’s personality. Some appli-

cations of personality prediction include recommen-

dation systems(Dhelim et al., 2022) where sugges-

tions are made for a suitable friend based on differ-

ent aspects of their digital-social footprint. To an-

alyze a person’s personality from a computational

perspective, many consensuses have been proposed

over the past few decades, including the Big Five

Model(also called the OCEAN model) and the Myers-

Briggs Type Indicator(MBTI), and these models can

be used to predict the personality of a person by

exploiting any textual information they have writ-

ten. The Myers-Briggs Type Indicator determines a

person’s personality through a self-report question-

naire (Briggs Myers, 1998) developed by Katharine

Cook Briggs and her daughter Isabel Briggs Myers in

1944. The categories are as follows; Introversion (I)

vs. Extraversion (E)(Energy orientation), Sensing (S)

vs. Intuition (N)(reliance on facts), Thinking (T) vs.

Feeling (F)(logic and objectivity), Perceiving (P) vs.

Judging (J)(structure—perceives)

On the other hand, the Big Five model has been

developed by many researchers working indepen-

dently over the past century, but the foundational con-

cept can be attributed to Goldberg at the Oregon Re-

search Institute(Goldberg, 1990) in 1990. This mea-

sure consists of ﬁve traits, which are deﬁned as fol-

lows:

• Openness: Openness to new emotions and experi-

ences makes a person more creative and unusual.

A pragmatic, information-driven, and occasion-

ally dogmatic individual has low openness.

• Conscientiousness: This category’s high scorers

are often stubborn and laser-focused. However,

someone with low conscientiousness may be more

ﬂexible, impulsive, sloppy, and untrustworthy.

• Extroversion: Extroversion: represents affability.

High scorers in this attribute are forceful and pow-

erful socially. Those that score lower are intro-

verted and less sociable.

• Agreeableness: Very amiable people are kind and

obedient. The other extreme is agnostic and un-

cooperative. They prioritize self-interest over so-

cializing.

• Neuroticism: Neurotic individuals often have

emotional instability, making stressful circum-

stances harder to manage.

Textual data on social networking sites like blogs

494

Chakka, C., Akash, B. and Malapati, A.

Social Implications of OCEAN Personality: An Automated BERT-Based Approach.

DOI: 10.5220/0012362700003636

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2024) - Volume 3, pages 494-506

ISBN: 978-989-758-680-4; ISSN: 2184-433X

and posts conceive information that can be used to

determine the characteristics of a personality. With

the rise of Big data, researchers are now presented

with abundant data that can be exploited for such

analysis. In this paper, we investigate various ma-

chine learning approaches to assess their effective-

ness in predicting personality traits based on textual

data authored by individuals. To improve the super-

vised learning process, we investigate the integration

of various datasets, which leads to notable improve-

ments in model performance. Then, we undertake

a case study to investigate how distinct facets of an

individual’s life, like their zodiac sign, age, and oc-

cupation, can provide insights into their personality

using the most promising models from the above ex-

periments. To support the veracity of our conclusions,

we also provide extensive statistical analysis for each

factor across the traits of the Big Five model and all

inferences in the research pertaining to social impli-

cations are grounded in empirical evidence and de-

void of any biases or stereotypes. The pipeline is also

open-sourced for further innovation and facilitation of

Research. Thus, our contributions include:

• Extract The Bidirectional Encoder Representa-

tions from Transformers (BERT, for brevity) and

psycholinguistic embeddings from textual data

by combining two notable datasets for personal-

ity prediction and exploring different machine-

learning models for the task at hand.

• Tabularize the results for different approaches and

note some exceptionally performing models.

• Undertake a case study where we take the best-

performing model from the above experimenta-

tion and perform extensive empirical analysis on

how social factors like age, gender, profession,

and zodiac sign correlate with the OCEAN per-

sonality of a person.

The rest of the paper is organized as follows. In sec-

tion 2, we discuss related work in personality pre-

diction. In section 3, we introduce and describe the

properties of the different datasets involved in this pa-

per. We also combine the datasets involved into the

Big Five model and provide the necessary justiﬁca-

tion. In section 4, we conduct the methodology where

we describe pre-processing and feature extraction us-

ing BERT and other linguistic lexicons. The differ-

ent approaches that are explored in this paper are also

briefed here. The experimental setup and the research

framework are expounded in section 5 and the results

and explained and discussed in section 6. The case

study and the results of the empirical analysis is re-

ported under section 7 followed by the conclusion in

section 8.

2 RELATED WORK

Due to its wide range of applications combined with

the emerging era of big data, personality detection

has led researchers to focus on using natural lan-

guage processing for the task. (Liu and Zhu, 2016)

took advantage of the temporal nature of the Sina Mi-

croblog dataset and created a Linguistic Represen-

tation Feature Vector(LRFV), which makes embed-

dings for each user using a modiﬁed version of the

Linguistic Inquiry Word Count(LIWC). A temporal

vector has been created with each value correspond-

ing to the LIWC frequency values, which are then re-

duced using the Fast Fourier transform. This was fur-

ther reduced using the Stacked Auto Encoders model

and passed through linear regression models for each

trait, yielding a best Pearson correlation coefﬁcient of

0.258. The innovation in this model lies in creating

LRFV, which is created by transforming the blog data,

taking only the semantic meaning of the data, yielding

better results. (Qin et al., 2022) used textual features

and their digital footprint on social networking sites to

extract comprehensive embeddings for OCEAN value

prediction. Datasets have been obtained by conduct-

ing a survey using the NEO personality questionnaire

to extract the non-textual features. The model has

been tested on the uniﬁed Facebook dataset presented

at the International AAAI Conference on Weblogs

and Social Media(ICWSM) in 2013. Textual embed-

dings are generated using LDA for ten topics where

text features are obtained by taking the topic proba-

bility distribution of each text. These features were

passed through a multilayer perceptron with ﬁve neu-

ron outputs for each trait. When compared to several

models presented at the same conference as above, the

results demonstrate a signiﬁcant improvement across

different metrics.

Although the two measures of personality seem to be

taking different approaches towards human behavior,

(Furnham, 1996) has researched the discernible over-

lap between them in 1996, where they conclude that

except for neuroticism, all the other traits of OCEAN

personality measure distinctly correlate with one of

the classiﬁcations of the MBTI personality. This

could help combine datasets that use different mea-

sures, thereby helping us with better prediction per-

formances. This approach was brieﬂy employed by

(Katare et al., 2022), where feature extraction was

performed using GloVe and Bag of Words models,

and they showed a notable improvement in predict-

ing the model’s personality.

Through self-attention mechanisms that identify con-

textual ties in data, Transformers, a ground-breaking

machine learning architecture, has revolutionized nat-

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

495

ural language processing and achieved cutting-edge

performance across various applications. The most

recent innovation in the ﬁeld of NLP using this mech-

anism is the BERT language model developed by

Google in 2019(Devlin et al., 2019). (Kazameini

et al., 2020) has used multiple variations of the BERT

model to generate document embeddings, which were

then passed into a bagged SVM model. This model

was compared against Word2Vec models and multi-

layer perceptron and outperformed them with an ac-

curacy of 60 percent. In addition to their model sur-

passing the previous state-of-the-art, they were also

able to improve training times, which were decreased

from 50 hours to roughly 7 minutes. (Majumder

et al., 2017) devised a novel mechanism using Con-

volutional Neural network models(CNNs) to obtain

sentence embeddings from word vectors, and these

were aggregated into document vectors along with

Mairesse features. Datasets obtained were ﬁltered us-

ing NRC Lexicon to have at least one emotionally

charged word, and word embeddings were generated

using the pretrained word2vec model. These were

passed to different classiﬁcation models, and the re-

sults claimed to have outperformed state-of-the-art re-

sults across all traits of the Big Five model.

3 DATASET CREATION AND

PROCESSING

Personality prediction performance is remarkably re-

lated to both data quality and quantity. The cur-

rent state-of-the-art research was developed on scarce

datasets due to labeled data constraints. On inspec-

tion, the current research identiﬁed three standard la-

beled datasets for predicting personality.

We utilized the Essays dataset (Pennebaker and King,

1999), which includes 2468 student essays anno-

tated with OCEAN Big Five personality and Kag-

gle MTBI dataset(Jolly, 2017) with 8675 data points

gathered from the PersonalityCafe forum. Finally,

Facebook’s myPersonality dataset included a sample

of personality ratings using Facebook proﬁle infor-

mation(Stillwell and Kosinski, 2015). The informa-

tion was gathered using a Facebook application that

included, among other psychological examinations, a

test of the Big Five personality characteristics. Hence,

we captured two datasets with OCEAN traits and one

with MTBI. The descriptive statistics are given in ta-

ble 1. We develop a novel approach to merge three

datasets. However, since there are two different mod-

els employed in each of the datasets, we leveraged the

https://www.personalitycafe.com

research in (Furnham, 1996) on the high correlation

between the two personality measures. The research

concluded that

1. Extroversion of Big Five correlates with Introver-

sion or Extraversion

2. Openness of Big Five correlates with Sensing or

Intuition

3. Agreeableness of Big Five correlates with Think-

ing or Feeling

4. Conscientiousness of Big Five correlates with

Perceiving or Judging

In accordance with this research, we created

a conjunction of the MTBI Kaggle dataset with

OCEAN essays and myPersonality datasets. Integra-

tion was performed in a column-by-column fashion.

It is to be noted that we leave neuroticism empty for

all the data points corresponding to the MTBI dataset.

In total, we get 21,059 data points in the ﬁnal dataset.

Further, to check for data quality and perfor-

mance, we follow our research approach 5 based

on BERT without extensive ﬁne-tuning and run pre-

liminary analysis and results. The produced results

are in the table 2) and are rather unconvincing. A

close inspection of the combined dataset revealed

that the MyPersonality dataset had tiny tokens per

sentence (approx. 20) compared to the other two

datasets(approx. 500)1. The model’s poor perfor-

mance is attributed to the present analysis’s depen-

dence on an ample contextual foundation within the

sentences, thereby enhancing the prediction of their

respective personality traits. Substantial enhancement

in performance was observed upon the removal of

this speciﬁc dataset. This observation underscores the

signiﬁcance of recognizing that pre-trained encoder-

decoder models, such as BERT, rely on the volume

of information available to proﬁciently encapsulate

semantics and contextual nuances within sentences,

manifesting as data-intensive models. Hence, we ﬁ-

nally propose using the Essays OCEAN dataset in

conjunction with the Kaggle MTBI dataset alone, re-

sulting in 11143 data points.

4 METHODOLOGY

4.1 Pre Processing Text Data

Aiming at extracting personality traits from textual

data containing a high degree of semantic informa-

tion, it becomes essential to carefully conserve and

extract the emotional context inherent in sentences

for optimal results. According to empirical evidence

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

496

Table 1: Distributions of Datasets over each Personality Traits.

Dataset Description Sen Length EXT(0,1) OPN(0,1) AGR(0,1) CON(0,1) NEU(0,1)

Essays Dataset Essay is a large dataset

based on the stream of

consciousness that was

collected by Pennebaker

and Laura King

652 1191, 1276 1196, 1271 1157, 1310 1214, 1253 1234, 1233

myPersonality myPersonality was a pop-

ular Facebook application

created by David Stillwell

in 2007

14 5705, 4207 2546, 7366 4647, 5265 4554, 5358 6196, 3716

Personality Cafe This dataset is taken from

Kaggle, extracted from a

website called Personality

Cafe.

1267 6675, 1999 1197, 7747 3981,4693 5241, 3433 0, 0

Table 2: Accuracy Scores of Classical Models including all

Datasets.

Model name EXT OPN AGR CON

SVM 0.662 0.789 0.644 0.607

Logistic Regression 0.667 0.784 0.649 0.611

Random Forest 0.656 0.792 0.622 0.589

cited by (Majumder et al., 2017), excluding emotion-

ally neutral words from the dataset signiﬁcantly im-

proves our model’s accuracy.

To facilitate this process, we utilized the

NRC Word-Emotion Association Lexicon

dataset(Mohammad and Turney, 2013), which

describes the emotional associations of individual

words with fundamental human emotions such as

anger, fear, anticipation, trust, surprise, sadness,

joy, and disgust, as well as general sentiments with

both positive and negative connotations. Using this

dataset, we systematically eliminated all lexemes

devoid of emotional associations, i.e., words that

do not correspond to any of the previously listed

emotions or sentiments. In addition, we standardized

the text by converting all text to lowercase and

removing extraneous characters such as $ and #,etc.

4.1.1 Pre-Trained Embeddings and BERT

The BERT model is a sophisticated deep learning-

based language model designed to generate sentence

embeddings. At the core of BERT’s architectural

framework lies a transformer model characterized by

a series of stacked encoders. It deploys embeddings

that dynamically adapt based on the contextual usage

of a given word across different sentences.

Vannila BERT has two variations: BERT

BASE

consists of 12 encoders and 12 bi-directional self-

attention heads, while BERT

LARGE

consists of 24 en-

coders and 16 self-attention heads. In this research,

we made use of BERT

BASE

to generate embeddings

from text data.

4.1.2 Psycholingustic Features

Psycholinguistic features are a set of linguistic at-

tributes and language characteristics closely related to

cognitive processes and psychological aspects of hu-

man language comprehension and production. These

characteristics illuminate the complex relationship

between language and the human psyche by reveal-

ing how individuals perceive, process, and use lan-

guage. There are methods to extract these features

from spoken and written language. These additional

features extracted for paragraphs/sentences correlate

with the person’s personality. Hence, we integrate

these features along with the extracted BERT embed-

dings. The features utilized in this research are shown

below:

1. LIWC Features: Linguistic Inquiry and Word

Count (LIWC) is a language statistical analysis

software, researchers have widely used it to ex-

tract attributes of text data(Golbeck et al., 2011).

The dictionary consists of 64 categories, and the

words fall into each category. The features are

calculated by the percentage of words present in

each category.

2. SenticNet: This lexicon associates different

words with conceptual primitives, namely pleas-

antness, sensitivity, attention, aptitude, and polar-

ity(Cambria et al., 2018). These features strongly

correlate with the Big Five personality model as

shown in Table 3.

3. NRC Emotion Lexicon: A lexicon of over

14,000 English words named under the eight emo-

tions: anger, anticipation, disgust, fear, joy, sad-

ness, surprise, trust, and sentiment. The ﬁnal

ouput is a 10-dimension vector, the mean of all

values of emotionally charged words in the docu-

ment.

4. VAD Lexicon: This lexicon relates 20,000 En-

glish words with three more descriptors: valence,

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

497

arousal, and dominance. Like NRC, it results in

a three-dimensional vector, which takes a VAD

value mean for all the words in the document.

5. Readability: These features are just statistical

data values that are linear regressors on surface

characteristics like number of words, syllabus,

and sentences. This results in a 31-dimension

vector that could enhance the ﬁnal predictions of

OCEAN personality.

Table 3: Correlation between SenticNet concepts and Big

Five personality traits on Essay’s Dataset

Concepts O C E A N

Pleasantness 0.041 0.066 0.032 0.025 -0.075

Attention 0.113 -0.026 0.013 -0.007 -0.017

Sensitivity -0.011 -0.052 -0.064 -0.034 -0.022

Aptitude -0.045 0.112 0.052 0.081 0.020

Polarity 0.000 0.081 0.037 0.056 0.058

4.2 SMOTE for Class Imbalance

The initial statistics reveal that the dataset is mod-

erately imbalanced, with the majority class covering

70-80% in all traits. To overcome this, we have per-

formed re-sampling with SMOTE analysis. SMOTE

was performed only on the training set after the train-

test split, and the test set was left entirely intact.

Consequently, this would make the machine learning

models more versatile without exaggerating results.

4.3 Classical Machine Learning Models

Classical machine learning offers straightforward yet

potent models designed for classiﬁcation. While

models such as Naive Bayes may prove inadequate

when tasked with capturing intricate information,

they nonetheless furnish a robust foundational bench-

mark against which the efﬁcacy of alternative models

can be duly assessed concerning their performance.

The models considered are as follows:

1. Nave Bayes

2. Support Vector Machine(SVM)

3. Random Forest

4. Logistic Regression

4.4 Deep Learning Models

Deep Neural Networks (DNNs) have become indis-

pensable in Natural Language Processing (NLP) due

Source: Bottom-Up and Top-Down: Predicting Per-

sonality with Psycholinguistic and Language Model Fea-

tures.

to their remarkable ability to process and understand

human language. The success of DNNs in NLP can

be attributed to their ability to learn meaningful text

representations through deep learning automatically.

Multi-layer perceptron is one of the basic models in

Deep neural networks, which serves as a building

block for more complex systems. Hyper-parameters

include the number of layers, the number of neurons

per layer, and the activation for each layer. The last

layer contains one single neuron for binary classiﬁca-

tion with a sigmoid activation function. We make use

of multiple variants of MLP with supervised learning.

4.5 LSTM with BERT

The BERT model extracts word or sentence embed-

dings as input for a complex analysis. However, it

can be used as a classiﬁer by attaching a classiﬁca-

tion head(any other deep learning network) and can

be trained with the backpropagation algorithm. This

classiﬁcation head gives us a vast space of models to

try and see which will perform better in the analysis.

In this analysis, we have used a Bi-directional LSTM

as a classiﬁcation head with varying layers and nodes

for each layer and observed the performance.

5 RESEARCH FRAMEWORK

This section describes the training framework for

each method described in the prior sections. Across

all the methods, we treated each Big Five personality

trait as a different binary classiﬁcation problem. This

strategy ensures that each trait will have the freedom

to choose any other model independent of the other.

One important note is that since neuroticism is not

available in the MBTI dataset, this trait is restricted to

using only the smaller Essays dataset, which points to

a potential drawback of this analysis.

5.1 Feature Extraction Using BERT

The research uses pre-trained word embeddings from

the BERT

BASE

model. The features are extracted by

ﬁrst taking a mean over each word embedding( cor-

responding to a vector with 768 features) to get sen-

tence embeddings. Next, a mean is computed over the

12 BERT transformer layers. Finally, each sentence is

morphed into a 768 feature vector.

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

498

5.2 Integration of Psycholinguistic

Features

We take a conjunction of BERT embeddings with

the aforementioned Psycholinguistic Features. Com-

bined with language model features, these features

could help improve feature quality and capture more

semantic meaning. The ﬁnal dataset consisted of 850

dimensions, where 768 were pre-trained embeddings

from BERT, while the remaining 82 features com-

prised ﬁve psycholinguistic groups.

5.3 Training of Different Models

Classical Models. Grid Search with cross-validation

strategy is employed to tune the hyperparameters

of all classical models and a baseline is setup using

the Gaussian Naive Bayes model. Since using all

the available parameters can drastically increase the

search space, the search space is restricted based on

the parameters’ relevance to the task.

Deep Learning Model. The DNN model is tuned

using Hyperband hyperparameter optimization

algorithm (O’Malley et al., 2019) for following

hyper-parameters: The number of hidden layers, neu-

rons per layer, initial learning rate, and the activation

function of each layer. The base model contains

the input layer of 850 dimensions, and the output

layer is a single neuron with sigmoid activation and

compiled with Adam learning rate optimizer with

Binary cross-entropy loss.

BERT as Classiﬁer. The classiﬁcation head can vary

the number of Bi-LSTM layers and how many neu-

rons each layer has and the output will be passed

through a Dense layer to lead to a single neuron layer

with sigmoid activation. A maximum of 50 epochs

was given with an early stopping strategy with a pa-

tience of 3.

6 RESULTS AND DISCUSSION

6.1 Classical Models

The results of this experiment are summarized in table

4. Out of the three classical models considered, SVM

has performed the best in three of the traits consid-

ered, while the Logistic Regression model performs

better agreeableness. Consciousness, although it had

the best prediction using the Random forest model,

the SVM model predicted with almost the same ac-

curacy. Although all of the models have very close

accuracy scores, it is safe to say that SVM has per-

formed the overall best across all the traits. BERT has

a small context length of 512 tokens, while our dataset

has an average of 800 tokens per sentence. To exploit

this, the experiment was repeated using embeddings

generated by the Longformer model (Beltagy et al.,

2020), which increased the model context length to

4096 tokens. The model elegantly captures longer

context using three strategies: sliding window atten-

tion, dilated sliding window attention, and global +

sliding window. Contrary to our expectations, the ac-

curacy did not improve from the original BERT base

uncased model’s accuracy across all traits.

6.2 Deep Learning Models

The tuned parameters and scores are in table 4, which

used the SMOTE re-sampled dataset. It was observed

that the SMOTE applied dataset required fewer layers

with fewer neurons, while the original dataset needed

a more complicated architecture. This can be at-

tributed to the SMOTE applied dataset having more

clustered and separable data points, making it eas-

ier for the model to learn the separation boundaries.

However, since the data points have increased com-

pared to the original dataset, training time was com-

paratively higher compared to the same. Multi-layer

perceptron performs notably better than the classical

models by an average of 2.2% except for the Con-

sciousness trait. In this case, classical models perform

signiﬁcantly better than deep learning models.

6.3 BERT with LSTM

The results of this experiment are shown in the table

4. An important observation is that no matter how

much more complex we make the classiﬁcation head,

the accuracy did not improve from the values reported

in the table. During the analysis, increasing the num-

ber of layers or the number of neurons, adding ad-

ditional normalization layers in between each LSTM

layer, or adding psycholinguistic layers did not make

any difference in the accuracy of the model. However,

it affected the convergence rate at which the model

achieved the best accuracy. Although Extraversion

and Openness traits have accuracies close to the best

achieved so far, the rest of the models performed sig-

niﬁcantly worse. In this experiment, we skipped neu-

roticism because very little data is available for this

trait.

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

499

Table 4: Accuracy Scores of All Models.

Model name Trait Hyper Parameters Training Score Testing Score

Gaussian

Naive

Bayes

Extroversion var smoothing: 1e-9 - 0.698

Openness var smoothing: 1e-9 - 0.759

Agreeableness var smoothing: 1e-9 - 0.646

Consciousness var smoothing: 1e-9 - 0.776

Neuroticism var smoothing: 1e-9 - 0.581

SVM

Extroversion Kernel: RBF, C: 55, Gamma:0.8 0.730 0.725

Openness Kernel: RBF, C: 10, Gamma: 0.8 0.799 0.798

Agreeableness Kernel: RBF, C: 100, Gamma: 0.008 0.710 0.713

Consciousness Kernel: RBF, C: 10, Gamma: 0.8 0.803 0.792

Neuroticism Kernel: RBF, C: 100, Gamma: 0.001 0.599 0.579

Logistic

Regression

Extroversion Solver: sag , Penalty: L2 , C: 20 0.699 0.711

Openness Solver: Saga , Penalty: L1 , C: 0.001 0.786 0.782

Agreeableness Solver: Newton-Cholesky , Penalty: L2 , C: 1 0.707 0.720

Consciousness Solver: Sag , Penalty: L2 , C: 0.01 0.765 0.765

Neuroticism Solver: Newton-cg , Penalty: L2 , C: 1 0.589 0.595

Random

Forest

Extroversion Criterion: Log-Loss , Max Features: sqrt 0.726 0.720

Openness Criterion: Gini , Max Features: sqrt 0.800 0.793

Agreeableness Criterion: Entropy , Max Features: none 0.685 0.682

Consciousness Criterion: Gini , Max Features: log2 0.798 0.795

Neuroticism Criterion: entropy , Max Features: None 0.589 0.559

Multi Layer

Perceptron

Extroversion Layers: (500, relu), Learning Rate: 0.008 - 0.751

Openness Layers: (300, elu);(500, elu);(200, relu), lr: 0.08 - 0.811

Agreeableness Layers: (200, relu);(300, selu), lr: 0.004 - 0.730

Consciousness Layers: (100, selu);(100, relu), lr: 0.002 - 0.653

Neuroticism Layers: (100, elu), lr: 0.01 - 0.617

BERT

LSTM

Extraversion No of layers: 2, Nodes: 175 0.703 0.718

Openness No of layers: 4, Nodes:200 0.787 0.775

Agreeableness No of layers: 1, Nodes:50 0.540 0.537

Consciousness No of layers: 1, Nodes: 350 0.575 0.608

7 CASE STUDY

The importance of personality in society cannot be

overstated, as it plays a fundamental role in shaping

individual and collective behavior, relationships, and

the overall functioning of communities. Some criti-

cal implications of personality in society are leader-

ship, Career and culture ﬁt, mental health and well-

being, social and cultural norms, gender roles and

equality, etc. Substantial research has been done on

social behavior and personality(Webster and Ward,

2011; Singer and Singer, 1983; Schmitt et al., 2017).

The novelty of our research stems from the utiliza-

tion of mathematical models rooted in machine learn-

ing to articulate the complex social concerns interwo-

ven with personality. These methodologies introduce

original perspectives for discourse and pave the way

for comprehensive automation, as elucidated in sub-

sequent sections. In subsequent research on the topic,

the work answers the following questions:

RQ1: Is there a correlation between zodiac signs and

one’s personality traits?

RQ2: How is gender related to personality based on

empirical evidence?

RQ3: Is personality inﬂuenced by one’s occupational

role within society?

RQ4: How does personality evolve with advancing

age?

To empirically derive conclusions pertaining to

these inquiries, it was imperative to procure and ex-

tract a standardized dataset containing textual state-

ments associated with age, gender, zodiac signs, and

occupational roles. The dataset employed for this pur-

pose is outlined as follows.

7.1 Blog Authorship Corpus

The dataset was named the Blog Authorship corpus

and made available to the public on GitHub(Argamon

et al., 2009). Each data point is a ﬁle in this corpus

in XML format. The naming convention is shown be-

low:

<uniqueId>.<gender>.<age>.<designation>.<zodiacSign>.xml

Each XML ﬁle consists of each blog under a post

tag chronologically arranged under the blog tag. A

sample ﬁle is given below:

<Blog>

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

500

<post>

Well, everyone got up and going this

morning. It’s still raining,...

</post>

<post>

My four-year old never stops talking

...

</post>

...

</Blog>

We parsed the data from the XML ﬁles and created

and collected a corpus of 483090 blogs. Each blog

corresponds to a person, and each person typically

writes more than one blog. Further, every person’s

gender, age, zodiac, and occupation was also ex-

tracted.

It is imperative to acknowledge that the current

study primarily focuses on comprehending the associ-

ations between prevalent beliefs and speculations that

could potentially have implications for a broader de-

mographic. It is essential to clarify that this research

refrains from making deﬁnitive assertions concerning

generic beliefs, relying solely on empirical evidence

to draw its conclusions.

To perform extensive analysis in order to answer

the different research questions posed, we run the

Blog Corpus through our pipeline 5 and generate all

OCEAN labels for each blog ﬁrst. Since all our

questions are aimed at answering people’s respective

traits, it’s essential to take a trait to mean overall blogs

written by each person to get a personality measure

of each respective single person. After this step, we

generate OCEAN personality traits for 18393 peo-

ple. To employ statistical studies for empirical com-

parison for the research questions Mann–Whitney U

test(McKnight and Najab, 2010) Test was employed.

Mann–Whitney U test examines disparities between

two groups on a single, ordinal measure with no de-

ﬁned distribution by generation of p values for each

entity pair. If the ’p’ value of a pair under consid-

eration is ≤0.05 we conclude that the entity pair has

a signiﬁcant difference. The independent samples t-

test, on the other hand, assumes that the sole variable

under consideration conforms to the interval or ratio

scale of measurement, as opposed to the ordinal scale,

and follows a normal distribution. This test essen-

tially compares two unique groups. Therefore, the t-

test and the Mann-Whitney U are fundamentally anal-

ogous tests for detecting whether two sampled groups

belong to the same population. The Mann-Whitney U

is often preferred when the parametric requirements

of the t-test are not fulﬁlled by the data. Since we do

not have an equal number of data points, for example,

out of 18393 people, 1514 people have Gemini; 1470

Table 5: Average OCEAN values based on Zodiac Sign.

Zodiac Sign EXT AGR CON OPN NEU

Cancer 0.862 0.595 0.477 0.666 0.242

Libra 0.873 0.604 0.473 0.675 0.236

Capricorn 0.860 0.604 0.479 0.682 0.240

Sagittarius 0.863 0.603 0.477 0.671 0.239

Virgo 0.863 0.613 0.468 0.678 0.241

Leo 0.867 0.604 0.483 0.676 0.234

Pisces 0.864 0.593 0.473 0.681 0.242

Scorpio 0.862 0.603 0.470 0.672 0.240

Gemini 0.864 0.600 0.468 0.678 0.236

Taurus 0.857 0.621 0.482 0.677 0.236

Aries 0.857 0.607 0.482 0.676 0.240

Aquarius 0.863 0.607 0.465 0.680 0.241

have Sagittarius; 1709 have Virgo, and so on as zo-

diacs, respectively. The t-test assumptions fail here,

hence, we utilize the Mann–Whitney U test.

7.2 RQ1: Zodiac and Personality

A horoscope is an astrological map that shows the

positions of the sun, moon, and other celestial bod-

ies. These speciﬁc alignments of these bodies are

recorded during a person’s birth, which is believed to

provide insights into that person’s life. In Western

astrology, these speciﬁc alignments are called Zodiac

signs. We have 12 zodiac signs. Astrology has gained

quite a traction even though little scientiﬁc evidence

proves their claims. Horoscopes are frequently fea-

tured socially in magazines and newspapers, and most

people believe in these predictions. In order to reach

a conclusion, the inﬂuence of the 12 zodiac signs on

personality differences was thoroughly compared in

this part.

RQ1: Is there a correlation between zodiac signs

and one’s personality traits?

Direct OCEAN Value Analysis Based on Descrip-

tive Statistics: After extracting the 18,393 OCEAN

traits for each individual, we proceed to compute the

average OCEAN values corresponding to each of the

12 zodiac signs. The descriptive statistics are shown

in table 5. On ﬁrst inspection, we note that neuroti-

cism(NEU) mean values are notably lower than the

other four OCEAN traits. This can be attributed to

the lower number of NEU-positive words(have NEU

as 1) and sentences in the Blog Corpus. On inspec-

tion of multiple text statements, it was found that the

overall tone of blogs was more neutral to composed

than anxious or tense. The statistics shown in Table 5

lead to the following observation: most mean values

in different trait categories seem highly similar even

though absolute values vary notably among traits.

Statistical Hypothesis Testing for Comparison of

Zodiac Personality Values. As noted, we used the

Mann–Whitney U test Test as our method for assess-

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

501

Table 6: p values for zodiac signs.

Zodiac pair Trait P-value

(Capricorn, Virgo) NEU 0.047

(Gemini, Sagittarius ) AGR 0.015

(Sagittarius, Cancer) AGR 0.036

(Pisces, Gemini) AGR 0.012

(Pisces, Cancer) AGR 0.028

ing our hypotheses. The ﬁndings of the hypothe-

sis test are shown in table 6, as p-values. However,

we display only the Zodiac entity pairs with p-value

≤0.05. All other zodiac pairs not displayed have p-

values greater than 0.05. The Mann–Whitney U Test

provides a p-value to each possible pair of zodiac

signs for every personality trait when testing hypothe-

ses. The entity pair under examination does not sig-

niﬁcantly differ from each other, as per the null hy-

pothesis (H0). We rejected the null hypothesis using

statistical hypothesis testing if the value of p≤0.05,

and we now declare that there is a signiﬁcant differ-

ence in the personality measure for a speciﬁc trait ( in

OCEAN traits) of the zodiac pair.

On inspection of Table6, we note that only ﬁve

pairs have signiﬁcant differences in personalities with

p values ≤0.05. Added to this, a majority of these

pairs are for Agreeableness. This means for the traits

Openness, Conscientiousness, and Extraversion there

is no notable signiﬁcant difference based on zodiac

signs. In the case of Neuroticism, only the Capricorn-

Virgo pair has a signiﬁcant difference.

Although there is a signiﬁcant difference between

some zodiac pairs, overall, for all pairs, there seems

to be no substantial difference in comparison of per-

sonality measures in all traits. The mean values

from table5 suggest the same. Hence, we infer that

there exists no notable distinction in the person-

ality traits of Openness, Conscientiousness, and

Extroversion predicated upon zodiac signs, albeit

there is a marginal degree of variability. In con-

trast, with regard to Agreeableness, three spe-

ciﬁc zodiac pairings exhibit signiﬁcant differences,

while for Neuroticism, one pairing demonstrates a

signiﬁcant difference as shown.

7.3 RQ2: Gender Based Personality

The personalities of men and women seem to vary in a

number of ways. According to social role theories of

development, sociopolitical power disparities, gender

socialization, and perceived gender roles are the main

causes of gender inequalities(Schmitt et al., 2017).

Further, Gender-based personality differences can be

evident in various aspects of life, such as communi-

cation styles, career choices, relationship dynamics,

hobbies, decision-making, emotional expression, and

Table 7: Average OCEAN values based on Gender.

Gender EXT AGR CON OPN NEU

Male 0.854 0.583 0.501 0.694 0.2475

Female 0.872 0.626 0.449 0.657 0.230

educational preferences. These differences may inﬂu-

ence how individuals interact with others and navigate

different life situations.(Saad and Sackett, 2002)

In this section, we investigate with empirical

methods the difference in gender-based personality,

which could help numerous ﬁelds for increased diver-

sity and well-being. To reach a conclusion, the inﬂu-

ence of typically Male and Female gender on person-

ality traits was thoroughly compared in this part.

RQ2: How is gender related to personality based on

empirical evidence?

OCEAN Value Analysis Based on Descriptive

Statistics: As aforementioned, after extracting

OCEAN traits for each individual, we proceed to

compute the average OCEAN values corresponding

to male and female genders. The descriptive statistics

are shown in table 7. On the ﬁrst inspection, we came

up with the following observations:

• There is a notable difference in mean EXT, AGR,

CON and OPN traits for male and female cate-

gories

• The mean difference for NEU appears to be triv-

ial.

• OPN and CON traits appear to exhibit the highest

degree of disparity between males and females.

Statistical Hypothesis Testing for Comparison of

Male-Female Personality. As noted, we used the

Mann–Whitney U test Test as our method for assess-

ing our hypotheses. The entity pair under examina-

tion does not signiﬁcantly differ from each other, as

per the null hypothesis (H0). We rejected the null

hypothesis using statistical hypothesis testing if the

value of p≤0.05, and we now declare that there is a

signiﬁcant difference in the personality measure for a

speciﬁc trait ( in OCEAN traits) based on Gender.

After running Mann–Whitney U test for all traits

for Male-Female pairs, only the p-value for Open-

ness(OPN) is shown to be ≤0.05. The p-value for

male-female pair for OPN is 8.05e-27. All other traits

show p-values greater than 0.05 showing no signif-

icant difference in male-female personality for each

trait respectively.

Thus we deduce that there is a noteworthy dif-

ference in Openness, Conscientiousness and Ex-

troversion based on mean trait values. There

seems to be minimal variation for Neuroticism.

However, there is clearly a signiﬁcant difference

between male and female Openness . It’s essential

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

502

to point out that these generalizations about gender-

based personality characteristics do not apply to ev-

eryone. Gender is merely one of many aspects that

inﬂuence each person’s distinctive personality char-

acteristics and actions. Human personalities range

widely. Also, when societal conventions and expec-

tations around gender change, there is less of a focus

on stereotyped gender roles and behaviors.

7.4 RQ3: Personality and Its Inﬂuence

on Occupational Industry

Personality can signiﬁcantly inﬂuence career choices

and performance within speciﬁc industries. Different

industries often require distinct personality traits for

success. For example, extroversion and assertiveness

are valued in sales and marketing, while analytical

thinking and attention to detail are important in tech-

nology and IT. Adaptability and teamwork are often

crucial in healthcare, and creativity and innovation are

prized in the arts and design ﬁelds.(Gibby and Zickar,

2008)

RQ3: Is personality related to one’s occupational

role within society?

In the authorship corpus, 38 possible professional

industries were annotated for each person. A fair

number of people(6483) fell into unknown category,

hence these entries were removed for empirical com-

parison. Further since its too involved comparing 38

entity pairs, we manually captured 11 prominent in-

dustries and tagged each person based on this. In

this section, we investigate with empirical methods

the variation of personality in these 11 industries us-

ing the ﬁve ocean traits.

OCEAN Value Analysis Based on Descriptive

Statistics: The computed industry personality aver-

ages are shown as descriptive statistics in table 8. On

the ﬁrst inspection, we infer the following observa-

tions:

• Education has an evident increase in EXT and

AGR but a dip in Conscientiousness.

• Highest CON is shown in the Non-Proﬁt industry.

• Software and Entertainment industry have the

lowest CON. Furthermore, Software also has the

least AGR with the highest OPN

• NEU remains relatively stable for all three age

groups.

Statistical Hypothesis Testing for Comparison of

Personalities Based on Professional Industry. As

aforementioned, we used the Mann–Whitney U test

Test as our method for assessing our hypotheses. If

the value of p≤0.05, we now declare that there is

Table 8: Average OCEAN values based on Professional In-

dustry.

Professional Industry EXT AGR CON OPN NEU

Non-Proﬁt 0.832 0.565 0.523 0.674 0.236

Education 0.875 0.646 0.465 0.680 0.236

Media and News 0.848 0.561 0.513 0.670 0.244

Marketing & Management 0.854 0.540 0.497 0.687 0.246

Software 0.863 0.546 0.486 0.711 0.260

Law and Government 0.845 0.557 0.504 0.677 0.233

Entertainment 0.847 0.581 0.479 0.648 0.230

Engineering & manufacturing 0.851 0.581 0.481 0.683 0.252

Medical 0.852 0.548 0.492 0.698 0.244

Hospitality 0.846 0.564 0.478 0.671 0.238

Finance 0.865 0.530 0.478 0.668 0.239

Figure 1: p values for Openness Trait for different indus-

tries.

a signiﬁcant difference in the personality measure

based on industry for a speciﬁc trait ( in OCEAN

traits)

After running Mann–Whitney U test for all traits

for all industries, the results with p-values ≤0.05 are

shown in table9. Further, we show a graph for the

Openness trait exclusively in Fig(1). On inspection,

it is evident that Engineering and Manufacturing has

signiﬁcant differences with multiple other industries

for AGR and NEU. Next, Law and Gov have signiﬁ-

cant differences w.r.t shown industries for EXT, NEU,

and CON. The software industry shows a signiﬁcant

difference from Med and Law for CON.

Further, in the case of OPN, there are multiple en-

tity pairs with signiﬁcant differences. Software has

signiﬁcant differences with almost every other indus-

try. Similarly, Entertainment has a signiﬁcant differ-

ence with multiple other industries. Other notable en-

tity pairs with p values ≤0.05 are shown in Fig(1).

Thus we infer that there is a signiﬁcant difference

in multiple industries in OCEAN traits. The dif-

ference is more pronounced in Engineering and

Manufacturing, Law and Gov, and Entertainment

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

503

Table 9: p values for Industries.

Industry Trait P-value Industry Trait P-value

(Software, Med and New) CON 7.96e-11 (Med and New, Law and Gov) EXT 0.063

(Mark and Man,Law and Gov) CON 7.96e-11 (Edu, Law and Gov) EXT 0.037

(Eng and Man, Finance) NEU 0.044 (Edu, Eng and Man) NEU 0.035

(Eng and Man, Non-Pro) NEU 0.039 (Med and New, Eng and Man) AGR 0.030

(Eng and Man, Medical) AGR 0.018 (Edu, Eng and Man) AGR 0.002

(Eng and Man, Enter) AGR 0.006 (Edu, Law and Gov) AGR 0.008

(Eng and Man, Fin) AGR 0.035

and software industries respectively.

7.5 RQ4: Personality Variation with

Age

Theories on how people evolve with age, accord-

ing to research, is a period of signiﬁcant changes

in objectives, resources, and coping(Cobb-Clark and

Schurer, 2012) Further, many academics who are in-

terested in personality characteristics and personality

evaluation are skeptical that personality changes in

signiﬁcant and organized ways as people age. The

dataset also could be divided based on age. The

age categories were pre-deﬁned by the authors of

the corpus. Additionally, Male and female bloggers

are equally represented in each age group. The age

groups are Teens(13-17), Young-Adults(23-27) and

Middle Adults(33-47).

In this section, we investigate with empirical

methods the evolution of personality with increasing

age. To reach a conclusion, the changes in all ﬁve

traits for three age groups were extensively compared

in this part.

RQ4: How does personality evolve with advancing

age?

OCEAN Value Analysis Based on Descriptive

Statistics: As aforementioned, after extracting

OCEAN traits for each individual, we proceed to

compute the average OCEAN values corresponding

to male and female genders. The descriptive statistics

are shown in table 7. On the ﬁrst inspection, we come

up with the following observations:

• There is an evident difference in mean EXT, AGR

and CON between Teens and Young Adults

• Noticeable increment is shown from Young

Adults to Middle-aged Adults

• NEU remains relatively stable for all three age

groups.

Statistical Hypothesis Testing for Comparison of

Age Group Based Personality. As aforementioned,

we used the Mann–Whitney U test Test as our method

for assessing our hypotheses. If the value of p≤0.05,

Table 10: Average OCEAN values based on Age.

Age Brackets EXT AGR CON OPN NEU

Teens

(13-17)

0.895 0.699 0.436 0.687 0.239

Young-Adults (23-27) 0.853 0.549 0.490 0.662 0.238

Middle-Adults (33-47) 0.850 0.581 0.489 0.677 0.239

Table 11: p values for Age Groups.

age group pair Trait P-value

(Teens, Y-Adults) OPN 7.96e-11

(Teens, M-Adults ) OPN 0.028

(Y-Adults, M-Adults ) OPN 1.47e-12

we now declare that there is a signiﬁcant difference

in the personality measure for a speciﬁc trait ( in

OCEAN traits) based on Age group category.

After running Mann–Whitney U test for all traits

for all age group pairs, the results with p-values ≤0.05

are shown in table 11. The p-value for every age

group pair for Openness are signiﬁcantly different re-

spectively. This shows that there is drastic difference

in Openness as age progresses. All other traits show

p-values greater than 0.05.

Added to this, there is a clear dip in Extroversion

and Agreeableness from Teens to young adults. con-

scientiousness which typically grows with increasing

responsibility and age goes up from teens to young

adults as expected. Openness too slightly falls down

with age with signiﬁcant differences in correlation for

each of the age groups. Thus we deduce that Ex-

troversion and Agreeableness notably reduce with

increasing age. Neuroticism remains relatively in-

variable. Openness signiﬁcantly differs from teens

to Middle adults with decreases with age. Con-

scientiousness too notably increases from Teens to

young adults but remains stable after.

8 CONCLUSION

The paper aims the creation of a complete automa-

tion of personality detection with proﬁcient perfor-

mance. We achieved the following superior results

on OCEAN personality prediction: Openness: 0.81,

Conscientiousness: 0.80, Extroversion: 0.75, Agree-

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

504

ableness: 0.73, Neuroticism: 0.61.

Next, we perform an extensive comparative anal-

ysis of the social implications of personality with

prominent models and data. The results of the em-

pirical analysis is as follows:

• There is no signiﬁcant difference in Openness,

Conscientiousness, and Extroversion personality

qualities based on zodiac signs.

• Openness is signiﬁcantly difference between men

to women and there is an almost negligible differ-

ence in neuroticism.

• Personality variance in industries is more pro-

nounced for Engineering and Manufacturing, Law

and Government, and Entertainment and software

industries, respectively.

• Extroversion and Agreeableness decrease signif-

icantly with age. Neuroticism typically persists.

Teens through middle adults have dramatically

different openness, decreasing with age. Consci-

entiousness rises from teens to young adults but

stays consistent thereafter.

Further, the pipeline and code are made open

source adding to our objective of enhancing innova-

tion, improving social awareness, fostering commu-

nity, and providing cost-effectiveness. It is to be noted

that all the inferences in the research that involve

social implications are based entirely on predictions

and empirical evidence and are free of any biases or

stereotypes.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the computing

time provided on the high performance computing fa-

cility, Sharanga, at the Birla Institute of Technology

and Science - Pilani, Hyderabad Campus.

REFERENCES

Argamon, S., Koppel, M., Pennebaker, J. W., and Schler,

J. (2009). Automatically proﬁling the author of

an anonymous text. Communications of the ACM,

52(2):119–123.

Beltagy, I., Peters, M. E., and Cohan, A. (2020). Long-

former: The long-document transformer. arXiv

preprint arXiv:2004.05150.

Briggs Myers, I. (1998). Introduction to type. a guide to

understanding your results on the myers-briggs type

indicator. palo alto. California: Consulting Psychol-

ogists Press, Inc. Brown, Andrew (1998) Organisa-

tional Culture. Harlow: Pearson Buchanan, David

and Badham, Richard.(1999),’Politics and organiza-

tional change: the lived experience’, Human Rela-

tions, 52(5):609–629.

Cambria, E., Poria, S., Hazarika, D., and Kwok, K. (2018).

Senticnet 5: Discovering conceptual primitives for

sentiment analysis by means of context embeddings.

In Proceedings of the AAAI conference on artiﬁcial

intelligence, volume 32.

Cobb-Clark, D. A. and Schurer, S. (2012). The stabil-

ity of big-ﬁve personality traits. Economics Letters,

115(1):11–15.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.

(2019). Bert: Pre-training of deep bidirectional trans-

formers for language understanding.

Dhelim, S., Aung, N., Bouras, M. A., Ning, H., and Cam-

bria, E. (2022). A survey on personality-aware rec-

ommendation systems. Artiﬁcial Intelligence Review,

pages 1–46.

Furnham, A. (1996). The big ﬁve versus the big four: the

relationship between the myers-briggs type indicator

(mbti) and neo-pi ﬁve factor model of personality. Per-

sonality and individual differences, 21(2):303–307.

Gibby, R. E. and Zickar, M. J. (2008). A history of the

early days of personality testing in american industry:

An obsession with adjustment. History of psychology,

11(3):164.

Golbeck, J., Robles, C., Edmondson, M., and Turner, K.

(2011). Predicting personality from twitter. In 2011

IEEE third international conference on privacy, se-

curity, risk and trust and 2011 IEEE third interna-

tional conference on social computing, pages 149–

156. IEEE.

Goldberg, L. R. (1990). An alternative” description of per-

sonality”: the big-ﬁve factor structure. Journal of per-

sonality and social psychology, 59(6):1216.

Jolly, M. (2017). Myers-briggs personality type dataset.

Katare, G., Maurya, A., and Kumar, D. (2022). Personal-

ity prediction from text of social networking sites by

combining myers–briggs and big ﬁve models. In Pro-

ceedings of the International Conference on Cognitive

and Intelligent Computing: ICCIC 2021, Volume 1,

pages 363–372. Springer.

Kazameini, A., Fatehi, S., Mehta, Y., Eetemadi, S., and

Cambria, E. (2020). Personality trait detection us-

ing bagged svm over bert word embedding ensembles.

arXiv preprint arXiv:2010.01309.

Liu, X. and Zhu, T. (2016). Deep learning for construct-

ing microblog behavior representation to identify so-

cial media user’s personality. PeerJ Computer Sci-

ence, 2:e81.

Majumder, N., Poria, S., Gelbukh, A., and Cambria, E.

(2017). Deep learning-based document modeling for

personality detection from text. IEEE Intelligent Sys-

tems, 32(2):74–79.

McKnight, P. E. and Najab, J. (2010). Mann-whitney u test.

The Corsini encyclopedia of psychology, pages 1–1.

Mohammad, S. M. and Turney, P. D. (2013). Crowdsourc-

ing a word-emotion association lexicon. Computa-

tional Intelligence, 29(3):436–465.

Social Implications of OCEAN Personality: An Automated BERT-Based Approach

505

O’Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H.,

Invernizzi, L., et al. (2019). Kerastuner. https://github.

com/keras-team/keras-tuner.

Pennebaker, J. W. and King, L. A. (1999). Linguistic styles:

language use as an individual difference. Journal of

personality and social psychology, 77(6):1296.

Qin, X., Liu, Z., Liu, Y., Liu, S., Yang, B., Yin, L., Liu,

M., and Zheng, W. (2022). User ocean personality

model construction method using a bp neural network.

Electronics, 11(19):3022.

Saad, S. and Sackett, P. R. (2002). Investigating differen-

tial prediction by gender in employment-oriented per-

sonality measures. Journal of Applied Psychology,

87(4):667.

Schmitt, D. P., Long, A. E., McPhearson, A., O’Brien, K.,

Remmert, B., and Shah, S. H. (2017). Personality and

gender differences in global perspective. International

Journal of Psychology, 52:45–56.

Singer, J. L. and Singer, D. G. (1983). Psychologists look

at television: Cognitive, developmental, personality,

and social policy implications. American Psycholo-

gist, 38(7):826.

Stillwell, D. and Kosinski, M. (2015). mypersonality

project website.

Webster, M. M. and Ward, A. J. (2011). Personality and

social context. Biological reviews, 86(4):759–773.

ICAART 2024 - 16th International Conference on Agents and Artiﬁcial Intelligence

506