Hate Speech Detection using Word Embedding and Deep Learning in the

Arabic Language Context

Hossam Faris

, Ibrahim Aljarah

Maria Habib

and Pedro A. Castillo

Department of Information Technology, King Abdullah II School for Information Technology,

The University of Jordan, Amman, Jordan

Department of Computer Architecture and Technology, ETSIIT - CITIC, University of Granada, Spain

Keywords:

Hate Speech, Classiﬁcation, Word Embedding, Machine Learning.

Abstract:

Hate speech over online social networks is a worldwide problem that leads for diminishing the cohesion of

civil societies. The rapid spread of social media websites is accompanied with an increasing number of social

media users which showed a higher rate of hate speech, as well. The objective of this paper is to propose a

smart deep learning approach for the automatic detection of cyber hate speech. Particularly, the detection of

hate speech on Twitter on the Arabic region. Hence, a dataset is collected from Twitter that captures the hate

expressions in different topics at the Arabic region. A set of features extracted from the dataset based on a word

embedding mechanism. The word embeddings fed into a deep learning framework. The implemented deep

learning approach is a hybrid of convolutional neural network (CNN) and long short-term memory (LSTM)

network. The proposed approach achieved good results in classifying tweets as Hate or Normal regarding

accuracy, precision, recall, and F1 measure.

1 INTRODUCTION

Hate speech is the use of offensive, abusive, or insult-

ing language towards an individual or a minority of

people. The objective of hate speech is disseminat-

ing hatred and discrimination based on the grounds of

race, sex, religion, or disability. The European Court

of Human Rights (ECHR) identiﬁes the concept of

hate speech as any use of the language expressions

that spread, encourage, or incite hatred based on race

or xenophobia, and any form of intolerance towards

immigrants or minorities (Court, ).

Twitter is a social networking site and a micro-

blogging service. Social networking sites facilitate

the ability of freely communicating ideas and opin-

ions among end users. The immense use of social net-

working sites results in a large amount of data that can

be analyzed by smart machine learning algorithms.

The use of machine learning and data mining for in-

terpreting such big data provides the ability to capture

the hidden pattern of information. Hence, a higher po-

tential for detecting the hatred patterns of data.

Natural Language Processing (NLP) is concerned

with applying different statistical preprocessing tech-

niques. The purpose of NLP techniques is transform-

ing the textual datasets into datasets that are feasible

by machine learning algorithms. Such NLP processes

are data normalization, stemming, tokenization, and

features extractions. However, NLP processes face

several obstacles when handling complex languages.

The Arabic language is the fourth used language

on the Internet and ranked as the sixth used language

on Twitter (Inc., ). Arabic is rich and complex mor-

phological language that exposes the Arabic text clas-

siﬁcation problem to many challenges. In essence,

different factors made the Arabic NLP techniques rel-

atively hard to implement. For instance, Arabic lan-

guage has different forms such as the dialectical Ara-

bic. The dialectical colloquial Arabic is the most used

form of Arabic on social media networks. Yet, each

Arabic country has different dialect, which makes the

text preprocessing more difﬁcult than the processing

of simple languages. Further, the colloquial Arabic

has many misspellings that differs morphologically

and phonologically. Also, Arabic has complex or-

thography and morphosyntactic rules. All these fac-

tors made the Arabic preprocessing techniques much

complex (Badaro et al., 2019; Alrefai et al., 2018).

In this paper, a dataset is created that is target-

ing the problem of hate speech on Twitter at the Ara-

bic countries. The data was collected using several

keywords such as racism, sport, and Islam. Mainly,

Faris, H., Aljarah, I., Habib, M. and Castillo, P.

Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context.

DOI: 10.5220/0008954004530460

In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2020), pages 453-460

ISBN: 978-989-758-397-1; ISSN: 2184-4313

453

the dataset contains two classes; the Hate class and

the Normal class. The word embedding technique is

used for extracting a set of words features that can

capture the hidden relations of words of the dataset.

The utilized word embeddings are the Word2Vec and

the AraVec implementations. Keras (Gulli and Pal,

2017) is a deep learning framework used for the im-

plementation of the deep learning model. The pro-

posed deep learning model is a recurrent convolu-

tional network, which is a combination of convolu-

tional network layer and LSTM network. The pro-

posed methodology is evaluated based on different

performance evaluation measures including the accu-

racy, precision, recall, and F1 measure. The results of

detecting hate tweets based on deep learning frame-

work were very good. Yet promising for further re-

search as the problem of cyber hate speech detection

over Arabic countries is poorly investigated.

The rest of the paper is organized as follows. Sec-

tion 2 gives a review of related works of cyber hate

speech detection. Section 3 is the proposed method-

ology including a description of the dataset, the word

embedding and deep learning frameworks, in addition

to the evaluation performance metrics. Section 4 is a

discussion of the conducted experiment and obtained

results. Finally, Section 5 is the concluding remarks

and potential future works.

2 RELATED WORKS

Recently, Arabic natural language processing has

been sparsely studied. Yet, cyber hate speech detec-

tion in Arabic context is poorly investigated. How-

ever, this section reviews previous research studies for

cyber hate speech detection in Arabic context.

Authors in (Al-Hassan and Al-Dossari, 2019)

presented the main challenges for articulating hate

speech over Arabic online social networks. Where

they stated that the colloquial Arabic has many gram-

matical and spelling mistakes. Also, in some Arabic

countries there are words considered hate, while in

other Arabic countries they are normal. Further, au-

thors claimed that all conducted studies in this area

suffer of low recall and precision values. One of

the early attempts for detecting hate speech of Ara-

bic tweets can be found in (Abozinadah et al., 2015),

in which, three types of features were extracted that

are proﬁle-based features, tweet-based features repre-

sented by Term Frequency (TF) and Term Frequency-

Inverse Document Frequency (TF-IDF) models, and

social graph features. Their proposed approach im-

plemented traditional machine learning algorithms in-

cluding Naive Bayes (NB), Support Vector Machine

(SVM), and decision tree (DT), where it achieved

very good performance in terms of recall, precision,

and f-measure.

In (Mubarak et al., 2017), authors created a corpus

of cyberbullying words, which used for abusive lan-

guage detection over Arabic social media. Whereas,

(Alakrot et al., 2018) constructed a dataset for of-

fensive speech detection on YouTube, covering sev-

eral Arabic dialects. The constructed dataset encom-

passes three classes; offensive, inoffensive, and neu-

tral. Additionally, (Albadi et al., 2018) constructed a

dataset for religious hate speech detection over Arabic

Twitter environment. In the proposed approach, au-

thors developed an Arabic lexicon that contains com-

monly used religious terms with their polarities. Also,

the constructed dataset is applied into different clas-

siﬁcation models involving a lexicon-based, N-gram-

based, and a deep learning-based approach. Where

the implementation of a Recurrent Neural Network

with Gated Recurrent Unit and a pre-trained word em-

bedding model achieved (84%) in terms of Area Un-

der Curve measure (AUC).

Nonetheless, (Mulki et al., 2019) collected a Twit-

ter dataset for hate speech and abusive language de-

tection in Arabic context. The created dataset is a

benchmark dataset known as (L-HSAB). The pro-

posed dataset is classiﬁed into three classes; normal,

abusive, and hate. However, authors applied the N-

gram and TF word representation models into SVM

and NB classiﬁers, where the results were very good

in terms of accuracy, recall, precision, and f-measure.

Similarly, (Haddad et al., 2019) designed a dataset for

hate speech detection for Tunisian dialect, which aims

for an automatic prevention of any toxic language.

While (Bleiweiss, ), presented an LSTM approach

based transfer learning for abusive speech detection

on Twitter, which accomplished very good results in

regard to F-measure.

To the best of our knowledge, very few studies in-

vestigated cyber hate speech detection on Arabic on-

line context. Even that, cyber hate speech detection is

also presented in other different languages such as En-

glish, Italian, and Indonesian. For instance, (Watan-

abe et al., 2018) proposed an approach for hate speech

detection on Twitter, in which, unigram and senti-

ment features were extracted and fed into SVM, DT,

and Random forest (RF), achieving good performance

regarding accuracy, recall, precision, and f-measure.

While (Pitsilis et al., 2018) presented an ensemble

approach of recurrent neural networks, for the detec-

tion of racism and sexism on Twitter. In (Del Vi-

gna12 et al., 2017), authors introduced an approach

for Italian hate speech detection on Facebook, where

several syntactical, sentimental, and word embedding

ICPRAM 2020 - 9th International Conference on Pattern Recognition Applications and Methods

454

features were extracted and adopted into SVM and re-

current neural networks. As a result, the proposed ap-

proach showed promising performance. In addition,

(Polignano and Basile, 2018) presented an ensemble

of deep neural networks for Italian hate speech detec-

tion.

The authors in (Fauzi and Yuniarti, 2018) pre-

sented the application of hate speech detection in In-

donesia. Where the collected dataset is applied into

various machine learning algorithms such as SVM,

NB, and k-nearest neighbors. Nonetheless, there are

other efforts in other languages including the Span-

ish (Ortiz et al., 2019), Turkish ( ¸Sahi et al., 2018),

German (Jaki and De Smedt, 2018), and Sri Lanka

(Wijeratne, 2018).

3 METHODOLOGY

In order to develop an intelligent approach for cy-

ber hate speech detection, a methodology is designed.

First step is collecting and preparing the dataset. Sec-

ond is transforming the texts (tweets) into features

that are comprehensible by the machine learning algo-

rithms. Third is building the smart prediction model

that is a deep learning model. Finally, is the evalu-

ation of the model in detecting Hate tweets. Fig. 1

shows a summary of the designed methodology.

DataCollection&Preprocessing

FeaturesExtraction

(WordEmbedding)

ModelDevelopment

(LSTM-CNN)

ModelEvaluation

(Accuracy,Recall,Precision,F1)

Figure 1: An overview ﬂowchart of the methodology.

3.1 Dataset

The dataset is collected from Twitter using Twitter

streaming Application Programming Interface (API)

and “rtweet” library

. The collection of data covers

different critical and debatable areas including sport,

religion, racism, and journalism. Table 1 shows the

rtweet: https://rtweet.info/

used keywords for collecting the tweets. The total

number of collected tweets after removing duplicates

and irrelevant tweets is 3696.

The collected tweets were annotated based on the

overall perceived meaning of the tweet into Hate or

Normal. The total number of Hate tweets is 843,

while the number of Normal tweets is 791. The rest

of the tweets accounts for 2062 and were annotated

as Neutral, which does not exhibit neither Hate nor

Normal orientations. The dataset is a combination of

merely Hate and Normal classes, where the positive

class is the Hate and the negative class is the Nor-

mal class. Table 1 shows the count of the tweets and

the distribution of the classes per each used keyword.

From Table 1, it is obvious that the rate of Hate tweets

is the largest in case of religion and terrorist, as well

as in case of immigrants and journalism even they are

slightly less.

Primarily, the process of preparing the data is de-

scribed in Fig. 2. It starts by collecting the tweets us-

ing R language and Twitter APIs, annotating the data

by two volunteers, cleaning the data from any irrele-

vant or redundant tweets, normalizing the tweets, tok-

enizing tweets, and text vectorization for the objective

of features representation.

DataCollection

DataAnnotation

DataCleaning

DataNormalization

DataTokenization

TextVectorization

Figure 2: The steps of data collection and preprocessing.

In data cleaning, all non-Arabic characters, num-

bers, symbols, punctuation, hashtags, web addresses,

diacritics, and the Arabic stop words were ﬁltered

out. Whereas, tweets normalization is the process of

converting the Arabic characters from the standard-

writing way into the colloquial-writing way. For in-

stance, the Alef Arabic character is transformed from

, , Á, Â into . Another example is transforming © into

«.

Tweets tokenization is concerned with dividing

the tweets into set of words based on the white-space

delimiter. Herein, the tokenization process is imple-

mented using the Natural Language Toolkit (NLTK)

library (Loper and Bird, 2002) within Python devel-

opment framework. Text vectorization is the pro-

Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context

455

Table 1: Description of used keywords for collecting tweets and the percentage of hate and normal classes.

Keyword Translated Keyword Tweets# Positive (%) Negative(%)

dw Alwahadat sport club 14 28.6% 71.4%

¨ C¯ ©C¤d The Jordanian league 44 13.6% 86.4%

 C¯ ¨lOyf Faisaly Jordan 24 41.7% 58.3%

®F¯ ryd ,A¡C¯¤ ®F¯ Islam and terrorism, damage Islam 100 82.0% 18.0%

T§rOn` Racism 1193 46.8% 53.2%

w·¯ ,¸¯ Refugees 240 70.0% 30.0%

,rÌt {¡A ,Vw ,®¯ ,T§r

rWt ©CAs§

Freedom, media, homeland, Nahed

Hattar, extreme

19 78.9% 21.1%

cess of transforming the raw textual data into a col-

lection of numerical features. Mainly, the numeri-

cal features express statistical measurements of the

words. The collections of features are represented

like vectors; thus, it is called text vectorization. In

literature, there are diverse models for text vectoriza-

tion. The most popular models are the Bag-of-Words,

Term Frequency, and Term Frequency-Inverse Docu-

ment Frequency. However, these models cannot cap-

ture the underlying semantic of the words. Therefore,

a more resilient approach for understanding the im-

plicit relationships between words is required which

is the word embedding. In this article, the Word2Vec

word embedding model is utilized.

3.2 Word Representation

Recently, several representations have been proposed

for representing texts in a way that is understandable

by machine or deep learning algorithms. Taking an

example is the one-hot encoding approach. In the

one-hot encoding, all unique words are extracted and

represented in a vector. Then each word is encoded

into a numerical vector of a length equals to the num-

ber of unique words. Where the numerical vector is

a vector of zeros except at the index of the respec-

tive word is set to one. This model is inefﬁcient as it

cannot grasp the implicit meaning of words, while the

words are mainly represented as vectors of zeros.

A more robust approach is the word embedding

mechanism. In word embedding approach, all words

of similar meaning have an analogous encoding. Em-

bedding means a dense vector, where the length of

the vector is a parameter that is set previously. Also,

the components of the dense vector are parameters

that are learned during the training process. A higher

dimension of the embedding corresponds to a better

ability for learning the semantic meaning of words,

but also the need for more large training data.

One of the models that is developed by the word

embedding approach is the Word2Vec (Goldberg and

Levy, 2014). Word2Vec is a neural network of three-

layers, the input layer, the hidden layer, and the out-

put layer. However, this architecture of neural net-

works is merely used to learn the weights of the hid-

den layer that are the words embeddings. Mainly,

Word2Vec has two structures of implementations;

the Continuous Bag-of-Words (CBoW) and the Skip-

Gram (SG) models. The CBoW model depends on

several neighboring words to predict a certain mid-

dle word. Whereas, SG model is the opposite, that is

used to predict the neighboring words for a single de-

termined word. Fig. 3 depicts the main principle of

CBoW and SG models. In which, the CBoW model

takes a set of input words, where the number of words

is determined by a window size. When the window

size is two; four words are considered to predict the

corresponding word (w(t)). The four words are two

preceding words w(t-1), w(t-2) and another two fol-

lowing words w(t+1), w(t+2). While in the SG model,

a single word is considered to predict the neighboring

words. Predicting the neighboring words is accom-

plished by ﬁnding the highest probability words that

are the most similar to the corresponding word.

In this paper a pre-trained word embedding model

is utilized. AraVec is an open source implementation

of word embedding for the Arabic NLP processes.

AraVec created 12 different models based on Arabic

content from Twitter, Wikipedia, and general Arabic

websites, in which, the total number of used vocabu-

laries is 3,300,000,000, all integrated within SG and

CBoW models (Soliman et al., 2017). The word em-

bedding inside the deep learning framework is rep-

resented by an embedding layer. Each embedding

layer is also known as a lookup table that has an in-

put length and a dimension length. The input length

is the size of the unique vocabulary of the respective

dataset. While, the dimension length is a parameter

speciﬁes the length of the embedding. The dimension

length should be ﬁt to suite the problem and the size

of the dataset.

3.3 Recurrent Convolutional Network

Architecture

This subsection presents the proposed approach for

detecting the hate orientation of tweets. A general ar-

chitecture of the proposed deep learning approach is

ICPRAM 2020 - 9th International Conference on Pattern Recognition Applications and Methods

456

w(t-2)

w(t-1)

w(t+2)

w(t+1)

w(t)

Input

Output

CBoW

w(t)

w(t-2)

w(t-1)

w(t+2)

w(t+1)

Input

Output

Embedding

layer

Embedding

layer

Figure 3: An illustration of continuous bag-of-words and skip gram word embedding models.

presented in Fig. 4, which is a sequential layered steps

start by the embedding layer, the dropout layer, the

convolutional layer, the max pooling layer, the LSTM

layer, the dense layer, and the output layer. The em-

bedding layer is constructed using a word embedding

approach which is the Word2Vec, as discussed in the

previous sub-section. The need of the dropout layer is

for avoiding the tendency of overﬁtting.

Even that the embedding layer encoded each word

into a unique vector. But now the objective is how

to propose a model that understands a textual sen-

tence to decide if it carries Hate or Normal expres-

sions. Given that the sentence is a sequence of words.

One of the approaches to represent a variable length

sentences (sequences) is the convolutional approach.

The convolutional layer takes a sequence of embed-

ding vectors as an input and produce a compressed

tensor as an output. The convolutional layer has a col-

lection of ﬁlters that take a group of words of variable

window-sized in order to predict new potential fea-

tures (words). Fig.5 represents a convolutional layer.

The aim of the max-pooling layer is to reduce the di-

mension of the produced tensor by applying a max

ﬁlter.

LSTM is a kind of artiﬁcial Recurrent Neural Net-

works (RNN). An LSTM unit acts like a memory cell.

The structure of the cell includes input and output ac-

tivation layers in addition to three gates. The input

gate, the output gate, and the forget gate. The ob-

jective of LSTM network is to solve the problem of

the vanishing gradient of neural networks (Gers et al.,

1999). In addition, LSTM is very efﬁcient for han-

dling sequences of data, or sequences of words in case

of textual data (Collobert et al., 2011).

The fully connected layer (the dense layer) takes

the output of the LSTM and converts it into class la-

bels or probabilities. The output of the dense layer

goes through an output layer with Sigmoid activation,

since the problem is a binary classiﬁcation.

3.4 Evaluation Measures

The used performance evaluation measures are accu-

racy, precision, recall, and F1 measure. The accuracy

is the ratio of correctly classiﬁed Hate and Normal

tweets over all the correct and the incorrect number of

classiﬁed tweets. Where the accuracy is formulated in

Eq. 1 (Han et al., 2011).

Accuracy =

T P + T N

T P + T N + FP + FN

(1)

The precision metric is identiﬁed as the ratio of tweets

that correctly identiﬁed as Hate over the total number

of Hate tweets, which is represented in Eq. 2.

Precision =

T P

FP + T P

(2)

Whereas, the recall (known as sensitivity) quantiﬁes

how much the classiﬁer can recognize Hate tweets

that is given by Eq. 3.

Recall =

T P

FN + T P

(3)

The F1 measure is a metric for indicating the balance

between precision and recall measures, represented

by Eq. 4.

F1 = 2 x

Precision x Recall

Precision + Recall

(4)

4 EXPERIMENTS AND RESULTS

This section presents the experimental settings for ap-

plying the collected hate speech dataset into a deep

learning framework. Keras is used as a deep learning

framework, and utilized using Spyder platform (Ray-

baut, 2009) and Python version 3.7.

The dataset is divided into (80%) for training and

(20%) for testing. The dataset is approximately bal-

anced at the class level, since the positive class ac-

counts for 52%, whereas the negative class is 48%.

Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context

457

Input(Tweets) Embeddinglayer

Convolutionallayer

Dropoutlayer

LSTMlayer Maxpoolinglayer

Fullyconnected

(Denselayer)

Outputlayer

(Sigmoid)

Output

(HateorNormal)

Figure 4: Proposed deep learning architecture.

Tweet1

(nxd)representationof

eachtweetoverdifferent

windowssizes

Convolutional

layerwiththree

filters

Max-

pooling

Figure 5: A mapping between the embedding layer, the convolutional layer, and the max-pooling layer.

The problem of detecting hate speech is formulated as

binary classiﬁcation problem; where the “1” labeled

class is the Hate and “0” is the Normal.

The unique tokens of the dataset were stemmed

using ISRI Arabic stemmer from NLTK library

(Loper and Bird, 2002). ISRI stemmer is a light

stemmer for ﬁltering out the afﬁxes of words. Thus,

the stemmed dataset is adopted into two different

implementations of pre-trained word embedding ap-

proaches. The used embedding approaches are the

Word2Vec and the AraVec. The Word2Vec is an im-

plementation from Gensim library (Rehurek and So-

jka, 2011). In which, the window size is set to 5,

the number of workers is 4, and the minimum count

of word occurrences is 1. For the AraVec approach,

two models were used. The full N-gram with SG

model and the full N-gram with CBoW model. The

two models are pre-trained using Twitter with embed-

ding dimension (100). In consequence, using either

the Word2Vec or the AraVec the embedding layer is

constructed.

A sequential model of deep learning is designed.

The sequential deep learning model is a layered ap-

proach, which means that the model is created of

stacked layers. Where each layer has different func-

tion. All tweets sequences before entering the deep

learning model were padded to the maximum length

of the tweets. Hence, the ﬁrst layer is the previously

constructed embedding layer. Next is the dropout

layer with a rate of 25%. Then is the one-dimensional

convolutional layer (conv1D). The conv1D layer con-

volves with the input in one dimensional space in or-

der to create the output tensor. The number of ﬁlters

of the conv1D layer is set to 64, the kernel size is 5,

the used activation is “relu”, and the number of strides

is 1. For the max-pooling layer, the pool size is set to

4, while the LSTM output size is set to 70. The output

layer is designed of one unit with Sigmoid activation

function.

The model is compiled with the “Adam” opti-

mizer which is an adaptive learning optimization al-

gorithm. The loss parameter of the optimizer is

set to “binary-crossentropy” that is recommended for

binary classiﬁcation problems. Table 2 shows the

performance evaluation results of the proposed deep

learning model.

The recurrent convolutional model is tested over

different number of epochs as shown in Table 2.

Obviously, by comparing the performance of the

Word2Vec and the AraVec, AraVec accomplishes su-

perior results in terms of all metrics. Hence, achieved

best accuracy of (66.564%), best recall of (79.768%),

ICPRAM 2020 - 9th International Conference on Pattern Recognition Applications and Methods

458

Table 2: The performance results of the recurrent convolutional network based on the accuracy, precision, recall, and F1

measure.

Epochs Word embedding structure Embedding dimension Accuracy Recall Precision F1-score

25 word2vec(cbow) 100 52.760 68.072 52.803 59.473

50 word2vec(cbow) 100 58.282 70.700 55.223 62.011

25 Aravec(N_grams & cbow) 100 64.110 65.142 67.058 66.086

50 Aravec(N_grams & cbow) 100 60.429 64.935 57.124 60.790

25 Aravec(N_grams & SG) 100 65.337 59.523 68.965 63.897

50 Aravec(N_grams & SG) 100 66.564 79.768 65.094 71.688

best precision of (68.965%), and best F1 measure of

(71.688%). Even that Word2Vec achieved relatively

close performance regarding the recall and F1 mea-

sure, but AraVec still superior.

The result of comparing the two AraVec mod-

els shows that the SG model obtained better results

than the CBoW model. For instance, when the num-

ber of epochs was 25, the SG model achieved bet-

ter accuracy and precision by holding percentages of

(65.337%) and (68.965%), respectively. Even that

they were performing slightly close. Whereas, com-

paring the SG and CBoW models at 50 epochs shows

that the SG model outperformed the CBoW regarding

all metrics. Where it achieved 66.564%, 79.768%,

68.965%, and 71.688% for the accuracy, recall, pre-

cision, and F1 measure, respectively.

5 CONCLUSION AND FUTURE

WORK

Cyber hate speech is a critical serious problem not

only over the Arabic region, but also worldwide.

Very few studies have investigated the problem of

hate speech detection over online networks. Nonethe-

less, much fewer targeted the Arabic language since

it is highly rich complex language. This paper in-

terpreted the hate speech detection on Twitter, where

a dataset is collected and processed using NLTK li-

brary. The detection approach is a deep learning ap-

proach that takes the word embeddings features as an

input. While the deep learning model is a hybrid of

convolutional and LSTM networks. The results of

the AraVec word embedding approach with the re-

current convolutional networks were very good and

competent. This research disclosed such uncharted

challenges for further exploration. For example, the

need for more standard and large benchmark datasets

for hate speech, the need of more comprehensive lex-

ical resource of abusive offensive Arabic expressions,

as well as, to more deeply dive into deep learning ap-

proaches, yet investigate the evolutionary optimiza-

tion within the deep learning.

ACKNOWLEDGEMENTS

This research is funded by the Deanship of Scientiﬁc

Research in the University of Jordan, Amman, Jordan.

REFERENCES

Abozinadah, E. A., Mbaziira, A. V., and Jones, J. (2015).

Detection of abusive accounts with arabic tweets. Int.

J. Knowl. Eng.-IACSIT, 1(2):113–119.

Al-Hassan, A. and Al-Dossari, H. (2019). Detection of hate

speech in social networks: asurvey on multilingual

corpus. Computer Science & Information Technology

(CS & IT), 9(2):83.

Alakrot, A., Murray, L., and Nikolov, N. S. (2018). Dataset

construction for the detection of anti-social behaviour

in online communication in arabic. Procedia Com-

puter Science, 142:174–181.

Albadi, N., Kurdi, M., and Mishra, S. (2018). Are they

our brothers? analysis and detection of religious hate

speech in the arabic twittersphere. In 2018 IEEE/ACM

International Conference on Advances in Social Net-

works Analysis and Mining (ASONAM), pages 69–76.

IEEE.

Alrefai, M., Faris, H., and Aljarah, I. (2018). Sentiment

analysis for arabic language: A brief survey of ap-

proaches and techniques. International Journal of Ad-

vanced Science and Technology, 119(1):13–24.

Badaro, G., Baly, R., Hajj, H., El-Hajj, W., Shaban, K. B.,

Habash, N., Al-Sallab, A., and Hamdi, A. (2019). A

survey of opinion mining in arabic: A comprehen-

sive system perspective covering challenges and ad-

vances in tools, resources, models, applications, and

visualizations. ACM Transactions on Asian and Low-

Resource Language Information Processing (TAL-

LIP), 18(3):27.

Bleiweiss, A. Lstm neural networks for transfer learning in

online moderation of abuse context.

Collobert, R., Weston, J., Bottou, L., Karlen, M.,

Kavukcuoglu, K., and Kuksa, P. (2011). Natural lan-

guage processing (almost) from scratch. Journal of

machine learning research, 12(Aug):2493–2537.

Court, E. European court of human rights.

https://www.echr.coe.int. Accessed: July, 2019.

Del Vigna12, F., Cimino23, A., Dell’Orletta, F., Petrocchi,

M., and Tesconi, M. (2017). Hate me, hate me not:

Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context

459

Hate speech detection on facebook. t Italian Confer-

ence on Cybersecurity (ITASEC17), Venice, Italy.

Fauzi, M. A. and Yuniarti, A. (2018). Ensemble method

for indonesian twitter hate speech detection. Indone-

sian Journal of Electrical Engineering and Computer

Science, 11(1):294–299.

Gers, F. A., Schmidhuber, J., and Cummins, F. (1999).

Learning to forget: Continual prediction with lstm.

IET digital library.

Goldberg, Y. and Levy, O. (2014). word2vec explained:

deriving mikolov et al.’s negative-sampling word-

embedding method. arXiv preprint arXiv:1402.3722.

Gulli, A. and Pal, S. (2017). Deep Learning with Keras.

Packt Publishing Ltd.

Haddad, H., Mulki, H., and Oueslati, A. (2019). T-hsab:

A tunisian hate speech and abusive dataset. In Inter-

national Conference on Arabic Language Processing,

pages 251–263. Springer.

Han, J., Pei, J., and Kamber, M. (2011). Data mining: con-

cepts and techniques. Elsevier.

Inc., S. The most common languages on the inter-

net. https://www.statista.com/statistics/262946/share-

of-the-most-common-languages-on-the-internet. Ac-

cessed: July, 2019.

Jaki, S. and De Smedt, T. (2018). Right-wing german hate

speech on twitter: Analysis and automatic detection.

Manuscript submitted.

Loper, E. and Bird, S. (2002). Nltk: The natural lan-

guage toolkit. In In Proceedings of the ACL Work-

shop on Effective Tools and Methodologies for Teach-

ing Natural Language Processing and Computational

Linguistics. Philadelphia: Association for Computa-

tional Linguistics.

Mubarak, H., Darwish, K., and Magdy, W. (2017). Abusive

language detection on arabic social media. In Pro-

ceedings of the First Workshop on Abusive Language

Online, pages 52–56.

Mulki, H., Haddad, H., Ali, C. B., Alshabani, H., and iCom-

pass Consulting, T. (2019). L-hsab: A levantine twit-

ter dataset for hate speech and abusive language. In

Proceedings of the Third Workshop on Abusive Lan-

guage Online, pages 111–118.

Ortiz, G., Gómez-Adorno, H., Reyes-Magaña, J., Bel-

Enguix, G., and Martinez, G. E. S. (2019). Detection

of aggressive tweets in mexican spanish using multi-

ple features with parameter optimization. In In Pro-

ceedings of the First Workshop for Iberian Languages

Evaluation Forum (IberLEF 2019), CEUR WS Pro-

ceedings.

Pitsilis, G. K., Ramampiaro, H., and Langseth, H. (2018).

Effective hate-speech detection in twitter data us-

ing recurrent neural networks. Applied Intelligence,

48(12):4730–4742.

Polignano, M. and Basile, P. (2018). Hansel: Italian hate

speech detection through ensemble learning and deep

neural networks. EVALITA Evaluation of NLP and

Speech Tools for Italian, 12:224.

Raybaut, P. (2009). Spyder-documentation. Available on-

line at: pythonhosted. org.

Rehurek, R. and Sojka, P. (2011). Gensim–python frame-

work for vector space modelling. NLP Centre, Faculty

of Informatics, Masaryk University, Brno, Czech Re-

public, 3(2).

¸Sahi, H., Kılıç, Y., and Sa

glam, R. B. (2018). Automated

detection of hate speech towards woman on twitter.

In 2018 3rd International Conference on Computer

Science and Engineering (UBMK), pages 533–536.

IEEE.

Soliman, A. B., Eissa, K., and El-Beltagy, S. R. (2017). Ar-

avec: A set of arabic word embedding models for use

in arabic nlp. Procedia Computer Science, 117:256–

265.

Watanabe, H., Bouazizi, M., and Ohtsuki, T. (2018). Hate

speech on twitter: A pragmatic approach to collect

hateful and offensive expressions and perform hate

speech detection. IEEE Access, 6:13825–13835.

Wijeratne, Y. (2018). The control of hate speech on social

media: Lessons from sri lanka. CPR South.

ICPRAM 2020 - 9th International Conference on Pattern Recognition Applications and Methods

460