Homicide Network Detection based on Social Network Analysis

Victor Chang

, Yeqing Mou

, Qianwen Ariel Xu

1,2

, Harleen Kaur

and Ben S. Liu

Artificial Intelligence and Information Systems Research Group, School of Computing,

Engineering and Digital Technologies, Teesside University, Middlesbrough, U.K.

IBSS, Xi’an Jiaotong-Liverpool University, Suzhou, China

Department of Computer Science and Engineering, School of Engineering Sciences and Technology,

Jamia Hamdard, New Delhi, India

Department of Marketing, Quinnipiac University, U.S.A.

Keywords: Social Network Analysis, Crime Network Detection, K-core Decomposition, Centrality Analysis.

Abstract: This paper aims to explore the use of social network analysis in identifying the most active suspects and

possible crime gangs in the network. The homicide dataset provided by White & Rosenfeld is employed and

both the victim network and suspects network are structured by the use of Rstudio. This paper finds that the

criminal gang and group of victims in homicide cases could be investigated by conducting centrality analysis

and detecting cliques in these two one-mode networks. Moreover, the same features of victims or suspects

are significant indicators for distinguishing and discovering victim groups or criminal gangs. As suspects or

victims with the same features will be gathered into the same community in the community analysis of SNA,

it is more effective to identify victim groups or criminal gangs by analyzing their characteristics, so that crimes

can be resolved more efficiently or even prevented.

1 INTRODUCTION

Nowadays, the crime rate is increasing rapidly, and

massive data on different categories of crime is

generated every year around the world. This is a huge

challenge for the police departments to collect and

store these data. For example, developed networks

and telecom provide criminals with an accessible and

convenient channel to carry out financial crimes

worldwide with developed networks and telecom.

Numbeo (2021) collects crime data from the majority

of countries in the world and evaluates their overall

level of crime by the crime index. The crime level of

a given country is considered as very low if the crime

index is lower than 20, low if the crime index is

between 20 and 40, moderate if the crime index is

between 40 and 60, high if the crime index is between

60 and 80 and very high if the crime index is higher

than 80. According to the statistics from Numbeo

(2021), among the 135 countries it investigated, only

50 countries’ overall levels of crime are considered as

low or very low, and 23 countries’ overall levels of

crime are considered as high or very high at the same

time. In addition, according to the Statistica (2021),

in 2020, Los Cabos, Mexico, had the highest murder

rate of any city in the world, with 111.3 murders per

100,000 population, and the USA had a significantly

higher homicide rate than the global average. Chinese

Judicial Big Data Research Institute (2019) revealed

that the number of telecoms and network fraud cases

increased by 51.47% from 2015 to 2016.

The

statistics on crime are so surprising; therefore, it is

important to develop an effective method for crime

governance, which can effectively identify suspicious

criminals or criminal gangs to prevent future crimes.

Zhang (2019) states that new opportunities are

available for new crime governance, prevention and

control with big data and social network analysis

(SNA), although crime methods also evolve.

There are different types of widespread crimes

and rampant crime methods during different ages.

They are either influenced by local political policies

or benefited from the development of science and

technology. In China, ten kinds of crimes are induced

into criminal offense cases and both financial crimes

and homicides are two of the ten kinds of a criminal

offense. The residents of St. Louis, Missouri, USA, in

the 1990s, lived in the age of rampant homicides.

White & Rosenfeld (2019) collect the homicides'

relevant data from the police. The dataset records the

suspects (682), victims (569) and witnesses (195),

who included the murders that happened in St. Louis

Chang, V., Mou, Y., Xu, Q., Kaur, H. and Liu, B.

Homicide Network Detection based on Social Network Analysis.

DOI: 10.5220/0010533803290337

In Proceedings of the 6th International Conference on Internet of Things, Big Data and Security (IoTBDS 2021), pages 329-337

ISBN: 978-989-758-504-3

 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reser ved

329

and the relation between all roles and each homicide

composes a social network (SN).

This paper will first identify the most active

suspects and possible crime gangs in the network

based on the information mentioned above. The

social network analysis will be conducted from two

aspects, including the centrality analysis and

community detection. Then practical advances for

police to monitor and investigate financial crimes

established on SNA are proposed.

Based on the advanced social network analysis

method, this paper, on the one hand, contributed to

the detection of both a one-mode SN and a bipartite

SN. Firstly, the classical homicide network is

analyzed based on suitable methods. Then, the most

active suspects and potential crime gangs are

detected. On the other hand, all the achievements are

available for the law enforcement agency to monitor

and detect financial fraud cases at a faster pace.

2 LITERATURE REVIEW

2.1 Data Analytics in Crime

Governance

With the development of emerging information

technologies, such as big data analytics, data mining,

machine learning, etc., the methods for crime

governance have become more and more advanced.

Kumar and Nagpal (2019) established a Naive

Bayesian classifier to introduce a solution to deal with

and mitigate the issue of crime incidents. They

proposed this solution to identify the most possible

criminal of a specific crime.

To improve the

effectiveness of crime detection and control of criminal

behavior, Xu et al. (2020) employed techniques of data

mining and machine learning algorithms, e.g., k-means

clustering, with cloud computing to carry out image

processing. Moreover, Bhuyan and Pani (2021)

employed a geographical crime mapping algorithm to

determine areas that are at greater risk, and the artificial

neural network was trained by the data provided by

these areas so that the network can be used to model

the crime trends. Additionally, they adopted Hadoop

software to enhance the approach.

Colladon and Remondi (2017) applied social

network analysis to the area of money laundering

detection and employed some common measures in

SNA, such as degree, closeness and betweenness

centrality, as well as network constraint. In this paper,

we will explore the ability of social network analysis

in identifying the most active suspects and possible

crime gangs in the network. Although this method has

been applied in similar areas, in addition to using the

most basic measures, our research conducts a deeper

study of criminal networks through community

analysis, including the clique analysis and k-core

decomposition.

2.2 Social Network Analysis

Social network refers to a net combined by nodes and

links. From a social network perspective, the human

interaction in the social environment can be considered

as a relationship-based model or principle and social

network a quantitative way to reflect this model. SNA

can be divided into two types, which are complete

network analysis (CNA) and egocentric network

analysis (ENA). A CNA concentrates on all nodes

instead of a specific one and it includes all the relations

among these nodes (Marin and Wellman, 2009). For

example, Cox et al. (2019) apply the complete network

analysis in personal drinking intentions. They use it to

study the impact of misperceptions of peer drinking on

college students’ drinking intentions.

Unlike the CNA, an ENA concentrates on the

network around one node, usually called ego. The

network includes all the nodes connected to the ego

and their relations (Marin and Wellman, 2009). To

distinguish real close friends from

mere acquaintances

in social media friends for marketing purposes, Stolz

and Schlereth (2021) employ the egocentric network

analysis to

predict the strength of real-world

connections through online metrics of similarity

interaction and network data. SNA is applied in many

fields, including Criminology. For example, when core

suspects leave a criminal gang, the influence is

forecasted by (Zhou & Bao, 2014) based on SNA.

Additionally, in the study of hacker social networks,

Décary-Hétu and Dupont (2012) show that SNA is

helpful in two aspects. Firstly, SNA is able to identify

the key players' positions in the network. Moreover,

SNA helps determine how many resources are required

to deal with a target organization.

2.2.1 Centrality Analysis

Centrality analysis is a common concept in SNA and

it describes location information of a node or a person

in a network via the number. Thus, the importance of

every vertex would be measured by centrality divided

into three parts: degree, closeness and betweenness

centrality. The degree centrality presents the number

of other nodes adjacent to a given node and it is used

to measure the local centrality (Scott, 1991). A node

with a high degree of centrality means that the node

is more attractive or active than other nodes.

IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security

330

However, it also means that the node is more

powerful and can negatively impact a group by

withholding or distorting the information.

Betweenness centrality is used to evaluate the

degree of a specified node, which stands on the

shortest paths between other nodes in the graph

(Borgatti, 1995) like that identified in critical path

analysis. A node with a high betweenness centrality

means that it plays a significant part by its

gatekeeping role in the network because it can control

the information dispersal among other nodes.

Closeness centrality (Freeman, 1980) is used to

evaluate the global centrality of nodes by measuring

the distance from other nodes. A node with the

highest closeness centrality indicates that it can get

information most efficiently.

2.2.2 Community Detection

The clique analysis is conducted in our research. For

the given figure G= (V, E) - Among them, the V= {1,

…, n} is the vertex set of Figure G, and E is the edge

set of Figure G. The group of Figure G is a collection

of nodes with connecting edges. A clique is a subplot

of G, where all vertices are directly connected to one

another (Bonacich, 1972). A maximum clique is a

clique with the most significant number of vertices.

In a ‘maximum clique’, the meaning of ‘maximum’ is

not the same as ‘maximal’. A maximum clique must

be maximal as well, but the opposite is not necessarily

true. (Ashay Dharwadker, 2006). A clique is always

maximal because it cannot add a node or vertex unless

it makes it less connected (Kouznetsov and

Tsvetovat, 2019).

Apart from the clique, k-core decomposition is

also an available algorithm applied to execute

community detection among a social network. The

rule of the k-core decomposition is that when a vertex

or a node is removed, the edges connected to it are

removed as well and when a node with degree ≤ k is

removed, the nodes leftover with a new degree ≤ k are

then removed. If a node belongs to the graph with k-

core but removed from the graph with (k+1)-core,

then this k is the node’s K-value. The largest K-value

is the K-value of the graph (Alvarez-Hamelina et al.,

2005). Community refers to the group that shares the

same K-value and K value means that all entities are

linked to at least k other objects in the group (OZGUL

et al., 2010). This means that a community refers to a

cluster and the connections between nodes in this

cluster are closer than those with external nodes. The

nodes have the same feature, and with this respect, the

feature refers to the minimum number of nodes linked

to it. Additionally, the K-core algorithm is a layer

analysis used to evaluate the network structure and

extends from the outer layer to the inner layer of the

expanded network hierarchy.

3 NETWORK ANALYSIS

3.1 Data

White & Rosenfeld (2019) provided the dataset this

paper used, including data on crime and data on sex.

The crime data set has 870 participants involved in

crime events, including 569 victims, 682 suspects and

195 witnesses. Among them, 41 had two identities.

The sex data set recorded the gender of each

individual. After the data collection and management

for SNA, Rstudio was adopted as the tool to perform

the subsequent analysis in terms of visualizing the

victim and suspect networks and measuring the

network metrics.

3.2 SNA based on Victims Network

Both bipartite and one-mode networks are applied in

this part. The one-mode network shown in Figure 1 is

connected by victims. The degree centrality of part

victims is shown in Table 1, as extracted from various

internet sources. The names of victims involved in the

murder case are also visible. The lines in the graph

refer to the person at both ends are involved in the

same homicides as victims. The nodes with the most

significant size and the color in the graph represent

the highest degree.

Figure 1: One Mode Network Connected by Victims.

Homicide Network Detection based on Social Network Analysis

331

Figure 2: Bipartite Network Connected by Victims and Suspects.

Figure 3: Taking Abrams Chad as an example.

IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security

332

Table 1: The Centrality Degree of Each Victim.

Victims Degree

BarryKyle 14

DaleMike 14

ArmandeBrian 7

DicksonCatherine 7

AbramsChad 6

BarronPhoebe 6

Besides, for those crimes with considerable victims or

suspects, the relation between these victims and

suspects is connected and visualized in Fig. 2. This is

a bipartite network combined by victims and the

suspects. The nodes with incoming and outgoing

arrows represent victims and suspects, respectively.

The node with the biggest size or the deepest color

means that the person is involved in the largest

number of homicides as suspects or victims. The

numerical results, including the in-degree and out-

degree of each node, are recorded in Table 2.

Table 2: Taking Abrams Chad as an example to show how

the roles change in different crimes.

Homicides

The Number

of Victims

The Number

of Suspects

X940103 15 unknown

X970531 8 unknown

X940088 7 unknown

X950187 6 1

X940133 6 1

X950227 6 unknown

X940105 5 unknown

X960271 4 4

X940092 4 unknown

X950226 4 unknown

X940117 4 1

X940107 4 2

X970409 4 unknown

X950236 4 1

3.2.1 Degree and Clique Analysis

The homicides with the most significant number of

victims would be detected by degree centrality and it

reveals how many victims are included in the same

homicide. The largest degree of victims is 14 in Table

1, which means that there are 14 persons involved in

this homicide with him. This is the largest number of

victims in the same homicide. Table II revealed that

the top three crimes with the most victims are

homicides X940103, X970531, and X940088.

However, the suspects haven't been locked down yet.

The three maximal blue cliques in Fig. 1 show the

relation between these victims. They only connect

with the other persons who are involved in the same

homicides with them. For those groups in Fig. 1,

which cannot be connected as a clique, implies that

some victims of them fall into more than one murder

case.

Fig. 3 shows that Abrams Chad is a victim as well

as a suspect in a different case as he has both

incoming arrows and outgoing arrows. While being

oppressed in multiple cases, he became a suspect in

the murder of others. Thus, mental health should be

noticed to prevent the negative psychology of victims.

3.3 SNA based on Suspects Network

Suspects connect the one-mode network with the

weight shown in Figure 4 in this part and the

corresponding degree is represented in Table . The

nodes represent suspects and they are connected if

they participated in the same murder case. The

thickness of the nodes’ edges represents the weight,

expressing the number of times two criminal suspects

participated in the same cases. In this graph,

independent points are deleted as the suspects who

have no relationship with others are not the research

target of

this report. Fig. 5 reveals the community

detection situation of this suspect's network based on

the K-core decomposition algorithm and the

interpretation for results is in Table 4.

Table 3: The result summary of the suspects’ centrality.

Suspects Degree Betweenness

Keane Arthur 33 1131.6

Marion Joe 26 424.07

Jones Ezekial 17 105.47

Williamson Bradford 2 344

Fig. 5 and Table 2 show the outcomes of the K-core

algorithm, in which the color and size distinguish the

nodes from each other and form the clusters, and the

number at the center of each node represents the K

value. Taking the largest blue cluster as an example,

the nodes in this cluster are linked to at least 12 other

nodes. A particular instance, X403 (Williamson

Bradford), is presented separately in Fig. 6.

Homicide Network Detection based on Social Network Analysis

333

Figure 4: The one-mode network of suspects.

Figure 5: Visualization result of the decomposition.

IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security

334

Table 4: Numeric results of the decomposition.

Color Black Red 1 Green Blue 1

Light

blue

k-value 1 2 3 4 5

Color Purple yellow Grey Red 2 Blue 2

k-value 6 7 8 10 12

Figure 6: Taking Williamson Bradford as an example.

The total number of nodes and edges in each network

is shown as follows.

Table 5: Number of Nodes and Edges.

Graph

Total number

Nodes Edges

Victims - Suspects 51 227

Victims - Victims 213 769

Suspects - Suspects 306 657

3.3.1 Degree and Community Analysis

The degree could be used to target the suspects

involved in the homicides with the largest number of

suspects. There are two possibilities for these

individuals. Firstly, they participate in homicides as a

suspect in a vast number of homicides. Secondly,

there is a considerable number of cahoots with them.

In short, they build the most links with others.

Besides, betweenness centrality, an index that reveals

how many nearest distance routes a node throughout

could explain the importance of nodes. Table III

shows that Keane Arthur and Marion Joe are the most

critical suspects and they deserve the most attention

from the police since they connect with the most

suspects and establish the most significant number of

the shortest paths for all suspects.

Fig. 5 and Table 4 conduct community detection.

The k value of the most significant blue nodes is 12

and this means that all nodes in this community

connected with at least 12 suspects. All the nodes

with this feature will be gathered into the same

community and represented as the largest blue nodes.

Keane Arthur and Marion Joe are also induced into

this largest blue group with 33 and 26 as the degree

centrality, respectively, which further improves the

activity of the two suspects. Nevertheless, the

numeric classification is not the absolute indicator

that could be applied to decide the importance or level

of risk of a suspect. Taking Williamson

Bradford(X403) as an example, he is an active

suspect and connects with some crucial suspects.

However, the k- value is 2, which is the second-

lowest value in this network. Fig. 6 shows that this

suspect is related to two crime cases with substantial

K-values, 3 and 12, separately. In other words, he is

familiar with dangerous persons, even though he is

not active.

4 CONCLUSION AND

SUGGESTION

Our paper applies advanced social network analysis

to the area of crime governance. From the homicide

cases analyzed in our research, the criminal gang and

group of victims could be investigated by detecting

cliques in the two one-mode networks, victim

network and suspect network. Our method can also be

applied to other types of crimes, including financial

fraud detection. With the continuous generation of

crime data at present, it is easy to replicate and be

used by the police departments.

On the one hand, for the network connected by

suspects, for criminal gangs, there is a networked

structure among the internal members of the criminal

organization. Gang crime presents a particular social

network characteristic (Zhong et al., 2019). Both

clique and the community results of the K-core

decomposition algorithm are the significant structure

for the recognition of potential criminal gangs. When

a crowd of individuals always shows the same crimes

as suspects, the relation between them could be

displayed by the bold lines in a clique via social

network analysis methods. Besides, relevant features

could be assigned into the suspects' network in the

form of attributes and these common characteristics

are the necessary factors to clarify a criminal gang.

For example, for telecom financial fraud criminal

gangs, the phone number region can be a significant

Homicide Network Detection based on Social Network Analysis

335

clue for distinguishing and discovering the strong

point of criminal gangs. Furthermore, education and

work experience are powerful features to recognize a

criminal gang. All these features could be added as an

additive in practice to detect criminal gangs based on

cliques or communities.

On the other hand, both telecom financial fraud

and homicides are always well-focused, leading to the

same feature of victims in the same fraud case.

According to Ma (2018), the only way to prevent or

reduce this kind of case is to first combat from the

source and then carry out a full chain strike on the

upstream, midstream and downstream links.

In short, first of all, centrality for suspects is a

critical indicator to investigate how dangerous a

suspect is in a criminal case. Degree centrality is not

the only way to detect itself, betweenness and

closeness centrality are too. A better approach should

combine all three so that the system can issue a

certificate to monitor a suspect's activities. Then,

clique and community detection are the two advanced

methods in SNA to investigate criminal gangs. Some

relevant attributes are also a kind of compelling proof

to recognize targeted criminal gangs.

Our research also has several limitations. First of

all, the data we used is secondary data. It is difficult

to confirm the accuracy and integrity of the data,

which may lead to a possible bias in the analysis

results. The second limitation is that the case we

analyzed is limited to homicides. Thus the

effectiveness of this method in detecting other kinds

of crimes may vary. In our future work, we will

collect more crime data on different kinds of crimes

and test the effectiveness of the social network

analysis method.

FUNDING

This work is supported by VC Research (grant

number VCR 0000094).

REFERENCES

Alexander Kouznetsov, Maksim Tsvetovat, 2019, Chapter 4.

Cliques, Clusters and Components. [Online] Available

at: https://www.oreilly.com/library/view/social-network-

analysis/9781449311377/ch04.html [Accessed 26 12

2019].

Alvarez-Hamelin, J. I., Dall'Asta, L., Barrat, A., &

Vespignani, A., 2005. k-core decomposition: A tool for

the visualization of largescale networks. arXiv preprint

cs/0504107.

Bhuyan, H. K., & Pani, S. K. (2021). Crime Predictive

Model Using Big Data Analytics. Intelligent Data

Analytics for Terror Threat Prediction: Architectures,

Methodologies, Techniques and Applications, 57-78.

Bonacich, P., 1972. Factoring and weighting approaches to

status scores and clique identification. Journal of

mathematical sociology, 2(1), 113-120.

Borgatti, S. P., 1995. Centrality and AIDS. Connections,

18(1), 112-114.

Chinese Judicial Big Data Research Institute, 2019. China

Justice Big Data Service Platform. [Online] Available

at: http://data.court.gov.cn/pages/index.html [Accessed

20 7 2019].

Colladon, A. F., & Remondi, E. (2017). Using social

network analysis to prevent money laundering. Expert

Systems with Applications, 67, 49-58.

Cox, M. J., DiBello, A. M., Meisel, M. K., Ott, M. Q.,

Kenney, S. R., Clark, M. A., & Barnett, N. P. (2019).

Do misperceptions of peer drinking influence personal

drinking behavior? Results from a complete social

network of first-year college students. Psychology of

addictive behaviors, 33(3), 297.

Dharwadker, A., 2006. The clique algorithm. Proceedings

of the Institute of Mathematics, 1-41.

Freeman, L. C., 1980. The gatekeeper, pair-dependency and

structural centrality. Quality and Quantity, 14(4), 585-

592.

Kumar, R., & Nagpal, B. (2019). Analysis and prediction

of crime patterns using big data. International Journal

of Information Technology, 11(4), 799-805.

Ma, Z., 2018. The Research of the Detection Difficulty and

Solution for New Internet Crimes Based on Telecom

Fraud. Journal of People’s Public Security University

of China (Social Sciences Edition), 3, pp. 78-86.

Marin, A., & Wellman, B., 2011. Social network analysis:

An introduction. The SAGE handbook of social

network analysis, 11.

Numbeo, 2021. Crime Index by Country 2021. Available

at: https://www.numbeo.com/crime/rankings_by_

country.jsp [Accessed 20 03 2021]

Ozgul, F., Bowerman, C., Erdem, Z. & Atzenbeck, C.,

2010. Comparison of feature-based criminal network

detection models with k-core and n-clique. 2010

International Conference on Advances in Social

Networks Analysis and Mining, pp. 400-401.

Scott, J., 1991. Social Network Analysis: A Handbook

SAGE Publications Ltd. London, UK.

Statista, 2021. Crime and punishment around the world -

Statistics & Facts. Available at:

https://www.statista.com/topics/780/crime/#dossierSu

mmary__chapter1 [Accessed 20 03 2021]

Stolz, S., & Schlereth, C. (2021). Predicting Tie Strength

with Ego Network Structures. Journal of Interactive

Marketing, 54, 40-52.

White, N. & Rosenfeld, R., 2019. KONET. [Online]

Available at: http://moreno.ss.uci.edu/data.html#crime

[Accessed 20 07 2019].

IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security

336

Xu, Z., Cheng, C., & Sugumaran, V. (2020). Big data

analytics of crime prevention and control based on

image processing upon cloud computing. Journal of

Surveillance, Security and Safety, 1(1), 16-33.

Zhang, J., 2019. Progress, Hot Spots and Frontiers in

Criminology Research 2018. Study on Crime and

Rehabilitation, 03, pp. 13-23.

Zhong, H., Zhang, H., Yin, D. & Shen, H., 2019. The rank

of suspects in the criminal gang based on Page Rank.

Journal of Guangxi Normal University (Natural

Science Edition), 7, pp. 79-86.

Zhou, Z. & Bao, L., 2014. Application of social network

analysis in the investigation of gang fraud crime.

Journal of Jiangxi Policy Institute, 5, pp. 39-44.

Homicide Network Detection based on Social Network Analysis

337