MATRIX: A Comprehensive Graph-Based Framework for Malware Analysis and Threat Research

Marco Simoni (1,3) and Andrea Saracino

1 Sapienza Università di Roma, Rome, Italy

2 Department of Excellence in AI and Robotics (DiPE), TeCIP Institute, Scuola Superiore Universitaria Sant’Anna, Pisa, Italy

3 Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy

Keywords:

Malware Analysis, Cyber Threat Intelligence, Knowledge Graph, Structured Threat Information Expression.

Abstract:

This paper presents MATRIX (Malware Analysis and Threat Research with STIX), a graph database for the comprehensive analysis and research of malware and threats. To provide a unified view of the threat landscape, MATRIX integrates data from major cybersecurity frameworks, including MITRE ATT&CK, D3FEND, CAPEC, Malware Behavior Catalog (MBC), Metasploit, Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE). Developed in Neo4j using the Structured Threat Information Expression (STIX™) standard, MATRIX includes more than 22,910 nodes and combines 14 STIX Domain Objects (SDOs) and 6 STIX Relationship Objects (SROs) to support detailed analysis of malware behavior, detection rules and defense strategies, making it a valuable tool for cybersecurity research. The system also integrates real-world malware reports and is automatically updated with data from sources such as VirusTotal, MalwareBazaar and VirusShare, supporting continuous and up-to-date threat analysis. We demonstrate its versatility through case studies comparing malware objectives and analyzing the impact of detection and mitigation.

1 INTRODUCTION

Cybersecurity research demands efficient methods to represent and analyze diverse data. Graph databases are increasingly adopted for their ability to model complex relationships (Reading, 2021), integrate alerts and logs from multiple tools (Neo4j, 2021) and reveal hidden patterns (Sheikhalishahi et al., 2022). Knowledge graphs, a form of graph database, map real-world entities and the relationships between them, supporting cyber threat intelligence (CTI) (Sikos, 2023; Bolton et al., 2023) and enabling real-time retrieval, especially in Retrieval Augmented Generation (RAG) systems (Lewis et al., 2020).

However, cybersecurity research still lacks unified models that combine disparate data and provide real-time updates, which are essential for dealing with evolving threats. Effective analysis of malware and threats requires an understanding of both the individual components and their interactions. For this reason, systems that logically and semantically combine different cybersecurity elements into a cohesive structure are essential to improve threat and malware analysis. Graphs can help researchers achieve this goal by connecting different components, such as vulnerabilities, exploits, malware and attack patterns, into a single, interconnected model. Such a system must also be updated constantly to reflect the ever-changing threat landscape.

Contribution. This paper presents MATRIX (Malware Analysis and Threat Research with STIX), a graph-based framework specifically designed for the comprehensive analysis of malware and threats. It integrates and links data from MITRE ATT&CK (Corporation, 2025b), D3FEND (Corporation, 2025d), CAPEC (Corporation, 2025a), Malware Behavior Catalog (MBC), Metasploit Framework (Project, 2025a; Rapid7, 2025), Common Vulnerabilities and Exposures (CVE), and Common Weakness Enumeration (CWE) to provide a comprehensive overview of the threat landscape. All data within MATRIX has been obtained through an extensive crawling process of the aforementioned sources. The structured information is collected and organized using the Structured Threat Information Expression (STIX™) (OASIS, 2020) standard, leveraging datasets from mitre/cti (Corporation, 2025c) and MBCProject (Project, 2025b) to ensure a detailed and consistent representation of malware and threats. The latest data crawling operation was conducted in January 2025, ensuring that MATRIX maintains an up-to-date and reliable knowledge base for cybersecurity analysis. MATRIX contains 14 different node types, called STIX Domain Objects (SDOs): Malware, Malware Behavior, Malware Objective, Malware Method, Indicator, Course of Action, Data Component, Data Sources, Tool, Intrusion Set, Campaign, Weaknesses, Vulnerabilities and Exploit. There are 6 different edge types, called STIX Relationship Objects (SROs): related-to, mitigates, uses, indicates, detects, exploits. By linking different cybersecurity elements, MATRIX helps to better understand the behavior of different malware and to improve detection, analysis and defense against complex threats, making it a valuable tool for cybersecurity research and intelligence. The main contributions of MATRIX are:

• We introduce a Neo4j-based graph that integrates mitre/cti, MBCProject, 14 SDOs, 6 SROs and rules from CAPA (Mandiant, 2025a) and Sigma (SigmaHQ, 2025). Malware reports from VirusTotal (VirusTotal, 2025) are linked via Elasticsearch (Elastic, 2025). All data and containers are publicly available.¹ ²

• The graph includes over 22,910 nodes (excluding vulnerabilities and exploits), making it roughly 5x the size of mitre/cti and 25x the size of MBCProject, and contains more than 10,000 real malware hashes.

• MATRIX is continuously updated with data from VirusTotal, MalwareBazaar (abuse.ch, 2025), and VirusShare (VirusShare.com, 2025), ensuring ongoing relevance and completeness.

• The graph enables detailed analyses of malware behavior, objectives, and defensive impact, with case studies such as rule-based comparison of objectives, impact evaluation of mitigations, behavior linking across malware families, and tactic-specific API analysis.

Simoni, M. and Saracino, A.
MATRIX: A Comprehensive Graph-Based Framework for Malware Analysis and Threat Research.
DOI: 10.5220/0013629300003979
In Proceedings of the 22nd International Conference on Security and Cryptography (SECRYPT 2025), pages 495-502
ISBN: 978-989-758-760-3; ISSN: 2184-7711

Paper Organisation. The paper is organized as follows: Section 2 covers background on graph databases, CTI, and STIX. Section 3 describes the MATRIX architecture. Section 4 showcases example analyses. Section 5 reviews related work in cybersecurity knowledge graphs. Section 6 concludes with future directions.

2 BACKGROUND

Graph Databases and Knowledge Graphs. A graph database is a NoSQL model optimized for managing complex, often directed, relationships via graph structures. A knowledge graph extends this by defining a labeled, directed graph G = (V, E, L), where V is the set of entities, L is the set of relation labels, and the labeled edges E ⊆ V × V × L represent typed relationships between entities.

¹ https://github.com/MATRIX-Malware-Analysis/MATRIX/

² https://hub.docker.com/r/matrixmalware/matrix
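To make the definition of G = (V, E, L) concrete, the following minimal Python sketch stores a knowledge graph as a set of (head, tail, label) triples and checks the constraint E ⊆ V × V × L explicitly. The entity identifiers are hypothetical and are not MATRIX node IDs; the class is an illustration of the definition, not of how Neo4j stores data.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Labeled, directed graph G = (V, E, L) with E ⊆ V × V × L."""

    vertices: set = field(default_factory=set)  # V: entities
    labels: set = field(default_factory=set)    # L: relation labels
    edges: set = field(default_factory=set)     # E: (head, tail, label) triples

    def add_edge(self, head: str, tail: str, label: str) -> None:
        # Registering both endpoints and the label keeps E ⊆ V × V × L true.
        self.vertices.update((head, tail))
        self.labels.add(label)
        self.edges.add((head, tail, label))

    def is_well_formed(self) -> bool:
        # Check the defining constraint explicitly for every edge.
        return all(
            h in self.vertices and t in self.vertices and lab in self.labels
            for h, t, lab in self.edges
        )

    def targets(self, head: str, label: str) -> set:
        # Entities reachable from `head` through one edge carrying `label`.
        return {t for h, t, lab in self.edges if h == head and lab == label}


# Hypothetical entities, for illustration only.
kg = KnowledgeGraph()
kg.add_edge("malware:ExampleRAT", "technique:T1082", "uses")
kg.add_edge("malware:ExampleRAT", "technique:T1105", "uses")
kg.add_edge("indicator:sha256-example", "malware:ExampleRAT", "indicates")
print(sorted(kg.targets("malware:ExampleRAT", "uses")))  # ['technique:T1082', 'technique:T1105']
```

Because edges carry their label as a third component, the same pair of entities can be linked by several typed relationships, which is what distinguishes a knowledge graph from a plain directed graph.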

Cyber Threat Intelligence (CTI). CTI provides actionable insights on threats, enabling organizations to enhance defenses. It operates at tactical (immediate threats), operational (actors/campaigns), and strategic (long-term planning) levels. Effective CTI must be complete, accurate, relevant, and timely.

Structured Threat Information Expression (STIX). STIX is a standardized format for sharing machine-readable CTI. STIX 2.1 models threat data as a graph, using STIX Domain Objects (SDOs) as nodes and STIX Relationship Objects (SROs) as edges. It supports objects like Malware, Indicator, and Threat Actor, connected via predefined or custom relationships (e.g., indicates).
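STIX 2.1 objects are JSON documents, and a relationship is itself an object whose `source_ref` and `target_ref` point at other objects' identifiers. The Python sketch below builds a Malware SDO, an Indicator SDO and an `indicates` SRO as plain dictionaries (rather than with a dedicated STIX library). The property set is a minimal one chosen for illustration and is not taken from MATRIX; the hash is a placeholder.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_id(object_type: str) -> str:
    # STIX 2.1 identifiers have the form "<object-type>--<UUID>".
    return f"{object_type}--{uuid.uuid4()}"


def now_utc() -> str:
    # UTC timestamp with millisecond precision and a trailing "Z".
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def stix_object(object_type: str, **properties) -> dict:
    # Common properties shared by SDOs and SROs, plus type-specific ones.
    created = now_utc()
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": stix_id(object_type),
        "created": created,
        "modified": created,
        **properties,
    }


# Placeholder hash; not a sample from the MATRIX dataset.
SAMPLE_SHA256 = "0" * 64

malware = stix_object("malware", name="Emotet", is_family=True)
indicator = stix_object(
    "indicator",
    pattern=f"[file:hashes.'SHA-256' = '{SAMPLE_SHA256}']",
    pattern_type="stix",
    valid_from=now_utc(),
)
# SRO: the indicator "indicates" the malware (source_ref -> target_ref).
relationship = stix_object(
    "relationship",
    relationship_type="indicates",
    source_ref=indicator["id"],
    target_ref=malware["id"],
)
bundle = {
    "type": "bundle",
    "id": stix_id("bundle"),
    "objects": [malware, indicator, relationship],
}
print(json.dumps(bundle, indent=2))
```

Loading such a bundle into a graph amounts to turning every SDO into a node and every SRO into an edge from `source_ref` to `target_ref`.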

3 MATRIX ARCHITECTURE

The MATRIX architecture, shown in Fig. 1, was built using the Neo4j framework to organize and connect the key elements of malware and threat analysis. Most of the components are based on the MITRE ATT&CK framework, but to provide a more complete view of the threat landscape, we have also integrated data from the Malware Behavior Catalog (MBC), which focuses specifically on malware objectives and behaviors. mitre/cti and MBCProject are two of the most important STIX collections used in cybersecurity. In MATRIX, all nodes and relationships are based on the STIX objects from the mitre/cti dataset, with the exception of Malware Behavior, Malware Objective and Malware Method (in blue in Fig. 1), which follow the format of MBCProject. This ensures that our graph conforms to the MBC STIX format and provides a more detailed and consistent approach to analyzing malware. The nodes Weaknesses, Vulnerabilities, and Exploit are not included in either of the two standard collections; the node Weaknesses is a new SDO that we have specifically defined. In addition, the graph is kept up to date through continuous, automatic updates and becomes more comprehensive over time, so that it always reflects the latest threat data. Below is a breakdown of the SDOs of the graph.

Figure 1: MATRIX Architecture Overview.

Malware includes definitions from MITRE ATT&CK and MBC, providing details such as aliases, descriptions, and external references. Malware Behavior captures the actions of malware using techniques from MITRE ATT&CK, MBC (with prefixes T, B, E, C, F), and CAPEC. These behaviors are associated with CAPA and Sigma rules through the detection rules field and are modeled as Attack Pattern objects in STIX. Malware Objective represents the high-level intent behind malware behaviors, derived from ATT&CK tactics and expanded with objectives from MBC. Malware Method refers to how behaviors are executed, often represented as sub-techniques or specific implementations. These methods are always associated with behaviors and cannot exist independently. Indicators include over 10,000 malware hashes and YARA rules from 269 families, automatically collected from sources like MalwareBazaar and VirusShare. Corresponding reports from VirusTotal are stored in an Elasticsearch database for analysis, with regular updates ensuring the dataset remains current. Course of Action nodes represent defensive strategies derived from MITRE ATT&CK Mitigations and the D3FEND framework, providing guidance on how to prevent or reduce the impact of threats. Intrusion Sets describe groups of threat actors (referred to as Groups in ATT&CK) that operate over time to conduct campaigns or coordinated attacks. Campaigns are coordinated sets of malicious activities carried out by an intrusion set, usually targeting specific sectors or organizations. Data Sources represent broader categories of information, such as logs or telemetry, that are relevant to identifying ATT&CK techniques. Data Components, on the other hand, are the specific elements or system events (such as API calls or process creations) that allow the detection of malicious behaviors. Tools are legitimate software applications that can be leveraged by attackers; analyzing their usage helps in profiling threat actor tactics and understanding how campaigns are executed. Weaknesses refer to software or hardware flaws identified in the CWE catalog, which may expose systems to potential risks. Vulnerabilities correspond to publicly disclosed issues listed in the CVE database, each describing a specific flaw that can be exploited to compromise a system. Finally, Exploits are modules from the Metasploit Framework designed to target known CVEs, used to simulate or conduct real-world attacks.
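The node and edge types above form a typed schema. A minimal Python sketch of how such a schema could be enforced before edges are loaded into the graph follows. The six SRO names come from the text; the specific (source, relationship, target) combinations are hypothetical, chosen for illustration, since the full schema appears only in Fig. 1.

```python
# The six SRO types used by MATRIX, as listed in Section 1.
SRO_TYPES = {"related-to", "mitigates", "uses", "indicates", "detects", "exploits"}

# Hypothetical subset of (source SDO, SRO, target SDO) combinations, assumed
# for illustration only; the authoritative schema is the one drawn in Fig. 1.
ALLOWED_EDGES = {
    ("Malware", "uses", "Malware Behavior"),
    ("Intrusion Set", "uses", "Malware"),
    ("Campaign", "uses", "Malware Behavior"),
    ("Indicator", "indicates", "Malware"),
    ("Course of Action", "mitigates", "Malware Behavior"),
    ("Data Component", "detects", "Malware Behavior"),
    ("Exploit", "exploits", "Vulnerabilities"),
    ("Malware Method", "related-to", "Malware Behavior"),
    ("Malware Behavior", "related-to", "Malware Objective"),
}


def is_valid_edge(source_type: str, relationship: str, target_type: str) -> bool:
    """Accept an edge only if its SRO type is known and the triple is allowed."""
    return relationship in SRO_TYPES and (
        (source_type, relationship, target_type) in ALLOWED_EDGES
    )


def invalid_edges(edges):
    """Return the edges that violate the (assumed) schema, preserving order."""
    return [edge for edge in edges if not is_valid_edge(*edge)]


candidate_edges = [
    ("Indicator", "indicates", "Malware"),
    ("Course of Action", "mitigates", "Malware Behavior"),
    ("Malware", "mitigates", "Indicator"),  # wrong source/target for this SRO
]
print(invalid_edges(candidate_edges))  # [('Malware', 'mitigates', 'Indicator')]
```

Keeping the allowed triples in one table makes it easy to extend the schema, for example when a new SDO such as Weaknesses is introduced.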

Table 1: MATRIX Nodes and Relationships Summary.

| Node | Size | Number of Nodes |
|---|---|---|
| Campaign | 120 KB | 28 |
| Course of Action | 25 MB | 6,029 |
| Data Component | 452 KB | 109 |
| Data Source | 156 KB | 38 |
| Exploit | 38 MB | 4,531 |
| Indicator | 54 MB | 13,407 |
| Intrusion Set | 860 KB | 163 |
| Malware | 3.4 MB | 829 |
| Malware Behavior | 20 MB | 1,705 |
| Malware Method | 2.0 MB | 482 |
| Malware Objective | 144 KB | 35 |
| Tool | 352 KB | 85 |
| Vulnerabilities | 968 MB | 231,315 |
| Weaknesses | 5.1 MB | 964 |

| Relationship Type | Size | Count |
|---|---|---|
| MATRIX Relationships | 351 MB | 87,642 |

Table 2: Dataset Comparison.

| Dataset | Nodes | Relationships | Malware | Indicators |
|---|---|---|---|---|
| mitre/cti | 4,237 | 22,259 | 734 | - |
| MBCProject | 892 | 1,015 | 50 | 183 |

Table 1 outlines the MATRIX dataset structure, including over 230K Vulnerabilities (968 MB), 13K Indicators (54 MB), and 6K Courses of Action (25 MB), along with Malware, Tools, and Weaknesses. Behavioral data is captured through nodes like Malware Behavior (1.7K nodes, 20 MB) and Malware Method. The graph comprises 87,642 relationships (351 MB). As shown in Table 2, with over 22,910 nodes (excluding vulnerabilities and exploits), MATRIX has about 5.4 times as many nodes as mitre/cti (541%) and about 25.7 times as many as MBCProject (2568%), and it includes over 13,000 real-world indicators and malware reports.
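The size ratios quoted above can be recomputed from the two tables. In the sketch below, summing the Table 1 counts for every node type except Vulnerabilities, Exploit and Weaknesses gives exactly 22,910; adding the 964 Weakness nodes gives 23,874, which is consistent with the "over 22,910" figure. Dividing 22,910 by the Table 2 node counts reproduces the 541% and 2568% values as size ratios.

```python
# Node counts per type, copied from Table 1.
NODE_COUNTS = {
    "Campaign": 28, "Course of Action": 6029, "Data Component": 109,
    "Data Source": 38, "Exploit": 4531, "Indicator": 13407,
    "Intrusion Set": 163, "Malware": 829, "Malware Behavior": 1705,
    "Malware Method": 482, "Malware Objective": 35, "Tool": 85,
    "Vulnerabilities": 231315, "Weaknesses": 964,
}
# Node counts of the reference collections, copied from Table 2.
BASELINE_NODES = {"mitre/cti": 4237, "MBCProject": 892}


def node_total(excluded: set) -> int:
    """Sum the Table 1 counts, skipping the excluded node types."""
    return sum(n for node_type, n in NODE_COUNTS.items() if node_type not in excluded)


core = node_total({"Vulnerabilities", "Exploit", "Weaknesses"})
with_weaknesses = node_total({"Vulnerabilities", "Exploit"})
print(core, with_weaknesses)  # 22910 23874

for name, n in BASELINE_NODES.items():
    ratio = core / n
    print(f"{name}: {ratio:.2f}x the nodes ({100 * ratio:.0f}%)")
# mitre/cti: 5.41x the nodes (541%)
# MBCProject: 25.68x the nodes (2568%)
```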

4 APPLICATION OF MATRIX TO THREAT AND MALWARE ANALYSIS

We present 7 examples of analytical insights that can be derived from the graph, along with the time required to compute them. The possible analyses are not limited to those shown here. For example, while we often study Malware Objectives, similar analyses can also be performed on Malware Behaviors, which are more complex to visualize. More examples can be found in the project repository.³ The first five examples were derived from the graph using information from MITRE, which would otherwise be difficult to recognize without putting this information into a graph. The last two examples, however, depend on the number of real hashes collected: the more malware samples are collected, the more accurate the analysis will be.

Comparison of Malware Objectives Based on API and String Correlation. Fig. 2 compares the objectives of 60 Ransomware families, 38 Spyware families, and 112 Trojan families based on APIs and strings extracted from detection rules in Malware Behavior nodes. The plots were created by linking the behaviors of each malware to its objectives and measuring the correlation between APIs, strings and objectives. The distribution shows how often APIs or strings are correlated with an objective, while the entropy (Morato et al., 2018) indicates the variety of APIs or strings used to reach each objective. Fig. 2a shows that all malware types correlate strongly with the objectives Discovery and Defense Evasion based on APIs. Ransomware and Trojans have a low correlation with Impact and Persistence, while spyware has none. Trojans are the only type that shows a (low) correlation with Credential Access. Fig. 2b shows higher API entropy for Discovery and Defense Evasion for ransomware and Trojans, while spyware uses more predictable APIs. Fig. 2c shows that all malware types correlate strongly with Credential Access based on strings, but Trojans show no correlation with Impact and Persistence. Finally, Fig. 2d shows that ransomware and Trojans use more distinct strings for Discovery and Defense Evasion, while spyware is more predictable across all objectives. The time required to obtain these results is approximately 0.06 s.
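The exact formulas behind Fig. 2 are not spelled out in the text. The Python sketch below shows one plausible reading, stated as an assumption: the distribution is the share of unique (objective, API) pairs falling under each objective, and the entropy is the Shannon entropy of the API frequencies within each objective. The input pairs are hypothetical; in practice they would come from following Malware to Malware Behavior to Malware Objective edges and reading the APIs named in each behavior's detection rules. The same functions apply unchanged to strings.

```python
import math
from collections import Counter, defaultdict


def objective_distribution(pairs):
    """Share (%) of unique (objective, API) pairs that fall under each objective."""
    per_objective = Counter(objective for objective, _api in set(pairs))
    total = sum(per_objective.values())
    return {obj: 100.0 * n / total for obj, n in per_objective.items()}


def objective_entropy(pairs):
    """Shannon entropy (bits) of the API frequencies observed for each objective."""
    api_counts = defaultdict(Counter)
    for objective, api in pairs:
        api_counts[objective][api] += 1
    entropy = {}
    for objective, counts in api_counts.items():
        n = sum(counts.values())
        entropy[objective] = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return entropy


# Hypothetical (objective, API) pairs for one malware type, one pair per
# family that uses the API for that objective.
pairs = [
    ("Discovery", "GetSystemInfo"), ("Discovery", "GetComputerNameA"),
    ("Discovery", "GetSystemInfo"), ("Discovery", "GetUserNameA"),
    ("Defense Evasion", "VirtualProtect"), ("Defense Evasion", "VirtualProtect"),
]
print(objective_distribution(pairs))  # values: Discovery -> 75.0, Defense Evasion -> 25.0
print(objective_entropy(pairs))       # values: Discovery -> 1.5, Defense Evasion -> 0.0
```

Under this reading, an objective reached through many different APIs gets a high entropy, while an objective always reached through the same API gets zero, matching the "more predictable APIs" interpretation used above.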

Analyzing Data Component Impact on Malware Categories. Figure 3 shows the impact of various common Data Components on 60 Ransomware families, 38 Spyware families, 112 Trojan families and 15 Worm families. The impact of a Data Component on a malware type is the frequency with which that Data Component contributes to the detection of the type, relative to the total contributions of all Data Components that affect it. Process Creation and Command Execution are very influential for all malware types, especially spyware, suggesting that system-level behaviors are important detection indicators. Spyware also shows a greater reliance on OS API Execution, Script Execution and File Access, reflecting the use of commands and file operations for malicious purposes. Trojans also depend on OS API Execution, but additionally on Network Traffic Flow and Connection Metadata, so network monitoring is crucial for their detection. For both worms and ransomware, Service Metadata and Windows Registry Key Modification are relevant, probably because these malware types persist in the system and maintain control by changing critical settings. On the other hand, other data components such as Process Modification and File Creation have minimal impact on all malware types, making these behaviors less important for detecting these specific threats. The time required to obtain these results is approximately 0.3 ms.

³ https://github.com/MATRIX-Malware-Analysis/MATRIX/tree/main/EXAMPLES
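A minimal sketch of this impact measure follows, under the assumption that a Data Component contributes to detecting a malware type whenever it detects a behavior used by a family of that type. The edges, family names and technique IDs are hypothetical; in MATRIX they would come from the `detects` and `uses` relationships.

```python
from collections import Counter, defaultdict


def data_component_impact(detects, uses, malware_type):
    """Impact (%) of each Data Component per malware category.

    detects:      iterable of (data_component, behavior) edges
    uses:         iterable of (malware, behavior) edges
    malware_type: mapping from malware family to category, e.g. "Trojan"
    """
    detectors = defaultdict(set)
    for component, behavior in detects:
        detectors[behavior].add(component)

    # Count every (family, behavior, data component) path per category.
    counts = defaultdict(Counter)
    for malware, behavior in uses:
        for component in detectors.get(behavior, ()):
            counts[malware_type[malware]][component] += 1

    # Normalize by the total contributions within each category.
    impact = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        impact[category] = {c: 100.0 * n / total for c, n in counter.items()}
    return impact


# Hypothetical edges for two families.
detects = [
    ("Process Creation", "T1059"),
    ("Command Execution", "T1059"),
    ("Network Traffic Flow", "T1071"),
]
uses = [("trojan_a", "T1059"), ("trojan_a", "T1071"), ("spyware_b", "T1059")]
malware_type = {"trojan_a": "Trojan", "spyware_b": "Spyware"}

impact = data_component_impact(detects, uses, malware_type)
print(sorted(impact["Spyware"].items()))  # [('Command Execution', 50.0), ('Process Creation', 50.0)]
```

Because the impact is normalized within each category, the values are comparable as shares across malware types even when the types have very different numbers of families.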

Prioritizing Mitigation Techniques Based on Malware Type. Figure 4 shows the impact of different mitigation strategies (Course of Action) on 60 Ransomware families, 38 Spyware families, 112 Trojan families and 15 Worm families. The impact is based on how often each strategy defends against a malware type, relative to the total number of Courses of Action that affect that malware type. For spyware, the most impactful strategies are Data Loss Prevention and User Training, underlining the importance of educating users and preventing data exfiltration. Against Trojans, Execution Prevention and Behavior Prevention on Endpoint are particularly effective, highlighting the need to block unauthorized execution and monitor malicious behavior. Ransomware is primarily influenced by User Account Management and Restrict File and Directory Permissions, which emphasizes the importance of managing access rights. Worms are likewise strongly influenced by User Account Management and Restrict File and Directory Permissions, showing that restrictions and user permission management are important strategies for them as well. Strategies such as Limit Access to Resources Over Network and Account Use Policies play a smaller role in containing all malware types. The time required to obtain these results is approximately 0.04 s.
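The mitigation ranking can be read from the graph by following paths from Course of Action through `mitigates` to Malware Behavior and back through `uses` to Malware. The sketch below assumes that reading, normalizes counts per malware type in the same share-of-total way described for Data Components, and adds a deterministic top-k ranking. All edges and family names are hypothetical; T1486 and T1222 serve only as example technique IDs.

```python
from collections import Counter, defaultdict


def rank_mitigations(mitigates, uses, malware_type, top_k=3):
    """Top-k Courses of Action per malware category, with their impact (%).

    mitigates:    iterable of (course_of_action, behavior) edges
    uses:         iterable of (malware, behavior) edges
    malware_type: mapping from malware family to category
    """
    mitigations_for = defaultdict(set)
    for coa, behavior in mitigates:
        mitigations_for[behavior].add(coa)

    counts = defaultdict(Counter)
    for malware, behavior in uses:
        for coa in mitigations_for.get(behavior, ()):
            counts[malware_type[malware]][coa] += 1

    ranking = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        # Sort by count (descending), then by name so ties are deterministic.
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        ranking[category] = [(coa, 100.0 * n / total) for coa, n in ordered[:top_k]]
    return ranking


# Hypothetical edges for two ransomware families.
mitigates = [
    ("User Account Management", "T1486"),
    ("Data Backup", "T1486"),
    ("User Account Management", "T1222"),
    ("Restrict File and Directory Permissions", "T1222"),
]
uses = [("ransom_a", "T1486"), ("ransom_a", "T1222"), ("ransom_b", "T1486")]
malware_type = {"ransom_a": "Ransomware", "ransom_b": "Ransomware"}
print(rank_mitigations(mitigates, uses, malware_type, top_k=2))
```

A mitigation that covers behaviors shared by many families of a type rises to the top, which is the intuition behind prioritizing strategies per malware type.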

Figure 2: Comparison of percentage distribution and entropy of objectives across Ransomware, Spyware, and Trojan based on unique APIs, (2a) and (2b), and Strings, (2c) and (2d). [Panels: (a) percentage distribution of Objectives based on unique APIs; (b) entropy of Objectives based on unique APIs; (c) percentage distribution of Objectives based on unique Strings; (d) entropy of Objectives based on unique Strings. Objectives: Discovery, Defense Evasion, Collection, Execution, Privilege Escalation, Persistence, Impact, Credential Access.]

Figure 3: Impact of various common Data Components on Ransomware, Spyware, Trojan and Worm. [Axis: Impact (%). Data Components: Process Creation, Command Execution, OS API Execution, File Modification, Windows Registry Key Modification, Service Metadata, Script Execution, Network Traffic Flow, File Access, Network Connection Creation, Network Traffic Content, Process Termination, File Metadata, Process Metadata, File Deletion, Cloud Storage Deletion, Snapshot Deletion, Process Access, Service Creation, Windows Registry Key Creation, WMI Creation, Scheduled Job Modification, Windows Registry Key Access, Application Log Content, Windows Registry Key Deletion, Module Load, File Creation, Active Directory Object Access, Host Status, Process Modification.]

How Key Techniques Connect and Support Diverse Malware Types. Measuring Betweenness Centrality (Pontecorvi and Ramachandran, 2015) allows us to identify which Techniques play a crucial role in connecting different malware categories. The objective is to pinpoint key actions that multiple malware families depend on for propagation, persistence, or execution. By targeting techniques with high betweenness centrality, defense strategies can disrupt multiple malware types simultaneously. As illustrated in Figure 6, T1082 (System Information Discovery) exhibits the highest centrality for Ransomware-RAT and RAT-Spyware connections, a finding consistent with reports from (Security, 2025; Mandiant, 2025b). Similarly, T1105 (Ingress Tool Transfer) is critical for Backdoor-Ransomware and Backdoor-Worm relationships, as confirmed by (Canary, 2025; for Threat-Informed Defense, 2025; MITRE, 2025). Additionally, T1140 (Deobfuscate/Decode Files or Information) serves as a central technique linking Backdoor-Spyware and RAT-Spyware, corroborated by findings in (Mandiant, 2025b). T1059 (Command and Scripting Interpreter), ranked first in the top-10 list by (MITRE, 2025), exhibits the highest centrality for Ransomware-RAT and Backdoor-Ransomware.

Overall, techniques characterized by high interdependence, such as deobfuscation and system discovery, act as essential links between different malware categories. This makes them prime targets for disrupting malware operations and strengthening cybersecurity defenses. The computational time required to obtain these results is 5.9 seconds.
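The paper cites a fully dynamic betweenness algorithm (Pontecorvi and Ramachandran, 2015). As a simpler, static stand-in, the Python sketch below implements Brandes' algorithm for unweighted, undirected graphs and runs it on a hypothetical graph linking malware categories to the techniques they use. How MATRIX builds the graph behind Fig. 6 is not described here, so this construction is an assumption.

```python
from collections import deque


def build_undirected(edges):
    """Adjacency sets for an undirected graph given (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    Returns unnormalized scores in which every unordered pair of
    endpoints is counted once.
    """
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Phase 1: BFS from `source`, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: s / 2.0 for v, s in score.items()}


# Hypothetical category-technique graph: three categories share T1082,
# and only Ransomware uses T1486.
graph = build_undirected([
    ("Ransomware", "T1082"), ("RAT", "T1082"), ("Spyware", "T1082"),
    ("Ransomware", "T1486"),
])
print(sorted(betweenness_centrality(graph).items(), key=lambda kv: -kv[1]))
```

In this toy graph T1082 scores 5.0 because every shortest path between the three categories, and from T1486 to RAT or Spyware, passes through it; this is the sense in which a shared technique "links" malware categories.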

Figure 4: Impact of different mitigation strategies (Course of Action) on Ransomware, Spyware, Trojan and Worm. [Axis: Impact (%). Mitigations: User Account Management, Restrict File and Directory Permissions, Execution Prevention, Network Segmentation, Restrict Registry Permissions, Operating System Configuration, Data Backup, Behavior Prevention on Endpoint, Privileged Account Management, User Training, Code Signing, Network Intrusion Prevention, Data Loss Prevention, Antivirus/Antimalware, Filter Network Traffic, Update Software, Encrypt Sensitive Information, Disable or Remove Feature or Program, Audit, Restrict Web-Based Content, Exploit Protection, Application Isolation and Sandboxing, Multi-factor Authentication, Password Policies, Boot Integrity, Application Developer Guidance, Vulnerability Scanning, Software Configuration, Limit Hardware Installation, Account Use Policies.]

Evaluating the Importance of Malware Techniques Using PageRank Analysis. Figure 5 shows the PageRank (Gleich, 2015) values for the techniques used by 60 Ransomware families, 38 Spyware families, 112 Trojan families, 15 Worm families, 15 RAT families and 208 Backdoor families. A higher PageRank indicates that a particular technique plays a greater role in the malware's operations or is used more frequently. Worms have consistently high PageRank values for several key techniques, such as File and Directory Discovery (consistent with MITRE, 2025) and Native API, which are essential for their distribution and operation. This means that worms rely on a few key actions, making these techniques prime targets for disrupting their spread. Backdoors rely on techniques such as Ingress Tool Transfer, Web Protocols (also very important for spyware) and Windows Command Shell, as confirmed by (Canary, 2025), to gain unauthorized access and establish themselves in a system. Ransomware, in particular, relies on techniques such as Inhibit System Recovery (also reported by MITRE, 2025) and Native API to disable restore options and manipulate system-level functions, making it more difficult for users to restore their system. In contrast, RAT and spyware have lower PageRank scores, suggesting that they use a wider range of techniques without relying heavily on a single action. Even though these malware types spread their operations over several techniques, a broad range of defense strategies can still help mitigate their impact. The time required to obtain these results is approximately 0.15 s.
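PageRank itself is standard, but how MATRIX orients and weights the edges is not specified here. The sketch below assumes unweighted edges from malware families to the techniques they use, so that techniques used by many families accumulate rank, and uses the common damping factor 0.85 (an assumption, not a value from the paper). Dangling nodes, which here include every technique, redistribute their rank uniformly.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration.

    out_links maps each node to the list of nodes it points to. Nodes with no
    out-links (dangling nodes) spread their rank uniformly over all nodes.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(nodes, base)
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical family -> technique edges for two worm families.
edges = {
    "worm_a": ["T1083", "T1106"],
    "worm_b": ["T1083"],
}
scores = pagerank(edges)
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.4f}")
```

With this orientation, a technique shared by both families (T1083) outranks one used by a single family (T1106), mirroring how a few heavily reused techniques dominate the worm scores in Fig. 5.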

Behavioral and Technical Similarities Between Two Emotet Samples. Table 3 summarises the common API calls, registry keys, loaded modules and MITRE ATT&CK techniques observed in two Emotet malware samples: 2fd433c3ff68507ddbf0ec3e90a6320b35b44c8089504403c457bc9819190a0a and 214946b987ad69fa46f1d27ab35026b856a4fcd2abd46b0b5ba86dc71be58d89. The data was extracted from CAPE sandbox reports of real malware samples from VirusShare. The Jaccard similarity (Fender et al., 2017) score of 0.97 indicates a very high similarity between the two malware samples based on their common characteristics. The listed API calls, such as VirtualProtect, GetCPInfo and CloseHandle, show that both samples perform similar activities, including process control, memory management and system information retrieval. These are typical actions used by Emotet to achieve persistence and execute its malicious operations. MITRE techniques used by both samples include Credential Dumping, Virtualization/Sandbox Evasion and Impair Defenses, suggesting a strong focus on defense evasion and credential theft. This reflects the typical behaviour of Emotet, which is known for its ability to bypass security measures and collect sensitive information from infected systems. The loaded modules, including BCRYPT.DLL and WININET.DLL, also confirmed by (Shaddy43, 2025), indicate that both samples use the same Windows libraries for cryptographic functions, Internet communication and shell operations. The time required to obtain these results is 0.2 s.
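A minimal sketch of the similarity computation, assuming each sample is reduced to a set of features (API calls, registry keys, loaded modules and techniques), each prefixed by its kind so that values of different kinds never collide. The exact feature set behind the 0.97 score is not specified, so the reports below are hypothetical and much smaller than real CAPE output.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def features(report: dict) -> set:
    """Flatten a report into prefixed features, e.g. "api:VirtualProtect"."""
    return {f"{kind}:{value}" for kind, values in report.items() for value in values}


# Hypothetical, heavily truncated reports.
sample_a = {"api": ["VirtualProtect", "GetCPInfo", "CloseHandle"],
            "module": ["BCRYPT.DLL"]}
sample_b = {"api": ["VirtualProtect", "GetCPInfo"],
            "module": ["BCRYPT.DLL", "WININET.DLL"]}
print(jaccard(features(sample_a), features(sample_b)))  # 0.6
```

Here the two samples share 3 of the 5 distinct features, giving 0.6; a score of 0.97 means that almost every observed feature is common to both samples.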

Figure 5: PageRank values for Techniques used by Ransomware, Spyware, Backdoor, RAT and Worm. [Axis: PageRank (×10⁻²). Techniques: File and Directory Discovery, Ingress Tool Transfer, LSASS Memory, Service Execution, Inhibit System Recovery, Remote System Discovery, Exploitation of Remote Services, Replication Through Removable Media, Clear Windows Event Logs, Web Protocols, Archive via Custom Method, Proxy: Multi-hop, Windows Management Instrumentation, Network Share Discovery, System Binary Proxy Execution: Rundll32, Asymmetric Cryptography, Network Service Discovery, Windows Command Shell, Component Object Model, Unix Shell, Masquerading, Native API, Domain Generation Algorithms, Code Signing, Credentials from Web Browsers, Peripheral Device Discovery, Account Discovery: Local Account, Hidden Files and Directories, Scheduled Task, Binary Padding.]

Figure 6: Betweenness Centrality of Behaviors across Malware Categories. [Axis: Betweenness. Techniques: T1082, T1083, T1059/003, T1105, T1140, T1106, T1071/001, T1070/004, T1057, T1655/001. Category pairs: Ransomware-RAT, Backdoor-Ransomware, Ransomware-Worm, RAT-Spyware, Backdoor-Spyware, Backdoor-Worm.]

Table 3: Summary of common API calls, MITRE techniques and loaded modules shared by the two Emotet samples.

| Category | Values |
|---|---|
| API Calls | VirtualProtect, CryptStringToBinaryA, GetCurrentHwProfileA, CloseHandle, TlsGetValue, EnterCriticalSection, GetLastError, srand, IsValidCodePage, GlobalLock, VirtualAlloc, CreateToolhelp32Snapshot, GetCurrentThreadId, CoCreateInstance, GetCPInfo, HeapAlloc, LeaveCriticalSection, InterlockedDecrement, GetProcessHeap, GetVersionExA, Process32Next, RegOpenKeyExA, GetModuleHandleW, ... |
| MITRE Techniques | T1503, T1497.001, T1003, T1552.001, T1005, T1562, T1081, T1071, T1032, T1555, T1106, T1497, T1562.001, T1071.001, T1012, T1552, T1082, T1089, T1057, T1555.003 |
| Modules Loaded | BCRYPT.DLL, NTMARTA.DLL, WINHTTP.DLL, SHELL32.DLL, GDIPLUS.DLL, CRYPT32.DLL, NTDLL.DLL, SECHOST.DLL, WININET.DLL, WS2_32.DLL, CRYPTBASE.DLL, CFGMGR32.DLL, OLE32.DLL, ADVAPI32.DLL, GDI32.DLL, RPCRT4.DLL, SHLWAPI.DLL, RSTRTMGR.DLL, USER32.DLL, MSVCR100.DLL, NSI.DLL, KERNEL32.DLL, ... |

5 RELATED WORK

Knowledge graphs (KGs) have become crucial in cybersecurity for representing and analyzing complex, multi-source data. Sikos (Sikos, 2023) emphasizes their role in enhancing cyber situational awareness and supporting machine learning. Li et al. (Li et al., 2024) focus on KG construction and quality evaluation to improve cybersecurity analysis, while Li et al. (Li et al., 2023a) and (Li et al., 2023b) propose methods integrating KGs and pre-trained models for cyber threat intelligence extraction and automation. Bolton et al. (Bolton et al., 2023) explore ATT&CK-based threat mapping, and Wang et al. (Wang et al., 2021) demonstrate how graph databases capture attack behaviors to improve 6G network security. Ren et al. (Ren et al., 2022) present CSKG4APT, combining KGs and deep learning for APT tracking and proactive defense. Liu et al. (Liu et al., 2020) design an ontology for network security based on STIX, enhancing attack representation and CTI sharing. Chen et al. (Chen et al., 2024) improve IoC management on OpenCTI, achieving a 25.18% increase in confidence scoring. Bhalekar and Saini (Bhalekar and Saini, 2024) and Habaybeh and Marshall (Habaybeh and Marshall, n.d.) highlight the use of graph databases for cybersecurity data analysis and legal assessments. Unlike these approaches, MATRIX aggregates malware, threat, and vulnerability data from multiple sources into a unified and extensible framework, improving research and advanced analysis capabilities.

6 CONCLUSION AND FUTURE WORK

We presented MATRIX, a unified graph-based framework for malware and threat analysis. Built on STIX 2.1 and integrating data from seven cybersecurity frameworks (MITRE ATT&CK, MBCProject, CAPEC, D3FEND, etc.), MATRIX provides a semantically consistent view of the threat landscape. MATRIX is over 5x larger than mitre/cti and 25x larger than MBCProject, and includes 10,000+ real malware samples from VirusTotal, MalwareBazaar, and VirusShare, linked to detection rules, behaviors, and objectives for in-depth analysis. We showcased MATRIX's capabilities via case studies on malware behavior correlations, mitigation impacts, and technique influence across families. The system is designed for continuous updates and future expansion. Upcoming work will focus on integrating MATRIX into RAG systems to support real-time analysis.


ACKNOWLEDGMENTS

This work was partly supported by the HORIZON Europe Framework Programme by the MUR PRIN-2022-PNRR ASSISTANTS (P2022WEAH7) project funded under the EU RESTART program.

REFERENCES

abuse.ch (2025). MalwareBazaar - malware samples and feeds.

Bhalekar, P. M. and Saini, J. R. (2024). Comprehensive exploration of the role of graph databases like Neo4j in cyber security. In 2024 International Conference on Emerging Smart Computing and Informatics (ESCI), pages 1–4. IEEE.

Bolton, J., Elluri, L., and Joshi, K. P. (2023). An overview of cybersecurity knowledge graphs mapped to the MITRE ATT&CK framework domains. In 2023 IEEE International Conference on Intelligence and Security Informatics (ISI), pages 01–06. IEEE.

Canary, R. (2025). Threat detection report - top ATT&CK techniques.

Chen, S., Hwang, R., Ali, A., Lin, Y., Wei, Y., and Pai, T. (2024). Improving quality of indicators of compromise using STIX graphs. Comput. Secur., 144:103972.

Corporation, M. (2025a). Common Attack Pattern Enumeration and Classification (CAPEC).

Corporation, M. (2025b). MITRE ATT&CK.

Corporation, M. (2025c). MITRE CTI GitHub repository.

Corporation, M. (2025d). MITRE D3FEND.

Elastic (2025). Elasticsearch - distributed, RESTful search and analytics engine.

Fender, A., Emad, N., Petiton, S. G., Eaton, J., and Naumov, M. (2017). Parallel Jaccard and related graph clustering techniques. In Alexandrov, V., Geist, A., and Dongarra, J. J., editors, Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA@SC 2017, Denver, CO, USA, November 13, 2017, pages 4:1–4:8. ACM.

for Threat-Informed Defense, C. (2025). Top 15 techniques sightings ecosystem.

Gleich, D. F. (2015). PageRank beyond the web. SIAM Rev., 57(3):321–363.

Habaybeh, N. and Marshall, A. M. Towards a historic malware frequency database. Available at SSRN 4392182.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474.

Li, H., Shi, Z., Pan, C., Zhao, D., and Sun, N. (2024). Cybersecurity knowledge graphs construction and quality assessment. Complex & Intelligent Systems, 10(1):1201–1217.

Li, J., Li, J., Xie, C., Liang, Y., Qu, K., Cheng, L., and Zhao, Z. (2023a). PipCKG-BS: A method to build cybersecurity knowledge graph for blockchain systems via the pipeline approach. Journal of Circuits, Systems and Computers, 32(16):2350274.

Li, Z.-X., Li, Y.-J., Liu, Y.-W., Liu, C., and Zhou, N.-X. (2023b). K-CTIAA: automatic analysis of cyber threat intelligence based on a knowledge graph. Symmetry, 15(2):337.

Liu, Z., Sun, Z., Chen, J., Zhou, Y., Yang, T., Yang, H., and Liu, J. (2020). STIX-based network security knowledge graph ontology modeling method. In ICGDA 2020: 3rd International Conference on Geoinformatics and Data Analysis, Marseille, France, April 15-17, 2020, pages 152–157. ACM.

Mandiant (2025a). Capa rules - Mandiant.

Mandiant (2025b). M-Trends report.

MITRE (2025). Top 10 lists.

Morato, D., Berrueta, E., Magaña, E., and Izal, M. (2018). Ransomware early detection by the analysis of file sharing traffic. Journal of Network and Computer Applications, 124:14–32.

Neo4j (2021). Graphs for cybersecurity: Defending against sophisticated attacks.

OASIS (2020). STIX version 2.1.

Pontecorvi, M. and Ramachandran, V. (2015). A faster algorithm for fully dynamic betweenness centrality. CoRR, abs/1506.05783.

Project, M. (2025a). Exploit mapping to MAEC.

Project, M. (2025b). Malware Behavior Catalog STIX repository.

Rapid7 (2025). Metasploit framework.

Reading, D. (2021). Picking the right database tech for cybersecurity defense.

Ren, Y., Xiao, Y., Zhou, Y., Zhang, Z., and Tian, Z. (2022). CSKG4APT: A cybersecurity knowledge graph for advanced persistent threat organization attribution. IEEE Transactions on Knowledge and Data Engineering.

Security, P. (2025). The top ten MITRE ATT&CK techniques.

Shaddy43 (2025). Emotet malware analysis.

Sheikhalishahi, M., Saracino, A., Martinelli, F., and Marra, A. L. (2022). Privacy preserving data sharing and analysis for edge-based architectures. Int. J. Inf. Sec., 21(1):79–101.

SigmaHQ (2025). Sigma rules - generic signature format for SIEM systems.

Sikos, L. F. (2023). Cybersecurity knowledge graphs. Knowledge and Information Systems, 65(9):3511–3531.

VirusShare.com (2025). VirusShare - collection of malware samples.

VirusTotal (2025). VirusTotal - free online virus, malware and URL scanner.

Wang, W., Zhou, H., Li, K., Tu, Z., and Liu, F. (2021). Cyber-attack behavior knowledge graph based on CAPEC and CWE towards 6G. In International Symposium on Mobile Internet Security, pages 352–364. Springer.