A Multi-Model Approach to Enhance Automatic Matching
of Vulnerabilities to Attack Patterns
Marine Sauze-Kadar and Thomas Loubier
Univ. Grenoble Alpes, CEA, LETI MINATEC Campus, F-38054 Grenoble, France
Keywords:
Security Knowledge Database, CAPEC, CVE, Vulnerability Assessment, Test Automation, LLM.
Abstract:
Security knowledge databases represent key information in the process of vulnerability assessment and test
automation of industrial products. The CVE and CAPEC databases respectively describe vulnerabilities and
attack patterns. Linking a CVE entry to CAPEC can facilitate the generation of test plans, in the context
of product test automation. Unfortunately, the great majority of CVE have no direct references to CAPEC.
Several research works have focused on matching automatically CVE and CAPEC by computing text sim-
ilarity on their descriptions, evaluating various models, in particular the term frequency inverse document
frequency (TF-IDF) technique and transformer-based models such as SBERT. Depending on CVE description
characteristics and evaluation criteria, these models are likely to perform differently by capturing different
information types: vocabulary, preprocessing methods, context around words, etc. Hence, we propose a new
classifier-based approach to select the most adapted similarity computation model from a given selection to
match a CVE description with linked CAPEC descriptions. We evaluate this method on a recent set of CVE
with CAPEC labels and show an improvement of matching accuracy compared to state-of-the-art methods
leveraging a single model to compute text similarity. Our results also highlight the bias in the training and test
set of CVE-CAPEC pairs.
1 INTRODUCTION
The Common Vulnerabilities and Exposures
(CVE) (MITRE.org, 2024) and Common Attack
Pattern Enumeration and Classification
(CAPEC) (MITRE.org, 2023) databases are in-
terconnected resources (Valence, 2023) that provide
comprehensive insights into vulnerabilities, their
causes, and potential exploitation methods. To help
in reproducing vulnerabilities, the CAPEC database
references common attack patterns and provides
detailed insights on potential exploits. Every CAPEC
has an “Execution Flow” section, which describes
how to Explore, Experiment, and Exploit a vulnera-
bility. Linking CVE to CAPEC is not done systemat-
ically, yet it is essential to automate the identification
and management of vulnerabilities, with the objective
of enhancing an organization's ability to prioritize
and mitigate threats effectively. However,
these security databases contain many entries (i.e.
more than 270,000 CVE (NIST, 2025) and more
than 500 CAPEC (MITRE.org, 2023)). Thus, the
manual mapping between CVE and CAPEC can
be time-consuming and error-prone for product
developers and testers.
Recent research has demonstrated the ability of
Large Language Models (LLM) to solve Natural Lan-
guage Processing (NLP) tasks by learning complex,
hierarchical representations of language data (Naveed
et al., 2023). For example, LLM have been applied
to the Common Vulnerability Scoring System (CVSS)
to assist CVE scoring during CVE registration (Shahid
and Debar, 2021). More specifically, many research
works (Kanakogi et al., 2021b; Kuppa et al., 2021;
Das et al., 2022; Pan et al., 2023; Bonomi et al., 2025) pro-
pose text similarity computation methods based on
transformer-based models like SBERT (sbert.net, ) to
identify correlations between CVE and CAPEC de-
scriptions.
Since the mapping of CVE to CAPEC is not
straightforward for the majority of CVE, as explored
by Kanakogi et al. (Kanakogi et al., 2021a) and dis-
cussed in ICAR (Valence, 2023), it is difficult to eval-
uate the ability of a method to accurately identify
matching CVE and CAPEC. Thus, the evaluation cor-
pus must be comprehensive and diverse enough to ad-
equately reflect real-world scenarios where the frame-
work will operate.
Consequently, matching vulnerabilities to attack
patterns implies the following challenges:
Building a large and relevant corpus of labelled
CVE-CAPEC pairs with good coverage of both
databases is difficult; the manual creation of a corpus
is time-consuming and error-prone.
State-of-the-art models for matching CVE and
CAPEC descriptions do not perform uniformly
well across the different CVE-CAPEC pairs to
match. The relevance and applicability of a model
potentially varies depending on the CVE charac-
teristics. Thus, adapting the model to the CVE repre-
sents a promising approach to improve CVE and
CAPEC description matching accuracy.
Our contributions consist of the following:
A novel multi-model framework, which com-
bines several Large Language Models (LLM) and
Term Frequency-Inverse Document Frequency
(TF-IDF) models to compute similarity between
CAPEC and CVE descriptions. We show how
our multi-model based method is able to im-
prove CAPEC and CVE matching compared to
approaches using a single model. To the best
of our knowledge, this is the first work to pro-
pose a framework combining several models
to adapt CVE-CAPEC matching according to
CVE description characteristics.
The exploitation of a large evaluation corpus of
CVE and CAPEC, composed of 6,167 CVE-CAPEC
pairs. We build it automatically from recent CVE
(2021 to 2024), which have CAPEC labels through
the capecId reference.
The remainder of this paper is structured as fol-
lows: Section 2 provides an overview of related work,
while Section 3 introduces our multi-model method
for CVE and CAPEC matching. In Section 4 we eval-
uate the performance of the proposed framework on a
newly introduced large corpus of recent CVE. Finally,
Section 5 concludes this work.
2 RELATED WORK
Table 1 summarizes related research in the field of
security database mapping. We compare the differ-
ent works according to the following criteria:
the approach to match text descriptions;
the type of databases they apply to;
the corpus size used for the evaluation.
Kanakogi et al. (Kanakogi et al., 2021a; Kanakogi
et al., 2021b; Kanakogi et al., 2022) compare several
LLM and TF-IDF models to match CVE and CAPEC
by computing the similarity of text descriptions. They
suggest using ensemble learning in future work to en-
hance matching accuracy (Kanakogi et al., 2021b), by
adapting the matching model to CVE characteristics.
Our multi-model method leverages the same simi-
larity computation models they describe (Kanakogi
et al., 2021a), but to the best of our knowledge, this
is the first work to introduce and evaluate a CVE-
adaptive approach for matching CVE and CAPEC
descriptions. The authors define a corpus of 58 pairs
of CVE and CAPEC based on the information con-
tained in the "Example Instances" section of certain
CAPEC. The corpus is rather small and its data
coverage is low compared to the corpus we introduce:
55 CVE: this represents less than 1% of the
275,000 CVE registered between 1999 and 2024.
40 CAPEC: this represents 7% of the 556 CAPEC
available at the time of writing (2025).
Pan et al. (Pan et al., 2023) propose a new ap-
proach called Weighted-SBERT (WSBERT), also
based on text similarity calculation, to match CVE
to CAPEC. Using a pretrained BERT-BiLSTM-CRF
NER model, they assign more weight to specific
words in order to focus on critical information. They
use a corpus of 63 pairs of CVE and CAPEC, compar-
ing their method to TF-IDF and standard SBERT mod-
els. Their framework demonstrates stronger general-
ization and an ability to catch representative keywords
with maximum semantic contribution, especially for
long descriptions.
Das et al. introduce two frameworks related to
matching CVE to CAPEC. First, they present V2W-
BERT (Das et al., 2021), a transformer-based model
designed specifically for mapping CVE to CWE,
which leverages the bidirectional encoding capabil-
ity of BERT. Second, they propose VWC-MAP (Das
et al., 2022) for mapping CWE to CAPEC leveraging
text descriptions. For CWE-to-CAPEC mapping, they
use a text-to-text model (Google T5 (Raffel et al.,
2023)) to learn the relationships: given a CWE de-
scription as input, the model generates a text descrip-
tion and finds the closest match among CAPECs.
Together, these works propose a complete framework
for linking CVE to CAPEC via CWE.
Bonomi et al. (Bonomi et al., 2025) enrich the
input data of their matching framework compared
to (Kanakogi et al., 2021b) by taking more CAPEC
information into consideration: CAPEC name, de-
scription, attack execution flow, mitigations, prereq-
uisites, and resources. They also enlarge the initial
corpus of Kanakogi et al. with:
Table 1: Comparison of works in the field of security database mapping.

Work                      | Approach                                    | Mapping             | Corpus size
(Kanakogi et al., 2021a)  | TF-IDF / SBERT                              | CVE → CAPEC         | 58
(Kuppa et al., 2021)      | MLTC                                        | CVE → MITRE ATT&CK  | 200
(Shahid and Debar, 2021)  | BERT                                        | CVE → CVSS          | 45,926
(Das et al., 2022)        | V2W-BERT / Google T5                        | CVE → CWE → CAPEC   | 170,000
(Pan et al., 2023)        | Weighted-SBERT / SBERT / TF-IDF             | CVE → CAPEC         | 63
(Bonomi et al., 2025)     | ATTACKBERT Hyb & SBERT Hyb                  | CVE → CAPEC         | 223
This work                 | Multi-model combination (TF-IDF and SBERT)  | CVE → CAPEC         | 6,167
60 representative pairs of CVE and CAPEC, using
7 CVE selected per year from 1999 to 2024;
223 CVE-CAPEC pairs in total, including 160 CVE
and 118 CAPEC.
In addition to the similarity comparison, they use a
hybrid approach combining SBERT and ATTACK-
BERT with keyword search, called SBERT Hyb
and ATTACKBERT Hyb. They compare their re-
sults to the methodology adopted by Kanakogi et
al. (Kanakogi et al., 2022): their approach improves
the recall of the traditional SBERT models by cap-
turing basic semantic similarity, and is particularly
effective at capturing word context compared to the
approach of Kanakogi et al. They also propose
to use keywords for filtering, as well as other CAPEC
information.
Kuppa et al. (Kuppa et al., 2021) propose a Multi-
Label Text Classification (MLTC) method for map-
ping CVE to ATT&CK (MITRE.org, 2024) tech-
niques, using textual descriptions. Their framework
is based on a multi-head joint embedding neural net-
work architecture. Due to the lack of links between
these two security databases, as demonstrated in (Va-
lence, 2023), they propose a new unsupervised label-
ling technique, using the spaCy (spaCy, 2025) Python
library to extract context words around CVE and
ATT&CK descriptions. They create a labelled knowl-
edge base of 200 CVE found in threat reports: it is
composed of 150 attack scenarios exploiting vulner-
abilities and 50 mitigation strategies to enrich CVE
descriptions, covering 17 techniques. This helps in
learning the attacker and defender views of a given CVE.
Shahid and Debar (Shahid and Debar, 2021) lever-
age recent advances in NLP to determine the CVSS
vector and the associated severity score of a vulnera-
bility. They train BERT classifiers on 45,000 CVE,
with a 50% split between training and test data. They
show that NLP applied to security databases can help
developers and cybersecurity experts better under-
stand CVE criticality.
Figure 1: CVE-CAPEC matching framework architecture.
3 MULTI-MODEL CVE-CAPEC
MATCHING FRAMEWORK
This section describes the multi-model framework for
matching CVE and CAPEC descriptions. Section 3.1
provides an overview of the framework architecture,
while Section 3.2 and Section 3.3 respectively fo-
cus on the matching model selection process and the
matching engine, which are the main parts of the sys-
tem.
3.1 Framework Architecture
The goal of the framework is to improve on the state-
of-the-art accuracy of matching CAPEC to CVE. The
matching process is depicted in Figure 1. We fol-
low an approach similar to that of Kanakogi et
al. (Kanakogi et al., 2021a): the basic idea is to ap-
ply text similarity computation models to match CVE
and CAPEC text descriptions.
While previous works focus on evaluating individual
similarity computation models such as TF-IDF and
LLM, we propose a multi-model approach that combines
several state-of-the-art similarity computation models
to match CVE with CAPEC. The intuitive motivation
for combining different models is to take advantage of
each model's matching ability, since different models
perform differently depending on the input CVE
description. Hence, we introduce a classifier-based
model selector to choose the model to be applied for a
given CVE description to identify linked CAPEC,
leveraging inherent CVE characteristics: this is the
main contribution of this work.
Table 2: Preprocessing types.

Preprocessing type                 | Description
No preprocessing                   | Nothing is changed in the original description.
Basic preprocessing                | Filtering of stop words ("the", "of", etc.) and punctuation.
Basic preprocessing and lemmatize  | Words are converted into their base form, taking context into account.
Basic preprocessing and stemming   | Words are replaced by root words. This is the preprocessing option which induces the smallest vocabulary.
The selected model is provided as a parameter to the
matching engine, together with the corresponding
CVE description and all CAPEC descriptions of the
corpus. The matching engine computes the text sim-
ilarity between the CVE description and each CAPEC
description. It returns the list of CAPEC of the input
corpus, ranked by decreasing correlation. The frame-
work returns the N best CAPEC descriptions (i.e.
those with the lowest ranks), with N ∈ ℕ a fixed pa-
rameter set by the user, considering the trade-off be-
tween output size and matching accuracy. Logically,
the higher N (i.e. the more CAPEC in the output set),
the higher the matching accuracy.
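To make this pipeline concrete, the following Python sketch chains the two components. This is a minimal illustration under assumed interfaces: selector stands for the classifier of Section 3.2 with a scikit-learn-style predict, and engines maps each label to one of the 10 similarity models of Section 3.3 through a hypothetical similarity method; none of these names come from a released implementation.

```python
# Minimal sketch of the framework's top-level flow (illustrative naming).
def match_cve_to_capec(cve_description, capec_descriptions, selector, engines, n):
    """Return the n CAPEC ids whose descriptions best match the CVE."""
    # 1. The model selector predicts which similarity model suits this CVE.
    label = selector.predict([cve_description])[0]
    engine = engines[label]

    # 2. The chosen engine scores every CAPEC description.
    scores = {cid: engine.similarity(cve_description, text)
              for cid, text in capec_descriptions.items()}

    # 3. Rank CAPEC by decreasing similarity and return the n best.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]
```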
3.2 CVE-CAPEC Matching Model
Selection from CVE Description
3.2.1 Preprocessing
We apply preprocessing on the input CVE descrip-
tion before it is manipulated by the model selector.
First, we manually remove from the text description
words which we consider useless in our context: e.g.
“attack”, “adversary”, “moreover”, etc. Second, we
lemmatize the description (Table 2); i.e. words are
replaced by their base form.
We use non-normalized embeddings to represent
preprocessed CVE descriptions; i.e. each description
corresponds to a vector whose dimension is the vo-
cabulary size, where each feature is the count of oc-
currences of a word in the description. The vocabulary
is defined on the training set: words in the test set
with no occurrence in the training set are ignored.
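As an illustration, the sketch below builds such count embeddings with scikit-learn's CountVectorizer; the library choice, the stop-word list, and the descriptions are our placeholders (the paper does not name its implementation), and lemmatization is assumed to have been applied beforehand.

```python
# Sketch of the non-normalized count embeddings (assumed implementation).
from sklearn.feature_extraction.text import CountVectorizer

custom_stopwords = ["attack", "adversary", "moreover"]  # examples from the text

train_descriptions = [  # placeholder CVE descriptions, already lemmatized
    "buffer overflow in the parser allow remote code execution",
    "improper input validation lead to sql injection",
]
test_descriptions = ["heap overflow allow arbitrary code execution"]

vectorizer = CountVectorizer(stop_words=custom_stopwords)
X_train = vectorizer.fit_transform(train_descriptions)  # vocabulary fixed here
X_test = vectorizer.transform(test_descriptions)        # unknown words ignored
print(X_train.toarray())  # raw word counts, deliberately not normalized
```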
Figure 2: Classifier-based model selector training process.
3.2.2 Model Selector
The model selector takes a preprocessed CVE de-
scription as input. It returns the corresponding pre-
dicted label, which is the type of matching model to
select.
For the training process, we define the label as the
best model in the selection for matching the input CVE
description with the linked CAPEC descriptions. Labels
(i.e. the selected model for each CVE description) are
defined in the training corpus as follows (a sketch of
this step is given after the list):
For each matching model in scope, we compute
the rank of the correct matching CAPEC for each
CVE-CAPEC pair in the corpus.
We set the selected model as the model inducing
the lowest rank for the correct CAPEC among all
matching models in scope.
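The sketch below illustrates this labelling step; engines and its similarity method are the same hypothetical interface as in the earlier sketch, standing in for the 10 matching models of Section 3.3.

```python
# Sketch of the labelling step: a CVE's training label is the matching model
# that ranks the correct CAPEC highest (lowest rank).
def derive_label(cve_desc, correct_capec_id, capec_descriptions, engines):
    """Return the name of the model giving the lowest rank to the correct CAPEC."""
    best_model, best_rank = None, float("inf")
    for name, engine in engines.items():
        scores = {cid: engine.similarity(cve_desc, text)
                  for cid, text in capec_descriptions.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        rank = ranked.index(correct_capec_id)  # 0 is the best possible rank
        if rank < best_rank:
            best_model, best_rank = name, rank
    return best_model
```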
The model selector is a random-forest-based clas-
sifier, which we train following the process illustrated
in Figure 2. Since the vocabulary is rather large, we
apply dimensionality reduction to reduce the number of
features: we keep only the k best features, scored with
the chi2 statistic. We set the parameters empirically,
both for the scoring method (chi2) and for the number
of features k to keep (Section 4.2).
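A minimal sketch of this training pipeline follows, assuming scikit-learn's SelectKBest with the chi2 score implements the k-feature reduction (the paper does not name its library). The demo data, the label names, and the small k are placeholders; the experiments in Section 4.2 use k = 1,500.

```python
# Sketch of the model selector's training pipeline (assumed scikit-learn stack).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline

# Placeholder training data: preprocessed CVE descriptions and, as labels,
# the best matching model derived as described above.
descriptions = [
    "buffer overflow allow remote code execution",
    "sql injection in login form expose database",
    "cross site scripting in search page",
    "path traversal read arbitrary file",
]
labels = ["tfidf_stemming", "all-MiniLM-L12-v2", "all-MiniLM-L12-v2", "tfidf_basic"]

selector = make_pipeline(
    CountVectorizer(),                       # non-normalized count embeddings
    SelectKBest(score_func=chi2, k=5),       # keep the k most informative words
    RandomForestClassifier(random_state=0),  # predicts one of the 10 model labels
)
selector.fit(descriptions, labels)
print(selector.predict(["heap overflow allow code execution"]))
```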
3.3 CVE-CAPEC Matching Engine
The multi-model framework leverages 10 text similar-
ity computation models to match CVE with CAPEC.
Each of these 10 models corresponds to a specific
label type of the model selector. These 10 matching
models can be divided into two main categories (a
sketch of both families follows the lists below):
Large Language Models (LLM), applied on
text descriptions without preprocessing. We
test the 6 following pretrained sentence
transformer models, available from the SBERT
website (sbert.net):
all-mpnet-base-v2
paraphrase-albert-small-v2
paraphrase-multilingual-mpnet-base-v2
multi-qa-mpnet-base-dot-v1
all-MiniLM-L12-v2
all-distilroberta-v1
Term Frequency-Inverse Document Fre-
quency (TF-IDF), which was the method showing
the best recall in previous work (Kanakogi et al.,
2021a). We test 4 configurations applying
TF-IDF on the text description after different
preprocessing (Table 2):
no preprocessing and TF-IDF;
basic preprocessing and TF-IDF;
basic preprocessing, lemmatize, and TF-IDF;
basic preprocessing, stemming, and TF-IDF.
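The sketch below shows how both model families can score a CVE description against CAPEC descriptions with cosine similarity. The library calls (sentence-transformers, scikit-learn) and the placeholder texts are our assumptions; only the pretrained model names are taken from the list above.

```python
# Sketch of the two matching model families (assumed libraries).
from sentence_transformers import SentenceTransformer, util
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cve = "buffer overflow allows remote attackers to execute code"  # placeholder
capecs = ["Overflow Buffers: the adversary exploits ...",        # placeholders
          "SQL Injection: the adversary injects ..."]

# LLM family: one of the six pretrained sentence transformers listed above.
model = SentenceTransformer("all-MiniLM-L12-v2")
llm_scores = util.cos_sim(model.encode([cve]), model.encode(capecs))[0]

# TF-IDF family: fit on the CVE plus CAPEC texts, then compare the vectors.
vectors = TfidfVectorizer().fit_transform([cve] + capecs)
tfidf_scores = cosine_similarity(vectors[0], vectors[1:])[0]

print(llm_scores, tfidf_scores)  # one score per CAPEC, used for ranking
```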
Figure 3: CVE count by year for evaluation corpuses.
4 EVALUATION OF THE
MULTI-MODEL MATCHING
FRAMEWORK
We describe the evaluation corpuses and metrics in
Section 4.1. Experimental results related to CVE and
CAPEC matching are in Section 4.2.
4.1 Experimental Setup
4.1.1 CAPEC and CVE Corpuses
We consider two corpuses of matching CAPEC and
CVE for the evaluation:
Kanakogi et al. database (Kanakogi et al.,
2021a) (C_Kan): this corpus consists of 58 pairs of
matching CAPEC and CVE (i.e. card(C_Kan) = 58),
which were defined by Kanakogi et al. from the
field "CVE examples" in the information of cer-
tain CAPEC. This corpus is not suitable for training
the classifiers defined in Section 3 because of its
small size. Thus, we use it as a test database for
comparison with the state of the art.
All labelled CVE (C_Lab): a new field "CAPEC
ID" appears in the CVE database from 2021 onwards
(according to CVE publication date). Since then,
many CVE have been labelled with CAPEC. We use
this corpus for training classifiers since it is much big-
ger than C_Kan, i.e. card(C_Lab) = 6,167. The goal
is to assess the relevance of our multi-model ap-
proach at a larger scale.
Note that the corpus C_Kan was built in 2021 and there
is no overlap between the two corpuses. Figure 3 shows
the distribution of CVE by year for each corpus. Within
the CVE database (which contains 241,589 CVE
at the time of writing), only 6,037 CVE have CAPEC
labels. In the corpus C_Lab, we consider CVE with
links to CAPEC which are neither obsolete nor depre-
cated: this represents a total of less than 2.6% of the
CVE database. In both corpuses, the relation between
CAPEC and CVE is many-to-many. Considering all
CAPEC and CVE pairs, the corpus C_Lab corresponds
to 6,167 pairs. In C_Lab, we note a maximum of 19
CAPEC linked to a single CVE.
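As an illustration of the automatic corpus construction, the sketch below walks CVE JSON records and collects a pair wherever a capecId reference is present. The local path and the field layout (cveMetadata.cveId, containers.cna.impacts[].capecId in CVE record format 5.x) are our reading of the public CVE data, not details given in the paper.

```python
# Sketch: collect (CVE, CAPEC) pairs from a local clone of the CVE list.
import json
import pathlib

pairs = []
for path in pathlib.Path("cvelistV5/cves").rglob("CVE-*.json"):
    record = json.loads(path.read_text(encoding="utf-8"))
    cve_id = record.get("cveMetadata", {}).get("cveId")
    impacts = record.get("containers", {}).get("cna", {}).get("impacts", [])
    for impact in impacts:
        capec_id = impact.get("capecId")  # e.g. "CAPEC-89"
        if cve_id and capec_id:
            pairs.append((cve_id, capec_id))

print(len(pairs), "CVE-CAPEC pairs collected")
```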
We split the corpus C_Lab into a training set C_Lab_train
and a test set C_Lab_test with an 80%-20% ratio, so that
card(C_Lab_train) = 4,934 and card(C_Lab_test) = 1,233.
We split CVE randomly between the two sets so as not
to induce an additional bias in the training and test
phases (e.g. a split based on CVE publication year).
The vocabulary used to represent preprocessed de-
scriptions is defined over the training set C_Lab_train.
For the test set, unknown words in the descriptions are
ignored. Table 3 shows the relation between vocabulary
size and preprocessing method.
Table 3: Vocabulary size (on C_Lab_train).

Preprocessing type                 | Word count
No preprocessing                   | 11,426 (100%)
Basic preprocessing                | 11,299 (99%)
Basic preprocessing and lemmatize  | 10,693 (94%)
Basic preprocessing and stemming   | 9,464 (83%)
4.1.2 Evaluation Metrics
We use several metrics to test our multi-model ap-
proach. First, we consider the prediction accuracy to
evaluate the ability of the classifier-based model se-
lector to make correct predictions. For a given set of
label predictions and corresponding actual labels, the
prediction accuracy is the number of predictions that
exactly match the corresponding actual labels. We
report the normalized ratio:

prediction accuracy = (count of correct predictions) / (count of all labels)   (1)
Second, following the work of Kanakogi et
al. (Kanakogi et al., 2021a), we use the recall@n
metric to evaluate the matching accuracy of the ap-
proach over a test corpus. Intuitively, recall@n rep-
resents the probability of having the correct matching
CAPEC in the final output set of n CAPEC returned
by the multi-model framework.
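A minimal sketch of this metric, on placeholder data, assuming the framework returns one ranked CAPEC list per test CVE:

```python
# recall@n: fraction of test pairs whose correct CAPEC is in the top-n output.
def recall_at_n(results, n):
    """results: list of (ranked_capec_ids, correct_capec_id) per test CVE."""
    hits = sum(1 for ranked, correct in results if correct in ranked[:n])
    return hits / len(results)

# Example: 2 of 3 test CVE have the correct CAPEC in their top-2 output.
demo = [(["CAPEC-89", "CAPEC-66"], "CAPEC-66"),
        (["CAPEC-242", "CAPEC-63"], "CAPEC-242"),
        (["CAPEC-10", "CAPEC-100"], "CAPEC-66")]
assert recall_at_n(demo, 2) == 2 / 3
```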
4.2 Experimental Results
4.2.1 Prediction Accuracy of the Model Selector
As described in Section 3.2.2, we apply feature
selection to reduce the number of features of the
random-forest classifier from the vocabulary size to k.
Hence, we test various k values between 1 and 9,464,
the latter being the vocabulary size after preprocessing
(Table 3); results are shown in Table 4. In the remain-
der of the experiments, we set k = 1,500, which cor-
responds to the best prediction accuracy observed on
the test corpuses C_Lab_test and C_Kan.
The prediction accuracy is very high on the train-
ing corpus (above 95%) compared to the results on
the test corpuses (below 65%), with a tendency to
overfit for high values of k. A random prediction ac-
curacy would be 10%, since the classifier defines 10
classes corresponding to the 10 matching models of
the framework. In that regard, the prediction accuracy
on both test corpuses C_Lab_test and C_Kan is above
this random reference in most configurations for
k ≥ 100. However, the model selector is much more
accurate on the test corpus we introduce, C_Lab_test,
than on the state-of-the-art corpus C_Kan.
Table 4: Prediction accuracy of the classifier-based model selector for various dimension reduction configurations (k features).

k     | C_Lab_train | C_Lab_test | C_Kan
20    | 0.62        | 0.61       | 0.07
50    | 0.82        | 0.61       | 0.07
100   | 0.90        | 0.61       | 0.14
500   | 0.96        | 0.63       | 0.19
1,000 | 0.96        | 0.64       | 0.12
1,500 | 0.97        | 0.64       | 0.22
2,000 | 0.97        | 0.64       | 0.19
2,500 | 0.97        | 0.64       | 0.12
3,000 | 0.97        | 0.64       | 0.17
4.2.2 Matching Accuracy of the Multi-Model
Approach
As described in Section 3.1, we propose a multi-
model-based CVE-CAPEC matching approach, be-
cause different models perform differently depend-
ing on the pair of CVE and CAPEC to match. The
recall@n results for the C_Lab_test and C_Kan corpuses
are shown in Figure 4 and Figure 5 respectively. A
focus on the results for C_Lab_test is provided in Table 5.
Our results validate the initial intuition that combin-
ing various matching models improves matching: for
both corpuses, the best case (ideal model combina-
tion, in red) shows a significantly higher recall@n
than every tested single model.
Figure 4: recall@n as a function of n for C_Lab_test.
Table 5: recall@n for different models on the C_Lab_test corpus.

n    | all-MiniLM-L12-v2 | Multi-model method | Ideal model combination
1    | 0.37              | 0.44               | 0.52
2    | 0.45              | 0.51               | 0.58
3    | 0.48              | 0.54               | 0.62
10   | 0.59              | 0.63               | 0.72
20   | 0.65              | 0.71               | 0.79
50   | 0.76              | 0.85               | 0.92
100  | 0.86              | 0.91               | 0.96
200  | 0.94              | 0.95               | 0.99
300  | 0.97              | 0.97               | 1.00
Figure 5: recall@n as a function of n for C_Kan.
After training the multi-model matching system
on C_Lab_train, we show that it performs well on
C_Lab_test (i.e. recall@n is higher than for any of the
single models), but poorly on C_Kan: as seen in Fig-
ure 5, the model based on stemming and TF-IDF (as
described in Section 3.3) performs comparatively bet-
ter there. As shown in Table 4, this difference in per-
formance between the two corpuses is caused by the
poor prediction accuracy of the model selector (22%
accuracy for C_Kan versus 64% for C_Lab_test). The
degradation of inference for C_Kan means that there is
a bias in the training corpus (C_Lab_train) or in the test
corpus (C_Kan), i.e. the CVE in C_Kan differ from the
CVE in C_Lab_train, which are used to train the classi-
fier of the model selector.
5 CONCLUSION AND FUTURE
WORK
We proposed a new classifier-based multi-model ap-
proach that selects the most adapted similarity compu-
tation model from a given selection to match a CVE de-
scription with linked CAPEC descriptions. We evalu-
ated the multi-model method on a recent set of CVE
with CAPEC labels. We leveraged 10 models, com-
bining various LLM and TF-IDF computations with
several preprocessing approaches. We tested match-
ing accuracy for every single model and for our multi-
model method. Our results show an improvement in
matching accuracy compared to state-of-the-art meth-
ods leveraging a single model to compute text similar-
ity: with an output of 20 matching CAPEC, the cor-
rect CAPEC is in the output set in 71% of cases,
against 65% in the best case using a single model.
Our results also highlight the bias in the training and
test sets of CVE-CAPEC pairs: because of the poor
accuracy of the classifier in selecting the model accord-
ing to the CVE description, the matching accuracy de-
creases globally on the different, small state-of-the-art
test corpus introduced by Kanakogi et al.
A first open problem to address in future work
is how to build a representative evaluation corpus
of matching CVE and CAPEC, so that the multi-
model framework is generic enough to be applied
to any CVE description. Building a representative
corpus would require understanding the biases in the
databases currently in use. Second, in our current multi-
model approach, we manually fix the number of best-
matching CAPEC to be returned by the framework. We
propose to investigate how to reduce and adapt this num-
ber, filtering irrelevant CAPEC descriptions using
meta-information from the CAPEC and CVE databases
(e.g. abstraction level, keywords, etc.). Third, it
would be interesting to study the practical integra-
tion of such a CVE-CAPEC matching tool, which po-
tentially induces matching errors, into a fully automated
test generation process towards full test automation of
industrial products.
ACKNOWLEDGEMENTS
The authors wish to express their gratitude to Raphael
Collado, Maxime Lecomte, Ulysse Vincenti, Victor
Breux, and Lalie Arnoud for the helpful discussions
and to the anonymous reviewers for their useful com-
ments. The work presented in this paper was funded
by the “France 2030” government investment plan
managed by the French National Research Agency,
under the reference “ANR-22-PECY-0005”.
REFERENCES
Bonomi, S., Ciavotta, A., Lenti, S., and Palma, A. (2025).
Beyond the Surface: An NLP-based Methodology to
Automatically Estimate CVE Relevance for CAPEC
Attack Patterns. arXiv (https://arxiv.org/abs/2501.07131), version 1.
Das, S. S., Dutta, A., Purohit, S., Serra, E., Halappanavar,
M., and Pothen, A. (2022). Towards automatic map-
ping of vulnerabilities to attack patterns using large
language models. In IEEE International Symposium
on Technologies for Homeland Security (HST).
Das, S. S., Serra, E., Halappanavar, M., Pothen, A., and Al-
Shaer, E. (2021). V2w-bert: A framework for effec-
tive hierarchical multiclass classification of software
vulnerabilities. https://arxiv.org/abs/2102.11498.
Kanakogi, K., Washizaki, H., Fukazawa, Y., Ogata, S.,
Okubo, T., Kato, T., Kanuka, H., Hazeyama, A., and
Yoshioka, N. (2021a). Tracing CAPEC Attack Pat-
terns from CVE Vulnerability Information using Nat-
ural Language Processing Technique.
Kanakogi, K., Washizaki, H., Fukazawa, Y., Ogata, S.,
Okubo, T., Kato, T., Kanuka, H., Hazeyama, A., and
Yoshioka, N. (2021b). Tracing CVE Vulnerability In-
formation to CAPEC Attack Patterns Using Natural
Language Processing Techniques. volume 12, page
298.
Kanakogi, K., Washizaki, H., Fukazawa, Y., Ogata, S.,
Okubo, T., Kato, T., Kanuka, H., Hazeyama, A., and
Yoshioka, N. (2022). Comparative evaluation of nlp-
based approaches for linking capec attack patterns
from cve vulnerability information. In Applied Sci-
ences.
Kuppa, A., Aouad, L., and Le-Khac, N.-A. (2021). Linking
CVE’s to MITRE ATT&CK Techniques. In Proceed-
ings of the 16th International Conference on Avail-
ability, Reliability and Security, pages 1-12, Vienna,
Austria. ACM.
MITRE.org (2023). CAPEC - Common Attack Pattern
Enumeration and Classification (CAPEC™). https://
capec.mitre.org/.
MITRE.org (2024). MITRE ATT&CK®. https://attack.
mitre.org/.
Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Us-
man, M., Akhtar, N., Barnes, N., and Mian, A. (2023).
A Comprehensive Overview of Large Language Mod-
els. arXiv (https://arxiv.org/abs/2307.06435), version 10.
NIST (2025). NVD - Vulnerabilities. https://nvd.nist.gov/
vuln.
Pan, M., Li, B., Zou, Y., Yang, W., Wang, D., and Zhang,
T. (2023). Tracing Vulnerability to Attack Patterns
Using Text Similarity. In 2023 3rd International
Conference on Electronic Information Engineering
and Computer Science (EIECS), pages 1330–1334,
Changchun, China. IEEE.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S.,
Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2023).
Exploring the Limits of Transfer Learning with a Uni-
fied Text-to-Text Transformer. arXiv:1910.10683 [cs]
(http://arxiv.org/abs/1910.10683).
sbert.net. Pretrained Models - Sentence Transformers documentation.
https://www.sbert.net/docs/sentence_transformer/pretrained_models.html.
Shahid, M. and Debar, H. (2021). CVSS-BERT: Explain-
able Natural Language Processing to Determine the
Severity of a Computer Security Vulnerability from its
Description. arXiv (https://arxiv.org/abs/2111.08510), version 1.
spaCy (2025). spaCy · Industrial-strength Natural Language
Processing in Python. https://spacy.io/.
Valence, A. (2023). ICAR, a categorical framework to con-
nect vulnerability, threat and asset managements.