On the Formal Robustness Evaluation for AI-based Industrial Systems
Mohamed Ibn Khedher¹, Afef Awadid¹, Augustin Lemesle² and Zakaria Chihani²
¹ IRT - SystemX, 2 Bd Thomas Gibert, 91120 Palaiseau, France
² CEA, The French Alternative Energies and Atomic Energy Commission, France
{augustin.lemesle, zakaria.chihani}@cea.fr
Keywords:
Uncertainty in AI, AI Verification, AI Robustness, Adversarial Attacks, Formal Evaluation, Industrial
Application.
Abstract:
The paper introduces a three-stage evaluation pipeline for ensuring the robustness of AI models, particularly
neural networks, against adversarial attacks. The first stage involves formal evaluation, which may not always
be feasible. For such cases, the second stage focuses on evaluating the model’s robustness against intelligent
adversarial attacks. If the model proves vulnerable, the third stage proposes techniques to improve its robust-
ness. The paper outlines the details of each stage and the proposed solutions. Moreover, the proposal aims
to help developers build reliable and trustworthy AI systems that can operate effectively in critical domains,
where the use of AI models can pose significant risks to human safety.
1 INTRODUCTION
Over the last decade, there has been a significant
advancement in Artificial Intelligence (AI) and, no-
tably, Machine Learning (ML) has shown remarkable
progress in various critical tasks. Specifically, Deep
Neural Networks (DNN) have played a transforma-
tive role in machine learning, demonstrating excep-
tional performance in complex applications such as
cybersecurity (Jmila and Khedher, 2022) and robotics
(Khedher et al., 2021).
Despite the capacity of Deep Neural Networks to
handle high-dimensional inputs and address complex
challenges in critical applications, recent evidence in-
dicates that small perturbations in the input space
can lead to incorrect decisions (Bunel et al., 2018).
Specifically, it has been observed that DNNs can be
easily misled, causing their predictions to change with
slight modifications to the inputs. These carefully
chosen modifications result in what are known as ad-
versarial examples. These discoveries underscore the
critical challenge of ensuring that machine learning
systems, especially deep neural networks, function as
intended when confronted with perturbed inputs.
Adversarial examples are specially crafted inputs
that are designed to fool a machine learning model
into making a wrong prediction. These examples are
not randomly generated but created with precise cal-
culations. There are various methods for generating
adversarial examples, but most of them focus on min-
imizing the difference between the distorted input and
the original one while ensuring the prediction is in-
correct. Some techniques require access to the en-
tire classifier model (white-box attacks), while others
only need the prediction function (black-box attacks).
Adversarial attacks pose a significant threat to
critical industrial applications, particularly in sectors
such as manufacturing, energy, and infrastructure,
where precision and reliability are paramount. These
attacks, carefully crafted to exploit vulnerabilities in
machine learning models, introduce subtle modifica-
tions to input data. In critical industrial processes, the
consequences of misclassification or data manipula-
tion by adversarial attacks can result in operational
failures, compromised safety, and potentially catas-
trophic outcomes.
To illustrate the severity of adversarial attacks in
crucial applications like anomaly detection in the cy-
bersecurity domain, consider Figure 1. An attacker,
possessing malicious traffic, can manipulate the traf-
fic by adding imperceptible perturbations, making it
appear benign to the cybersecurity system, allowing
it to pass undetected. Such attacks can severely com-
promise the system’s ability to identify and mitigate
threats, posing significant security risks.
In this paper, we recommend a three-stage
pipeline (Khedher et al., 2023) to industrialists to in-
vestigate the robustness of their models and, if possi-
Figure 1: Adversarial attack generation in Network Intrusion Detection System (NIDS) (Jmila and Khedher, 2022).
An adverse sample is generated by adding a small perturbation to the original sample. Thus, malicious perturbed
traffic can be misclassified as benign and thus bypass the intrusion detection system. This can have serious
consequences for the system (Jmila and Khedher, 2022).
ble, obtain a formal guarantee of robustness. The first
stage is formal evaluation, which consists of evalu-
ating all possible inputs of the model and formally
verifying its output. This stage is very complex, or
even impossible, if the model is complex or the data
is of very high dimensionality. In such cases, we
propose a second stage that consists of evaluating the
robustness of the AI model against a set of adversarial
attacks. If the model fails against adversarial attacks,
we recommend the third stage, which consists of im-
proving robustness using techniques called defenses
or adversarial training.
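As a rough illustration of how the three stages could be chained, the following Python sketch wires them together; formal_verifier, attacker and retrainer are hypothetical callables standing in for the concrete tools surveyed in the remainder of the paper, not an existing API.

```python
def evaluate_robustness(model, dataset, epsilon,
                        formal_verifier, attacker, retrainer):
    """Three-stage robustness pipeline (illustrative sketch).

    formal_verifier(model, dataset, eps) -> "proved" | "falsified" | "timeout"
    attacker(model, dataset, eps)        -> attack success rate in [0, 1]
    retrainer(model, dataset, eps)       -> hardened model
    All three callables are placeholders for the tools discussed below.
    """
    # Stage 1: formal evaluation over an epsilon-ball around each input.
    verdict = formal_verifier(model, dataset, epsilon)
    if verdict == "proved":
        return model, "formally robust"

    # Stage 2: empirical evaluation against adversarial attacks,
    # used when formal verification is infeasible or inconclusive.
    success_rate = attacker(model, dataset, epsilon)
    if success_rate == 0.0:
        return model, "no attack found (no formal guarantee)"

    # Stage 3: improve robustness (defenses / adversarial training),
    # then the pipeline can be re-run on the hardened model.
    hardened = retrainer(model, dataset, epsilon)
    return hardened, "re-evaluate after adversarial training"
```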
The remainder of the paper is organized as follows. Section 2 details the need
for formal methods for AI-based critical systems. Sections 3, 4 and 5 then
describe the three stages of the robustness evaluation pipeline and survey the
popular solutions proposed in the state of the art. Finally, Section 6
concludes the paper.
2 THE NEED FOR
TRUSTWORTHY AI-BASED
CRITICAL SYSTEMS
The use of Artificial Intelligence techniques is be-
coming increasingly popular in various applications
(Miglani and Kumar, 2019), as the technologies ma-
ture and become more affordable (Boardman and
Butcher, 2019). These techniques can be physically
embodied as in the case of safety-critical systems
such as electricity grids or on-board aircraft networks
or exist only as software agents that autonomously
process data at speeds or for durations that humans
are not capable of.
Applying AI techniques can confer a compet-
itive advantage to the industry by providing both
high value-added products and services and support
to decision-makers (Mattioli et al., 2023). In this
sense, production efficiency, product quality, and ser-
vice level will be improved by artificial intelligence
(Li et al., 2017). However, while AI has much poten-
tial for innovative applications (advanced automation
and autonomous systems), it raises several concerns
such as security and safety (El-Sherif et al., 2022).
These concerns are even more salient when it comes
to AI-based critical systems, where the integration of
AI technologies can pose significant risks to human
safety.
AI-based critical systems are defined as those sys-
tems containing AI-based components alongside tra-
ditional software components and whose failure leads
to unacceptable circumstances such as loss of hu-
man lives (Mwadulo, 2016). Such systems are not
only safety-critical but also complex. Consequently,
they impose strict certification requirements and high
safety standards (Ferrari and Beek, 2022). In this
sense, the integration of AI technologies in those sys-
tems is constrained by the need for verification, ex-
tensive testing, and certification.
When employing verification in the engineering
process, trust in the safety of the resulting system will
to a large extent be deduced from trust in the inte-
grated AI/ML models. Therefore, such models need
to be robust and the results of the verification process
need to be trustworthy. To achieve the highest lev-
els of trustworthiness, one might proceed by formally
verifying the robustness of AI/ML models. Indeed,
formal robustness evaluation helps developers build
reliable and trustworthy AI systems that can operate
effectively in critical domains. Thus, it is essential for
AI safety-critical systems for several reasons includ-
ing:
Identifying vulnerabilities: AI systems are sus-
ceptible to various vulnerabilities, including ad-
versarial attacks, data distribution shifts, and
model biases. Formal robustness evaluation helps
identify these vulnerabilities by subjecting the
system to rigorous testing and analysis. By under-
standing the system’s weaknesses, developers can
take appropriate measures to mitigate risks and
improve the system’s robustness.
Safety assurance: AI systems are increasingly
being deployed in safety-critical domains such
as autonomous vehicles, healthcare, and finance.
These systems must operate reliably and safely, as
failures can have severe consequences, including
loss of life or significant financial losses. Formal
robustness evaluation helps ensure that the system
behaves as intended and can handle unexpected
scenarios or adversarial attacks.
Adversarial robustness: Adversarial attacks in-
volve intentionally manipulating inputs to deceive
or mislead AI systems. Formal robustness evalu-
ation helps assess the system’s resilience against
such attacks. By analyzing the system’s response
to adversarial inputs, developers can identify vul-
nerabilities and develop defenses to ensure the
system remains robust in the face of potential at-
tacks.
Compliance with regulations and standards:
Many safety-critical domains have regulations
and standards that govern the deployment of AI
systems. Formal robustness evaluation helps
demonstrate compliance with these requirements
by providing evidence of the system’s reliability,
safety, and ability to handle unexpected situations.
It allows developers to provide a rigorous and sys-
tematic assessment of the system’s performance,
which is crucial for gaining regulatory approvals
and ensuring public trust.
Handling edge cases: AI systems often en-
counter edge cases or scenarios that are not well-
represented in the training data. These cases can
lead to unexpected behavior or errors. Formal
robustness evaluation involves testing the system
with a wide range of inputs, including edge cases,
to ensure it can handle them appropriately. By
identifying and addressing potential issues in edge
cases, developers can improve the system’s over-
all reliability and safety.
3 FORMAL METHODS FOR
NEURAL NETWORK
VERIFICATION
Formal methods are a set of techniques that use
logic and mathematics to rigorously prove that soft-
ware meets specific requirements. These methods
were initially developed in the 1960s and involved
hand-written proofs for small programs (Floyd, 1967;
Hoare, 1969). However, recent advancements have
led to the creation of automated software verification
tools.
These tools focus on the semantic properties of
programs, which are concerned with how the pro-
gram behaves when it runs. Unlike syntactic prop-
erties, which can be determined by simply looking
at the code, semantic properties cannot be fully au-
tomated due to the undecidability of Rice’s theorem.
This means that verification tools must make some
trade-offs between automation, generality, and com-
pleteness. While they cannot always provide a com-
plete answer, formal methods ensure that any property
proven by a tool is indeed true.
Formal methods are widely used in the hardware
and software industries, and are mandated by safety
standards for critical embedded systems like avion-
ics software. Their adoption is expanding into less
critical domains, such as Facebook and Amazon’s use
cases (Newcombe et al., 2015). This demonstrates the
maturity and scalability of formal methods for soft-
ware verification.
3.1 Formal Methods Categories
Approaches to verifying neural networks can be grouped into three categories
based on the formulation of the problem:
Reachability: This method aims to identify in-
puts that lead to specific states or properties within
the network. It builds an over-approximation around
selected representative inputs and propagates it
through the layers.
Optimization: This method employs optimiza-
tion techniques to find instances that violate the
given property. The network’s function is incor-
porated as a constraint in the optimization pro-
cess. Due to the non-convex nature of the opti-
mization problem, various techniques have been
developed to represent nonlinear activation func-
tions as linear constraints.
Search: This method systematically explores the
input space to find inputs that contradict the prop-
erty. It often utilizes heuristics or probabilistic
search strategies to guide the exploration.
In the following sections, we will discuss formal
methods for verifying neural networks. We will
present a variety of solutions that have been proposed
in the literature.
3.2 Reachability Approaches
In the verification of neural networks, the Reachabil-
ity approach involves computing the reachable set for
a specific output, meaning finding the set X such that
every element in that set produces a predefined output
y when propagated through the network:
R(X, f) := { y : y = f(x), x ∈ X }          (1)
The straightforward task of verifying all outputs
produced by a set of inputs can already be expensive,
depending on the size of the input set. This indicates
that the reverse problem of finding the sets for which
a property on the output holds is not scalable and is
infeasible in practice. For this reason, most work in
this domain uses over-approximations of the sets, for
example, employing abstract domains instead of exact
sets, as defined in Abstract Interpretation.
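To make the idea of over-approximation concrete, the following NumPy sketch propagates an axis-aligned box (the interval abstract domain) through one affine layer followed by a ReLU; the toy weights and input box are illustrative only.

```python
import numpy as np

def interval_affine(lower, upper, W, b):
    """Sound interval propagation through y = W x + b: positive weights
    pick up the corresponding bound, negative weights the opposite one."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lower = W_pos @ lower + W_neg @ upper + b
    new_upper = W_pos @ upper + W_neg @ lower + b
    return new_lower, new_upper

def interval_relu(lower, upper):
    """ReLU is monotonic, so applying it to the bounds remains sound."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Toy example: a 2-input, 2-neuron layer and an input box [0,1] x [0,1].
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])
b = np.array([0.1, -0.2])
lo, up = np.array([0.0, 0.0]), np.array([1.0, 1.0])

lo, up = interval_affine(lo, up, W, b)
lo, up = interval_relu(lo, up)
print("over-approximated output box:", lo, up)
```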
Two main approaches to reachability analysis for
neural networks are proposed in the literature: exact
and approximate reachability. Below, we outline for-
mal methods associated with each type of reachabil-
ity approach. The choice between exact and approxi-
mate reachability depends on the specific application
and the desired balance between accuracy and com-
putational efficiency. For safety-critical applications
where precise results are required, exact reachability
may be the preferred method. However, for applica-
tions where speed is more critical, approximate reach-
ability can be a valuable alternative.
3.2.1 Exact Reachability
This approach aims to determine the precise set of
outputs associated with a given set of neural network
inputs. It is particularly well-suited for piece-wise lin-
ear networks, where the reachable set can be com-
puted by dividing the network into smaller segments
and analyzing each segment individually. However,
this method becomes computationally demanding as
the network grows in size due to the exponential
growth of linear segments.
ExactReach. Exact reachability analysis for neural
networks using either linear or ReLU activations is
achieved in (Xiang et al., 2017b) by precisely rep-
resenting input sets as unions of polyhedra. Exac-
tReach, a specialized tool designed for this purpose,
utilizes polyhedron manipulation tools for computing
the reachable set at each network layer. For ReLU
activation functions, ExactReach demonstrates a key
property: the reachable set of outputs can also be rep-
resented as a union of polyhedra if the input set is rep-
resented in the same manner. This property enables
ExactReach to thoroughly explore the reachable set
for networks with ReLU activations, ensuring precise
determination of the exact boundaries of the reach-
able set, considering all possible combinations of in-
put values and activation functions. As a result, Exac-
tReach provides comprehensive insights into the net-
work’s behavior and helps identify output ranges as-
sociated with specific input regions.
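The following sketch, written under the assumption of a single ReLU layer over a box input set, illustrates the spirit of exact reachability: it enumerates the ReLU activation patterns and keeps the feasible ones, each corresponding to one polyhedron of the reachable set. It uses SciPy's linprog only as a feasibility oracle and is not the ExactReach tool itself; the weights are illustrative.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def feasible_patterns(W, c, box_lo, box_hi):
    """Enumerate ReLU activation patterns s in {0,1}^m that are feasible
    for some x in the input box. On each feasible region the layer
    y = ReLU(W x + c) is affine: y = diag(s) (W x + c)."""
    m, n = W.shape
    patterns = []
    for s in itertools.product([0, 1], repeat=m):
        A_ub, b_ub = [], []
        for i, si in enumerate(s):
            if si == 1:
                # Active neuron: (W x + c)_i >= 0  <=>  -W_i x <= c_i
                A_ub.append(-W[i]); b_ub.append(c[i])
            else:
                # Inactive neuron: (W x + c)_i <= 0  <=>  W_i x <= -c_i
                A_ub.append(W[i]);  b_ub.append(-c[i])
        res = linprog(np.zeros(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=list(zip(box_lo, box_hi)), method="highs")
        if res.success:                      # region intersects the input box
            patterns.append(s)
    return patterns

# Toy layer: 2 inputs, 2 ReLU neurons, input box [-1, 1]^2.
W = np.array([[1.0, -1.0], [1.0, 1.0]])
c = np.array([0.0, -0.5])
print(feasible_patterns(W, c, [-1.0, -1.0], [1.0, 1.0]))
```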
3.2.2 Approximate Reachability
Approximate reachability provides an over-
approximation of the reachable set, meaning that
it estimates the set that is guaranteed to contain all
possible outputs, even if it may not be the exact set.
This method is more efficient than exact reachability
as it does not require dividing the reachable set into
smaller pieces, but it sacrifices some accuracy.
Abstract Interpretation. Abstract interpretation
(Cousot and Cousot, 1977) is a powerful technique
for computing sound approximations of the reach-
able set of a given function, expressed as specific sets
of properties that can be automatically determined.
Originally developed for analyzing program seman-
tics, abstract interpretation provides a systematic ap-
proach to relating an often-intractable concrete func-
tion with a more manageable abstract function, also
known as an abstract transformer. This abstraction en-
ables conservative over-approximations of the reach-
able set, allowing for efficient reasoning about com-
plex functions. Crucially, proving that the computed
over-approximation is safe implies that the concrete
reachable set is also safe.
An abstract transformer operates on abstract ele-
ments, which are typically logical properties rep-
resenting sets of corresponding concrete elements
within the actual function. These abstract elements
collectively form an abstract domain. The design
of effective abstract transformers and domains in-
volves a delicate balance between accuracy and ef-
ficiency. These abstractions must be sufficiently pre-
cise to identify erroneous cases as unreachable while
maintaining computational tractability and scalability
to handle large and complex functions.
Abstract interpretation, a technique for analyzing
program semantics, has found a new application in
analyzing neural networks. In the paper (Gehr et al.,
2018), the authors introduce a method for proving
properties of neural networks using abstract trans-
formers constructed from CAT (Conditional Affine
Transformation) functions. This approach offers a
promising avenue for formally verifying the behavior
of neural networks.
MaxSens. In (Xiang et al., 2017a), the authors con-
ducted a study on the estimation of reachable sets
by focusing on the computation of maximal sensi-
tivity in neural networks. They treated this problem
as a series of convex optimization problems within
a simulation-based framework that utilizes interval
arithmetic. This approach is scalable and supports
monotonic activation functions, making it applicable
Figure 2: Abstract Interpretation for Neural Network Verification (Gehr et al., 2018).
to a wide range of scenarios. However, it is impor-
tant to note that this method is incomplete and prone
to high over-approximations, particularly as the num-
ber of layers in the network increases. The authors
initially provide a description of how to calculate
the maximum sensitivity of a Multi-Layer Perceptron
(MLP) network. The sensitivity refers to the Lipschitz
constant of the network’s function, which measures
the output deviation resulting from a bounded distur-
bance around an input. The maximum sensitivity, on
the other hand, represents the maximum variation in
output for a robustness ball defined by the ℓ∞ norm.
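As a rough numerical counterpart to the notion of maximum sensitivity, the sketch below bounds the output deviation of a small ReLU MLP under an ℓ∞-bounded disturbance by multiplying layer-wise operator norms; this coarse Lipschitz-style bound is only an illustration, not the optimization-based procedure of (Xiang et al., 2017a), and the weights are invented for the example.

```python
import numpy as np

def linf_sensitivity_bound(weights, epsilon):
    """Coarse upper bound on the output variation of a ReLU MLP when the
    input moves within an l_inf ball of radius epsilon: multiply the
    induced l_inf norm of each layer (max absolute row sum); ReLU is
    1-Lipschitz and therefore does not increase the bound."""
    bound = epsilon
    for W in weights:
        bound *= np.abs(W).sum(axis=1).max()
    return bound

# Toy 2-layer network.
W1 = np.array([[0.5, -1.0], [2.0, 0.3]])
W2 = np.array([[1.0, -0.5]])
print(linf_sensitivity_bound([W1, W2], epsilon=0.1))
```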
3.3 Optimization Approaches
Optimization techniques are applied to challenge
the assertion, treating the neural network’s function
as a constraint in the optimization process. Con-
sequently, the optimization problem becomes non-
convex. In primal optimization, several approaches
have emerged to represent nonlinear activation func-
tions as linear constraints. Methods like NSVerify
(Lomuscio and Maganti, 2017a), MIPVerify (Tjeng
et al., 2019), Big-M (Ibn-Khedher et al., 2021), and
ILP (Bastani et al., 2016) are instances of techniques
that simplify constraints through primal optimiza-
tion. Alternatively, dual optimization techniques pro-
vide another avenue for simplifying constraints. La-
grangian dual methods, including Duality (Dvijotham
et al., 2018) and ConvDual (Kolter and Wong, 2017),
as well as semidefinite programming methods such as
Certify (Raghunathan et al., 2018) and SDP (Fazlyab
et al., 2022), represent representative approaches for
dual optimization.
3.3.1 Primal Optimization
Primal optimization methods tackle the verification
task by embedding the network structure as a con-
straint in the optimization problem. However, these
methods are currently limited to networks with ReLU
(Rectified Linear Unit) activations. To address this,
researchers have developed techniques to encode the
network using linear or mixed integer linear con-
straints, taking advantage of the piece-wise linear-
ity of ReLU activations. This approach allows for
a more focused exploration of optimization solutions
tailored to the specific characteristics of ReLU-based
networks.
Big-M. In the work presented in (Ibn-Khedher
et al., 2021), the authors advocate for utilizing the
Big-M technique as an automated encoder for lin-
earizing the ReLU, a non-linear activation function.
This technique involves a mixed-integer linear pro-
gramming transformation that precisely converts non-
linear constraints into linear inequalities.
To illustrate, consider the example depicted in Eq. (2), where neurons H_1 and
H_2 are activated with ReLU. Employing Big-M for the verification problem
results in formulating it as a maximization problem: max Z = Constant, subject
to a set of constraints that delineate the relationships between neurons
across different layers. The core of this approach lies in the linearization
of the ReLU function, achieved by replacing it with a set of constraints. For
instance, the relationship between neuron H_1 and inputs x_1 and x_2 is
expressed as the following system of equations:
a_out(H_1) ≥ (θ_1 − 1) M + (x_1 + x_2)
a_out(H_1) ≤ (1 − θ_1) M + (x_1 + x_2)
a_out(H_1) ≤ θ_1 M
x_1 + x_2 ≥ (θ_1 − 1) M
x_1 + x_2 ≤ θ_1 M
θ_1 ∈ {0, 1}          (2)
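To see that these constraints indeed capture the ReLU behaviour, the following sketch checks, for sampled values of x_1 and x_2 and a sufficiently large M, that setting a_out(H_1) = ReLU(x_1 + x_2) with the matching binary θ_1 satisfies the system above; this is only an illustrative feasibility check, not the authors' MILP solver.

```python
import numpy as np

M = 1e3  # "big" constant, assumed to dominate |x1 + x2|

def bigm_feasible(a_out, x1, x2, theta):
    """Check the Big-M constraints of Eq. (2) for one ReLU neuron H1."""
    z = x1 + x2
    return (a_out >= (theta - 1) * M + z and
            a_out <= (1 - theta) * M + z and
            a_out <= theta * M and
            z >= (theta - 1) * M and
            z <= theta * M and
            theta in (0, 1))

rng = np.random.default_rng(0)
for _ in range(5):
    x1, x2 = rng.uniform(-5, 5, size=2)
    z = x1 + x2
    theta = 1 if z >= 0 else 0      # binary indicator of the active phase
    a_out = max(0.0, z)             # ReLU(x1 + x2)
    assert bigm_feasible(a_out, x1, x2, theta)
print("Big-M encoding is satisfied by ReLU on the sampled points")
```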
MIPVerify. The authors of (Tjeng et al., 2019) sim-
ilarly encode the neural network using mixed inte-
ger linear constraints. However, there are two key
distinctions between MIPVerify and NSVerify (Lo-
muscio and Maganti, 2017b). Firstly, MIPVerify em-
ploys node bounds to refine the constraints. Secondly,
MIPVerify addresses an adversarial problem that aims
to estimate the maximum permissible perturbation on
the input side. Like NSVerify, MIPVerify is also rec-
ognized as a complete method, ensuring that it can
successfully handle and analyze all feasible scenarios.
Formally, MIPVerify computes the maximum allowable disturbance using mixed
integer linear programming. Mathematically, MIPVerify optimizes for the
adversarial bound min_{x,y} ‖x − x_0‖_p s.t. y ∉ Y, y = f(x), using a mixed
integer encoding (with lower and upper bounds) and returns adversarial bounds.
ILP. The authors of ILP (Iterative Linear Program-
ming) (Bastani et al., 2016) represent the neural net-
work as a series of linear constraints by linearizing
it around a reference point. In ILP, the optimization
problem addresses an adversarial scenario aiming to
determine the maximum permissible perturbation on
the input side. The optimization is performed itera-
tively. However, it is important to note that ILP is not
considered a complete method as it only considers a
single linear segment of the network, thus potentially
overlooking certain aspects of the overall network be-
havior.
3.3.2 Dual Optimization
While various techniques have been developed for en-
coding constraints in primal optimization methods,
dual optimization offers an alternative approach by
simplifying the constraints within the primal problem.
This simplification typically leads to fewer and more
manageable constraints in the dual problem. How-
ever, the corresponding objectives in dual optimiza-
tion tend to be more complex than those in the primal
problem. The construction of the dual problem often
involves incorporating relaxations, which means that
these approaches are considered incomplete due to the
approximation inherent in the relaxation process.
Duality. (Dvijotham et al., 2018) addressed vari-
ous challenges associated with SMT-based, branch-
and-bound, or mixed-integer programming methods,
particularly concerning scalability and restrictions
linked to specific piece-wise linear activation func-
tions. They introduced a universal approach capable
of accommodating any activation function and spec-
ification type. The key concept involves formulat-
ing the verification property as an optimization prob-
lem, aiming to identify the most substantial violation
of the specification. Their strategy entails solving a
Lagrangian relaxation of the optimization problem to
derive an upper bound on the maximum potential vi-
olation of the specification. While the approach is
sound, indicating that if the largest violation is less
than or equal to zero, the property holds, it is also
incomplete. A positive largest violation does not con-
clusively establish that the property does not hold in
reality. Moreover, the process can be halted at any
point to obtain the current valid bound on the maxi-
mum violation.
ConvDual. ConvDual (Wong and Kolter, 2018) fol-
lows a dual approach to estimate output bounds. It be-
gins by performing a convex relaxation of the network
within the primal optimization framework to simplify
the dual problem. Unlike explicit optimization, Con-
vDual heuristically computes the bounds by selecting
a fixed, dual feasible solution. This heuristic approach
allows ConvDual to achieve computational efficiency
compared to Duality. While the original ConvDual
approach utilizes the obtained bounds for robust net-
work training, this survey primarily concentrates on
the method’s capability to compute these bounds ac-
curately.
3.4 Search Approaches
Reluplex. In (Katz et al., 2017), the authors propose Reluplex, an extension
of the simplex algorithm to handle ReLU constraints (hence its name). The
simplex algorithm solves linear optimization problems: its objective is to
minimize a function over a set defined by inequalities. Reluplex's principle
consists in formalizing the neural network as a set of equations. To solve
this system of equations, starting from an initial assignment, it tries to
correct the violated constraints at each step. The specificity of this
approach is that, from one iteration to another, the constraints between the
variables can be violated.
PLANET. In (Ehlers, 2017), the authors propose
PLANET, "a Piece-wise LineAr feed-forward NEural
network verification Tool". Its principle consists first
of replacing the non-linear neural network functions by a set of linear
equations. It then tries to find a solution to the system of equations. The
approach supports both ReLU and max-pooling nodes.
BaB. BaB (Bunel et al., 2017) employs the branch
and bound technique to calculate the output bounds
of a network. Its modular design allows it to function
as a comprehensive framework that can accommodate
other methods like Reluplex and Planet. This versa-
tility enables BaB to serve as a unified platform for
various approaches within the field.
Fast-Lin. Fast-Lin (Weng et al., 2018) adopts a
layer-by-layer approach and utilizes binary search
within the input domain to determine a certified lower
bound on the allowable input disturbance for ReLU
networks. This methodology enables efficient compu-
tation and offers a dependable estimation of the max-
imum permissible input perturbation.
Fast-Lip. Fast-Lip (Weng et al., 2018) builds upon
the foundation of Fast-Lin for calculating activation
function bounds while also estimating the local Lip-
schitz constant of the network. Fast-Lin excels in
terms of scalability, being more efficient in this re-
gard. However, Fast-Lip provides enhanced solutions
specifically for ℓ1 bounds, offering improved per-
formance in that specific context.
While we present a rough overview of the different techniques developed in
each field, it is important to note that, more often than not, state-of-the-art
tools leverage more than one kind of method. They rely on reachability
approaches for their scalability, add some optimization to improve the
precision, and achieve completeness through search-based approaches. One can
look at the performance of state-of-the-art approaches through the VNN-COMP
(Brix et al., 2023), which brings together and evaluates tools such as
α-β-CROWN (Zhang et al., 2018), Marabou (Katz et al., 2019), nnenum
(Bak et al., 2020) or PyRAT (Lemesle et al., 2023).
4 ROBUSTNESS TESTING
In some cases, the formal evaluation of a neural network is costly in terms of
computation time. To get a first idea of the robustness of the neural network,
it is recommended to start by analyzing its behavior against adversarial
attacks. A network that is already vulnerable to adversarial attacks needs to
improve its robustness using state-of-the-art adversarial training techniques,
and it is premature to formally evaluate its robustness.
4.1 Adversarial Attacks Types
There are two main types of adversarial attacks
(White-box attacks and Black-box attacks) that aim
to fool neural networks into making incorrect predic-
tions. The key difference between the two lies in the
amount of knowledge the attacker has about the target
neural network.
White-box Attacks assume that the attacker has
full knowledge of the neural network, including its
architecture, weights, and training data. This al-
lows the attacker to comprehensively analyze the net-
work’s vulnerabilities and design targeted adversarial
attacks that exploit its specific weaknesses. In con-
trast, Black-box Attacks operate with limited or no
knowledge of the target network’s internal structure
or parameters. The attacker can only interact with the
network by making input queries and observing the
corresponding output predictions. This poses a sig-
nificant challenge, as the attacker must infer the net-
work’s vulnerabilities and design attacks without di-
rect access to its inner workings.
In the industrial context, the two types of attacks
serve distinct purposes. Organizations typically em-
ploy white-box attacks internally to enhance their
systems and detect potential vulnerabilities before
they can be exploited by external actors. By gain-
ing comprehensive knowledge of their neural net-
works’ architectures, weights, and training data, orga-
nizations can thoroughly analyze their networks’ ro-
bustness and identify areas where adversarial attacks
could be successful. This proactive approach helps
organizations to detect the weakness of their models
and mitigate risks.
On the other hand, Black-Box Attacks play a cru-
cial role in testing and evaluating neural networks be-
fore they are deployed in real-world applications. By
simulating the actions of an external attacker, Black-
Box Attacks assess the network’s resilience against
unknown or obfuscated threats. This helps organiza-
tions identify potential vulnerabilities that may arise
from limited knowledge of the network’s internals.
4.2 Adversarial White-Box Attacks
Fast Gradient Sign Method (FGSM). (Goodfellow et al., 2015) have developed a
method for generating adverse samples based on the gradient descent method.
Each component of the original sample x is modified by adding or subtracting a
small perturbation ε. The adverse function ψ is expressed as follows:

ψ : X × Y → X
(x, y) ↦ −ε ∇_x L(x, y)

Thus, the loss function of the classifier will decrease when the class of the
adverse sample is chosen as y. We then wish to find x′, an adverse sample of
x, such that ‖x′ − x‖_p ≤ ε.
It is clear that the previous formulation of FGSM corresponds to the targeted
attack case, where y is the target class that we wish to impose on the
original input sample. However, this attack can also be applied in the
untargeted case by considering the following perturbation:

ρ : X → X
x ↦ −ψ(x, C(x))

Thus, the original input x is modified in order to increase the loss function
L when the classifier retains the same class C(x). This attack only requires
the computation of the loss function gradient, which makes it a very efficient
method. On the other hand, ε is a hyperparameter that affects the class of x′:
if ε is too small, the perturbation ρ(x) may have little impact on the class
of x′.
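A minimal PyTorch sketch of the perturbation functions ψ and ρ described above is given below, assuming a classifier model that outputs logits and integer class labels y; it follows the paper's gradient formulation rather than the sign-based FGSM variant.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, y, epsilon):
    """psi(x, y) = -epsilon * grad_x L(x, y): move x so that the loss
    for class y decreases (targeted case of the formulation above)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # L(x, y), assuming logits output
    loss.backward()
    return -epsilon * x.grad.detach()

def fgsm_untargeted(model, x, epsilon):
    """Untargeted case: x' = x + rho(x) with rho(x) = -psi(x, C(x)),
    i.e. increase the loss for the currently predicted class C(x)."""
    y_pred = model(x).argmax(dim=1)
    return (x - fgsm_perturbation(model, x, y_pred, epsilon)).detach()
```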
Basic Iterative Method (BIM). (Kurakin et al., 2017) proposed an extension of
the FGSM attack by iteratively applying FGSM. At each iteration i, the adverse
sample is generated by applying FGSM on the sample generated at the (i−1)-th
iteration. The BIM attack is generated as follows:

x′_0 = x
x′_{i+1} = x′_i + ψ(x′_i, y)

where y represents, in the case of a targeted attack, the class of the adverse
sample, and y = C(x′_N) in the case of an untargeted attack. Moreover, ψ is
the same function defined in the case of the FGSM attack.
Projected Gradient Descent (PGD). The PGD attack (Madry et al., 2017) is also
an extension of FGSM and is similar to BIM. It consists in applying FGSM
several times. The major difference from BIM is that, at each iteration, the
generated attack is projected onto the ball B(x, ε) = { z ∈ X : ‖x − z‖_p ≤ ε }.
The adverse sample x′ associated with the original sample x is then
constructed as follows:

x′_0 = x
x′_{i+1} = Π_ε(x′_i + ψ(x′_i, y))

where Π_ε is the projection onto the ball B(x, ε) and ψ is the perturbation
function as defined in FGSM. Likewise, y refers to the target label to be
reached, in the case of a targeted attack, or to C(x′_N) in the case of an
untargeted attack.
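Building on the FGSM sketch above, the following code illustrates the projected iterative variant for the ℓ∞ norm (BIM corresponds to dropping the projection step); fgsm_perturbation refers to the helper sketched in the FGSM section, and the step count is an illustrative choice.

```python
import torch

def pgd_attack(model, x, y, epsilon, n_iter=10):
    """Iteratively apply the FGSM step and project back onto the l_inf
    ball B(x, epsilon) around the original input (per-coordinate clamp)."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv = x_adv + fgsm_perturbation(model, x_adv, y, epsilon)
        # Projection Pi_eps onto B(x, eps) for the l_inf norm:
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).detach()
    return x_adv
```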
DeepFool. DeepFool is a non-targeted attack pro-
posed by (Moosavi-Dezfooli et al., 2015). The main
idea of DeepFool is to find the closest distance from
the original input to the decision boundary. To over-
come the non-linearity in high dimension, they per-
formed an iterative attack with a linear approxima-
tion. For an affine classifier f(x) = w^T x + b, where w is the weight of the
affine classifier and b is the bias, the minimal perturbation is the distance
to the separating affine hyperplane F = {x : w^T x + b = 0}. Given the example
of a linear binary classifier, the robustness of the classifier f for an input
x_0 is equal to the distance of x_0 to the hyperplane separating the two
classes. In fact, the minimal perturbation to change the classifier's decision
corresponds to the orthogonal projection of x_0 onto the hyperplane, given by:

η*(x) = −( f(x) / ‖w‖_2^2 ) w.
For a general differentiable classifier, DeepFool assumes that f is linear
around x_i at each iteration. The minimal perturbation is expressed as
follows:

argmin_{η_i} ‖η_i‖_2   subject to   f(x_i) + ∇f(x_i)^T η_i = 0.

This process runs until f(x_i) ≠ f(x), and the minimum perturbation is
eventually approximated by the sum of the η_i. This technique can also be
extended to a multi-class classifier by finding the closest hyperplanes. It
can also be extended to a more general ℓ_p norm, p ∈ [0, ∞). As mentioned in
(Yuan et al., 2019), DeepFool provides less perturbation compared to FGSM.
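A NumPy sketch of the binary case discussed above is shown below: the classifier is linearized at each step and the input is projected onto the approximate decision boundary. The f and grad_f callables, as well as the small overshoot factor, are assumptions of the sketch, not part of the original method description.

```python
import numpy as np

def deepfool_binary(f, grad_f, x0, max_iter=50, overshoot=0.02):
    """DeepFool-style minimal perturbation for a binary classifier whose
    decision is sign(f(x)). At each step the classifier is linearized and
    x is projected onto the current approximate boundary:
        eta_i = -f(x_i) / ||grad f(x_i)||^2 * grad f(x_i)."""
    x = x0.astype(float).copy()
    original_sign = np.sign(f(x0))
    total_eta = np.zeros_like(x)
    for _ in range(max_iter):
        if np.sign(f(x)) != original_sign:    # decision has flipped
            break
        g = grad_f(x)
        eta = -f(x) / (np.dot(g, g) + 1e-12) * g
        x = x + (1 + overshoot) * eta         # small overshoot to cross
        total_eta += (1 + overshoot) * eta
    return x, total_eta

# Affine example f(x) = w.x + b: a single step reaches the hyperplane.
w, b = np.array([1.0, -2.0]), 0.5
x_adv, eta = deepfool_binary(lambda x: w @ x + b, lambda x: w,
                             np.array([1.0, 1.0]))
```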
4.3 Adversarial Black-Box Attacks
ZOO. Zeroth Order Optimization (ZOO) is pro-
posed by (Chen et al., 2017) to estimate the gradients
of the target DNN in order to produce an adversarial in-
put. ZOO is suitable for problems where gradients
are unavailable, uncomputable or private. The threat
model assumed by the authors is that the target model
can only be queried to obtain the probability scores of
all the classes. The authors then use the symmetric difference quotient method
to estimate the gradient ∂f(x)/∂x_i, where f is the loss function:

ĝ := ∂f(x)/∂x_i ≈ ( f(x + h e_i) − f(x − h e_i) ) / (2h)          (3)
The naive solution above requires querying the
model 2p times, where p is the dimension of the in-
put. So the authors propose two stochastic coordinate
methods: ZOO-Adam and ZOO-Newton in which a
gradient is estimated for a random coordinate and the
update formula is obtained using ADAM and New-
ton’s Method until it reaches convergence. The au-
thors also discuss the generation of noise in a lower
dimension to improve efficiency and specify its ad-
vantages and disadvantages (Bhambri et al., 2019).
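A query-only NumPy sketch of the symmetric difference quotient of Eq. (3) is given below; loss_fn stands for the black-box score returned by the target model and is an assumption of the sketch.

```python
import numpy as np

def zoo_gradient_estimate(loss_fn, x, h=1e-4):
    """Estimate grad f(x) coordinate-wise from queries only:
    g_i ~ (f(x + h e_i) - f(x - h e_i)) / (2 h)."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = 1.0
        g.flat[i] = (loss_fn(x + h * e) - loss_fn(x - h * e)) / (2.0 * h)
    return g

# Sanity check on a known function: f(x) = ||x||^2, grad = 2x.
x = np.array([1.0, -2.0, 0.5])
print(zoo_gradient_estimate(lambda v: float(np.dot(v, v)), x))  # ~ [2, -4, 1]
```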
Boundary. Boundary attacks are decision-based adversarial attacks that rely
only on a model's final decision. They were the first decision-based attacks
to target deep learning models on real-life data sets such as ImageNet. The
proposed algorithm is initialized with a random distribution of pixels for an
image in the range [0, 255]. Starting from this image, the algorithm traverses
the decision boundary of the class to which the sample input image belongs.
The initialized image then progressively reduces its distance to the input
image while staying inside the adversarial region.
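A highly simplified, decision-only sketch in the spirit of the boundary attack is given below: starting from a random image that is already misclassified, it proposes small steps toward the original input and keeps only those that remain adversarial (the actual algorithm uses orthogonal perturbation steps and adaptive step sizes); predict is a hypothetical decision-only interface.

```python
import numpy as np

def boundary_attack_sketch(predict, x_orig, label, n_steps=1000, step=0.05,
                           rng=None):
    """predict(x) -> class label (decision only). Start from random noise
    in [0, 255] and walk toward x_orig while staying misclassified."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x_adv = rng.uniform(0, 255, size=x_orig.shape)
    if predict(x_adv) == label:
        return None                      # random initialization not adversarial
    for _ in range(n_steps):
        candidate = x_adv + step * (x_orig - x_adv)            # move toward original
        candidate += rng.normal(scale=1.0, size=x_orig.shape)  # small random jitter
        candidate = np.clip(candidate, 0, 255)
        if predict(candidate) != label:  # keep only proposals that stay adversarial
            x_adv = candidate
    return x_adv
```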
5 DEFENSE TECHNIQUE FOR
IMPROVING ROBUSTNESS
Defense techniques aim to strengthen a model’s re-
silience against adversarial attacks. The literature
distinguishes three categories of defense techniques,
specifically those based on: i) Modifying the input
data, ii) Modifying the classifier, and iii) Adding an
external model. These categories are elaborated upon
below.
5.1 Modifying the Input Data
Instead of directly altering the network architecture
or training process, data modification techniques fo-
cus on manipulating the input data itself. These tech-
niques aim to make the input data less susceptible to
adversarial perturbations, thereby improving the net-
work’s overall robustness. One such technique, Gaus-
sian data augmentation (Zantedeschi et al., 2017), in-
volves adding copies of the original data points with
Gaussian noise. This approach trains the network to
recognize the same class for both the original instance
and its slightly perturbed version, enhancing its abil-
ity to generalize and withstand adversarial modifica-
tions. The simplicity, ease of implementation, and ef-
fectiveness of Gaussian data augmentation have made
it a widely adopted technique for mitigating adversar-
ial attacks.
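A small NumPy sketch of Gaussian data augmentation as described above is shown below: each training point is duplicated with additive Gaussian noise while keeping its label; the noise level sigma is an illustrative choice.

```python
import numpy as np

def gaussian_augment(X, y, sigma=0.1, copies=1, rng=None):
    """Return the training set extended with noisy copies that keep the
    original labels, so the model learns to ignore small perturbations."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X_aug, y_aug = [X], [y]
    for _ in range(copies):
        X_aug.append(X + rng.normal(scale=sigma, size=X.shape))
        y_aug.append(y)
    return np.concatenate(X_aug, axis=0), np.concatenate(y_aug, axis=0)

# Example: 100 samples with 20 features, one noisy copy each.
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)
X_train, y_train = gaussian_augment(X, y, sigma=0.05, copies=1)
```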
5.2 Modifying the Classifier
Classifier modifications involve altering the neural
network architecture or training process to enhance its
resilience to adversarial attacks. One such approach is
gradient masking, which aims to conceal or mask the
network’s gradient information from potential attack-
ers. While many adversarial attacks rely on knowl-
edge of the gradient, gradient masking aims to ob-
scure this information, making it more difficult for at-
tackers to manipulate the network.
Gradient masking techniques work by manipulat-
ing the gradients during the training process, effec-
tively hiding the network’s internal workings from ex-
ternal observers. This makes it challenging for attack-
ers to craft effective adversarial examples based on
gradient information.
5.3 Adding an External Model
Instead of directly modifying the primary network,
external model integration utilizes additional models
to augment the system’s defenses against adversarial
attacks. This approach introduces an extra layer of
protection, making it more challenging for attackers
to manipulate the network’s output.
One such technique, introduced in (Lee et al.,
2017), employs generative adversarial networks
(GANs) to enhance the network’s resilience. A sepa-
rate generator network actively generates adversarial
perturbations that aim to fool the primary network.
By incorporating this adversarial training process, the
system becomes more adept at identifying and coun-
teracting adversarial inputs. The integration of exter-
nal models alongside the primary network during test-
ing provides a robust and effective defense strategy.
This approach demonstrates the potential of combin-
ing different models and techniques to strengthen the
overall security of neural networks against adversarial
attacks.
6 CONCLUSION AND
PERSPECTIVES
Machine learning models are, ultimately, software artefacts. As such, adequate
consideration and effort must be devoted to characterizing their safety and
robustness when they are used in sensitive environments, particularly critical
systems. How-
ever, ML models come with their own challenges, de-
manding new methods and tools to be devised. While
this task seems arduous, the goal of this paper is to of-
fer a broad range of avenues for trust, to show the first
steps in these paths, and to inspire the conviction that
the research community is advancing on this topic and
making greater strides every year, expanding the tool-
box needed by the industrial sector to enforce trust in
their models.
In future work, we propose the application of the
suggested pipeline in real industrial use cases. Given
that the methods mentioned in this survey depend on
the type of data (images, time series, tabular data), we
have observed that the approaches proposed for time
series are not sufficiently advanced. Therefore, we
suggest investigating the adaptability of this pipeline
for temporal data.
ACKNOWLEDGMENT
This work has been supported by the French govern-
ment under the "France 2030" program, as part of the
SystemX Technological Research Institute within the
Confiance.ai programme.
REFERENCES
Bak, S., Tran, H.-D., Hobbs, K., and Johnson, T. T. (2020).
Improved geometric path enumeration for verifying
relu neural networks. In Proceedings of the 32nd In-
ternational Conference on Computer Aided Verifica-
tion. Springer.
Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis,
D., Nori, A. V., and Criminisi, A. (2016). Measur-
ing neural net robustness with constraints. CoRR,
abs/1605.07262.
Bhambri, S., Muku, S., Tulasi, A., and Buduru, A. B.
(2019). A study of black box adversarial attacks in
computer vision. CoRR, abs/1912.01667.
Boardman, M. and Butcher, F. (2019). An exploration
of maintaining human control in ai enabled systems
and the challenges of achieving it. In Workshop on
Big Data Challenge-Situation Awareness and Deci-
sion Support. Brussels: North Atlantic Treaty Organi-
zation Science and Technology Organization. Porton
Down: Dstl Porton Down.
Brix, C., Bak, S., Liu, C., and Johnson, T. T. (2023). The
fourth international verification of neural networks
competition (vnn-comp 2023): Summary and results.
Bunel, R., Turkaslan, I., Torr, P. H., Kohli, P., and Kumar,
M. P. (2018). A unified view of piecewise linear neu-
ral network verification. In Proceedings of the 32Nd
International Conference on Neural Information Pro-
cessing Systems, NIPS’18, pages 4795–4804, USA.
Curran Associates Inc.
Bunel, R., Turkaslan, I., Torr, P. H. S., Kohli, P., and Kumar,
M. P. (2017). Piecewise linear neural network verifi-
cation: A comparative study. CoRR, abs/1711.00455.
Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-
J. (2017). ZOO: Zeroth Order Optimization based
Black-box Attacks to Deep Neural Networks without
Training Substitute Models. Proceedings of the 10th
ACM Workshop on Artificial Intelligence and Security,
pages 15–26. arXiv: 1708.03999.
Cousot, P. and Cousot, R. (1977). Abstract interpretation:
A unified lattice model for static analysis of programs
by construction or approximation of fixpoints. In
Conference Record of the 4th Annual ACM SIGACT-
SIGPLAN Symposium on Principles of Programming
Languages (POPL), Los Angeles, CA, pages 238–252.
ACM.
Dvijotham, K., Stanforth, R., Gowal, S., Mann, T. A., and
Kohli, P. (2018). A Dual Approach to Scalable Ver-
ification of Deep Networks. In Proceedings of the
Thirty-Fourth Conference on Uncertainty in Artificial
Intelligence, UAI 2018, Monterey, California, USA,
August 6-10, 2018, pages 550–559.
Ehlers, R. (2017). Formal verification of piece-
wise linear feed-forward neural networks. CoRR,
abs/1705.01320.
El-Sherif, D. M., Abouzid, M., Elzarif, M. T., Ahmed,
A. A., Albakri, A., and Alshehri, M. M. (2022). Tele-
health and artificial intelligence insights into health-
care during the covid-19 pandemic. In Healthcare,
volume 10, page 385. MDPI.
Fazlyab, M., Morari, M., and Pappas, G. J. (2022). Safety
verification and robustness analysis of neural net-
works via quadratic constraints and semidefinite pro-
gramming. IEEE Transactions on Automatic Control,
67(1):1–15.
Ferrari, A. and Beek, M. H. T. (2022). Formal methods in
railways: a systematic mapping study. ACM Comput-
ing Surveys, 55(4):1–37.
Floyd, R. W. (1967). Assigning meanings to programs.
Proceedings of Symposium on Applied Mathematics,
19:19–32.
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P.,
Chaudhuri, S., and Vechev, M. (2018). Ai2: Safety
and robustness certification of neural networks with
abstract interpretation. In 2018 IEEE Symposium on
Security and Privacy (SP), pages 3–18.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Ex-
plaining and Harnessing Adversarial Examples. In In-
ternational Conference on Learning Representations.
Hoare, C. A. R. (1969). An axiomatic basis for computer
programming. Commun. ACM, 12(10):576–580.
Ibn-Khedher, H., Khedher, M. I., and Hadji, M. (2021).
Mathematical programming approach for adversarial
attack modelling. In Rocha, A. P., Steels, L., and
van den Herik, H. J., editors, International Conference
on Agents and Artificial Intelligence, pages 343–350.
Jmila, H. and Khedher, M. I. (2022). Adversarial machine
learning for network intrusion detection: A compara-
tive study. Comput. Networks, 214:109073.
Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochen-
derfer, M. J. (2017). Reluplex: An efficient SMT
solver for verifying deep neural networks. CoRR,
abs/1702.01135.
Katz, G., Huang, D. A., Ibeling, D., Julian, K., Lazarus, C.,
Lim, R., Shah, P., Thakoor, S., Wu, H., Zeljic, A., Dill,
D. L., Kochenderfer, M. J., and Barrett, C. W. (2019).
The marabou framework for verification and analysis
of deep neural networks. In Dillig, I. and Tasiran,
S., editors, Computer Aided Verification - 31st Inter-
national Conference, CAV 2019, New York City, NY,
USA, July 15-18, 2019, Proceedings, Part I, volume
11561 of Lecture Notes in Computer Science, pages
443–452. Springer.
Khedher, M. I., Jmila, H., and El-Yacoubi, M. A. (2023).
On the formal evaluation of the robustness of neural
networks and its pivotal relevance for ai-based safety-
critical domains. International Journal of Network
Dynamics and Intelligence, page 100018.
Khedher, M. I., Mziou-Sallami, M., and Hadji, M. (2021).
Improving decision-making-process for robot naviga-
tion under uncertainty. In International Conference on
Agents and Artificial Intelligence, pages 1105–1113.
Kolter, J. Z. and Wong, E. (2017). Provable defenses against
adversarial examples via the convex outer adversarial
polytope. CoRR, abs/1711.00851.
Kurakin, A., Goodfellow, I. J., and Bengio, S. (2017).
Adversarial examples in the physical world. ICLR,
1607.02533v4.
Lee, H., Han, S., and Lee, J. (2017). Generative adversarial
trainer: Defense to adversarial perturbations with gan.
arXiv preprint arXiv:1705.03387.
Lemesle, A., Chihani, Z., Lehmann, J., and Durand,
S. (2023). PyRAT Analyzer website. https://
pyrat-analyzer.com/. Accessed: December 15th,
2023.
Li, B.-h., Hou, B.-c., Yu, W.-t., Lu, X.-b., and Yang, C.-w.
(2017). Applications of artificial intelligence in intel-
ligent manufacturing: a review. Frontiers of Informa-
tion Technology & Electronic Engineering, 18:86–96.
Lomuscio, A. and Maganti, L. (2017a). An approach to
reachability analysis for feed-forward relu neural net-
works. CoRR, abs/1706.07351.
Lomuscio, A. and Maganti, L. (2017b). An approach to
reachability analysis for feed-forward relu neural net-
works. CoRR, abs/1706.07351.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2017). Towards deep learning mod-
els resistant to adversarial attacks. arXiv preprint
arXiv:1706.06083.
Mattioli, J., Le Roux, X., Braunschweig, B., Cantat, L.,
Tschirhart, F., Robert, B., Gelin, R., and Nicolas, Y.
(2023). Ai engineering to deploy reliable ai in indus-
try. In AI4I.
Miglani, A. and Kumar, N. (2019). Deep learning models
for traffic flow prediction in autonomous vehicles: A
review, solutions, and challenges. Vehicular Commu-
nications, 20:100184.
Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P.
(2015). Deepfool : a simple and accurate method to
fool deep neural networks. CoRR, 1511.04599.
Mwadulo, M. W. (2016). Suitability of agile methods for
safety-critical systems development: a survey of liter-
ature. International Journal of Computer Applications
Technology and Research, 5(7):465–471.
Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker,
M., and Deardeuff, M. (2015). How amazon web ser-
vices uses formal methods. Communications of the
ACM.
Raghunathan, A., Steinhardt, J., and Liang, P. (2018).
Certified defenses against adversarial examples. In
6th International Conference on Learning Represen-
tations, ICLR 2018, Vancouver, BC, Canada, April 30
- May 3, 2018, Conference Track Proceedings. Open-
Review.net.
Tjeng, V., Xiao, K. Y., and Tedrake, R. (2019). Evaluating
robustness of neural networks with mixed integer pro-
gramming. In 7th International Conference on Learn-
ing Representations, ICLR 2019, New Orleans, LA,
USA, May 6-9, 2019. OpenReview.net.
Weng, T., Zhang, H., Chen, H., Song, Z., Hsieh, C.,
Daniel, L., Boning, D. S., and Dhillon, I. S. (2018).
Towards fast computation of certified robustness for
relu networks. In Dy, J. G. and Krause, A., edi-
tors, Proceedings of the 35th International Conference
on Machine Learning, ICML 2018, Stockholmsmäs-
san, Stockholm, Sweden, July 10-15, 2018, volume 80
of Proceedings of Machine Learning Research, pages
5273–5282.
Wong, E. and Kolter, J. Z. (2018). Provable defenses against
adversarial examples via the convex outer adversar-
ial polytope. In Dy, J. G. and Krause, A., editors,
Proceedings of the 35th International Conference on
Machine Learning, ICML 2018, Stockholmsmässan,
Stockholm, Sweden, July 10-15, 2018, volume 80 of
Proceedings of Machine Learning Research, pages
5283–5292. PMLR.
Xiang, W., Tran, H., and Johnson, T. T. (2017a). Out-
put reachable set estimation and verification for multi-
layer neural networks. CoRR, abs/1708.03322.
Xiang, W., Tran, H., and Johnson, T. T. (2017b). Reachable
set computation and safety verification for neural net-
works with relu activations. CoRR, abs/1712.08163.
Yuan, X., He, P., Zhu, Q., and Li, X. (2019). Adversar-
ial examples: Attacks and defenses for deep learning.
IEEE transactions on neural networks and learning
systems, 30(9):2805–2824.
Zantedeschi, V., Nicolae, M.-I., and Rawat, A. (2017).
Efficient defenses against adversarial attacks. In
ACM Workshop on Artificial Intelligence and Security,
pages 39–49.
Zhang, H., Weng, T.-W., Chen, P.-Y., Hsieh, C.-J., and
Daniel, L. (2018). Efficient neural network robust-
ness certification with general activation functions.
Advances in Neural Information Processing Systems,
31:4939–4948.