
tomized, the user has the possibility to save the collection of chosen perturbation options to the knowledge graph. A saved collection can be used in deployment to load this set of options for a new input case. It is possible to create multiple collections of perturbation options for one prediction project.
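As a rough illustration, the following sketch shows how such a collection could be persisted to a Fuseki update endpoint using the SPARQLWrapper library. The endpoint URL, the namespace, and all class and property names are illustrative assumptions and do not necessarily correspond to the actual knowledge graph schema.

# Minimal sketch: persisting a named collection of perturbation options
# to a Fuseki update endpoint. The endpoint URL, namespace, and all
# property names are illustrative assumptions, not the actual schema.
from SPARQLWrapper import SPARQLWrapper, POST

FUSEKI_UPDATE = "http://localhost:3030/reliability/update"  # assumed endpoint

def save_collection(name: str, option_ids: list[str]) -> None:
    triples = "\n".join(
        f"ex:{name} ex:hasPerturbationOption ex:{oid} ." for oid in option_ids
    )
    query = f"""
    PREFIX ex: <http://example.org/reliability#>
    INSERT DATA {{
        ex:{name} a ex:PerturbationOptionCollection .
        {triples}
    }}
    """
    endpoint = SPARQLWrapper(FUSEKI_UPDATE)
    endpoint.setMethod(POST)
    endpoint.setQuery(query)
    endpoint.query()

save_collection("weatherScenarios", ["windShift10", "wingletsToggle"])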
The Deployment section of the tool allows the user to select a predefined collection of perturbation options and apply these perturbation options to new input cases, thus creating a perturbation assessment for the respective input case. A user can select one of the predefined collections of perturbation options from a drop-down menu. The tool provides the possibility to view all perturbation options that are included in the chosen collection, as well as to add perturbation options to the collection or remove them from it before starting the assessment of a new case. Once the user has selected all perturbation options that should be used for the assessment, the new input cases can be entered into the tool. A new case can be entered either by entering a value for each feature in the user interface, or by uploading a CSV file that contains the feature values. All entered cases are shown within a table in the user interface. The user can click on one of the cases in the table and, after providing a label for this case, start perturbing the respective case. After the processing of the perturbed test cases is finished, the result is shown within a table in the user interface. The result consists of the original input case, shown in the first line of the table, and all perturbed test cases that were created based on the chosen perturbation options. Each perturbed test case in the result highlights all perturbed values for the user. Besides the table that includes all perturbed cases, the user also gets a table showing only those perturbed cases that received a changed prediction compared to the original input case. Perturbed test cases with a changed prediction are of interest for a domain expert to assess the reliability of the original case's prediction. The user has the possibility to download all perturbed cases in CSV format for further processing.
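The following sketch illustrates this flow in Streamlit. The helper perturb_case is a hypothetical stand-in for the tool's actual perturbation logic, and the column names and dummy prediction values are assumptions for illustration.

# Sketch of the case-entry and result-filtering flow in Streamlit.
import pandas as pd
import streamlit as st

def perturb_case(case: pd.Series) -> pd.DataFrame:
    # Hypothetical stand-in for the tool's perturbation logic: it flips
    # the Winglets value and attaches dummy predictions, so the result
    # table has the original case in the first row.
    original = case.copy()
    original["prediction"] = "on-time"   # dummy value
    perturbed = case.copy()
    perturbed["Winglets"] = "N" if case["Winglets"] == "Y" else "Y"
    perturbed["prediction"] = "delayed"  # dummy value
    return pd.DataFrame([original, perturbed])

uploaded = st.file_uploader("Upload new input cases", type="csv")
if uploaded is not None:
    cases = pd.read_csv(uploaded)
    st.dataframe(cases)                    # all entered cases in a table
    results = perturb_case(cases.iloc[0])  # after the user picks a case
    st.dataframe(results)                  # first row: original case
    # second table: only cases whose prediction changed
    changed = results[results["prediction"] != results.iloc[0]["prediction"]]
    st.dataframe(changed)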
5.2 Architecture
We decided against building a heavyweight RESTful implementation of the tool in favor of the lightweight Python framework Streamlit (https://streamlit.io/), which offers enough flexibility to demonstrate the functionality, including a simple graphical user interface. As database we used the Fuseki graph store (https://jena.apache.org/documentation/fuseki2/), which saves all reliability assessment information in the format proposed by Staudinger et al. (2024).
{
  "WindDirection": {"levelOfScale": "Cardinal",
                    "uniqueValues": ["5", "360"]},
  "Winglets":      {"levelOfScale": "Nominal",
                    "uniqueValues": ["Y", "N"]},
  "Runway":        {"levelOfScale": "Ordinal",
                    "uniqueValues": ["0", "0.2", "0.4", "0.6", "0.8", "1"]}
}
Listing 1: Example JSON definition of feature metadata.
In the root folder of the tool, a user can find the three main configuration files: config, sparql, and strings. The config file contains configurable items, e.g., the link to the graph store. The sparql file contains all SPARQL queries used to insert information into or retrieve it from the graph store, so if any changes to the knowledge graph schema are necessary, they can be made here. The strings file contains all text that is shown within the tool, thus enabling easy textual changes or the provision of the tool in another language.
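Assuming these files are plain Python modules (their concrete format is not specified here), the separation could look roughly as follows; all names, queries, and values are illustrative assumptions.

# config.py -- illustrative excerpt; key name and URL are assumptions
FUSEKI_URL = "http://localhost:3030/reliability"

# sparql.py -- illustrative excerpt; the actual schema may differ
GET_COLLECTIONS = """
PREFIX ex: <http://example.org/reliability#>
SELECT ?collection
WHERE { ?collection a ex:PerturbationOptionCollection . }
"""

# strings.py -- illustrative excerpt; centralizing UI text eases translation
UI_TEXT = {
    "select_collection": "Select a collection of perturbation options",
    "upload_cases": "Upload new input cases (CSV)",
}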
The tool uses two main inputs in order to assess the reliability of individual predictions. The first input is a JSON file describing the metadata of the features of the prediction model. An illustrative example of the structure of the JSON file is shown in Listing 1. Every feature is listed with its unique name (e.g., WindDirection), its level of scale (levelOfScale), which can be Cardinal, Nominal, or Ordinal, and the unique values (uniqueValues) for each feature. A cardinal feature should specify the minimum (e.g., 5) and maximum (e.g., 360) value allowed for this feature. Nominal and ordinal features should specify a list of all allowed feature values, where the list must be ordered for ordinal features (e.g., "0", "0.2", "0.4"). The information contained in the JSON file is the minimum required to perform the assessment.
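A minimal sketch of reading this metadata is shown below; the file name features.json is an assumption, while the keys and their interpretation follow Listing 1.

# Reading the feature metadata from Listing 1.
import json

with open("features.json") as f:  # assumed file name
    metadata = json.load(f)

for feature, spec in metadata.items():
    if spec["levelOfScale"] == "Cardinal":
        # for a cardinal feature the two unique values act as the
        # minimum and maximum allowed value
        lo, hi = (float(v) for v in spec["uniqueValues"])
        print(f"{feature}: cardinal, allowed range [{lo}, {hi}]")
    else:
        # nominal and ordinal features enumerate all allowed values;
        # for ordinal features the list is ordered
        print(f"{feature}: {spec['levelOfScale'].lower()}, "
              f"allowed values {spec['uniqueValues']}")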
The second input is an already trained prediction model. Since the training of prediction models can take hours, it is not feasible to retrain a model every time the tool is started. Therefore, we offer the possibility to upload any pre-trained model that was exported using the Python library pickle (https://docs.python.org/3/library/pickle.html). Once uploaded, the user can choose which prediction model should be used for a new input case.
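A sketch of loading such an uploaded model is given below. The feature values are taken from Listing 1, and the predict call assumes a scikit-learn-style model interface.

# Loading a pre-trained, pickled model from a Streamlit upload.
import pickle
import pandas as pd
import streamlit as st

model_file = st.file_uploader("Upload a pickled prediction model", type="pkl")
if model_file is not None:
    model = pickle.load(model_file)  # the uploaded object is file-like
    case = pd.DataFrame([{"WindDirection": 180, "Winglets": "Y", "Runway": 0.4}])
    st.write("Prediction:", model.predict(case))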
The output of the tool is a collection of perturbed test cases, which is presented in the user interface. A user has the possibility to download the collection in CSV format, where the first line represents the original test case and all following rows represent perturbed test cases, including the respective prediction.
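The export could be realized as sketched below; the result table, its column names, and the file name are illustrative assumptions.

# Sketch of the CSV export of a perturbation assessment.
import pandas as pd
import streamlit as st

# Illustrative result table: the first row is the original test case,
# the remaining rows are perturbed test cases with their predictions.
results = pd.DataFrame([
    {"WindDirection": 180, "Winglets": "Y", "Runway": 0.4, "prediction": "on-time"},
    {"WindDirection": 190, "Winglets": "Y", "Runway": 0.4, "prediction": "on-time"},
    {"WindDirection": 180, "Winglets": "N", "Runway": 0.4, "prediction": "delayed"},
])
st.download_button(
    label="Download perturbed cases",
    data=results.to_csv(index=False),
    file_name="perturbation_assessment.csv",
    mime="text/csv",
)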