A Comparison of Bayesian and Frequentist Approaches for the Case
of Accident and Safety Analysis, as a Precept for All AI Expert
Models
Moldir Zholdasbayeva and Vasilios Zarikas
Department of Mechanical and Aerospace Engineering, Nazarbayev University, Nur-Sultan, Kazakhstan
Keywords: Artificial Intelligence with Uncertainty, Bayesian Networks, Supervised Learning, Regression Method,
Frequentist Statistics, Causal Analysis, Elevator Accidents, Safety Rules.
Abstract: Statistical modelling techniques are widely used in accident studies. The frequentist statistical approach includes hypothesis testing, correlations and probabilistic inference. Bayesian networks, which belong to the set of advanced AI techniques, perform advanced calculations related to diagnostics, prediction and causal inference. The aim of the current work is to compare Bayesian and regression approaches for safety analysis. For this, the advantages and disadvantages of the two modelling approaches were studied. The results indicated that the precision of the Bayesian network was higher than that of the ordinal regression model. However, regression analysis can also provide an understanding of the information hidden in the data. The two approaches may suggest different significant explanatory factors/causes, and this should always be taken into consideration. The outcomes of this analysis contribute to the existing literature on safety science and accident analysis.
1 INTRODUCTION
The choice of a modelling approach remains one of the main issues in accident studies (Alkheder, Alrukaibi, & Aiash, 2020; Mujalli, Calvo, & O, 2011; Zong, Xu, & Zhang, 2013; Gregoriades & Christodoulides, 2017). In these studies, the severity of an accident or injury is often chosen as the key dependent/target variable (Eboli et al., 2020; Fountas, Ch, & Mannering, 2018; Michalaki & Quddus, 2015). Several factors that affect the severity of an accident or injury can be identified by expert judgement. In recent years, researchers have studied the causes of various accidents by running concrete diagnostics and making predictions with modern statistical and Artificial Intelligence (AI) methods.
A frequentist statistical approach allows an expert to infer associations between factors (e.g. the characteristics of injured people, such as age and gender, or the description of accidents). It evaluates the likelihood of previous and current events, thereby helping to prevent fatal accidents. Studying the causation of events (i.e. a causal approach) may involve a dynamic situation. Hence, it can be beneficial to find the causes of an event and to predict its effect as the evidence changes dynamically. This can be implemented efficiently with Bayesian Networks (BNs), which have learning capabilities. However, modern implementations of regression algorithms can also be updated, as long as the conditions of the experiment remain the same. BNs are more robust with respect to a more general type of updating, at the cost of making the decision/prediction depend on the values of the priors.
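The trade-off between general updating and dependence on priors can be illustrated with the simplest conjugate case, a Beta prior on the probability that an accident is fatal, updated with observed counts. This is a deliberately simplified sketch, not the BN learning procedure used later in the paper; the `BetaPrior` class, the prior parameters and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about a probability p, e.g. P(accident is fatal)."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugate update: the posterior is Beta(alpha + successes, beta + failures).
        return BetaPrior(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


# Hypothetical evidence: 4 fatal outcomes among 10 new accident reports.
fatal, non_fatal = 4, 6

flat = BetaPrior(1, 1)         # uninformative prior, mean 0.5
pessimistic = BetaPrior(8, 2)  # hypothetical expert prior, mean 0.8

for name, prior in [("flat", flat), ("pessimistic", pessimistic)]:
    posterior = prior.update(fatal, non_fatal)
    print(f"{name}: prior mean {prior.mean():.3f} -> posterior mean {posterior.mean():.3f}")

# New evidence can be added later without refitting from scratch.
later = flat.update(fatal, non_fatal).update(10, 30)
print(f"flat prior after more data: {later.mean():.3f}")
```

The same ten reports give a posterior mean of about 0.417 under the flat prior and 0.600 under the pessimistic one, which is exactly the prior dependence mentioned above; as more data arrive, the two posteriors move closer together.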
In accident studies, accident involvement is the dependent/target variable, whereas accident causes affect the frequency of occurrence of accident results/outcomes. As this type of analysis is statistical, the explanatory variables/causes and other characteristics of an accident may be correlated. Such an analysis is helpful for getting "a first look" at the data (Cummings, Mcknight, & Weiss, 2003; Zarikas et al., 2013).
Regression modelling is one of the well-known statistical approaches. Regression models have been widely used in various fields, particularly in research on medical issues and traffic accident severity (Mujalli et al., 2011; Zong et al., 2016). Logistic and ordered probit methods are used in traffic studies (Fountas et al., 2018). The use of regression models comes with model assumptions about the underlying relationships (e.g. linear relationships between the dependent and independent variables). If any of these assumptions is violated, the likelihood of the severity of an accident or injury may not be estimated correctly (Zong et al., 2016). In medical sciences, the sample size and missing values may cause inconsistencies when the model is updated (Ducher et al., 2013).
Zholdasbayeva, M. and Zarikas, V.
A Comparison of Bayesian and Frequentist Approaches for the Case of Accident and Safety Analysis, as a Precept for All AI Expert Models.
DOI: 10.5220/0010315810541065
In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021) - Volume 2, pages 1054-1065
ISBN: 978-989-758-484-8
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
From the perspective of conventional frequentist statistics, before a regression model is run it can be useful to study the frequency, correlation and variance of the explanatory variables (e.g. accident type, severity of accident, month, day and time of accident, and province). This can provide information about the nature of fatalities (Shao, Hu, Liu, Chen, & He, 2019). In the work of Shao et al., a frequency analysis was carried out to find inconsistencies in the data, and a correlation analysis was carried out to investigate relationships between factors. The collected data can be assumed to be uniformly distributed (Chen, Chou, & Lu, 2013), and the probabilities are calculated accordingly.
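Most of the explanatory variables here are categorical, so "correlation" between two of them is usually measured with an association statistic rather than Pearson's r. A minimal sketch using Cramér's V, computed from the chi-square statistic of a contingency table, follows. The choice of Cramér's V is ours (the cited works may use other measures), and the counts are hypothetical.

```python
import math


def chi_square_statistic(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V in [0, 1]: 0 means no association, 1 means perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square_statistic(table) / (n * k))


# Hypothetical counts: rows = accident type (PRI, PRO), columns = severity (B, C).
table = [[40, 30],
         [20, 50]]
print(f"chi-square = {chi_square_statistic(table):.3f}, Cramér's V = {cramers_v(table):.3f}")
```

Unlike a raw chi-square value, V is scaled by the sample size and table dimensions, so associations between variables with different numbers of categories can be compared on one scale.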
Expert judgement can provide an initial identification of the explanatory variables/causes of an accident and the target variables. Next, a correlation analysis can be carried out. The chi-square test can then be used to test the strength of the relationships (i.e. how significantly the observed frequencies differ from the expected ones). The goodness-of-fit test can be used to test how well the data fit the assumed distribution type (Ugurlu et al., 2020).
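As a sketch of the goodness-of-fit test, the gender counts reported in Table 5 (F = 87, M = 71, U = 47) can be tested against equal proportions. The uniform null hypothesis is an illustrative choice of ours, not an analysis reported by the source. The p-value is computed from the regularized lower incomplete gamma function with a simple power series, so no statistics library is required.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Survival function P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_goodness_of_fit(observed: list[int], expected_probs: list[float]):
    """Return (statistic, degrees of freedom, p-value) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Gender counts from Table 5 (F, M, U) tested against equal proportions.
stat, df, p = chi2_goodness_of_fit([87, 71, 47], [1 / 3, 1 / 3, 1 / 3])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.4f}")
```

Here the statistic is about 11.86 with 2 degrees of freedom (p < 0.05), so equal gender proportions would be rejected. The usual rule of thumb that every expected count should be at least 5 is easily met (about 68 per category).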
On the other hand, Bayesian networks are an alternative AI technique used for investigating causal relationships between variables and, therefore, for predicting outcomes or effects depending on the available observations (Conrady, Jouffe, & Elwert, 2014). In addition, Bayesian networks have been applied in environmental, agricultural, risk management, and safety and reliability studies (Amrin, Zarikas, & Spitas, 2018; Zarikas et al., 2015; Zarikas, 2007; Gerstenberger, Christophersen, Buxton, & Nicol, 2015; Kabir & Papadopoulos, 2019; Marcos, Wijesiri, Vergotti, & Glória, 2018; Martos, Pacheco-torres, Ordóñez, & Jadraque-gago, 2016; Mukashema, Veldkamp, & Vrieling, 2014; Ropero, Renooij, & Gaag, 2018; Tang, Yi, Yang, & Sun, 2016). They can be used for prognostics and diagnostics (Amrin et al., 2018; Bapin & Zarikas, 2014; Conrady et al., 2014) and can therefore be applied under uncertainty. This distinctive feature distinguishes Bayesian networks from other statistical methods (Iqbal, Yin, Hao, Ilyas, & Ali, 2015; Nannapaneni, Mahadevan, & Rachuri, 2016).
Bayesian networks are used for calculating the posterior probabilities of events, e.g. of an event A given an observed event B. They are based on directed acyclic graphs, which make it possible to study causal relationships rather than just find associations between variables.
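To make "directed acyclic graph" concrete, the sketch below represents a network as a mapping from each node to its parents and checks that it has no directed cycle by computing a topological order with Kahn's algorithm. The node names echo variables used later in this paper, but the arcs are hypothetical.

```python
from collections import deque


def topological_order(parents: dict[str, list[str]]) -> list[str]:
    """Return the nodes in an order where every parent precedes its children.

    Raises ValueError if the graph has a directed cycle, i.e. is not a valid DAG.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    children: dict[str, list[str]] = {node: [] for node in nodes}
    indegree = {node: 0 for node in nodes}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
            indegree[child] += 1
    # Start from nodes without parents; sorting keeps the output deterministic.
    queue = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph has a directed cycle; not a valid Bayesian network")
    return order


# Hypothetical structure: severity is the parent of the predictor nodes,
# plus one extra arc between two predictors.
network = {
    "Severity": [],
    "Gender": ["Severity"],
    "Age": ["Severity"],
    "Accident": ["Severity"],
    "Fault": ["Severity", "Accident"],
}
print(topological_order(network))
```

A topological order is also the order in which a network can be sampled or its joint distribution factorized: each node only needs the values of the nodes that come before it.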
However, "Any complication that creates problems for one method of inference creates problems for all alternative methods of generating inference, just in different ways" (Don Rubin, interview, April 15, 2014). These include the usual problems of hidden variables, common factors/causes influencing both explanatory and target variables, weakly identifiable parameters, and sensitivity of the outcome to the priors.
Nevertheless, the advantages of BNs include (i) inference of individual causal effects, (ii) integration with decision theory, (iii) suitability for post-treatment variables, sequential treatments, and spatial and temporal data, and (iv) learning: in modern BNs, Conditional Probability Tables (CPTs) are filled automatically from data.
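Point (iv) can be sketched for a discrete node: each CPT entry P(child = c | parents = u) is estimated from counts in the data. The sketch uses maximum-likelihood counts with an optional additive (Laplace) pseudo-count. This is a common choice but not necessarily what any particular BN package does, and the records and the `learn_cpt` helper are hypothetical.

```python
from collections import Counter, defaultdict


def learn_cpt(records, child, parents, child_states, pseudo_count=0.0):
    """Estimate P(child | parents) from a list of dict records.

    Returns {parent_values_tuple: {child_state: probability}}. Parent
    configurations that never occur in the data are absent from the result.
    """
    counts = defaultdict(Counter)
    for record in records:
        parent_values = tuple(record[p] for p in parents)
        counts[parent_values][record[child]] += 1
    cpt = {}
    for parent_values, child_counts in counts.items():
        total = sum(child_counts.values()) + pseudo_count * len(child_states)
        cpt[parent_values] = {
            state: (child_counts[state] + pseudo_count) / total for state in child_states
        }
    return cpt


# Hypothetical records: severity (B = heavy, C = fatal) by accident type.
records = [
    {"Severity": "C", "Type": "PRO"},
    {"Severity": "C", "Type": "PRO"},
    {"Severity": "B", "Type": "PRO"},
    {"Severity": "B", "Type": "PRI"},
    {"Severity": "C", "Type": "PRI"},
    {"Severity": "B", "Type": "PRI"},
]
cpt = learn_cpt(records, child="Severity", parents=["Type"], child_states=["B", "C"])
print(cpt[("PRO",)])  # B: 1/3, C: 2/3 (from the three PRO rows)
```

A positive `pseudo_count` keeps unseen states from receiving probability zero, which matters for small samples such as the one analysed here.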
To build a Bayesian network from data, nodes and edges are created: nodes represent variables, and edges represent the relationships between them. Let P(A) be the prior probability distribution of a random variable A, and let P(B) be the prior probability distribution of a set of random variables or a dataset B. By Bayes' theorem, the posterior probability is calculated as

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (1)$$

The model structure that maximizes $P(A \mid B)$ is then chosen as the Bayesian network structure. Bayesian networks allow the structure to be chosen with two learning techniques (i.e. unsupervised and supervised learning). Unsupervised learning has no target variable, whereas in supervised learning a target variable must be chosen. In this case, the target variable is the parent node, and the child nodes are connected to it by "causal" relationships. The supervised learning procedure can then be used to create a naïve model (Figure 1(a)). Another model, the augmented naïve model (Figure 1(b)), can be applied to a small dataset; it adds new relationships between the child nodes, which can increase precision and accuracy (Montgomery & Runger, 2014).
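For the naïve structure in Figure 1(a), Equation (1) combined with the assumption that the child nodes are conditionally independent given Y gives P(Y | N1, N2, N3) ∝ P(Y) P(N1 | Y) P(N2 | Y) P(N3 | Y). A minimal sketch of this computation follows; the states "Heavy"/"Fatal" and all probabilities are hypothetical.

```python
def naive_posterior(prior, likelihoods, evidence):
    """Posterior P(Y | evidence) for a naïve Bayes structure.

    prior:       {y: P(Y = y)}
    likelihoods: {node: {y: {state: P(node = state | Y = y)}}}
    evidence:    {node: observed state}; unobserved nodes are simply omitted.
    """
    unnormalized = {}
    for y, p_y in prior.items():
        score = p_y
        for node, state in evidence.items():
            score *= likelihoods[node][y][state]
        unnormalized[y] = score
    z = sum(unnormalized.values())  # P(evidence), the denominator of Eq. (1)
    return {y: s / z for y, s in unnormalized.items()}


prior = {"Heavy": 0.5, "Fatal": 0.5}
likelihoods = {
    "N1": {"Heavy": {"yes": 0.2, "no": 0.8}, "Fatal": {"yes": 0.6, "no": 0.4}},
    "N2": {"Heavy": {"yes": 0.5, "no": 0.5}, "Fatal": {"yes": 0.5, "no": 0.5}},
}
print(naive_posterior(prior, likelihoods, {"N1": "yes"}))
```

Observing N1 = yes moves P(Fatal) from 0.5 to 0.75, whereas N2 has identical likelihoods under both states and therefore carries no information. The augmented naïve model relaxes the independence assumption by letting some child nodes also depend on each other.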
Figure 1: Simple naïve and augmented naïve models (Y: target node; N1, N2, N3: child nodes): a) naïve model; b) augmented naïve model.
Bayesian networks incorporate prior assumptions (Ducher et al., 2013; Zong et al., 2016); that is, the prediction is based on previously provided evidence. The problem of missing data can be addressed by automatic imputation in the network: for a missing value, modern BN implementations use the estimated probability of that value given the other factors related to it. As the data are updated, the calculated coefficients in the network are not frozen (Ducher et al., 2013), and new probabilities of the event given the evidence are also calculated. For example, in studies of traffic accident severity, Bayesian networks can be used to identify factors associated with the severity of injury (i.e. killed or seriously injured) by inference (Mujalli et al., 2011).
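The imputation idea can be sketched without any BN software: a missing value is replaced by a distribution over its states, estimated from the complete records that share the observed value of a related variable. This is a simplified, hypothetical illustration (one related variable, frequency estimates), not the algorithm of any particular BN package; the `impute_distribution` helper and the records are our own.

```python
from collections import Counter


def impute_distribution(records, missing_var, given_var, given_value):
    """Estimate P(missing_var | given_var = given_value) from complete records."""
    matches = [
        r[missing_var]
        for r in records
        if r.get(missing_var) is not None and r.get(given_var) == given_value
    ]
    if not matches:
        raise ValueError("no complete records with this value; cannot impute")
    counts = Counter(matches)
    return {state: k / len(matches) for state, k in counts.items()}


# Hypothetical records; the last one has no recorded severity.
records = [
    {"Age": "A59", "Severity": "B"},
    {"Age": "A59", "Severity": "C"},
    {"Age": "A59", "Severity": "C"},
    {"Age": "SA60", "Severity": "C"},
    {"Age": "A59", "Severity": None},
]
incomplete = records[-1]
dist = impute_distribution(records, "Severity", "Age", incomplete["Age"])
print(dist)
```

Instead of a single guessed value, the incomplete record contributes a soft distribution (here B with probability 1/3 and C with probability 2/3), and that distribution changes automatically as more complete records are added.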
1.1 Accidents Example
Vertical transportation devices are widely used for both personal and professional purposes. Accidents may happen when safety rules are violated. Accidents involving cranes (Im & Park, 2020; Mccann, 2003; Raviv, Fishbain, & Shapira, 2017; Shin, 2015; Swuste, 2013; Swuste et al., 2020) or escalators (Almeida, Hirzel, Patrão, Fong, & Dütschke, 2012; Chi, Chang, & Tsou, 2006; Neil, Steele, Huisingh, & Smith, 2008; Xing, Dissanayake, Lu, Long, & Lou, 2019) have been widely discussed. However, there is a limited number of research works on elevator accidents. Elevator accidents take place during the installation or operation stages (Göksenli & Eryürek, 2009; Zarikas et al., 2013). To address this, the EU has published EN 81-80, which lists 74 hazardous situations together with proactive preventive measures against elevator fatalities (Zarikas et al., 2013). Nevertheless, these safety rules may be ignored. It is therefore vital to investigate the causes and reasons behind elevator accidents. More than 160 thousand people died from accidents around the world in 2016, of which France accounted for a high share of 4.6 percent. One example of these serious accidents occurred in Paris:
An elevator accident happened in 2011 during maintenance work in an apartment block. An elevator fell onto workers: three people were injured and another worker died (Warren, 2011).
In the current study, elevator accidents in France (Zarikas, 2020, "Elevator accidents France", Mendeley Data, V1, doi: 10.17632/sstxdjj32h.1) are investigated to find causal relationships between factors (i.e. explanatory factors such as the characteristics of an injured person or the date and place of an accident). The violation of safety rules based on EN 81-80 is also studied using two modelling techniques: ordinal regression and supervised learning.
The following sections present the data collection method and the data arrangement, followed by a short presentation of the two statistical models. The reasons for applying the ordinal regression model are briefly discussed, as is the prediction model based on Bayesian networks built with supervised learning. Sections 3 and 4 present and discuss the results. Finally, Section 5 gives conclusions and recommendations for preventing elevator accidents, based on the analysis with both statistical methods.
2 METHODOLOGY
At an initial stage, a considerable amount of time was spent collecting data on elevator accidents from hospitals and health and safety organizations. The data provided by these organizations were not fully representative of elevator accidents. For this reason, most data provided by governmental sources were, for several reasons, insufficient for the current analysis:
• The reporting system only provided information on all accident types together, in categories such as "falls" and "slips";
• Data on elevator accidents lacked information such as the characteristics of injured people or the accident type;
• Information on accidents for each year was insufficient due to the gap in data collection (i.e. only officially registered cases were in the system).
Consequently, data on elevator accidents were extracted from the EU Open Data Portal, which has been used as a reliable source in several research works (Ugurlu et al., 2020; Gutierrez-osorio & Pedraza, 2019; Juana-espinosa & Luj, 2020). For France, data were collected for the period from 18th February 2003 to 17th December 2009. Only accidents from the last 6 years were studied in the current analysis because:
• most data for the earlier years were found to be unreliable and incomplete;
• most data for recent years were not collected due to the lack of resources, for the reasons stated previously.
Despite the limited resources for data collection, more than 200 cases were collected in an Excel sheet, including information on the date and place of each accident, the characteristics of the elevator accident (accident type, type of fault), details about the injured people (severity of injury, gender and age) and a description of the rules violated in each accident. It should be noted that the information on each accident and each injured person was collected separately from the overall number of violated safety rules. The sample size was considered sufficient for the purpose of the current research (Haghighattalab, Chen, Fan, & Mohammadi, 2019; Zarikas et al., 2013).
2.1 Data Arrangement
Table 1 shows how the data on elevator accidents in France for 2003-2009 are represented. The resulting table provides information on the characteristics of injured persons, including those of unknown gender and age in certain groups of injured people:
• Date (year and month);
• Place;
• Sum of injured;
• Gender and age of an injured person (including unknown people);
• Accident type;
• Fault type;
• Severity of injury;
• Safety rules/regulations violated (i.e. an overall summary and a separate file with rules).
Table 1: Parameters and their possible states.
Parameter            Possible states
Date                 Year and month of elevator accidents: 2003-2009
Place                28 cities: Angres, Avignon, Bordeaux, Brest, Dijon, Dunkirk, Grenoble, Le Havre, Lille, Limoges, Lyon, Marseille, Metz, Mulhouse, Nancy, Nanterre, Nantes, Nice, Paris, Reims, Rennes, Roubaix, Saint-Denis, Saint-Pierre, Strasbourg, Toulon, Toulouse, Versailles
Sum of injured       The overall number of injured people
Gender               The gender of an injured person: Female: F; Male: M; Unknown: U (i.e. for a group of people)
Age                  The age of an injured person: Young aged (1-12): C12; Teenagers (13-18): Ad18; Middle aged (19-59): A59; Seniors (aged) (60-85): SA60; Unknown aged: UNV
Accident type        Professional: PRO; Private: PRI
Fault type           Types of elevator faults: doors, electrocution, falls, fire, floor, general, landing, machinery, power, repair, speeding, sudden stops, vandalism
Severity of injury   Light: 1 or A; Heavy: 2 or B; Fatal: 3 or C; Light/Heavy: 4 or AB; Light/Fatal: 5 or AC; Heavy/Fatal: 6 or BC
Safety rules         Rules related to installing, repairing, modernizing and maintaining lifts: IRMM; Rules related to risks and hazardous situations: RHS; Rules related to safety tips for passengers: STP
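The age and severity coding in Table 1 can be expressed as a small helper. This is a minimal sketch, assuming raw ages are integers and missing ages are `None`. Table 1 does not say how ages outside 1-85 are handled, so the sketch raises an error for them rather than guessing a category.

```python
from typing import Optional

# Age bands from Table 1: (lower, upper, code), bounds inclusive.
AGE_BANDS = [(1, 12, "C12"), (13, 18, "Ad18"), (19, 59, "A59"), (60, 85, "SA60")]

# Severity codes from Table 1: numeric code -> letter code.
SEVERITY_CODES = {1: "A", 2: "B", 3: "C", 4: "AB", 5: "AC", 6: "BC"}


def age_category(age: Optional[int]) -> str:
    """Map a raw age to its Table 1 category; unknown ages become UNV."""
    if age is None:
        return "UNV"
    for low, high, code in AGE_BANDS:
        if low <= age <= high:
            return code
    raise ValueError(f"age {age} is outside the ranges defined in Table 1")


print([age_category(a) for a in (7, 16, 45, 72, None)])
print(SEVERITY_CODES[6])
```

Keeping the band boundaries in one table-like constant makes it easy to check them against Table 1 and to change them consistently if a different age grouping is needed.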
2.2 A Statistical Approach
The main objective of the current work is to investigate the possible links between the factors presented in Table 1 using two statistical approaches and, as a case, to identify the variables that contribute most to the violation of safety rules. For this, two models were constructed to carry out the statistical analysis in IBM SPSS Statistics 23 and BayesiaLab 2020. For model A, an ordinal regression
model was used. Model B was constructed by
following the rules of supervised learning.
Descriptive statistics were computed to obtain a first summary of the dataset and to check the distributions (Perez & Exposito, 2009). Frequency and correlation tables were built for the variables in Table 1 to take a preliminary look at the data and draw conclusions from this information. Some of these results are presented in the next section.
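The frequency tables in the next section report frequency, percent and cumulative percent for each category. A minimal sketch of that computation, using the yearly counts reported in Table 2 and rounding to one decimal place as in the tables, follows.

```python
def frequency_table(counts: dict) -> list:
    """Rows of (category, frequency, percent, cumulative percent), rounded to 1 decimal."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, freq in counts.items():
        running += freq
        # Cumulative percent uses the running count, not a sum of rounded percents.
        rows.append((category, freq, round(100 * freq / total, 1),
                     round(100 * running / total, 1)))
    return rows


year_counts = {2003: 28, 2004: 21, 2005: 27, 2006: 30, 2007: 36, 2008: 35, 2009: 28}
for row in frequency_table(year_counts):
    print(*row)
```

Computing the cumulative column from running counts reproduces the cumulative percentages in Table 2 exactly and ensures the final row reaches 100.0 even when the individual percents are rounded.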
Model A was constructed to identify associations between one dependent variable and several independent variables. The first step was to choose a dependent variable that is ordinal in nature; the severity of injury was chosen in order to investigate the effect of the other variables on it. Next, the predictors entering the location component of the model were selected. It is worth noting that for this type of analysis it is not clear in advance which predictors should be considered first. Initially, all possible predictors (gender, age, accident type, rules violated and fault type) were added to the model; those that were not useful were then excluded and the model was re-estimated. An initial analysis was run without covariates to see whether the location-only model was sufficient to draw conclusions; in many cases, the location-only model is adequate. However, scale variables (e.g. the year and month of an accident) can also be included if the location-only model does not summarize the data adequately. It was therefore decided to add scale variables individually, such as year and month or the sum of injured, to investigate possible associations. The basic approach was to include all of these variables and remove them from the analysis if no association with the dependent variable was found. The next step was to choose the link function (complementary log-log, Cauchit or logit) based on a graphical representation of the dependent variable. In this respect, the complementary log-log and logit functions are largely similar. The choice of the link function for elevator accidents in France is explained in the following sections.
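In a cumulative link (ordinal) model, the probability of falling in category j or below is F(θ_j − η), where θ_j are the thresholds, η is the linear predictor built from the location terms, and F is the inverse of the chosen link. As far as we know, this sign convention matches SPSS ordinal regression. The sketch compares the three link functions named above; the thresholds and η are hypothetical, not the estimates in Table 13.

```python
import math

# Inverse link functions F, mapping the linear scale to a cumulative probability.
INVERSE_LINKS = {
    "logit": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "cloglog": lambda x: 1.0 - math.exp(-math.exp(x)),
    "cauchit": lambda x: 0.5 + math.atan(x) / math.pi,
}


def category_probabilities(thresholds, eta, link="logit"):
    """P(Y = j) for each ordered category, given increasing thresholds and eta."""
    F = INVERSE_LINKS[link]
    cumulative = [F(t - eta) for t in thresholds] + [1.0]  # P(Y <= j); last is 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of successive cumulative probabilities
        previous = c
    return probs


thresholds = [-1.0, 0.5, 2.0]  # hypothetical: 3 thresholds -> 4 categories
for link in INVERSE_LINKS:
    p = category_probabilities(thresholds, eta=0.8, link=link)
    print(link, [round(x, 3) for x in p])
```

With this convention, a positive location estimate (a larger η) shifts probability towards the higher, more severe categories. The logit and complementary log-log curves differ mainly in their tails, whereas the Cauchit link allows for more extreme values.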
The next step was to evaluate the model and describe its statistics. Model-fitting information was obtained, in which the difference in -2 log-likelihood between the intercept-only and final models can be interpreted as a chi-square statistic. This shows whether the final model gives a significant improvement over the intercept-only model; the significance level should be less than 0.05. Chi-square-based statistics therefore show how strong the relationships between factors are (Wood, 2005), and they are very useful for the analysis of a small number of categorical variables. Next, the pseudo R-squared measures of Cox and Snell, Nagelkerke and McFadden were computed; they indicate how well the model fits in terms of the proportion of variance explained, and the most appropriate measure was then chosen based on these values. In the final step, parameter estimates were obtained for the dependent variable versus the predictors. Results and conclusions on the ordinal regression model for each country are presented in Sections 3 and 4.
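The model-fitting quantities described above can be computed from the -2 log-likelihoods of the intercept-only and final models, using the standard formulas. As a check, the values in Table 11 (275.942 and 92.237, N = 205) reproduce the chi-square of 183.705 and the Cox and Snell value of about 0.592 in Table 12. Nagelkerke and McFadden depend on the absolute log-likelihood of the null model, which software may report with or without constant terms. We therefore include their formulas but do not attempt to reproduce those table entries.

```python
import math


def lr_chi_square(neg2ll_null: float, neg2ll_model: float) -> float:
    """Likelihood-ratio chi-square: difference of the -2 log-likelihoods."""
    return neg2ll_null - neg2ll_model


def cox_snell(neg2ll_null: float, neg2ll_model: float, n: int) -> float:
    """1 - (L0 / L1)^(2/n), written in terms of -2 log-likelihoods."""
    return 1.0 - math.exp(-lr_chi_square(neg2ll_null, neg2ll_model) / n)


def nagelkerke(neg2ll_null: float, neg2ll_model: float, n: int) -> float:
    """Cox & Snell divided by its maximum attainable value, 1 - L0^(2/n)."""
    max_cs = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell(neg2ll_null, neg2ll_model, n) / max_cs


def mcfadden(neg2ll_null: float, neg2ll_model: float) -> float:
    """1 - log L1 / log L0."""
    return 1.0 - neg2ll_model / neg2ll_null


chi2 = lr_chi_square(275.942, 92.237)
cs = cox_snell(275.942, 92.237, n=205)
print(f"LR chi-square = {chi2:.3f}, Cox & Snell R^2 = {cs:.3f}")
```

The chi-square is compared with a chi-square distribution whose degrees of freedom equal the number of estimated location parameters (7 in Table 11).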
For model B, a prediction model was built using supervised learning. Causal relationships or possible associations between the target variable and the independent variables were found by building Bayesian networks with supervised learning, in order to find relationships between the severity of injury as the target variable and the rest of the predictors. For this purpose, the target variable was defined as discrete. No test set was held out when defining the learning set, because the data were considered sufficient for a preliminary analysis. All variables were defined as discrete except the sum of injured, which was defined as continuous and discretized with the "Tree" method. Next, the mutual information of the arcs between the target variable and each predictor was analysed in order to find which nodes added the most information to the model. The main objective of this analysis was to decide on the final network structure, for which naïve and augmented naïve models were constructed. Network performance was investigated with a target analysis to calculate the precision and reliability of each model. The next step was to run the structural coefficient analysis and, if necessary, to adjust the structural coefficient and rerun the model. The last step was to identify causal relationships by performing inference between the target variable and the predictors; for that, an adaptive questionnaire was also included in the model. The chosen model and the corresponding supervised learning results are presented in the next sections.
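The mutual-information screening of arcs can be sketched as I(Y; X) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))], estimated from joint counts. We assume base-2 logarithms (bits); the source does not state which base BayesiaLab reports, and the samples below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs: list[tuple[str, str]]) -> float:
    """Mutual information (in bits) between two discrete variables from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Only observed (x, y) combinations contribute; 0 * log 0 is taken as 0.
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical (predictor, severity) samples.
informative = [("PRO", "C"), ("PRO", "C"), ("PRI", "B"), ("PRI", "B")]
independent = [("PRO", "C"), ("PRO", "B"), ("PRI", "C"), ("PRI", "B")]
print(mutual_information(informative), mutual_information(independent))
```

A predictor that determines the target completely gives 1 bit for a two-state target, while an independent predictor gives 0. Ranking predictors by this value is how nodes that add the most information to the model can be identified.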
3 RESULTS
This section presents the results of models A and B, which were built separately because of the differences in associations between explanatory factors. Only selected results are presented in order to concentrate on the most important outcomes.
3.1 Model A Outcomes
First, a quick statistical analysis was carried out on the 205 elevator accident cases in France. Tables 2-10 present the preliminary results of descriptive statistics based on frequencies, including the overall number of injured people and the relevant safety rules that were violated. Based on this statistical analysis, the main outcomes are as follows:
• Table 2 shows that the highest numbers of elevator accidents took place in 2007, 2008 and 2006 (36, 35 and 30 cases, respectively).
• The distribution in Table 3 shows that accidents were most frequent in June (25 cases), followed by January (23 cases), and February and September (20 cases each).
• Table 4 shows that most elevator accidents occurred in Dunkirk (12 cases), Angres, Marseille and Toulouse (11 cases each) and Versailles (10 cases).
• Based on Table 5, more females (approx. 42 percent) were injured than males (almost 35 percent).
• Regarding the distribution over age categories in Table 6, adults aged 19 to 59 (A59) were injured most often (42 percent), whereas the smallest shares were observed for teenagers aged 13 to 18 (Ad18, 5.4 percent) and children aged 1 to 12 (C12, 5.9 percent).
• In Table 7, injuries from private use (53.7 percent) were more prevalent than those from professional use (46.3 percent).
• As for the severity of injury in Table 8, fatal injuries ("C") were the most frequent (41.5 percent), followed by heavy injuries (35.1 percent).
• Regarding fault types (Table 9), accidents most often involved floor-levelling problems (13.2 percent), door openings (11.2 percent) and lift speeding (10.7 percent).
• As for the violated rules in Table 10, the most frequently violated categories were IRMM and IRMM/RHS (30.2 percent each).
Table 2: The distribution over year of the accident.
Year    Frequency   Percent   Cumulative percent
2003    28          13.7      13.7
2004    21          10.2      23.9
2005    27          13.2      37.1
2006    30          14.6      51.7
2007    36          17.6      69.3
2008    35          17.1      86.3
2009    28          13.7      100.0
Total   205         100.0
Table 3: The distribution over month of the accident.
Month       Frequency   Percent   Cumulative percent
January     23          11.2      11.2
February    20          9.8       21.0
March       19          9.3       30.2
April       15          7.3       37.6
May         16          7.8       45.4
June        25          12.2      57.6
July        13          6.3       63.9
August      12          5.9       69.8
September   20          9.8       79.5
October     15          7.3       86.8
November    12          5.9       92.7
December    15          7.3       100.0
Total       205         100.0
Table 4: The distribution over place of the accident.
Place          Frequency   Percent   Cumulative percent
Angres         11          5.4       5.4
Avignon        7           3.4       8.8
Bordeaux       9           4.4       13.2
Brest          7           3.4       16.6
Dijon          4           2.0       18.5
Dunkirk        12          5.9       24.4
Grenoble       6           2.9       27.3
Le Havre       8           3.9       31.2
Lille          4           2.0       33.2
Limoges        6           2.9       36.1
Lyon           7           3.4       39.5
Marseille      11          5.4       44.9
Metz           6           2.9       47.8
Mulhouse       7           3.4       51.2
Nancy          8           3.9       55.1
Nanterre       8           3.9       59.0
Nantes         6           2.9       62.0
Nice           3           1.5       63.4
Paris          8           3.9       67.3
Reims          5           2.4       69.8
Rennes         8           3.9       73.7
Roubaix        7           3.4       77.1
Saint-Denis    9           4.4       81.5
Saint-Pierre   4           2.0       83.4
Strasbourg     7           3.4       86.8
Toulon         6           2.9       89.8
Toulouse       11          5.4       95.1
Versailles     10          4.9       100.0
Total          205         100.0
Table 5: The distribution over gender of an injured person.
Gender   Frequency   Percent   Cumulative percent
F        87          42.4      42.4
M        71          34.6      77.1
U        47          22.9      100.0
Total    205         100.0
Table 6: The distribution over age of an injured person.
Age     Frequency   Percent   Cumulative percent
A59     86          42.0      42.0
Ad18    11          5.4       47.3
C12     12          5.9       53.2
SA60    45          22.0      75.1
UNV     51          24.9      100.0
Total   205         100.0
Table 7: The distribution over the type of the accident.
Accident type   Frequency   Percent   Cumulative percent
PRI             110         53.7      53.7
PRO             95          46.3      100.0
Total           205         100.0
Table 8: The distribution over the severity of an injury.
Severity of injury   Frequency   Percent   Cumulative percent
AB                   22          10.7      10.7
ABC                  2           1.0       11.7
AC                   24          11.7      23.4
B                    72          35.1      58.5
C                    85          41.5      100.0
Total                205         100.0
Next, Tables 11-13 present the outcomes of the ordinal regression model. For model A, the best model was found to have the severity of injury as the dependent variable, "sum of injured people" as a scale covariate, and "gender" and "age" as predictor variables.
Table 9: The distribution over the type of fault.
Fault type      Frequency   Percent   Cumulative percent
Doors           23          11.2      11.2
Electrocution   10          4.9       16.1
Falls           16          7.8       23.9
Fire            14          6.8       30.7
Floor           27          13.2      43.9
General         18          8.8       52.7
Landing         10          4.9       57.6
Machinery       14          6.8       64.4
Power           13          6.3       70.7
Repair          19          9.3       80.0
Speeding        22          10.7      90.7
Steps           7           3.4       94.1
Sudden stops    6           2.9       97.1
Vandalism       6           2.9       100.0
Total           205         100.0
Table 10: The distribution over safety rules.
Safety rules   Frequency   Percent   Cumulative percent
IRMM           62          30.2      30.2
IRMM/RHS       62          30.2      60.5
IRMM/STP       31          15.1      75.6
RHS            28          13.7      89.3
RHS/STP        9           4.4       93.7
STP            13          6.3       100.0
Total          205         100.0
Accident type has no effect on the dependent variable, assuming that only two categories exist. No missing data were detected. The logit link function was chosen (it gives results similar to the complementary log-log function). First, the goodness-of-fit statistics indicate that the data are consistent with the model. As shown in Table 11, the model-fitting information indicates that the final model outperforms the intercept-only model (the significance level is less than 0.05). The next step was to verify whether the chosen link function was reliable. Table 12 shows three pseudo R-squared values, of which Nagelkerke's is the highest at 0.643. The test of parallel lines does not reject the null hypothesis (the significance level is higher than 0.05), so the assumption of parallel lines is not rejected for this model.
Returning to the parameter estimates in Table 13, the thresholds [SOI = 2] (negative estimate) and [SOI = 6] (positive estimate) are highly significant (p < 0.001) in relation to the severity of injury. The significance level for [SOI = 4] is 0.023, which is lower than 0.05, so it contributes to the model. [SOI = 3] has no effect on the model, as its significance level (0.705) exceeds 0.05. As for the location, [Gender.category = F] is in a higher position of severity of injury with respect to the reference category [Gender.category = U]. The category [Gender.category = M] is in a lower position of severity of injury with respect to the reference category [Gender.category = U].
As for age categories, [Age.category = A59] has a negative association with the severity of injury and the highest significance (p = 0.001). [Age.category = SA60] is located in a lower position than [Age.category = C12], but shows the strongest association with the dependent variable with respect to the reference category [Age.category = UNV]. [Age.category = Ad18] has the weakest association with the severity of injury with respect to the reference category [Age.category = UNV]. The sum of injured is a covariate.
Table 11: Model A fitting information.
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   275.942
Final            92.237              183.705      7    .000
Link function: Logit.
Table 12: Model A pseudo R-squared.
Cox and Snell .592
Nagelkerke .643
McFadden .353
Link function: Logit.
Table 13: Model A parameter estimates.
                                     Estimate   Sig.
Threshold   [SOI.cat = 2]            -4.640     .000
            [SOI.cat = 3]            -.358      .705
            [SOI.cat = 4]            2.097      .023
            [SOI.cat = 6]            5.596      .000
Location    SumofInjured             .892       .012
            [Gender.category=F]      -2.245     .010
            [Gender.category=M]      -2.110     .013
            [Gender.category=U]      0a         .
            [Age.category=A59]       -3.286     .001
            [Age.category=Ad18]      -2.355     .028
            [Age.category=C12]       -3.040     .006
            [Age.category=SA60]      -3.286     .001
            [Age.category=UNV]       0a         .
Link function: Logit.
a. This parameter is set to zero because it is redundant.
3.2 Model B Outcomes
For model B, a Bayesian network was constructed. To investigate causal relationships between the severity of injury and the other predictors, a simple statistical model was built using the supervised learning method. As noted before, supervised learning methods need a target variable. For the current analysis, it is important to find the factors that most affect the severity of injury and the rules that are most often violated depending on those factors.
Before the initial analysis, a dataset with the variables (i.e. a .csv file) was imported to build the Bayesian network. All variables were considered to be continuous. The variable "Month" in Table 2 was not used because of its insufficient contribution to the overall model. The violations of rules IRMM, RHS and STP were included from the separate file.
The first step was to identify causal relationships
between variables. Figure 2(a, b) shows the naïve and
augmented naïve models for elevator accidents
in France. The violated rules are represented
as separate nodes, derived from the overall rule
types IRMM, RHS and STP. Figure 2(b) shows
that new causal relationships appear
between the following variables:
- “1,4,2i” and “Gender.category”;
- “1,1,3” and “Age.category”;
- “Fault.category” and “Accident.category”;
- “Accident.category” and several violated rules:
“1,1,2”, “1,3,1h”, “1,4,1a”, “1,4,1c”, “1,4,2d”,
“1,4,2h”, “1,5,3”, “2,4,4” and “3,5,1”.
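In the naïve model of Figure 2(a), the target node is the only parent of every predictor, so the posterior of the severity of injury is proportional to its prior times one conditional probability per observed predictor. A minimal Python sketch of that computation follows, with Laplace smoothing (`alpha`) as an assumed estimator and toy data. The augmented naïve model additionally learns arcs between predictors, and the structural-coefficient tuning used in this study is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, target, alpha=1.0):
    """Count class frequencies and per-class value frequencies for every predictor."""
    class_counts = Counter(row[target] for row in rows)
    values = defaultdict(set)                  # predictor -> set of observed values
    counts = defaultdict(Counter)              # (predictor, class) -> Counter of values
    for row in rows:
        cls = row[target]
        for feature, value in row.items():
            if feature == target or value is None:
                continue
            values[feature].add(value)
            counts[(feature, cls)][value] += 1
    return {"classes": class_counts, "values": values, "counts": counts, "alpha": alpha}

def posterior(model, case):
    """P(target | observed predictors); predictors set to None are left out."""
    classes, alpha = model["classes"], model["alpha"]
    n = sum(classes.values())
    log_scores = {}
    for cls, n_cls in classes.items():
        score = math.log(n_cls / n)            # log prior of the class
        for feature, value in case.items():
            if value is None:                  # unobserved predictor: its factor is dropped
                continue
            k = max(len(model["values"][feature]), 1)
            seen = model["counts"][(feature, cls)][value]
            score += math.log((seen + alpha) / (n_cls + alpha * k))  # smoothed likelihood
        log_scores[cls] = score
    top = max(log_scores.values())             # subtract the max for numerical stability
    weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}

# Hypothetical training cases.
rows = ([{"severity": "light", "cause": "door"}] * 3
        + [{"severity": "heavy", "cause": "leveling"}] * 2
        + [{"severity": "light", "cause": "leveling"}])
model = fit_naive_bayes(rows, target="severity")
print(posterior(model, {"cause": "door"}))
```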
The next step was to choose the model type. The
augmented naïve model was selected because of its
higher precision and accuracy (Table 14). A
structural coefficient (SC) analysis was then run, and a
value of 0.1 was chosen because it increased the
precision of the model. The most reliable
network was therefore the augmented naïve model with
SC = 0.1. Table 14 shows that the overall
precision increased from approximately 87 percent
with the naïve model to just over 95 percent with
the augmented naïve model. The overall log-loss
decreased from 0.3357 to 0.1282, with R = 0.9652, which
indicates higher accuracy for future validation of the model.
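The overall precision and log-loss reported in Table 14 can be understood through the Python sketch below, assuming that overall precision is the share of cases whose most probable predicted class matches the observed class, and that log-loss is the mean negative log-probability (base 2 here) assigned to the observed class. The software used in the study may define or average these indices differently, and the remaining indices in Table 14 (reliability, Gini, lift, ROC, calibration) are not reproduced. The predictions below are made-up values.

```python
import math

def overall_precision(true_labels, predicted_labels):
    """Share of cases whose predicted (most probable) class equals the observed class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

def log_loss(true_labels, predicted_distributions, base=2.0):
    """Mean negative log-probability assigned to the observed class (lower is better)."""
    total = 0.0
    for label, dist in zip(true_labels, predicted_distributions):
        p = max(dist.get(label, 0.0), 1e-15)   # clip to avoid log(0)
        total -= math.log(p, base)
    return total / len(true_labels)

# Hypothetical observed classes and predicted distributions for four test cases.
observed = ["light", "light", "heavy", "light"]
dists = [{"light": 0.9, "heavy": 0.1}, {"light": 0.7, "heavy": 0.3},
         {"light": 0.2, "heavy": 0.8}, {"light": 0.4, "heavy": 0.6}]
predicted = [max(d, key=d.get) for d in dists]
print(overall_precision(observed, predicted), log_loss(observed, dists))
```

Log-loss rewards confident correct predictions and penalises confident wrong ones, which is why it can improve even when precision changes only slightly.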
After the network was chosen, inferential
analysis was carried out. Figure 3 presents the mutual
information between each variable and the target
node, the severity of injury. The sum of injured
shares the most information with the severity of injury,
with a mutual information of 0.7552. “Age.category”
and “Gender.category” also have mutual information
higher than 0.5 and therefore contribute strongly to the model.
Moderate amounts of mutual information with the target
node are found for the categorical variables “Fault.category”
(0.1550) and “Place.category” (0.4092). Smaller contributions,
between 0.02 and 0.10, come from the main predictor “Year”
and from violated rules such as “1,3,2”, “1,1,1c” and “1,4,2g”,
as shown in Figure 4.
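Mutual information, used above to rank variables against the target node, can be estimated from a contingency table of counts as in the following Python sketch. Base-2 logarithms (bits) and plug-in frequency estimates are assumptions; the tool used for Figure 3 may normalise or estimate differently, so the sketch will not reproduce the reported values, and the counts are hypothetical. Mutual information is symmetric: it quantifies association with the target, not the direction of an effect.

```python
import math

def mutual_information(table):
    """I(X; Y) in bits from a contingency table of counts (rows: states of X, columns: states of Y)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue                                   # 0 * log 0 is taken as 0
            p_xy = count / n
            p_x, p_y = row_totals[i] / n, col_totals[j] / n
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical counts: rows = severity level, columns = states of one predictor.
counts = [[30, 5],
          [4, 21]]
print(round(mutual_information(counts), 4))
```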
Table 14: Precision of the model.
| Model | Naïve (SC = 1) | Augmented naïve (SC = 0.1) |
|---|---|---|
| Overall precision | 86.8293% | 95.1220% |
| Mean precision | 93.0098% | 97.4771% |
| Overall reliability | 86.7843% | 95.1296% |
| Mean reliability | 92.4860% | 96.7967% |
| Overall relative Gini index | 91.1203% | 98.9249% |
| Mean relative Gini index | 95.2681% | 99.3890% |
| Overall relative lift index | 96.1551% | 99.5718% |
| Mean relative lift index | 97.9070% | 99.7768% |
| Overall ROC index | 95.5679% | 99.4702% |
| Mean ROC index | 97.6655% | 99.7259% |
| Overall calibration index | 83.3532% | 78.7967% |
| Mean calibration index | 77.7412% | 84.9008% |
| Overall log-loss | 0.3357 | 0.1282 |
| Mean binary log-loss | 0.1343 | 0.0513 |
| R | 0.9426 | 0.9652 |
| R² | 0.8886 | 0.9315 |
Figure 2: Naïve and augmented naïve models for elevator accidents in France: (a) SC = 1 and (b) SC = 0.1.
Figure 3: Mutual information with the target node for elevator accidents in France.
ICAART 2021 - 13th International Conference on Agents and Artificial Intelligence
Figure 4: Prior probabilities based on target correlations
with an adaptive questionnaire.
4 DISCUSSIONS
Using both statistical and
graphical approaches, the current study has
concentrated on finding causal relationships
between variables. These outcomes are based on a
limited dataset, because reports of occupational
injuries are rarely submitted.
The severity of injury was chosen as the
target variable. Descriptive statistics gave initial
insight into the frequencies in the data, which proved
important for studying inconsistencies. An ordinal
regression model was then used to study the likelihood of
each level of injury severity.
These results suggest that elevator accidents in France
may show an upward trend. However, this trend is
uncertain because the data cover only a limited number
of reports: people injured in elevator accidents
presumably tend not to report light or heavy
injuries. Elevator accidents occurred in the
summer and winter periods in France, and the safety
rules were violated because of insufficient technical support and
maintenance. Accidents occurred
in well-known cities such as Marseille, Toulouse and
Versailles. As for the characteristics of the injured,
more female users were injured than male
users. This may be explained by the fact that most injuries
were due to problems with floor levelling or elevator
doors. Mostly adults were injured in elevator
accidents. The Bayesian network model
built with supervised learning shows that the safety
rules most often broken (Figure 3) are those related to:
- providing lift workers with the necessary safety
information;
- the working process with the machinery;
- providing technicians with safety rules in the
machine room.
The outcomes from Model A, the ordinal
regression model, showed a good fit based
on the test of parallel lines and the goodness-of-fit statistics;
Table 11 shows that the final model improves significantly on
the intercept-only model (chi-square = 183.705, df = 7, p < .001).
Strong correlations were found
between the gender and the age of an injured person. The
main limitation of the ordinal regression model was
that the size of the dataset and the missing values
in it could affect the outcomes of the accident
analysis (Eboli et al., 2020; Ropero et al., 2018; Wu,
Hou, Wen, Liu, & Wu, 2019). In addition,
more independent variables related to the
type of accident should have been included in the
model in order to find the relationships between the
severity of injury and the number of violated safety
rules.
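The model-fitting test in Table 11 is a likelihood-ratio test: the chi-square statistic is the drop in -2 log-likelihood from the intercept-only model to the final model, 275.942 - 92.237 = 183.705, on 7 degrees of freedom (one covariate, two gender parameters and four age parameters in Table 13). The Python sketch below recomputes the statistic and its p-value with only the standard library, using the closed-form chi-square upper tail for odd degrees of freedom; where SciPy is available, `scipy.stats.chi2.sf(x, df)` gives the same tail probability.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with an odd number of df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("closed form implemented only for odd df")
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2.0))                      # 2 * P(Z > z) for standard normal Z
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
    term, series = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):                    # sum of z^(2r-1) / (1*3*...*(2r-1))
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return tail + 2.0 * density * series

NEG2LL_INTERCEPT_ONLY = 275.942   # Table 11
NEG2LL_FINAL = 92.237             # Table 11
DF = 7                            # parameters added by the final model

lr_stat = NEG2LL_INTERCEPT_ONLY - NEG2LL_FINAL
p_value = chi2_sf_odd_df(lr_stat, DF)
print(f"chi-square = {lr_stat:.3f}, df = {DF}, p = {p_value:.2e}")
```

The resulting p-value is far below .001, consistent with the "Sig. .000" entry in Table 11.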
Model B outperformed the regression model
for several reasons: it does not require prior
assumptions about the data, it offers automatic imputation
of missing data, and new evidence can be entered during
execution. A Bayesian network model is valuable for
studying both quantitative and qualitative data (Ugurlu
et al., 2020; Juned & Bouwer, 2014; Zhou, Diew,
Shan, & Fai, 2018), and missing values can be handled by
imputation during the analysis (Ducher et al., 2013).
With a Bayesian network, the data can be studied further by
adding new cases and recomputing the probabilities: when
new evidence is entered, the dependence of the severity
of injury on the other variables is expressed through the
strength of the mutual information they share. These
characteristics distinguish Bayesian networks from other
statistical models (Zong et al., 2013). However, a
sufficient amount of data must be added in order to
spot inconsistencies in the data and to study the
reasons behind unforeseen accidents.
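The ability to enter new evidence and recompute probabilities can be illustrated with exact inference by enumeration over a small joint distribution, as in the Python sketch below. The variables, their states and all probabilities are hypothetical. Unobserved variables are summed out, which is how a Bayesian network answers a query when some values are missing; this is related to, but not the same as, the missing-data imputation performed during learning. Enumeration grows exponentially with the number of variables, so practical tools exploit the network factorisation instead.

```python
from itertools import product

VARIABLES = ("fault", "place", "severity")   # hypothetical network variables

def build_joint():
    """Joint P(fault, place, severity) = P(fault) P(place) P(severity | fault, place)."""
    p_fault = {"yes": 0.3, "no": 0.7}
    p_place = {"home": 0.6, "public": 0.4}
    p_severe = {("yes", "home"): 0.5, ("yes", "public"): 0.4,
                ("no", "home"): 0.2, ("no", "public"): 0.1}
    joint = {}
    for fault, place, severity in product(p_fault, p_place, ("severe", "light")):
        ps = p_severe[(fault, place)]
        joint[(fault, place, severity)] = (p_fault[fault] * p_place[place]
                                           * (ps if severity == "severe" else 1.0 - ps))
    return joint

def query(joint, target, evidence):
    """P(target | evidence) by enumeration; variables missing from `evidence` are summed out."""
    t = VARIABLES.index(target)
    scores = {}
    for assignment, p in joint.items():
        if all(assignment[VARIABLES.index(var)] == val for var, val in evidence.items()):
            scores[assignment[t]] = scores.get(assignment[t], 0.0) + p
    total = sum(scores.values())
    return {value: p / total for value, p in scores.items()}

joint = build_joint()
print(query(joint, "severity", {}))                                 # prior belief
print(query(joint, "severity", {"fault": "yes"}))                   # after one finding
print(query(joint, "severity", {"fault": "yes", "place": "home"}))  # after two findings
```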
5 CONCLUSIONS
In this study, two modelling approaches were
used. The outcomes presented above provide
important insights into elevator accidents, and the chosen
explanatory factors that affect the severity of injury
have been studied. The Bayesian network model has
been found useful for studying accident data and
making predictions, and its precision is higher than that of
the ordinal regression model. Conventional statistics remains a
valuable tool for observing correlations between
factors. These results could also be useful as a starting
point for building an efficient strategy to prevent
accidents. The limitation of the current study is the lack
of explanatory variables; further studies are
suggested to include them in the model and study the effect
of these factors on injury severity.
In summary, this work exemplifies good practice
in safety analysis: it is not correct to rely on a
single tool for causal analysis (Pearl, 2019).
Causal analysis still needs further theoretical
development, including the integration of
experimental and observational data within
a stronger mathematical framework, which is still
under investigation. Such a framework, as Judea Pearl
says, should mathematically encapsulate the fact that
symptoms are not causes of diseases. If different methods,
starting from different sets of assumptions, derive similar
“causal” effects from the data, this is very
encouraging. If results from
different methodologies contradict each other, this is
also useful to know; background expert
knowledge is then necessary to disentangle the
discrepancies. This is a precept for everyone who wants to
design a meaningful AI expert model.
As future work, we plan to improve these
preliminary results by using a larger
dataset and by applying Rubin’s causal model, known as the
Potential Outcomes Framework (Rubin, 2005), to
verify the inferences.
REFERENCES
Alkheder, S., Alrukaibi, F., & Aiash, A., 2020. Risk
analysis of traffic accidents’ severities: An application
of three data mining models. ISA Transactions,
ARTICLE IN PRESS.
Almeida, A. De, Hirzel, S., Patrão, C., Fong, J., &
Dütschke, E., 2012. Energy-efficient elevators and
escalators in Europe: An analysis of energy efficiency
potentials and policy measures. Energy & Buildings,
47, 151–158.
Amrin, A., Zarikas, V., & Spitas, C., 2018. Reliability
analysis and functional design using Bayesian networks
generated automatically by an “Idea Algebra”
framework. Reliability Engineering and System Safety,
180 (July), 211–225.
Bapin, Y., & Zarikas, V., 2014. Probabilistic Method for
Estimation of Spinning Reserves in Multi-connected
Power Systems with Bayesian Network-based
Rescheduling Algorithm, ICAART 2019, 840–849.
Chen, F., Chou, S., & Lu, T., 2013. Scenario analysis of the
new energy policy for Taiwan's electricity sector until
2025. Energy Policy, 61, 162–171.
Chi, C., Chang, T., & Tsou, C., 2006. In-depth investigation
of escalator riding accidents in heavy capacity MRT
stations. Accident Analysis and Prevention, 38, 662–
670.
Conrady, S., Jouffe, L., & Elwert, F., 2014. Causality for
Policy Assessment and Impact Analysis - Directed
Acyclic Graphs and Bayesian Networks for Causal
Identification and Estimation.
Cummings, P., Mcknight, B., & Weiss, N. S., 2003.
Matched-pair cohort methods in traffic crash research.
Accident Analysis and Prevention, 35, 131–141.
Person-interviewed, Don Rubin (2014, April 15). Retrieved
from https://rb.gy/bhm0wl
Ducher, M., Kalbacher, E., Combarnous, F., Vilaine, J. F.
De, Mcgregor, B., Fouque, D., & Fauvel, J. P., 2013.
Comparison of a Bayesian Network with a Logistic
Regression Model to Forecast IgA Nephropathy.
BioMed Research International, 2013.
Eboli, L., Forciniti, C., & Mazzulla, G., 2020. Factors
influencing accident severity: an analysis by road
accident type. Transportation Research Procedia, 47,
449–456.
Fountas, G., Ch, P., & Mannering, F. L., 2018. Analysis of vehicle
accident-injury severities: A comparison of segment-
versus accident-based latent class ordered probit
models with class-probability functions. Analytic
Methods in Accident Research, 18, 15–32.
Gerstenberger, M. C., Christophersen, A., Buxton, R., &
Nicol, A., 2015. Bi-directional risk assessment in
carbon capture and storage with Bayesian Networks.
International Journal of Greenhouse Gas Control, 35,
150–159.
Göksenli, A., & Eryürek, I. B., 2009. Failure analysis of an
elevator drive shaft. Engineering Failure Analysis,
16(4), 1011–1019.
Gregoriades, A., & Christodoulides, A., 2017. Traffic
Accidents Analysis using Self-Organizing Maps. 19th
International Conference on Enterprise Information
Systems, Volume 1, 452–459.
Gutierrez-Osorio, C., & Pedraza, C., 2019. Modern data
sources and techniques for analysis and forecast of road
accidents: A review. Journal of Traffic and
Transportation Engineering, 7(4), 432–446.
Haghighattalab, S., Chen, A., Fan, Y., & Mohammadi, R.,
2019. Engineering ethics within accident analysis
models. Accident Analysis and Prevention, 129(May),
119–125.
Im, S., & Park, D., 2020. Crane safety standards: Problem
analysis and safety assurance planning. Safety Science,
127(February), 104686.
Iqbal, K., Yin, X., Hao, H., Ilyas, Q. M., & Ali, H., 2015.
An Overview of Bayesian Network Applications in
Uncertain Domains. International Journal of Computer
Theory and Engineering, 7(6).
Juana-espinosa, S. De, & Luj, S., 2020. Open government
data portals in the European Union: A dataset from
2015 to 2017. Data in Brief, 29.
Juned, M., & Bouwer, I., 2014. Human fatigue’s effect on
the risk of maritime groundings – A Bayesian Network
modelling approach. Safety Science, 62, 427–440.
Kabir, S., & Papadopoulos, Y., 2019. Applications of
Bayesian networks and Petri nets in safety, reliability
and risk assessments: A review. Safety Science,
115(April 2018), 154–175.
Marcos, D., Wijesiri, B., Vergotti, M., & Glória, E., 2018.
Assessing
mercury pollution in Amazon River. Ecotoxicology and
Environmental Safety, 166(June), 354–358.
Martos, A., Pacheco-torres, R., Ordóñez, J., & Jadraque-
gago, E., 2016. Towards successful environmental
performance of sustainable cities: Intervening sectors.
A review. Renewable and Sustainable Energy Reviews,
57, 479–495.
Mccann, M., 2003. Deaths in construction related to
personnel lifts, 1992–1999. Journal of Safety
Research, 34, 507-514.
Michalaki, P., & Quddus, M. A., 2015. Exploring the
factors affecting motorway accident severity in
England. Journal of Safety Research, 55, 89–97.
Montgomery, D. C., & Runger, G. C., 2014. Applied
Statistics and Probability for Engineers (6th ed.).
Wiley.
Mujalli, R. O., Calvo, F. J., & O, J. De., 2011. Analysis of
traffic accident injury severity on Spanish rural
highways using Bayesian networks. Accident Analysis
and Prevention, 43(1), 402–411.
Mukashema, A., Veldkamp, A., & Vrieling, A., 2014.
Automated high resolution mapping of coffee in
Rwanda using an expert Bayesian network.
International Journal of Applied Earth Observations
and Geoinformation, 33, 331–340.
Nannapaneni, S., Mahadevan, S., & Rachuri, S., 2016.
Performance evaluation of a manufacturing process
under uncertainty using Bayesian networks. Journal of
Cleaner Production, 113, 947–959.
Neil, J. O., Steele, G. K., Huisingh, C., & Smith, G. A.,
2008. Escalator-related injuries among older adults in
the United States, 1991–2005. Accident Analysis and
Prevention, 40, 527–533.
Perez, S., & Exposito, M., 2009. Descriptive statistics,
37(6), 314–320.
Pearl, J., 2019. The seven tools of causal inference,
with reflections on machine learning. Communications
of the ACM, 62(3), 54–60.
Raviv, G., Fishbain, B., & Shapira, A., 2017. Analyzing
risk factors in crane-related near-miss and accident
reports. Safety Science, 91, 192–205.
Ropero, R. F., Renooij, S., & Gaag, L. C. Van Der., 2018.
Discretizing environmental data for learning Bayesian-
network classifiers. Ecological Modelling, 368, 391–
403.
Rubin, D. B., 2005. Causal inference using potential
outcomes. Journal of the American Statistical
Association, 100(469), 322–331.
Shao, B., Hu, Z., Liu, Q., Chen, S., & He, W., 2019. Fatal
accident patterns of building construction activities in
China. Safety Science, 111(September 2017), 253–263.
Shin, I. J., 2015. Factors that affect safety of tower crane
installation / dismantling in construction industry.
Safety Science, 72, 379–390.
Swuste, P., 2013. A “normal accident” with a tower crane?
An accident analysis conducted by the Dutch Safety
Board. Safety Science, 57, 276–282.
Swuste, P., Groeneweg, J., Gulijk, C. Van, Zwaard, W.,
Lemkowitz, S., & Oostendorp, Y., 2020. The future of
safety science. Safety Science, 125(February), 104593.
Tang, C., Yi, Y., Yang, Z., & Sun, J., 2016. Risk analysis
of emergent water pollution accidents based on a
Bayesian Network. Journal of Environmental
Management, 165, 199–205.
Ugurlu, F., Yildiz, S., Boran, M., Ugurlu, O., & Wang, J.,
2020. Analysis of fishing vessel accidents with
Bayesian network and Chi-square methods, Ocean
Engineering, 198 (August 2019), 1-13.
Warren, M. (2011, November 25). Elevator drop kills one
man, injures three. Retrieved October 27, 2020, from
https://www.thelocal.fr/20111125/1864
Wood, G. R., 2005. Confidence and prediction intervals
for generalised linear accident models, 37, 267–273.
Wu, X., Hou, L., Wen, Y., Liu, W., & Wu, Z., 2019.
Research on the relationship between causal factors and
consequences of incidents. Journal of Loss Prevention
in the Process Industries, 61(July), 287–297.
Xing, Y., Dissanayake, S., Lu, J., Long, S., & Lou, Y.,
2019. An analysis of escalator-related injuries in metro
stations in China, 2013–2015. Accident Analysis and
Prevention, 122(October 2017), 332–341.
Zarikas, V., Loupis, M., Papanikolaou, N., & Kyritsi, C.,
2013. Statistical survey of elevator accidents in Greece.
Safety Science, 59, 93–103.
Zarikas, V., Papageorgiou, E., & Regner, P., 2015.
Bayesian network construction using a fuzzy rule
based approach for medical decision support. Expert
Systems, 32(3), 344–369.
Zarikas, V., 2007. Modeling decisions under uncertainty
in adaptive user interfaces. Universal Access in the
Information Society, 6(1), 87–101.
Zarikas, V., 2020. Elevator accidents France. Mendeley
Data, V1, doi: 10.17632/sstxdjj32h.1
Zhou, Q., Diew, Y., Shan, H., & Fai, K., 2018. A fuzzy and
Bayesian network CREAM model for human reliability
analysis – The case of tanker shipping. Safety Science,
105(February), 149–157.
Zong, F., Xu, H., & Zhang, H., 2013. Prediction for Traffic
Accident Severity: Comparing Bayesian Network and
Regression Models. Mathematical Problems in
Engineering.