Advanced Predictive Analytics for Aircraft Accident Severity Using Deep Learning

S. Thenmalar, B. Jaya Krishna Yadav and D. Venkat Kishore

Department of NWC, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India

Keywords: Accident Severity, Aircraft Safety, Convolutional Neural Networks, Deep Learning, Feature Engineering,

Machine Learning, Predictive Analytics.

Abstract: Aviation safety is seriously threatened by aircraft accidents, which calls for sophisticated prediction models

for precise severity categorization and risk reduction. The intricate, nonlinear linkages found in accident data

are frequently missed by traditional approaches, resulting in less-than-ideal forecasts and postponed

preventive actions. Our research uses machine learning models and deep learning techniques to create a

sophisticated forecasting system for classifying the severity of plane accidents. To increase the dataset's

prediction capacity, we use feature engineering approaches and conduct in-depth Exploratory Data Analysis

(EDA) on historical accident data. After thorough preprocessing and data organization, we apply the XGB-Classifier, which achieves a training accuracy of 100% and a test accuracy of 95.9%. To improve performance even further, we create a Convolutional Neural Network (CNN) model, which first achieves a training accuracy of 97.66% and a test accuracy of 93.6%. The model's accuracy is enhanced for both

low-severity incidents (train: 99.13%, test: 96.17%) and high-severity accidents (train: 99.53%, test: 96.93%)

by hyperparameter tuning and severity-specific optimization. By combining both severity levels, the final

CNN model shows strong predictive performance with an improved training accuracy of 98.30% and a test

accuracy of 97.93%. These results demonstrate how well-structured preprocessing, feature engineering, and

sophisticated deep learning architectures work together to produce a potent tool for immediate accident

severity assessment and aviation safety improvement.

1 INTRODUCTION

Since the beginning of aviation, there has been a

strong focus on aircraft safety, with ongoing efforts to

reduce the likelihood of accidents and enhance

predictive techniques. In the past, accident

investigations used manual research of pilot reports,

flight data, and black box recordings to identify

contributory variables and recommend safety

enhancements. Over the years, an improved understanding of accidents has been made possible by developments in data gathering, sensor technology, and statistical analysis. Yet aviation accidents do

occur despite stringent safety regulations and

enhanced monitoring systems, warranting better

prediction methods. As the amount of historical

accident data grows, a combination of machine

learning (ML) and deep learning (DL) techniques

presents an opportunity to enhance the accuracy and

efficiency of accident severity classification.

Traditional aviation accident prediction models are

mostly based on rule-based classification techniques and other statistical methods [16, 17]. The prediction capabilities

of these approaches are usually poor, as they often

overlook the complex relationships and nonlinear

patterns present in large-scale aviation data. Several

models such as logistic regression and decision trees have been applied to classify the severity of accidents; nonetheless, their performance is bounded by feature selection problems and limited model interpretability. Moreover, most current approaches focus on individual accident causes instead of a global forecast according to severity, which leads to suboptimal detection and prevention policies. In addition, many studies do not sufficiently address data imbalance, which can result

in biased severity classes, further impeding the

practical applicability of these frameworks in real-

world settings.

To address these challenges, this research

proposes an advanced predictive analytics framework

for determining the severity of flight accidents via


Thenmalar, S., Yadav, B. J. K. and Kishore, D. V.

Advanced Predictive Analytics for Aircraft Accident Severity Using Deep Learning.

DOI: 10.5220/0013903400004919

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies (ICRDICCT'25 2025) - Volume 3, pages 648-656

ISBN: 978-989-758-777-1

applying deep learning techniques. To ensure dataset

quality, the analysis begins with feature engineering

and in-depth exploratory data analysis (EDA). This

research was driven by the industry need to accurately

detect the severity of accidents in order to improve

aviation safety and mitigate risks. With flying

operations increasing all around the world, even with

highly sophisticated safety systems in place, there

remains a risk of an accident occurring. A highly

predictive model can tremendously help in early

identification and allow airlines and civil aviation

authorities to take preventive safety measures. This research intends to bridge the gap between a conceptual approach to safety evaluation and real-time accident severity forecasting through deep learning architectures, providing a better decision-support system for airline risk management during operations. It aims to build a novel technique for classifying the severity level of flight accidents. Recent deep learning models are well known; however, they are often so complex that they do not incorporate structured preprocessing or feature engineering techniques. Combining the ML and DL models not only improves prediction accuracy but also makes the resulting system explainable and flexible for use in the real world. The work underscores the

importance of advanced AI-powered statistics in

terms of flight safety and demonstrates how deep

learning models could transform accident prevention

strategies. The results are in line with an overarching

goal to minimize accident-related deaths and improve

flight safety through judicious, data-oriented

insights.

2 LITERATURE SURVEY

Several works have considered the application of statistical and machine learning techniques for evaluating and classifying flight accident severity [6].

Early approaches to modelling accident severity with

historical aviation data primarily used traditional

statistical models, such as logistic regression,

decision trees and Bayesian classifiers. To determine

the main causes of accidents, including weather, pilot

expertise, and aircraft type, researchers have used

feature selection techniques. Nevertheless, these

models frequently have trouble processing high-

dimensional data and identifying intricate

correlations between variables. In order to enhance

predictive performance, some research also tried to

employ ensemble techniques like Random Forest and

Gradient Boosting; nonetheless, the outcomes were

frequently limited by unbalanced datasets and the

incapacity to generalize effectively across various

accident circumstances. Additionally, even though

these models produced findings that could be

understood, their accuracy was still below par,

requiring more advanced techniques to improve

predictive power. Madeira et al. use text preprocessing, Natural Language Processing (NLP), semi-supervised Label Spreading (LS), and supervised Support Vector Machine (SVM) to discover and categorize human factor categories from aircraft incident reports. Bayesian optimization and random search are used to enhance model performance. With Micro F1 scores of 0.900, 0.779, and 0.875, the top predictive models showed strong prediction abilities. A bigger data set should be considered in future studies. Zhang et al. analyse National Transportation Safety Board (NTSB) accident investigation records using data mining and sequential deep learning algorithms in order to forecast adverse outcomes. To develop classification models for passenger airlines, the researchers concentrate on textual information that describes event sequences.

Dong et al. suggest identifying causative factors through the use of deep learning-based models. An open-source natural language model, an attention-based long short-term memory model, and 200,000 incident reports from the Aviation Safety Reporting System (ASRS) are among the resources utilized. The suggested method is a viable strategy for

enhancing aircraft safety since it is more precise and

flexible than conventional machine learning

techniques. In order to better analyze aviation

accident data, researchers have begun using neural

networks, namely Convolutional Neural Networks

(CNNs) and Recurrent Neural Networks (RNNs), as

deep learning has become more popular. CNNs have been demonstrated in many research works to boost classification performance by identifying important patterns in structured accident datasets. To enhance accuracy and robustness, other studies have applied hybrid models that combine deep learning and traditional machine learning methods. Nevertheless, their current use in airline safety management operations is limited because the majority of these techniques focus on large-scale accident analysis instead of clear-cut severity classification. Lastly, since deep learning models tend to require more fine-tuning and processing power, they can be hard to adopt in real-time flight safety systems. Nonetheless, there is still a need for a comprehensive, highly predictive model to classify


the severity of accidents accurately (Zhang et al., 2021). To tackle the shortcomings of previous work, the proposed model combines machine learning and deep learning methods.

3 METHODOLOGY

This study follows a structured, data-driven method for advanced predictive analysis of aircraft accident severity using neural networks. The process starts with an

aggregation of data from trusted sources like the

FAA, NTSB, ASN and BAAA, with data points

including but not limited to: recorder data, aircraft

parameters, pilot and crew info, atmospheric data and

ATC communications, and historical accident

reports. The data is then preprocessed through missing-value treatment, duplicate removal, and normalization. Feature selection and dimensionality reduction techniques such as Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) are useful for determining the most relevant features, while Natural Language Processing (NLP) techniques like TF-IDF and word embeddings (Word2Vec, BERT) can be applied to analyze textual accident reports. In addition, Recurrent Neural Networks (RNNs) and

Long Short-Term Memory (LSTM) networks can be

used for sequential flight data, which extract temporal

dependencies, whereas CNNs are more suitable for

processing image-based data like weather maps and

aircraft damage assessments. Figure 1 shows the working methodology.

Figure 1: Working methodology.

The model is trained on labelled datasets to predict the severity of accidents as minor, serious or fatal by leveraging advanced techniques like transfer learning, Bayesian optimization for hyperparameter tuning, and ensembling for accuracy improvement.

The performance is evaluated using several metrics

such as accuracy, precision, recall, F1-score, and

AUC-ROC, while validation techniques such as k-

fold cross-validation ensure model generalization. To

mitigate the black-box issue within deep learning,

interpretability techniques such as SHapley Additive exPlanations (SHAP) are utilized to understand

crucial contributing parameters, providing useful

insights to enhance aviation safety. The validated

predictive model is incorporated into real-time

aviation monitoring systems, providing early

warning capabilities to air traffic control, flight

management, and airline safety systems. Continuous retraining on incoming live data allows the model to adapt

quickly to evolving risks typical of the aviation

industry, improving proactive risk assessment

capabilities and emergency response strategies. This

predictive framework based on deep learning offers a

valuable approach for mitigating aviation accident

severity, enhancing regulatory adherence, and

bolstering the overall safety framework in the

domain of aviation.
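The evaluation step described above can be illustrated with a minimal, pure-Python sketch that computes accuracy and macro-averaged precision, recall, and F1-score from true and predicted severity labels. The function name, the label names, and the toy predictions are illustrative assumptions, not data or code from this study.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pair_counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = pair_counts[(label, label)]
        fp = sum(pair_counts[(t, label)] for t in labels if t != label)
        fn = sum(pair_counts[(label, p)] for p in labels if p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n_labels = len(labels)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {
        "accuracy": correct / len(y_true),
        "macro_precision": sum(precisions) / n_labels,
        "macro_recall": sum(recalls) / n_labels,
        "macro_f1": sum(f1_scores) / n_labels,
    }

# Toy example with hypothetical severity labels (not study data).
y_true = ["minor", "minor", "serious", "fatal", "fatal", "serious"]
y_pred = ["minor", "serious", "serious", "fatal", "fatal", "serious"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Macro averaging gives every severity class the same weight, which is one reasonable choice when the classes are unbalanced.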

3.1 Data Collection

To make sure the data set used in this study was well-balanced for model evaluation, the 10,000 records were separated into 7,000 training samples and 3,000 test samples. Every accident record contains vital operational and environmental attributes that influence the severity of the accident. Among the key features in this dataset are Safety Score, a numeric representation of the aircraft's overall safety level, and Severity, the target variable for classification. Other important features are Days Since Inspection (service history) and Total Safety Complaints (previous safety issues). Further features relate to the external effects acting on the aircraft's performance and control stability. Along with Accident Type Code, which categorizes a range of accident types, Cabin Temperature was added as a variable affecting the flight stage. The dataset also features Violations, a record of any prior infractions related to the aircraft, and Max Elevation, which gives insight into the height at which the event took place. Finally, Accident ID acts as an identification number for every incident, and Adverse Weather Metric takes into consideration the dangerous weather


circumstances that contributed to the accident.

Before the models are trained, this dataset, which is rich in a variety of aviation-related parameters, is preprocessed and improved through feature engineering in order to increase predictive accuracy.
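As a rough illustration of the 7,000/3,000 partition described above, the following sketch splits record indices 70/30 while preserving class proportions. Stratifying by severity class, the random seed, and the synthetic label column are assumptions made for this example; the paper does not state how the split was drawn.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Split record indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical label column: 10,000 records in four equally sized classes.
labels = [i % 4 for i in range(10_000)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 7000 3000
```

Splitting per class keeps rare severity levels represented in both partitions, which a plain random split does not guarantee.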

3.2 Exploratory Data Analysis (EDA)

In order to better grasp the dataset, a variety of EDA

and visualization techniques were used to assess class

separability, identify anomalies, and comprehend

feature distribution (as shown in Figure 2). The

distribution of numerical features was examined using boxplots and histograms. The results showed that Adverse Weather Metric and Total Safety Complaints had a significant right skew with many outliers, suggesting that these variables might not be reliable indicators of accident severity. Safety

Score, Days Since Inspection, Accident Type Code

and Violations all showed substantial correlations

with accident severity, indicating that these variables

are essential for model training. Correlation heatmaps

also assisted in identifying dependencies between

features. Relationships between variables were

investigated using pair plots and scatter plots, which

showed that some features clearly separated across

accident groups while others showed a great deal of

overlap. Variable distributions across severity levels

were compared using violin plots, which showed that

while Highly Fatal and Damaging and Minor Damage

and Injuries could be clearly distinguished from one

another, the other two classes showed substantial

overlap, which made classification more difficult.

Figure 2: Creating features for analysis.

Additionally, box plots aided in the detection and

management of outliers, especially in variables with

extreme values like Adverse Weather Metric and

Total Safety Complaints. Safety Score, Days Since Inspection, Accident Type Code, and Violations were the most significant criteria in evaluating the

severity of the accident, according to feature

importance analysis using machine learning models.

Two severity classes were found to be well-separated,

while the other two showed significant overlap, as

confirmed by the use of Principal Component

Analysis (PCA) to visualize feature grouping in a

lower-dimensional space. Figure 3 shows

Exploratory Data Analysis. In order to improve

model performance, redundant or less important

features were either eliminated or altered, while the

most pertinent qualities were kept for predictive

modelling. Overall, the EDA results served as a guide for feature selection and the model development process.

Figure 3: Exploratory data analysis.
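Two of the quantities discussed in this section, right skew and correlation with severity, can be computed as in the sketch below. The numbers are hypothetical toy values chosen to show a long right tail and a negative correlation; they are not taken from the study's dataset, and the numeric severity coding is an assumption.

```python
import math

def skewness(values):
    """Population skewness (third standardized moment); > 0 means a right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical values: a few large complaint counts create a right tail.
complaints = [0, 1, 1, 2, 2, 3, 4, 6, 15, 40]
# Hypothetical pairing: lower safety score, higher (numerically coded) severity.
safety_score = [80, 70, 60, 50, 40]
severity_code = [0, 1, 1, 2, 3]

print(round(skewness(complaints), 2))
print(round(pearson(safety_score, severity_code), 2))
```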

3.3 Data Preprocessing

A strong preprocessing pipeline was put in place to

guarantee that the dataset was properly organized and

training-optimized. In the first stage, missing values were handled by filling categorical features with the most frequent class and imputing numerical features with the median to avoid data bias. Box plots and the IQR (Interquartile Range) approach were used to identify outliers. Extreme values in highly skewed features, such as Adverse Weather Metric and Total Safety Complaints, were either clipped or transformed. In order

to prevent redundancy in the final model, highly

correlated features were examined after the dataset


was examined for skewness and multicollinearity.

Before encoding, categorical variables such as Accident Type Code were also examined to make sure they were adequately represented in all classes. A key factor in

enhancing model performance was feature

engineering. To gain a greater understanding of

aviation safety situations, existing traits were

combined to create new features. To better assess an

aircraft's safety reliability, for example, Safety Score

and Violations were integrated to form the Risk Index

feature. Likewise, to more accurately depict flying

stability, Control Metric and Turbulence in forces

were converted into a Stability Score. Recursive

feature elimination (RFE) and mutual information are

two feature selection strategies that were used to

reduce dimensionality and increase computational

efficiency by keeping just the most important

predictors. Safety Score, Days Since Inspection, Accident Type Code and Violations were among the

features that were chosen because they were found to

be very relevant in the assessment of severity.
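The missing-value and outlier handling described above can be sketched as follows. This is a minimal illustration assuming median imputation for a numeric column and clipping at 1.5 times the IQR beyond the quartiles; the multiplier, the quartile method (Python's default for `statistics.quantiles`), and the sample column are assumptions, since the paper does not give these details.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def clip_iqr(values, k=1.5):
    """Clip values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Hypothetical numeric column with two missing entries and one extreme value.
raw = [2, 3, None, 4, 5, 3, 90, None, 4]
filled = impute_median(raw)
clipped = clip_iqr(filled)
print(filled)
print(clipped)
```

Clipping keeps the record in the dataset while limiting the influence of the extreme value, which is why it is often preferred over dropping rows in small datasets.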

To guarantee a fair model evaluation, the dataset was

divided into training (70%) and testing (30%) after

feature selection was finished. The following was the

train-test split formula:

$$\text{Train Size} = \frac{70}{100} \times \text{Total Data}, \qquad \text{Test Size} = \frac{30}{100} \times \text{Total Data} \tag{1}$$

where 3,000 samples were set aside for testing and

7,000 samples for training. StandardScaler from sklearn.preprocessing was then used to standardize numerical features in order to improve convergence in deep learning models and normalize the data distribution. The standardization formula was as follows:

$$X_{\text{scaled}} = \frac{X - \mu}{\sigma} \tag{2}$$

where μ is the mean of the feature and σ is its standard deviation. Categorical variables, specifically Accident Type Code, were transformed with one-hot encoding, which converts each category into a separate binary column so that no ordinal relationship is implied between categories. This enabled the models to work with categorical values. After this initial processing, model stacking was applied to merge multiple models and utilize their collective predictive power: a CNN was used as the meta-model over the XGB-Classifier base learner to improve severity classification. The stacking formula used was:

$$\text{Final Prediction} = \alpha \times \text{Model}_{\text{XGB}} + \beta \times \text{Model}_{\text{CNN}} \tag{3}$$

where α and β denote the weight coefficients optimized during model training. This hybrid model balanced the high fidelity of deep learning against the interpretability of classical ML. By applying these preprocessing and feature engineering methods, we optimized the dataset and achieved significant improvements in accuracy and generalization performance.
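A minimal pure-Python sketch of the three transformations in this section follows: standardization as in Eq. (2), one-hot encoding, and the weighted combination of Eq. (3). All function names, the sample values, and the weights 0.4 and 0.6 are hypothetical. Eq. (3) is read here as a weighted blend of two class-probability vectors, which is an interpretation rather than a confirmed description of the authors' stacking implementation.

```python
import statistics

def fit_standardizer(train_column):
    """Estimate mean and population standard deviation on training data only."""
    return statistics.fmean(train_column), statistics.pstdev(train_column)

def standardize(column, mu, sigma):
    """Eq. (2): (x - mu) / sigma, applied element-wise."""
    return [(x - mu) / sigma for x in column]

def one_hot(codes):
    """Map each category code to a binary indicator vector (sorted categories)."""
    categories = sorted(set(codes))
    return [[1 if code == cat else 0 for cat in categories] for code in codes]

def blend(p_first, p_second, alpha, beta):
    """Eq. (3): weighted sum of two models' class-probability vectors."""
    return [alpha * a + beta * b for a, b in zip(p_first, p_second)]

# Hypothetical numeric feature: fit on train, reuse the statistics on test.
train_col, test_col = [2.0, 4.0, 6.0, 8.0], [5.0, 10.0]
mu, sigma = fit_standardizer(train_col)
train_scaled = standardize(train_col, mu, sigma)
test_scaled = standardize(test_col, mu, sigma)

# Hypothetical accident type codes.
encoded = one_hot([3, 1, 3, 2])

# Hypothetical class probabilities from two models, blended with assumed weights.
blended = blend([0.7, 0.2, 0.1], [0.5, 0.4, 0.1], alpha=0.4, beta=0.6)
predicted_class = blended.index(max(blended))
print(train_scaled, encoded, blended, predicted_class)
```

Fitting the scaler on the training partition and reusing its statistics on the test partition avoids leaking test information into preprocessing.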

3.4 XG-Boost Classifier

For flight accident severity classification, this research utilized the Extreme Gradient Boosting (XG-Boost) Classifier, a powerful ensemble learning method based on gradient boosting. Owing to its efficiency, scalability, and high prediction accuracy, XG-Boost is a prominent choice for handling structured tabular data. It fits a sequence of weak learners (decision trees) additively, where each new tree is built to correct the errors made by the previously fitted trees. This boosting principle lowers the loss and improves the prediction performance of the model. The XG-Boost algorithm minimizes an objective function that consists of a loss function and a regularization term, defined as follows:

$$Obj = \sum_{i} L(y_i, \hat{y}_i) + \sum_{j} \Omega(f_j) \tag{4}$$

where $L(y_i, \hat{y}_i)$ is the loss function that measures how well the prediction $\hat{y}_i$ captures the true value $y_i$, and $\Omega(f_j)$ is the regularization term that penalizes the complexity of the $j$-th tree to avoid overfitting. This research used GridSearchCV to optimize hyperparameters, including the regularization terms, learning rate, maximum tree depth, and number of estimators. The final tuned XGB-Classifier achieved good generalization with a training accuracy of 100% and a test accuracy of 95.9% while maintaining high precision when classifying accident severities. One of the key strengths of XG-Boost is its ability to handle high-dimensional data, feature interactions, and missing values efficiently. The model also includes techniques such as L1/L2 regularization, column subsampling, and row subsampling that help it generalize better without a large computational cost.


To keep split finding efficient on large datasets, XG-Boost also uses a weighted quantile sketch algorithm to propose candidate split points.
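To make the objective in Eq. (4) concrete, the sketch below evaluates it for a toy case. It assumes a binary log loss for $L$ and the standard XGBoost tree penalty $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_k w_k^2$, where $T$ is the number of leaves and $w_k$ are the leaf weights. The paper does not spell out $\Omega$, its task is multi-class, and the values of γ, λ, the labels, and the leaf weights here are hypothetical, so this is a simplified illustration only.

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for a single sample (p_pred is P(class = 1))."""
    p = min(max(p_pred, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def tree_penalty(leaf_weights, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w^2), with T = number of leaves."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(y_true, p_pred, trees, gamma=1.0, lam=1.0):
    """Eq. (4): summed loss over samples plus summed penalty over trees."""
    data_term = sum(log_loss(y, p) for y, p in zip(y_true, p_pred))
    reg_term = sum(tree_penalty(tree, gamma, lam) for tree in trees)
    return data_term + reg_term

# Hypothetical example: two samples and one tree with two leaves.
labels = [1, 0]
probabilities = [0.5, 0.5]
trees = [[1.0, -1.0]]  # leaf weights of each tree
value = objective(labels, probabilities, trees)
print(round(value, 4))
```

Raising γ or λ makes additional leaves and large leaf weights more expensive, which is how the regularization term discourages overly complex trees.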

4 CONVOLUTIONAL NEURAL NETWORKS

In this research, the Convolutional Neural Network

(CNN), a potent deep learning model, was employed

to categorize the severity of flight accidents.

Although CNNs are frequently employed for image

recognition, they can also process organized tabular

data due to their versatility. In this experiment, CNN

was trained to recognize important linkages that

affect accident severity by learning intricate patterns

from numerical characteristics. CNNs are extremely

effective for tasks such as classification involving

several interacting variables because, in contrast to

typical machine learning models, they can extract

deep representations and capture non-linear

connections. The CNN architecture utilized in this investigation comprised convolutional layers, activation functions, batch normalization, dropout layers, and fully connected layers (as shown in Figure 4). Every feature in the structured incident dataset was handled as a distinct dimension in an input tensor by the input layer. The convolutional layers applied filters to the data to capture hierarchical patterns across features and to discover important parameters influencing aviation safety. To ensure that the network could learn intricate relationships, non-linearity was introduced using activation functions like ReLU (Rectified Linear Unit). Dropout layers prevented overfitting by randomly deactivating neurons during training, while batch normalization was used to stabilize learning.

Figure 4: CNN model architecture.
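How a convolutional filter acts on a row of tabular features can be shown with a small pure-Python sketch. It assumes the standardized features of one record are treated as an ordered 1D sequence, which is one common way to apply CNNs to tabular data; the paper does not state the exact input layout, and the feature values and kernel below are hypothetical.

```python
def conv1d(features, kernel, bias=0.0):
    """Valid 1D cross-correlation (the 'convolution' used in deep learning)."""
    width = len(kernel)
    return [
        sum(features[i + j] * kernel[j] for j in range(width)) + bias
        for i in range(len(features) - width + 1)
    ]

def relu(values):
    """Rectified Linear Unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]

# Hypothetical standardized feature row for a single accident record.
row = [0.5, -1.2, 0.3, 2.0, -0.7]
# Hypothetical 3-wide filter that responds to a drop between features i and i+2.
kernel = [1.0, 0.0, -1.0]

feature_map = relu(conv1d(row, kernel))
print(feature_map)
```

In a trained network the kernel weights are learned, and many such filters run in parallel before the fully connected layers.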

The CNN model had an initial training accuracy of 97.66% and a test accuracy of 93.59%. This result indicated that the CNN could generalize to unseen data while retaining high accuracy. Nevertheless, the CNN was tuned further on the observation that classification could be improved if the data were divided into High Severity and Low Severity cases. This optimization rested on the assumption that training CNNs separately on minor incidents and on catastrophic accidents would increase classification quality, since our investigations had shown that the two groups follow distinct patterns. For Low Severity, a CNN was trained specifically on incidents falling under Low-Risk Incidents and Minor Damage and Injuries. Consequently, the CNN could focus on technical and behavioral indicators that are not apparent at higher severity levels, such as minor safety noncompliance and fluctuations in control metrics. After fine-tuning, this CNN reached a training accuracy of 99.13% and a test accuracy of 96.17%. For High Severity, a CNN was trained on incidents categorized as Major Safety Violations, Highly Fatal and Damaging, and so on. Common to these incidents are extreme weather, serious mechanical malfunctions, and significant safety noncompliance. Fine-tuning enabled the CNN to capture high-risk indicators, hence the training accuracy of 99.53% and test accuracy of 96.93%. This two-part approach ensured that the CNN learnt from a significant number of both low-severity and catastrophic cases, securing a balanced classification accuracy. As the results show, the final CNN outperformed both the initial CNN and the XGB-Classifier, with a training accuracy of 98.30% and a test accuracy of 97.93%. The CNN extracted feature hierarchies automatically, without requiring manual feature engineering to construct complex patterns. This spared the feature selection step that conventional models need as part of complex preprocessing routines. The CNN learned feature representations dynamically and was therefore well prepared for unseen examples.

Moreover, crucial for the CNN's success was its ability to handle class imbalance. Traditional models commonly struggle with unbalanced datasets, where some severity categories contain fewer cases than others. The CNN model used class-balancing techniques such as weighted loss functions and synthetic sampling to ensure balanced training over all accident severity categories. The model was therefore capable of producing precise forecasts for each severity level without favoring the majority class. At the same time, the stacking framework


integrated the benefits of CNN and XGB-Classifier

and achieved even better prediction performance.

The CNN contributed the ability of deep learning to capture complex, non-linear interactions, whereas the XGB-Classifier provided structured learning and robust feature selection. The merger of the two

models also showcased the remarkable potential of

deep learning in the field of aviation safety research,

as it produced a highly accurate and reliable accident

severity prediction system. The CNN-based

categorization system that was developed in this

work represents a significant advancement in

statistical analysis used for predicting the severity of

flight accidents. By combining deep learning with

specific fine-tuning procedures, the model achieved

high accuracy and generalization, providing a

powerful tool for enhancing proactive risk

management and aviation safety assessments.
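The weighted loss mentioned in this section can be sketched as follows. The example assumes inverse-frequency class weights of the form N / (K × count), the same rule scikit-learn applies for `class_weight="balanced"`, combined with a weighted cross-entropy. The paper does not specify its weighting scheme, so the formula, the class names, and the toy probabilities are assumptions.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """weight_c = N / (K * count_c), so rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}

def weighted_cross_entropy(labels, probabilities, weights, eps=1e-12):
    """Mean of -w_c * log p(true class); each entry maps class -> probability."""
    total = 0.0
    for label, probs in zip(labels, probabilities):
        total += -weights[label] * math.log(max(probs[label], eps))
    return total / len(labels)

# Hypothetical imbalanced batch: 8 low-severity and 2 high-severity records.
labels = ["low"] * 8 + ["high"] * 2
weights = inverse_frequency_weights(labels)

# Hypothetical uninformative predictions (50/50) for every record.
probabilities = [{"low": 0.5, "high": 0.5} for _ in labels]
loss = weighted_cross_entropy(labels, probabilities, weights)
print(weights, round(loss, 4))
```

With these weights, a mistake on a minority-class record costs four times as much as one on a majority-class record, which counteracts the tendency to favor the majority class.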

5 RESULTS

The performance of the proposed Aircraft Accident

Severity Prediction Model was evaluated using

accuracy, loss, confusion matrix, and model

comparison. The two main models used for the

research, CNN and XGB-Classifier, were both

optimized to maximize classification accuracy. The CNN was further improved by dividing the data into cases of High Severity and Low

Severity, which made it possible to comprehend

accident severity patterns in greater detail. To

ascertain the best method for forecasting the severity

of aircraft accidents, the output of various models was

compared. To maximize performance (as shown in

Figure 5), a stacking method was used to train the

XGB-Classifier model on the preprocessed dataset.

Following training, it demonstrated remarkable 100%

train accuracy and 95.9% test accuracy. The model

seems to have successfully captured intricate correlations in the data, as seen by the perfect training accuracy. However, the model did marginally worse on unseen data; the test accuracy was still high, but the gap suggested some overfitting. The XGB-Classifier's confusion matrix revealed that while it properly classified the majority of cases, there were a few misclassifications between overlapping severity classes, especially in the moderate damage and minor injury categories.

Figure 5: Train and test accuracy of various models.

Prior to any adjustments, the original CNN model

had a 93.59% test accuracy and a 97.66% training

accuracy. CNN had the advantage of automatically

learning feature representations, which allowed it to

identify deeper patterns in the data, even though its

accuracy was marginally lower than that of XGB-

Classifier. While the validation loss plateaued slightly, indicating the need for fine-tuning, the loss curves exhibited a consistent reduction during training, showing adequate convergence. Subsequent

examination of the confusion matrix revealed that,

like the XGB-Classifier, there was considerable

overlap in the Medium Damage and Minor Injury

groups, but that the Highly Fatal or Damaging

instances were accurately classified. The CNN model

was adjusted independently for High Severity and

Low Severity instances in order to overcome these

classification issues. The CNN model obtained a training accuracy of 99.13% and a test accuracy of 96.17% when trained exclusively on Low Severity accidents. This showed that by concentrating on minor incidents, the model could distinguish between them more successfully, reducing the number of incorrect classifications. In a similar vein, the CNN model demonstrated a high degree of ability to differentiate between fatal accidents and other severe occurrences, achieving a training accuracy of 99.53% and a test accuracy of 96.93% for High Severity cases.

With a training accuracy of 98.30% and a test accuracy of 97.93%, the final combined CNN model, which combined both Low Severity and High Severity tuning strategies, achieved the best overall accuracy.


Figure 6: Confusion matrix of various classes.

This model represented the variation in accident severity well across all categories (as shown in Figure 6). According to the confusion matrix comparison, the final CNN model considerably reduced misclassification errors, especially in the moderate and minor accident classes, which had previously been difficult for both the XGB-Classifier and the original CNN model. Furthermore, the final CNN model's loss curves converged smoothly, suggesting better generalization than in previous iterations. To further validate performance, the accuracy and loss plots of each model were compared. Despite its quick convergence, the XGB-Classifier showed evidence of minor overfitting, as indicated by its perfect training accuracy. The CNN models, by contrast, showed a much steadier increase in accuracy, and their loss continued to decrease across epochs. The CNN models tuned for the High Severity and Low Severity groups had the lowest loss curves and were found to offer the best compromise. The final merged CNN model delivered the most reliable performance, with high accuracy across all severity classes. As the comparison between the CNN models and the XGB-Classifier shows, deep learning yields more accurate predictions of flight accident severity.
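The overfitting comparison above can be made concrete by computing the gap between training and test accuracy for each model. The sketch below is only an illustration: it uses the accuracies quoted in this section for the tuned and final models together with those reported earlier in the paper for the XGB-Classifier and the initial CNN, and the names `REPORTED` and `generalization_gap` are hypothetical.

```python
# Train/test accuracies in percent, as reported in the paper.
REPORTED = {
    "XGB-Classifier": (100.00, 95.90),
    "Initial CNN": (97.66, 93.60),
    "CNN (Low Severity)": (99.13, 96.17),
    "CNN (High Severity)": (99.53, 96.93),
    "Final combined CNN": (98.47, 98.10),
}

def generalization_gap(train_acc, test_acc):
    """Train minus test accuracy in percentage points, rounded to 2 decimals."""
    return round(train_acc - test_acc, 2)

gaps = {name: generalization_gap(tr, te) for name, (tr, te) in REPORTED.items()}
smallest_gap_model = min(gaps, key=gaps.get)

# List the models from smallest to largest train/test gap.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name:<20} gap = {gap:.2f} pp")
```

On these figures the gap narrows from about 4.1 percentage points for the XGB-Classifier to 0.37 for the final combined CNN, which is consistent with the reduced overfitting described above.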

6 CONCLUSIONS

This study presents a comprehensive approach to predicting the severity of flight accidents using machine learning and deep learning techniques. The dataset was extensively preprocessed and feature engineered, and exploratory data analysis (EDA) was carried out to enhance model performance. The study worked with both a CNN and the XGB-Classifier, each of which proved capable of predicting accident severity well. The XGB-Classifier achieved 95.9% test accuracy, while the tuned CNN achieved higher test accuracy with less overfitting. Fine-tuning the CNN separately for Low Severity and High Severity cases gave test accuracies of 96.17% and 96.93% (with losses of 0.087706 and 0.067773, respectively), a notable improvement. The final combined CNN model, which covered both severity levels, produced the best test accuracy of 97.93%, making it the most successful solution. This research demonstrates that neural network models, particularly CNNs, can learn complex interactions in aircraft accident data and can therefore serve as a reliable method for severity classification. The findings have significant implications for aviation safety, as they support proactive risk assessment and accident prevention strategies. Future studies could add real-time flight data and accredited airline publications to the model to improve its predictive capabilities. Performance could be improved further with ensemble methods that combine deep learning with other AI-driven approaches, such as transformer-based architectures and reinforcement learning. Explainable AI (XAI) techniques would also make decision-making more transparent for aviation authorities. Integrating weather patterns, pilot behavior analytics, and maintenance records could enrich the model and may even make a fully automated, intelligent incident forecasting system a reality, aiding aviation safety and risk avoidance.
