Passengers who could afford a higher fare travelled in a
higher class, had cabins in better locations on the ship,
had better access to relevant information, and were
therefore more likely to survive than others on board.
"Sex" is significant because women, as well as
children, were given priority for survival, reflecting
the "women and children first" policy. Kakde and
Agrawal also found that passengers with the title Mrs in
the name column had a survival rate of 79%, while
passengers with the title Mr had a survival rate of about
16% (Kakde & Agrawal, 2018). This is a nearly fivefold
difference in survival rate between the two groups,
indicating the importance of "sex" as a feature and
suggesting that the "women and children first" rule was
followed.
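As an illustration of how survival rates by title can be obtained, the following minimal sketch extracts the title from a name column and averages the survival flag per title. The rows are toy values in the style of the Kaggle data, not the real dataset, and the column names Name and Survived are assumptions.

```python
import pandas as pd

# Toy rows in the style of the Kaggle Titanic data; the values are
# illustrative only and are NOT taken from the real dataset.
df = pd.DataFrame({
    "Name": [
        "Smith, Mr. John", "Smith, Mrs. Anna", "Brown, Mr. Paul",
        "Jones, Mrs. Mary", "Clark, Mr. Henry", "Reed, Miss. Ellen",
    ],
    "Survived": [0, 1, 0, 1, 1, 1],
})

# The title sits between the comma and the first period, e.g. "Mr" or "Mrs".
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# The mean of the 0/1 Survived column is the survival rate for each title.
survival_by_title = df.groupby("Title")["Survived"].mean()
print(survival_by_title)
```

Applied to the full dataset, the same grouping would give per-title rates comparable to those reported by Kakde and Agrawal.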
In general, features such as "sibsp", "parch" and
"embarked" have less weight because they are less
relevant to survival than age, sex, fare and Pclass.
However, these features may be weighted differently
in other algorithms.
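A minimal sketch of how such feature weights can be read from a Random Forest is given below, assuming scikit-learn. The data are synthetic stand-ins constructed so that the label depends on "sex" only and "embarked" is pure noise; the encodings and survival rates are arbitrary assumptions, not values from this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for two columns; NOT the real Titanic data.
sex = rng.integers(0, 2, size=n)        # 1 = female (assumed encoding)
embarked = rng.integers(0, 3, size=n)   # unrelated to the label by construction

# The label depends on sex only: 75% vs 20% survival (arbitrary rates).
p_survive = np.where(sex == 1, 0.75, 0.20)
survived = (rng.random(n) < p_survive).astype(int)

X = np.column_stack([sex, embarked])
feature_names = ["sex", "embarked"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, survived)

# Impurity-based importances; they are normalised to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances are specific to tree ensembles, which is one reason the ranking may differ under other algorithms.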
3.2 Limitations and Future Prospects
In this study, only one algorithm, Random Forest, was
used, which may have caused unimportant attributes to
appear important owing to the specifics of the method.
To address this, several machine learning algorithms,
such as logistic regression (LR), Gradient Boosting,
recurrent neural networks (RNN), and support vector
machines (SVM), should be applied, compared, and
critically analyzed. Furthermore, a new dataset with a
different combination of features could be tried and
compared with the present one, followed by a closer
examination of how the features affect the predictions.
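The comparison proposed above could be set up roughly as follows, assuming scikit-learn and 5-fold cross-validation. The data come from make_classification as a placeholder for the preprocessed Titanic features, and the hyperparameters are defaults rather than tuned values. The RNN is omitted from this sketch because it would require a separate deep learning framework.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data; replace with the preprocessed Titanic features.
X, y = make_classification(
    n_samples=500, n_features=7, n_informative=4, random_state=0
)

# LR and SVM are sensitive to feature scale, so they are wrapped in a scaler.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated accuracy for each model.
cv_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```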
4 CONCLUSION
In this study, it was found that age, fare and sex were
the three most influential features in predicting the
survival of Titanic passengers by applying Random
Forest to a Kaggle dataset.
The results of this analysis highlight the
importance of specific passenger attributes in
predicting survival. Recognizing the importance of
age, fare and sex can contribute to more accurate
forecasts in similar datasets and improve historical
analysis. This may have practical implications for
improving prediction models in other fields where
feature importance is essential. Furthermore, the use of
Random Forest demonstrates the effectiveness of AI
algorithms in prediction problems and in identifying
important patterns in complicated datasets.
However, there are a number of shortcomings in
this analysis. Model robustness may be affected by
potential overfitting problems and by focusing on
feature importance without considering feature
interactions. Future studies could explore other
methods for comparative analysis, such as logistic
regression or gradient boosting machines. In addition,
using a variety of combinations of features, as well as
looking at feature interactions, may improve model
accuracy and provide a deeper understanding of
survival predictions.
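One simple way to probe the overfitting concern is to compare training accuracy with accuracy on held-out data, and to compute permutation importance on the held-out set as a complement to the impurity-based ranking. The sketch below assumes scikit-learn and uses synthetic placeholder data; the split ratio and number of repeats are arbitrary choices, and permutation importance does not by itself quantify feature interactions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; replace with the preprocessed Titanic features.
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A large gap between these two numbers is a sign of overfitting.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Drop in held-out accuracy when each feature is shuffled in turn.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in perm.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {perm.importances_mean[i]:.3f}")
```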
REFERENCES
CareerFoundry. 2023. What is Random Forest? Retrieved
from https://careerfoundry.com/en/blog/data-analytics/what-is-random-forest/
Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F., &
Sun, J. 2016. Doctor AI: Predicting clinical events via
recurrent neural networks. JMLR Workshop and
Conference Proceedings, 56, 301-318. Epub 2016 Dec
10. PMID: 28286600; PMCID: PMC5341604.
Frey, B. S., Savage, D. A., & Torgler, B. 2011. Behavior
under extreme conditions: The Titanic disaster. Journal
of Economic Perspectives, 25(1), 209–22.
https://doi.org/10.1257/jep.25.1.209
History. 2024. Titanic History. Retrieved from
https://www.history.com/topics/early-20th-century-us/titanic
Hsieh, Y.-Z., Su, M.-C., Wang, C.-H., & Wang, P.-C. 2014.
Prediction of survival of ICU patients using
computational intelligence. Computers in Biology and
Medicine, 47, 13-19.
Kakde, Y., & Agrawal, S. 2018. Predicting survival on
Titanic by applying exploratory data analytics and
machine learning techniques. International Journal of
Computer Applications, 179, 32-38.
https://doi.org/10.5120/ijca2018917094
Kaggle. 2017. Titanic Dataset. Retrieved from
https://www.kaggle.com/datasets/heptapod/titanic
Pradeep, K. R., & Naveen, N. C. 2018. Lung cancer
survivability prediction based on performance using
classification techniques of support vector machines,
C4.5 and naive Bayes algorithms for healthcare
analytics. Procedia Computer Science, 132, 412-420.
Singh, A., Saraswat, S., & Faujdar, N. 2017. Analyzing
Titanic disaster using machine learning algorithms. In
2017 International Conference on Computing,
Communication and Automation (ICCCA) (pp. 406-411).
Greater Noida, India: IEEE.
https://doi.org/10.1109/CCAA.2017.8229835
Wang, T., Qiu, R. G., & Yu, M. 2018. Predictive modeling
of the progression of Alzheimer’s disease with
recurrent neural networks. Scientific Reports, 8, 9161.