3 RESULTS AND DISCUSSION
This study evaluated the importance of features in a
lung cancer dataset using Random Forest and
Multilayer Perceptron (MLP) models. The dataset
comprises 309 samples and 15 features. These
features were selected to better identify factors
related to lung cancer and to improve the model's
predictive accuracy. The results are shown in
Figure 1 and Figure 2.
3.1 Feature Importance Evaluation
Using the Random Forest model, this study calculated
the contribution of each feature to the model’s
performance. The results of feature importance are
shown in Figure 1. Age (AGE) was identified as the
most significant feature with an importance score of
0.1955, highlighting its crucial role in lung cancer
prediction. The next most important features were
Allergy and Swallowing Difficulty, with importance
scores of 0.1136 and 0.0893, respectively, suggesting
that they are also strongly associated with the
model's lung cancer predictions.
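The text does not state how the importance scores
were computed. A common choice, and the one reported
by scikit-learn's feature_importances_ attribute, is
the mean decrease in Gini impurity. The sketch below
shows that quantity for a single split, under that
assumption. The ages and labels are made-up toy
values, not data from this study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting one node at `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy, made-up data (assumption): ages and a binary label (1 = cancer).
ages = [35, 42, 48, 55, 61, 66, 70, 74]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

decrease = impurity_decrease(ages, labels, threshold=50)
print(f"Gini decrease for the split AGE <= 50: {decrease:.4f}")
```

In an impurity-based forest importance, decreases
like this one are weighted by the fraction of samples
reaching each node, summed per feature, averaged over
trees, and normalized so that all scores sum to 1.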
For the MLP model, this study assessed each
feature's contribution to accuracy by removing the
features ranked highest by the Random Forest model
and measuring the resulting performance. Detailed
data on feature importance in the MLP model are
presented in Figure 2. Removing the Age feature
decreased model accuracy to 0.94 (ROC of 0.90), while
removing the Allergy feature decreased accuracy to
0.96 (ROC of 0.93).
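The feature-removal procedure described above can be
sketched as a drop-one-feature loop. This is only an
illustration: a nearest-centroid classifier stands in
for the MLP so that the example needs no external
libraries, and the data, feature names, and helper
functions are all hypothetical.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in classifier (NOT an MLP): predict the class with the closest mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == y for x, y in zip(X_test, y_test))
    return hits / len(y_test)


def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


def ablation_accuracy(X_train, y_train, X_test, y_test, feature_names,
                      score=nearest_centroid_accuracy):
    """Test accuracy with all features, then with each feature removed in turn."""
    results = {"(all features)": score(X_train, y_train, X_test, y_test)}
    for j, name in enumerate(feature_names):
        results[name] = score(
            drop_column(X_train, j), y_train, drop_column(X_test, j), y_test
        )
    return results


# Synthetic data (assumption): one informative feature and one pure-noise feature.
rng = random.Random(0)
feature_names = ["informative", "noise"]


def make_rows(n):
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label * 3.0 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
        y.append(label)
    return X, y


X_train, y_train = make_rows(200)
X_test, y_test = make_rows(100)

results = ablation_accuracy(X_train, y_train, X_test, y_test, feature_names)
for name, acc in results.items():
    print(f"accuracy without {name}: {acc:.2f}")
```

A large drop relative to the all-features score marks
a feature the model depends on. Any classifier can be
substituted by passing a different score function.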
In summary, both Random Forest and MLP
models highlight the importance of features such as
Age, Allergy, and Swallowing Difficulty in lung
cancer prediction. The identification and weighting of
these features are crucial for enhancing early
detection accuracy. These findings help optimize
model performance in practical applications and
provide a foundation for further research and feature
selection strategies.
Although both the Random Forest and MLP models
identify Age, Allergy, and Swallowing Difficulty as
the most important features for lung cancer
prediction, the impact of dataset imbalance on the
evaluation of feature importance must also be
considered.
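To illustrate why imbalance matters for evaluation,
the following sketch uses the class counts reported in
this paper (309 samples, 39 of them non-cancerous) and
assumes the remaining samples are all labelled
cancerous. It computes the scores of a trivial model
that always predicts the majority class.

```python
# Class counts reported in this paper; the binary split is an assumption.
total_samples = 309
non_cancer = 39
cancer = total_samples - non_cancer

# Accuracy of a trivial model that always predicts "cancer".
majority_baseline = cancer / total_samples

# Balanced accuracy averages per-class recall: 1.0 on cancer, 0.0 on non-cancer.
balanced_accuracy = (cancer / cancer + 0 / non_cancer) / 2

print(f"majority-class accuracy: {majority_baseline:.3f}")
print(f"balanced accuracy:       {balanced_accuracy:.3f}")
```

Under these assumptions a trivial model already
reaches an accuracy of roughly 0.87, so accuracy
changes after removing a feature are best read
against that baseline, alongside a metric that
accounts for both classes.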
Additionally, features with lower importance
scores, such as Fatigue and Wheezing, might still
have clinical significance in specific patient groups or
disease stages. Therefore, future research should
address dataset imbalance through techniques such as
oversampling and undersampling. These methods can
provide a more balanced view of feature importance
and improve the model's overall performance.
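A minimal sketch of random oversampling and random
undersampling follows. The class counts match those
reported in this paper, assuming a binary label; the
feature values are placeholders and the function
names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Placeholder data (assumption): 270 cancer rows (1) and 39 non-cancer rows (0).
y = [1] * 270 + [0] * 39
X = [[i] for i in range(len(y))]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print("oversampled: ", dict(Counter(y_over)))
print("undersampled:", dict(Counter(y_under)))
```

Resampling is normally applied to the training split
only, so that the test set keeps the original class
distribution.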
4 CONCLUSIONS
This study evaluated the importance of features for
lung cancer prediction in an MLP model, using feature
rankings from a Random Forest model. Features such as
Age, Allergy,
and Swallowing Difficulty are important for the
diagnostic process. However, the significant
imbalance in this dataset, with only 39 non-cancerous
samples, may impact the accuracy of feature
importance evaluations. Future research should focus
on addressing dataset imbalance with advanced sampling
techniques and validating findings with larger, more
balanced datasets. Overcoming these challenges will
enhance the accuracy of predictive models, improve
early detection strategies for lung cancer, and benefit
patient outcomes.