4 SUGGESTION
Although the results showed perfect accuracy, several
improvements could still enhance the performance of the
machine learning models.
From the perspective of data, it is crucial to use a wide
variety of disease data that includes the latest human
statistics. Bacteria and viruses change over time because
they adapt quickly and proliferate easily, so a single
variation in the local environment may have detrimental
effects, such as plague (Naji et al., 2021).
Secondly, the data must be reliable and must not
introduce bias. Since the original data is based on a
survey, respondents may have obscured their real
situation or deceived the researchers. To make a
reasonable and objective prediction, it is important to
consider aspects such as ethnicity and socioeconomic
status to avoid possible discrimination or superficial
information (Hussain et al., 2024).
Moreover, the parameters of the supervised learning
models should be tuned more carefully. For instance,
KNN computes the distance between a new data point
and the training points, then uses the k nearest
neighbours to classify the point or predict a regression
value. However, the experiment did not examine the
value of K. A large K reduces overfitting but may
smooth over significant patterns, whereas a small K
produces a more fragmented decision boundary and
tends to overfit the data. Also, the choice among
machine learning models should be based not only on
scientific criteria but also on clinical considerations. By
cooperating with medical experts, models can be used
most efficiently to predict the relevance of diseases, so
that professionals can provide suitable advice and
prescriptions.
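The effect of K described above can be checked empirically by comparing several candidate values with cross-validation. The following is a minimal sketch, not the procedure used in the experiment: it assumes a synthetic two-class dataset in place of the survey data, uses leave-one-out validation, and all names (`knn_predict`, `loo_accuracy`, `make_toy_data`, `select_k`) and the candidate values of K are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to x with its label.
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN with the given k on (X, y)."""
    correct = 0
    for i in range(len(X)):
        # Hold out sample i and predict it from all remaining samples.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def make_toy_data(n_per_class=40, seed=0):
    """Synthetic stand-in for the survey data: two overlapping 2-D classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


def select_k(X, y, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


X, y = make_toy_data()
# Odd values of k avoid tied votes in a two-class problem.
best_k, scores = select_k(X, y, candidates=[1, 3, 5, 7, 9, 15, 25])
for k, acc in scores.items():
    print(f"k={k:2d}  leave-one-out accuracy={acc:.3f}")
print("selected k:", best_k)
```

On real data, the same comparison would be run on the training split only, so that the held-out test set is not used to choose K.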
5 CONCLUSION
To summarize, the experiment predicted the relationship
between illnesses and cancer using three kinds of
machine learning models. Among these models,
random forest performed the best, illustrating that
cancer and the other symptoms were closely related,
while KNN may not be very appropriate since it
scored the worst on several evaluation metrics.
During the research, it was found that allergy,
wheezing, alcohol consumption and other illnesses or
symptoms are strongly correlated with lung cancer,
which means that people with these habits or symptoms
are more likely to have lung cancer than people without
them. Moreover, there was no direct relationship between
gender and lung cancer, suggesting that lung cancer is
not related to biological differences between the sexes.
Overall, machine learning plays an important role in
the research of illnesses. With the development of
contemporary technology, humanity is getting closer to
the ideal life.
REFERENCES
Evidently AI, 2025. How to explain the ROC AUC score and
ROC curve?
https://www.evidentlyai.com/classification-metrics/explain-roc-curve#:~:text=ROC%20AUC%20score%20shows%20how,have%20an%20AUC%20of%200.5.
Hussain, S., Ali, M., Naseem, U., Nezhadmoghadam, F.,
Jatoi, M. A., Gulliver, T. A., & Tamez-Peña, J. G.,
2024. Breast cancer risk prediction using machine
learning: A systematic review. Frontiers in Oncology,
14.
Islam, M. M., Haque, M. R., Iqbal, H., Hasan, M. M.,
Hasan, M., & Kabir, M. N., 2020. Breast cancer
prediction: A comparative study using machine
learning techniques. SN Computer Science, 1(5).
Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis,
M. V., & Fotiadis, D. I., 2015. Machine learning
applications in cancer prognosis and prediction.
Computational and Structural Biotechnology Journal,
13.
Kumar, C. A., Harish, S., Ravi, P., Svn, M., Kumar, B. P.
P., Mohanavel, V., Alyami, N. M., Priya, S. S., &
Asfaw, A. K., 2022. Lung cancer prediction from text
datasets using machine learning. BioMed Research
International, 2022, 1–10.
Mokoatle, M., Marivate, V., Mapiye, D., Bornman, R., &
Hayes, V. M., 2023. A review and comparative study
of cancer detection using machine learning: SBERT
and SimCSE application. BMC Bioinformatics, 24(1).
Naji, M. A., El Filali, S., Aarika, K., Benlahmar, E. H.,
Abdelouhahid, R. A., & Debauche, O., 2021. Machine
learning algorithms for breast cancer prediction and
diagnosis. Procedia Computer Science, 191, 487–492.
Patil, R, S., D, T. P., S, J., & Ingale, S. P., 2024, March 24.
A comprehensive review on cancer prediction using
machine learning techniques.
Sandragracenelson, 2023, March 23. Lung cancer
prediction. Kaggle.
https://www.kaggle.com/code/sandragracenelson/lung-cancer-prediction
Zhang, S., Yang, L., Xu, W., Wang, Y., Han, L., Zhao, G.,
& Cai, T., 2024. Predicting the risk of lung cancer
using machine learning: A large study based on UK
Biobank. Medicine, 103(16), e37879.