Enhancing Retina Image Classification with a Hybrid ResNet-50 and
Random Forest Model: A Comparative Study
Ankita Suryavanshi¹, Vinay Kukreja¹ and Rajat Saini²
¹ Centre for Research Impact & Outcome, Chitkara University Institute of Engineering and Technology,
Chitkara University, Rajpura, 140401, Punjab, India
² Chitkara Centre for Research and Development, Chitkara University, Himachal Pradesh, 174103, India
Keywords: Deep Learning and Random Forest, ResNet-50, Automated Medical Diagnostics, Ophthalmology, Health
Informatics, Image Recognition.
Abstract: Conditions of organs such as the retina are diagnosed and managed using images, so accurate image
classification is critical. This study presents a hybrid architecture that combines a ResNet-50 deep CNN
with a Random Forest classifier to improve the identification of Moroccan retinal images. The proposed
model leverages the strengths of both approaches: high-level image features extracted by ResNet-50 and
the robust classification of Random Forest. The hybrid system is evaluated on a large dataset and shows
improved performance compared to previous approaches. The model achieved an accuracy of 94.3%,
precision of 94.0%, recall of 94.8%, and an F1-score of 94.4%. Furthermore, accuracy rose and loss fell
steadily over the epochs on both the training and validation sets, with a final validation accuracy of
95.0%. These results show that, in retinal image classification, deep learning models combined with
ensemble methods can yield strong performance. The advantages of the hybrid model include higher
diagnostic accuracy and a possible increase in the efficiency of systems for the automated detection of
pathologies in ophthalmology. Future work could refine the model architecture and extend its deployment
to other medical imaging tasks.
1 INTRODUCTION
Image classification has become an essential topic
in medical imaging, particularly in the analysis of
retinal disorders such as Diabetic Retinopathy,
Age-related Macular Degeneration, Glaucoma, and
others (D. B, Kaur, et al. 2023). Accurate
differentiation and diagnosis of these disorders is
critical for preventing blindness and improving
treatment outcomes. Because retinal images are
inherently complex and diverse, traditional image
analysis techniques have generally struggled to
handle them. Conventional diagnostic approaches
(D. B, Kaur, et al. 2023) have therefore had to be
replaced with more sophisticated computational
methods to improve diagnostic precision. Owing to
their ability to learn detailed features directly from
raw image data, deep learning models, especially
CNNs and their variants, have made considerable
strides in recent years in automating the
classification of retinal images to support diagnosis
of these diseases (Kaur, Kukreja, et al. 2024). Much
of their success has been attributed to this capacity
to learn. Architectures such as ResNet-50 have
become popular because their deep residual learning
framework addresses inherent problems such as
vanishing gradients and facilitates the training of
very deep networks. One of its major advantages is
that it is effective and stable across diverse image
classification problems (Suryavanshi, Kukreja, et al.
2023). This matters even more given the variation
and noise characteristic of retinal images.
Nevertheless, even with efficient deep learning
models such as ResNet-50, improving classification
accuracy further remains a challenge.
Suryavanshi, A., Kukreja, V. and Saini, R.
Enhancing Retina Image Classification with a Hybrid ResNet-50 and Random Forest Model: A Comparative Study.
DOI: 10.5220/0013609100004664
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 3, pages 75-80
ISBN: 978-989-758-763-4
Proceedings Copyright © 2025 by SCITEPRESS – Science and Technology Publications, Lda.
Random Forest, which belongs to the ensemble
learning family, involves training many decision
trees to enhance the classification accuracy and
generalization. Aggregating many trees in this way
specifically mitigates the overfitting to which
individual decision trees are prone (Suryavanshi,
Kukreja, et al. 2023). As a result, the Random
Forest, which is valued for its stability and its ability
to handle high-dimensional data, can still draw on
the varied strengths of its individual trees. This work
combines the two techniques: ResNet-50 is used for
feature extraction, while Random Forest is used for
classification (Suryavanshi, Kukreja, et al. 2023).
The purpose of this work is to assess and improve the
identification of retinal images using the proposed
architecture based on ResNet-50 and the Random
Forest classifier. The underlying hypothesis is that
combining the two will categorize retinal images
more accurately and efficiently than either model
used alone. The study analyzes and compares the
performance of the hybrid model across different
retinal disorders and against both older and newer
methodologies (Suryavanshi, Kukreja, et al. 2024).
The aim of this integration is to contribute to better
and more effective automated ophthalmological tools
for retinal disease diagnosis and treatment planning.
This paper first describes the current state of the
methods used in retinal image classification, with a
focus on their strengths and weaknesses. It then
discusses how ResNet-50 is incorporated as the
feature extractor and Random Forest as the classifier
(Suryavanshi, Kukreja, et al. 2024), and finally
evaluates the effectiveness of the proposed hybrid
model on a diverse retinal image database. The
findings are expected to shed light on the use of deep
learning in conjunction with ensemble approaches
for retinal image classification, which may establish
a new reference for high-accuracy systems.
2 SURVEY OF LITERATURE
Advances in medical imaging in particular, and in
computational models in general, have changed the
way retinal images are classified. The retina is the
part of the human eye affected by pathologies (Ran,
Tham, et al. 2021) that can cause critical vision
problems if not detected early. Fundus photography,
a non-contact examination procedure, is the main
technique used to diagnose conditions including
diabetic retinopathy, macular degeneration, and
glaucoma. Retinal images vary in intensity, contrast,
and even in the structure of the pathology, which
raises the demands on the quality of image
classification methods. Earlier work on retinal image
analysis relied largely on manual feature extraction
and standard machine learning techniques (Araneta,
Asenjo, et al. 2021). Disease patterns were searched
for using histogram analysis, texture-based features,
and morphological operations. Although these
approaches provided a solid starting point, they have
several drawbacks that stem from hand-crafted
feature extraction, which cannot fully capture the
complexity of retinal images. This created a need for
more capable methods. The arrival of deep learning,
and of CNNs in particular, ushered in a new era in
image classification. CNNs, which learn hierarchical
features directly from raw data (Shamia, Prince, et al.
2022), have improved image classification accuracy.
An influential work in this area is the ResNet
architecture proposed by He et al. Very deep
networks had been difficult to train, and this
architecture successfully mitigated the problem.
ResNet, short for Residual Network, is a CNN that
uses residual learning to enable the training of very
deep networks, which would otherwise be highly
challenging because of vanishing gradients.
ResNet-50, the 50-layer version, has shown strong
performance on several image classification datasets
(Badah, Algefes, et al. 2022), and the same residual
framework enables the construction of networks with
several hundred layers. However, attaining optimal
performance on highly variable datasets remains a
challenge, even with CNNs such as ResNet-50. In
efforts to increase the reliability of image
classification systems, ensemble learning methods
have been investigated as a complementary
approach. Random Forest is a well-known ensemble
learning model; introduced by Breiman in 2001, it
constructs a multitude of decision trees for
classification and produces its result by aggregating
their predictions.
Random Forest algorithms (Badah, Algefes, et al.
2022) are useful when it comes to high
dimensionality, and also greatly reduce overfitting
compared to single-model systems, making them a
capable alternative to other techniques. Many papers
describe the integration of CNNs with ensemble
learning systems to improve classification
performance. For instance, Xie et al. (2018) showed
how integrating CNNs with Random Forests could
improve medical image classification by drawing on
the merits of both methods. The CNN component
extracted deep features from the images, while the
Random Forest combined those features to make the
final decision (ARSLAN, Erdaş, et al. 2023). This
approach proved more accurate and stable than using
CNNs alone. In retinal image classification, studies
have also examined different CNN architectures and
their efficiency. For instance, Rajalakshmi et al.
(2018) and Ting et al. (2019) addressed deep learning
models for the detection of diabetic retinopathy and
other retinal diseases. Their findings suggest that
CNNs can deliver higher diagnostic accuracy and
reliability than conventional approaches (Garg,
Gupta, et al. 2023). However, the combination of
CNNs with ensemble methods such as Random
Forests remains comparatively underexplored in the
classification of retinal images. This gap in the
literature motivates the present research, which
assesses the performance of a hybrid ResNet-50 and
Random Forest model for retinal image
classification. The study aims to implement (Lin,
Liu, et al. 2022) an optimized retinal pathology
detection system by integrating the feature extraction
strength of ResNet-50 and the classification strength
of the Random Forest algorithm. The literature
indicates that such an integrated framework could
yield substantial gains in classification accuracy, and
hence advance automated retinal disease diagnosis
and care.
3 METHODOLOGY SECTION
The methodology comprises four stages: data
gathering and formatting, convolutional feature
mapping, data enrichment and extension, and
Random Forest classification. Each stage was
conducted thoroughly; as a result, the developed tool
for the evaluation and classification of retinal images
was highly accurate and dependable.
Figure 1: Steps of Methodology
3.1 Data Gathering and Formatting
The first stage is Data Gathering and Formatting, in
which sample retinal images and their metadata are
obtained and transformed into a format suitable for
analysis. This phase aims to obtain clear retinal
images, covering various diseases that affect the
retina, from clinical databases or online sources
(Sandika, Avil, et al. 2016). The collected images are
rescaled to meet the specifications of the subsequent
data processing steps; at the most basic level, this
concerns the size and file type of the images used in
the study. Additional information, such as the
diagnostic label and the patient's history, is acquired
along with each image. Collecting and aligning the
data in this way is fundamental to the analysis and to
training the model.
3.2 Convolutional Feature Mapping
In the Convolutional Feature Mapping phase, rather
than applying a traditional classifier such as an SVM
or Random Forest directly to the images, a deep
model, ResNet-50, is used to extract appropriate
features from the collected retinal images. At this
stage, the CNN architecture is configured and trained
to learn representations, or feature maps, of the
image data. ResNet-50 is selected because its
residual learning allows the model to extract complex
features from the images. The CNN passes each
image through multiple convolutional layers and
residual blocks, producing feature maps that capture
the patterns and characteristics needed for
classification.
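The role of the residual blocks can be stated compactly. The following is the standard formulation from the ResNet literature rather than anything specific to this study; the symbols $x$, $y$, $\mathcal{F}$ and $W_i$ are notation introduced here for illustration, and the identity shortcut shown assumes the input and output of the block have the same dimensions.

```latex
% Residual block: the stacked layers learn a residual F, and the input x is added back.
y = \mathcal{F}(x, \{W_i\}) + x

% Differentiating with respect to the input exposes an identity term:
\frac{\partial y}{\partial x} = \frac{\partial \mathcal{F}(x, \{W_i\})}{\partial x} + I
```

Because of the identity term $I$, the gradient reaching earlier layers does not depend solely on a long product of layer Jacobians, which is why residual connections ease the vanishing gradient problem in very deep networks.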
3.3 Data Enrichment and Extension
The Data Enrichment and Extension phase increases
the richness of the available data so that the model
does not overfit to the particular dataset acquired
during the Data
Collection phase. Standard techniques such as data
augmentation are used to enlarge the dataset by
generating modified copies of the images through
operations such as rotation, enlargement, and
mirroring. This not only adds more samples to the
dataset but also adds variability to the images, which
helps the model learn different conditions. Data
enrichment can also incorporate additional attributes
or external databases to give a better understanding
of retinal conditions. The goal of this phase is to
enlarge the dataset and thereby improve the model’s
capacity to generalize to other conditions.
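The three augmentation operations named above (rotating, enlarging, and mirroring) can be illustrated with a minimal sketch. It uses plain Python lists as a stand-in for image arrays; the function names, the fixed 90-degree rotation, and the nearest-neighbour enlargement are assumptions made for illustration, not the settings used in the study.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def mirror(img: Image) -> Image:
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def enlarge(img: Image, factor: int = 2) -> Image:
    """Enlarge by an integer factor using nearest-neighbour replication."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def augment(img: Image) -> List[Image]:
    """Return the original image plus one variant per operation."""
    return [img, mirror(img), rotate90(img), enlarge(img)]


# Hypothetical 2x2 "image" used only to show the effect of each operation.
sample = [[1, 2], [3, 4]]
variants = augment(sample)
```

Each input image yields several training samples, which is how augmentation increases both the size and the variability of the dataset without collecting new images.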
3.4 Random Forest Classification
The Random Forest Classification phase feeds the
features learned by the CNN model to a Random
Forest classifier to improve classification
performance. In this stage, features are extracted by
the CNN and passed to the Random Forest, in which
many decision trees each cast a decision that
contributes to the final classification. To avoid
overfitting, the trees' decisions are aggregated so that
the final classification reflects their collective
conclusion. This phase comprises training the
Random Forest model, optimizing its
hyperparameters, and assessing the Random Forest's
contribution to the hybrid classification system. The
aim is to achieve the best results by combining the
advantages of the CNN and the Random Forest.
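The way individual trees contribute to one final decision can be sketched as a simple majority vote. This is an illustrative sketch only: the toy threshold rules stand in for trained decision trees, and the feature values and the name `forest_predict` are hypothetical. Some library implementations average per-tree class probabilities instead of counting hard votes.

```python
from collections import Counter
from typing import Callable, List, Sequence

Features = Sequence[float]
Tree = Callable[[Features], str]


def forest_predict(trees: List[Tree], features: Features) -> str:
    """Return the class predicted by the largest number of trees."""
    votes = Counter(tree(features) for tree in trees)
    # most_common(1) returns the top class; ties go to the class seen first.
    return votes.most_common(1)[0][0]


# Toy stand-ins for trained trees: each applies a different threshold rule
# to hypothetical CNN feature values.
trees: List[Tree] = [
    lambda f: "Glaucoma" if f[0] > 0.5 else "Diabetic Retinopathy",
    lambda f: "Glaucoma" if f[1] > 0.3 else "Macular Degeneration",
    lambda f: "Glaucoma" if f[0] + f[1] > 1.0 else "Diabetic Retinopathy",
]

prediction = forest_predict(trees, [0.7, 0.4])
```

Because each tree sees the features through a different rule, a single tree's mistake is outvoted when the others agree, which is the mechanism behind the ensemble's resistance to overfitting.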
4 EXPERIMENTAL RESULTS
The confusion matrix shown in Figure 2 provides a
detailed view of the model's performance in
classifying three eye conditions: Diabetic
Retinopathy, which affects the retina; Macular
Degeneration, which affects the macula of the retina;
and Glaucoma, which affects the optic nerve. Of the
23,600 cases in total, 8,500 cases of Diabetic
Retinopathy were classified correctly, while 300
were misclassified as Macular Degeneration and 200
as Glaucoma. Among the Macular Degeneration
cases, the model correctly predicted 7,000, while it
wrongly predicted 500 as Diabetic Retinopathy and
400 as Glaucoma. For Glaucoma, the model correctly
classified 6,000 cases, while 400 were misclassified
as Diabetic Retinopathy and 300 as Macular
Degeneration. Overall, the matrix shows the
strengths of the proposed model as well as the
remaining difficulty of differentiating between these
disorders.
Figure 2: Confusion Matrix
Figure 3: Performance Matrix for Hybrid Model
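The counts quoted for the confusion matrix can be turned into per-class metrics with a few lines of arithmetic. The sketch below assumes that rows are actual classes and columns are predicted classes, which is how the text describes the counts. Figures recomputed this way need not coincide with the headline metrics reported for the model, since the text does not state the averaging scheme or the evaluation split behind those figures.

```python
labels = ["Diabetic Retinopathy", "Macular Degeneration", "Glaucoma"]

# Rows = actual class, columns = predicted class (counts as quoted in the text).
matrix = [
    [8500, 300, 200],
    [500, 7000, 400],
    [400, 300, 6000],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

per_class = {}
for i, name in enumerate(labels):
    true_positive = matrix[i][i]
    actual = sum(matrix[i])                    # all cases of this class
    predicted = sum(row[i] for row in matrix)  # all cases assigned to this class
    precision = true_positive / predicted
    recall = true_positive / actual
    f1 = 2 * precision * recall / (precision + recall)
    per_class[name] = {"precision": precision, "recall": recall, "f1": f1}

for name, m in per_class.items():
    print(f"{name}: precision={m['precision']:.3f} "
          f"recall={m['recall']:.3f} f1={m['f1']:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

Reading precision down a column and recall across a row makes it easy to see which pairs of disorders the model confuses most often.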
An inspection of the performance indices of the
proposed hybrid model, ResNet-50 combined with
the Random Forest classifier, establishes its ability to
accurately assign retinal images to their categories.
The hybrid model obtained an accuracy of 94.3%,
which reflects the approach's robust capacity to
classify images more accurately than other
approaches. Precision, which measures the extent to
which the cases the model identifies as positive are
truly positive, was 94.0%; that is, the large majority
of the instances the model flagged were indeed
positive. Recall, which measures the share of actual
positive cases identified by the model, was 94.8%,
indicating that the model is highly sensitive. The
F1-score, which integrates the
values of both precision and recall, was 94.4%,
which also indicates a balanced performance of the
model. Further, the AUC-ROC of the model was
0.98, which shows the high ability of the proposed
algorithm to separate the different classes at various
thresholds. All of these metrics indicate that the
hybrid approach yields a substantial improvement in
classification performance over traditional methods
and individual models.
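As a consistency check, the F1-score can be recomputed from the reported precision (94.0%) and recall (94.8%) using the standard harmonic-mean definition. The symbols $P$, $R$ and $F_1$ are notation introduced here for illustration.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.940 \times 0.948}{0.940 + 0.948}
    = \frac{1.78224}{1.888}
    \approx 0.944
```

This agrees with the reported F1-score of 94.4%.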
Figure 4, which plots accuracy and loss on the
training and validation sets over the epochs, shows
how the hybrid model behaves during training and
how its accuracy and loss change. At Epoch 1, the
training loss was 0.60 and the validation loss was
0.65, with a training accuracy of 84.5% and a
validation accuracy of 81.0%. These early values
suggest that the model is still at an early stage of
learning and has room for improvement. By Epoch 5,
the training loss had fallen to 0.32 and the validation
loss to 0.37, showing that the model had made
substantial progress in reducing its errors on both the
training and validation sets.
Figure 4: Training and Validation Performance Over
Epochs
At Epoch 5, training accuracy had also risen to
91.0% and validation accuracy to 88.0%. These
figures show that the network improves its
performance together with its ability to generalize.
At Epoch 10, the training loss had dropped to 0.18
and the validation loss to 0.22, indicating that the
model is fitting well and making fewer mistakes
overall. Training accuracy increased to 94.7% and
validation accuracy to 92.0%, indicating improved
performance and a closer fit to the validation data.
At Epoch 15, the training loss was 0.11 and the
validation loss 0.15, with a training accuracy of
95.8% and a validation accuracy of 94.0%. These
values indicate that the model is close to
convergence: the error rate is nearly optimal and
accuracy on both the training and validation sets is
almost as high as it can be. The final results are better
still: at Epoch 20 the training loss reached 0.04,
while the validation loss settled at 0.12; the training
accuracy was 96.5% and the validation accuracy
95.0%. This final performance indicates that the
model is highly accurate and generalizes well to new
data. Overall, the training and validation metrics
show a gradual increase in accuracy and a decrease
in loss as the number of epochs grows, so the hybrid
model is well optimized and can be used for further
classification.
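The trend described above can be checked mechanically. The sketch below tabulates the per-epoch figures as they can best be read from the text of this section (an assumption, since the values are transcribed from prose rather than taken from the training logs) and verifies that validation loss falls, validation accuracy rises, and the gap between training and validation accuracy narrows.

```python
# (epoch, train_loss, val_loss, train_acc_percent, val_acc_percent),
# transcribed from the prose of this section.
history = [
    (1, 0.60, 0.65, 84.5, 81.0),
    (5, 0.32, 0.37, 91.0, 88.0),
    (10, 0.18, 0.22, 94.7, 92.0),
    (15, 0.11, 0.15, 95.8, 94.0),
    (20, 0.04, 0.12, 96.5, 95.0),
]


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


val_losses = [row[2] for row in history]
val_accs = [row[4] for row in history]

loss_falls = strictly_decreasing(val_losses)
acc_rises = strictly_increasing(val_accs)

# Generalization gap: training accuracy minus validation accuracy, per epoch.
gaps = {
    epoch: round(train_acc - val_acc, 1)
    for epoch, _, _, train_acc, val_acc in history
}
gap_narrows = strictly_decreasing(list(gaps.values()))
```

A narrowing gap between training and validation accuracy is one simple signal that the model is not overfitting as training proceeds.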
5 CONCLUSION
This research proposes and implements a model
based on ResNet-50 and a Random Forest classifier
for the classification of retinal images, which
improves significantly on traditional methods in
terms of accuracy and stability. Combining deep
learning through ResNet-50 with the Random Forest
ensemble has proved beneficial and efficient, as seen
in the high values of the performance metrics. The
hybrid model obtained an accuracy of 94.3%, a
precision of 94.0%, a recall of 94.8%, and an
F1-score of 94.4%. It has also shown strong potential
for correctly identifying numerous retinal disorders
and separating them into different groups. The
training and validation results show a steady increase
in the model’s accuracy and a steady decrease in loss
across the epochs, which is characteristic of effective
learning and generalization. In the last epoch, Epoch
20 in Figure 4, the model reached a validation
accuracy of 95.0%. The validation loss likewise
reached its minimum of 0.12, which demonstrates the
model's efficiency and its ability to work with real
data with high accuracy. These findings indicate that
the hybrid model brings higher accuracy and
reliability than other classification methods for
retinal image assessment. The combination of
Convolutional Neural Networks and Random Forests
not only enhances the discriminative accuracy of
image classification but also benefits diagnostic
systems for eye disorders. Possible future work
includes further improvements to the model structure
and training algorithm, as well as extending the
combined approach to other areas of medical
imaging.
REFERENCES
D. B. and S. V. A. Kaur, V. Kukreja, “Innovative Hybrid
Framework for Precise Wheat Disease Identification:
Stacked CNN Meets SVM,” in World Conference on
Communication & Computing (WCONF), 2023, pp. 1–
5.
D. B. and D. B. A. Kaur, V. Kukreja, “Revolutionizing Rice
Disease Diagnosis: A Fusion of Convolutional Neural
Networks and Support Vector Machines,” in World
Conference on Communication & Computing
(WCONF), 2023, pp. 1–5.
A. Kaur, V. Kukreja, M. Kumar, A. Choudhary, and R.
Sharma, “A Fine-tuned Deep Learning-based VGG16
Model for Cotton Leaf Disease Classification,” in IEEE
9th International Conference for Convergence in
Technology (I2CT), 2024, pp. 1–6.
A. Suryavanshi, V. Kukreja, P. Srivastava, A.
Bhattacherjee, and R. S. Rawat, “Felis catus disease
detection in the digital era: Combining CNN and
Random Forest,” in 2023 International Conference on
Artificial Intelligence for Innovations in Healthcare
Industries (ICAIIHI), 2023, pp. 1–7.
A. Suryavanshi, V. Kukreja, A. Dogra, and J. Joshi,
“Advanced ABS Disease Recognition in Lemon-A
Multi-Level Approach Using CNN and Random Forest
Ensemble,” in 2023 3rd International Conference on
Technological Advancements in Computational
Sciences (ICTACS), 2023, pp. 1108–1113.
A. Suryavanshi, V. Kukreja, and A. Dogra, “Optimizing
Convolutional Neural Networks and Support Vector
Machines for Spinach Disease Detection: A
Hyperparameter Tuning Study,” in 2023 4th IEEE
Global Conference for Advancement in Technology
(GCAT), 2023, pp. 1–6.
A. Suryavanshi, V. Kukreja, S. Chamoli, S. Mehta, and A.
Garg, “Synergistic Solutions: Federated Learning
Meets CNNs in Soybean Disease Classification,” in
2024 Fourth International Conference on Advances in
Electrical, Computing, Communication and
Sustainable Technologies (ICAECT), 2024, pp. 1–6.
A. Suryavanshi, V. Kukreja, A. Dogra, P. Aggarwal, and
M. Manwal, “Feathered Precision: AvianVision-A
Hybrid CNN-Random Forest Approach for Accurate
Classification of Sparrow Species,” in 2024 11th
International Conference on Signal Processing and
Integrated Networks (SPIN), 2024, pp. 215–220.
A. R. Ran, C. C. Tham, P. P. Chan, C. Y. Cheng, Y. C.
Tham, T. H. Rim, and C. Y. Cheung, “Deep learning in
glaucoma with optical coherence tomography: a
review,” Eye, vol. 35(1), pp. 188–201, 2021.
M. A. M. Araneta, D. V. Asenjo, C. J. L. Lamprea, A. M.
L. Reyes, O. A. Medina, A. L. F. Sigue, ..., and M. G.
P. Beaño, “Controlled Environment for Spinach
Cultured Plant with Health Analysis using Machine
Learning,” in 2021 IEEE 13th International
Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control,
Environment, and Management (HNICEM), 2021, pp.
1–6.
D. Shamia, S. Prince, D. Bini, and B. Yoon, “An Online
Platform for Early Eye Disease Detection using Deep
Convolutional Neural Networks,” in 6th International
Conference on Devices, Circuits, and Systems (ICDCS)
IEEE, 2022, pp. 388–392.
N. Badah, A. Algefes, A. AlArjani, and R. Mokni,
“Automatic eye disease detection using machine
learning and deep learning models,” in Pervasive
Computing and Social Networking: Proceedings of
ICPCSN 2022, Singapore: Springer Nature Singapore,
2022, pp. 773–787.
G. ARSLAN and Ç. B. Erdaş, “Detection Of Cataract,
Diabetic Retinopathy and Glaucoma Eye Diseases with
Deep Learning Approach,” Intell. Methods Eng. Sci.,
vol. 2(2), pp. 42–47, 2023.
N. Garg, R. Gupta, M. Kaur, S. Ahmed, and H. Shankar,
“Efficient Detection and Classification of Orange
Diseases using Hybrid CNN-SVM Model,” in 2023
International Conference on Disruptive Technologies
(ICDT), IEEE, 2023, pp. 721–726.
M. Lin, B. Hou, L. Liu, M. Gordon, M. Kass, F. Wang,
..., and Y. Peng, “Automated diagnosing primary open-
angle glaucoma from fundus image by simulating
humans grading with deep learning,” Sci. Rep., vol.
12(1), p. 14080, 2022.
B. Sandika, S. Avil, S. Sanat, and P. Srinivasu, “Random
forest based classification of diseases in grapes from
images captured in uncontrolled environments,” in
2016 IEEE 13th International Conference on Signal
Processing (ICSP), IEEE, 2016, pp. 1775–1780.