optimize depth, width, and resolution, with an
input resolution of 260x260 pixels (Khan et al.,
2022). Compared to smaller EfficientNet models, it
offers higher accuracy (~79.8% Top-1 on ImageNet)
while remaining computationally efficient (~9.2M
parameters). It is well suited to tasks requiring both
accuracy and speed, such as medical imaging and
real-time applications.
1.3 Architecture
The architecture of EfficientNet-B2 is built on the
principles of Mobile Inverted Bottleneck
Convolutions (MBConv) and Squeeze-and-
Excitation (SE) blocks, optimized for efficiency and
high performance. MBConv layers use a depthwise
separable convolution combined with an expansion
phase, reducing computational cost while capturing
rich feature representations.
Figure 1: EfficientNet Architecture
The SE blocks enhance the model by recalibrating
channel-wise feature responses, emphasizing critical
features and suppressing less important ones.
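The MBConv and SE mechanics described above can be sketched in a minimal block; this is a simplified illustration assuming PyTorch, with layer sizes chosen for clarity rather than matching EfficientNet-B2's exact configuration.

```python
# Minimal sketch of an MBConv block with Squeeze-and-Excitation.
# Channel counts and expansion ratio are illustrative.
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Recalibrate channels: global pool -> bottleneck MLP -> sigmoid gate."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)
        self.fc2 = nn.Conv2d(channels // reduction, channels, 1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)              # squeeze: (N, C, 1, 1)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        return x * s                                      # excite: reweight channels

class MBConv(nn.Module):
    """Expand (1x1) -> depthwise 3x3 -> SE -> project (1x1), with residual."""
    def __init__(self, channels: int, expand: int = 6):
        super().__init__()
        mid = channels * expand
        self.expand = nn.Sequential(nn.Conv2d(channels, mid, 1, bias=False),
                                    nn.BatchNorm2d(mid), nn.SiLU())
        # groups=mid makes the 3x3 convolution depthwise (one filter per channel).
        self.depthwise = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU())
        self.se = SqueezeExcite(mid)
        self.project = nn.Sequential(nn.Conv2d(mid, channels, 1, bias=False),
                                     nn.BatchNorm2d(channels))

    def forward(self, x):
        return x + self.project(self.se(self.depthwise(self.expand(x))))

block = MBConv(channels=16)
out = block(torch.randn(1, 16, 64, 64))
print(out.shape)  # shape is preserved, so the residual connection applies
```

The depthwise-separable structure (depthwise 3x3 plus pointwise 1x1 convolutions) is what keeps the computational cost low relative to a dense 3x3 convolution over all expanded channels.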
EfficientNet-B2 employs compound scaling, a unique
method to balance depth (number of layers), width
(number of channels per layer), and input resolution.
For B2, the input image size is 260x260 pixels,
providing higher resolution for detailed feature
extraction compared to smaller models like
EfficientNet-B0 and B1. The scaling ensures a
systematic increase in model capacity without
redundant computations, achieving efficiency. The
model contains stacked MBConv layers grouped into
stages, each designed for different spatial resolutions,
followed by a global average pooling layer and a
fully connected classifier. With approximately 9.2
million parameters and 1.0 billion FLOPs,
EfficientNet-B2 is significantly lighter than
traditional models like ResNet but delivers
competitive accuracy (~79.8% Top-1 on ImageNet).
This architecture makes it highly suitable for image
classification tasks in resource-constrained
environments or real-time applications requiring both
speed and accuracy.
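The compound-scaling step described above can be sketched numerically. The helper names and the baseline stage configuration below are illustrative, and the rounding conventions (channels snapped to multiples of 8, repeats rounded up) are an assumption based on common EfficientNet implementations.

```python
# Sketch of compound scaling from the B0 baseline to B1/B2.
# Coefficients (width, depth, resolution) follow the EfficientNet family;
# the baseline stage (112 channels, 3 repeats) is illustrative.
import math

def round_filters(filters: int, width: float, divisor: int = 8) -> int:
    """Scale a channel count by the width coefficient, snapping to a multiple of 8."""
    f = filters * width
    new_f = max(divisor, int(f + divisor / 2) // divisor * divisor)
    if new_f < 0.9 * f:  # never round down by more than 10%
        new_f += divisor
    return new_f

def round_repeats(repeats: int, depth: float) -> int:
    """Scale the number of layer repeats by the depth coefficient, rounding up."""
    return int(math.ceil(depth * repeats))

# (width_coefficient, depth_coefficient, input_resolution) per variant
variants = {"B0": (1.0, 1.0, 224), "B1": (1.0, 1.1, 240), "B2": (1.1, 1.2, 260)}

for name, (w, d, res) in variants.items():
    print(name, round_filters(112, w), round_repeats(3, d), res)
```

Scaling all three dimensions together in this fixed ratio is what lets B2 reach its capacity (~9.2M parameters at 260x260 input) without the redundant computation of scaling any single dimension alone.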
1.4 Retinopathy
Retinopathy, a term covering a range of retinal
diseases, is a serious medical condition with far-
reaching consequences for visual health. It is
primarily associated with systemic disorders such as
diabetes, hypertension, and other vascular diseases
(Kumaresan and Palanisamy, 2022). It manifests as
damage to the fragile blood vessels of the retina,
which may lead to visual impairment or even
blindness if left untreated. The retina, the light-
sensitive tissue at the back of the eye, converts
light into the neural signals the brain interprets as
vision, so any disturbance to its intricate vascular
network can have significant repercussions. As the
prevalence of retinopathy-related disorders grows
worldwide, there is an increasing need for better
diagnostic techniques and strategies to detect and
manage the condition early, avoiding irreparable
damage and preserving visual acuity. This context
motivates the discussion of challenges and advances
in retinopathy identification, highlighting the
importance of early detection in reducing its impact
on eye health.
1.5 Transfer Learning
Transfer learning, a powerful paradigm in machine
learning, has emerged as a transformative way to
improve the efficiency and performance of many
tasks. At its core, transfer learning takes
knowledge gained from solving one problem and
applies it to a new but related task (Yadav et al.,
2022). Unlike standard machine learning models
trained from scratch, transfer learning allows
models pre-trained on large and varied datasets to
be adapted to new tasks, even when labeled data is
limited. This approach is especially useful in
domains where obtaining large labeled datasets is
difficult or costly. In recent years, transfer learning
has achieved great success in a wide range of
applications, including image and audio recognition
and natural language processing. Its adaptability and
efficacy derive from its capacity to transfer learned
features, representations, or knowledge from one
domain to another, which speeds up the learning
process and considerably improves model