
classification for each sport. This not only enables the
detection of human poses (Lugaresi, Tang et al. 2019)
but also categorizes them into specific sport-related
activities, offering valuable insights for analyzing per-
formance and improving posture. The primary goal of
this research is to create a system that can accurately
estimate human poses, classify them into predefined
categories, and provide real-time feedback for pos-
ture correction. By addressing the limitations of ex-
isting methods, this work introduces a scalable and
user-friendly solution that is both efficient and practi-
cal. With a strong focus on sports applications such as activity classification and posture monitoring (Lugaresi, Tang et al. 2019), this system holds great
potential to enhance performance analysis and help
prevent injuries, paving the way for smarter and more
effective training tools. The paper is organized as
follows: Section II provides a detailed background on
the pose estimation techniques used in this research,
including MediaPipe and the deep learning models used. Section III explains
the methodology, discussing the dataset and approach
used to develop the system. Section IV presents the
results and analysis of the model’s performance. Sec-
tion V concludes the study, summarizing the key find-
ings. Section VI outlines the future scope, suggesting
potential improvements and applications. Section VII
lists the references cited in this research. Finally, Sec-
tion VIII acknowledges the contributions and support
received during the study.
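As an illustration of the pose-detection step described above, the following Python sketch shows how per-frame body landmarks can be obtained with the MediaPipe Pose solution. It is a minimal example rather than the exact pipeline used in this work, and the input file name is a placeholder.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Read an image and run MediaPipe Pose on it (static-image mode)
image = cv2.imread("athlete.jpg")  # placeholder file name
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# Each detected landmark provides normalized (x, y, z) coordinates and a visibility score
if results.pose_landmarks:
    for idx, lm in enumerate(results.pose_landmarks.landmark):
        print(idx, lm.x, lm.y, lm.z, lm.visibility)

Such landmark coordinates can then serve as input features for activity classification and posture feedback.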
2 BACKGROUND STUDY
2.1 ResNet-200 Model
To delve deeper into the utility and effectiveness of
ResNet-200, it is important to first understand the
challenges associated with very deep networks. As
neural networks become deeper, the optimization pro-
cess becomes increasingly difficult due to issues like
vanishing gradients, where gradients during back-
propagation diminish as they propagate through the
network. This makes it harder for earlier layers to ad-
just their weights properly, leading to slow or ineffec-
tive training. Additionally, as networks grow deeper,
they are more prone to overfitting and have difficulty generalizing to unseen data. ResNet (Khosla, Teterwak,
et al. 2020) addresses these challenges by introducing
the concept of residual connections, which allows the
network to learn the residual mapping rather than the
direct mapping. Residual connections are essentially
shortcut paths that allow the input x of a layer to by-
pass certain transformations and be added directly to
the output. This mechanism ensures that the network
does not need to learn the identity function explicitly,
making it easier to train and enabling it to maintain
the flow of gradients, which is especially crucial in
very deep networks like ResNet-200.
The mathematical formula that describes the
residual block is:
y = F(x) + x (1)
Here, F(x) is the function representing the trans-
formation applied by the convolutional layers, and x
is the original input to the block. The addition of x
to F(x) allows the network to learn the residual (the
difference between the output and the input), rather
than trying to learn the full transformation. This ap-
proach makes it easier for the network to learn, as
it only needs to focus on learning small corrections
or residuals, rather than the entire function. When
considering the deeper ResNet-200 model, this design
principle allows the network to stack a large number
of layers, up to 200, without suffering from performance degradation (Wang, Jiang et al. 2017). The
residual connections allow the gradients to flow more
effectively during training, even through hundreds of
layers, because the skip connections ensure that the
gradients are propagated without becoming too small.
This is particularly important for training deep net-
works, where traditional architectures would struggle
to maintain gradient magnitudes across many layers.
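To make the residual-block idea in Eq. (1) concrete, the sketch below implements y = F(x) + x as a small PyTorch module. It is a simplified, illustrative block (two 3x3 convolutions with batch normalization), not the exact ResNet-200 block configuration, and the channel and input sizes are arbitrary.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Basic residual block: y = F(x) + x, as in Eq. (1)
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization, spatial size preserved
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                # shortcut path carries x unchanged
        out = self.relu(self.bn1(self.conv1(x)))    # first stage of F(x)
        out = self.bn2(self.conv2(out))             # second stage of F(x)
        out = out + identity                        # y = F(x) + x
        return self.relu(out)

# Stacking such blocks preserves the tensor shape, so many can be chained,
# which is how very deep variants such as ResNet-200 are assembled.
x = torch.randn(1, 64, 56, 56)
y = ResidualBlock(64)(x)
print(y.shape)  # torch.Size([1, 64, 56, 56])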
The effectiveness of the residual connections can
be further understood by looking at the backpropaga-
tion process. When computing gradients for a resid-
ual block, the gradient of the loss with respect to the
input x is:
∂L/∂x = (∂L/∂y) · (∂y/∂x) = (∂L/∂y) · (1 + ∂F(x)/∂x) (2)
This equation highlights that the gradient flowing
through the shortcut connection (identity mapping) is
always 1, ensuring that the gradient is never com-
pletely diminished. This is in stark contrast to tra-
ditional deep networks, where gradients can become
very small as they are propagated back through many
layers, leading to what is called the vanishing gradient
problem.
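A small numerical check of Eq. (2), using automatic differentiation and a deliberately simple scalar F(x) = w·x, illustrates how the shortcut term contributes the constant 1 to the gradient; the values chosen here are purely illustrative.

import torch

# Toy scalar example of Eq. (2): y = F(x) + x with F(x) = w * x
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(0.5)

y = w * x + x        # residual form: y = F(x) + x
loss = y ** 2        # any downstream loss L(y)
loss.backward()

# Analytically: dL/dy = 2*y = 6.0 and dy/dx = 1 + dF/dx = 1 + 0.5 = 1.5,
# so dL/dx = 6.0 * 1.5 = 9.0; the shortcut guarantees the "1 +" term.
print(x.grad)        # tensor(9.)

Even if ∂F(x)/∂x were close to zero, the gradient reaching x would still be ∂L/∂y, which is precisely what prevents the vanishing-gradient problem in very deep stacks.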
In practice, the residual block structure enables
very deep networks, like ResNet-200, to be trained
more effectively and efficiently. The additional lay-
ers allow the network to learn increasingly complex
hierarchical features. For example, in image classi-
fication tasks, shallow layers might learn basic fea-
tures like edges, while deeper layers in the network
can learn more complex representations, such as tex-
tures or object parts. By utilizing residual connec-
tions, ResNet-200 (He, Zhang, et al. 2016) can ef-