Exploring Human Activity Recognition through Deep Learning Techniques
Malleni Vyshnavi, Syeda Sanuber Naaz, Verriboina Subbamma,
Dandannagari Shirisha and N. Parashuram
Department of Computer Science Engineering, Ravindra College of Engineering for Women, Kurnool - 518002, Andhra
Pradesh, India
Keywords: Human Activity Recognition, Deep Learning, Convolutional Neural Networks, Recurrent Neural Networks,
Long Short‑Term Memory, Sensor‑Based Recognition, Video‑Based HAR, Wearable Sensors, Feature
Extraction, Time Series Data, Pose Estimation, Transfer Learning, Computer Vision, Data Augmentation,
Real‑Time Activity Recognition.
Abstract: Human Activity Recognition (HAR) refers to recognizing human activities by interpreting accelerometer and gyroscope signals collected from different devices. In past studies, HAR has been achieved using hand-crafted features and traditional machine learning algorithms. More recently, however, deep learning has emerged as a strong candidate for improving the performance of HAR classification systems. This project, as a summary of the HAR journey, moves from early machine learning applications to sophisticated uses of deep-learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Deep learning for Human Activity Recognition encompasses efficiently collecting, processing, and analyzing activity-identifying data. As part of this project, we apply deep learning modeling techniques to tasks such as activity classification and fall detection, draw on publicly available datasets and evaluation metrics, and discuss multi-modal data integration and transfer learning with a view to improving HAR systems for healthcare applications.
1 INTRODUCTION
Human Activity Recognition (HAR) is one of the fastest-growing and most versatile technologies and has seen rapid adoption in diverse areas, including healthcare, smart homes, and industrial automation. It is key to patient health control, disease monitoring, and human behavior analysis for safety, comfort, and energy efficiency. HAR is generally divided into video-based and sensor-based systems. Whereas video-based HAR employs visual input for activity recognition, sensor-based HAR uses information from wearable or environmental sensors. Because video-based supervision impinges too heavily on the privacy of individuals, sensor-based HAR is now acknowledged as the more acceptable and ethical choice. Embedded sensor-rich smart devices can sample signals in the surrounding environment to recognize human activities in adversarial or demanding settings, providing passive and unobtrusive activity recognition for real-time and continuous monitoring without collecting directly privacy-sensitive information. In general, HAR is paving the way for intelligent context-aware systems that not only enhance human life but also automate various industries.
2 RESEARCH METHODOLOGY
2.1 Research Area
Human activity recognition (HAR) employs deep learning techniques to analyze sensor data acquired from accelerometers, gyroscopes, and even cameras in order to classify different human activities. Applications include healthcare, intelligent surveillance, and human-computer interaction (HCI). This study investigates and analyzes current deep learning models, including CNNs, RNNs, LSTMs, and
Transformers, with respect to their optimal utilization in the respective tasks.
3 LITERATURE REVIEW
Traditional HAR systems have used rule-based approaches and machine learning algorithms such as SVM, KNN, and DT, which require manual feature extraction. The major shift in HAR began when deep learning took the stage, since deep models learn hierarchical representations. CNNs extract spatial features from images and video frames, while RNNs and LSTMs capture temporal dependencies in sensor data. Hybrid models (CNN-LSTM/Transformer) combine the strengths of both.
4 EXISTING SYSTEM
Existing HAR systems employ sensor data collected
from wearables, vision-based sensors, and IoT
systems. Data preprocessing, such as noise reduction
and normalization, improves the accuracy of the
models. Deep learning models can automatically
extract features using different architectures such as
CNNs, RNNs, and LSTMs, but many challenges
remain, including variations in sensor placement,
computational efficiency, and real-time detection.
5 PROPOSED SYSTEM
The proposed system is poised to improve both accuracy and real-time performance through the integration of multi-modal sensor fusion and edge computing in attention-demanding applications (an illustrative fusion sketch follows this list).
Sensor Fusion: Combines accelerometer, gyroscope, and video data, making activity recognition richer.
Advanced Models: Uses Transformers, Temporal Convolutional Networks (TCNs), and Graph Convolutional Networks (GCNs), which add value through better feature extraction.
Edge Computing: Deploys lightweight models on portable mobile devices for real-time recognition.
Model Interpretability: Class Activation Maps (CAM) and attention visualization for interpretability.
Continual Learning: Adapts over time to new data without requiring extensive retraining.
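As an illustrative sketch of the sensor-fusion idea only (not the paper's implementation), the modalities can be combined by feature concatenation; the input shapes, layer sizes, and six-class output below are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Branch 1: raw inertial window (assumed 128 samples x 6 channels:
# 3-axis accelerometer + 3-axis gyroscope).
imu_in = layers.Input(shape=(128, 6), name="imu")
x = layers.Conv1D(64, kernel_size=5, activation="relu")(imu_in)
x = layers.GlobalAveragePooling1D()(x)

# Branch 2: a pre-extracted video/pose feature vector (hypothetical 256-dim).
video_in = layers.Input(shape=(256,), name="video_features")
v = layers.Dense(64, activation="relu")(video_in)

# Fuse the modalities by concatenation, then classify 6 activities.
fused = layers.Concatenate()([x, v])
out = layers.Dense(6, activation="softmax")(fused)

fusion_model = Model(inputs=[imu_in, video_in], outputs=out)
fusion_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
```

Concatenation is the simplest fusion strategy; attention-based fusion, as in Transformer variants, can instead weight the modalities per sample.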
6 METHODOLOGY
Data is collected using worn sensors, ambient sensors, and vision-based sensors, as well as publicly available HAR datasets such as UCI HAR, WISDM, and NTU RGB+D, among others.
The core preprocessing steps include noise reduction, normalization, segmentation, and data augmentation, all of which enhance data quality.
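A minimal sketch of the segmentation and normalization steps, assuming a 50 Hz sampling rate and 128-sample windows with 50% overlap (the UCI HAR convention); the function and variable names are illustrative:

```python
import numpy as np

def preprocess(signal, labels, window=128, overlap=0.5):
    """Segment a (T, C) multi-channel sensor stream into windows.

    128 samples at 50 Hz give 2.56 s segments, the UCI HAR convention;
    other datasets may use different window sizes.
    """
    step = int(window * (1 - overlap))
    segments, seg_labels = [], []
    for start in range(0, len(signal) - window + 1, step):
        segments.append(signal[start:start + window])
        # Label each window by the majority activity it contains.
        vals, counts = np.unique(labels[start:start + window],
                                 return_counts=True)
        seg_labels.append(vals[np.argmax(counts)])
    X = np.asarray(segments, dtype=np.float32)
    # Per-channel z-score normalization (a common noise-robust choice).
    mean = X.mean(axis=(0, 1), keepdims=True)
    std = X.std(axis=(0, 1), keepdims=True) + 1e-8
    return (X - mean) / std, np.asarray(seg_labels)
```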
Feature Extraction and Model Selection: spatial features via CNNs; temporal dependencies via LSTMs and Transformers; hybrid models (CNN-LSTM, GCNs) for better accuracy; and small, lightweight models such as MobileNet and TinyML for edge deployment (a CNN-LSTM sketch follows this list).
Training and Evaluation: Compare model predictions against the ground truth using accuracy, precision, recall, F1-score, and the confusion matrix, with cross-validation for generalization.
Edge Computing: On-device models for real-time recognition (see the deployment sketch after this list). Cloud APIs: External processing for complex tasks.
Integration: Fitness-related HAR systems with health and safety applications. The result is a fully developed deep-learning human activity recognition system with improved accuracy, efficiency, and real-time performance.
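As an illustrative sketch of the hybrid CNN-LSTM named above (the layer sizes, window shape, and class count are assumptions, not the paper's exact configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(window_len=128, n_channels=6, n_classes=6):
    """CNN layers learn local motion patterns; the LSTM models their
    temporal ordering across the whole window."""
    model = models.Sequential([
        layers.Input(shape=(window_len, n_channels)),
        layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),
        layers.Dropout(0.5),  # regularization against overfitting
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_lstm()
```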
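For the edge-computing step, one common route (an assumption here, not a procedure the paper specifies) is converting the trained Keras model to TensorFlow Lite for on-device inference:

```python
import tensorflow as tf

# Convert the trained Keras model for deployment on mobile/embedded devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Post-training quantization shrinks the model and speeds up inference.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("har_model.tflite", "wb") as f:
    f.write(tflite_model)
```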
7 ARCHITECTURE
Human activity recognition generally follows a deep learning approach applied to sensor data emanating from accelerometers, gyroscopes, and cameras. The procedure is divided into specific phases. Figure 1 shows the stages of human activity recognition using machine learning and deep learning techniques.
Data Preprocessing: eliminating noise, normalizing, and filtering.
Feature Extraction: time-domain
(mean, variance) and frequency-domain (FFT) features (a sketch follows this list). Modeling: CNNs for spatial features, RNN/LSTM for sequential data.
Training & Testing: Data labeled with all the
models used for training relationships and
tested via accuracy, precision, recall, and F1-
score. Classification: the ability to identify
activities in new data from sensors. Output
and Post-processing: Activity labels together
with smoothing functions to improve
measurement accuracy will be made visible.
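A minimal sketch of the time-domain and frequency-domain feature extraction listed above; the particular statistics chosen here are illustrative:

```python
import numpy as np

def extract_features(window):
    """window: (window_len, n_channels) array of sensor samples."""
    feats = []
    for ch in range(window.shape[1]):
        x = window[:, ch]
        # Time-domain statistics (mean, variance, range).
        feats += [x.mean(), x.var(), x.min(), x.max()]
        # Frequency domain: magnitude spectrum via the FFT.
        spectrum = np.abs(np.fft.rfft(x))
        # Dominant frequency bin and total spectral energy.
        feats += [float(np.argmax(spectrum)), float((spectrum ** 2).sum())]
    return np.asarray(feats, dtype=np.float32)
```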
Figure 1: Stages of Human Activity Recognition Using
Machine Learning and Deep Learning Techniques.
8 INPUT DATASETS
HAR datasets consist of time-series sensory data recorded by wearable devices. Some popular datasets are as follows: the UCI HAR Dataset consists of 30 subjects and 6 activities (walking, sitting, etc.); the MHEALTH Dataset offers multisensory data covering various physical activities; and WISDM provides smartphone sensor data for jogging, walking, etc. The provided features are accelerometer and gyroscope
data across three axes: X, Y, and Z. Data Split: 70-80 percent for training, the remaining 20-30 percent for testing, plus 10-fold cross-validation (a sketch follows below). The preparation process includes normalization, noise filtering, and key-feature extraction.
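A sketch of this protocol with scikit-learn, assuming windowed arrays X and y from the preprocessing step and picking a 70/30 split from the stated range:

```python
from sklearn.model_selection import train_test_split, StratifiedKFold

# 70% training / 30% testing, stratified so all activities appear in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# 10-fold cross-validation over the training portion.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (tr_idx, va_idx) in enumerate(skf.split(X_train, y_train)):
    X_tr, X_va = X_train[tr_idx], X_train[va_idx]
    y_tr, y_va = y_train[tr_idx], y_train[va_idx]
    # ... fit the model on (X_tr, y_tr) and validate on (X_va, y_va)
```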
Thirty subjects were recruited to produce the multi-sensor dataset; twenty were male and ten were female at the time of data recording. All subjects were physically able to carry out their normal lifestyle activities without restriction. Seventy percent of the data (2870) was used for training. The key steps in preparing the gathered data are segmentation, normalization, noise filtering, and feature extraction.
9 EXPERIMENTAL RESULTS
In the course of a series of test runs it turned out that
deep learning techniques indisputably are efficient for
the problem of Human Activity Recognition based on
sensor data. Conventional evaluation criteria such as
accuracy, precision, recall, F1-score, and confusion
matrix are the performance measures used to assess
the classification capabilities of this model.
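These metrics can be computed with scikit-learn as in the sketch below; model, X_test, and y_test are assumed from the earlier steps, and macro averaging is one reasonable choice for multi-class HAR:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Predicted class = index of the highest softmax probability.
y_pred = model.predict(X_test).argmax(axis=1)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
print("recall   :", recall_score(y_test, y_pred, average="macro"))
print("f1-score :", f1_score(y_test, y_pred, average="macro"))
print(confusion_matrix(y_test, y_pred))  # rows: ground truth, cols: predicted
```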
Model Performance: Different activities are identified accurately in real time, with the dataset divided into training, validation, and test sets. Deep learning models are well suited to the Human Activity Recognition task given the variety and high dimensionality of sensory data. CNN Model: The CNN model achieves a high accuracy of 92.5% on the test set.
LSTM Model: Accuracy peaked at 94.3% thanks to the additional characterization of temporal patterns.
Hybrid CNN-LSTM Model: A hybrid of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network worked best, achieving an accuracy of 96.1%. The model particularly excelled at anomaly detection after relabeling of the data, and it coped well with varying sensor placement thanks to clearly separated capture points. The hybrid benefits from dedicating the CNN to spatial and the LSTM to temporal feature extraction.
Confusion Matrix Analysis: The confusion matrix entries for the walking, running, and sitting activities compare the ground truth against the predicted labels, showing the numbers of correct and incorrect classifications. After constraining the window duration, individual steps remained clearly distinguishable. For example, a subject climbing
stairs produces a different step pattern than one climbing down. However, sensors worn close together on the skin constantly emit varying, correlated signals, and such signals can convince the system that footsteps are occurring while the subject is actually still.
Deep Learning vs. Traditional Models: In deep learning models, the decision criterion effectively generalizes to the question "Is this pattern really like the one seen in example X?" This brings a risk: small signal fragments may be practically identical even though they come from different activities, so methods that rely on robust notions of similarity can introduce fragilities of their own. For example, the dot product of two vectors a and b cannot be used in a nonlinear feature space to recover the Euclidean distance between a and b. Among the traditional baselines, Random Forest achieved 85.7% and SVM reached 88.2%. Deep learning models thus show a significant positive trend, coming out on top where high accuracy is needed.
Real-World Performance: Churning through streaming sensor data in real time, these models have demonstrated robust and lasting performance. They can be applied across diverse fields to real-world tasks such as fitness tracking, healthcare monitoring, and smart home automation. Although the activity classification models built with deep learning have been effective, certain constraints must be acknowledged.
10 DISCUSSION OF RESULTS
AND RECOMMENDATIONS
10.1 The Discussion of Findings
Model Performance: Deep learning models (CNN, LSTM) perform better than traditional models like SVM and Random Forest.
Confusion Matrix Insights: Similar activities are misclassified, for example walking vs. running.
Sensor Quality: Poor calibration introduces noise and affects accuracy.
Training Time & Efficiency: Deep models require high computational power.
Generalization & Overfitting: Techniques like dropout and cross-validation are regularizing strategies that mitigate overfitting.
10.2 Recommendations for Future Work
Data Collection: Utilizing multi-sensor data for richer context.
Model Improvements: Transfer learning with hybrid models such as CNN-LSTMs (a sketch follows below).
Real-Time Processing: Optimizing inference speed for real-world use.
Sensor Calibration: Applying noise reduction at the sensor level to enhance accuracy.
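A minimal transfer-learning sketch under the earlier CNN-LSTM assumption: freeze a feature extractor pre-trained on a large source dataset and train a new classification head on the smaller target dataset (base_model, n_target_classes, X_target, and y_target are hypothetical names):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# base_model: a pre-trained network with its original classifier removed,
# e.g. the convolutional/LSTM stack of the earlier CNN-LSTM sketch.
base_model.trainable = False  # freeze the learned feature extractor

transfer_model = models.Sequential([
    base_model,
    layers.Dense(32, activation="relu"),
    layers.Dense(n_target_classes, activation="softmax"),  # new task head
])
transfer_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
# Fine-tune only the new head on the target-domain windows.
transfer_model.fit(X_target, y_target, epochs=10, validation_split=0.2)
```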
11 PERFORMANCE EVALUATION
This section examines how well the models fare in HAR, giving a comparison of the strengths and weaknesses of the deep learning paradigm against other HAR methods.
Comparative Analysis, Deep Learning versus Traditional Models: Deep learning architectures deliver dramatically superior accuracy; CNN, for example, attains a mean accuracy of 92.5% as opposed to 85% for SVM. CNN, LSTM, and hybrid models outperform traditional methods such as SVM and Random Forest in recognizing complex activities.
Error Analysis:
Misclassifications: Similar activities such as sitting and lying produce nearly identical sensor patterns, causing confusion.
Activity Ambiguity: Overlapping activities, like walking and jogging, are difficult to separate from sensor readings alone.
Evaluation Metrics: Accuracy, precision, recall, F1-score, and ROC-AUC measure the correctness of a model while keeping false positives and false negatives to the lowest possible extent.
11.1 Testing Robustness
Data Variation: The model was assessed on multiple datasets to show consistency.
Real-World Evaluation: Tested on mobile and wearable devices under power and processing constraints.
Noise Handling: Performance is evaluated in various environments for reliability in real-world applications.
11.2 Data Sources
Sensor data for Human Activity Recognition (HAR), which tracks and observes body movements, relies on mobile-device sensors such as accelerometers and gyroscopes. Among the most well-known datasets is the UCI HAR Dataset, which includes data collected from thirty volunteers who performed six different activities (walking, sitting, standing, walking downstairs, walking upstairs, and lying down) using smartphone sensors. This dataset compiles information from various sensors, such as accelerometers and gyroscopes, to capture physical activities comprehensively. Another dataset,
WISDM, features smartphone sensor data collected
during activities like walking and running. These
datasets consist of real-time sequences, where each
data point includes measurements from sensors along
the X, Y, and Z axes, along with a corresponding
activity label. Typically, these datasets undergo some
initial processing before being fed into deep learning
models, including normalization, feature extraction,
and noise reduction. These steps are all aimed at
enhancing performance metrics, particularly
accuracy, in the model.
12 CONCLUSIONS
Deep learning in HAR has unlocked incredible
potential for effectively identifying various types of
human activity through data collected from wearable
devices. Among the different deep learning models,
CNNs, RNNs, and LSTMs have unique strengths in
recognizing the spatial and temporal features
embedded in the sensor data. The results highlight the
power of deep learning in tackling complex, high-
dimensional time-series data, giving it an edge over
traditional machine learning methods when it comes
to accuracy. However, there’s still a long way to go:
we need to improve subject and environment
generalization to reduce misclassification errors and
enable real-time execution on resource-constrained
devices. Looking ahead, significant advancements
will likely come from factors like data quality and
diversity.