Table 7 presents comparative results for the proposed approach and other state-of-the-art approaches on the KTH dataset. It shows that the approach of Wang et al. (2013) achieves an accuracy of 95.7%. While this is higher than the proposed approach, the computational cost of their method prevents it from running in real time. We also compare our approach with that of Reid et al. (2020), who used a reduced sample rate and sample size to achieve real-time performance using body keypoints. The proposed approach performs significantly better, indicating that using keypoint changes is a more robust alternative to simply reducing the sample rate and sample size while maintaining real-time performance.
Table 7: Comparison of approaches on the KTH dataset.

Approach               Accuracy   Speed (FPS)
(Wang et al., 2013)    95.7%      3
(Reid et al., 2020)    90.2%      24
Keypoint Changes       94.2%      24
6  CONCLUSION 
We have presented a method for human activity recognition based on calculating keypoint changes (Euclidean distance and angle). We have shown that this approach achieves accuracy on par with current state-of-the-art methods while using a sparse representation. Further, we have conducted run-time experiments and shown that the method is sufficiently fast for real-time applications. In future work we will investigate how this approach performs for multi-person activity recognition and adapt it to more complex activities and scenes involving one or more people.
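As a brief illustration of the core computation described above, the following minimal sketch shows how the per-keypoint Euclidean distance and motion angle between two consecutive frames could be computed. It assumes 2D keypoints from a pose estimator (e.g. OpenPose-style output); the function name keypoint_changes, the NumPy implementation, and the 18-keypoint layout are illustrative assumptions rather than the exact implementation used in our experiments.

import numpy as np

def keypoint_changes(prev_kps, curr_kps):
    # prev_kps, curr_kps: (N, 2) arrays of (x, y) keypoint coordinates
    # for the same person in two consecutive frames.
    deltas = curr_kps - prev_kps                      # per-keypoint displacement
    distances = np.linalg.norm(deltas, axis=1)        # Euclidean distance moved
    angles = np.arctan2(deltas[:, 1], deltas[:, 0])   # direction of motion (radians)
    return np.stack([distances, angles], axis=1)      # (N, 2): one (distance, angle) per keypoint

# Illustrative usage with two frames of 18 COCO-style keypoints (assumed layout)
prev_kps = np.random.rand(18, 2) * 100
curr_kps = prev_kps + np.random.randn(18, 2)
features = keypoint_changes(prev_kps, curr_kps)
print(features.shape)  # (18, 2)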
REFERENCES 
Cai, Y., Wang, Z., Yin, B., Yin, R., Du, A., Luo, Z., Li, Z., Zhou, X., Yu, G., Zhou, E., Zhang, X., Wei, Y., & Sun, J. (2019). Res-steps-net for multi-person pose estimation. Joint COCO and Mapillary Workshop at ICCV 2019: COCO Keypoint Challenge Track.
Camarena, F., Chang, L., & Gonzalez-Mendoza, M. (2019). Improving the dense trajectories approach towards efficient recognition of simple human activities. 2019 7th International Workshop on Biometrics and Forensics (IWBF), 1–6.
Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2017). Realtime multi-person 2D pose estimation using part affinity fields. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7291–7299.
Choutas, V., Weinzaepfel, P., Revaud, J., & Schmid, C. (n.d.). PoTion: Pose MoTion representation for action recognition.
D'Sa, A. G., & Prasad, B. G. (2019). An IoT-based framework for activity recognition using deep learning technique. ArXiv preprint. http://arxiv.org/abs/1906.07247
Dollar, P., Rabaud, V., Cottrell, G., & Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 65–72.
Efros, A. A., Berg, A. C., Mori, G., & Malik, J. (2003). Recognizing action at a distance. Proceedings Ninth IEEE International Conference on Computer Vision, 2, 726–733.
Gao, Z., Chen, M. Y., Hauptmann, A. G., & Cai, A. (2010). Comparing evaluation protocols on the KTH dataset. International Workshop on Human Behavior Understanding, 88–100.
Gorelick, L., Blank, M., Shechtman, E., Irani, M., & Basri, R. (2007). Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2247–2253. https://doi.org/10.1109/TPAMI.2007.70711
Guo, K., Ishwar, P., & Konrad, J. (2010). Action recognition using sparse representation on covariance manifolds of optical flow. Proceedings - IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2010, 188–195. https://doi.org/10.1109/AVSS.2010.71
Jain, M., Jégou, H., & Bouthemy, P. (2013). Better exploiting motion for better action recognition. 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2013.330
Ke, Y., Sukthankar, R., & Hebert, M. (2005). Efficient visual event detection using volumetric features. Tenth IEEE International Conference on Computer Vision (ICCV'05), 166–173.
Laptev, I. (2004). Local spatio-temporal image features for motion interpretation.
Lee, D. G., & Lee, S. W. (2019). Prediction of partially observed human activity based on pre-trained deep representation. Pattern Recognition, 85, 198–206. https://doi.org/10.1016/j.patcog.2018.08.006
Lin, L., et al. (2020). The foundation and advances of deep learning. In Human Centric Visual Analysis with Deep Learning (pp. 3–13).
Matikainen, P., Hebert, M., & Sukthankar, R. (2009). Trajectons: Action recognition through the motion analysis of tracked features. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, 514–521. https://doi.org/10.1109/ICCVW.2009.5457659