biomechanical applications (see e.g. Begg et al. 2005; 
Halilaj et al. 2018), making it a reliable choice where 
sensor data often have complex relationships. 
Mixed data augmentation strategies, which were not
explored in our comparisons, may yield improvements,
particularly for complex methods such as VAEs and GANs
that require larger datasets. An initial experiment
showed that the RMSE of GAN with SVM could be
improved from 5.06 to 4.78 (for 6-10 participants) by
applying JIT and PM prior to training the GAN,
yielding better results than JIT alone.
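The size of this improvement can be checked with a short calculation. The sketch below computes the relative RMSE reduction from the two values reported above; the function name `relative_improvement` and the variable names are illustrative and not taken from the study.

```python
def relative_improvement(baseline_rmse: float, improved_rmse: float) -> float:
    """Return the fractional RMSE reduction relative to the baseline."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return (baseline_rmse - improved_rmse) / baseline_rmse


# RMSE values reported for GAN with SVM (6-10 participants).
rmse_before = 5.06  # value before applying JIT and PM
rmse_after = 4.78   # JIT and PM applied prior to training the GAN

gain = relative_improvement(rmse_before, rmse_after)
print(f"Relative RMSE reduction: {gain:.1%}")
```

The reduction of 0.28 corresponds to roughly 5.5 % of the original RMSE.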
One limitation of the experiments may be the
setup of the hyperparameter optimization. The
budget of 200 iterations may be too restrictive,
particularly for models with up to 20
hyperparameters, such as GAN-XGB. Conversely,
configurations with fewer hyperparameters, such as
the SVM downstream model and the JIT, PM, and
SMOTE data augmentation methods, might have been
favored. A more comprehensive optimization could
potentially enhance the performance of the other
methods, in particular VAE and GAN.
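To give a rough sense of why a fixed budget may favor smaller search spaces, the sketch below converts a number of evaluations into the equivalent number of grid points per hyperparameter. This is only a back-of-the-envelope illustration: the optimizer used in the experiments need not be a grid search, and the value of 3 hyperparameters for a small configuration is a hypothetical example, not a figure from the study.

```python
def points_per_dimension(budget: int, n_hyperparameters: int) -> float:
    """Per-hyperparameter resolution of a full grid containing `budget` points."""
    if budget < 1 or n_hyperparameters < 1:
        raise ValueError("budget and n_hyperparameters must be at least 1")
    return budget ** (1.0 / n_hyperparameters)


BUDGET = 200  # number of optimization iterations stated in the text

# 20 is the upper bound mentioned in the text; 3 is a hypothetical small model.
for n_params in (3, 20):
    resolution = points_per_dimension(BUDGET, n_params)
    print(f"{n_params:2d} hyperparameters: ~{resolution:.2f} grid points each")
```

Under this simplification, 200 evaluations over 20 hyperparameters amount to fewer than two values per dimension, whereas a three-parameter space receives between five and six.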
This work is a preliminary step toward synthetic
data generation in the context of FSA estimation
from mobile sensorics, focusing primarily on the
comparison of methods. Future research should build
upon these findings to explore new dimensions in
augmentation and synthetic data generation, aiming
to maximize the accuracy and utility of FSA
prediction in real-world running scenarios.
Ultimately, our goal is to provide a data generation
method that supports the development of running
shoes and athlete training for improved performance
and injury prevention.
5  CONCLUSION 
In conclusion, our work is a step toward
incorporating data augmentation and synthetic data
generation into wearable sensor development. We
evaluated different combinations of methods for
varying numbers of participants to estimate the FSA,
with SVM improving the RMSE by more than 10 %
compared to RF. The success of the simple JIT and PM
method underscores the value of revisiting and
adapting methods for more specific biomechanical
constraints. Data augmentation methods adapted for
specialized problems may have the potential to
generate realistic synthetic data and therefore
facilitate the development of more cost-effective
algorithms for wearable sensors, thus enabling
researchers to move to field-based data collections
with less intensive lab-based back-end development.
ACKNOWLEDGMENT 
This work has been supported by the Austrian Federal
Ministry for Climate Action, Environment, Energy,
Mobility, Innovation and Technology under Contract
No. 2021-0.641.557.
REFERENCES 
Begg, R. K., Palaniswami, M., & Owen, B. (2005). Support
vector machines for automated gait classification. IEEE
Transactions on Biomedical Engineering, 52(5), 828-838.
Bergstra, J., Yamins, D., & Cox, D. D. (2022). Hyperopt:
Distributed Asynchronous Hyper-Parameter Optimization.
Astrophysics Source Code Library, ascl-2205.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A
Training Algorithm for Optimal Margin Classifiers. In
Proceedings of the Fifth Annual Workshop on
Computational Learning Theory (pp. 144-152).
Breiman, L. (2001). Random Forests. Machine Learning, 45,
5-32. doi: 10.1023/A:1010933404324.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer,
W. P. (2002). SMOTE: Synthetic Minority Over-sampling
Technique. Journal of Artificial Intelligence
Research, 16, 321-357. doi: 10.1613/JAIR.953.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree
Boosting System. In Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 785-794). doi:
10.1145/2939672
Cheung, R. T. H., & Davis, I. S. (2011). Landing Pattern
Modification to Improve Patellofemoral Pain in
Runners: A Case Series. Journal of Orthopaedic &
Sports Physical Therapy, 41(12), 914-919. doi:
10.2519/jospt.2011.3771.
Cyran, K. A., Kawulok, J., Kawulok, M., Stawarz, M.,
Michalak, M., Pietrowska, M., Widlak, P., & Polańska, J.
(2013). Support vector machines in biomedical and
biometrical applications. In Emerging Paradigms in
Machine Learning (pp. 379-417). Berlin, Heidelberg:
Springer Berlin Heidelberg.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y.
(2020). Generative Adversarial Networks.
Communications of the ACM, 63(11), 139-144. doi:
10.1145/3422622.
Guido, R., Ferrisi, S., Lofaro, D., & Conforti, D. (2024). An
Overview on the Advancements of Support Vector
Machine Models in Healthcare Applications: A
Review. Information, 15(4), 235.
Halilaj, E., Rajagopal, A., Fiterau, M., Hicks, J. L., Hastie, 
T. J., & Delp, S. L. (2018). Machine learning in human