
 
ACC_NORM_VAR and GPS_SPD_MED), whereas the
other two models, M1 and M3, differ: M1 because it
includes ACC_STD_V, which carries the same
information as ACC_NORM_VAR, and M3 because it
does not have access to ACC_NORM_VAR.
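The redundancy between two features, such as the one noted for ACC_STD_V and ACC_NORM_VAR, can be checked with a rank correlation. The sketch below is only an illustration under stated assumptions: the two arrays are synthetic stand-ins (not the study's data), the helper name `spearman_corr` is hypothetical, and a rank correlation is used because a variance-type and a standard-deviation-type feature are expected to be related monotonically rather than linearly.

```python
import numpy as np


def spearman_corr(a, b):
    """Spearman rank correlation (no tie handling; adequate for continuous data)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Synthetic stand-ins (assumption): one feature is a noisy monotone
# function of the other, and a third feature is unrelated to both.
rng = np.random.default_rng(0)
acc_std_v = rng.gamma(shape=2.0, scale=0.5, size=500)
acc_norm_var = acc_std_v**2 * rng.uniform(0.9, 1.1, size=500)
unrelated = rng.normal(size=500)

rho_redundant = spearman_corr(acc_std_v, acc_norm_var)
rho_unrelated = spearman_corr(acc_std_v, unrelated)
print(f"redundant pair: {rho_redundant:.2f}, unrelated pair: {rho_unrelated:.2f}")
```

A rank correlation close to 1 suggests that one of the two features adds little information once the other is present.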
4  CONCLUSIONS 
Given a classification (or regression) problem,
finding an optimal classifier is a very time-consuming
task because of the number of possible combinations
of sensors, features, classifiers and hyper-parameters.
This is why quick data mining tools that simplify the
problem are valuable.
In this study, we present three simple data mining
tools: Principal Component Analysis, Mahalanobis
distance and Linear Discriminant Analysis.
We apply them to real data from the transportation
mode classification problem and show that we are
able to:
• clean the data: we remove outliers representing
  11% of the samples;
• simplify the problem: we reduce the data
  dimension from 14 to 8, and this simplification
  even improves the classifier performance;
• study the importance of each of the 8 features:
  it turns out that feature ‘ACC_NORM_VAR’ is
  very important, whereas ‘MAG_NORM_STD’ can
  be removed with only a small effect on
  performance (-0.01).
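The data-cleaning step listed above can be sketched as follows. This is a minimal illustration, not the code used in the study: the data are synthetic, the function names `mahalanobis_squared` and `remove_outliers` are hypothetical, and the quantile-based cutoff is an assumption chosen only so that roughly 11% of the samples are discarded.

```python
import numpy as np


def mahalanobis_squared(X):
    """Squared Mahalanobis distance of each row of X to the sample mean."""
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse tolerates (nearly) collinear features.
    cov_inv = np.linalg.pinv(cov)
    # Row-wise quadratic form c_i^T * cov_inv * c_i.
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def remove_outliers(X, quantile=0.89):
    """Keep the samples whose squared distance is below the given quantile.

    The quantile rule is an assumption made for this sketch.
    """
    d2 = mahalanobis_squared(X)
    keep = d2 <= np.quantile(d2, quantile)
    return X[keep], keep


# Synthetic data (assumption): 1000 samples, 14 features, 20 gross outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 14))
X[:20] += 15.0

X_clean, keep = remove_outliers(X, quantile=0.89)
print(X.shape, "->", X_clean.shape)
```

Because the mean and covariance are estimated from data that still contain the outliers, a robust covariance estimator may be preferable when contamination is heavy.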
ACKNOWLEDGEMENTS 
This work is part of the BONVOYAGE project, which
has received funding from the European Union’s
Horizon 2020 research and innovation programme
under grant agreement No 635867.
Data Mining Applied to Transportation Mode Classification Problem