
 
[Performance and scalability] From the implementation point of view, our goal is to improve the performance of execution in the cloud and to make better use of the cloud's scalability features. These features are especially important for large-scale analysis (during the training phase) and for serving many users or other applications. In particular, we will study how existing technologies such as Mesos (Hindman et al., 2011) or Kubernetes can be used for this purpose.
REFERENCES 
Abadi, D.J., 2007. Column stores for wide and sparse data. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 292–297.
Ahmed, M., Mahmood, A.N., Hu, J., 2016. A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60: 19–31.
Berthold, M.R., Cebron, N., Dill, F., Gabriel, T., Koetter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K., Wiswedel, B., 2007. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization (GfKL), Freiburg, Germany, Springer-Verlag.
Borg, I., Groenen, P.J., 2005. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, NY, USA.
Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control, revised edition. Holden-Day, San Francisco.
Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly Detection: A Survey. ACM Computing Surveys, 41(3).
Chandola, V., Banerjee, A., Kumar, V., 2012. Anomaly Detection for Discrete Sequences: A Survey. IEEE Transactions on Knowledge and Data Engineering, 24(5).
Copeland, G.P., Khoshafian, S.N., 1985. A decomposition storage model. In Proc. SIGMOD 1985, 268–279.
de Leeuw, J., 1988. Convergence of the majorization method for multidimensional scaling. Journal of Classification, 5(2): 163–180.
Dean, J., Ghemawat, S., 2004. MapReduce: Simplified data processing on large clusters. In Proc. Sixth Symposium on Operating System Design and Implementation (OSDI'04), 137–150.
Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.A., 2006. Feature Extraction: Foundations and Applications. Springer, New York, NY, USA.
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R., Shenker, S., Stoica, I., 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In Proc. 8th USENIX Conference on Networked Systems Design and Implementation (NSDI 2011), 295–308.
Kandel, S. et al., 2011. Research Directions in Data Wrangling: Visualizations and transformations for usable and credible data. Information Visualization, 10(4): 271–288.
Khreich, W., Khosravifar, B., Hamou-Lhadj, A., Talhi, C., 2017. An anomaly detection system based on variable N-gram features and one-class SVM. Information and Software Technology, 91: 186–197.
Kruskal, J.B., 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1): 1–27.
Manning, C.D., Raghavan, P., Schütze, H., 2008. Scoring, term weighting, and the vector space model. In Introduction to Information Retrieval. Cambridge University Press, p. 100.
McKinney, W., 2010. Data Structures for Statistical Computing in Python. In Proc. 9th Python in Science Conference (SciPy 2010), 51–56.
McKinney, W., 2011. pandas: a Foundational Python Library for Data Analysis and Statistics. In Proc. PyHPC 2011.
Saia, R., Carta, S., 2017. A Frequency-domain-based Pattern Mining for Credit Card Fraud Detection. In Proc. 2nd International Conference on Internet of Things, Big Data and Security (IoTBDS 2017), 386–391.
Savinov, A., 2014. ConceptMix: Self-Service Analytical Data Integration Based on the Concept-Oriented Model. In Proc. 3rd International Conference on Data Technologies and Applications (DATA 2014), 78–84.
Savinov, A., 2016. DataCommandr: Column-Oriented Data Integration, Transformation and Analysis. In Proc. International Conference on Internet of Things and Big Data (IoTBD 2016), 339–347.
Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J., 1999. Support Vector Method for Novelty Detection. In Proc. 12th International Conference on Neural Information Processing Systems (NIPS 1999), 582–588.
Singhal, A., 2001. Modern Information Retrieval: A Brief Overview. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 24(4): 35–43.
Smola, A.J., Schölkopf, B., 2004. A Tutorial on Support Vector Regression. Statistics and Computing, 14(3): 199–222.
Zadrozny, P., Kodali, R., 2013. Big Data Analytics Using Splunk: Deriving Operational Intelligence from Social Media, Machine Data, Existing Data Warehouses, and Other Real-Time Streaming Sources. Apress, Berkeley.
Zaharia, M., Chowdhury, M., Das, T. et al., 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proc. 9th USENIX Conference on Networked Systems Design and Implementation (NSDI'12).
IoTBDS 2018 - 3rd International Conference on Internet of Things, Big Data and Security