Towards Intelligent Data Analysis: The Metadata Challenge

Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel

Abstract

When analyzed correctly, data can yield substantial benefits. The process of analyzing data and transforming it into knowledge is known as Knowledge Discovery in Databases (KDD). The plethora and subtleties of the algorithms available at the different steps of KDD render it challenging. Effective user support is therefore crucial, even more so now that the analysis is performed on Big Data. Metadata is the component needed to drive this user support. In this paper we study the metadata required to provide user support at every stage of the KDD process. We show that intelligent systems addressing the problem of user assistance in KDD are incomplete in this regard: they do not exploit the full potential of metadata to enable assistance throughout the whole process. We present a comprehensive classification of the metadata required to provide user support. Furthermore, we present our implementation of a metadata repository for storing and managing this metadata and explain its benefits in a real Big Data analytics project.



Paper Citation


in Harvard Style

Bilalli B., Abelló A., Aluja-Banet T. and Wrembel R. (2016). Towards Intelligent Data Analysis: The Metadata Challenge. In Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD, ISBN 978-989-758-183-0, pages 331-338. DOI: 10.5220/0005876203310338


in Bibtex Style

@conference{iotbd16,
author={Besim Bilalli and Alberto Abelló and Tomàs Aluja-Banet and Robert Wrembel},
title={Towards Intelligent Data Analysis: The Metadata Challenge},
booktitle={Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD},
year={2016},
pages={331-338},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005876203310338},
isbn={978-989-758-183-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD
TI - Towards Intelligent Data Analysis: The Metadata Challenge
SN - 978-989-758-183-0
AU - Bilalli B.
AU - Abelló A.
AU - Aluja-Banet T.
AU - Wrembel R.
PY - 2016
SP - 331
EP - 338
DO - 10.5220/0005876203310338