impact estimations. The algorithms and techniques 
have been successfully employed in several large case 
studies, leading to practical data lineage and 
component dependency visualizations. We are continuing 
this research by measuring performance on a number of 
different large datasets, in order to present practical 
examples and draw conclusions about our approach. 
We are also considering a more abstract, conceptual, 
business-level approach in addition to the current 
physical/technical level of data lineage representation 
and automation. 
ACKNOWLEDGEMENTS 
This research has been supported by the EU through the 
European Regional Development Fund. 