Author:
Wil Van Der Aalst
Affiliation:
Technische Universiteit Eindhoven, Netherlands
Keyword(s):
Data Science, Big Data, Fairness, Confidentiality, Accuracy, Transparency, Process Mining.
Abstract:
The widespread use of “Big Data” is heavily impacting organizations and individuals for which these data are
collected. Sophisticated data science techniques aim to extract as much value from data as possible. Powerful
mixtures of Big Data and analytics are rapidly changing the way we do business, socialize, conduct research,
and govern society. Big Data is considered the “new oil”, and data science aims to transform this into new
forms of “energy”: insights, diagnostics, predictions, and automated decisions. However, the process of transforming
“new oil” (data) into “new energy” (analytics) may negatively impact citizens, patients, customers,
and employees. Systematic discrimination based on data, invasions of privacy, non-transparent life-changing
decisions, and inaccurate conclusions illustrate that data science techniques may lead to new forms of “pollution”.
We use the term “Green Data Science” for technological solutions that enable individuals, organizations,
and society to reap the benefits from the widespread availability of data while ensuring fairness, confidentiality,
accuracy, and transparency. To illustrate the scientific challenges related to “Green Data Science”, we
focus on process mining as a concrete example. Recent breakthroughs in process mining resulted in powerful
techniques to discover the real processes, to detect deviations from normative process models, and to analyze
bottlenecks and waste. Therefore, this paper poses the question: How to benefit from process mining while
avoiding “pollutions” related to unfairness, undesired disclosures, inaccuracies, and non-transparency?