Towards Tracking Provenance from Machine Learning Notebooks

Dominik Kerzel, Sheeba Samuel, Birgitta König-Ries

2021

Abstract

Machine learning (ML) pipelines are constructed to automate every step of ML tasks, transforming raw data into engineered features, which are then used for training models. Even though ML pipelines provide benefits in terms of flexibility, extensibility, and scalability, there are many challenges when it comes to their reproducibility and data dependencies. Therefore, it is crucial to track and manage the metadata and provenance of ML pipelines, including code, model, and data. Data scientists can use this provenance information when developing and deploying ML models. It improves the understanding of complex ML pipelines and facilitates analyzing, debugging, and reproducing ML experiments. In this paper, we discuss ML use cases, challenges, and design goals of an ML provenance management tool to automatically expose the metadata. We introduce MLProvLab, a JupyterLab extension, to automatically identify the relationships between data and models in ML scripts. The tool is designed to help data scientists and ML practitioners track, capture, compare, and visualize the provenance of machine learning notebooks.

Paper Citation


in Harvard Style

Kerzel D., Samuel S. and König-Ries B. (2021). Towards Tracking Provenance from Machine Learning Notebooks. In Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 1: KDIR; ISBN 978-989-758-533-3, SciTePress, pages 274-281. DOI: 10.5220/0010681400003064


in Bibtex Style

@conference{kdir21,
author={Dominik Kerzel and Sheeba Samuel and Birgitta König-Ries},
title={Towards Tracking Provenance from Machine Learning Notebooks},
booktitle={Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 1: KDIR},
year={2021},
pages={274-281},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010681400003064},
isbn={978-989-758-533-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - Volume 1: KDIR
TI - Towards Tracking Provenance from Machine Learning Notebooks
SN - 978-989-758-533-3
AU - Kerzel D.
AU - Samuel S.
AU - König-Ries B.
PY - 2021
SP - 274
EP - 281
DO - 10.5220/0010681400003064
PB - SciTePress