Authors: Dominik Kerzel (1); Sheeba Samuel (1, 2) and Birgitta König-Ries (1, 2)

Affiliations: (1) Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Germany; (2) Michael Stifel Center Jena, Friedrich Schiller University Jena, Germany

Keyword(s): Machine Learning, Information Extraction, Provenance, Jupyter Notebook, Reproducibility.

Abstract: Machine learning (ML) pipelines are constructed to automate every step of ML tasks, transforming raw data into engineered features, which are then used for training models. Even though ML pipelines provide benefits in terms of flexibility, extensibility, and scalability, there are many challenges when it comes to their reproducibility and data dependencies. Therefore, it is crucial to track and manage the metadata and provenance of ML pipelines, including code, model, and data. Data scientists can use this provenance information when developing and deploying ML models. It improves the understanding of complex ML pipelines and facilitates analyzing, debugging, and reproducing ML experiments. In this paper, we discuss ML use cases, challenges, and design goals of an ML provenance management tool to automatically expose the metadata. We introduce MLProvLab, a JupyterLab extension, to automatically identify the relationships between data and models in ML scripts. The tool is designed to help data scientists and ML practitioners track, capture, compare, and visualize the provenance of machine learning notebooks.

CC BY-NC-ND 4.0

Paper citation in several formats:
Kerzel, D.; Samuel, S. and König-Ries, B. (2021). Towards Tracking Provenance from Machine Learning Notebooks. In Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - KDIR; ISBN 978-989-758-533-3; ISSN 2184-3228, SciTePress, pages 274-281. DOI: 10.5220/0010681400003064

@conference{kdir21,
author={Dominik Kerzel and Sheeba Samuel and Birgitta König{-}Ries},
title={Towards Tracking Provenance from Machine Learning Notebooks},
booktitle={Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - KDIR},
year={2021},
pages={274--281},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010681400003064},
isbn={978-989-758-533-3},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2021) - KDIR
TI - Towards Tracking Provenance from Machine Learning Notebooks
SN - 978-989-758-533-3
IS - 2184-3228
AU - Kerzel, D.
AU - Samuel, S.
AU - König-Ries, B.
PY - 2021
SP - 274
EP - 281
DO - 10.5220/0010681400003064
PB - SciTePress