Bridging Competency Gaps in Data Science: Evaluating the Role of
Automation Frameworks Across the DASC-PM Lifecycle
Maike Holtkemper and Christian Beecks
University of Hagen, Chair of Data Science, Universitätsstrasse 11, 58097 Hagen, Germany
Keywords:
Automation Frameworks, EDISON Data Science Framework, Data Science Process Models, DASC-PM.
Abstract:
Successful data science projects require a balanced mix of competencies. However, a shortage of skilled
professionals disrupts this balance, fragmenting expertise across the data science pipeline. This fragmentation
causes inefficiencies, delays, and project failures. Automation frameworks can help to mitigate these issues
by handling repetitive tasks and integrating specialized skills. These frameworks improve workflow efficiency
across project phases but remain limited in critical areas like project initiation and deployment. This pre-study
identifies tasks in each project phase using the DASC-PM model. The model structures the assessment of
automation potential and maps tasks to the EDISON Data Science Framework (EDSF), determining which
competencies automation can support. The findings indicate that automation enhances efficiency in early
phases, such as Data Provision and Analysis, contrasting with challenges in Project Order and Deployment,
where human expertise remains essential. Addressing these gaps can improve collaboration and create a more
integrated data science workflow.
1 INTRODUCTION
In today’s digital era, data drives decision-making
across sectors as organizations analyze large datasets
to gain insights, shape actions, and stay competitive
(Robinson and Nolis, 2020). These insights remain
difficult to realize, as many organizations still strug-
gle to succeed in data science projects (Gökay et al.,
2023). These projects require well-structured execu-
tion to reduce risks and improve outcomes (Haertel
et al., 2022; Kutzias et al., 2021).
However, the success of data science endeavors
relies on both structured models and team compe-
tencies (Santana and Díaz-Fernández, 2023). These
competencies, such as data engineering, are
essential for supporting the entire project lifecycle
(Cuadrado-Gallego and Demchenko, 2020). How-
ever, growing competency gaps, driven by demand
that outpaces the supply of skilled professionals, limit
the transformation of raw data into insights and under-
mine project success (Mikalef and Krogstie, 2019).
Automation frameworks have been shown to auto-
mate repetitive tasks and reduce fragmentation across
project phases (Wang et al., 2021; Macas et al.,
2017). By reducing fragmentation across phases,
these frameworks help bridge silos and create a col-
laborative environment throughout the data science
lifecycle (Abdelaal et al., 2023). Tools such as Au-
toDS support data provision and analysis while link-
ing stages for smoother team transitions (Wang et al.,
2021). These tools alleviate competency shortages
by automating expertise-driven tasks, which in turn
optimize workflows, reduce errors, and accelerate
project completion (Abdelaal et al., 2023; Macas
et al., 2017). These optimized workflows are enabled
by frameworks that align with DASC-PM stages and
show potential for integrated automation.
This pre-study investigates how automation
frameworks support essential data science compe-
tencies, addressing the research question: How do
automation frameworks support the competencies
needed for data science projects? Using the EDSF
(Demchenko et al., 2022), it examines how these tools
complement or replace human expertise in data prepa-
ration, modeling, and evaluation. These competen-
cies are mapped to DASC-PM tasks to identify gaps
and ensure a coherent project approach. This mapping
forms the main contribution by combining DASC-PM
and EDSF to assess competency coverage, highlight
unmet needs, and inform future tool development. As
a pre-study, the paper outlines key challenges, pro-
poses a structured approach, and lays the founda-
tion for future research. The structure is as follows:
Section 2 reviews the background, Section 3 details
the methodology, Section 4 presents the findings, and
Section 5 concludes with future directions.
2 BACKGROUND
Data’s role is expanding as organizations invest in AI
and data projects to drive revenue, efficiency, and in-
novation (Santana and Díaz-Fernández, 2023). While
terms like AI, data science, and machine learning dif-
fer, their core objectives remain consistent across in-
dustries (Kruhse-Lehtonen and Hofmann, 2020).
Many data science projects fail to meet expecta-
tions, with most never reaching production. Venture-
Beat (VentureBeat, 2019) reports an 87% failure rate,
while NewVantage Partners (NewVantage Partners,
2019) finds 77% of companies struggle with AI adop-
tion. Additionally, 70% report minimal AI impact
(Ransbotham et al., 2019), and data scientists face
challenges integrating models into operations. Dav-
enport et al. (Davenport et al., 2020) note a widen-
ing gap between successful and failing organizations.
Poor project management and technical hurdles fur-
ther drive high failure rates (Joshi et al., 2021).
Research highlights a disconnect between techni-
cal processes and organizational practices in data sci-
ence projects, increasing risks such as poor project
management, competency gaps, and data quality is-
sues (Saltz, 2021; Martinez et al., 2021). Boina et al. (Boina et al., 2023) discuss the integration of
Data Engineering and Intelligent Process Automa-
tion (IPA) to enhance business efficiency and inno-
vation. Reddy et al. (Reddy et al., 2024) identify
a competency gap as a key barrier to effective data
science adoption and alignment with organizational
goals. Li et al. (Li et al., 2021) identify critical skills
and domain knowledge gaps in U.S. manufacturing
by analyzing job postings and professional profiles,
highlighting the need for targeted workforce training.
Aljohani et al. (Aljohani et al., 2022) reveal a per-
sistent mismatch between university curricula and job
market demands through a large-scale analysis.
To address these complexities, it is essential to
clearly define the competencies required. Gartner
(James and Duncan, 2023) predicts that by 2026,
leading data science teams will need increasingly di-
verse skill sets, resulting in significant changes to
team structures. In response, competence frameworks
have become increasingly important for defining and
cultivating the skills needed in data science initiatives
(Salminen et al., 2024; Brauner et al., 2025).
2.1 The EDISON Data Science
Framework (EDSF)
The EDSF, developed during the EDISON project
(2015–2017), defines key competencies for data sci-
entists (Cuadrado-Gallego and Demchenko, 2020). It
provides a structured curriculum and knowledge base
to support skill development (European Commission,
2017). A core component, the Competence Frame-
work for Data Science (CF-DS), links essential com-
petencies to relevant knowledge and skills, ensuring a
standardized training model across Europe (European
Commission, 2017).
EDSF aligns with the European e-Competence
Framework (European Committee for Standardiza-
tion, 2014), defining competence as the ability to
apply knowledge, skills, and attitudes to achieve
results (European Committee for Standardization,
2014, p.5). CF-DS categorizes competencies into
five areas: Data Analytics (DSDA), Data Engineer-
ing (DSENG), Data Management (DSDM), Research
Methods and Project Management (DSRMP), and
Domain-Specific Knowledge (DSDK). Missing com-
petencies can hinder project success, as expertise is
often distributed across teams. Automation frame-
works can help bridge these gaps, enhancing effi-
ciency and outcomes (Potanin et al., 2024). The full EDISON CF-DS framework is available online (Demchenko et al., 2022).
Table 1: Excerpt of the EDISON CF-DS.

Category: Data Analytics (DSDA)

Sub-Category | Sub-Sub-Category
DSDA01: Use a variety of data analytics techniques | Machine Learning, Data mining, Prescriptive analytics, Predictive analytics, Data life cycle
DSDA02: Apply designated quantitative techniques | Statistics, Time series analysis, Optimization, Simulation, Deploy models for analysis and prediction
DSDA03: Identify, extract, combine available heterogeneous data | Modern data sources (audio, video, image, ...), Verify data quality
2.2 Leveraging Automation to Address
Competency Gaps
The demand for skilled data scientists continues to
rise, yet a shortage of qualified professionals per-
sists (Demchenko and José, 2021). Both academia
and industry seek innovative solutions to address this
competency gap. As data volumes grow, developing
machine learning models and extracting insights be-
comes more complex, increasing the manual effort re-
quired for data processing and analysis (Elshawi et al.,
2019).
Automation frameworks help by handling tasks
traditionally requiring specialized expertise. Re-
search shows that data processing (69%) and collec-
tion (64%) are well-suited for automation (Manyika
et al., 2017). These tools reduce repetitive work,
accelerate decision-making, and enhance efficiency
(Abbaszadegan and Grau, 2015). AutoDS (Wang
et al., 2021) and AutoCure (Abdelaal et al., 2023) sup-
port various stages of the data science process.
Automation also improves software testing, with
frameworks like Robot Framework reducing execution time by
over 80% (Alok Chakravarthy and Padma, 2023).
While automation saves time and reduces errors, its
complexity varies across tasks. Some tasks can be
fully automated, while others require human over-
sight. As Automated Data Science (AutoDS) evolves,
Human-Computer Interaction (HCI) research high-
lights a shift in perception—data scientists now see
automation as a collaborator rather than a competitor
(Wang et al., 2021; Wang et al., 2019).
2.3 Data Science Process Model
(DASC-PM)
A survey identifying meta-requirements led a group
of data science experts from academia and industry
to develop the DASC-PM (Schulz et al., 2022). This
model structures data science projects into a five-
stage process, integrating scientific practices, appli-
cation domains, IT infrastructures, and their impacts
(Schulz et al., 2022). The five phases—Project Order,
Data Provision, Analysis, Deployment, and Appli-
cation—operate within three overarching areas: Do-
main, Scientificity, and IT Infrastructure.
The Project Order phase defines domain-specific
problems and selects use cases, requiring diverse
competencies. Data Provision covers data acquisi-
tion, storage, and management for analysis. In Anal-
ysis, the team applies or develops methodologies, en-
suring validation. Deployment implements analytical
results, while Application monitors model usage and
gathers insights for improvements.
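As a minimal illustration, the phase sequence can be represented as an ordered structure; the example tasks per phase below are taken from the descriptions above and are not the complete DASC-PM task catalog:

# Illustrative sketch of the DASC-PM phase sequence with example tasks
# drawn from the descriptions above (not the full task catalog).
DASC_PM_PHASES = [
    ("Project Order", ["define domain-specific problems", "select use cases"]),
    ("Data Provision", ["data acquisition", "data storage", "data management"]),
    ("Analysis", ["apply or develop methodologies", "ensure validation"]),
    ("Deployment", ["implement analytical results"]),
    ("Application", ["monitor model usage", "gather insights for improvement"]),
]

for number, (phase, tasks) in enumerate(DASC_PM_PHASES, start=1):
    print(f"{number}. {phase}: {', '.join(tasks)}")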
Domain expertise guides objective setting, data in-
terpretation, and ethical considerations. Scientificity
ensures methodological rigor and structured manage-
ment. IT infrastructure supports all phases, assessed
for needs and scalability. DASC-PM addresses gaps
in existing models, providing a structured, evolving
framework (Schulz et al., 2022) (see Table 2).
This study examines how automation frame-
works support competencies essential for data science
projects. As tasks grow more complex and skilled
professionals remain scarce, automation helps bridge
skill gaps. Using the CF-DS, the study maps DASC-
PM tasks to assess automation’s role in complement-
ing human expertise and identifying competency gaps
that impact project success.
3 METHODOLOGY
3.1 Literature Review
In line with Webster and Watson’s approach (Webster
and Watson, 2002), a literature review was conducted
to identify relevant studies on automation frameworks
in data science. As this is a pre-study rather than
a complete research work, no forward or backward
search was performed; instead, a keyword search was
used to gain initial insights. A search for “automation framework” and “data science” was conducted in ACM Digital Library and IEEE Xplore, focusing on publications from the last five years. This keyword search yielded 127 publications, of which 38 remained after the abstract evaluation and 22 after the full-text evaluation. Topics like
simulation platforms (Aryai et al., 2023) and educa-
tional frameworks such as AutoDomainMine (Varde,
2022) were noted but not explored in depth, as the
study focuses on automation frameworks for data sci-
ence projects.
3.2 Quality Appraisal and Qualitative
Content Analysis
After the initial literature review, a quality assess-
ment was conducted on the selected 22 articles to
ensure their reliability and relevance. The evalua-
tion methodology follows Kitchenham’s guidelines
(Kitchenham, 2004). Each article was assessed using
predefined criteria, classifying them as low, medium,
or high quality based on the standards outlined by
Nidhra et al. (Nidhra et al., 2013).
The quality assessment comprised four questions:
1. Does the research align with the objectives of this study? (general alignment)
2. Is the study focused on an automation framework? (automation framework)
3. Does the automation framework address tasks within the DASC-PM process model? (task-related)
4. Are the findings relevant to the aims of this study? (usefulness of the results)
Table 2: Automation Frameworks covering DASC-PM Phases.
ID AutoFM Reference Year Goal Phase
1 DQA (Shrivastava et al., 2019) 2019 Data quality 2
2 Sweeper (Thawanthaleunglit and Sripanidkulchai, 2019) 2019 Data quality 2-4
3 AutoDS (Wang et al., 2021) 2021 ML config. 2-5
4 VizSmith (Bavishi et al., 2021) 2021 Visualization 1-2,4
5 GrumPy (Mota et al., 2021) 2021 Data analysis 2
6 DVF (Lwakatare et al., 2021) 2021 Data validation 2
7 OneBM (Lam et al., 2021) 2021 Feature engineering 2-4
8 AutoPrep (Bilal et al., 2022) 2022 Data processing 2
9 QuickViz (Pitroda, 2022) 2022 EDA 2
10 ADE (Galhotra and Khurana, 2022) 2022 Data labeling 2-3
11 NLP (Mavrogiorgos et al., 2022) 2022 Data quality 2
12a Datadiff (Petricek et al., 2023) 2023 Merging tables 2
12b CleverCSV (Petricek et al., 2023) 2023 Parsing tables 2
12c Ptype (Petricek et al., 2023) 2023 Column types 2
12d ColNet (Petricek et al., 2023) 2023 Annotating data 3
13 AutoCure (Abdelaal et al., 2023) 2023 Data quality 2-3
14 AI (Patel et al., 2023) 2023 EDA 1-2,5
Each article was scored based on its compliance with the predefined criteria, with weights assigned accordingly. Articles scoring 4 points were rated as high quality, those below 1 as low quality, and those between 1 and 3 as medium quality. Of the 22 articles reviewed, 14 were rated as high quality and 8 as medium quality.
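A minimal sketch of this appraisal logic, assuming one point per satisfied criterion (the exact weighting scheme is not reproduced here):

# Illustrative quality appraisal: one point per satisfied criterion
# (an assumption; the weighting used in the study is not shown here).
CRITERIA = ("general alignment", "automation framework",
            "task-related", "usefulness of the results")

def appraise(satisfied: dict[str, bool]) -> tuple[int, str]:
    """Score an article and classify it as low, medium, or high quality."""
    score = sum(1 for criterion in CRITERIA if satisfied.get(criterion))
    if score >= 4:
        label = "high"
    elif score < 1:
        label = "low"
    else:
        label = "medium"  # 1 to 3 points
    return score, label

# Example: an article satisfying three of the four criteria.
print(appraise({"general alignment": True, "automation framework": True,
                "task-related": True}))  # -> (3, 'medium')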
A qualitative content analysis was conducted on
the 14 high-quality papers, identifying 17 automa-
tion frameworks for systematic analysis. The anal-
ysis followed Kuckartz’s methodology (Kuckartz and Rädiker, 2022), focusing on how these frameworks align with the DASC-PM model. A deductive coding
approach categorized text passages into the six phases
and tasks of CRISP-DM (Table 2) and the five phases
of DASC-PM. Two researchers independently coded
the documents, following the guidelines in (Kuckartz and Rädiker, 2022). MAXQDA 2022 (Version 22.8.0)
was used for the analysis.
The EDISON CF-DS was applied to evaluate the
competencies required for using automation frame-
works. This framework categorizes competencies
into five key areas: Data Analytics (DSDA), Data Engineering (DSENG), Data Management (DSDM), Research Methods and Project Management (DSRMP), and Domain-Specific Knowledge (DSDK). The EDISON CF-
DS was used in the second coding phase.
4 FINDINGS
4.1 Analysis of Automation
Frameworks
The 17 frameworks found in the literature were
aligned with the various phases and tasks of the
DASC-PM process model (see Table 3). The analysis
revealed that only two frameworks partially covered the Project Order phase, whereas 16 frameworks were relevant to Data Provision, seven to Analysis, two to Deployment, and one to Application.
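These counts correspond to tallying, per phase, the frameworks whose coding covers that phase. A minimal sketch with a toy subset of the phase assignments from Table 2 (1 = Project Order through 5 = Application); the subset is for illustration only:

from collections import Counter

# Toy subset of Table 2's phase assignments, for illustration only.
FRAMEWORK_PHASES = {
    "AutoDS":   {2, 3, 4, 5},  # listed as "2-5" in Table 2
    "VizSmith": {1, 2, 4},     # listed as "1-2,4"
    "DQA":      {2},
    "AutoCure": {2, 3},        # listed as "2-3"
    "AI":       {1, 2, 5},     # listed as "1-2,5"
}

coverage = Counter(phase for phases in FRAMEWORK_PHASES.values()
                   for phase in phases)
for phase in range(1, 6):
    print(f"Phase {phase}: {coverage[phase]} framework(s)")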
The Project Order phase, which includes tasks
such as assessing the suitability of the use case, meth-
ods, and objectives, is minimally supported by au-
tomation tools. For example, tools like VizSmith and
AutoDS cover tasks like evaluating the suitability of
methods and objectives, but other tasks, such as as-
sessing the data basis, remain largely unsupported.
The Data Provision phase sees more extensive
support from automation tools. Tasks such as data
anonymization, aggregation, cleansing, and filtering
are well-covered by tools like AutoDS, VizSmith, and
QuickViz, which automate data preparation. How-
ever, tasks related to data protection and metadata
management are not widely covered by the tools.
In the Analysis phase, tools like AutoDS,
Sweeper, and VizSmith provide significant coverage
for tasks such as identifying suitable analytical methods, selecting procedures, and evaluating results. These tasks are crucial for ensur-
Table 3: Covered DASC-PM Areas, Tasks, and Subtasks. Framework columns, in order: DQA, Sweeper, AutoDS, VizSmith, GrumPy, DVF, OneBM, AutoPrep, QuickViz, ADE, NLP, Datadiff, CleverCSV, Ptype, ColNet, AutoCure, AI.

Project Order | Sustainability Check
  Suitability of the use case: - - - x - - - - - - - - - - - - -
  Suitability of the objectives: - - - x - - - - - - - - - - - - x

Data Provision | Data Preparation
  Data aggregation: - x x x - - x x - x - x - - - - x
  Data annotation: - x x x - - x x - x - x - - - - x
  Data cleansing: x x x - x x - x - - x x - x - x x
  Data filtering: x x x - - x - x - - x x - - - x -
  Data structuring: x x x - - x - x - - - x - - - x x
  Data transformation: - - x - - - x x - - - - x - - - -
  Dimensional reduction: - x x x - - x x - x - x - - - - x
  Format adjustment: - - x - - - x x - - - - x - - - -

Data Provision | Data Management
  Data protection: x x x - x x - x - - x x - x - - x
  Storing raw data: - - x x x x - - - x x - x - - x -
  Data access: x x x x x x - x - x x x x - - x x

Data Provision | Explorative Data Analysis
  Data validation: - x x x x x - - x x - - x - - x x
  Data visualization: x x x x x x - - x - - - - x - - x
  Identifying central attributes: x x x x x x - - x - - - - x - - x
  Understanding content: x x x x x x - - x - - - - x - - x
  Statistical analysis: - x x x x x - - x x - - x - - x x
  Examining the necessity of data transformations: - x x x x x - - x x - - x - - x x
  Examining missing values: - x x x x x - - x x - - x - - x x

Analysis | Identifying Suitable Analytical Methods
  Determining potentially suitable procedures: - - x - - - x - - x - - - - - x -
  Selection: - - x - - - x - - x - - - - - x -

Analysis | Applying Analytical Methods
  Setting up a development environment: - x x - - - x - - x - - - - - - -
  Constructing the progress: - x x - - - x - - x - - - - x - -
  Reducing dimensions: - x x - - - x - - - - - - - x - -

Analysis | Developing Analytical Methods
  Designing the procedure: - x x - - - x - - - - - - - x - -

Analysis | Evaluation
  Benchmarking: - x x x - - x - - - - - - - x - -
  Comparing procedures: - - x - - - - - - - - - - - - - -
  Evaluating results: - x x x - - x - - - - - - - x - -
  Performance tests: - - x - - - x - - - - - - - x - -

Deployment | Ensuring Applicability
  Ensuring constant applicability of the model: - - x - - - - - - - - - - - - - x

Application | Monitoring
  Gathering domain-specific findings for iterative developments: - - - - - - - - - - - - - - - - x
ing that the analytical models align with the data and ob-
jectives. However, there is a lack of support for more
detailed tasks such as testing and establishing the pro-
cedure, indicating a focus on operational rather than
methodological development and testing.
The Deployment phase, which involves preparing results for recipients, creating technical environments, and ensuring the system’s viability, shows only sparse tool coverage. According to Table 3, only AutoDS and the data-centric AI framework address ensuring the constant applicability of the model, while tasks related to technical infrastructure, such as testing software licenses and identifying the hardware and software stacks, are not covered by the automation tools.
Finally, the Application phase, which includes monitoring, ensuring constant applicability of the model, and gathering domain-specific findings for iterative improvements, is only marginally supported, with tools such as AutoDS and the data-centric AI framework extending into this phase. These tools aid in ongoing model validation and updates, supporting long-term applicability. However, gaps remain in tasks related to iterative improvement, suggesting the need for more work on domain-specific analysis for the continuous adaptation of models.
The tasks that are not covered by any automation
framework are highlighted below:
Project Order:
Sustainability Check: Suitability of the meth-
ods, Assessing the data basis, Considering past
projects; Ensuring Realizability: not covered.
Data Provision:
Data Preparation: Data anonymization, inte-
gration, Creating data preparation plans, Log-
ging the data preparation, Process automation,
Schema integration;
Data Management: Backing up prepared data,
Metadata management;
Analysis:
Identifying Suitable Analytical Methods: Iden-
tifying requirements, Determining the problem
class, Researching comparable problems;
Applying Analytical Methods: Ensuring va-
lidity, Considering multiple analytical meth-
ods, Selecting the best parameter configuration,
Weighing time against benefit, Ensuring repli-
cability and transparency, Establishing criteria;
Developing Analytical Methods: Determining
differences with relevant existing procedures,
Establishing the procedure, Testing the proce-
dure, Implementation;
Tool Selection: not covered;
Evaluation: Determining the evaluation crite-
ria, Estimating added value, Reviewing realizability;
Deployment:
Technical and Methodical Provision: Prepar-
ing the results for the recipients, Building the
product environment, Transferring the results,
Context creation, Automating processes, Deal-
ing with IT resources, Technically testing the
system used;
Ensuring Technical Realizability: Considering
time criticalities, Considering durations, Deal-
ing with the connected data sources, Identi-
fying the hardware and software stacks, Iden-
tifying technical conditions and opportunities,
Testing software licenses, Legal framework
conditions, Create memory access concept, En-
sure operations and support;
Ensuring Applicability: Identify target recipi-
ents, Establishing UI/UX design, Ensure mem-
ory access, Involve users, Create a documenta-
tion concept, Create a training concept, Regu-
larly checking the quality of analysis results
The mapping of DASC-PM tasks to EDISON compe-
tencies reveals that the DASC-PM framework spans a
broad range of competencies, integrating data science
analytics, engineering, and business analytics. Table
4 shows the mapped tasks per phase, with the phases numbered 1) Project Order, 2) Data Provision, 3) Analysis, 4) Deployment, and 5) Application. Data Preparation and Data Manage-
ment tasks align heavily with DSDA (Data Science
Analytics) and DSENG (Data Science Engineering)
competencies, focusing on data handling, process au-
tomation, and technical implementation. The Analy-
sis phase, in turn, highlights a strong focus on DSDA
competencies for analytical methods. Deployment
and Application tasks emphasize DSENG for technical realization and DSBA (Business Analytics) for ensuring business applicability and impact.
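A minimal sketch of this mapping as a data structure, using a few rows excerpted from Table 4 (competency ranges such as DSDA01-03 are expanded by hand for illustration):

# Illustrative excerpt of the DASC-PM-task-to-EDISON-competency mapping
# from Table 4; ranges are expanded manually for this sketch.
TASK_COMPETENCIES = {
    ("1 Project Order", "Sustainability Check"):
        {"DSDA01", "DSDA03", "DSDA04", "DSDA05", "DSDM05"},
    ("3 Analysis", "Identifying Analytical Methods"):
        {"DSDA01", "DSDA02", "DSDA03", "DSBA03"},
    ("5 Application", "Monitoring"):
        {"DSDA02", "DSDA04", "DSDA06", "DSBA04"},
}

def competency_profile(phase: str) -> set[str]:
    """Union of the competencies required by a phase's mapped tasks."""
    profile: set[str] = set()
    for (task_phase, _task), competencies in TASK_COMPETENCIES.items():
        if task_phase == phase:
            profile |= competencies
    return profile

print(sorted(competency_profile("3 Analysis")))
# -> ['DSBA03', 'DSDA01', 'DSDA02', 'DSDA03']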
Overall, the framework demonstrates a compre-
hensive approach to data science projects, combining
technical, analytical, and business perspectives. The
importance of data quality, accessibility, and business
relevance throughout the project lifecycle is under-
scored by the integration of DSDM (Data Manage-
ment) and DSBA competencies.
Table 4: Mapping DASC-PM Tasks to EDISON Competencies.

Phase | Task | EDISON Competencies
1 | Sustainability Check | DSDA01, DSDA03, DSDA04, DSDA05, DSDM05
1 | Ensuring Realizability | DSENG01, DSENG03-05, DSRMP02, DSRMP06, DSDA02, DSDA04-05, DSBA01, DSBA03
2 | Data Preparation | DSENG01, DSENG03, DSENG05-06, DSDA01-06, DSBA04
2 | Data Management: Data protection | DSENG04-06, DSDM03, DSDM06, DSDA03
2 | EDA: Data validation | DSDA01-04, DSDA06, DSBA03
3 | Identifying Analytical Methods | DSDA01-03, DSBA03
3 | Applying Analytical Methods | DSENG01, DSENG03, DSDA01-02, DSDA04-05, DSBA04
3 | Developing Analytical Methods | DSDA01-03, DSDA05
3 | Tool Selection | DSENG01-02, DSDA01, DSBA03
3 | Evaluation | DSDA02-05, DSBA03-04
4 | Technical and Methodical Provision | DSDA04-06, DSBA03, DSENG01-04
4 | Ensuring Technical Realizability | DSDA01, DSDA03-04, DSBA04, DSENG01-02, DSENG04-06
4 | Ensuring Applicability: Identify target recipients | DSBA01-03, DSENG06, DSDA01, DSDA06, DSBA05-06
5 | Monitoring | DSDA02, DSDA04, DSDA06, DSBA04

5 CONCLUSION

This study analyzed how automation frameworks support tasks across the DASC-PM process model. The analysis revealed substantial variation in coverage.
Tools like AutoDS, VizSmith, and QuickViz offer
strong support for tasks in the Data Provision phase.
Tasks in the Project Order and Application phases receive far less support. Key responsibilities such as risk assessment, infrastructure testing, and post-deployment monitoring remain largely unsupported.
This lack of coverage became evident through a map-
ping of DASC-PM tasks against data science compe-
tencies using the EDISON framework. The mapping
highlighted where automation can help mitigate exist-
ing competency gaps.
These gaps inform several actionable recommen-
dations. The recommendations include conducting
empirical studies to explore practical and organiza-
tional barriers to automation adoption. These stud-
ies should also examine context-specific constraints
that affect tool effectiveness. Benchmarking efforts
should assess how well existing tools support un-
derserved DASC-PM tasks, including infrastructure
setup, risk analysis, and post-deployment monitor-
ing. Integration guidelines can help teams embed
automation into workflows while managing compli-
ance, scalability, and security. These workflows
should align with both organizational processes and
project requirements. Competency-to-tool mappings
can support this alignment by helping teams select
tools that match their existing competencies.
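One hedged sketch of such a competency-to-tool mapping: given the competencies a team already holds and those a project requires, rank candidate tools by how many of the missing competencies they would cover. The tool-to-competency assignments below are hypothetical placeholders, not empirical results of this study:

# Hypothetical competency-to-tool matching sketch; the tool profiles
# below are illustrative placeholders, not measured assignments.
TOOL_SUPPORT = {
    "AutoDS":   {"DSDA01", "DSDA02", "DSENG01"},
    "VizSmith": {"DSDA06", "DSBA03"},
    "QuickViz": {"DSDA06"},
}

def rank_tools(required: set[str], present: set[str]) -> list[tuple[str, int]]:
    """Order tools by how many missing competencies they would cover."""
    missing = required - present
    scored = [(tool, len(supported & missing))
              for tool, supported in TOOL_SUPPORT.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

team_competencies = {"DSENG01"}
project_profile = {"DSDA01", "DSDA02", "DSDA06", "DSENG01"}
print(rank_tools(project_profile, team_competencies))
# -> [('AutoDS', 2), ('VizSmith', 1), ('QuickViz', 1)]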
Automation tools that bridge these gaps can
strengthen collaboration and integration across all
DASC-PM phases. This integration can reduce frag-
mentation, align competencies, and increase the suc-
cess of data science projects. This study serves as a
pre-study and provides a structured foundation for fu-
ture research, tool development, and implementation
efforts aimed at more holistic and effective automa-
tion.
REFERENCES
Abbaszadegan, A. and Grau, D. (2015). Assessing the influ-
ence of automated data analytics on cost and schedule
performance. Procedia Engineering, 123:3–6.
Abdelaal, M., Koparde, R., and Schoening, H. (2023). Au-
tocure: Automated tabular data curation technique for
ml pipelines. In Bordawekar, R., Shmueli, O., Amster-
damer, Y., Firmani, D., and Kipf, A., editors, Proceed-
ings of the Sixth International Workshop on Exploit-
ing Artificial Intelligence Techniques for Data Man-
agement, pages 1–11, New York, NY, USA. ACM.
Aljohani, N. R., Aslam, A., Khadidos, A. O., and Has-
san, S.-U. (2022). Bridging the skill gap between the
acquired university curriculum and the requirements
of the job market: A data-driven analysis of scien-
tific literature. Journal of Innovation & Knowledge,
7(3):100190.
Alok Chakravarthy, N. and Padma, U. (2023). A com-
prehensive study of automation using a webapp tool
for robot framework. In Hemanth, J., Pelusi, D., and
Chen, J. I.-Z., editors, Intelligent Cyber Physical Sys-
tems and Internet of Things, volume 3 of Engineer-
ing Cyber-Physical Systems and Critical Infrastruc-
tures, pages 577–586. Springer International Publish-
ing, Cham.
Aryai, V., Mahdavi, N., West, S., and Henze, G. (2023).
An automated data-driven platform for buildings sim-
ulation. In Proceedings of the 10th ACM International
Conference on Systems for Energy-Efficient Buildings,
Cities, and Transportation, pages 61–68, New York,
NY, USA. ACM.
Bavishi, R., Laddad, S., Yoshida, H., Prasad, M. R., and
Sen, K. (2021). Vizsmith: Automated visualiza-
tion synthesis by mining data-science notebooks. In
IEEE/ACM, editor, 2021 36th IEEE/ACM Interna-
tional Conference on Automated Software Engineer-
ing (ASE), pages 129–141. IEEE.
Bilal, M., Ali, G., Iqbal, M. W., Anwar, M., Malik, M. S. A.,
and Kadir, R. A. (2022). Auto-prep: Efficient and au-
tomated data preprocessing pipeline. IEEE Access,
10:107764–107784.
Boina, R., Achanta, A., and Mandvikar, S. (2023). Integrat-
ing data engineering with intelligent process automa-
tion for business efficiency. International Journal of
Science and Research (IJSR), 12(11):1736–1740.
Brauner, S., Murawski, M., and Bick, M. (2025). The devel-
opment of a competence framework for artificial intel-
ligence professionals using probabilistic topic mod-
elling. Journal of Enterprise Information Manage-
ment, 38(1):197–218.
Cuadrado-Gallego, J. J. and Demchenko, Y., editors (2020).
The Data Science Framework: A View from the EDI-
SON Project. Springer eBook Collection. Springer
International Publishing and Imprint Springer, Cham,
1st ed. 2020 edition.
Davenport, T. H., Mittal, N., and Saif, I. (2020). What sep-
arates analytical leaders from laggards? MIT Sloan
management review.
Demchenko, Y., Belloum, A., et al. (2022). Data Science
Competence Framework (CF-DS). EDISON Commu-
nity Initiative.
Demchenko, Y. and José, C. G. J. (2021). Edison data sci-
ence framework (edsf): addressing demand for data
science and analytics competences for the data driven
digital economy. In IEEE, editor, IEEE Global En-
gineering Education Conference (EDUCON), pages
1682–1687.
Elshawi, R., Maher, M., and Sakr, S. (2019). Automated
machine learning: State-of-the-art and open chal-
lenges.
European Commission (2017). Education for data intensive
science to open new science frontiers.
European Committee for Standardization (2014). European
e-Competence Framework.
Galhotra, S. and Khurana, U. (2022). Automated relational
data explanation using external semantic knowledge.
Proceedings of the VLDB Endowment, 15(12):3562–
3565.
Gökay, G. T., Nazlıel, K., Şener, U., Gökalp, E., Gökalp, M. O., Gençal, N., Dağdaş, G., and Eren, P. E. (2023).
What drives success in data science projects: A taxon-
omy of antecedents. In García Márquez, F. P., Jamil,
A., Eken, S., and Hameed, A. A., editors, Computa-
tional Intelligence, Data Analytics and Applications,
volume 643 of Lecture Notes in Networks and Sys-
tems, pages 448–462. Springer International Publish-
ing, Cham.
Haertel, C., Pohl, M., Nahhas, A., Staegemann, D., and Tur-
owski, K. (2022). Toward a lifecycle for data science:
A literature review of data science process models.
Pacific Asia Conference on Information Systems 2022.
James, S. and Duncan, A. D. (2023). Over 100 data and an-
alytics predictions through 2028. Gartner Research,
pages 1–24.
Joshi, M. P., Su, N., Austin, R. D., and Sundaram, A. K.
(2021). Why so many data science projects fail to de-
liver. MIT Sloan Management Review, 62(3):84–90.
Kitchenham, B. (2004). Procedures for performing system-
atic reviews.
Kruhse-Lehtonen, U. and Hofmann, D. (2020). How to de-
fine and execute your data and ai strategy. Harvard
Data Science Review.
Kuckartz, U. and Rädiker, S. (2022). Qualitative Inhaltsanalyse. Methoden, Praxis, Computerunterstützung. Grundlagentexte Methoden. Beltz Juventa, Weinheim and Basel, 5th edition.
Kutzias, D., Dukino, C., and Kett, H. (2021). Towards a
continuous process model for data science projects. In
Leitner, C., Ganz, W., Satterfield, D., and Bassano, C.,
editors, Advances in the Human Side of Service En-
gineering, volume 266 of Lecture Notes in Networks
and Systems, pages 204–210. Springer International
Publishing, Cham.
Lam, H. T., Buesser, B., Min, H., Minh, T. N., Wistuba,
M., Khurana, U., Bramble, G., Salonidis, T., Wang,
D., and Samulowitz, H. (2021). Automated data sci-
ence for relational data. In 2021 IEEE 37th Inter-
national Conference on Data Engineering (ICDE),
pages 2689–2692. IEEE.
Li, G., Yuan, C., Kamarthi, S., Moghaddam, M., and Jin, X.
(2021). Data science skills and domain knowledge re-
quirements in the manufacturing industry: A gap anal-
ysis. Journal of Manufacturing Systems, 60:692–706.
Lwakatare, L. E., Rånge, E., Crnkovic, I., and Bosch,
J. (2021). On the experiences of adopting auto-
mated data validation in an industrial machine learn-
ing project. In 2021 IEEE/ACM 43rd International
Conference on Software Engineering: Software En-
gineering in Practice (ICSE-SEIP), pages 248–257.
IEEE.
Macas, M., Lagla, L., Fuertes, W., Guerrero, G., and Toulk-
eridis, T. (2017). Data mining model in the discovery
of trends and patterns of intruder attacks on the data
network as a public-sector innovation. In Terán, L. and
Meier, A., editors, 2017 Fourth International Con-
ference on eDemocracy & eGovernment (ICEDEG),
pages 55–62. IEEE, Piscataway, NJ.
Manyika, J., Chui, M., Miremadi, M., Bughin, J., George,
K., Willmott, P., and Dewhurst, M. (2017). A future
that works: Automation, employment, and productiv-
ity.
Martinez, I., Viles, E., and Olaizola, I. G. (2021). A sur-
vey study of success factors in data science projects.
1:2313–2318.
Mavrogiorgos, K., Mavrogiorgou, A., Kiourtis, A.,
Zafeiropoulos, N., Kleftakis, S., and Kyriazis, D.
(2022). Automated rule-based data cleaning using nlp.
In 2022 32nd Conference of Open Innovations Asso-
ciation (FRUCT), pages 162–168. IEEE.
Mikalef, P. and Krogstie, J. (2019). Investigating the
data science skill gap: An empirical analysis. In
2019 IEEE Global Engineering Education Confer-
ence (EDUCON), pages 1275–1284. IEEE.
Mota Jr., J., Santana, R., and Machado, I. (2021).
Grumpy: an automated approach to simplify issue
data analysis for newcomers. In Brazilian Symposium
on Software Engineering, pages 33–38, New York,
NY, USA. ACM.
NewVantage Partners (2019). Big data and ai executive sur-
vey.
Nidhra, S., Yanamadala, M., Afzal, W., and Torkar, R.
(2013). Knowledge transfer challenges and miti-
gation strategies in global software development—
a systematic literature review and industrial valida-
tion. International Journal of Information Manage-
ment, 33(2):333–355.
Patel, H., Guttula, S., Gupta, N., Hans, S., Mittal, R. S.,
and N, L. (2023). A data-centric ai framework for
automating exploratory data analysis and data qual-
ity tasks. Journal of Data and Information Quality,
15(4):1–26.
Petricek, T., van den Burg, G. J. J., Nazábal, A., Ceritli, T., Jiménez-Ruiz, E., and Williams, C. K. I. (2023).
Ai assistants: A framework for semi-automated data
wrangling. IEEE Transactions on Knowledge and
Data Engineering, 35(9):9295–9306.
Pitroda, H. (2022). A proposal of an interactive web ap-
plication tool quickviz: To automate exploratory data
analysis. In 2022 IEEE 7th International conference
for Convergence in Technology (I2CT), pages 1–8.
IEEE.
Potanin, M., Holtkemper, M., and Beecks, C. (2024). Ex-
ploring the integration of data science competencies
in modern automation frameworks: Insights for work-
force empowerment. In Intelligent Systems Confer-
ence, pages 232–240.
Ransbotham, S., Khodabandeh, S., Fehling, R., LaFountain,
B., and Kiron, D. (2019). Winning with ai. MIT Sloan
management review.
Reddy, R. C., Mishra, D., Goyal, D. P., and Rana,
N. P. (2024). A conceptual framework of barri-
ers to data science implementation: a practitioners’
guideline. Benchmarking: An International Journal,
31(10):3459–3496.
Robinson, E. and Nolis, J. (2020). Build a career in data
science. Manning, Shelter Island.
Salminen, K., Hautamäki, P., and Jähi, M. (2024). Aligning
industry needs and education: Unlocking the potential
of ai via skills. In 2024 Portland International Confer-
ence on Management of Engineering and Technology
(PICMET), pages 1–10. IEEE.
Saltz, J. S. (2021). Crisp-dm for data science: strengths,
weaknesses and potential next steps. In 2021 IEEE In-
ternational Conference on Big Data (Big Data), pages
2337–2344.
Santana, M. and Díaz-Fernández, M. (2023). Competen-
cies for the artificial intelligence age: visualisation of
the state of the art and future perspectives. Review of
Managerial Science, 17(6):1971–2004.
Schulz, M., Neuhaus, U., Kaufmann, J., Kühnel, S.,
Alekozai, E. M., Rohde, H., Hoseini, S., Theuerkauf,
R., Badura, D., Kerzel, U., et al. (2022). DASC-PM v1.1: A process model for data science projects.
Shrivastava, S., Patel, D., Bhamidipaty, A., Gifford, W. M.,
Siegel, S. A., Ganapavarapu, V. S., and Kalagnanam,
J. R. (2019). Dqa: Scalable, automated and interactive
data quality advisor. In 2019 IEEE International Con-
ference on Big Data (Big Data), pages 2913–2922.
IEEE.
Thawanthaleunglit, N. and Sripanidkulchai, K. (2019).
Sweeper. In Proceedings of the 2019 3rd International
Conference on Software and e-Business, pages 17–23,
New York, NY, USA. ACM.
Varde, A. S. (2022). Computational estimation by scien-
tific data mining with classical methods to automate
learning strategies of scientists. ACM Transactions on
Knowledge Discovery from Data, 16(5):1–52.
VentureBeat (2019). Why do 87% of data science projects
never make it into production?
Wang, D., Andres, J., Weisz, J. D., Oduor, E., and Dugan,
C. (2021). Autods: Towards human-centered automa-
tion of data science. In Kitamura, Y., Quigley, A.,
Isbister, K., Igarashi, T., Bjørn, P., and Drucker, S.,
editors, Proceedings of the 2021 CHI Conference on
Human Factors in Computing Systems, pages 1–12,
New York, NY, USA. ACM.
Wang, D., Weisz, J. D., Muller, M., Ram, P., Geyer, W.,
Dugan, C., Tausczik, Y., Samulowitz, H., and Gray,
A. (2019). Human-ai collaboration in data science:
Exploring data scientists’ perceptions of automated ai.
Proceedings of the ACM on Human-Computer Inter-
action, 3(CSCW):1–24.
Webster, J. and Watson, R. T. (2002). Analyzing the past
to prepare for the future: Writing a literature review.
MIS Quarterly, 26(2):xiii–xxiii.