Healthcare Bias in AI: A Systematic Literature Review
Andrada-Mihaela-Nicoleta Moldovan¹ᵃ, Andreea Vescan¹ᵇ and Crina Grosan²ᶜ
¹ Babeș-Bolyai University, Faculty of Mathematics and Computer Science, Computer Science Department, Cluj-Napoca, Romania
² Applied Technologies for Clinical Care, King's College London, London, U.K.
ᵃ https://orcid.org/0009-0006-6289-9869
ᵇ https://orcid.org/0000-0002-9049-5726
ᶜ https://orcid.org/0000-0003-1049-2136
Keywords:
Systematic Literature Review, Algorithmic Bias, Fairness, Health Disparities.
Abstract:
The adoption of Artificial Intelligence (AI) in healthcare is transforming the field by enhancing patient care,
advancing diagnostic precision, and optimizing clinical flows. Despite its promise, algorithmic bias remains a
pressing challenge, raising critical concerns about fairness, equity, and the reliability of AI systems in diverse
healthcare settings. This Systematic Literature Review (SLR) investigates how bias manifests across the AI
lifecycle, spanning data collection, model training, and real-world application, and examines its implications
for healthcare outcomes. By rigorously analyzing peer-reviewed studies based on inclusion and exclusion
criteria, this review identifies the populations most impacted by bias and explores the diversity of existing mit-
igation strategies, fairness metrics, and ethical frameworks. Our findings reveal persistent gaps in addressing
health inequities and underscore the need for targeted interventions to ensure AI systems serve as tools for
equitable and ethical care. This work aims to guide future research and inform policy development, in order
to prioritize both technological progress and social responsibility in healthcare AI.
1 INTRODUCTION
Artificial Intelligence (AI) is transforming health-
care by improving diagnostic accuracy, personaliz-
ing treatment, and optimizing patient outcomes. Bias
in medical environments is defined by Panch et al.
(Panch et al., 2019) as “the instances when the appli-
cation of an algorithm compounds existing inequities
in socioeconomic status, race, ethnic background, re-
ligion, gender, disability or sexual orientation to am-
plify them and adversely impact inequities in health
systems”. Although it does not always do so, bias can lead to discrimination.
In recent years, the academic community has paid considerable attention to algorithmic bias in Machine Learning (ML) systems used in the healthcare industry (van Assen et al., 2024). Given their significant effects on healthcare outcomes and decision-making processes, researchers have increasingly focused on understanding and resolving the biases that may exist in these systems.
In an Action Plan published in January 2021
(Clark et al., 2023), the Food and Drug Administra-
tion (FDA) highlighted the importance of detecting
and mitigating bias in machine learning-based medi-
cal systems. The WHO (World Health Organization)
Guidance on Ethics and Governance of AI for Health
(Guidance, 2021) also acknowledges the possibility
of prejudice being ingrained in AI technology. In Oc-
tober 2023, WHO adopted a plan of guiding princi-
ples which are: autonomy, safety, transparency, re-
sponsibility, equity, and sustainability (Bouderhem,
2024).
Burema et al. (Burema et al., 2023) have compiled a collection of situations that highlight the prevalence of ethical issues in medical systems. Incidents arising from algorithmic bias and discrimination within the healthcare sector include a case involving the distribution of care work (i.e., the number of hours a caregiver should spend with their patient) (Lecher, 2020) and another concerning the distribution of Covid-19 vaccinations (Wiggers, 2020a).
Researchers have raised awareness of racial algorithmic bias in medical settings (Ledford, 2019), (Jain et al., 2023). Neural network algorithms trained to categorize skin lesions were shown to achieve inferior diagnostic accuracy for Black patients (Kamulegeya et al., 2023). Health sensors have also been found to exhibit racial bias (Sjoding et al., 2020).
Covid-19 prediction models with flaws (Thomp-
son, 2020), as well as electronic symptom checkers (Fraser et al., 2018), reveal problems with AI accuracy. Prejudice or discrimination against specific populations has also been discovered in AI used to assess kidney function (Simonite, 2020). Even Google's lack of openness about its AI for breast cancer prediction has been questioned by experts (Wiggers, 2020b).
We conducted a Systematic Literature Review
(SLR) in accordance with the guidelines proposed by
Kitchenham and Charters (Kitchenham and Charters,
2007). The SLR assesses and synthesizes the state-
of-the-art concerning healthcare bias in Artificial In-
telligence (AI) systems. The focus is on existing lit-
erature regarding various aspects related to healthcare
bias, specifically concerning algorithm bias.
The contributions of this paper are the following:
• Conducting an SLR regarding healthcare bias in AI systems that includes a set of 97 articles from six database publication sources;
• Providing answers to five research questions regarding algorithm bias in AI systems, considering aspects such as algorithmic bias categories, sources of bias in healthcare algorithms, risks generated by the lack of algorithmic fairness, bias mitigation strategies, and metrics for identifying fairness in healthcare algorithms;
• Discussing gaps, challenges, and open issues, together with proposed opportunities and recommendations.
The remainder of this paper is structured as fol-
lows: Section 2 covers related work, Section 3 out-
lines the methodology, Section 4 details the review
process, Section 5 presents the results, Section 6 dis-
cusses the findings, Section 7 addresses threats to va-
lidity, and Section 8 concludes the paper.
2 RELATED WORK
Yfantidou et al. (Yfantidou et al., 2023) present a
comprehensive study on bias in personal informatics,
which includes the identification of bias types and the
proposal of guidelines for mitigating discrimination.
However, their research is specifically centered on
personal informatics, focusing exclusively on systems
implemented in devices (watches, wearables) rather
than addressing bias within the healthcare domain.
While existing surveys and reviews (Mienye et al.,
2024), (Kumar et al., 2024) define bias types and
sources, propose frameworks, or offer critical per-
spectives, our SLR takes a complementary approach.
Instead of focusing on fixed statements about bias
types and sources, we aim to consolidate diverse per-
spectives from the literature, drawing parallels and
highlighting variations in how bias is understood. Our
goal is not to establish conclusions, but to present a
comprehensive overview of existing knowledge, serv-
ing as a resource for researchers in AI healthcare.
There are qualitative surveys and systematic re-
views on AI ethics in healthcare (Singh et al., 2023),
(Williamson and Prybutok, 2024), however, our ap-
proach focuses strictly on algorithmic fairness.
3 METHODOLOGY
This section contains the details of the performed re-
search, research questions, and protocol definition.
3.1 Review Need Identification
The literature on AI bias in healthcare is divided be-
tween technical studies on algorithm improvement
and social science research on ethical and cultural im-
plications; however, a combined approach is needed.
Addressing both technical and societal biases in AI
will lead to more effective and equitable healthcare
solutions (Belenguer, 2022).
3.2 Research Questions Definition
We outline the research questions in our investigation:
RQ1: Which are the main algorithmic bias cate-
gories in healthcare?
RQ2: What are the most common sources of bias
in healthcare algorithms?
RQ3: What risks are generated by the lack of
algorithmic fairness in healthcare and who is most
likely to be affected?
RQ4: Which bias mitigation strategies are pro-
posed to reduce algorithmic disparities in AI for
healthcare?
RQ5: How is bias assessed and which metrics
have been proposed for identifying fairness in health-
care algorithms?
3.3 Protocol Definition
The steps of the conducted SLR protocol are the fol-
lowing: the search and selection process with the in-
cluded and excluded criteria; the data extraction strat-
egy; the synthesis of the extracted data with the re-
sponses to the research questions; and the discussion
of the results, identified gaps and open issues, oppor-
tunities and recommendations. The next sections pro-
vide detailed descriptions of each step of the protocol.
4 CONDUCTING THE SLR
The SLR related activities are provided next, includ-
ing the selection process that contains the database
search and the specification of the selection criteria,
followed by the data extraction.
4.1 Search and Selection Process
For our literature review, we followed the PRISMA
2020 statement (Page et al., 2021), which ensures a
transparent, unbiased, and reproducible process. Fig-
ure 1 shows the filtering stages, resulting in the selec-
tion of 97 papers from the databases.
Figure 1: Diagram of the Scoping Review Flow in accor-
dance with the PRISMA 2020 declaration (Haddaway et al.,
2022).
4.1.1 Database Search
To ensure the inclusion of the most pertinent re-
search on the topic (Kitchenham and Charters, 2007),
a manual search was conducted across six prominent
databases: ACM Digital Library, IEEE Xplore, Sci-
enceDirect, Scopus, Springer, and Web of Science
(WOS)¹. Three researchers were assigned databases for the October 2024 search using the query “AI algorithmic bias in healthcare”, applying filters for publication year or domain to manage excessive results and sorting by relevance.
¹ ACM: https://dl.acm.org, IEEE Xplore: https://ieeexplore.ieee.org, ScienceDirect: https://www.sciencedirect.com, Scopus: https://www.scopus.com, Springer: https://link.springer.com, WOS: https://www.webofscience.com
4.1.2 Merging and Removal of Duplicates and Impurities
Upon completing the search of each database, we utilized the Zotero software² to extract the BibTeX files containing the citations of all retrieved papers as the first filtering stage. These files were subsequently parsed for the automatic removal of duplicate entries.
² Zotero: https://www.zotero.org
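As an illustration of this merging and deduplication step (a minimal sketch, not the exact pipeline used in this review), the following Python snippet assumes the third-party bibtexparser package (v1 API) and hypothetical per-database BibTeX exports, and removes duplicates by normalized DOI or title:

```python
# Minimal sketch of the duplicate-removal step (not the authors' exact pipeline).
# Assumes the third-party `bibtexparser` (v1 API) and hypothetical input files,
# one BibTeX export per database.
import re
import bibtexparser
from bibtexparser.bibdatabase import BibDatabase

def key_for(entry: dict) -> str:
    """Build a normalized key from the DOI (preferred) or the title for duplicate detection."""
    doi = entry.get("doi", "").strip().lower()
    if doi:
        return doi
    title = entry.get("title", "").lower()
    return re.sub(r"[^a-z0-9]", "", title)  # ignore punctuation/spacing differences

merged, seen = [], set()
for path in ["acm.bib", "ieee.bib", "sciencedirect.bib",
             "scopus.bib", "springer.bib", "wos.bib"]:  # hypothetical file names
    with open(path, encoding="utf-8") as handle:
        for entry in bibtexparser.load(handle).entries:
            k = key_for(entry)
            if k and k not in seen:   # keep only the first occurrence of each work
                seen.add(k)
                merged.append(entry)

out = BibDatabase()
out.entries = merged
with open("merged_deduplicated.bib", "w", encoding="utf-8") as handle:
    bibtexparser.dump(out, handle)
```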
4.1.3 Application of the Selection Criteria
The main objective was to identify a selection of ap-
proximately 100 papers that provided the most sub-
stantial insights into the topic of AI algorithmic bias
in healthcare. These papers were assessed on whether
they include answers to the research questions and on
inclusion and exclusion criteria (Meline, 2006).
The inclusion and exclusion criteria guided the evaluation of each article to determine its applicability to the research topic.
4.2 Data Extraction
The review protocol produced the initial findings (Ta-
ble 1). After retrieving articles, removing duplicates,
and excluding papers based on relevance and lack of
access, 235 articles were re-evaluated for eligibility in
a detailed third review and 97 papers were identified
as relevant to AI algorithmic bias in healthcare.
Table 1: Table showing the search and review statistics across various sources.

| Source              | DB search | Sought for retrieval | Assessed for eligibility | Selected |
| ACM Digital Library | 120       | 91                   | 15                       | 10       |
| IEEE                | 73        | 93                   | 93                       | 14       |
| ScienceDirect       | 100       | 13                   | 13                       | 8        |
| Springer            | 333       | 212                  | 90                       | 42       |
| Scopus              | 159       | 25                   | 11                       | 9        |
| WOS                 | 125       | 19                   | 13                       | 14       |
| Total               | 910       | 453                  | 235                      | 97       |
To filter out papers on unrelated topics, we man-
ually assessed a set of keywords before identifying
the main ideas, using keyword-based initial screen-
ing (Kitchenham and Charters, 2007). Our selection
of terms included AI, ML, healthcare, bias, fairness,
correctness, metrics, discrimination, clinic or medi-
cal. We recorded occurrences for each term found in
a document, and documented the main ideas of the
papers that scored positively on most terms.
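To make the keyword-based screening step concrete, the short Python sketch below counts term occurrences and flags papers that score positively on most terms; the term list follows the paper, while the threshold and prefix matching are illustrative assumptions rather than the exact procedure used:

```python
# Illustrative keyword-based screening (a sketch, not the exact SLR procedure).
# The term list follows the paper; the threshold and input format are assumptions.
import re

TERMS = ["ai", "ml", "healthcare", "bias", "fairness",
         "correctness", "metrics", "discrimination", "clinic", "medical"]

def term_occurrences(text: str) -> dict:
    """Count case-insensitive occurrences of each screening term.
    Prefix matching also catches variants such as 'clinical' or 'biases'."""
    lowered = text.lower()
    return {t: len(re.findall(rf"\b{re.escape(t)}\w*", lowered)) for t in TERMS}

def passes_screening(text: str, min_terms_present: int = 6) -> bool:
    """A paper 'scores positively' if most terms appear at least once (threshold is an assumption)."""
    counts = term_occurrences(text)
    return sum(1 for c in counts.values() if c > 0) >= min_terms_present

sample = "This study measures fairness metrics for a clinical ML model and discusses bias."
print(term_occurrences(sample))
print("Screened in:", passes_screening(sample))
```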
Figure 2 shows the final selection of papers, with
a six-fold increase in publications from 2022 to 2024.
Figure 2: Final database selection: distribution by year of
publication.
A replication package containing the list of all the selected articles from this study is available (Author(s), 2025). In the remainder of this paper, we refer to the selected papers as S1 to S97.
5 RESULTS
The following section presents the findings for the
key RQs, addressing the main categories of algorith-
mic bias in healthcare (RQ1), their sources (RQ2), the
risks stemming from unfair algorithms (RQ3), mitiga-
tion strategies to reduce disparities (RQ4), and met-
rics to assess fairness (RQ5). We provide examples
from the selected literature for all answers.
5.1 RQ1: Main Algorithmic Bias
Categories in Healthcare
Seven distinct types of algorithmic bias were identi-
fied in medical settings (S6): historical bias, which
mirrors existing societal prejudices against specific
groups (S15, S16, S79); representation bias, arising
from sampling methods that under-represent certain
population segments (S57, S64, S66, S74, S75); mea-
surement bias, resulting from poorly fitted ML mod-
els (S78); aggregation bias, occurring when univer-
sal models fail to account for variations among sub-
groups; learning bias, where modeling decisions ex-
acerbate disparities in performance; evaluation bias,
which emerges when benchmark datasets do not ac-
curately reflect the intended target population; and
deployment biases (S25) when a model is applied in
ways that diverge from its intended purpose. Many
researchers agree on similar types of bias (S27, S36,
S48, S49, S83) but use varying terms: ‘representation bias’ is also called ‘sample bias’ or ‘selection bias’; ‘aggregation bias’ overlaps with ‘linking bias’; and ‘deployment bias’ is sometimes termed ‘feedback bias’ (S23).
Investigating the selected publications, we recorded whether each article merely mentions algorithmic bias or investigates it in depth. The biases mentioned most often are historical bias and representation/sampling/selection bias, and the same holds for the in-depth investigations, as seen in Figure 3. When grouping the in-depth investigations by publication database, Springer and ACM account for the largest number of such papers.
Figure 3: In-depth investigations of algorithmic biases across publication databases.
Other studies (S46, S82) categorize bias by its origin: genetic variations, intra-observer labeling variability, and data acquisition processes. When classified by cause, similar issues are described as prevalence-, presentation-, and annotation-sourced disparities (S70).
Keeling (S93) outlines how typical clinical ML models are narrow and task-specific, and how their biases frequently stem from the under-representation of certain demographics in the training data. Generalist models, by contrast, are wide-ranging but incorporate more intricate biases such as stereotype associations. Ustymenko and Phadke (S3) introduce guidelines intended to inform a future framework addressing bias in LLMs for healthcare.
5.2 RQ2: Sources of Bias in Healthcare
Algorithms
Several sources of bias are identified in the inves-
tigated approaches; Figure 4 depicts the main ones,
from label bias and target specification bias to cogni-
tive biases in decision-making and domain shifts dur-
ing clinical use. Specific biases are related to data:
data acquisition and health disparities. For each of
these biases, several solutions were proposed, from
multitask learning (S67, S76) and feature-swapping
augmentation to algorithmic fairness audit frame-
works (S80). In the next paragraphs, we provide de-
tails about the sources of biases and some solutions.
Figure 4: Sources of bias in healthcare algorithms.
The types and sources of bias are highly related.
Sources of bias in Clinical Decision Support Systems (CDSS) were identified as cognitive biases in decision-making, domain transitions during clinical usage, health inequities, and the data-gathering process (S33).
According to Mhasawade et al. (S2), label bias in
healthcare algorithms occurs when proxy labels, used
instead of actual labels, vary in their connection to
true health status across subgroups. Another simi-
lar approach employs multitask learning for the de-
tection of shortcut learning (S76). Similarly, QP-Net
uses multitask learning alongside a domain adaptation
module (S67), aligning feature distributions across
subgroups to improve less-frequent subgroup fitting.
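To illustrate the label bias idea with a concrete (entirely synthetic) example, the sketch below simulates a hypothetical proxy label, recorded utilization, whose link to true health need is weaker for one subgroup; the groups, numbers, and thresholds are invented for illustration and are not taken from the reviewed studies:

```python
# Hypothetical illustration of label bias: a proxy label (utilization) tracks true
# need more weakly in group B than in group A, so labels built from the proxy
# under-identify high-need patients in group B. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice(["A", "B"], size=n)          # hypothetical subgroups
true_need = rng.normal(0.0, 1.0, size=n)        # unobserved true health need

# Proxy label: utilization reflects need, but group B faces access barriers,
# so the same need translates into less recorded utilization (an assumption).
access = np.where(group == "A", 1.0, 0.5)
utilization = access * true_need + rng.normal(0.0, 0.3, size=n)

# Flag patients as "high need" using the proxy versus the true signal.
proxy_label = utilization > np.quantile(utilization, 0.8)
true_label = true_need > np.quantile(true_need, 0.8)

for g in ("A", "B"):
    mask = group == g
    recall = (proxy_label & true_label & mask).sum() / (true_label & mask).sum()
    print(f"Group {g}: share of truly high-need patients flagged by the proxy = {recall:.2f}")
```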
Questioning the bias in benchmarks used to evalu-
ate model accuracy is crucial. The authors (S1) argue
that label-matching accuracy may bias foreign data
and should be avoided in high-stakes medical con-
texts.
The debiased Survival Prediction solution pro-
posed by Zhong et al. (S18) reduces the effect of
biases on model performance by isolating the direct
source of bias. By detaching identification informa-
tion, researchers have proven that disentanglement
learning produces more equitable survival estimates
free from population biases (S18). This approach was
also empirically assessed in combination with feder-
ated learning for healthcare equity (S46, S50, S69).
5.3 RQ3: Risks Generated by the Lack of Algorithmic Fairness in Healthcare
The lack of algorithmic fairness generates several risks, from inaccurate diagnoses and harmful stereotypes to the erosion of trust in healthcare systems. Figure 5 identifies several sources of risk: inequities, demographic diversity, gender bias, and lack of confidence in the healthcare system.
Figure 5: Risks generated by the lack of algorithmic fair-
ness in healthcare.
Children and young people (CYP) under 18 are
particularly underrepresented in research (S42). A
study on Covid-19 diagnosis data (S32) revealed sig-
nificant age disparities, with only 10% of data rep-
resenting individuals under 20, while 50% pertained
to those over 70. Another study on diabetic read-
missions (S12) found that Naive Bayes performed
well for patients under 40, but scored weak non-
discrimination metrics for younger populations. The
ACCEPT-AI framework highlights key ethical con-
siderations for using pediatric data in AI/ML research
(S72).
Healthcare AI systems can inadvertently sustain
xenophobic biases, worsening health inequalities for
migrants and ethnic minorities (S85). For example,
using proxies for sensitive traits can intensify ex-
clusion and link foreignness with disease (S76). A
case study (S38) on a skin color detection algorithm
showed a 16% error in detecting skin tone and a 4%
error in recognizing white faces.
Literature on AI fairness in healthcare highlights
how real-world gender biases are reinforced (S60)
(S42), (S9). Women have been historically underrep-
resented in health data, limiting AI’s impact on their
healthcare (S43). A skin color detection case study (S38) found a 6% higher error in recognizing women than men, while cardiovascular AI research (S23) shows
women receive less care.
5.4 RQ4: Mitigation Strategies to
Reduce Algorithmic Disparities in
AI for Healthcare
The most frequently proposed mitigation solutions
and recommendations from the selected papers are
displayed in Figure 6.
Figure 6: Distribution of mitigation strategies.
Ethical principles. Distinct approaches to
fair AI include borrowing ethical principles from
fields where they are successfully applied. Amu-
gongo et al. (S10) suggest incorporating Ubuntu
ethics principles. Belenguer (S83) advocates apply-
ing ethics systems from the pharmaceutical industry,
while Younas and Zeng (S95) propose using Central
Asian ethics. Creating a perfect plan to address all
ethical issues is unlikely, and debates continue on bal-
ancing algorithm transparency with data protection,
as well as on certifying developers who meet ethical
standards (S53, S55).
Frameworks. The most frequently encountered bias mitigation solution takes the form of a framework (S14, S24, S26, S47, S48, S46, S38, S4, S21, S52, S54, S49, S72, S81, S91). Iabkoff et al. (S52) introduced one of the most comprehensive frameworks.
Transparency. Explainable AI (XAI) refers to AI
systems that aim to provide human-understandable
justifications for their decisions and predictions, en-
hancing transparency (S35, S20, S46, S80, S96, S28).
Procedural justice emphasizes the importance of the
decision-making plan (S35), while Gerdes (S96) and
Kumar (S41) argue that AI models must be explain-
able in healthcare for better understanding. Legal
constraints may enforce transparency (S84), and open
data for XAI enhances accessibility and clarity, aid-
ing informed decisions (S20). Yousefi (S8) suggests
sharing data for public interest. The TWIX tech-
nique (S73) mitigates bias in surgeon skill assess-
ments. Ziosi et al. (S80) evaluate XAI methods based
on their ability to address fairness and transparency
(S30).
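As a hedged illustration of how explanation tooling can support such transparency checks (not a method taken from the reviewed papers), the sketch below trains a classifier on synthetic data and uses scikit-learn's permutation importance to reveal how strongly each feature, including a hypothetical sensitive attribute, drives predictions:

```python
# Sketch: permutation importance as a simple transparency check on a synthetic
# dataset; the feature names and data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
age = rng.normal(55, 12, n)
biomarker = rng.normal(1.0, 0.4, n)
sex = rng.integers(0, 2, n)                      # hypothetical sensitive attribute
risk = 0.03 * age + 1.5 * biomarker + 0.4 * sex  # outcome partly driven by `sex`
y = (risk + rng.normal(0, 0.5, n) > risk.mean()).astype(int)

X = np.column_stack([age, biomarker, sex])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(["age", "biomarker", "sex"], result.importances_mean):
    print(f"{name:9s} importance: {score:.3f}")  # a large `sex` score flags reliance on it
```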
Data handling. Although it is common practice to generate synthetic datasets (S40, S58, S77, S32) for drawing parallels between situations with balanced and unbalanced data (S29), or to generate datasets that are as diverse as possible, other fairness approaches rely on real data alone or on a middle solution that modifies existing data (S51, S65, S77). A proposal is to exclude demographic data (S22): models achieve “locally optimal” fairness but struggle in out-of-distribution (OOD) settings, suggesting that less demographic encoding promotes “globally optimal” fairness (S56).
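As one concrete data-handling illustration (a sketch of Kamiran and Calders-style reweighing, not a technique attributed to a specific surveyed paper), the snippet below computes sample weights that make a hypothetical protected attribute statistically independent of the outcome label in the training data:

```python
# Reweighing sketch (Kamiran & Calders-style pre-processing), offered only as an
# illustration of data-handling mitigation; column names and data are hypothetical.
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    """Weight each row by P(group)*P(label) / P(group, label) so that the
    protected attribute and the label are independent under the weighted data."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / n
    expected = df.apply(lambda r: p_group[r[group_col]] * p_label[r[label_col]], axis=1)
    observed = df.apply(lambda r: p_joint[(r[group_col], r[label_col])], axis=1)
    return expected / observed

# Hypothetical toy data: an under-served group with fewer positive labels.
toy = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 4,
    "label": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0],
})
toy["weight"] = reweighing_weights(toy, "group", "label")
print(toy.groupby(["group", "label"])["weight"].first())
# The weights can then be passed to most scikit-learn estimators via sample_weight.
```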
Human-in-the-Loop (HITL). The HITL concept is essential for examining the role of human expertise and the interaction between algorithms and people. A human-guided approach ensures that technological developments respond to real clinical requests (S33, S34, S97). Iniesta (S71) introduces a framework that emphasizes the HITL approach through principles such as accountability and patient education.
5.5 RQ5: Bias Assessment and Metrics
to Identify Fairness in Healthcare
Algorithms
Audit frameworks (S59, S17, S77, S83, S94, S71) aim to make the development stage bias-aware (S26) and to acknowledge disparities from data collection to decision design (S46), using bias identification solutions (S38) to assess datasets for racial, age, or gender bias. Moghadasi et al. (S21) argue that identifying bias sources is key to mitigation. To guarantee a more thorough, customized, and impartial perspective, users should additionally employ a variety of bias measures (S13, S31, S19, S29).
Mienye et al. (S19) reviewed fairness metrics
(S62) and categorized them into six types: group fair-
ness, individual fairness, equality of opportunity, de-
mographic parity, equalized odds, and counterfac-
tual fairness, each with its own mathematical formula
(S61, S69, S32, S29, S56, S61).
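As a hedged illustration of two of these group-level metrics, the sketch below computes the demographic parity difference and the equalized odds gaps from toy predictions and a binary sensitive attribute; the definitions follow the standard formulations, and the arrays are invented examples rather than data from the reviewed studies. Libraries such as Fairlearn and AIF360 provide these metrics out of the box; the manual version only makes the definitions explicit.

```python
# Toy computation of demographic parity difference and equalized odds gaps.
# Standard definitions; the data below are invented for illustration only.
import numpy as np

def demographic_parity_difference(y_pred, group):
    """|P(y_hat=1 | group=0) - P(y_hat=1 | group=1)|."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def equalized_odds_gaps(y_true, y_pred, group):
    """Gaps in true-positive and false-positive rates between the two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tpr, fpr = [], []
    for g in (0, 1):
        m = group == g
        tpr.append(y_pred[m & (y_true == 1)].mean())
        fpr.append(y_pred[m & (y_true == 0)].mean())
    return abs(tpr[0] - tpr[1]), abs(fpr[0] - fpr[1])

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print("Demographic parity difference:", demographic_parity_difference(y_pred, group))
print("Equalized odds gaps (TPR, FPR):", equalized_odds_gaps(y_true, y_pred, group))
```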
A framework proposed for improved and ethical
patient outcomes (S17) implements fairness measures
to evaluate the impact of AI algorithms on different
demographic groups, using the disparate impact metric to assess fairness (S69, S6, S29, S44, S56).
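The disparate impact metric mentioned above is commonly computed as the ratio of favorable-outcome rates between an unprivileged and a privileged group, with values below roughly 0.8 often flagged under the four-fifths rule; a minimal sketch on invented data follows:

```python
# Disparate impact ratio on toy data: P(y_hat=1 | unprivileged) / P(y_hat=1 | privileged).
# Values below ~0.8 are commonly flagged (four-fifths rule). Data are illustrative.
import numpy as np

def disparate_impact(y_pred, group, unprivileged=1, privileged=0):
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_unpriv = y_pred[group == unprivileged].mean()
    rate_priv = y_pred[group == privileged].mean()
    return rate_unpriv / rate_priv

y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
group  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print("Disparate impact ratio:", disparate_impact(y_pred, group))  # 0.4 / 0.6 ≈ 0.67
```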
Bowtie analysis is used to assess risks and ensure
safety (S32). Similarly, Forward-Backward analysis
(S43) splits the process into forward (examining con-
sequences) and backward (tracing causes) steps.
6 FINDINGS AND DISCUSSIONS
This section outlines the findings of this investigation,
providing several perspectives regarding gaps, chal-
lenges, and open issues. Opportunities and recom-
mendations are detailed at the end of the section.
6.1 Discussions of Results
Although bias can also arise during data acquisition
by competent institutions or results interpretation by
clinicians, most frameworks (S14, S4, S26) and ap-
proaches (S20, S22, S37, S81, S83) assign the task
of working on bias mitigation to researchers and de-
velopers. Very few of them are addressed to medical
personnel (S48, S5). Some scholars and practitioners
posit that governance frameworks may serve as a vi-
able solution (S42, S90, S42). Combinations of legal
and audit frameworks are proposed as potential strate-
gies (S84), (S83).
Multiple studies consider fairness and even propose approaches for specific tasks or branches of medicine (S7), such as ICU readmission (S11), diabetes (S12, S92), lung ultrasound (S14), pulmonary embolism (S18), critical care (S33), liver allocation (S35, S87), cardiac sarcoidosis (S21), pain detection (S45), medical imaging (radiology, dermatology, ophthalmology) (S56, S70), virology (S88) with emphasis on Covid-19 (S61, S82), oncology (S63, S89), cardiac imaging (S23), orthopedics (S65), thyroid ultrasound (S67), antibiotic prescription (S68), pediatrics (S72), surgeon skill assessment (S73), healthcare time series (S77), neuroscience (S86), and point-of-care diagnostics (S39). Tandon et al. (S49) aim to help developers decide which AI-enabled strategies to use for certain designs in the medical field, based on context-specific criteria.
6.2 Gaps, Challenges, Open Issues
Several gaps have been identified based on the investigated papers:
• Underrepresentation of minority groups in medical datasets.
• Many AI models are “black boxes”, making it hard to identify and mitigate biases.
• Limited testing of AI systems on varied demographic and geographic populations.
• Insufficient collaboration between technologists, clinicians, and patients.
• AI systems can perpetuate and magnify existing systemic biases.
6.3 Opportunities and
Recommendations
Several opportunities to address healthcare bias in AI are identified and provided next: enhanced data diversity by incorporating diverse datasets, advances in personalized medicine through customizable treatments, collaborative multidisciplinary research involving multiple experts, development of fairness metrics tailored to healthcare applications, and policy and regulation innovation.
The following recommendations are provided to address healthcare bias in AI: conduct regular bias audits, diversify training datasets, engage stakeholders, implement eXplainable AI (XAI), and adopt fairness-centric regulations.
7 THREATS TO VALIDITY
Despite being thorough, this literature study has a limitation that should be noted: the rapid pace of AI development, which may have produced new findings after the studies under consideration were published. Publication bias may have skewed the findings,
as studies with positive results are more likely to be
published, while null or negative findings often go
unreported. To mitigate this, we aimed to include a
diverse range of studies and explicitly sought those
discussing limitations or failures.
8 CONCLUSIONS
Numerous ethical, political, and economic factors have influenced the application of AI in healthcare and have improved the technology's fairness in this area. Taking these aspects into consideration has provided a suitable degree of control over AI's detrimental effects on the healthcare industry. Still, many of the statements made in this study encourage academics to identify additional useful implications for practitioners and policymakers regarding the reliability of AI applications in healthcare, which will enrich the literature in the future.
ACKNOWLEDGEMENTS
The publication of this article was partially supported
by the 2024 Development Fund of the Babes-Bolyai
University.
REFERENCES
Author(s), A. (accessed January 2025). Healthcare bias in AI: A systematic literature review. https://figshare.com/s/cc41c628a51a442181fd.
Belenguer, L. (2022). Ai bias: exploring discriminatory
algorithmic decision-making models and the appli-
cation of possible machine-centric solutions adapted
from the pharmaceutical industry. AI and Ethics,
2(4):771–787.
Bouderhem, R. (2024). Shaping the future of ai in health-
care through ethics and governance. Humanities and
Social Sciences Communications, 11(1):1–12.
Burema, D., Debowski-Weimann, N., von Janowski,
A., Grabowski, J., Maftei, M., Jacobs, M., Van
Der Smagt, P., and Benbouzid, D. (2023). A sector-
based approach to ai ethics: Understanding ethical is-
sues of ai-related incidents within their sectoral con-
text. In Proceedings of the 2023 AAAI/ACM Confer-
ence on AI, Ethics, and Society, pages 705–714.
Clark, P., Kim, J., and Aphinyanaphongs, Y. (2023). Mar-
keting and us food and drug administration clear-
ance of artificial intelligence and machine learning en-
abled software in and as medical devices: a system-
atic review. JAMA Network Open, 6(7):e2321792–
e2321792.
Fraser, H., Coiera, E., and Wong, D. (2018). Safety of
patient-facing digital symptom checkers. The Lancet,
392(10161):2263–2264.
Guidance, W. (2021). Ethics and governance of artificial
intelligence for health. World Health Organization.
Haddaway, N. R., Page, M. J., Pritchard, C. C., and
McGuinness, L. A. (2022). Prisma2020: An r package
and shiny app for producing prisma 2020-compliant
flow diagrams, with interactivity for optimised digital
transparency and open synthesis. Campbell systematic
reviews, 18(2):e1230.
Jain, A., Brooks, J. R., Alford, C. C., Chang, C. S., Mueller,
N. M., Umscheid, C. A., and Bierman, A. S. (2023).
Awareness of racial and ethnic bias and potential so-
lutions to address bias with use of health care algo-
rithms. In JAMA Health Forum, volume 4, pages
e231197–e231197. American Medical Association.
Kamulegeya, L., Bwanika, J., Okello, M., Rusoke, D., Nassiwa, F., Lubega, W., Musinguzi, D., and Börve, A. (2023). Using artificial intelligence on dermatology conditions in Uganda: A case for diversity in training data sets for machine learning. African Health Sciences, 23(2):753–63.
Kitchenham, B. and Charters, S. (2007). Guidelines for per-
forming systematic literature reviews in software en-
gineering. 2.
Kumar, A., Aelgani, V., Vohra, R., Gupta, S. K., Bhagawati,
M., Paul, S., Saba, L., Suri, N., Khanna, N. N., Laird,
J. R., et al. (2024). Artificial intelligence bias in med-
ical system designs: A systematic review. Multimedia
Tools and Applications, 83(6):18005–18057.
Lecher, C. (2020). Can a robot decide my medical treat-
ment?
Ledford, H. (2019). Millions affected by racial bias in
health-care algorithm. Nature, 574(31):2.
Meline, T. (2006). Selecting studies for systemic re-
view: Inclusion and exclusion criteria. Contempo-
rary issues in communication science and disorders,
33(Spring):21–27.
Mienye, I. D., Obaido, G., Emmanuel, I. D., and Ajani,
A. A. (2024). A survey of bias and fairness in health-
care ai. In 2024 IEEE 12th International Confer-
ence on Healthcare Informatics (ICHI), pages 642–
650. IEEE.
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I.,
Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tet-
zlaff, J. M., Akl, E. A., Brennan, S. E., et al. (2021).
The prisma 2020 statement: an updated guideline for
reporting systematic reviews. bmj, 372.
Panch, T., Mattie, H., and Atun, R. (2019). Artificial intel-
ligence and algorithmic bias: implications for health
systems. Journal of global health, 9(2).
Simonite, T. (2020). How an algorithm blocked kidney
transplants to black patients— wired.
Singh, B., Rahim, M. A. B. U., Hussain, S., Rizwan, M. A.,
and Zhao, J. (2023). Ai ethics in healthcare-a sur-
vey. In 2023 IEEE 23rd International Conference on
Software Quality, Reliability, and Security Compan-
ion (QRS-C), pages 826–833. IEEE.
Sjoding, M. W., Dickson, R. P., Iwashyna, T. J., Gay, S. E.,
and Valley, T. S. (2020). Racial bias in pulse oxime-
try measurement. New England Journal of Medicine,
383(25):2477–2478.
Thompson, A. (2020). Coronavirus: Models predicting patient outcomes may be “flawed” and “based on weak evidence”. Yahoo! Retrieved from https://sg.style.yahoo.com/style/coronavirus-covid19-models-patientoutcomes-flawed-153227342.html.
van Assen, M., Beecy, A., Gershon, G., Newsome, J.,
Trivedi, H., and Gichoya, J. (2024). Implications
of bias in artificial intelligence: considerations for
cardiovascular imaging. Current Atherosclerosis Re-
ports, 26(4):91–102.
Wiggers, K. (2020a). Covid-19 vaccine distribution algo-
rithms may cement health care inequalities.
Wiggers, K. (2020b). Google’s breast cancer-predicting ai
research is useless without transparency, critics say.
Williamson, S. M. and Prybutok, V. (2024). Balanc-
ing privacy and progress: a review of privacy chal-
lenges, systemic oversight, and patient perceptions in
ai-driven healthcare. Applied Sciences, 14(2):675.
Yfantidou, S., Sermpezis, P., Vakali, A., and Baeza-Yates,
R. (2023). Uncovering bias in personal informatics.
Proceedings of the ACM on Interactive, Mobile, Wear-
able and Ubiquitous Technologies, 7(3):1–30.