Fair and Equitable Machine Learning Algorithms in Healthcare: A
Systematic Mapping
Marcelo S. Mattos (https://orcid.org/0009-0006-9830-6391), Sean W. M. Siqueira (https://orcid.org/0000-0002-0864-2396) and Ana Cristina B. Garcia (https://orcid.org/0000-0002-3797-5157)
Graduate Program of Informatics - PPGI, Federal University of the State of Rio de Janeiro, Rio de Janeiro, Brazil
Keywords:
Fairness, Equity, Bias, Machine Learning, Healthcare.
Abstract:
Artificial intelligence (AI) is being employed in many fields, including healthcare. While AI has the potential
to improve people’s lives, it also raises ethical questions about fairness and bias. This article reviews the
challenges and proposed solutions for promoting fairness in medical decisions aided by AI algorithms. A
systematic mapping study was conducted, analyzing 37 articles on fairness in machine learning in healthcare
from five sources: ACM Digital Library, IEEE Xplore, PubMed, ScienceDirect, and Scopus. The analysis
reveals a growing interest in the field, with many recent publications. The study offers an up-to-date and
comprehensive overview of approaches and limitations for evaluating and mitigating biases, unfairness, and
discrimination in healthcare-focused machine learning algorithms. This study’s findings provide valuable
insights for developing fairer, equitable, and more ethical AI systems for healthcare.
1 INTRODUCTION
Ensuring fairness in healthcare is crucial for equi-
table access and quality care. Artificial Intelligence
(AI) promises advancements in healthcare decision-
making, but raises critical ethical concerns around
fairness and bias.
Unfair algorithms can lead to disparities in access,
quality, and health outcomes. For example, Obermeyer et al. (2019) found that a widely used population health management algorithm was biased against Black patients because it relied on healthcare costs, which reflect historical disparities in access to care.
AI ethics addresses the ethical implications of
developing and deploying AI systems, drawing on
fields like engineering ethics, philosophy of technol-
ogy, and science and technology studies (Kazim and
Koshiyama, 2021). Fairness is a key ethical principle,
meaning AI systems should treat everyone equally,
regardless of personal characteristics (Ashok et al.,
2022). However, achieving fairness in AI can be chal-
lenging, as AI systems are often trained on data that
reflects historical biases.
This systematic mapping of the literature delves
into the intricate landscape of AI fairness in the con-
text of medical decision-making. By conducting a
comprehensive analysis, we aim to shed light on the
challenges posed by algorithmic biases and the pro-
posed solutions that can pave the way for a more eq-
uitable healthcare system. We analyzed 37 scholarly
articles sourced from reputable databases, including
the ACM Digital Library, IEEE Xplore, PubMed, Sci-
enceDirect, and Scopus.
Our analysis reveals a growing interest in AI fair-
ness in healthcare, with a recent surge in publications.
This study offers a comprehensive overview of ap-
proaches and limitations for assessing and mitigating
biases, unfairness, and discrimination in healthcare
machine learning algorithms.
Research questions (RQs):
RQ1: What are the main statistical, technical, and
ethical approaches used to assess and mitigate biases,
unfairness, inequalities, and discriminations in ma-
chine learning algorithms applied to healthcare?
RQ2: What are the technical, ethical, and so-
cial limitations and challenges in the design, develop-
ment, and implementation of fair and equitable ma-
chine learning algorithms in healthcare?
RQ3: What are the research gaps pointed out in
the articles on fairness and equity in machine learning
in healthcare?
This systematic mapping literature review aims to
contribute to the study of AI ethics, specifically focus-
ing on fairness in machine learning within the health-
care field.
2 RELATED WORKS
Published literature reviews have provided insights into the state of the art on fairness in the field of ethical AI. Pessach and Shmueli (2022) con-
ducted a review on fairness in machine learning al-
gorithms, emphasizing the importance of developing
accurate, objective, and fair machine learning (ML)
algorithms. They discussed causes of algorithmic
bias, definitions, and measures of fairness, and mech-
anisms for enhancing fairness.
Garcia et al. (2023) conducted a systematic review
on algorithmic discrimination in the credit domain,
making important contributions by covering funda-
mentals of discrimination theory, the legal frame-
work, concepts of algorithmic fairness and fairness
metrics applied in machine learning.
Other reviews have focused on fairness in ethical
AI in the medical field. Bear Don’t Walk et al. (2022)
conducted a scoping review on ethical considerations
in clinical natural language processing (NLP). The re-
view highlights ethical considerations in metric se-
lection, identification of sensitive patient attributes,
and best practices for reconciling individual auton-
omy and leveraging patient data. Morley et al. (2020)
mapped the ethical issues surrounding the incorpo-
ration of AI technologies in healthcare delivery and
public health systems.
Our systematic mapping study differs from these published works in that it searches the literature for studies investigating approaches, limitations, and methods to evaluate and mitigate biases, injustices, inequalities, and discriminations in machine learning algorithms applied in the healthcare field.
3 METHODOLOGY
We followed the guidelines of Kitchenham and Char-
ters (2007) for conducting systematic mappings.
The systematic mapping process was conducted
using three software tools: 1. Parsifal: Assisted
in creating the search query, applying inclusion and
exclusion criteria, and retrieving relevant papers; 2.
Zotero: Used for organizing papers in a taxonomy,
creating classification terms, and conducting textual
analysis; 3. Iramuteq: Employed to visualize the re-
sults of the mapping process.
The design of the search string is the basis of any systematic study, as it is what guarantees reproducibility.
The online tool Parsifal was used to identify pri-
mary studies. We entered the PICOC (population, in-
tervention, comparison, outcomes, and context) terms
and research questions into Parsifal, which automati-
cally generated keywords and a search string. We then
made the necessary adjustments.
The terms entered in Parsifal were: Population:
healthcare field; Intervention: machine learning,
deep learning, neural networks; Outcomes: evalua-
tion of the fairness of decisions made from the use
of machine learning algorithms applied in the health-
care field; Context: use of machine learning tech-
niques in the healthcare field with the aim of improv-
ing decision-making processes and, at the same time,
guaranteeing justice and equity in the results.
Based on the PICOC definition, the search string
was: healthcare AND (“machine learning” OR “deep
learning” OR “neural networks”) AND (fairness OR
bias OR discrimination OR equity OR justice).
We applied this search string to the ACM Digital
Library, IEEE Xplore, PubMed, ScienceDirect, and
Scopus databases, searching titles and abstracts with-
out a specific time period. The searches of ACM DL,
IEEE Xplore, PubMed, and ScienceDirect were con-
ducted on April 13, 2023, and the Scopus search was
conducted on May 9, 2023.
We retrieved a total of 1,013 records from the fol-
lowing databases: 59 from ACM DL, 70 from IEEE
Xplore, 227 from PubMed, 123 from ScienceDirect,
and 484 from Scopus (Figure 1).
To ensure that all duplicate records were identi-
fied, we used Zotero software in conjunction with Par-
sifal to check for duplicates: 349 duplicate records
were removed (Figure 1).
Figure 1: Prism diagram detailing the process of identifica-
tion, screening, and inclusion of studies.
We applied the inclusion and exclusion criteria
(Table 1). Initially, we assessed the title and ab-
stract of the articles, leading to the elimination of 558
records out of the initial 664 considered, resulting
in 106 records eligible for further assessment. Sub-
sequently, we thoroughly read the full texts and re-
Table 1: Inclusion and exclusion criteria.
Inclusion criteria
Criterion Description
IC1 Articles must describe the application of machine
learning techniques in healthcare and explore issues
of fairness and equity.
IC2 Articles must report on outcomes related to fairness or
equity in decision-making, with a focus on measures
of fairness in machine learning.
Exclusion criteria
Criterion Description
EC1 Articles that are not written in English.
EC2 Studies that are not related to healthcare or the appli-
cation of machine learning techniques in healthcare.
EC3 Studies that are preliminary reports, books, editorials,
abstracts, posters, panels, lectures, roundtables, work-
shops, tutorials, or demonstrations.
EC4 Studies that are systematic reviews, meta-analyses,
scoping studies, policy reports, guidelines, theoretical
analysis, or consensus statements.
EC5 Studies that do not report on outcomes related to fair-
ness or equity in decision-making.
applied the inclusion and exclusion criteria, which led
to the removal of an additional 69 records. As a result,
37 studies were included in the mapping (Figure 1).
4 AN OVERVIEW OF THE
ARTICLES
Our analysis of the 37 articles included in the system-
atic mapping revealed key themes and trends. In this
overview, we used a word cloud (Figure 2) and a sim-
ilarity graph (Figure 3).
The selected articles were published from 2019 to
2023. We observed 1, 6, 8, 17, and 5 publications in the years 2019, 2020, 2021, 2022, and 2023 (through May 2023), respectively. These data suggest growing interest in the research topic, with publications peaking in 2022.
The word cloud (Figure 2) was generated based
on article titles and abstracts to identify the most fre-
quent words in the texts. We excluded words present in our search string in order to identify other relevant terms.
Figure 2: Word cloud.
The word “model” emerges as the most prominent
word, highlighting its importance in the field of study.
“Data” and “method” also highlight the importance of
data and training methods in machine learning algo-
rithms. Other noteworthy words related to application
development and machine learning include dataset,
prediction, performance, system, application, algo-
rithm, training, classification, and accuracy. Words
like biases, group, racial, sex, subgroup, disparities,
and race are prominent in the context of fairness. Re-
garding healthcare, relevant words include clinical,
medical, and patient. The prominent words in Fig-
ure 2 align with the findings of the similarity graph
presented below.
Figure 3 illustrates a similarity analysis based on
the texts of article titles and abstracts performed us-
ing the Iramuteq software. The analysis highlights
the central term “model” and its connections to re-
lated concepts like healthcare, data, bias, fairness, and
machine learning.
Figure 3: Similarity analysis performed in Iramuteq.
The similarity analysis reveals several relevant in-
sights:
Healthcare Focus: Near the word “healthcare,” we
find terms like system, application, datasets, and AI
(artificial intelligence), highlighting the focus on soft-
ware development and the application of machine
learning in healthcare in the selected articles.
Combating Bias: Within the context of “fairness,”
studies focus on combating disparities that may arise
from biased machine learning algorithms used in
healthcare. Words associated with “fairness” include
improve, problem, research, resource, explainability,
measure, test, challenge, and work. Notably, “sen-
sitive” and “subgroup” appear between “model” and
“fairness,” indicating specific areas of concern.
Data and Methods: Connections to “datum” reveal
important words like medical, image, information,
synthetic, disparity, and investigate. These highlight
the need for research in fairness, data challenges, and
synthetic data generation approaches to reduce dispar-
ities and promote equity in medical AI development.
Next to the word “datum,” we have “method,” which
connects with reduction, equity, regression, and base,
further emphasizing these themes.
Machine Learning in this Context: Within the “ma-
chine learning” (ML) category, relevant words in-
clude structure, inequality, and recent. Additionally,
the “ml” branch (referring to machine learning) con-
nects with words like analysis, ethical, cardiac, pro-
cess, and regression based. These terms highlight cur-
rent research directions and ethical considerations in
this domain.
Mitigating Bias: Words related to mitigating algo-
rithmic biases and understanding their negative im-
pacts connect to the term “bias” through terms like
algorithm, mitigation, metric, introduce, effect, mech-
anism, mitigate, and study. This underscores the im-
portance of addressing bias issues in machine learning
models.
This analysis provided valuable insights into the
key topics and concerns surrounding fairness in ma-
chine learning for healthcare.
5 RESULTS
This study contributes by providing answers to the
questions raised in the Introduction section, which are
presented below:
RQ1: What are the main statistical, technical, and
ethical approaches used to assess and mitigate biases,
unfairness, inequalities, and discriminations in ma-
chine learning algorithms applied to healthcare?
For the purpose of organization, we classified the
approaches into three categories: statistical, technical,
and ethical.
1. Statistical approaches
There are several techniques to deal with imbal-
anced data, such as oversampling, undersampling,
and resampling. Among these techniques, we
found the following in the selected articles: 1.
Undersampling (Zhang et al., 2021); 2. Resam-
pling (Reeves et al., 2022); and 3. Stratified batch
sampling (Puyol-Antón et al., 2021).
The search string returned other articles describing the application of data sampling techniques to balance imbalanced data, but they were not included in this study because they did not address fairness issues.
2. Technical approaches
The following are the main technical approaches
described in the selected articles:
Adversarial training framework: Yang et al.
(2023).
A new machine learning algorithm, called
pseudo bias-balanced learning: Luo et al.
(2022).
Assessing the impact of Swarm Learning (SL) on fairness: Fan et al. (2021).
Differentially private (DP) prediction: Suriyakumar et al. (2021).
Explainable Artificial Intelligence (XAI) as a
contribution to fairness in machine learning in
healthcare: Rueda et al. (2022).
Connections between interpretability methods
and fairness: El Shawi et al. (2019), Sahoo et al.
(2022), and Meng et al. (2022).
A “de-bias” technique based on AIF360: Paviglianiti and Pasero (2020).
3. Ethical approaches
Table 2 below summarizes the main studies
and approaches described in the articles, aimed at
ensuring that machine learning models are devel-
oped and used fairly and ethically, reducing the
risk of bias related to factors such as gender, age,
race, and other sociodemographic factors.
RQ2: What are the technical, ethical, and social lim-
itations and challenges in the design, development,
and implementation of fair and equitable machine
learning algorithms in healthcare?
Complex interactions between clinical entities:
Predicting risk profiles accurately becomes diffi-
cult (Pham et al., 2023).
Lack of interpretability: Understanding underly-
ing mechanisms and model decisions is hindered
(Chang et al., 2022; Meng et al., 2022).
Access to healthcare data: Strict privacy laws pro-
tecting patient data in EHRs limit research re-
producibility and hinder new discoveries (Bhanot
et al., 2021).
Tailoring methods for healthcare applications:
Developing effective and specific models remains
a challenge, for example in medical image anal-
ysis (Stanley et al., 2022).
RQ3: What are the research gaps pointed out in the
articles on fairness and equity in machine learning in
healthcare?
Table 2: Main ethical approaches described in the articles.
Approach Description Article
A multi-view multi-task neural network
architecture
Researchers designed a multi-view multi-task neural network architecture
(MuViTaNet) and an equity variant (F-MuViTaNet) to accurately predict the
onset of multiple complications and efficiently interpret its predictions, miti-
gating unfairness across different patient groups.
(Pham et al., 2023)
A framework for Representational Ethical
Model Calibration
Framework developed to detect and quantify inequities in model performance
across subpopulations defined by multiple and interacting characteristics.
(Carruthers et al., 2022)
Analysis of the effects of sociodemo-
graphic confounding factors
Shows that unfair models can produce different outcomes between subgroups,
and that these outcomes can explain biased performance.
(Stanley et al., 2022)
Development of fairness metrics for syn-
thetic data
Two metrics were developed by the researchers: 1. a disparity metric for
synthetic data using the concept of disparate impact, and 2. a time-series
metric to assess disparate impact over time.
(Bhanot et al., 2021)
Experiments on race prediction from con-
founding factors
Suggest that race prediction models can be biased due to the presence of con-
founding factors.
(Duffy et al., 2022)
Fairness metric and testing for regression
ML systems
Proposing a fairness metric and a fairness testing algorithm for regression
machine learning systems.
(Perera et al., 2022)
Four-step analytical process for identify-
ing and mitigating biases
Provides a set of steps for identifying and mitigating biases in AI/ML algo-
rithms and solutions.
(Agarwal et al., 2023)
Machine learning (ML) and optimization
decoupling framework
Allows the ML and optimization components of the algorithm to be developed
independently, which can help to reduce bias.
(Shanklin et al., 2022)
Machine learning (ML) model to reduce
biases related to sex, age, and race
Uses a combination of techniques to reduce bias in machine learning models. (Perez Alday et al., 2022)
Metrics to measure the fairness of expla-
nation models or fidelity gaps between
subgroups
The researchers introduce two new metrics: Maximum Fidelity Gap from
Average, and Mean Fidelity Gap Amongst Subgroups.
(Balagopalan et al., 2022)
Quantitative evaluation of interpretability
methods
Uses metrics to evaluate bias in deep learning models. (Meng et al., 2022)
Study on predictive risks of minimal
racial bias mitigation
Demonstrates that minimal racial bias mitigation can lead to worse predictive
performance for minority groups.
(Barton et al., 2023)
A major gap is the need to address structural barri-
ers and individual interactions in the health context to
achieve health equity (Monlezun et al., 2022). Simply
optimizing AI/ML algorithms to remove bias is insuf-
ficient. It is crucial to understand the broader social
determinants of health and find subtler patterns that
advocate for patients, rather than relying solely on
group-level minority subgroup corrections (Li et al.,
2022).
Another identified research gap emphasizes the
importance of continually evaluating and auditing ML
models for racial bias in clinical decision-making,
even when explicit sensitive identifiers are removed
from clinical notes (Adam et al., 2022).
Finally, there is a need for further research and
testing of domain generalization methods in machine
learning in clinical settings, exploring their impact on
fairness and their performance in the presence
of bias (Zhang et al., 2021).
6 DISCUSSION
In this section, we discuss the results of our mapping.
6.1 Approaches for Machine Learning
Models to Mitigate the Imbalanced
Data Problem in Healthcare
This section discusses approaches for mitigating im-
balanced data in healthcare machine learning mod-
els, acknowledging that such data is inherently imbal-
anced, uncertain, and prone to missing values (Wang
et al., 2022). We specifically focus on three data sam-
pling techniques: undersampling, resampling, and
stratified batch sampling, discussed in detail below.
6.1.1 Undersampling Use
Undersampling balances datasets by removing major-
ity class samples. However, this risks losing valuable
information (Alani et al., 2020; Reeves et al., 2022).
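To make the general idea concrete, the following minimal Python sketch balances a binary-labeled dataset by randomly dropping majority-class rows; the column name, one-to-one balancing ratio, and use of pandas are illustrative assumptions, not the procedure of any reviewed study.

```python
import pandas as pd

def undersample_majority(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Randomly drop majority-class rows until classes have equal size.

    Illustrative sketch only: assumes a categorical label column; real studies
    may balance to other ratios or use library implementations such as
    imbalanced-learn's RandomUnderSampler.
    """
    counts = df[label_col].value_counts()
    minority_size = counts.min()
    balanced_parts = [
        group.sample(n=minority_size, random_state=seed)  # keep a minority-sized subset of each class
        for _, group in df.groupby(label_col)
    ]
    # concatenate and shuffle the balanced training set
    return pd.concat(balanced_parts).sample(frac=1.0, random_state=seed)
```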
Zhang et al. (2021) proposed a framework for stress-testing domain generalization methods in health-
care. They found that these methods can sometimes
outperform traditional approaches, but may also lead
to worse fairness and performance under certain con-
ditions. They observed that directly providing the
subsampled feature significantly reduces fairness and
performance for both domain generalization and em-
pirical risk minimization (ERM) algorithms. This
suggests an increased reliance on spurious correlations when the subsampled feature is directly provided, resulting in performance and fairness that are significantly worse under distribution shift.
6.1.2 Resampling for Fairness
Resampling methods can be classified into two main
categories: oversampling and undersampling, both
aiming to achieve a balanced class distribution (Alani
et al., 2020).
Reeves et al. (2022) used resampling techniques (Blind, Separate, Equity) to balance the racial distribution in the data sample. Detailing the approach:
Blind resampling: This is a baseline method that
randomly samples a subset of the majority class
(patients who do not die of suicide) to balance
the training set. It does not consider racial/ethnic
group membership.
Separate resampling: This method separates the
training data by racial/ethnic group and under-
samples the majority class in each group to bal-
ance the data. It trains disjoint models for each
racial/ethnic group.
Equity resampling: This method divides the training data by both racial/ethnic group and class label (see the sketch after this list).
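A minimal sketch of the equity-resampling idea follows, assuming a pandas DataFrame with hypothetical race/ethnicity and outcome columns; the exact balancing rule used by Reeves et al. (2022) may differ.

```python
import pandas as pd

def equity_resample(df, group_col="race_ethnicity", label_col="outcome", seed=0):
    """Stratify by (group, label) and balance classes within each group.

    Hypothetical column names; illustrates dividing the training data by both
    racial/ethnic group and class label before undersampling negatives.
    """
    parts = []
    for _, group_df in df.groupby(group_col):
        pos = group_df[group_df[label_col] == 1]
        neg_pool = group_df[group_df[label_col] == 0]
        # keep all positives and an equal number of negatives from the same group
        neg = neg_pool.sample(n=min(len(pos), len(neg_pool)), random_state=seed)
        parts.append(pd.concat([pos, neg]))
    return pd.concat(parts).sample(frac=1.0, random_state=seed)
```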
6.1.3 Stratified Batch Sampling
In this approach, the data is stratified by the protected
attribute(s) for each training batch, and samples are
selected to ensure that each protected group is equally
represented (Puyol-Antón et al., 2021).
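The sketch below illustrates one way such stratified batches could be formed, drawing an equal number of samples from each protected group per batch; the group structure, batch size, and sampling with replacement are illustrative assumptions, not the implementation of Puyol-Antón et al. (2021).

```python
import numpy as np

def stratified_batches(indices_by_group, batch_size, rng=None):
    """Yield training batches with equal representation of each protected group.

    `indices_by_group` maps a protected-group value (e.g., sex or race) to the
    list of sample indices belonging to it. Illustrative sketch: each batch
    contains batch_size // n_groups examples per group, sampled with replacement.
    """
    rng = rng or np.random.default_rng(0)
    groups = list(indices_by_group)
    per_group = max(1, batch_size // len(groups))
    n_batches = max(len(v) for v in indices_by_group.values()) // per_group
    for _ in range(n_batches):
        batch = np.concatenate([
            rng.choice(indices_by_group[g], size=per_group, replace=True)
            for g in groups
        ])
        rng.shuffle(batch)  # avoid ordering by group within the batch
        yield batch
```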
6.2 The Importance of Explainability
for Justice in Machine Learning in
Healthcare
Rueda et al. (2022) highlight the inherent trade-off
between precision and explainability in AI models.
While high-performing models like deep learning of-
ten lack transparency, easily interpretable models typ-
ically exhibit lower precision (Holzinger et al., 2019
as cited in Rueda et al., 2022). This tension has significant
implications for distributive justice, which concerns
the fair allocation of resources. In healthcare, this
means ensuring everyone has access to quality care,
even when resources are limited.
Outcome-oriented justice theories prioritize preci-
sion to maximize benefits for the most people. How-
ever, Rueda et al. (2022) also bring procedural justice
into the argument, which emphasizes that the process by which decisions are made is itself a fundamental aspect of judgments about justice.
Rueda et al. (2022) argue for procedural justice,
emphasizing the importance of fair and transparent
decision-making processes. Explainability plays a
crucial role in achieving this, enabling verification of
unbiased decisions and attributing moral responsibil-
ity.
Balagopalan et al. (2022) introduced metrics to
measure the fairness of explanation models or fidelity
gaps between subgroups. They argue that an expla-
nation model can be faithful to the overall black box,
but still be unfair to certain subgroups.
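As an illustration of the underlying intuition, the sketch below computes explanation fidelity (agreement between an explanation model and the black box) per subgroup and reports the mean absolute gap from the overall fidelity; this is a simplified reading of the idea, not the exact metrics defined by Balagopalan et al. (2022).

```python
import numpy as np

def fidelity(blackbox_preds, explainer_preds):
    """Fraction of samples where the explanation model matches the black box."""
    return np.mean(np.asarray(blackbox_preds) == np.asarray(explainer_preds))

def mean_fidelity_gap(blackbox_preds, explainer_preds, subgroups):
    """Average absolute difference between each subgroup's fidelity and overall fidelity.

    `subgroups` is an array of subgroup labels (e.g., race or sex) per sample.
    Simplified illustration of a subgroup fidelity-gap metric.
    """
    blackbox_preds = np.asarray(blackbox_preds)
    explainer_preds = np.asarray(explainer_preds)
    subgroups = np.asarray(subgroups)
    overall = fidelity(blackbox_preds, explainer_preds)
    gaps = [
        abs(fidelity(blackbox_preds[subgroups == g], explainer_preds[subgroups == g]) - overall)
        for g in np.unique(subgroups)
    ]
    return float(np.mean(gaps))
```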
6.3 Mitigating Biases and Injustices
Related to Sensitive Attributes
Mitigating biases related to age, race, gender, and
other sensitive attributes is critical in healthcare
AI. We identified various methods used by re-
searchers, including resampling techniques (Reeves
et al., 2022).
Adam et al. (2022) highlighted the importance
of clinical notes in machine learning models, but
also warned that these models can perpetuate biases
against minorities. Notably, they demonstrated that patient race can be inferred from clinical notes even without explicit access to the attribute.
This underscores the importance of combating bi-
ases in machine learning models used in healthcare,
as disparities in healthcare outcomes for minorities
have been well documented, such as the finding by
Lee et al. (2019) that “physicians are less likely to
provide Black patients with analgesia for acute pain
in the emergency room” (Lee et al., 2019, as cited in Adam
et al., 2022).
Puyol-Antón et al. (2021) conducted an analysis of the fairness of deep-learning-based cardiac magnetic resonance imaging (MRI) segmentation models.
Their work focused on the impact of gender and racial
imbalance in training data and proposed three strate-
gies to mitigate bias: stratified batch sampling, fair
meta-learning for segmentation, and protected group
models.
7 CONCLUSION
Our findings suggest a growing interest in the re-
search topic, with a significant number of publica-
tions in recent years. Analyzing article titles and ab-
stracts revealed key themes like justice, bias mitiga-
tion, interpretability, and the impact of imbalanced
datasets. We identified various methods employed
by researchers, including hybrid approaches, subsam-
pling, and protected group models, to address imbal-
anced data and promote fairness. Additionally, ex-
plainability approaches were highlighted as crucial
for achieving transparency and understanding in ML
models.
Our study also emphasizes the importance of dis-
cussing and mitigating biases related to sensitive at-
tributes like age, race, and gender. The studies described approaches such as balancing racial propor-
tions in datasets, examining implicit bias in clinical
notes, and improving model interpretability to iden-
tify disparities across demographic groups.
This comprehensive overview of the research
landscape provides valuable insights into addressing
biases and injustices in healthcare ML algorithms.
REFERENCES
Adam, H., Yang, M. Y., Cato, K., Baldini, I., Senteio,
C., Celi, L. A., Zeng, J., Singh, M., and Ghassemi,
M. (2022). Write It Like You See It: Detectable Dif-
ferences in Clinical Notes by Race Lead to Differen-
tial Model Recommendations. In Proceedings of the
2022 AAAI/ACM Conference on AI, Ethics, and Society,
AIES ’22, pages 7–21, New York, NY, USA. Association
for Computing Machinery. event-place: Oxford, United
Kingdom.
Agarwal, R., Bjarnadottir, M., Rhue, L., Dugas, M., Crow-
ley, K., Clark, J., and Gao, G. (2023). Addressing algo-
rithmic bias and the perpetuation of health inequities: An
AI bias aware framework. Health Policy and Technology,
12(1):100702.
Alani, A. A., Cosma, G., and Taherkhani, A. (2020). Clas-
sifying Imbalanced Multi-modal Sensor Data for Hu-
man Activity Recognition in a Smart Home using Deep
Learning. In 2020 International Joint Conference on
Neural Networks (IJCNN), pages 1–8. ISSN: 2161-4407.
Ashok, M., Madan, R., Joha, A., and Sivarajah, U. (2022).
Ethical framework for artificial intelligence and digital
technologies. International Journal of Information Man-
agement, 62:102433.
Balagopalan, A., Zhang, H., Hamidieh, K., Hartvigsen, T.,
Rudzicz, F., and Ghassemi, M. (2022). The Road to
Explainability is Paved with Bias: Measuring the Fair-
ness of Explanations. In 2022 ACM Conference on
Fairness, Accountability, and Transparency, FAccT ’22,
pages 1194–1206, New York, NY, USA. Association for
Computing Machinery. event-place: Seoul, Republic of
Korea.
Barton, M., Hamza, M., and Guevel, B. (2023). Racial
Equity in Healthcare Machine Learning: Illustrating
Bias in Models With Minimal Bias Mitigation. Cureus,
15(2):e35037. Place: United States.
Bear Don’t Walk, O. J. t., Reyes Nieva, H., Lee, S. S.-J., and
Elhadad, N. (2022). A scoping review of ethics consid-
erations in clinical natural language processing. JAMIA
open, 5(2):ooac039. Place: United States.
Bhanot, K., Qi, M., Erickson, J. S., Guyon, I., and Ben-
nett, K. P. (2021). The Problem of Fairness in Synthetic
Healthcare Data. Entropy (Basel, Switzerland), 23(9).
Place: Switzerland.
Carruthers, R., Straw, I., Ruffle, J. K., Herron, D., Nel-
son, A., Bzdok, D., Fernandez-Reyes, D., Rees, G., and
Nachev, P. (2022). Representational ethical model cali-
bration. NPJ digital medicine, 5(1):170. Place: England.
Chang, C.-H., Caruana, R., and Goldenberg, A. (2022).
NODE-GAM: NEURAL GENERALIZED ADDITIVE
MODEL FOR INTERPRETABLE DEEP LEARNING.
In ICLR 2022 - 10th International Conference on Learn-
ing Representations. International Conference on Learn-
ing Representations, ICLR. Type: Conference paper.
Duffy, G., Clarke, S. L., Christensen, M., He, B., Yuan, N.,
Cheng, S., and Ouyang, D. (2022). Confounders mediate
AI prediction of demographics in medical imaging. NPJ
digital medicine, 5(1):188. Place: England.
El Shawi, R., Sherif, Y., Al-Mallah, M., and Sakr, S. (2019).
Interpretability in HealthCare A Comparative Study of
Local Machine Learning Interpretability Techniques. In
2019 IEEE 32nd International Symposium on Computer-
Based Medical Systems (CBMS), pages 275–280. ISSN:
2372-9198.
Fan, D., Wu, Y., and Li, X. (2021). On the Fairness of
Swarm Learning in Skin Lesion Classification. Lecture
Notes in Computer Science (including subseries Lec-
ture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), 12969 LNCS:120–129. ISBN: 978-303090873-7. Publisher: Springer Science and Business
Media Deutschland GmbH Type: Conference paper.
Garcia, A. C. B., Garcia, M. G. P., and Rigobon, R. (2023).
Algorithmic discrimination in the credit domain: what
do we know about it? AI & SOCIETY.
Holzinger, A., Langs, G., Denk, H., Zatloukal, K., and
Müller, H. (2019). Causability and explainability of arti-
ficial intelligence in medicine. WIREs Data Mining and
Knowledge Discovery, 9(4):e1312.
Kazim, E. and Koshiyama, A. S. (2021). A high-level
overview of ai ethics. Patterns, 2(9):100314.
Kitchenham, B. A. and Charters, S. (2007). Guidelines
for performing Systematic Literature Reviews in Soft-
ware Engineering. Technical Report EBSE 2007-001,
Keele University. Backup Publisher: Keele University
and Durham University Joint Report.
Lee, P., Le Saux, M., Siegel, R., Goyal, M., Chen, C., Ma,
Y., and Meltzer, A. C. (2019). Racial and ethnic dispar-
ities in the management of acute pain in US emergency
departments: Meta-analysis and systematic review. The
American journal of emergency medicine, 37(9):1770–
1777. Place: United States.
Li, Y., Wang, H., and Luo, Y. (2022). Improving Fairness in
the Prediction of Heart Failure Length of Stay and Mor-
tality by Integrating Social Determinants of Health. Cir-
culation. Heart failure, 15(11):e009473. Place: United
States.
Luo, L., Xu, D., Chen, H., Wong, T.-T., and Heng, P.-
A. (2022). Pseudo Bias-Balanced Learning for Debi-
ased Chest X-Ray Classification. Lecture Notes in Com-
puter Science (including subseries Lecture Notes in Arti-
ficial Intelligence and Lecture Notes in Bioinformatics),
13438 LNCS:621–631. ISBN: 978-303116451-4 Pub-
lisher: Springer Science and Business Media Deutsch-
land GmbH Type: Conference paper.
Meng, C., Trinh, L., Xu, N., Enouen, J., and Liu, Y. (2022).
Interpretability and fairness evaluation of deep learn-
ing models on MIMIC-IV dataset. Scientific reports,
12(1):7166. Place: England.
Monlezun, D. J., Sinyavskiy, O., Peters, N., Steigner, L.,
Aksamit, T., Girault, M. I., Garcia, A., Gallagher, C.,
and Iliescu, C. (2022). Artificial Intelligence-Augmented
Propensity Score, Cost Effectiveness and Computational
Ethical Analysis of Cardiac Arrest and Active Cancer
with Novel Mortality Predictive Score. Medicina (Kau-
nas, Lithuania), 58(8). Place: Switzerland.
Morley, J., Machado, C. C., Burr, C., Cowls, J., Joshi, I.,
Taddeo, M., and Floridi, L. (2020). The ethics of ai
in health care: A mapping review. Social Science &
Medicine, 260:113172.
Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan,
S. (2019). Dissecting racial bias in an algorithm
used to manage the health of populations. Science,
366(6464):447–453.
Paviglianiti, A. and Pasero, E. (2020). VITAL-ECG: a de-
bias algorithm embedded in a gender-immune device. In
2020 IEEE International Workshop on Metrology for In-
dustry 4.0 & IoT, pages 314–318.
Perera, A., Aleti, A., Tantithamthavorn, C., Jiarpakdee,
J., Turhan, B., Kuhn, L., and Walker, K. (2022).
Search-based fairness testing for regression-based ma-
chine learning systems. Empirical Software Engineering,
27(3). Publisher: Springer Type: Article.
Perez Alday, E. A., Rad, A. B., Reyna, M. A., Sadr, N., Gu,
A., Li, Q., Dumitru, M., Xue, J., Albert, D., Sameni, R.,
and Clifford, G. D. (2022). Age, sex and race bias in
automated arrhythmia detectors. Journal of electrocardi-
ology, 74:5–9. Place: United States.
Pessach, D. and Shmueli, E. (2022). A Review on Fair-
ness in Machine Learning. ACM Comput. Surv., 55(3).
Place: New York, NY, USA Publisher: Association for
Computing Machinery.
Pham, T.-H., Yin, C., Mehta, L., Zhang, X., and Zhang,
P. (2023). A fair and interpretable network for clini-
cal risk prediction: a regularized multi-view multi-task
learning approach. Knowledge and information systems,
65(4):1487–1521. Place: England.
Puyol-Antón, E., Ruijsink, B., Piechnik, S. K., Neubauer,
S., Petersen, S. E., Razavi, R., and King, A. P. (2021).
Fairness in Cardiac MR Image Analysis: An Investi-
gation of Bias Due to Data Imbalance in Deep Learn-
ing Based Segmentation. Lecture Notes in Computer
Science (including subseries Lecture Notes in Artifi-
cial Intelligence and Lecture Notes in Bioinformatics),
12903 LNCS:413–423. ISBN: 978-303087198-7 Pub-
lisher: Springer Science and Business Media Deutsch-
land GmbH Type: Conference paper.
Reeves, M., Bhat, H. S., and Goldman-Mellor, S. (2022).
Resampling to address inequities in predictive modeling
of suicide deaths. BMJ health & care informatics, 29(1).
Place: England.
Rueda, J., Rodríguez, J. D., Jounou, I. P., Hortal-Carmona, J., Ausín, T., and Rodríguez-Arias, D. (2022). ”Just” ac-
curacy? Procedural fairness demands explainability in
AI-based medical resource allocations. AI & society,
pages 1–12. Place: Germany.
Sahoo, H. S., Ingraham, N. E., Silverman, G. M., and Sar-
tori, J. M. (2022). Towards Fairness and Interpretabil-
ity: Clinical Decision Support for Acute Coronary Syn-
drome. In 2022 21st IEEE International Conference
on Machine Learning and Applications (ICMLA), pages
882–886.
Shanklin, R., Samorani, M., Harris, S., and Santoro, M. A.
(2022). Ethical Redress of Racial Inequities in AI:
Lessons from Decoupling Machine Learning from Op-
timization in Medical Appointment Scheduling. Philos-
ophy & technology, 35(4):96. Place: Netherlands.
Stanley, E. A. M., Wilms, M., and Forkert, N. D. (2022).
Disproportionate Subgroup Impacts and Other Chal-
lenges of Fairness in Artificial Intelligence for Medical
Image Analysis. Lecture Notes in Computer Science (in-
cluding subseries Lecture Notes in Artificial Intelligence
and Lecture Notes in Bioinformatics), 13755 LNCS:14–25. ISBN: 978-303123222-0. Publisher: Springer
Science and Business Media Deutschland GmbH Type:
Conference paper.
Suriyakumar, V. M., Papernot, N., Goldenberg, A., and
Ghassemi, M. (2021). Chasing Your Long Tails: Dif-
ferentially Private Prediction in Health Care Settings. In
Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency, FAccT ’21, pages
723–734, New York, NY, USA. Association for Com-
puting Machinery. event-place: Virtual Event, Canada.
Wang, Z., Liu, C., and Yao, B. (2022). Multi-Branching
Neural Network for Myocardial Infarction Prediction. In
2022 IEEE 18th International Conference on Automa-
tion Science and Engineering (CASE), pages 2118–2123.
ISSN: 2161-8089.
Yang, J., Soltan, A. A. S., Eyre, D. W., Yang, Y., and
Clifton, D. A. (2023). An adversarial training frame-
work for mitigating algorithmic biases in clinical ma-
chine learning. NPJ digital medicine, 6(1):55. Place:
England.
Zhang, H., Dullerud, N., Seyyed-Kalantari, L., Morris,
Q., Joshi, S., and Ghassemi, M. (2021). An Empirical
Framework for Domain Generalization in Clinical Set-
tings. In Proceedings of the Conference on Health, In-
ference, and Learning, CHIL ’21, pages 279–290, New
York, NY, USA. Association for Computing Machinery.
event-place: Virtual Event, USA.