Nearest Neighbours and XAI Based Approach for Soft Labelling
Ramisetty Kavya, Jabez Christopher and Subhrakanta Panda
Department of Computer Science and Information Systems, BITS Pilani, Hyderabad Campus, Telangana, India
Keywords:
Supervised Learning, Probabilistic Label, K-nearest Neighbour, Dempster Shafer Theory, Uncertainty.
Abstract:
Hard label assignment is a challenging task in the presence of epistemic uncertainty. This work first converts the hard labels of evidential instances into probabilistic labels based on k-nearest neighbours. Neighbours are identified such that at least half of them belong to the hard label of the corresponding evidential instance. The probabilistic label of a decision query is computed by combining the probabilistic labels of its nearest neighbours using Dempster's combination rule. Synthetic data is used to compare probabilistic labels against hard labels by varying the number of samples, the number of neighbours, and the degree of overlap between the classes. It is observed that the performance of the method depends mainly on the degree of overlap between classes: probabilistic labels are more intuitive than hard labels when the overlapping region is large. Moreover, a few publicly available datasets are also considered to verify the performance of probabilistic labels on boundary instances. The proposed method achieves accuracies of 90.44% and 98.24% on the breast cancer dataset when trained with 10% and 90% of the data respectively. Therefore, the proposed method is sample efficient, calibrated, and interpretable.
1 INTRODUCTION
Machine learning (ML) algorithms learn patterns from training data in order to assign labels to prediction queries. However, the limited availability of evidential instances (training samples) and uncertainty in their decision alternatives (classes) may reduce the performance of ML algorithms. Moreover, if an ML model is not interpretable, it may not be able to provide explanations for its decision alternatives. These two characteristics of Explainable Artificial Intelligence (XAI), performance and explainability, motivate the search for a learning methodology that is sample efficient and interpretable enough to provide explainable outcomes.
A model is said to be accurate when it is able to differentiate among disjoint classes. Most of the learning paradigms in the literature may not be able to produce accurate models because of uncertainty in evidential instances (Kavya and Christopher, 2022). If the learning model is not accurate, then assigning a hard label to an unknown decision query is a difficult and challenging task.
A hard label, also known as a crisp label, can be encoded as a one-hot vector; this indicates that the instance strictly belongs to a particular class. Such strict encoding is difficult when there is uncertainty in the instances. The difficulty with this kind of hard labelling can be handled in two ways: the first is to gather additional instances which support learning algorithms in differentiating disjoint classes for generating models, and the second is to convert the hard labels to probabilistic labels. In domains like healthcare, gathering additional information with less uncertainty may not always be possible.
A probabilistic label or soft label can be encoded as a belief structure in which each disjoint class is assigned a membership degree. A hard label is a special case of a probabilistic label in which the membership degrees of all classes other than one particular class are equal to zero. Probabilistic labelling is useful in cases where the overlapping region among disjoint classes is large. Most of the existing datasets for developing accurate classification models are hard labelled irrespective of the uncertainty in the evidential data.
Existing learning algorithms such as k-nearest neighbours, support vector machines, decision trees, neural networks, and ensemble approaches, when trained on evidential samples with hard labels, assign hard labels to decision queries even though these models are not 100% confident. In such low-confidence scenarios, these learning algorithms should follow probabilistic labelling, but none of them does
because they have no way of converting hard labels to probabilistic labels. Therefore, this work mainly focuses on generating accurate models when the evidential data is limited and uncertain.
The proposed probabilistic labelling method first converts the hard labelled evidential data into probabilistic labelled evidential data by identifying neighbours. The proposed method imposes a condition while identifying the neighbours of an evidential instance: at least half of the neighbours must belong to the hard label of the corresponding evidential instance. This condition ensures that the membership degree corresponding to the hard label of an evidential instance will never be less than 0.5. When a new decision query arrives, the proposed method identifies the nearest evidential instances and combines their probabilistic labels using Dempster's combination rule. This rule results in a belief structure, which is considered the probabilistic label of that decision query. The main contributions of this work are as follows:
1. A relabelling method is proposed based on the k-nearest neighbours algorithm for converting hard labels of evidential instances to probabilistic labels.
2. A decision-making model is proposed based on the k-nearest neighbours algorithm and Dempster's combination rule for assigning probabilistic labels to decision queries.
2 FOUNDATIONS AND RELATED
WORKS
The objective of learning algorithms for solving classification problems is to find a function, $f: X \to Y$, where $X \subseteq \mathbb{R}^n$ and $Y = \{c_1, c_2, \ldots, c_m\}$, which minimises the error between the original label and the predicted label. A dataset $D$ consists of samples where each sample $x \in D$ is a combination of $n$ attribute-value pairs and an original label $c_i \in Y$. $P$ instances in $D$ are used to train the classification model, and the remaining instances in $D$ are used to test the performance of the developed classification model.
2.1 Probabilistic Labels
In general, the original label $c_i$ of an instance $x \in D$ can be encoded as a vector of length $m$ where the $i$-th position is equal to one and all remaining positions are equal to zero. For example, if the original label of $x$ is encoded as $f(x) = [0, 0, 1, 0, 0]$, then it can be interpreted as $x$ belonging to $c_3$ (Vega et al., 2021). Assigning the entire probability to one single class is undesirable, especially in the healthcare domain where the information contains uncertainty.
In contrast, probabilistic labels share the probability among disjoint classes. A probabilistic label can be encoded as a vector of probabilities where each probability is the membership degree of the corresponding class. For example, if the probabilistic label of $x$ is encoded as $f(x) = [0.2, 0.3, 0.1, 0.2, 0.2]$, then it can be interpreted as the decision-making model being 30% confident that $x$ belongs to $c_2$. This vector serves as knowledge for the decision-maker when choosing an optimal decision.

In the literature, there are two different types of probabilistic labels, namely instance-wise and group-wise. The former has an independent probabilistic label for each instance, whereas the latter has one common probabilistic label for a group of instances. This work considers instance-wise probabilistic labels, which means that each instance has a label of probability values corresponding to the decision alternatives.
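As an illustration, a minimal Python sketch of the two encodings (the array values are the ones from the examples above; variable names are illustrative):

```python
import numpy as np

m = 5  # number of disjoint classes

# Hard label: one-hot vector, the instance strictly belongs to c3
hard_label = np.zeros(m)
hard_label[2] = 1.0          # f(x) = [0, 0, 1, 0, 0]

# Probabilistic label: one membership degree per class, summing to one
prob_label = np.array([0.2, 0.3, 0.1, 0.2, 0.2])

# A hard label is the special case where one degree is 1 and the rest are 0
assert np.isclose(hard_label.sum(), 1.0) and np.isclose(prob_label.sum(), 1.0)
print(prob_label.argmax())   # index of the most likely class (c2 here)
```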
2.2 K-nearest Neighbours
Let $q$ be the decision query to which a label $l$ needs to be assigned by a k-Nearest Neighbours (k-NN) classification model $M$ (Christopher, 2019). $M$ applies distance measures like Euclidean, Manhattan, and others to identify the set of nearest evidential instances, $NN = \{x_1, x_2, \ldots, x_k\} \subseteq D$, for $q$. The labels of the evidential instances in $NN$ are considered for assigning $l$ to $q$. The choice of distance measure, the optimal number of neighbours, and the strategy for finding nearest neighbours are the three main factors which can impact the performance of k-NN.
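A minimal sketch of this neighbour-retrieval step using scikit-learn (the data and query values are illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative evidential data D: six instances with two attributes each
D = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

q = np.array([[1.2, 1.9]])   # decision query

k = 3
nn = NearestNeighbors(n_neighbors=k, metric='euclidean').fit(D)
distances, indices = nn.kneighbors(q)   # NN = {x_1, ..., x_k} ⊆ D
print(indices[0])            # indices of the k nearest evidential instances
```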
2.3 Dempster Shafer Theory
Dempster Shafer theory considers the powerset of decision alternatives to represent uncertainty, together with a combination operation, satisfying both the associative and commutative properties, to combine belief structures (Dempster, 2008). Let $\Theta$ be the Frame of Discernment (FoD), a finite non-empty set of mutually exclusive (disjoint) class labels in a dataset $D$. The powerset of $\Theta$ is represented as

$2^{\Theta} = \{\emptyset, \{\theta_1\}, \{\theta_2\}, \ldots, \{\theta_1, \theta_2\}, \ldots, \Theta\}$
Let $y(x_i)$ and $y(x_j)$ be the probabilistic labels associated with instances $x_i$ and $x_j$. Dempster's combination rule is applied to combine $x_i$ and $x_j$ as

$$r_{x(2),\theta} = \begin{cases} 0 & \text{if } \theta = \emptyset \\ \dfrac{1}{1-k} \displaystyle\sum_{A \cap B = \theta} y(x_i, A)\, y(x_j, B) & \text{if } \theta \in 2^{\Theta} \end{cases} \qquad (1)$$

with normalisation factor $k = \sum_{A \cap B = \emptyset} y(x_i, A)\, y(x_j, B)$, where $r_{x(n),\theta}$ is the membership degree for $\theta$ in the resultant probabilistic label obtained by combining the probabilistic labels of $n$ instances.
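Eq. (1) can be sketched in Python for the case used in this work, where each probabilistic label places its mass only on singleton classes; under that assumption $A \cap B = \theta$ only when $A = B$, so the combination reduces to an element-wise product followed by renormalisation. This is a minimal sketch, not the general multi-focal-element rule; the `eps` default mirrors the 0-to-0.0001 update described later in Section 3.2:

```python
import numpy as np

def dempster_combine(y_i, y_j, eps=1e-4):
    """Combine two probabilistic labels whose mass sits on singleton classes.

    With singleton focal elements, the unnormalised combined mass is the
    element-wise product of the two labels; k collects the mass of all
    conflicting (empty-intersection) pairs.
    """
    y_i = np.where(y_i == 0, eps, y_i)  # replace zero masses to avoid total
    y_j = np.where(y_j == 0, eps, y_j)  # conflict (the 0 -> 0.0001 update)
    joint = y_i * y_j                   # sum over A ∩ B = θ
    k = 1.0 - joint.sum()               # conflict: sum over A ∩ B = ∅
    return joint / (1.0 - k)            # Eq. (1) for θ ≠ ∅

# Example: combining the labels of two neighbours that favour class 1
print(dempster_combine(np.array([0.7, 0.3]), np.array([0.6, 0.4])))
# -> approximately [0.778, 0.222]: the shared evidence is reinforced
```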
2.4 Related Works
Membership degrees in a probabilistic label can be either subjective or objective. Nguyen et al. follow the subjective approach and obtained membership degrees directly from domain experts (Nguyen et al., 2011). Classification models trained on these subjective probabilistic labels are more accurate than models trained on hard labels. However, it may not always be possible for human experts to provide reliable membership degrees.
Szegedy et al. follow the objective approach and used a smoothing parameter to compute the membership degrees in probabilistic labels (Szegedy et al., 2016). For example, if the parameter value is 0.1, then the hard label [0, 1] is converted into the probabilistic label [0.1, 0.9]. Accuracy on the ImageNet dataset increased by 2% when hard labels were converted into probabilistic labels using the smoothing parameter. However, Norouzi et al. argued that all the disjoint classes should not have equal membership degrees (Norouzi et al., 2016). For example, if there were 10 disjoint classes and the parameter value were 0.1, every class would end up with 0.1 as its membership degree. Such an arbitrary way of penalising hard labels is not a good approach when the number of classes is large.
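As a hedged illustration, one reading of the smoothing scheme described in this paragraph (each non-true class receives the parameter value directly, which reproduces both the [0.1, 0.9] example and the 10-class objection above; Szegedy et al.'s original formulation differs slightly) can be sketched as:

```python
import numpy as np

def smooth(hard_label, eps=0.1):
    # Hypothetical reading: give every non-true class the parameter value
    # eps, and assign the remaining mass to the true class.
    m = len(hard_label)
    soft = np.full(m, eps)
    soft[int(np.argmax(hard_label))] = 1.0 - (m - 1) * eps
    return soft

print(smooth(np.array([0, 1])))   # [0.1, 0.9], as in the example above
print(smooth(np.eye(10)[0]))      # with 10 classes every degree becomes 0.1,
                                  # which is Norouzi et al.'s objection
```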
Hinton et al. used model distillation to assign probabilistic labels (Hinton et al., 2015). Initially, a complex model is trained to output a real-valued vector in which the $k$-th value can be considered the membership degree of the $k$-th class. Then a simpler model is trained on the output of the complex model for prediction. Though model distillation is an effective approach, it requires large amounts of data for training the complex model.
Gayar et al. proposed a method for generating soft labels based on fuzzy clustering (Gayar et al., 2006). Five publicly available datasets were considered to compare the performance of the k-nearest neighbours algorithm when trained on hard labels and on soft labels. The experimental results show that learning soft labels is more robust than learning hard labels.
Vega et al. proposed an approach that uses probabilistic labels for training accurate and calibrated deep networks (Vega et al., 2021). Three classification tasks, namely diagnosis of hip dysplasia, glaucoma, and fatty liver, were considered; the results show that training with probabilistic labels increases accuracy by up to 22%.
From the aforementioned literature, it can be understood that training models with probabilistic labels instead of hard labels supports the development of accurate classification models. However, two issues remain to be resolved: first, there is no appropriate relabelling method for converting hard labels to probabilistic labels; second, ignoring the original labels may lead to unreliable probabilistic labels. This work proposes a relabelling approach which converts hard labels to probabilistic labels without ignoring the original labels.
3 PROPOSED METHOD
The framework of the proposed probabilistic labelling method, based on k-nearest neighbours (k-NN) and Dempster Shafer (DS) theory, consists of two phases, namely relabelling and decision-making.
3.1 Relabelling Phase
The relabelling phase focuses on converting the hard labels of evidential data into probabilistic labels by identifying nearest neighbours. Evidential data consists of instances where each instance is a combination of attribute-value pairs and a hard label. Consider an instance $x$ from the evidential data which needs to be relabelled. The distances from $x$ to the remaining evidential instances are computed using the Euclidean measure to identify the k nearest neighbours of $x$. In the classical k-NN algorithm, the k neighbours are the instances with minimal distance from $x$ and can belong to any class in the evidential data. In the proposed system, by contrast, at least k/2 neighbours are the instances with minimal distance from $x$ that belong to the same class as $x$; the remaining k/2 neighbours are the instances with minimal distance from $x$ and can belong to any class, including the class of $x$.

The rationale behind dividing the k neighbours into two disjoint sets is to avoid ignoring the hard labels of evidential instances. Once the k neighbouring evidential instances are identified, the membership degree of each class in the probabilistic label of $x$ is computed using an interestingness measure such as confidence.
Algorithm 1: Relabelling Phase.
Input: Evidential data ($D_{old}$) with $p$ instances, $n$ attributes and $m$ class labels; user-defined parameter $k$, the number of neighbours.
Output: Relabelled evidential data ($D_{new}$).

    foreach $p_i$ in $D_{old}$ do
        $NN = \{\}$          // the k nearest neighbours of $p_i$
        $NN(p_{ii}) = \{\}$  // the k/2 nearest neighbours of $p_i$ with class $c_i$
        $NN(p_i) = \{\}$     // the k/2 nearest neighbours of $p_i$ with any class label
        foreach $p_j$ in $D_{old}$ do
            $dist(p_i, p_j) = \sqrt{\sum_{a=1}^{n} (p_{ia} - p_{ja})^2}$
            if $dist(p_i, p_j)$ is minimal and $p_j \in c_i$ and $|NN(p_{ii})| < k/2$ then
                add $p_j$ to $NN(p_{ii})$
            else if $dist(p_i, p_j)$ is minimal and $|NN(p_i)| < k/2$ then
                add $p_j$ to $NN(p_i)$
            end
        end
        $NN = NN(p_{ii}) \cup NN(p_i)$
        foreach class $c_j$ do
            $PL(p_i, c_j) = |NN(c_j)| \,/\, |NN|$
        end
        add $PL(p_i)$ to $D_{new}$
    end
Since at least 50% of the neighbours belong to the corresponding hard label, the membership degree of the original class of $x$ in its probabilistic label will never be less than 0.5. The probabilistic labels of all the evidential instances are computed, and a new relabelled evidential dataset is formed. The pseudocode for relabelling the evidential instances from hard to probabilistic labels is presented in Alg. 1, and a Python sketch follows below.
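A minimal, runnable Python sketch of Alg. 1 under the stated constraint; the function and variable names are illustrative, and ceil(k/2) neighbours are drawn from the instance's own class so that the own-class membership degree never falls below 0.5:

```python
import numpy as np

def relabel(X, y, k, n_classes):
    """Convert hard labels into probabilistic labels (cf. Alg. 1)."""
    PL = np.zeros((len(X), n_classes))
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)   # Euclidean distances
        dist[i] = np.inf                          # exclude the instance itself
        order = np.argsort(dist)                  # nearest first
        n_same = k // 2 + k % 2                   # ceil(k/2) same-class slots
        same = [j for j in order if y[j] == y[i]][:n_same]
        rest = [j for j in order if j not in same][:k - n_same]
        NN = same + rest                          # the k neighbours of x
        for j in NN:                              # confidence of each class
            PL[i, y[j]] += 1.0 / len(NN)
    return PL
```

With k = 5, for example, three of the five neighbours are forced to share the instance's hard label, so its own-class membership degree is at least 0.6.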
3.2 Decision-Making Phase
The decision-making phase focuses on assigning probabilistic labels to decision queries based on the relabelled evidential data. When a decision query arrives, the proposed system identifies its k nearest evidential instances based on the Euclidean distance. The condition imposed on identifying the k neighbours in the relabelling phase is not applied in the decision-making phase, because the hard label of an instance was known in the relabelling phase, whereas the hard label of a query is unknown in the decision-making phase. Thus, the k neighbours of a decision query are the evidential instances with minimal distance and can belong to any class in the data.

It is up to the decision-maker whether to keep the same k value or to use different k values for the two phases. If there is overlap among the disjoint classes, then increasing the k value may add neighbours from different classes. If the evidential neighbours of a decision query belong to different classes, then there may be no considerable difference among the membership values of the different classes. Thus, the choice of k has a significant impact on the membership values in the probabilistic label of a decision query when there is overlap.
Once the nearest evidential instances for a decision query are identified based on the Euclidean distance and the k value, its probabilistic label is assigned by combining the probabilistic labels of all the nearest evidential instances using Dempster's combination rule. If the probabilistic labels of the nearest neighbours of a decision query are highly conflicting, then the zero membership values are replaced with 0.0001. The pseudocode for assigning a probabilistic label to a decision query is presented in Alg. 2.
Algorithm 2: Decision-making Phase.
Input: Relabelled evidential data ($D_{new}$) with $p$ instances, $n$ attributes and $m$ class labels; user-defined parameter $k$, the number of neighbours; decision query ($x$) with $n$ attributes.
Output: Probabilistic label for $x$ ($PL(x)$).

    $NN(x) = \{\}$    // the nearest neighbours of $x$
    foreach $p_i$ in $D_{new}$ do
        $dist(x, p_i) = \sqrt{\sum_{a=1}^{n} (x_a - p_{ia})^2}$
        if $dist(x, p_i)$ is minimal then
            add $p_i$ to $NN(x)$
        end
    end
    foreach class $c_j$ do
        $PL(x, c_j) = r_{NN(x), c_j}$    // refer to Eq. (1)
    end
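A matching sketch of this phase, reusing the illustrative `relabel` and `dempster_combine` helpers from the earlier sketches:

```python
import numpy as np

def predict_probabilistic_label(X, PL, query, k):
    """Assign a probabilistic label to a decision query (cf. Alg. 2).

    No class constraint is imposed here: the query's hard label is
    unknown, so the k overall nearest evidential instances are used.
    """
    dist = np.linalg.norm(X - query, axis=1)   # Euclidean distances
    NN = np.argsort(dist)[:k]                  # k nearest evidential instances
    combined = PL[NN[0]]
    for j in NN[1:]:                           # Dempster's rule is associative
        combined = dempster_combine(combined, PL[j])   # and commutative, so
    return combined                            # the labels are folded pairwise
```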
The proposed method focuses on predicting the probability of a decision query belonging to each disjoint class rather than predicting a class directly. Since
the probabilities are important, the calibration of the proposed probabilistic labelling method needs to be measured and improved. Calibration ensures that the distribution of the predicted probabilities is similar to the distribution of the observed probabilities. A model is said to be calibrated if it returns probabilities which are good estimates of the actual likelihood of a class (Vega et al., 2021).
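One way to measure this property, sketched with scikit-learn's reliability-curve utility; the labels and predicted membership degrees below are illustrative, not taken from the paper's experiments:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Illustrative true binary labels and predicted class-1 membership degrees
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2, 0.75, 0.95])

# Fraction of positives vs. mean predicted probability per bin; a
# calibrated model keeps these two columns close to each other
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=5)
print(np.c_[mean_pred, frac_pos])
```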
4 EXPERIMENTS AND RESULTS
This section evaluates the robustness of the proposed probabilistic labelling method using both synthetic data and real-world data.
4.1 Synthetic Data
Two distinct Gaussians are used to generate synthetic data with the sklearn package in Python. The total number of samples, the separation between the classes, and the number of samples per class are the three parameters used to generate the different datasets. The proposed method is trained and tested on all of these datasets to verify its robustness. Throughout the synthetic-data experiments, the k value is fixed at 5. In each dataset, 80% of the samples are used for training, and the remaining 20% for testing the proposed method.
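The paper does not name the exact generator; `sklearn.datasets.make_classification` exposes all three knobs (`n_samples`, `class_sep`, `weights`), so a plausible sketch of the data generation is:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# The three knobs from the experiments: total samples, class separation,
# and the per-class sample distribution (weights)
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=0.5, weights=[0.5, 0.5],
                           random_state=42)

# 80/20 train-test split, as in the synthetic experiments
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```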
Case 1: The total number of samples is varied from 250 to 7000, fixing the separation between the classes at 0.5 and maintaining an equal number of samples in each class. Figures 1(a) and 1(b) present the synthetic data with 250 and 7000 samples respectively.

Figure 1: Varying number of samples. (a) No. of samples = 250; (b) No. of samples = 7000.

In a probabilistic label, the class with the highest membership degree is taken as the predicted label to compute accuracy. Table 1 presents the total number of instances along with the accuracy of the proposed method and the accuracy of traditional k-NN. It can be observed from Table 1 that increasing or decreasing the total number of instances does not have a significant impact on the performance of the proposed method.
Table 1: Accuracy with varying number of samples.

Samples   Probabilistic labels accuracy   Hard labels accuracy
250       82                              82
500       94                              94
750       90                              89
1000      86                              86.5
2000      75                              73.75
3000      86                              86
4000      85.75                           85.87
5000      72.6                            71.8
6000      87.16                           86.91
7000      81.71                           81.28
Moreover, the accuracy of the proposed method is very close to that of the traditional k-NN.
Case 2: The separation between the classes is varied between 0 and 1, fixing the total number of samples at 1000 and maintaining an equal number of samples in each class. Figures 2(a) and 2(b) present the synthetic data with separation degrees of 0 and 1 respectively.

Figure 2: Varying separation degree. (a) sep degree = 0; (b) sep degree = 1.

Table 2 presents the separation degrees between the classes and the corresponding accuracies.
Table 2: Accuracy with separation degrees.

Sep deg   Acc     Sep deg   Acc
0.1       68.5    0.2       74.5
0.3       80      0.4       84
0.5       86      0.6       90
0.7       92      0.8       92.5
0.9       95.5    1         96
It can be observed from Table 2 that the accuracy increases as the separation degree between the classes increases. Since the proposed method uses k-nearest neighbours as the prototype to relabel training instances, a decrease in the separation region leads to neighbours from different classes. If the neighbours belong to different classes, then there may not be a large difference between the membership degrees of the classes in the probabilistic label. It can be observed
from Table 2 that the separation degree between the classes has a significant impact on the accuracy of the proposed method.
Case 3: The proportion of samples in each class is varied from 0.1 to 0.9, fixing the total number of samples at 1000 and the separation degree between the classes at 0.5. Figures 3(a) and 3(b) present the synthetic data with different distributions of samples between the classes. Table 3 presents the class distribution along with the accuracy of each class.

Figure 3: Varying distribution of samples. (a) Class 1 as majority; (b) Class 2 as majority.
Table 3: Accuracy with class distribution.

c1    c2    acc_0   acc_1
0.1   0.9   55.5    1
0.2   0.8   65.1    1
0.3   0.7   65.5    99.2
0.4   0.6   69      99.1
0.5   0.5   76.6    95.8
0.6   0.4   79.5    87.1
0.7   0.3   84.1    85.2
0.8   0.2   88.9    71.7
0.9   0.1   92.6    27.2
It can be observed from Table 3 that the accuracy of a class increases with the number of samples of that class. Though it is ideal to have a balanced number of samples, the performance of the proposed method does not rely greatly on the distribution of classes; rather, it depends on the closeness among the samples of the same class.
Case 4: In this case, the percentage of training and testing samples is varied, fixing the total number of samples at 1000 and the separation between the classes at 0.5, with an equal number of samples in each class. Table 4 presents the train and test percentages along with the accuracy.
It can be observed from Table 4 that accuracy increases with the amount of training data. However, the proposed method is capable of achieving reasonable accuracy even with a small number of training instances.
Table 4: Accuracy with varying train-test-splits.
Train Test Accuracy
0.1 0.9 81.44
0.2 0.8 81.75
0.3 0.7 84.57
0.4 0.6 84.66
0.5 0.5 85
0.6 0.4 87.25
0.7 0.3 86.66
0.8 0.2 86
0.9 0.1 86
4.2 Real-World Data
This work considers publicly available real-world datasets from the University of California Irvine (UCI) repository.
The Pima Indians Diabetes dataset consists of 768 samples, each a combination of eight continuous-valued attributes and a binary class label. Among the 768 samples, 268 are labelled as diabetic and the remaining 500 as non-diabetic.
The experiment starts by partitioning the dataset into two parts, namely evidential data and query data. 80% of the samples belong to the evidential data, which is used to train the proposed method; the remaining 20% are treated as queries. All the samples in the evidential data are hard labelled; the proposed method first converts them into probabilistic labels by identifying neighbours.
When a decision query arrives, the proposed method identifies its nearest evidential neighbours. If the probabilistic labels of the neighbours do not show a considerable difference between the membership degrees of the different classes, misclassification may result. Consider a decision query with attribute values $\{a_1: 4,\ a_2: 132,\ a_3: 86,\ a_4: 31,\ a_5: 0,\ a_6: 28,\ a_7: 0.419,\ a_8: 63\}$ whose original hard label is class 0.
After combining the probabilistic labels of its neighbours using Dempster's combination rule, the resultant probabilistic label has 0.5614 and 0.4385 as the membership degrees for class 1 and class 0 respectively. Since the proposed method considers the class with the highest membership degree when computing accuracy, this particular decision query is counted as misclassified. A change in the number of neighbours might give this particular instance a higher membership degree for class 0; however, changing the number of neighbours changes the membership degrees of the remaining queries as well. Thus, the sets of classified and misclassified instances change with the k value.
Table 5 presents the k values with the corresponding accuracies.
Table 5: Accuracy of diabetes dataset.
k value Accuracy
1 61.68
3 75.32
5 75.97
7 76.62
9 81.81
It can be observed from Table 5 that accuracy increases with the k value. If classes overlap, then increasing the k value increases the chance of neighbours coming from a different class, which decreases accuracy. If there is not much overlap between the classes, then increasing or decreasing the k value does not have much impact on accuracy. Thus, the accuracy of the proposed method depends on the overlapping region between the classes, not on the k value. Overlap depends on the uncertainty in the data; probabilistic labelling based on Dempster's theory is one approach to representing such uncertainty efficiently.
The Indian Liver Patient Dataset (ILPD) consists of 416 liver-disease patient records and 167 non-liver-disease records. Each record is a combination of 10 continuous-valued attributes and a binary label. The proposed method first converts the hard labels of 80% of the samples into probabilistic labels using k-nearest neighbours. The probabilistic labels of the remaining 20% of the samples are computed by combining the probabilistic labels of the corresponding neighbours using Dempster's combination rule. Table 6 presents the k values along with the corresponding accuracies.
Table 6: Accuracy of liver disease dataset.
k value Accuracy
1 59.82
3 67.52
5 64.10
7 67.52
9 64.95
It can be observed from Table 6 that increasing or decreasing the k value does not have a significant impact on accuracy; only the overlapping region between the classes affects the performance of the proposed method.
The Wisconsin breast cancer dataset consists of 569 samples, each a combination of 30 continuous-valued attributes and a binary class label. In this experiment, the k value remains constant while the train-test split ratio is varied. Table 7 presents the train-test splits along with the corresponding accuracies.
Table 7: Accuracy of breast cancer dataset.
Train Test Accuracy
0.1 0.9 90.44
0.2 0.8 92.54
0.3 0.7 93.98
0.4 0.6 94.15
0.5 0.5 95.78
0.6 0.4 96.49
0.7 0.3 96.49
0.8 0.2 96.49
0.9 0.1 98.24
It can be observed from Table 7 that accuracy increases with the amount of training data. However, the proposed method records reasonable accuracy even with only 10% of the samples used for training. Thus, it can be concluded that the proposed probabilistic labelling method is sample efficient.
4.3 Comparative Analytics
This sub-section presents a comparative analysis of the proposed probabilistic labelling method against other recent, relevant works in the literature. Most of these works use either fuzzy functions or kernel functions for assigning probability values, and Dempster's combination rule for assigning decision probabilities. Table 8 presents the performance of these models on UCI datasets.
Table 8: Comparative Analysis.

Recent Works                 Breast  IRIS  Heart  Diabetes  Liver  Hepatitis  Sonar
(Denoeux, 2008)              76.3 80.57 75.81
(Xu et al., 2013)            82.59 79.4 72.57
(Xu et al., 2014)            95.6 85 85.5
(Xu et al., 2016)            83.7 79.88 68.26
(Liu et al., 2017)           85.56 83.85 76.02
(Qin and Xiao, 2018)         97.07 96.7 86.7 89.03
(Jiang et al., 2019)         69.91 95.33 75.19 70.96 76.48 81.29
(Peñafiel et al., 2020)      93.85 95.9 72.7
(Song et al., 2021)          97.07 86.7 76.82 78.04
(Zhu et al., 2021)           97.15 99.33 91.48 81.61
(Ranjbar and Effati, 2022)   95.22 79.48 70.96 81.62
Proposed Method              98.24 100 85.36 81.81 67.52 89.65 76.19

It can be observed from Table 8 that the proposed decision-making model achieves the maximum accuracy on five of the seven datasets. Therefore, it can be concluded that the decision-making model with probabilistic labelling is superior to the other works.
5 CONCLUSION
The proposed method first converts the hard labels of evidential instances to probabilistic labels using k-nearest neighbours. The condition that at least half of the neighbours belong to the hard label of the corresponding evidential instance gives priority to the original class. After relabelling the evidential instances, the decision-making model assigns probabilistic labels to decision queries by combining the labels of neighbours using Dempster's combination rule. The experimental results show that the proposed method is sample efficient. Moreover, the proposed method can be considered calibrated because it represents the actual likelihood of classes in terms of posterior probabilities. The soft label gives the decision-maker an understanding of the different alternatives and their combinations.
The degree of overlap among the classes is the only factor that has a significant impact on the membership degrees in the probabilistic labels. Since identifying neighbours for each instance is computationally complex, density models or fuzzy approaches may be considered as the prototype for converting hard labels to probabilistic labels in future work.
REFERENCES
Christopher, J. (2019). The science of rule-based classifiers.
In 2019 9th International Conference on Cloud Com-
puting, Data Science & Engineering (Confluence),
pages 299–303. IEEE.
Dempster, A. P. (2008). Upper and lower probabilities in-
duced by a multivalued mapping. In Classic works of
the Dempster-Shafer theory of belief functions, pages
57–72. Springer.
Denoeux, T. (2008). A k-nearest neighbor classification rule
based on dempster-shafer theory. In Classic works of
the Dempster-Shafer theory of belief functions, pages
737–760. Springer.
Gayar, N. E., Schwenker, F., and Palm, G. (2006). A study
of the robustness of knn classifiers trained using soft
labels. In IAPR Workshop on Artificial Neural Net-
works in Pattern Recognition, pages 67–80. Springer.
Hinton, G., Vinyals, O., Dean, J., et al. (2015). Distilling
the knowledge in a neural network. arXiv preprint
arXiv:1503.02531, 2(7).
Jiang, W., Huang, C., and Deng, X. (2019). A new prob-
ability transformation method based on a correlation
coefficient of belief functions. International Journal
of Intelligent Systems, 34(6):1337–1347.
Kavya, R. and Christopher, J. (2022). Interpretable sys-
tems based on evidential prospect theory for decision-
making. Applied Intelligence, pages 1–26.
Liu, Y.-T., Pal, N. R., Marathe, A. R., and Lin, C.-T. (2017).
Weighted fuzzy dempster–shafer framework for mul-
timodal information integration. IEEE Transactions
on Fuzzy Systems, 26(1):338–352.
Nguyen, Q., Valizadegan, H., and Hauskrecht, M. (2011).
Learning classification with auxiliary probabilistic in-
formation. In 2011 IEEE 11th International Confer-
ence on Data Mining, pages 477–486. IEEE.
Norouzi, M., Bengio, S., Jaitly, N., Schuster, M., Wu, Y.,
Schuurmans, D., et al. (2016). Reward augmented
maximum likelihood for neural structured prediction.
Advances In Neural Information Processing Systems,
29.
Peñafiel, S., Baloian, N., Sanson, H., and Pino, J. A. (2020). Applying dempster–shafer theory for developing a flexible, accurate and interpretable classifier. Expert Systems with Applications, 148:113262.
Qin, B. and Xiao, F. (2018). A non-parametric method to
determine basic probability assignment based on ker-
nel density estimation. IEEE Access, 6:73509–73519.
Ranjbar, M. and Effati, S. (2022). A new approach for fuzzy
classification by a multiple-attribute decision-making
model. Soft Computing, 26(9):4249–4260.
Song, X., Qin, B., and Xiao, F. (2021). Fr–kde: a hybrid
fuzzy rule-based information fusion method with its
application in biomedical classification. International
Journal of Fuzzy Systems, 23(2):392–404.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wo-
jna, Z. (2016). Rethinking the inception architecture
for computer vision. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2818–2826.
Vega, R., Gorji, P., Zhang, Z., Qin, X., Rakkunedeth, A.,
Kapur, J., Jaremko, J., and Greiner, R. (2021). Sam-
ple efficient learning of image-based diagnostic clas-
sifiers via probabilistic labels. In International Con-
ference on Artificial Intelligence and Statistics, pages
739–747. PMLR.
Xu, P., Davoine, F., Zha, H., and Denoeux, T. (2016). Evi-
dential calibration of binary svm classifiers. Interna-
tional Journal of Approximate Reasoning, 72:55–70.
Xu, P., Deng, Y., Su, X., and Mahadevan, S. (2013). A new method to determine basic probability assignment from training data. Knowledge-Based Systems, 46:69–80.
Xu, P., Su, X., Mahadevan, S., Li, C., and Deng, Y. (2014).
A non-parametric method to determine basic proba-
bility assignment for classification problems. Applied
intelligence, 41(3):681–693.
Zhu, C., Qin, B., Xiao, F., Cao, Z., and Pandey, H. M.
(2021). A fuzzy preference-based dempster-shafer ev-
idence theory for decision fusion. Information Sci-
ences, 570:306–322.