Synergy Conformal Prediction for Regression

Niharika Gauraha and Ola Spjuth

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

Keywords:

Conformal Prediction, Machine Learning, Regression, Synergy, Ensemble Methods.

Abstract:

Large and distributed data sets pose many challenges for machine learning, including requirements on com-

putational resources and training time. One approach is to train multiple models in parallel on subsets of data

and aggregate the resulting predictions. Large data sets can then be partitioned into smaller chunks, and for

distributed data sets the need for pooling can be avoided. Combining results from conformal predictors using

synergy rules has been shown to have advantageous properties for classiﬁcation problems. In this paper we

extend the methodology to regression problems, and we show that it produces valid and efﬁcient predictors

compared to inductive conformal predictors and cross-conformal predictors for 10 different data sets from

the UCI machine learning repository using three different machine learning methods. The approach offers a

straightforward and compelling alternative to pooling data, such as when working in distributed environments.

1 INTRODUCTION

Data has become one of the key assets of many or-

ganizations, and the amount of data continues to in-

crease in virtually all domains; sometimes referred to

as Big Data (Hashem et al., 2015; Stephens et al.,

2015; Gandomi and Haider, 2015). Training statis-

tical (machine) learning models on large data sets

can be challenging from different perspectives (El-

ish and danah boyd, 2018). For example, large data

sets require substantial computational hardware, and

in some cases large data can be difﬁcult to move into

a single computer (Zhou et al., 2017). A multitude of approaches have been developed for machine learning on large data sets (Zhou et al., 2017; Peteiro-Barral and Guijarro-Berdiñas, 2013). The most popular methodology is probably to train a global model

in distributed environments that preserves data local-

ity, using frameworks such as Apache Spark (Meng

et al., 2016). Another methodology is to distribute

(partition) data into smaller units and train multiple

models in parallel and aggregate predictions (Vapnik

and Izmailov, 2016).

Conformal Prediction is a relatively recent

methodology where conformal predictors are built on

top of standard machine learning algorithms and com-

plement the predictions with valid measures of conﬁ-

dence (Vovk et al., 2005). The two main approaches

are Transductive Conformal Prediction (TCP) (Vovk,

2013) and Inductive Conformal Prediction (ICP) (Pa-

padopoulos, 2008) and they can be used for both clas-

siﬁcation and regression problems. The main draw-

back of using TCP is that it is computationally de-

manding; for every test example a re-training of the

model is required. ICP was developed to overcome this issue; it adds little computational overhead to the underlying algorithm, but there is some loss of informational efficiency because a subset of the training examples is set aside for calibration. To address this

problem of information efﬁciency, ensembles of con-

formal predictors were introduced such as Cross Con-

formal Prediction (CCP) (Vovk, 2015) and Aggre-

gated Conformal Prediction (ACP) (Carlsson et al.,

2014a). The validity of TCP and ICP is proven in

that they produce 1 − ε expectation tolerance regions,

where ε is the selected signiﬁcance level (Vovk et al.,

2005). However, the validity of ensembles of CPs has

not been theoretically proven and has been discussed

in (Carlsson et al., 2014a) and (Linusson et al., 2017).

Synergy Conformal Prediction (SCP) was recently proposed to address the validity problem of ensembles of conformal predictors; it combines monotonic conformity scores instead of p-values (Gauraha and Spjuth, 2018). Its applicability has been shown in two scenarios: where data is partitioned in order to reduce the total model training time, and where an ensemble of different machine learning methods is used to improve the overall efficiency of predictions.

In this paper, we extend the SCP methodology for

regression problems and explore its usefulness when

partitioning large data and aggregating results, and

also for working with distributed data without pooling

into a single dataset. The paper is organized in the fol-

lowing way. In section 2, we outline the background


Gauraha, N. and Spjuth, O.

Synergy Conformal Prediction for Regression.

DOI: 10.5220/0010229402120221

In Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2021), pages 212-221

ISBN: 978-989-758-486-2

concepts and notations used throughout the paper. In

Section 3 we introduce synergy conformal prediction

for regression and discuss its properties. In Section

4 we perform numerical analysis on a set of real data

sets. In Section 5 we summarize our results, and in

Section 6 we conclude and discuss implications and

future outlook.

2 BACKGROUND

In this paper we mainly focus on regression problems and assume exchangeability of observations. The object space is denoted by $X \subset \mathbb{R}^p$, where $p$ is the number of features, and the label space is denoted by $Y \subset \mathbb{R}$. We assume that each example consists of an object and its label, and its space is given as $Z := X \times Y$. In a classical regression setting, given $\ell$ data points $Z = \{z_1, \ldots, z_\ell\}$ where each example $z_i = (x_i, y_i)$ is labeled, we want to predict the label of a new object $x_{new}$.

In the conformal prediction setting, a nonconformity measure is a way of measuring the strangeness of an example in relation to the previous examples (Vovk et al., 2005). In our experiments, we use the standard regression nonconformity measure

\[ \alpha_i = |y_i - \hat{y}_i|, \tag{1} \]

where $\hat{y}_i$ is the estimated output for the object $x_i$ using regression algorithms such as Support Vector Regression (SVR) or Random Forests (RF).

Definition 1. Inductive Conformal Prediction (ICP) for Regression (Papadopoulos et al., 2002)

Given a training set of $\ell$ examples, $Z = \{z_1, \ldots, z_\ell\}$, drawn from an exchangeable distribution $P$, the training data is first divided into a proper training set $\{Z_T\}$ and a calibration set $\{Z_C\}$, where $(T, C)$ is a partition of $\{1, \ldots, \ell\}$. The regression algorithm (e.g. SVR or RF) is applied to the proper training set, and using the decision rule a strangeness measure is associated with every example in the calibration set. In particular, we use the standard regression nonconformity measure

\[ \alpha_i = |y_i - f(x_i)| = |y_i - \hat{y}_i|, \quad i \in C, \tag{2} \]

where $f : X \to Y$ is the prediction rule of the model trained on the proper training set $Z_T$, and $\hat{y}_i$ is the estimated output. To denote the dependency on the proper training set we write the decision rule as $f(Z_T, \cdot)$. Let us denote by $\alpha_{(1)}, \ldots, \alpha_{(|C|)}$ the sequence of all $\alpha_i$ corresponding to the calibration set sorted in descending order, so that $\alpha_{(1)}$ is the largest score. Let $x_{new}$ (following the same distribution $P$) be the new object we want to predict, and let $\hat{y}_{new}$ be its estimated label using the same function $f(Z_T, \cdot)$. The Prediction Interval (PI) for the new object $x_{new}$ is then computed as

\[ (\hat{y}_{new} - \alpha_{(s)}, \; \hat{y}_{new} + \alpha_{(s)}), \tag{3} \]

where $s = \lfloor \varepsilon(|C| + 1) \rfloor$, $\varepsilon \in (0, 1)$ is a chosen significance level, and $(1 - \varepsilon)$ is the confidence level. We denote an ICP by a tuple $(Z_C, f(Z_T, \cdot))$, which consists of the calibration set $Z_C$ and the decision rule given by its training set, $Z_T$.
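The interval construction in Definition 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed rule `toy_rule` and the calibration values are hypothetical. The rank s is counted from the largest calibration score, which is the ordering under which s = floor(ε(|C| + 1)) yields coverage of at least 1 − ε.

```python
import math


def icp_half_width(calibration_scores, epsilon):
    """Return alpha_(s), the s-th largest score, with s = floor(epsilon * (|C| + 1))."""
    s = math.floor(epsilon * (len(calibration_scores) + 1))
    if s < 1:
        # Not enough calibration examples for this significance level:
        # only an unbounded interval can be guaranteed.
        return math.inf
    return sorted(calibration_scores, reverse=True)[s - 1]


def icp_interval(y_hat_new, calibration_scores, epsilon):
    """Interval of Eq. (3), centred on the point prediction y_hat_new."""
    half_width = icp_half_width(calibration_scores, epsilon)
    return y_hat_new - half_width, y_hat_new + half_width


def toy_rule(x):
    """Hypothetical fixed rule standing in for a model fitted on the proper training set."""
    return 2.0 * x


# Hypothetical calibration examples (x_i, y_i).
calibration = [(1.0, 2.1), (2.0, 3.6), (3.0, 6.5), (4.0, 8.2), (5.0, 9.0),
               (6.0, 12.3), (7.0, 13.9), (8.0, 16.8), (9.0, 18.0)]
scores = [abs(y - toy_rule(x)) for x, y in calibration]  # Eq. (2)
lower, upper = icp_interval(toy_rule(10.0), scores, epsilon=0.2)
print(round(lower, 3), round(upper, 3))
```

With nine calibration examples and ε = 0.2, s = 2, so the half-width is the second-largest calibration score (about 0.8) and the interval is roughly 19.2 to 20.8.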

As ICP uses only part of the training examples for

training its underlying algorithm, and part of the ex-

amples for calculating the α-scores, it may result in

lower informational efﬁciency. The Cross Confor-

mal Predictor (CCP) for regression was introduced

in (Papadopoulos, 2015) and is based on a cross-

validation approach which helps overcome the infor-

mational loss of ICP; instead of only using a subset

of the data as a calibration set, data is divided into K

disjoint folds where each fold is treated as the cali-

bration set, and the others as the proper training set.

Other ensembles of conformal predictors for classiﬁ-

cation discussed in (Gauraha and Spjuth, 2018) have

also been effectively applied to regression problems.

Most of these ensemble methods for regression problems aim to obtain more informationally efficient conformal predictors by combining p-values (or by combining prediction intervals). Since the combined p-values need not be uniformly distributed, the final models are not guaranteed to be valid; see (Linusson et al., 2017) for more details. Another related method is Ensemble Cross Conformal Prediction (ECCP) (Beganovic and Smirnov, 2018) for classification settings, which has been shown to be computationally efficient. Synergy conformal prediction for classification was introduced in (Gauraha and Spjuth, 2018); it is an ensemble method that combines monotonic conformity scores and is capable of producing valid prediction intervals.

3 REGRESSION SYNERGY

CONFORMAL PREDICTION

The objective of this paper is to extend SCP to re-

gression problems, which was originally inspired by

“Synergy of Monotonic Rules” proposed in (Vapnik

and Izmailov, 2016). For classiﬁcation problems, it

has been shown that, by combining monotonic con-

formity scores of the calibration set and the test ex-

amples, SCP is capable of producing valid prediction

regions. As with SCP for classiﬁcation, for regression

problems also one can suggest to combine nonconfor-

mity scores of the calibration set and test examples

computed from multiple trained models on the


Algorithm 1: Synergy Conformal Predictor for Regression with data partitioning.

Input: training dataset: $Z$, object to predict: $x_{new}$, a regression algorithm: $A$, number of partitions: $M$

Output: Prediction Interval (PI)

Step 1: Split the training set into two smaller sets, the proper training set $\{Z_T\}$ and the calibration set $\{Z_C\}$. $(T, C)$ is a partition of $\{1, \ldots, |Z|\}$.

Step 2: Split the set $Z_T$ into $M$ proper training sets, $\{Z_{T_m}, m = 1, \ldots, M\}$. $(T_1, \ldots, T_M)$ is a partition of $T$.

Step 3: For each part $Z_{T_m}$, train and construct the rule to generate nonconformity scores.

Step 4: Compute the combined nonconformity scores across $M$ models for each example in the calibration set, $\alpha_j$, for $j \in C$, using eq. 5.

Step 5: Compute the averaged estimated value across $M$ regression models for the new object $x_{new}$, which results in $\hat{y}_{new}$.

Step 6: Compute the prediction interval, PI, using eq. 3.

return PI

individual partitions of the proper training set. How-

ever, for regression settings it is difﬁcult to compute

nonconformity score for a test example as the postu-

lated output can be a real number. Therefore, we pro-

pose to combine predicted outputs for test examples

for constructing the prediction intervals using Equa-

tion 3. Synergy conformal prediction for regression is

described in the following.

Consider $\ell$ examples, $Z = \{z_1, \ldots, z_\ell\}$, drawn from an exchangeable distribution $P$. Similar to ICP, the training data is first divided into the proper training set $\{Z_T\}$ and the calibration set $\{Z_C\}$, where $(T, C)$ is a partition of $\{1, \ldots, \ell\}$. Then the proper training data is further divided into $M$ non-empty disjoint subsets, and each subset $Z_{T_m}$, $m = 1, \ldots, M$, is then used for training. Here $(T_1, \ldots, T_M)$ is a partition of $T$. The $M$ predictive models trained on the individual partitions are then used to compute the nonconformity scores for the calibration set, denoted by $\alpha_{mj}$, for $j \in C$ and $m = 1, \ldots, M$. For example,

\[ \alpha_{mj} = |y_j - f_m(x_j)|, \tag{4} \]

where $f_m(x)$ is the prediction rule defined by the predictive model trained on the $m$-th part of the training set. The aggregated nonconformity scores across models are then defined as

\[ \alpha_j = \frac{1}{M} \sum_{m=1}^{M} \alpha_{mj}. \tag{5} \]

Let $x_{new}$ (following the same distribution $P$) be the object we want to predict, and let $\hat{y}_{new}$ be the averaged estimated value across models. The synergy conformal predictor corresponding to the tuple $(Z_C, f_1(Z_{T_1}, \cdot), \ldots, f_M(Z_{T_M}, \cdot))$ is defined as a prediction interval (PI) as given by eq. (3).
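The six steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `fit_line` is a hypothetical one-feature least-squares fit standing in for the regression algorithm A, the per-model scores are assumed to be averaged, the rank s is counted from the largest score, and all function and parameter names are introduced here.

```python
import math
import random


def fit_line(points):
    """Least-squares line through (x, y) pairs; a stand-in for the algorithm A."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def scp_interval(train, x_new, n_partitions, epsilon, calib_fraction=0.3, seed=0):
    """Prediction interval for x_new from M models trained on disjoint parts."""
    data = list(train)
    random.Random(seed).shuffle(data)
    # Step 1: split into calibration set Z_C and proper training set Z_T.
    n_calib = round(calib_fraction * len(data))
    calib, proper = data[:n_calib], data[n_calib:]
    if n_partitions > len(proper):
        raise ValueError("more partitions than proper training examples")
    # Step 2: split Z_T into M disjoint, non-empty parts.
    parts = [proper[m::n_partitions] for m in range(n_partitions)]
    # Step 3: train one model per part.
    models = [fit_line(part) for part in parts]
    # Step 4: per-model scores (Eq. 4) averaged across models (Eq. 5).
    scores = [sum(abs(y - f(x)) for f in models) / len(models) for x, y in calib]
    # Step 5: averaged prediction for the new object.
    y_hat_new = sum(f(x_new) for f in models) / len(models)
    # Step 6: Eq. (3), with rank s counted from the largest score.
    s = math.floor(epsilon * (len(scores) + 1))
    half_width = sorted(scores, reverse=True)[s - 1] if s >= 1 else math.inf
    return y_hat_new - half_width, y_hat_new + half_width


# Hypothetical noisy data around y = 3x + 1.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
noisy = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in xs]
print(scp_interval(noisy, x_new=5.0, n_partitions=3, epsilon=0.1))
```

Only the calibration set and the M fitted rules are needed at prediction time, which is what makes the scheme usable when the parts cannot be pooled.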

The SCP method differs from most of the other

ensemble methods discussed previously in that SCP

combines nonconformity scores rather than combin-

ing the conformal p-values or prediction intervals. In

the following, we discuss the validity property of Re-

gression SCP.

Proposition 1. The synergy conformal predictor for regression is valid.

Proof. For unpartitioned data, when the proper training set as a whole is used for training, SCP is exactly ICP and hence valid (Papadopoulos et al., 2002). For partitioned training data, SCP can be viewed as a single ICP (hence valid), when the ensemble of $M$ regression methods is considered as one function producing the (combined) nonconformity scores. To illustrate this, let us consider partitioning of the set $Z_T$ into $M$ subsets, $Z_{T_1}, \ldots, Z_{T_M}$, and let their corresponding decision rules be $f_1(Z_{T_1}, \cdot), \ldots, f_M(Z_{T_M}, \cdot)$ respectively. Define a new decision rule $f(Z_T, \cdot)$ which combines the estimated values of an example $z$,

\[ f(Z_T, z) = \frac{1}{M} \sum_{m=1}^{M} f_m(Z_{T_m}, z). \]

Then the nonconformity scores can be computed using the combined estimated values. The pair $\{Z_C, f(Z_T, \cdot)\}$ forms an ICP corresponding to the new decision rule $f(Z_T, \cdot)$, hence valid.

4 EXPERIMENTS

We evaluate Regression SCP on ten regression datasets from the UCI machine learning repository (Lichman et al., 2013), namely Boston Housing Data (Housing), Wine Quality (Wine), Parkinson Speech Dataset with Multiple Types of Sound Recordings (PD), Combined Cycle Power Plant (PowerPlant), Energy Efficiency (Energy), Concrete Compressive Strength (Concrete), Electrical Grid Stability Simulated Data (GridStability), Superconductivity Data (SuperConduct), Condition Based Maintenance of Naval Propulsion Plants (CBM) and SkillCraft1 Master Table Dataset (Game). Since ICP and CCP, running on whole datasets, are computationally expensive, we randomly select 2000 data points from the larger datasets. In all our experiments, the data is scaled for target (or label) and for all features to be within


Table 1: Datasets from UCI repository that are used in the evaluation. Train refers to the number of examples in the training set, Calib refers to the number of examples in the calibration set, Test refers to the number of objects in the test set, Features refers to the number of features in the dataset.

Dataset        Train   Calib   Test   Features
Housing          282     122    102         13
Wine            1120     480    400         10
PD               582     250    208         26
PowerPlant      1120     480    400          4
Energy           429     185    154          8
Concrete         576     248    206          8
GridStability   1120     480    400         12
SuperConduct    1120     480    400         81
CBM             1120     480    400         16
Game            1120     480    400         18

the [0,1] range, as feature scaling improves the con-

vergence of the underlying optimization method and

target is scaled for better comparisons.
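The [0,1] scaling described above corresponds to ordinary min-max scaling. A minimal sketch is given below; the function names are hypothetical, and whether the paper derives the minimum and maximum from the training data only or from the full dataset is not stated, so here they are simply taken from the values passed in.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_columns(rows):
    """Apply min-max scaling independently to every column of a row-major table."""
    columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical table with two features per row.
print(scale_columns([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]))
```

Scaling the target as well keeps interval widths comparable across datasets, which is the reason given in the text for doing so.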

Four experiments were carried out (described in detail below). In each experiment, every dataset was randomly divided into a training and a test subset using an 80:20 ratio. The training set was further (randomly) divided into a proper training set and a calibration set using a 70:30 ratio. The specific breakdowns for training, calibration and test sets are given in Table 1. Then the proper training set was randomly partitioned into M equal parts, and a model was trained on each individual part. We used three machine learning algorithms: Support Vector Regression using linear kernel (SVR), Support Vector Regression using RBF kernel (RBF-SVR) (implemented in Python with LIBSVM) and Random Forest (RF) (implemented in Python). The corresponding hyper-parameters were selected using 5-fold cross-validation, and the best values were used to train the final model on the entire dataset. The estimated target values ŷ were computed for each example in the calibration set and test set, and were averaged across models. Then the prediction interval (PI) was computed using the averaged predicted values according to Equation 3. The whole process was repeated 10 times.
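The splitting protocol can be sketched with index bookkeeping alone. The function below is a hypothetical illustration: rounding the test and calibration sizes up is an assumption, chosen because it reproduces the counts in Table 1, and the strided slicing gives parts whose sizes differ by at most one when the proper training set is not divisible by M.

```python
import random


def split_indices(n_examples, n_partitions, test_percent=20, calib_percent=30, seed=0):
    """Return (test, calibration, parts) index lists for one random repetition."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # 80:20 split into training and test indices (size rounded up, integer arithmetic).
    n_test = (n_examples * test_percent + 99) // 100
    test, train = indices[:n_test], indices[n_test:]
    # 70:30 split of the training indices into proper training and calibration.
    n_calib = (len(train) * calib_percent + 99) // 100
    calib, proper = train[:n_calib], train[n_calib:]
    # M disjoint parts of the proper training set, as equal in size as possible.
    parts = [proper[m::n_partitions] for m in range(n_partitions)]
    return test, calib, parts


test, calib, parts = split_indices(2000, n_partitions=3)
print(len(test), len(calib), [len(p) for p in parts])
```

Repeating the call with ten different values of `seed` would mimic the ten repetitions described above.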

To assess a conformal predictor we consider validity and efficiency (Vovk et al., 2016; Svensson et al., 2018). In our experiments, validity is assessed empirically using calibration plots: for a valid predictor, the plot of the percentage of errors against ε ∈ (0,1) is close to the bisector of the first quadrant, up to statistical fluctuations. We use the size of the prediction interval as the measure of efficiency; a lower width implies better informational efficiency and hence indicates a ‘better’ model.
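Both criteria reduce to simple summaries over the test set. The sketch below uses hypothetical function names and made-up intervals. It reports the empirical error rate, which should be close to ε for a valid predictor, and the median interval width, which is the summary the discussion refers to for efficiency.

```python
from statistics import median


def error_rate(intervals, y_true):
    """Fraction of test labels that fall outside their (open) prediction interval."""
    misses = sum(1 for (lo, hi), y in zip(intervals, y_true) if not lo < y < hi)
    return misses / len(y_true)


def median_width(intervals):
    """Median width of the prediction intervals; lower means more efficient."""
    return median(hi - lo for lo, hi in intervals)


# Hypothetical intervals and true labels for four test objects.
intervals = [(0.0, 1.0), (0.0, 2.0), (1.0, 4.0), (2.0, 3.0)]
labels = [0.5, 3.0, 2.0, 2.5]
print(error_rate(intervals, labels), median_width(intervals))
```

Evaluating `error_rate` on intervals built for a grid of ε values gives the points of a calibration plot.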

4.1 Experiment 1: Synergy of

Conformal Prediction using the

Same Machine Learning Algorithm

on Partitioned Data

The objective of this experiment is to compare the efficiency of SCP with ICP on partitioned and unpartitioned data, using linear (linear SVR) and non-linear (SVR with RBF kernel) regression methods. Inspired by the experiments in (Vapnik and Izmailov, 2016), we partition the proper training set into three equal partitions in the SCP method; results for the test sets are shown in Table 2. To illustrate the quantitative difference between the ICPs trained on individual partitions, ICP trained on the whole proper training set, and SCP, box plots for the dispersion of efficiencies are presented in Figures 1a and 1b for the SuperConduct dataset (similar plots are shown in the Appendix for the remaining datasets). We also perform this experiment with more partitions (5, 7, 9, 11, 15 and 20) for a couple of randomly selected datasets, and the results are reported in Table 3.

From Table 2 we observe that ICP on unpartitioned data using the non-linear regression method (RBF-SVR) generally performs best, as expected, whereas the partition ICPs, both linear and non-linear, have somewhat larger intervals (lower efficiency) than the corresponding SCPs in most cases; the difference is more prominent in the non-linear case. This pattern is repeated when we have more partitions, as is evident from Table 3.

4.2 Experiment 2: Synergy of

Conformal Prediction using

Different Machine Learning

Algorithms on Partitioned Data

The objective of Experiment 2 is similar to Exper-

iment 1, with the difference that multiple machine

learning methods were used for the different parti-

tions. We use the same setup as in the previous exper-

iment with three equal partitions of the proper train-

ing set, but with three different machine learning al-

gorithms: linear SVR, Random Forest (RF) and SVR

using RBF kernel (RBF-SVR) for each partition. The

results are reported in Table 4. To illustrate the quantitative difference between the partition ICPs and SCP, box plots are presented in Figure 1c for the SuperConduct dataset.

From Table 4 we observe that all partition ICPs have lower efficiency than SCP for most of the datasets.


Table 2: Results from Experiment 1, Synergy of Conformal Prediction using the Same Machine Learning Algorithm on Partitioned Data. Efficiency is calculated at confidence level 90%, with linear SVR (Linear) and non-linear SVR with RBF kernel (NonLinear) as the underlying machine learning algorithm. ICP (partition) corresponds to the partition with the best efficiency, ICP to the efficiency where the whole dataset is used, and SCP to the synergy conformal prediction with regression method.

               Linear                            NonLinear
Dataset        ICP (partition)   ICP     SCP     ICP (partition)   ICP     SCP
Housing        0.316             0.312   0.309   0.282             0.219   0.249
Wine           0.203             0.201   0.204   0.207             0.195   0.199
PD             1.605             1.519   1.465   1.603             1.545   1.441
PowerPlant     0.207             0.208   0.209   0.191             0.19    0.186
Energy         0.265             0.28    0.276   0.255             0.177   0.245
Concrete       0.441             0.446   0.443   0.356             0.275   0.325
GridStability  0.430             0.426   0.422   0.235             0.177   0.209
SuperConduct   0.483             0.466   0.478   0.400             0.351   0.377
CBM            0.380             0.321   0.367   0.022             0.018   0.021
Game           0.534             0.533   0.538   0.535             0.528   0.535

Table 3: Results from Experiment 1, using different numbers of partitions, where non-linear SVR with RBF kernel is used as the underlying machine learning algorithm.

Dataset        # Partitions   ICP (partition)   ICP     SCP
Housing        5              0.322             0.219   0.278
Housing        7              0.336             0.219   0.286
Housing        9              0.382             0.219   0.312
Housing        11             0.397             0.219   0.317
Housing        15             0.426             0.219   0.336
Housing        20             0.428             0.219   0.351
SuperConduct   5              0.430             0.351   0.380
SuperConduct   7              0.450             0.351   0.397
SuperConduct   9              0.450             0.351   0.410
SuperConduct   11             0.488             0.351   0.419
SuperConduct   15             0.506             0.351   0.434
SuperConduct   20             0.539             0.351   0.452

4.3 Experiment 3: Synergy of

Conformal Prediction using

Different Machine Learning

Algorithms on Unpartitioned Data

The objective of this study is to compare SCP with ICP and CCP trained on the whole training set, in order to show that using different machine learning algorithms for training on the same data (proper training set) in SCP also improves the informational efficiency. In this experiment we use three different machine learning algorithms: linear SVR, RF and RBF-SVR; the results are reported in Table 5. The results of the individual ICPs and SCP are given in columns 2-4 and column 5 respectively. Columns 6-8 show

Table 4: Results from Experiment 2, Synergy of Conformal Prediction using Different Machine Learning Algorithms on Partitioned Data. Efficiency is calculated at confidence level 90% for the partition ICPs and SCP with synergy of three different machine learning algorithms: linear SVR, RF and RBF-SVR. SVR-ICP (partition) corresponds to the ICP on the first partition with linear SVR as underlying algorithm, RF-ICP (partition) to the ICP on the second partition with RF as underlying algorithm, RBF-SVR-ICP (partition) to the third partition with SVR with RBF as underlying algorithm, and SCP to the Synergy Conformal Prediction with Regression method.

Dataset        SVR-ICP (partition)   RF-ICP (partition)   RBF-SVR-ICP (partition)   SCP
Housing        0.316                 0.294                0.317                     0.256
Wine           0.212                 0.299                0.216                     0.212
PD             1.605                 1.600                1.603                     1.393
PowerPlant     0.210                 0.205                0.191                     0.187
Energy         0.276                 0.193                0.278                     0.220
Concrete       0.441                 0.309                0.356                     0.317
GridStability  0.432                 0.394                0.238                     0.299
SuperConduct   0.485                 0.424                0.400                     0.382
CBM            0.381                 0.412                0.022                     0.224
Game           0.534                 0.567                0.537                     0.525

the average informational efficiency of Cross Conformal Prediction (CCP) applied on the whole training set with three-fold cross-conformal prediction using linear SVR, RF and RBF-SVR as the underlying machine learning algorithm, respectively. The prediction intervals for CCP were combined by taking the medians of the lower and upper bounds as suggested in (Park and Budescu, 2015). It is important to note that in this experiment the methods ICP (70:30), SCP (70:30) and CCP (2/3:1/3, due to three folds) get almost the same amount of training and calibration data, and that SCP and CCP combine results from three different methods and three folds, respectively. To illustrate the quantitative difference between ICP, SCP and CCP, box plots are presented in Figure 1d for the SuperConduct dataset.

From Table 5 it is observed that the efficiency of SCP on unpartitioned data is higher than that of ICP and CCP for more than half of the datasets.
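The median rule used to combine the per-fold CCP intervals can be written in a few lines. The sketch below only implements the rule as stated in the text (median of the lower bounds, median of the upper bounds); the function name and the example intervals are hypothetical, and any further details of the procedure in (Park and Budescu, 2015) are not reproduced.

```python
from statistics import median


def combine_intervals_by_median(intervals):
    """Combine per-fold intervals via the medians of their lower and upper bounds."""
    lowers = [lo for lo, _ in intervals]
    uppers = [hi for _, hi in intervals]
    return median(lowers), median(uppers)


# Hypothetical intervals for one test object from a three-fold CCP.
print(combine_intervals_by_median([(0.1, 0.9), (0.2, 0.7), (0.0, 0.8)]))
```

Unlike SCP, this combines finished intervals rather than nonconformity scores, so the validity argument for a single ICP does not carry over to it directly.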

4.4 Experiment 4: Validity of

Conformal Predictors

In this section we study the validity of ICP, SCP and

CCP. We use the same setup as in Experiment 1, using

Random Forest (RF) with 10 trees as the underlying

machine learning algorithm. We also train three fold

CCP using RF with 10 trees. Calibration plots are

available in Figure 2, indicating that all models show

little deviation from validity.

5 DISCUSSIONS

The aim of this paper was to develop SCP for re-

gression and to explore its performance in different

settings. In Experiment 1, we considered partitioned

proper-training data with the same machine learning

algorithm applied on each partition. As with SCP for classification, SCP for regression also succeeded in combining the models (see Table 2) and for the linear case achieved an efficiency that was on par with the individual best partition ICP, whereas in the non-linear case using SVR with RBF kernel SCP showed improved efficiency (lower median prediction interval width) for all datasets as compared with the best partition ICP. The difference between ICP trained on the entire dataset and the SCP regression methodology is not large in absolute numbers for the NonLinear case (Table 2), but we observe a decrease in the efficiency of SCP with a larger number of partitions (Table 3), which is expected since the individual sub-models are then trained on less data. Nevertheless, SCP outperformed the best individual partition ICP in all cases. In Experiment 2, we considered partitioned proper training data with different machine learning algorithms applied on each partition. Table 4 shows results from the evaluation, and we note that in 7 out of 10 datasets, SCP has the lowest interval width (best efficiency) compared to the other methods trained on individual partitions. We here argue that using SCP for integrating predictions from different machine learning models is in most cases superior to relying on an individual prediction methodology. This would resemble a real-world use case where different organizations have implemented different modeling methods and are not able to share data, but are able to share nonconformity measures to make up an aggregated prediction. In Experiment 3, we considered

unpartitioned proper training data with different ma-

chine learning algorithms, hence all individual parties

have access to the entire dataset but use different un-

derlying machine learning methods. Table 5 shows

that for 6 out of 10 datasets, SCP has improved efﬁ-

ciency compared to the individual methods using ICP.

Further, the results comparing SCP with CCP show that SCP integrating different machine learning methods is in most cases (7 of the 10 datasets in Table 5) superior to using a three-fold CCP with a single machine learning method. In Experiment 4, results show that all methods yield valid models (close to the bisector of the first quadrant), with a small deviation observed for CCP. The attractive property of SCP to produce valid models while aggregating results over multiple models constitutes a clear improvement over previous aggregation methods for conformal predictors, such as reported in (Linusson et al., 2017; Carlsson et al., 2014b).

The experiments show in general that SCP has

attractive properties for aggregating prediction inter-

vals, evaluated on a set of benchmark datasets using

different forms of data partitioning. While SCP does not produce models as efficient as those obtained by pooling all data in one location and training an ICP, in almost all cases it is advantageous compared to the best individual model trained on partitioned data. This implies that SCP has good applicability in distributed or federated settings, where data is located in different locations and cannot be pooled due to privacy, regulatory, or practical reasons. We also envision that SCP could be a foundation upon which implementations can train individual partitions in parallel, such as in locality-aware Big Data frameworks. The main drawback of SCP is that it requires a calibration set that is shared between individual partitions. This is likely not a big concern when partitioning data, but it does constitute an additional step when working with data sources that are distributed.

6 CONCLUSIONS AND FUTURE

DIRECTIONS

We introduced Synergy Conformal Prediction for

regression and evaluated its validity and informa-

tional efﬁciency using various underlying machine

learning algorithms. The key outcome is that SCP

offers an alternative to pooling distributed data when

using inductive conformal predictors, with reduced

training time as models can be trained in parallel and

predictions aggregated, while still preserving validity.


Table 5: Results from Experiment 3, Synergy of Conformal Prediction using Different Machine Learning Algorithms on Unpartitioned Data. Efficiency is calculated at confidence level 90% for ICP, SCP and CCP on unpartitioned data using three different machine learning algorithms (linear SVR, RF and RBF-SVR). Columns 2-4 show results from ICP, the fifth column shows results for SCP for an ensemble of different machine learning algorithms, and columns 6-8 show results for three-fold CCPs.

Dataset        SVR-ICP   RF-ICP   RBF-ICP   SCP     SVR-CCP   RF-CCP   RBF-CCP
Housing        0.312     0.233    0.219     0.216   0.316     0.241    0.224
Wine           0.201     0.258    0.195     0.194   0.209     0.256    0.199
PD             1.519     1.400    1.545     1.436   1.522     1.461    1.582
PowerPlant     0.208     0.196    0.190     0.180   0.205     0.195    0.186
Energy         0.28      0.190    0.177     0.168   0.294     0.188    0.184
Concrete       0.446     0.238    0.275     0.267   0.446     0.233    0.271
GridStability  0.426     0.315    0.177     0.251   0.419     0.324    0.180
SuperConduct   0.466     0.340    0.351     0.340   0.463     0.344    0.353
CBM            0.321     0.224    0.018     0.153   0.321     0.228    0.019
Game           0.533     0.533    0.528     0.516   0.534     0.533    0.531

(a) Linear SVR, partitioned data (b) RBF-SVR, partitioned data (d) Unpartitioned data

Figure 1: Results from Experiments 1, 2 and 3 comparing the efficiency between different methods for the SuperConduct dataset. (a and b): Experiment 1, comparison of individual ICPs trained on data partitions, ICP on the entire dataset, and SCP where (a) linear SVR is used, and (b) RBF-SVR is used as the underlying machine learning algorithm; (c): Experiment 2, comparing individual ICPs on partitions and SCP where three different algorithms (linear SVR, RF and RBF-SVR) are used on partitioned training data; (d): Experiment 3, comparing ICP, SCP and CCP where three methods (linear SVR, RF and RBF-SVR) are used on unpartitioned training data.

(a) ICP (b) SCP (c) CCP

Figure 2: Illustration of validity of ICP, SCP and CCP for the SuperConduct dataset using random forest (with 10 trees) for training all the underlying models.


The efficiency of SCP does not reach the level of ICP trained on pooled data, but its prediction intervals are narrower (lower median prediction interval width) than those obtained on the best individual partition, indicating attractive properties in distributed and federated settings as a valid confidence predictor. Future directions when working on partitioned data include (i) studying the effect of the number and size of data partitions as well as overlapping partitions, and (ii) evaluating the effect of different nonconformity scores and different underlying machine learning algorithms with individual partitions.

REFERENCES

Beganovic, D. and Smirnov, E. (2018). Ensemble

cross-conformal prediction. In Proceedings of the

2018 IEEE International Conference on Data Min-

ing Workshops, ICDM Workshops, Singapore, Singa-

pore, November 17-20, 2018, pages 870–877, United

States. IEEE.

Carlsson, L., Eklund, M., and Norinder, U. (2014a). Ag-

gregated conformal prediction. In IFIP International

Conference on Artiﬁcial Intelligence Applications and

Innovations, pages 231–240. Springer.

Carlsson, L., Eklund, M., and Norinder, U. (2014b). Aggre-

gated conformal prediction. In Iliadis, L., Maglogian-

nis, I., Papadopoulos, H., Sioutas, S., and Makris, C.,

editors, Artiﬁcial Intelligence Applications and Inno-

vations, pages 231–240. Springer Berlin Heidelberg.

Elish, M. C. and danah boyd (2018). Situating methods in

the magic of big data and ai. Communication Mono-

graphs, 85(1):57–80.

Gandomi, A. and Haider, M. (2015). Beyond the hype: Big

data concepts, methods, and analytics. International

Journal of Information Management, 35(2):137–144.

Gauraha, N. and Spjuth, O. (2018). Synergy conformal pre-

diction. UU.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S.,

Gani, A., and Khan, S. U. (2015). The rise of “big

data” on cloud computing: Review and open research

issues. Information systems, 47:98–115.

Lichman, M. et al. (2013). Uci machine learning repository.

Linusson, H., Norinder, U., Boström, H., Johansson, U., and Löfström, T. (2017). On the calibration of aggregated conformal predictors. In Gammerman, A., Vovk, V., Luo, Z., and Papadopoulos, H., editors, Proceedings of the Sixth Workshop on Conformal and Probabilistic Prediction and Applications, volume 60 of Proceedings of Machine Learning Research, pages 154–173, Stockholm, Sweden. PMLR.

Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman,

S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen,

S., et al. (2016). Mllib: Machine learning in apache

spark. The Journal of Machine Learning Research,

17(1):1235–1241.

Papadopoulos, H. (2008). Inductive conformal prediction:

Theory and application to neural networks. In Tools

in artiﬁcial intelligence, pages 315–329. InTech.

Papadopoulos, H. (2015). Cross-conformal prediction with

ridge regression. In International Symposium on Sta-

tistical Learning and Data Sciences, pages 260–270.

Springer.

Papadopoulos, H., Proedrou, K., Vovk, V., and Gammer-

man, A. (2002). Inductive conﬁdence machines for re-

gression. In European Conference on Machine Learn-

ing, pages 345–356. Springer.

Park, S. and Budescu, D. V. (2015). Aggregating multiple

probability intervals to improve calibration. Judgment

and Decision Making, 10(2):130.

Peteiro-Barral, D. and Guijarro-Berdiñas, B. (2013). A survey of methods for distributed machine learning. Progress in Artificial Intelligence, 2(1):1–11.

Stephens, Z. D., Lee, S. Y., Faghri, F., Campbell, R. H.,

Zhai, C., Efron, M. J., Iyer, R., Schatz, M. C., Sinha,

S., and Robinson, G. E. (2015). Big data: astronomi-

cal or genomical? PLoS biology, 13(7):e1002195.

Svensson, F., Aniceto, N., Norinder, U., Cortes-Ciriano, I.,

Spjuth, O., Carlsson, L., and Bender, A. (2018). Con-

formal regression for quantitative structure–activity

relationship modeling—quantifying prediction uncer-

tainty. Journal of Chemical Information and Model-

ing, 58(5):1132–1140.

Vapnik, V. and Izmailov, R. (2016). Synergy of monotonic

rules. The Journal of Machine Learning Research,

17(1):4722–4754.

Vovk, V. (2013). Transductive conformal predictors. In

IFIP International Conference on Artiﬁcial Intelli-

gence Applications and Innovations, pages 348–360.

Springer.

Vovk, V. (2015). Cross-conformal predictors. Annals of

Mathematics and Artiﬁcial Intelligence, 74(1-2):9–28.

Vovk, V., Fedorova, V., Nouretdinov, I., and Gammerman,

A. (2016). Criteria of efﬁciency for conformal predic-

tion. In Symposium on Conformal and Probabilistic

Prediction with Applications, pages 23–39. Springer.

Vovk, V., Gammerman, A., and Shafer, G. (2005). Algo-

rithmic learning in a random world. Springer Science

& Business Media.

Zhou, L., Pan, S., Wang, J., and Vasilakos, A. V. (2017).

Machine learning on big data: Opportunities and chal-

lenges. Neurocomputing, 237:350 – 361.
