Controlling the Cost of Prediction in using a Cascade of Reject Classifiers for Personalized Medicine
Blaise Hanczar (1) and Avner Bar-Hen (2)
(1) IBISC, IBGBI University of Evry, 23 bd de France, 91034 Evry, France
(2) MAP5, University Paris Descartes, 45 rue des Saints-Pères, 75006 Paris, France
Keywords:
Supervised Learning, Reject Option, Cascade Classifier, Genomics Data, Personalized Medicine.
Abstract:
Supervised learning in bioinformatics is a major tool to diagnose a disease, to identify the best therapeutic strategy or to establish a prognosis. The main objective in classifier construction is to maximize the accuracy in order to obtain a reliable prediction system. However, a second objective is to minimize the cost of using the classifier on new patients. Although controlling the classification cost is highly important in the medical domain, it has been very little studied. We point out that some patients are easy to predict: only a small subset of medical variables is needed to obtain a reliable prediction. The prediction of these patients can be cheaper than that of the other patients. Based on this idea, we propose a cascade approach that decreases the classification cost of the basic classifier without reducing its accuracy. Our cascade system is a sequence of classifiers with reject option of increasing cost. At each stage, a classifier receives all patients rejected by the previous classifier, makes a prediction for each patient and rejects to the next classifier the patients with a low-confidence prediction. The performance of our method is evaluated on four real medical problems.
1 INTRODUCTION
Personalized medicine is an ongoing revolution in medicine; its objective is to maximize the wellness of each individual rather than simply to treat disease. According to Hood and Friend (Hood and Friend, 2011), this revolution is based on several points. The first one is to consider that medicine is an information science. The second point is the emergence of technologies that let us explore new dimensions of the patient data space, like the "omics" technologies. The last point is the development of powerful new mathematical and computational methods, especially in machine learning, that let us analyze the large amount of data associated with each individual. Today physicians have access to a large amount of data for each patient from different sources: clinical, environmental, psychological, biological or omic. The use of automated methods is indispensable to analyse and extract relevant information from these data. An important line of research is the development of prediction systems whose objectives generally are to diagnose a disease, to identify the best therapeutic strategy or to establish a prognosis for a patient. These systems, called classifiers, are constructed with supervised learning methods; the most popular are discriminant analysis (Dudoit et al., 2002), support vector machines (Furey et al., 2000), random forests (Diaz-Uriarte and Alvarez de Andres, 2006), neural networks (Khan et al., 2001) and ensemble methods (Yang et al., 2010).
The primary objective of a prediction system is to maximize its accuracy in order to obtain reliable predictions. However, a second objective, generally ignored in research studies, is to minimize the cost of the prediction. In a classifier, a patient is represented by a set of variables. These variables come from different medical exams and each of these exams has a cost. The use of a classifier requires the values of all variables of the patient; the cost of the prediction is the sum of the costs of all exams used by the classifier. Note that the cost does not necessarily represent money; it may also represent time, side effects of a treatment or any other non-infinite resource. In practice, a good prediction system has to both maximize its accuracy and minimize its cost.
In this paper we propose a new method that reduces the prediction cost without increasing the error rate. In prediction problems, it is worth noting that some patients are easier to predict than others and do not need all medical exams. For these patients, a reliable prediction can be obtained with a small
subset of variables and can therefore be less expensive. Based on this observation, we propose a new supervised classification approach using a cascade of classifiers with reject option. This cascade is a sequential set of classifiers with reject option of increasing cost. The patient data are submitted to the first classifier, which makes a prediction. If this prediction is judged not reliable, the patient is rejected to the next classifier of the cascade, which needs additional variables. The process is repeated until a reliable prediction has been made. This approach reduces the cost with respect to the basic classifier that uses all variables. In this approach, there is a trade-off between the accuracy and the cost of the predictions. The two main scientific keys of our method are, first, the computation of the rejection areas of all classifiers of the cascade and, second, the search for the optimal order of the variables, which forms the structure of the cascade. Section two gives the state of the art of cost minimization methods and cascade classification. In section three, we provide the formulation of the classification with reject option and of the cascade. The two algorithms for the computation of the rejection areas and of the order of the variables are given in detail. Section four presents the results on four real datasets and analyzes the performance of our method.
2 RELATED WORK
The problem of reducing the prediction cost is close to the active feature acquisition problem in cost-sensitive learning (Saar-Tsechansky et al., 2009). The objective is to sequentially decide whether to acquire the next feature in order to increase the accuracy of the classifier. Markov decision processes are one of the usual approaches in this context; for example, Kapoor and Horvitz (Kapoor and Horvitz, 2009) propose a new class of policies inspired from active learning. Tan and Kan propose an attribute value acquisition algorithm driven by the expected cost saving of acquisition, restricted to support vector machines (Tan and Kan, 2010). Nan et al. developed a variant of random forests dealing with the cost of the variables (Nan et al., 2015).
One simple structure for incorporating the cost into learning is a cascade of classifiers. This approach has been popularized by Viola and Jones (Viola and Jones, 2004) with their detection cascade used in image analysis for object detection. Cheap variables are used to discard examples belonging to the negative class. This type of method is focused on unbalanced data with very few positive examples and a large number of negative examples. Note that the main objective of the Viola-Jones cascade is to increase the accuracy of the prediction; it does not deal with the prediction cost. In the context of information retrieval, Wang et al. adapted the cascades to ranking and incorporated variable costs but retained the underlying greedy paradigm (Wang et al., 2011). Raykar et al. (Raykar et al., 2010) explore the idea of a cascade of reject classifiers. Their version is a soft cascade where each stage accepts or rejects examples according to a probability distribution induced by the previous stage. Each stage of the cascade is limited to linear classifiers, but the stages are learned jointly and take into account the cost of the variables. Trapeznikov and Saligrama (Trapeznikov and Saligrama, 2013) propose a multi-stage multi-class system where the reject decision at each stage is posed as a supervised binary classification problem. They derive a bound based on the VC dimension to quantify the generalization error.
A common limitation of all these methods is that the order of the variables is supposed to be known. The structure of the cascade is therefore fixed. Our method overcomes this limitation by using a heuristic to compute an order of the variables.
3 CASCADE OF REJECT
CLASSIFIERS FORMULATION
3.1 Formulation of the Problem
We consider a classification problem with two classes (positive "1" and negative "0") and D variables {v_1, ..., v_D}. Let T = {(x_1, y_1), ..., (x_N, y_N)} be a training set of N examples, where x_i ∈ R^D is the variable vector and y_i ∈ {0, 1} is the label. We denote by c_i the cost of acquiring the i-th variable of an example. Let Ψ : R^D → {0, 1} be the basic classifier, constructed with a usual supervised learning procedure, which makes predictions using all variables. Our objective is to construct a cascade that obtains better performance than the basic classifier.
In this context, the performance of a classifier is measured by two values: its error rate, i.e. the probability that the prediction does not correspond to the true label, noted E = p(Ψ(x) ≠ y), and its cost, that is the total acquisition cost of all variables required by the classifier, noted C = ∑_{i=1}^{d} c_i, where d is the number of variables used. These values are combined into a new value called the loss, which represents the total performance of the classifier and is defined by:

L = C + ΛE    (1)

Λ is a parameter that represents the penalty of a misclassification. In our cascade, this parameter controls the trade-off between the cost and the error rate.
Figure 1: Distribution of the classes on the classifier out-
put. TP, TN, FP, FN and R represent respectively the true
positive, true negative, false positive, false negative and re-
jection area.
For the basic classifier, the cost C is constant since we always have to pay for all variables. The objective is to construct a cascade with a loss lower than the loss of the basic classifier.
3.2 Classifier with Rejection Option
The base element of our cascade system is the classifier with reject option. This type of classifier can reject examples when it is not confident enough in the prediction. No class is assigned to rejected examples. Let Ψ be a classifier whose output ω(x) is a continuous value. By fixing a threshold t on this output, we define a classic classifier that assigns one of the two classes to each example. By fixing two thresholds {t_0, t_1}, we define a classifier that rejects some examples and assigns one of the two classes to the non-rejected examples:

Ψ(x) = 0 if ω(x) ≤ t_0
Ψ(x) = 1 if ω(x) ≥ t_1
Ψ(x) = R if t_0 < ω(x) < t_1    (2)
with the constraint t_0 ≤ t_1. R represents the rejection of the example x. Figure 1 shows the distribution of the two classes on the classifier output. The two thresholds t_0 and t_1 divide the classifier output into three decision regions ({Ψ(x) = 1, Ψ(x) = 0, Ψ(x) = R}). The performance of the classifier depends on the following values: the error rate E = p(Ψ(x) ≠ y, Ψ(x) ≠ R) (represented by the FP and FN areas in figure 1), the penalty of an error λ_E, the accuracy A = p(Ψ(x) = y) (represented by the TP and TN areas), the penalty of a good classification λ_A, the rejection rate R = p(Ψ(x) = R) (represented by the R area) and the penalty of a rejection λ_R. Note that we have A + R + E = 1. The performance of a reject classifier is measured by its expected loss:

L(Ψ) = λ_A A + λ_E E + λ_R R    (3)
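To make equations (2) and (3) concrete, the following minimal sketch (our own illustration, not the authors' code; all names are ours) applies the two thresholds to a vector of classifier outputs and estimates the expected loss empirically, encoding the reject decision R as -1.

```python
import numpy as np

def predict_with_reject(omega, t0, t1):
    """Apply equation (2): return 0, 1 or -1 (reject) for each output omega(x)."""
    pred = np.full(len(omega), -1)          # -1 encodes the reject decision R
    pred[omega <= t0] = 0
    pred[omega >= t1] = 1
    return pred

def empirical_loss(pred, y, lam_A, lam_E, lam_R):
    """Estimate L(Psi) = lam_A*A + lam_E*E + lam_R*R from predictions (eq. 3)."""
    rejected = pred == -1
    A = np.mean(~rejected & (pred == y))    # accuracy
    E = np.mean(~rejected & (pred != y))    # error rate
    R = np.mean(rejected)                   # rejection rate
    return lam_A * A + lam_E * E + lam_R * R

# toy usage with synthetic posterior probabilities
omega = np.array([0.05, 0.4, 0.55, 0.9])
y = np.array([0, 0, 1, 1])
pred = predict_with_reject(omega, t0=0.3, t1=0.7)
print(empirical_loss(pred, y, lam_A=0.2, lam_E=1.2, lam_R=0.6))
```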
The objective is to find the thresholds t_0 and t_1 that minimize the expected loss of the classifier. For that we use Chow's rule (Chow, 1970), which considers a Bayesian scenario where the output of the classifier is the posterior probability of the positive class, ω(x) = p(1|x). Let L_1, L_0 and L_R be the three loss functions that represent the expected loss obtained by assigning an example x to the class 1, the class 0 or R, respectively:

L_1(x) = λ_A ω(x) + λ_E (1 − ω(x))
L_0(x) = λ_E ω(x) + λ_A (1 − ω(x))
L_R(x) = λ_R    (4)

From these formulas we can compute the optimal decision thresholds directly, by solving the equations L_1(x)/L_R(x) = 1 and L_0(x)/L_R(x) = 1:

t_1 = (λ_R − λ_E) / (λ_A − λ_E)
t_0 = (λ_R − λ_A) / (λ_E − λ_A)    (5)
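The following minimal sketch (our own illustration, not code from the paper) computes the two thresholds of equation (5) from the three penalties; it assumes λ_A < λ_R < λ_E, which is the regime where rejecting can be worthwhile.

```python
def chow_thresholds(lam_A, lam_E, lam_R):
    """Optimal reject thresholds of equation (5) for penalties lam_A < lam_R < lam_E."""
    t1 = (lam_R - lam_E) / (lam_A - lam_E)
    t0 = (lam_R - lam_A) / (lam_E - lam_A)
    return t0, t1

# e.g. a cheap stage: good-classification penalty 0.2, error penalty 1.2, rejection penalty 0.6
t0, t1 = chow_thresholds(lam_A=0.2, lam_E=1.2, lam_R=0.6)
print(t0, t1)   # 0.4 and 0.6: outputs in (0.4, 0.6) are rejected
```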
3.3 Cascade of Reject Classifiers
Our cascade system is a sequence of D classifiers with reject option Ψ_1, ..., Ψ_D of increasing cost, illustrated in figure 2. The i-th classifier Ψ_i receives all examples rejected by the classifier Ψ_{i−1}, makes predictions and sends all rejected examples to Ψ_{i+1}. The last classifier Ψ_D has no reject option and makes a prediction for all received examples. The first classifier Ψ_1 receives all examples. For the moment, we consider that the order of the variables is fixed; the classifier Ψ_i uses only the first i variables, so its cost is ∑_{j=1}^{i} c_j. For each classifier Ψ_i, its error rate E_i, accuracy A_i and rejection rate R_i are computed as:

E_i = p(Ψ_i(x) ≠ y, Ψ_i(x) ≠ R | Ψ_j(x) = R ∀ j ∈ [1, i−1])
A_i = p(Ψ_i(x) = y | Ψ_j(x) = R ∀ j ∈ [1, i−1])
R_i = p(Ψ_i(x) = R | Ψ_j(x) = R ∀ j ∈ [1, i−1])    (6)
From these formulas, we can define the loss L_i of each classifier of the cascade as a weighted combination of its error rate, accuracy and rejection rate. The weight of a good classification is the cost of the used variables, the weight of an error is the cost of
Figure 2: Cascade of D reject classifiers.
the used variables plus the penalty of misclassification. When an example is rejected, it is sent to the next classifier, so the weight of a rejection is the loss of the next classifier L_{i+1}. The loss L of the entire cascade can be computed recursively by:

L = L_1
L_i = A_i ∑_{j=1}^{i} c_j + E_i (∑_{j=1}^{i} c_j + Λ) + R_i L_{i+1}
L_D = A_D ∑_{j=1}^{D} c_j + E_D (∑_{j=1}^{D} c_j + Λ)    (7)
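The recursion (7) translates directly into a short backward pass. The sketch below is our own illustration (the function name and argument layout are assumptions); A, E and R are the per-stage rates of equation (6) and c contains the individual variable costs.

```python
def cascade_loss(A, E, R, c, Lam):
    """Recursive cascade loss of equation (7), evaluated from the last stage backwards."""
    D = len(A)
    cum_cost = [sum(c[:i + 1]) for i in range(D)]   # cost paid at stage i: c_1 + ... + c_i
    L_next = 0.0
    for i in reversed(range(D)):
        L_i = A[i] * cum_cost[i] + E[i] * (cum_cost[i] + Lam)
        if i < D - 1:                               # the last stage has no reject option
            L_i += R[i] * L_next
        L_next = L_i
    return L_next                                   # this is L = L_1
```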
The optimization of the cascade consists of finding the optimal rejection areas of each classifier that minimize the loss of the cascade. For each classifier of the cascade, the rejection area, i.e. the thresholds t_0 and t_1, can be computed using Chow's rule. For the classifier Ψ_i the penalty of a good classification is λ_A = ∑_{j=1}^{i} c_j, the penalty of an error is λ_E = ∑_{j=1}^{i} c_j + Λ and the penalty of a rejection is λ_R = L_{i+1}. Using the formulas (5), we obtain the optimal rejection area of the classifier Ψ_i:

t_{0,(i)} = (L_{i+1} − ∑_{j=1}^{i} c_j) / Λ
t_{1,(i)} = (∑_{j=1}^{i} c_j + Λ − L_{i+1}) / Λ    (8)
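Plugging the stage-specific penalties into Chow's rule gives this closed form directly; as a hypothetical helper (names are ours, consistent with the sketches above):

```python
def stage_thresholds(cum_cost_i, Lam, L_next):
    """Rejection area of stage i (eq. 8), from its cumulative cost and the loss of the next stage."""
    t0 = (L_next - cum_cost_i) / Lam
    t1 = (cum_cost_i + Lam - L_next) / Lam
    return t0, t1
```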
Unfortunately, we cannot simply apply these formulas to each classifier to obtain the optimal cascade. The problem is that the classifiers and their performances depend on each other. When a new rejection area of a classifier is computed, the sets of examples rejected to the next classifiers change, so the performances of the next classifiers and their penalties of rejection change too. A new rejection area has, therefore, to be computed. All rejection areas, performances and penalties of all classifiers are circularly dependent. To solve this optimization problem we propose the heuristic described in Algorithm 1. The cascade is initialized as the basic classifier, i.e. all classifiers reject all examples and all examples are sent to the last classifier, which uses all variables. The iterative procedure contains three steps. The first one is to compute the accuracy, error rate and rejection rate of all classifiers. Then the penalties of rejection of all classifiers (except the last one) are computed using formula (7). The penalty of rejection depends on the performance of the next classifier; the penalties are therefore computed from the classifier Ψ_{D−1} down to the classifier Ψ_1. Finally, the two rejection thresholds are computed for each classifier from the penalties of good classification, misclassification and rejection. This procedure is iterated MaxIter times, where MaxIter is a parameter chosen by the user. In the results section, we investigate empirically the impact of this parameter and select MaxIter = 10.
3.4 Order of the Variables
In the previous section, we considered that the order of the variables in the cascade was fixed, but in real cases the variable order is rarely known. The performance of the cascade depends highly on this order: we want the most informative and least expensive variables at the beginning and the least informative and most expensive ones at the end. The usefulness of a variable is not correlated to its cost and depends on the variables selected in the previous classifiers. For these reasons, it is not easy to compute the quality of the variables and determine their position in the cascade. One solution is to test all orders and select the one that produces the best cascade. However, there are D! possible orders, so this method is intractable for D > 10. We propose a heuristic, given in Algorithm 2, that selects an order of the variables. The heuristic begins with an empty set of variables and selects the variables one by one. At iteration i, i − 1 variables have already been selected and are used to construct a cascade of size i − 1. All non-selected variables are tested to form the i-th stage of the cascade, and we select the variable that minimizes the loss of the cascade. The procedure is iterated until all variables have been selected.
Algorithm 1: Optimization algorithm of the reject areas.
1: procedure REJECT AREAS OPTIMIZATION
2:     // Initialization
3:     for i from 1 to D − 1 do
4:         t_{0,(i)} ← 0; t_{1,(i)} ← 1
5:     end for
6:     t_{0,(D)} ← 0.5; t_{1,(D)} ← 0.5
7:     for nbiter from 1 to MaxIter do
8:         // Computation of the performances of the reject classifiers
9:         for i from 1 to D do
10:            A_{(i)} ← accuracy of Ψ_i
11:            E_{(i)} ← error rate of Ψ_i
12:            R_{(i)} ← rejection rate of Ψ_i
13:        end for
14:        // Computation of the rejection costs
15:        λ_{R,(D)} ← 0
16:        for i from D − 1 to 1 do
17:            λ_{R,(i)} ← A_{(i+1)} ∑_{j=1}^{i+1} c_j + E_{(i+1)} (∑_{j=1}^{i+1} c_j + Λ) + R_{(i+1)} λ_{R,(i+1)}
18:        end for
19:        // Computation of the thresholds
20:        for i from 1 to D − 1 do
21:            (t_{0,(i)}, t_{1,(i)}) ← computed from the penalties λ_A = ∑_{j=1}^{i} c_j, λ_E = ∑_{j=1}^{i} c_j + Λ and λ_R = λ_{R,(i)}
22:        end for
23:    end for
24:    return (t_{0,(i)}, t_{1,(i)}) for all i ∈ [1, D − 1]
25: end procedure
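A possible Python sketch of Algorithm 1 is given below. It is our own illustration under simplifying assumptions, not the authors' implementation: the stage outputs ω_i(x) are assumed to be precomputed posterior probabilities for all training examples, and all names are ours.

```python
import numpy as np

def optimize_reject_areas(stage_outputs, y, c, Lam, max_iter=10):
    """Sketch of Algorithm 1: iterative computation of the per-stage reject areas.
    stage_outputs[i] holds omega_i(x) for every training example; c holds the variable costs."""
    D = len(stage_outputs)
    y = np.asarray(y)
    cum_cost = np.cumsum(c)
    t0 = np.array([0.0] * (D - 1) + [0.5])        # initialization: every stage rejects
    t1 = np.array([1.0] * (D - 1) + [0.5])        # everything down to the last one
    for _ in range(max_iter):
        # 1) accuracy, error and rejection rates on the examples reaching each stage
        A, E, R = np.zeros(D), np.zeros(D), np.zeros(D)
        reach = np.ones(len(y), dtype=bool)
        for i in range(D):
            if not reach.any():
                break
            omega = np.asarray(stage_outputs[i])[reach]
            pred = np.where(omega >= t1[i], 1, np.where(omega <= t0[i], 0, -1))
            A[i] = np.mean((pred != -1) & (pred == y[reach]))
            E[i] = np.mean((pred != -1) & (pred != y[reach]))
            R[i] = np.mean(pred == -1)
            new_reach = reach.copy()
            new_reach[reach] = pred == -1          # only rejected examples go on
            reach = new_reach
        # 2) rejection penalties, computed from stage D-1 down to stage 1 (eq. 7)
        lam_R = np.zeros(D)
        for i in range(D - 2, -1, -1):
            lam_R[i] = (A[i + 1] * cum_cost[i + 1]
                        + E[i + 1] * (cum_cost[i + 1] + Lam)
                        + R[i + 1] * lam_R[i + 1])
        # 3) new thresholds from Chow's rule with stage-specific penalties (eq. 8)
        for i in range(D - 1):
            t0[i] = (lam_R[i] - cum_cost[i]) / Lam
            t1[i] = (cum_cost[i] + Lam - lam_R[i]) / Lam
    return t0, t1
```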
Algorithm 2: Selection of the variables order.
1: procedure VARIABLES ORDER SELECTION
2:     V ← {v_1, ..., v_D}
3:     Order ← ∅
4:     for j from 1 to D do
5:         best.L ← Λ
6:         for i from 1 to D − j + 1 do
7:             Tested.Order ← concat(Order, V[i])
8:             Construct the cascade from Tested.Order
9:             L ← loss of the cascade
10:            if L < best.L then
11:                best.L ← L; best.V ← V[i]
12:            end if
13:        end for
14:        Order ← Order + best.V
15:        V ← V \ best.V
16:    end for
17:    return Order
18: end procedure
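The greedy search of Algorithm 2 can be sketched as follows. This is again our own illustration: build_cascade_loss is a placeholder standing for the full construction of a cascade on a candidate order (training the stages and running Algorithm 1), so its interface is an assumption rather than a defined API.

```python
def select_variable_order(variables, build_cascade_loss):
    """Greedy forward selection of the variable order (Algorithm 2).
    build_cascade_loss(order) must return the loss of a cascade built on `order`."""
    remaining = list(variables)
    order = []
    while remaining:
        # try each remaining variable as the next stage and keep the best one
        best_var, best_loss = None, float("inf")
        for v in remaining:
            loss = build_cascade_loss(order + [v])
            if loss < best_loss:
                best_var, best_loss = v, loss
        order.append(best_var)
        remaining.remove(best_var)
    return order
```

Here the best loss is initialized to +∞ rather than to Λ as in Algorithm 2, so that a variable is always appended at each step.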
4 EXPERIMENTS AND RESULTS
4.1 Study Design and Datasets
We perform a set of experiments to investigate the performance of our cascade method. For these experiments, we use several real medical and genomic datasets. The first one is the pima dataset (Smith et al., 1988), whose objective is to predict signs of diabetes for 768 patients based on eight clinical variables. We select this dataset because the costs of the variables are provided, an information that is very rare in public datasets. The second one is the Wisconsin Diagnostic Breast Cancer (wdbc) dataset, whose objective is to differentiate malignant tumors from benign tumors for 569 patients based on 30 medical variables. Since the costs of the variables were not available, we randomly drew the cost of each variable from a uniform distribution U[0, 1]. The third one is the lung cancer dataset (Bhattacharjee, 2001), whose objective is to identify adenocarcinomas from the other types of tumor based on several thousand gene expression measurements. The last one is the prostate cancer dataset (Singh et al., 2002), whose objective is to discriminate cancer tissues from healthy tissues based on several thousand gene expression measurements for 339 patients. For the last two datasets, since all gene expressions have been measured simultaneously with microarrays, the costs of all variables are equal. For all datasets, we normalize the costs of the variables such that the sum of all costs is 1. The basic classifier using all variables therefore pays a cost of one for each example.
We tested our method with two classification algorithms: linear discriminant analysis (LDA) and the support vector machine (SVM) with a radial kernel. For high-dimensional datasets, like the lung and prostate cancer datasets, a feature selection step is included in the classification in order to reduce the number of variables. We have used a filter method based on the t-test score to select the best variables. Note that the variable selection is performed within the classifier construction in order to avoid any selection bias (Ambroise and McLachlan, 2002).
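As an illustration of this filter step (our own sketch; the use of scipy.stats.ttest_ind and the function name are assumptions, not the authors' code), the variables are ranked on the training fold only, so that no test example influences the selected genes:

```python
import numpy as np
from scipy.stats import ttest_ind

def select_top_variables(X_train, y_train, k):
    """Rank variables by absolute t-statistic between the two classes
    and return the indices of the k best ones (training data only)."""
    t_stat, _ = ttest_ind(X_train[y_train == 1], X_train[y_train == 0], axis=0)
    return np.argsort(-np.abs(t_stat))[:k]

# usage inside a cross-validation loop (sketch):
# idx = select_top_variables(X[train], y[train], k=50)
# classifier.fit(X[train][:, idx], y[train])
# classifier.predict(X[test][:, idx])
```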
The objective of our method is to reduce the classification cost and obtain a lower loss than the basic classifier. We therefore compare the performance of our cascade to the performance of the basic classifiers. One of the key points of the cascade construction is the selection of the variable order, for which we have proposed a heuristic (Algorithm 2). In order to show the usefulness of our heuristic, we compare our method to the performance of a cost-based order cascade, in which the variables are ordered by increasing cost. The cascade
Figure 3: Loss of the cascade during the computation of the
rejection areas.
begins with the cheapest variables and finishes with the most expensive ones.
4.2 Sensitivity Analysis
Our method depends on two parameters: the number of iterations of the rejection area computation heuristic, MaxIter, and the penalty of an error, Λ. We investigate the impact of these two parameters on the behavior of our method.
Figure 3 shows the loss of the cascade during the rejection area computation heuristic for the four datasets with the LDA classifier. The loss values have been normalized such that all curves can be plotted in the same graphics: we set the loss of the cascade at its initialization to 1. The figure shows that the cascade converges quickly toward a stable solution for all datasets. Moreover, we see that the loss of the solution is much lower than the loss at initialization. This last point shows that our heuristic provides good rejection areas for the cascade. According to these results, we choose to set MaxIter = 10; this value is enough to reach a stable solution and limits the computation time of cascade learning.
Figures 4, 5 and 6 give respectively the error rate of the cascade vs Λ, the cost of the cascade vs Λ and the cost vs the error rate with the pima dataset and the LDA classifier. We do not have the space to put the graphics of the other datasets and classifiers, but they are similar to these figures and lead to the same conclusions. The dot represents the basic classifier, the triangle line is the cost-based order cascade and the cross line is the heuristic-based order cascade. The cost of the cascade increases with Λ while its error rate decreases: Λ controls the trade-off between the error rate and the variable cost. For a low value of Λ, misclassifications are more tolerated, fewer variables are therefore needed, but the error rate increases. At the extreme, Λ ≤ 2 in these figures, the cascade keeps
(Legend for Figures 4-6: dot: basic classifier, triangle: cost-based order cascade, cross: heuristic-based order cascade.)
Figure 4: Error rate of the cascade as a function of Λ.
Figure 5: Cost of the cascade as a function of Λ.
Figure 6: Error rate of the cascade as a function of the prediction cost. The performances are presented as a set of points because they depend on the value of the parameter Λ.
only the first variable for all examples. For a high value of Λ, misclassifications are heavily penalized, so the cascade needs more variables in order to get more
Figure 7: Classification cost vs error rate plots for all datasets (pima, wdbc, lung cancer, prostate cancer) with linear discriminant analysis. The performances are presented as a set of points because they depend on the value of the parameter Λ.
information and minimize the risk of error. We see that the error rate of the cascade is never lower than the error rate of the basic classifier. That is logical since the basic classifier uses all the information, i.e. all variables for all examples; the error rate of the cascade can only be higher than or equal to the error of the basic classifier. Note that there is always a value of Λ where the error rate of the cascade reaches the error rate of the basic classifier. In figure 4, it is Λ = 20 for the heuristic-based order cascade and Λ > 20 for the cost-based order cascade. This point is interesting because it corresponds to a cascade that does not decrease the accuracy of the classifier. Let us focus on the behavior of the cost-based order cascade (triangle curve) in the loss figure. For low values of Λ, the loss of the cost-based order cascade is the same as that of the heuristic-based order cascade; for high values of Λ it reaches the loss of the basic classifier. The reason is that the cost-based order cascade favors cheap variables: for low values of Λ, the cost of the cascade matters more than its error rate, so the cost-based order cascade is well adapted.
4.3 Classification Results
Figures 7 and 8 show the classification cost vs error rate plots for all datasets with linear discriminant analysis and with the support vector machine, respectively. In these graphics, the closer a point is to the bottom-left corner, the better the performance. The dot represents the performance of the basic classifier; its cost is 1 by definition and its error rate is represented by the dotted line. The triangles and the crosses represent respectively the performance of the cost-based order cascade and of the heuristic-based order cascade. The performances are presented as a set of points because they depend on the value of the parameter Λ. In all graphics, we see that the cascade can strongly decrease the cost of the classification. With the same accuracy as the basic classifier, we can reduce the cost by 85% for the pima dataset, 86% for the wdbc dataset, 70% for the lung cancer dataset and 74% for the prostate cancer dataset with the LDA classifier, and by 74% for the pima dataset, 73% for the wdbc dataset, 90% for the lung cancer dataset and 80% for the prostate cancer dataset with the SVM
Figure 8: Classification cost vs error rate plots for all datasets (pima, wdbc, lung cancer, prostate cancer) with the support vector machine. The performances are presented as a set of points because they depend on the value of the parameter Λ.
classifier. This cost can be decreased further if we accept an increase in the error rate. We also see in all graphics that the crosses clearly dominate the triangles, which means that the heuristic-based order cascade outperforms the cost-based order cascade.
5 CONCLUSIONS
Cascade methods are very promising for personalized medicine since the prediction system and its cost are adapted to each patient. Some problems remain. The first one is the problem of high-dimensional data like omics data. If the number of variables is very high (several thousand or more), the heuristic of order selection is computationally intractable. In the current work, we deal with this problem by performing a classic variable selection step before the construction of the cascade. A more efficient solution would be to perform the selection during the construction of the cascade, taking the variable costs into account. The second problem concerns the structure of the costs. In the current work the costs are unique and fixed for each variable. In other contexts a variable may have several costs; for example, the medical exam needed to obtain some variables may have a cost in money, a duration and a risk of side effects. All these costs impact the performance of the cascade. There may also be interactions between the costs: the cost of a variable v_i can decrease if a variable v_j has already been measured. We will study these interesting new problems in future work.
REFERENCES
Ambroise, C. and McLachlan, G. (2002). Selection bias in
gene extraction on the basis of microarray gene ex-
pression data. Proc. Natl. Acad. Sci., 99(10):6562–
6566.
Bhattacharjee, A. (2001). Classification of human
lung carcinomas by mRNA expression profiling re-
veals distinct adenocarcinoma subclasses. PNAS,
98(24):13790–5.
Chow, C. (1970). On optimum recognition error and reject
tradeoff. IEEE Transactions on Information Theory,
16(1):41–46.
Diaz-Uriarte, R. and Alvarez de Andres, S. (2006). Gene
selection and classification of microarray data using
random forest. BMC Bioinformatics, 7(3).
Dudoit, S., Fridlyand, J., and Speed, T. P. (2002). Comparison of discrimination methods for classification of tumors using gene expression data. Journal of the American Statistical Association, 97:77–87.
Furey, T., Cristianini, N., Duffy, N., Bednarski, D., Schum-
mer, M., and Haussler, D. (2000). Support vector
machine classification and validation of cancer tissue
samples using microarray expression data. Bioinfor-
matics, 16(10):906–914.
Hood, L. and Friend, S. H. (2011). Predictive, personalized,
preventive, participatory (p4) cancer medicine. Nat
Rev Clin Oncol, 8(3):184–187.
Kapoor, A. and Horvitz, E. (2009). Breaking boundaries:
Active information acquisition across learning and di-
agnosis. Advances in neural information processing
systems.
Khan, J., Wei, J., Ringner, M., Saal, L., Ladanyi, M., West-
ermann, F., Berthold, F., Schwarb, M., Antonescu, C.,
Peterson, C., and Meltzer, P. (2001). Classification
and diagnostic prediction of cancers using gene ex-
pression profiling and artificial neural networks. Na-
ture Medicine, 7:673–679.
Nan, F., Wang, J., and Saligrama, V. (2015). Feature-
budgeted random forest. International Conference on
Machine Learning.
Raykar, V. C., Krishnapuram, B., and Yu, S. (2010). De-
signing efficient cascaded classifiers: tradeoff be-
tween accuracy and cost. In Proceedings of the 16th
ACM SIGKDD international conference on Knowl-
edge discovery and data mining, pages 853–860.
ACM.
Saar-Tsechansky, M., Melville, P., and Provost, F. (2009).
Active feature-value acquisition. Management Sci-
ence, 55(4):664–684.
Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., and
Ladd, C. (2002). Gene expression correlates of clini-
cal prostate cancer behavior. Cancer Cell., 1(2):203–
209.
Smith, J. W., Everhart, J., Dickson, W., Knowler, W., and
Johannes, R. (1988). Using the adap learning algo-
rithm to forecast the onset of diabetes mellitus. In Pro-
ceedings of the Annual Symposium on Computer Ap-
plication in Medical Care, page 261. American Med-
ical Informatics Association.
Tan, Y. F. and Kan, M.-Y. (2010). Cost-sensitive attribute value acquisition for support vector machines. Technical report, National University of Singapore.
Trapeznikov, K. and Saligrama, V. (2013). Supervised se-
quential classification under budget constraints. In
Proceedings of the Sixteenth International Conference
on Artificial Intelligence and Statistics, pages 581–
589.
Viola, P. and Jones, M. J. (2004). Robust real-time face
detection. International journal of computer vision,
57(2):137–154.
Wang, L., Lin, J., and Metzler, D. (2011). A cascade
ranking model for efficient ranked retrieval. In Pro-
ceedings of the 34th International ACM SIGIR Con-
ference on Research and Development in Information
Retrieval, SIGIR ’11, pages 105–114, New York, NY,
USA. ACM.
Yang, P., Yang, Y. H., and Zhou, B. B. (2010). A review of ensemble methods in bioinformatics. Current Bioinformatics, 5(4):296.