Decomposable Probability-of-Success Metrics in Algorithmic Search

Tyler Sam¹ᵃ, Jake Williams¹ᵇ, Abel Tadesse²ᶜ, Huey Sun³ᵈ and George Montañez¹ᵉ

¹Harvey Mudd College, California, U.S.A.
²Claremont McKenna College, California, U.S.A.
³Pomona College, California, U.S.A.
Keywords: Decomposable Probability-of-Success Metric, Machine Learning as Search, Algorithmic Search Framework.

Abstract: Prior work in machine learning has used a specific success metric, the expected per-query probability of success, to prove impossibility results within the algorithmic search framework. However, this success metric prevents us from applying these results to specific subfields of machine learning, e.g., transfer learning. To solve this issue, we define decomposable metrics as a category of success metrics for search problems which can be expressed as a linear operation on a probability distribution. Using an arbitrary decomposable metric to measure the success of a search, we demonstrate theorems which bound success in various ways, generalizing several existing results in the literature.
1 INTRODUCTION
Analyzing the success of a machine learning algorithm on specific problems is often very difficult given all the different variables in each problem. One solution is to reduce machine learning to search, since many machine learning tasks, such as classification, regression, and clustering, can be reduced to search problems (Montañez, 2017b). Through this reduction, one can apply concepts from information theory to derive impossibility results about machine learning. For example, any specific machine learning algorithm can only do well on a small subset of all possible problems. To compare the success of different algorithms, or the expected probability of finding a desired element, Montañez defined a metric of success that averaged the probability of success over all iterations of an algorithm (Montañez, 2017b). While this metric has many applications, it is not appropriate for cases where the probability of success for a given iteration of an algorithm is required. An example of this is transfer learning, where the probability of success at the final step of the algorithm is more relevant than the average probability of success.
Building on this work, we define decomposability as a property of probability-of-success metrics and show that the expected per-query probability of success (Montañez, 2017b) and more general probability-of-success metrics are decomposable. We then show that the results previously proven for the expected per-query probability of success hold for all decomposable probability-of-success metrics. Under this generalization, we can prove results related to the probability of success for specific iterations of a search rather than just uniformly averaged over the entire search, giving the results much broader applicability.

ᵃ https://orcid.org/0000-0001-7974-3226
ᵇ https://orcid.org/0000-0001-9714-1851
ᶜ https://orcid.org/0000-0002-3337-9454
ᵈ https://orcid.org/0000-0002-0949-3169
ᵉ https://orcid.org/0000-0002-1333-4611
2 RELATED WORK
Several decades ago, Mitchell proposed that classification could be viewed as search, and reduced the problem of learning generalizations to a search problem within a hypothesis space (Mitchell, 1980; Mitchell, 1982). Montañez subsequently expanded this idea into a formal search framework (Montañez, 2017b).
Montañez showed that for a given algorithm with a fixed information resource, favorable target sets, or the target sets on which the algorithm would perform better than uniform random sampling, are rare. He did this by proving that the proportion of b-bit favorable problems has an exponentially decaying restrictive bound (Montañez, 2017a). He further showed that this scarcity of favorable problems exists even for small k-sized target sets.

Sam, T., Williams, J., Tadesse, A., Sun, H. and Montañez, G.
Decomposable Probability-of-Success Metrics in Algorithmic Search.
DOI: 10.5220/0009098807850792
In Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020) - Volume 2, pages 785-792
ISBN: 978-989-758-395-7; ISSN: 2184-433X
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Montañez et al. later defined bias, the degree to which an algorithm is predisposed to a fixed target, with respect to the expected per-query probability of success metric, and proved that there are a limited number of favorable information resources for a given bias (Montañez et al., 2019). Using the search framework, they proved that an algorithm cannot be favorably biased towards many distinct targets simultaneously.
As machine learning grew in prominence, researchers began to probe what was possible within machine learning. Valiant considered learnability of a task as the ability to generate a program for performing the task without explicit programming of the task (Valiant, 1984). By restricting the tasks to a specific context, Valiant demonstrated a set of tasks which were provably learnable.
Schaffer provided an early foundation to the idea of bounding universal performance of an algorithm (Schaffer, 1994). Schaffer analyzed generalization performance, the ability of a learner to classify objects outside of its training set, in a classification task. Using a baseline of uniform sampling from the classifiers, he showed that, over the set of all learning situations, a learner's generalization performance sums to zero, which makes generalization performance a conserved quantity.
Wolpert and Macready demonstrated that the historical performance of a deterministic optimization algorithm provides no a priori justification whatsoever for its continued use over any other alternative going forward (Wolpert and Macready, 1997), implying that there is no utility in rationally choosing a thus-far better algorithm over choosing the opposite. Furthermore, just as there does not exist a single algorithm that performs better than random on all possible optimization problems, they proved that there also does not exist an optimization problem on which all algorithms perform better than average.
Continuing the application of prior knowledge to learning and optimization, Gülçehre and Bengio showed that the worse-than-chance performance of certain machine learning algorithms can be improved through learning with hints, namely, guidance using a curriculum (Gülçehre and Bengio, 2016). So, while Wolpert's results might make certain tasks seem futile and infeasible, Gülçehre's empirical results show that there exist some alternate means through which we can use prior knowledge to attain better results in both learning and optimization. Dembski and Marks measured the contributions of such prior knowledge using active information (Dembski and Marks II, 2009) and proved the difficulty of finding a good search algorithm for a fixed problem (Dembski and Marks II, 2010), through their concept of a search for a search (S4S). Eventually, their work expanded into a formal general theory of search, characterizing the information costs associated with success (Dembski et al., 2013), which served as an inspiration for later developments in machine learning (Montañez, 2017b).
Others have worked towards meaningful bounds on algorithmic success through different approaches. Sterkenburg approached this concept from the perspective of Putnam, who originally claimed that a universal learning machine is impossible through the use of a diagonalization argument (Sterkenburg, 2019). Sterkenburg follows up on this claim, attempting to find a universal inductive rule by exploring a measure which cannot be diagonalized. Even when attempting to evade Putnam's original diagonalization, Sterkenburg is able to apply a new diagonalization that reinforces Putnam's original claim of the impossibility of a universal learning machine.
There has also been work on proving learning bounds for specific problems. Kumagai and Kanamori analyzed the theoretical bounds of parameter transfer algorithms and self-taught learning (Kumagai and Kanamori, 2019). By looking at the local stability, or the degree to which a feature is affected by shifting parameters, they developed a definition for parameter transfer learnability, which describes the probability of effective transfer.
2.1 Distinctions from Prior Work
The expected per-query probability of success metric previously defined in the algorithmic search framework (Montañez, 2017b) tells us, for a given information resource, algorithm, and target set, how often (in expectation) our algorithm will successfully locate elements of the target set. While this metric is useful when making general claims about the performance of an algorithm or the favorability of an algorithm and information resource to the target set, it lacks the specificity to make claims about similar performance and favorability on a per-iteration basis. This trade-off calls for a more general metric that can be used to make both general and specific (per-iteration) claims. For instance, in transfer learning tasks, the performance and favorability of the last pre-transfer iteration is more relevant than the overall expected per-query probability of success. The general probability of success, which we will define as a particular decomposable probability-of-success metric, is a tool through which we can make claims at specific and relevant steps.
3 BACKGROUND
In this section, we will present definitions for the main
framework that we will use throughout this paper.
3.1 The Search Framework
Montañez describes a framework which formalizes search problems in order to analyze search and learning algorithms (Montañez, 2017a). To that end, he casts various machine learning tasks, such as regression, classification, and clustering, into this search framework. This ML-as-search framework is valuable because it provides a structure to understand and reason about different machine learning problems within the same formalism. For example, we can understand regression as a search through a space of possible regression functions, and parameter estimation as a search through possible vectors for a black-box process (Montañez, 2017b). Therefore, we can apply results about search to any machine learning problem we can cast into the search framework. Furthermore, this framework lets us more easily analyze the factors necessary for success in a machine learning problem by viewing it as a search problem.
There are three components to a search problem. The first is the finite discrete search space, Ω, which is the set of elements to be examined. (Finiteness and discreteness follow from finite-precision numerical representation, so the loss of generality is not great.) Next is the target set, T, which is a nonempty subset of the search space that we are trying to find. Finally, we have an external information resource, F, which provides an evaluation of elements of the search space. Typically, there is a tight relationship between the target set and the external information resource, as the resource is expected to lead to or describe the target set in some way, such as the target set being the elements which meet a certain threshold under the external information resource.
Within the framework, we have an iterative algorithm which searches for elements of the target set, as shown in Figure 1. The algorithm is a black box that has access to a search history and produces a probability distribution over the search space. At each step, the algorithm samples over the search space using the probability distribution, evaluates that element using the information resource, adds the result to the search history, and determines the next probability distribution through its own internal rules and logic. The abstraction of finding the next probability distribution as a black-box algorithm allows the search framework to work with all types of search problems.
[Figure 1: a black-box algorithm consults a history of sampled points ω and their evaluations F(ω), and produces a probability distribution P over the search space Ω from which the next point is chosen at time step i.]

Figure 1: Black-box search algorithm. We iteratively populate the history with samples from a distribution that is determined by the black-box at each iteration, using the history (Montañez, 2017b).
3.2 Expected Per-query Probability of
Success
In order to compare search algorithms, Montañez defined the expected per-query probability of success,

$$q(t, f) = \mathbb{E}_{\tilde{P}, H}\left[\frac{1}{|\tilde{P}|}\sum_{i=1}^{|\tilde{P}|} P_i(\omega \in t) \,\middle|\, f\right] = P(X \in t \mid f) \quad (3.1)$$

where $\tilde{P}$ is the sequence of probability distributions generated by the black box, $H$ is the search history, and $t$ and $f$ are the target set and information resource of the search problem, respectively (Montañez, 2017a). This metric of success is particularly useful because it can be shown that $q(t, f) = \mathbf{t}^\top \overline{P}_f$, where $\overline{P}_f$ is the average of the vector representations of the probability distributions from the search algorithm at each step, conditioned on an information resource $f$.
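As a quick numeric sketch of this identity (using a hypothetical three-element search space and made-up distributions, not values from the paper), we can compare the per-iteration average of target hits against the decomposed form $\mathbf{t}^\top \overline{P}_f$:

```python
import numpy as np

# Hypothetical search space of 3 elements; target set t = {0, 2} as a hot vector.
t = np.array([1.0, 0.0, 1.0])

# A made-up sequence of distributions P_1, ..., P_4 produced by a black-box
# search over its iterations (each row sums to 1).
P_tilde = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.2, 0.4],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
])

# Direct form: probability of hitting the target at each iteration, averaged.
per_query = P_tilde @ t            # P_i(omega in t) for each i
q_direct = per_query.mean()

# Decomposed form: average the distributions first, then apply t^T.
P_bar_f = P_tilde.mean(axis=0)
q_decomposed = t @ P_bar_f

assert np.isclose(q_direct, q_decomposed)   # both equal 0.75 here
```

The two computations agree by linearity: averaging the hit probabilities and averaging the distributions before taking the inner product with $\mathbf{t}$ are interchangeable.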
Measuring success using the expected per-query probability of success, Montañez demonstrated bounds on the success of any search algorithm (Montañez, 2017a). The Famine of Forte states that for a given algorithm, the proportion of target set-information resource pairs yielding a success level above a given threshold is inversely related to the threshold. Thus, the greater the threshold for success, the fewer problems you can be successful on, regardless of the algorithm. The expected per-query probability of success can also be used to prove a version of the No Free Lunch theorem, demonstrating that all algorithms perform the same averaged over all target sets and information resources, as is done in Theorem 1 of the current manuscript.
3.3 Bias
Using the search framework, Montañez defined a measure of bias between a distribution over information resources and a fixed target (Montañez et al., 2019). For a distribution $\mathcal{D}$ over a collection of possible information resources $\mathcal{F}$, with $F \sim \mathcal{D}$, and a fixed k-hot¹ target $\mathbf{t}$, the bias between the distribution and the target is defined as

$$\mathrm{Bias}(\mathcal{D}, \mathbf{t}) = \mathbb{E}_{\mathcal{D}}\left[\mathbf{t}^\top \overline{P}_F\right] - \frac{k}{|\Omega|} \quad (3.2)$$
$$= \mathbf{t}^\top \mathbb{E}_{\mathcal{D}}\left[\overline{P}_F\right] - \frac{\|\mathbf{t}\|^2}{|\Omega|} \quad (3.3)$$
$$= \mathbf{t}^\top \int_{\mathcal{F}} \overline{P}_f \, \mathcal{D}(f)\,df - \frac{\|\mathbf{t}\|^2}{|\Omega|}. \quad (3.4)$$
Recall from above that $\overline{P}_f$ is the averaged probability distribution over Ω from a search. The bias term measures the performance of an algorithm in expectation (over a given distribution of information resources) compared to uniform sampling. Mathematically, this is computed by taking the difference between the expected value of the average performance of an algorithm and the performance of uniform sampling. The distribution $\mathcal{D}$ captures what information resources (e.g., datasets) one is likely to encounter.
For a non-mathematical example of the effect of bias, suppose we are searching for a parking space within a parking lot. If we randomly choose parking spaces to check, we are searching without bias. However, if we consider the locations of the parking spaces, we may find that the parking spaces furthest from the entrance are usually free, and could find an open parking space with a higher probability. Here, the information resource telling us the distance of each parking space from the entrance, together with our belief that parking spaces further from the entrance tend to be open, creates a distribution over possible parking spaces, favoring those that are further away to be checked first.
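The bias computation itself is a small linear-algebra exercise. A minimal numeric sketch (with a hypothetical four-element search space, two equally likely information resources, and made-up averaged distributions) takes the difference between the expected averaged distribution and uniform sampling:

```python
import numpy as np

n, k = 4, 2                       # |Omega| = 4, target of size k = 2
t = np.array([1.0, 1.0, 0.0, 0.0])

# Hypothetical averaged distributions P_bar_f induced by two information
# resources, each encountered with probability 1/2 under D.
P_bar = {
    "f1": np.array([0.4, 0.3, 0.2, 0.1]),
    "f2": np.array([0.5, 0.3, 0.1, 0.1]),
}
D = {"f1": 0.5, "f2": 0.5}

expected_P = sum(D[f] * P_bar[f] for f in P_bar)   # E_D[P_bar_F]
bias = t @ expected_P - k / n                      # Bias(D, t), Eq. (3.2)
# Positive bias: these resources favor the target over uniform sampling.
```

Here both resources concentrate mass on the target elements, so the bias comes out positive (0.25 in this toy setup); a negative value would mean the resources steer the search away from the target.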
4 PRELIMINARIES
In this section, we introduce a new property of success metrics called decomposability, which allows us to generalize concepts of success and bias. We provide a number of preliminary lemmata, with full proofs given in the Appendix (available online (Sam et al., 2020)).
4.1 Decomposability
We now give a formal definition for a decomposable probability-of-success metric, which will be used throughout the rest of the paper.

¹ k-hot vectors are binary vectors of length $|\Omega|$ with exactly k ones.

Definition 4.1. A probability-of-success metric $\phi$ is decomposable if and only if there exists a $P_{\phi,f}$ such that

$$\phi(t, f) = \mathbf{t}^\top P_{\phi,f} = P_\phi(X \in t \mid f), \quad (4.1)$$

where $P_{\phi,f}$ is not a function of $t$, being conditionally independent of it given $f$.

As we stated previously, what makes the expected per-query probability of success particularly useful is that it can be represented as a linear function of a probability distribution. This definition allows us to reference any probability-of-success metric having this property.
As a first example, we show that the expected per-query probability of success is a decomposable probability-of-success metric.

Lemma 4.2 (Decomposability of the Expected Per-Query Probability of Success). The expected per-query probability of success is decomposable, namely,

$$q(t, f) = \mathbf{t}^\top \overline{P}_f. \quad (4.2)$$
Our goal is to show that the theorems proved for the expected per-query probability of success hold for all decomposable metrics. Showing that the expected per-query probability of success is decomposable suggests that these theorems may be generalizable to any metrics sharing that property.
4.1.1 The General Probability of Success
While the expected per-query probability of success averages the probability of success over each of the queries in a search history, we may care more about a specific query in the search history, e.g., the final query of a sequence. Thus, we can generalize the expected per-query probability of success by replacing the averaging with an arbitrary distribution $\alpha$ over the probability distributions in the search history. We define the General Probability of Success as

$$q_\alpha(t, f) = \mathbb{E}_{\tilde{P}, H}\left[\sum_{i=1}^{|\tilde{P}|} \alpha_i P_i(\omega \in t) \,\middle|\, f\right] = P_\alpha(X \in t \mid f) \quad (4.3)$$

where $P_\alpha$ is a valid probability distribution on the search space and $\alpha_i$ is the weight allocated to the $i$th probability distribution in our sequence. This formula allows us to consider a wide variety of success metrics as being instances of the general probability of success metric. For example, the expected per-query probability of success is equivalent to setting $P_{\alpha,f} = \overline{P}_f$, with $\alpha_i = 1/|\tilde{P}|$. Similarly, a metric of success which only cares about the final query can be represented by letting $P_{\alpha,f} = \overline{P}_{n,f}$, where $n$ is the length of the sequence of queries and $\overline{P}_{n,f}$ is the average of the distributions from the $n$th iteration of our search.
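The two special cases above can be sketched numerically (hypothetical distributions and target, not from the paper): a uniform $\alpha$ recovers the expected per-query probability of success, while putting all mass on the final index scores only the last query.

```python
import numpy as np

# Hypothetical sequence of per-iteration distributions and a singleton target.
t = np.array([0.0, 1.0, 0.0])
P_tilde = np.array([
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.8, 0.1],
])

def q_alpha(alpha, P_tilde, t):
    """General probability of success: t^T (alpha-weighted mix of the P_i)."""
    P_alpha = alpha @ P_tilde      # weighted average distribution over Omega
    return t @ P_alpha

n_steps = len(P_tilde)
uniform = np.full(n_steps, 1.0 / n_steps)   # alpha_i = 1/|P~|
last_step = np.eye(n_steps)[-1]             # all mass on the final query

q_per_query = q_alpha(uniform, P_tilde, t)  # (0.2 + 0.5 + 0.8) / 3 = 0.5
q_final = q_alpha(last_step, P_tilde, t)    # 0.8
```

In a transfer learning setting, `q_final` is the quantity of interest: it reflects the state of the search at its last step rather than its average behavior.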
ICAART 2020 - 12th International Conference on Agents and Artificial Intelligence
788
It should be noted that $\alpha$ within the expectation will be random, being defined over the random number of steps within $\tilde{P}$. Our operative definition of the $\alpha$ distribution, however, will allow us to generate the corresponding distribution for the needed number of steps, such as when we place all mass on the $n$th iteration of the search. With a slight abuse of notation, we thus let $\alpha$ signify both the process by which the distribution is generated as well as the particular distribution produced for a given number of steps.
As the general probability of success provides a layer of abstraction above the expected per-query probability of success, if we prove that results about the expected per-query probability of success also hold for the general probability of success, we gain a more powerful tool set. To do so, we must first demonstrate that the general probability of success is a decomposable probability-of-success metric.

Lemma 4.3 (Decomposability of the General Probability of Success Metric). The general probability of success is decomposable, namely,

$$q_\alpha(t, f) = \mathbf{t}^\top P_{\alpha,f}. \quad (4.4)$$
These lemmata allow us to apply later theorems about decomposable metrics to these two useful metrics. Given a metric of interest, performing a similar proof of decomposability will allow for the application of the subsequent theorems.

Lemma 4.4 (Decomposability Closed under Expectation). Given a set $S = \{\phi_i\}$ of decomposable probability-of-success metrics and a distribution $\mathcal{D}$ over $S$, it holds that

$$\phi'(t, f) = \mathbb{E}_{\mathcal{D}}[\phi(t, f)] \quad (4.5)$$

is also a decomposable probability-of-success metric.
Lemma 4.4 gives us an easy way to construct a new decomposable metric from a set of known decomposable metrics. Note that not every success metric is decomposable; we can create non-decomposable success metrics by taking non-convex combinations of decomposable probability-of-success metrics.
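The mechanism behind Lemma 4.4 can be illustrated with made-up numbers: mixing two decomposable metrics metric-by-metric gives the same value as applying $\mathbf{t}^\top$ to the correspondingly mixed distribution, which is exactly the $P_{\phi',f}$ witnessing decomposability.

```python
import numpy as np

t = np.array([1.0, 0.0, 0.0, 1.0])

# Two hypothetical decomposable metrics, each represented by its P_{phi_i, f}.
P_phi1 = np.array([0.4, 0.1, 0.1, 0.4])
P_phi2 = np.array([0.25, 0.25, 0.25, 0.25])

w = np.array([0.7, 0.3])           # distribution D over the two metrics

# E_D[phi(t, f)], computed metric-by-metric...
phi_prime = w[0] * (t @ P_phi1) + w[1] * (t @ P_phi2)

# ...equals t^T applied to the mixed distribution, so phi' is decomposable.
P_mixed = w[0] * P_phi1 + w[1] * P_phi2
assert np.isclose(phi_prime, t @ P_mixed)
```

The convexity of the weights matters: `P_mixed` stays a valid probability distribution only because `w` is nonnegative and sums to one, which is why non-convex combinations can break decomposability.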
4.2 Generalization of Bias
Our definition of decomposability allows us to redefine bias in terms of any decomposable metric, $\phi(T, F)$. We replace $\overline{P}_F$ with $P_{\phi,F}$ and obtain

$$\mathrm{Bias}_\phi(\mathcal{D}, \mathbf{t}) = \mathbb{E}_{\mathcal{D}}\left[\mathbf{t}^\top P_{\phi,F}\right] - \frac{k}{|\Omega|} \quad (4.6)$$
$$= \mathbf{t}^\top \mathbb{E}_{\mathcal{D}}\left[P_{\phi,F}\right] - \frac{\|\mathbf{t}\|^2}{|\Omega|} \quad (4.7)$$
$$= \mathbf{t}^\top \int_{\mathcal{F}} P_{\phi,f} \, \mathcal{D}(f)\,df - \frac{\|\mathbf{t}\|^2}{|\Omega|}. \quad (4.8)$$

Because $\phi(t, f)$ is decomposable, it is equal to $\mathbf{t}^\top P_{\phi,f}$. This makes results about the bias particularly interesting, since they relate directly to any probability-of-success metric we create, so long as the metric is decomposable.
5 RESULTS
Montañez proved a number of results and bounds on the success of machine learning algorithms relative to the expected per-query probability of success, along with its corresponding definition of bias (Montañez, 2017b; Montañez et al., 2019). We now generalize these to apply to any decomposable probability-of-success metric, with full proofs given in the Appendix (available online (Sam et al., 2020)).
5.1 No Free Lunch for Search
First, we prove a version of the No Free Lunch theorems for any decomposable probability-of-success metric within the search framework.
Theorem 1 (No Free Lunch for Search and Machine Learning). For any pair of search/learning algorithms $A_1$, $A_2$ operating on a discrete finite search space $\Omega$, any set of target sets $\tau$ closed under permutation, any set of information resources $B$, and any decomposable probability-of-success metric $\phi$,

$$\sum_{t \in \tau}\sum_{f \in B} \phi_{A_1}(t, f) = \sum_{t \in \tau}\sum_{f \in B} \phi_{A_2}(t, f). \quad (5.1)$$
This means that performance, in terms of our decomposable probability-of-success metric, is conserved in the sense that increased performance of one algorithm over another on some information resource-target pair comes at the cost of a loss in performance elsewhere.
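One way to glimpse the conservation, in a deliberately simplified single-distribution sketch (not the full proof), is a counting identity: for any distribution $P$ an algorithm might induce, summing $\mathbf{t}^\top P$ over the permutation-closed set of all size-k targets yields $\binom{|\Omega|-1}{k-1}$, a value independent of $P$ and hence of the algorithm.

```python
import numpy as np
from itertools import combinations
from math import comb

n, k = 5, 2
rng = np.random.default_rng(0)

def total_success(P):
    """Sum of t^T P over every size-k target (a permutation-closed set)."""
    total = 0.0
    for idx in combinations(range(n), k):
        t = np.zeros(n)
        t[list(idx)] = 1.0
        total += t @ P
    return total

# Two "algorithms", modeled here as two arbitrary distributions over Omega.
P1 = rng.dirichlet(np.ones(n))
P2 = rng.dirichlet(np.ones(n))

# Each element of Omega appears in C(n-1, k-1) targets, so both totals equal
# C(n-1, k-1) regardless of how P concentrates its mass.
assert np.isclose(total_success(P1), comb(n - 1, k - 1))
assert np.isclose(total_success(P2), comb(n - 1, k - 1))
```

Concentrating mass on a few elements helps on the targets containing them and hurts on all the others, and over a permutation-closed collection those effects cancel exactly.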
5.2 The Fraction of Favorable Targets
Montañez proved that for a fixed information resource, a given algorithm $A$ will perform favorably relative to uniform random sampling on only a few target sets, under the expected per-query probability of success (Montañez, 2017b). We generalize this result with a decomposable probability-of-success metric and define a version of active information of expectations for decomposable metrics, $I_{\phi(t,f)} := -\log_2 \frac{p}{\phi(t,f)}$. This transforms the ratio of success probabilities into bits, where $p = |t|/|\Omega|$ is the per-query probability of success for uniform random sampling with replacement. $I_{\phi(t,f)}$ denotes the advantage $A$ has over uniform random sampling with replacement, in bits.
Theorem 2 (The Fraction of Favorable Targets). Let $\tau = \{t \mid t \subseteq \Omega\}$ and $\tau_b = \{t \mid \emptyset \neq t \subseteq \Omega, I_{\phi(t,f)} \geq b\}$ for decomposable probability-of-success metric $\phi$. Then for $b \geq 3$,

$$\frac{|\tau_b|}{|\tau|} \leq 2^{-b}. \quad (5.2)$$
Thus, the scarcity of b-bit favorable targets still holds for any decomposable probability-of-success metric.
5.3 The Famine of Favorable Targets
Following up on the previous result, we can show a similar bound in terms of the success of a given algorithm, for targets of a fixed size.
Theorem 3 (The Famine of Favorable Targets). For fixed $k \in \mathbb{N}$, fixed information resource $f$, and decomposable probability-of-success metric $\phi$, define

$$\tau = \{T \mid T \subseteq \Omega, |T| = k\}, \text{ and}$$
$$\tau_{q_{\min}} = \{T \mid T \subseteq \Omega, |T| = k, \phi(T, f) \geq q_{\min}\}.$$

Then,

$$\frac{|\tau_{q_{\min}}|}{|\tau|} \leq \frac{p}{q_{\min}} \quad (5.3)$$

where $p = \frac{k}{|\Omega|}$.
Here, we compare success not against uniform sampling but against a fixed constant $q_{\min}$. This theorem thus upper bounds the proportion of targets for which the probability of success of the search is greater than $q_{\min}$.
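Theorem 3 has the flavor of a Markov inequality, and the intuition can be checked by brute force on a small hypothetical example: averaged over all size-k targets, any decomposed metric $\mathbf{t}^\top P_{\phi,f}$ equals exactly $p = k/|\Omega|$, since each element of $\Omega$ lands in the same fraction of targets, so at most a $p/q_{\min}$ fraction of targets can score $q_{\min}$ or better.

```python
import numpy as np
from itertools import combinations

n, k, q_min = 8, 2, 0.5
rng = np.random.default_rng(1)

# A hypothetical decomposed metric: some fixed distribution P_{phi,f} over Omega.
P_phi_f = rng.dirichlet(np.ones(n))

# Enumerate every size-k target and count those with phi(T, f) >= q_min.
targets = list(combinations(range(n), k))
favorable = sum(1 for idx in targets if P_phi_f[list(idx)].sum() >= q_min)

p = k / n
# The bound of Theorem 3 holds for any choice of P_{phi,f}.
assert favorable / len(targets) <= p / q_min
```

Rerunning with other seeds or other distributions never violates the bound; an adversarially peaked `P_phi_f` can approach it but not exceed it.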
5.4 Famine of Forte
We generalize the Famine of Forte (Montañez, 2017b), showing a bound that holds in the k-sparse case using any decomposable probability-of-success metric.
Theorem 4 (The Famine of Forte). Define

$$\tau_k = \{T \mid T \subseteq \Omega, |T| = k \in \mathbb{N}\}$$

and let $B_m$ denote any set of binary strings, such that the strings are of length $m$ or less. Let

$$R = \{(T, F) \mid T \in \tau_k, F \in B_m\}, \text{ and}$$
$$R_{q_{\min}} = \{(T, F) \mid T \in \tau_k, F \in B_m, \phi(T, F) \geq q_{\min}\},$$

where $\phi(T, F)$ is the decomposable probability-of-success metric for algorithm $A$ on problem $(\Omega, T, F)$. Then for any $m \in \mathbb{N}$,

$$\frac{|R_{q_{\min}}|}{|R|} \leq \frac{p}{q_{\min}}. \quad (5.4)$$
This demonstrates that for any decomposable metric there is an upper bound on the proportion of problems an algorithm is successful on. Here, we measure success as being above a certain threshold with respect to a decomposable metric, and the upper bound is inversely related to this threshold.
5.5 Learning Under Dependence
While the previous theorems highlight cases where an algorithm is unlikely to succeed, we now consider the conditions that make an algorithm likely to succeed. To begin, we consider how the target and information resource can influence an algorithm's success by generalizing the Learning Under Dependence theorem (Montañez, 2017a).
Theorem 5 (Learning Under Dependence). Define $\tau_k = \{T \mid T \subseteq \Omega, |T| = k \in \mathbb{N}\}$ and let $B_m$ denote any set of binary strings (information resources), such that the strings are of length $m$ or less. Define $q$ as the expected decomposable probability of success under the joint distribution on $T \in \tau_k$ and $F \in B_m$ for any fixed algorithm $A$, such that $q := \mathbb{E}_{T,F}[\phi(T, F)]$, namely,

$$q = \mathbb{E}_{T,F}\left[P_\phi(\omega \in T \mid F)\right] = \Pr(\omega \in T; A).$$

Then,

$$q \leq \frac{I(T; F) + D(P_T \,\|\, \mathcal{U}_T) + 1}{I_\Omega} \quad (5.5)$$

where $I_\Omega = -\log k/|\Omega|$, $D(P_T \,\|\, \mathcal{U}_T)$ is the Kullback-Leibler divergence between the marginal distribution on $T$ and the uniform distribution on $T$, and $I(T; F)$ is the mutual information. Alternatively, we can write

$$\Pr(\omega \in T; A) \leq \frac{H(\mathcal{U}_T) - H(T \mid F) + 1}{I_\Omega} \quad (5.6)$$

where $H(\mathcal{U}_T) = \log \binom{|\Omega|}{k}$.
The value of $q$ defined here represents the expected single-query probability of success of an algorithm relative to a randomly selected target and information resource, distributed according to some joint distribution. The probability of success for a single query (marginalized over information resources) is equivalent to the expectation of the conditional probability of success, conditioned on the random information resource. Upper bounding this value states that regardless of the choice of decomposable probability-of-success metric, the probability of success depends on the amount of information regarding the target contained within the information resource, as measured by the mutual information.
5.6 Famine of Favorable Information
Resources
We now demonstrate the effect of the general bias term defined earlier on the probability of success of an algorithm. We begin with a generalization of the Famine of Favorable Information Resources (Montañez et al., 2019).
Theorem 6 (Famine of Favorable Information Resources). Let $B$ be a finite set of information resources and let $t$ be an arbitrary fixed k-size target set with corresponding target function $\mathbf{t}$. Define

$$B_{q_{\min}} = \{f \mid f \in B, \, \phi(t, f) \geq q_{\min}\},$$

where $\phi(t, f)$ is an arbitrary decomposable probability-of-success metric for algorithm $A$ on search problem $(\Omega, t, f)$, and $q_{\min} \in (0, 1]$ represents the minimally acceptable probability of success. Then,

$$\frac{|B_{q_{\min}}|}{|B|} \leq \frac{p + \mathrm{Bias}_\phi(B, \mathbf{t})}{q_{\min}} \quad (5.7)$$

where $p = \frac{k}{|\Omega|}$.
.
This result demonstrates the mathematical effect
of bias, of which we have previously provided one
hypothetical example (car parking). In particular, we
can show that the bias of our expected information re-
sources towards the target will upper bound the prob-
ability of a given information resource leading to a
successful search.
5.7 Futility of Bias-Free Search
We can also use our definition of bias to generalize the Futility of Bias-Free Search (Montañez, 2017b), which demonstrates the inability of an algorithm to perform better than uniform random sampling without bias, defined with respect to the expected per-query probability of success. Our generalization proves that the theorem holds for bias defined with respect to any decomposable probability-of-success metric.
Theorem 7 (Futility of Bias-Free Search). For any fixed algorithm $A$, fixed target $t$ with corresponding target function $\mathbf{t}$, and distribution over information resources $\mathcal{D}$, if $\mathrm{Bias}_\phi(\mathcal{D}, \mathbf{t}) = 0$, then

$$\Pr(\omega \in t; A) = p \quad (5.8)$$

where $\Pr(\omega \in t; A)$ represents the expected decomposable probability of successfully sampling an element of $t$ using $A$, marginalized over information resources $F \sim \mathcal{D}$, and $p$ is the single-query probability of success under uniform random sampling.
This result demonstrates that, regardless of how
we measure the success of an algorithm with respect
to a decomposable metric, it cannot perform better
than uniform random sampling without bias.
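A toy construction (hypothetical distributions, not from the paper) makes the mechanism concrete: two information resources skewed in opposite directions can average out to exactly zero bias, at which point the marginal success probability collapses to the uniform baseline $p = k/|\Omega|$.

```python
import numpy as np

n, k = 4, 2
t = np.array([1.0, 1.0, 0.0, 0.0])

# Two hypothetical resources whose decomposed distributions are skewed in
# opposite directions; each is drawn with probability 1/2 under D.
P_f1 = np.array([0.4, 0.3, 0.2, 0.1])
P_f2 = np.array([0.1, 0.2, 0.3, 0.4])
D = [0.5, 0.5]

expected_P = D[0] * P_f1 + D[1] * P_f2
bias = t @ expected_P - k / n
assert np.isclose(bias, 0.0)              # the skews cancel: bias-free

# With zero bias, marginal success equals uniform sampling's p = k/n.
assert np.isclose(t @ expected_P, k / n)
```

Either resource alone would be favorable or unfavorable to the target; it is only in expectation over $\mathcal{D}$ that the advantage vanishes, which is precisely what the theorem addresses.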
5.8 Famine of Favorable Biasing
Distributions
Montañez proved that the percentage of minimally favorable distributions (biased over some threshold towards some specific target) is inversely proportional to the threshold value and directly proportional to the bias between the information resource and target function (Montañez, 2017b). We will show that this scarcity of favorable biasing distributions holds, in general, for bias under any decomposable probability-of-success metric.
Theorem 8 (Famine of Favorable Biasing Distributions). Given a fixed target function $\mathbf{t}$, a finite set of information resources $B$, a distribution over information resources $\mathcal{D}$, and a set $\mathcal{P} = \{\mathcal{D} \mid \mathcal{D} \in \mathbb{R}^{|B|}, \sum_{f \in B} \mathcal{D}(f) = 1\}$ of all discrete $|B|$-dimensional simplex vectors,

$$\frac{\mu(\mathcal{G}_{\mathbf{t}, q_{\min}})}{\mu(\mathcal{P})} \leq \frac{p + \mathrm{Bias}_\phi(B, \mathbf{t})}{q_{\min}} \quad (5.9)$$

where $\mathcal{G}_{\mathbf{t}, q_{\min}} = \{\mathcal{D} \mid \mathcal{D} \in \mathcal{P}, \mathrm{Bias}_\phi(\mathcal{D}, \mathbf{t}) \geq q_{\min}\}$, $p = \frac{k}{|\Omega|}$, and $\mu$ is the Lebesgue measure.
This result shows that the more bias there is between our set of information resources $B$ and the target function $\mathbf{t}$, the easier it is to find a minimally favorable distribution, and the higher the threshold for what qualifies as a minimally favorable distribution, the harder our search becomes. Thus, unless we want to suppose that we begin with a set of information resources already favorable towards our fixed target, finding a highly favorable distribution is difficult.
6 CONCLUSION
Casting machine learning problems as search provides a common formalism within which to prove bounds and impossibility results for a wide variety of learning algorithms and tasks. In this paper, we introduce a property of probability-of-success metrics called decomposability, and show that the expected per-query probability of success and the general probability of success are decomposable. To demonstrate the value of this property, we prove that a number of existing algorithmic search framework results continue to hold for all decomposable probability-of-success metrics. These results provide a number of useful insights: we show that algorithmic performance is conserved with respect to all decomposable probability-of-success metrics, that favorable targets are scarce no matter your decomposable probability-of-success metric, and that without the generalized bias defined here, an algorithm will not perform better than uniform random sampling.

The goal of this work is to offer additional machinery within the search framework, allowing for more general application. To that end, we can develop decomposable probability-of-success metrics for problems concerned with the state of an algorithm at specific steps, and leverage existing results as a foundation for additional insight into those problems. One application is transfer learning, where we use a decomposable probability-of-success metric that utilizes only the state of the algorithm at the last step to represent the information learned from a source problem.
ACKNOWLEDGEMENTS
This work was supported by the Walter Bradley Center for Natural and Artificial Intelligence. We thank Dr. Robert J. Marks II (Baylor University) for providing support and feedback. We also thank Harvey Mudd College's Department of Computer Science for their continued resources and support.
REFERENCES
Dembski, W. A., Ewert, W., and Marks II, R. J. (2013). A general theory of information cost incurred by successful search. In Biological Information: New Perspectives, pages 26-63. World Scientific.

Dembski, W. A. and Marks II, R. J. (2009). Conservation of information in search: measuring the cost of success. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 39(5):1051-1061.

Dembski, W. A. and Marks II, R. J. (2010). The search for a search: Measuring the information cost of higher level search. JACIII, 14(5):475-486.

Gülçehre, Ç. and Bengio, Y. (2016). Knowledge matters: Importance of prior information for optimization. The Journal of Machine Learning Research, 17(1):226-257.

Kumagai, W. and Kanamori, T. (2019). Risk bound of transfer learning using parametric feature mapping and its application to sparse coding. Machine Learning, 108:1975-2008.

Mitchell, T. M. (1980). The need for biases in learning generalizations. Technical report, Computer Science Department, Rutgers University, New Brunswick, NJ.

Mitchell, T. M. (1982). Generalization as Search. Artificial Intelligence, 18(2):203-226.

Montañez, G. D. (2017a). The Famine of Forte: Few Search Problems Greatly Favor Your Algorithm. In Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on, pages 477-482. IEEE.

Montañez, G. D. (2017b). Why Machine Learning Works. PhD thesis, Carnegie Mellon University.

Montañez, G. D., Hayase, J., Lauw, J., Macias, D., Trikha, A., and Vendemiatti, J. (2019). The Futility of Bias-Free Learning and Search. In 32nd Australasian Joint Conference on Artificial Intelligence, pages 277-288. Springer.

Sam, T., Williams, J., Tadesse, A., Sun, H., and Montañez, G. (2020). Decomposable probability-of-success metrics in algorithmic search. CoRR, abs/2001.00742.

Schaffer, C. (1994). A Conservation Law for Generalization Performance. Machine Learning Proceedings 1994, 1:259-265.

Sterkenburg, T. F. (2019). Putnam's Diagonal Argument and the Impossibility of a Universal Learning Machine. Erkenntnis, 84(3):633-656.

Valiant, L. (1984). A Theory of the Learnable. Communications of the ACM, 27:1134-1142.

Wolpert, D. H. and Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Trans. Evolutionary Computation, 1:67-82.