Decomposable Probability-of-Success Metrics in Algorithmic Search

Tyler Sam¹ᵃ, Jake Williams¹ᵇ, Abel Tadesse²ᶜ, Huey Sun³ᵈ and George Montañez¹ᵉ

¹Harvey Mudd College, California, U.S.A.
²Claremont McKenna College, California, U.S.A.
³Pomona College, California, U.S.A.
Keywords: Decomposable Probability-of-Success Metric, Machine Learning as Search, Algorithmic Search Framework.

Abstract: Prior work in machine learning has used a specific success metric, the expected per-query probability of success, to prove impossibility results within the algorithmic search framework. However, this success metric prevents us from applying these results to specific subfields of machine learning, e.g., transfer learning. To solve this issue, we define decomposable metrics as a category of success metrics for search problems which can be expressed as a linear operation on a probability distribution. Using an arbitrary decomposable metric to measure the success of a search, we demonstrate theorems which bound success in various ways, generalizing several existing results in the literature.
1 INTRODUCTION
Analyzing the success of a machine learning algorithm on specific problems is often very difficult given all the different variables in each problem. One solution is to reduce machine learning to search, since many machine learning tasks, such as classification, regression, and clustering, can be reduced to search problems (Montañez, 2017b). Through this reduction, one can apply concepts from information theory to derive impossibility results about machine learning. For example, any specific machine learning algorithm can only do well on a small subset of all possible problems. To compare the success of different algorithms, or the expected probability of finding a desired element, Montañez defined a metric of success that averaged the probability of success over all iterations of an algorithm (Montañez, 2017b). While this metric has many applications, it is not appropriate for cases where the probability of success for a given iteration of an algorithm is required. An example of this is transfer learning, where the probability of success at the final step of the algorithm is more relevant than the average probability of success.
Building on this work, we define decomposability as a property of probability-of-success metrics and show that the expected per-query probability of success (Montañez, 2017b) and more general probability-of-success metrics are decomposable. We then show that the results previously proven for the expected per-query probability of success hold for all decomposable probability-of-success metrics. Under this generalization, we can prove results related to the probability of success for specific iterations of a search rather than just uniformly averaged over the entire search, giving the results much broader applicability.

ᵃ https://orcid.org/0000-0001-7974-3226
ᵇ https://orcid.org/0000-0001-9714-1851
ᶜ https://orcid.org/0000-0002-3337-9454
ᵈ https://orcid.org/0000-0002-0949-3169
ᵉ https://orcid.org/0000-0002-1333-4611
2 RELATED WORK
Several decades ago, Mitchell proposed that classification could be viewed as search, and reduced the problem of learning generalizations to a search problem within a hypothesis space (Mitchell, 1980; Mitchell, 1982). Montañez subsequently expanded this idea into a formal search framework (Montañez, 2017b).
Montañez showed that for a given algorithm with a fixed information resource, favorable target sets, or the target sets on which the algorithm would perform better than uniform random sampling, are rare. He did this by proving that the proportion of b-bit favorable problems has an exponentially decaying restrictive bound (Montañez, 2017a). He further showed that this scarcity of favorable problems exists even for small k-sized target sets.

Sam, T., Williams, J., Tadesse, A., Sun, H. and Montañez, G.
Decomposable Probability-of-Success Metrics in Algorithmic Search.
DOI: 10.5220/0009098807850792
In Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020) - Volume 2, pages 785-792
ISBN: 978-989-758-395-7; ISSN: 2184-433X
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Montañez et al. later defined bias, the degree to which an algorithm is predisposed to a fixed target, with respect to the expected per-query probability of success metric, and proved that there are a limited number of favorable information resources for a given bias (Montañez et al., 2019). Using the search framework, they proved that an algorithm cannot be favorably biased towards many distinct targets simultaneously.
As machine learning grew in prominence, researchers began to probe what was possible within machine learning. Valiant considered learnability of a task as the ability to generate a program for performing the task without explicit programming of the task (Valiant, 1984). By restricting the tasks to a specific context, Valiant demonstrated a set of tasks which were provably learnable.
Schaffer provided an early foundation to the idea of bounding universal performance of an algorithm (Schaffer, 1994). Schaffer analyzed generalization performance, the ability of a learner to classify objects outside of its training set, in a classification task. Using a baseline of uniform sampling from the classifiers, he showed that, over the set of all learning situations, a learner's generalization performance sums to zero, which makes generalization performance a conserved quantity.
Wolpert and Macready demonstrated that the historical performance of a deterministic optimization algorithm provides no a priori justification whatsoever for its continued use over any other alternative going forward (Wolpert and Macready, 1997), implying that there is no utility in rationally choosing a thus-far better algorithm over choosing the opposite. Furthermore, just as there does not exist a single algorithm that performs better than random on all possible optimization problems, they proved that there also does not exist an optimization problem on which all algorithms perform better than average.
Continuing the application of prior knowledge to learning and optimization, Gülçehre and Bengio showed that the worse-than-chance performance of certain machine learning algorithms can be improved through learning with hints, namely, guidance using a curriculum (Gülçehre and Bengio, 2016). So, while Wolpert's results might make certain tasks seem futile and infeasible, Gülçehre's empirical results show that there exist some alternate means through which we can use prior knowledge to attain better results in both learning and optimization. Dembski and Marks measured the contributions of such prior knowledge using active information (Dembski and Marks II, 2009) and proved the difficulty of finding a good search algorithm for a fixed problem (Dembski and Marks II, 2010), through their concept of a search for a search (S4S). Eventually, their work expanded into a formal general theory of search, characterizing the information costs associated with success (Dembski et al., 2013), which served as an inspiration for later developments in machine learning (Montañez, 2017b).
Others have worked towards meaningful bounds on algorithmic success through different approaches. Sterkenburg approached this concept from the perspective of Putnam, who originally claimed that a universal learning machine is impossible through the use of a diagonalization argument (Sterkenburg, 2019). Sterkenburg follows up on this claim, attempting to find a universal inductive rule by exploring a measure which cannot be diagonalized. Even when attempting to evade Putnam's original diagonalization, Sterkenburg is able to apply a new diagonalization that reinforces Putnam's original claim of the impossibility of a universal learning machine.
There has also been work on proving learning bounds for specific problems. Kumagai and Kanamori analyzed the theoretical bounds of parameter transfer algorithms and self-taught learning (Kumagai and Kanamori, 2019). By looking at the local stability, or the degree to which a feature is affected by shifting parameters, they developed a definition for parameter transfer learnability, which describes the probability of effective transfer.
2.1 Distinctions from Prior Work
The expected per-query probability of success metric previously defined in the algorithmic search framework (Montañez, 2017b) tells us, for a given information resource, algorithm, and target set, how often (in expectation) our algorithm will successfully locate elements of the target set. While this metric is useful when making general claims about the performance of an algorithm or the favorability of an algorithm and information resource to the target set, it lacks the specificity to make claims about similar performance and favorability on a per-iteration basis. This trade-off calls for a more general metric that can be used to make both general and specific (per-iteration) claims. For instance, in transfer learning tasks, the performance and favorability of the last pre-transfer iteration is more relevant than the overall expected per-query probability of success. The general probability of success, which we will define as a particular decomposable probability-of-success metric, is a tool through which we can make claims at specific and relevant steps.
3 BACKGROUND
In this section, we will present definitions for the main
framework that we will use throughout this paper.
3.1 The Search Framework
Montañez describes a framework which formalizes search problems in order to analyze search and learning algorithms (Montañez, 2017a). To that end, he casts various machine learning tasks, such as regression, classification, and clustering, into this search framework. This ML-as-search framework is valuable because it provides a structure to understand and reason about different machine learning problems within the same formalism. For example, we can understand regression as a search through a space of possible regression functions, and parameter estimation as a search through possible vectors for a black-box process (Montañez, 2017b). Therefore, we can apply results about search to any machine learning problem we can cast into the search framework. Furthermore, this framework lets us more easily analyze the factors necessary for success in a machine learning problem by viewing it as a search problem.
There are three components to a search problem. The first is the finite discrete search space, Ω, which is the set of elements to be examined. (Finiteness and discreteness follow from finite-precision numerical representation, so the loss of generality is not great.) Next is the target set, T, which is a nonempty subset of the search space that we are trying to find. Finally, we have an external information resource, F, which provides an evaluation of elements of the search space. Typically, there is a tight relationship between the target set and the external information resource, as the resource is expected to lead to or describe the target set in some way, such as the target set being the elements which meet a certain threshold under the external information resource.
Within the framework, we have an iterative algorithm which searches for elements of the target set, as shown in Figure 1. The algorithm is a black box that has access to a search history and produces a probability distribution over the search space. At each step, the algorithm samples over the search space using the probability distribution, evaluates that element using the information resource, adds the result to the search history, and determines the next probability distribution through its own internal rules and logic. The abstraction of finding the next probability distribution as a black-box algorithm allows the search framework to work with all types of search problems.
[Figure 1: a black-box algorithm consults a history of sampled points ω and their evaluations F(ω), and produces a probability distribution P over the search space Ω from which the next point is chosen at time step i.]

Figure 1: Black-box search algorithm. We iteratively populate the history with samples from a distribution that is determined by the black-box at each iteration, using the history (Montañez, 2017b).
3.2 Expected Per-query Probability of
Success
In order to compare search algorithms, Montañez defined the expected per-query probability of success,

$$q(t, f) = \mathbb{E}_{\tilde{P}, H}\left[\frac{1}{|\tilde{P}|}\sum_{i=1}^{|\tilde{P}|} P_i(\omega \in t) \,\middle|\, f\right] = P(X \in t \mid f) \quad (3.1)$$

where $\tilde{P}$ is the sequence of probability distributions generated by the black box, $H$ is the search history, and $t$ and $f$ are the target set and information resource of the search problem, respectively (Montañez, 2017a). This metric of success is particularly useful because it can be shown that $q(t, f) = \mathbf{t}^\top \overline{P}_f$, where $\overline{P}_f$ is the average of the vector representations of the probability distributions from the search algorithm at each step, conditioned on an information resource $f$.
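As a quick numeric sketch of this identity (using a hypothetical three-element search space and made-up distributions, not values from the paper), we can compare the per-iteration average of target hits against the decomposed form $\mathbf{t}^\top \overline{P}_f$:

```python
import numpy as np

# Hypothetical search space of 3 elements; target set t = {0, 2} as a hot vector.
t = np.array([1.0, 0.0, 1.0])

# A made-up sequence of distributions P_1, ..., P_4 produced by a black-box
# search over its iterations (each row sums to 1).
P_tilde = np.array([
    [0.5, 0.3, 0.2],
    [0.4, 0.2, 0.4],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
])

# Direct form: probability of hitting the target at each iteration, averaged.
per_query = P_tilde @ t            # P_i(omega in t) for each i
q_direct = per_query.mean()

# Decomposed form: average the distributions first, then apply t^T.
P_bar_f = P_tilde.mean(axis=0)
q_decomposed = t @ P_bar_f

assert np.isclose(q_direct, q_decomposed)   # both equal 0.75 here
```

The two computations agree by linearity: averaging the hit probabilities and averaging the distributions before taking the inner product with $\mathbf{t}$ are interchangeable.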
Measuring success using the expected per-query probability of success, Montañez demonstrated bounds on the success of any search algorithm (Montañez, 2017a). The Famine of Forte states that for a given algorithm, the proportion of target set-information resource pairs yielding a success level above a given threshold is inversely related to the threshold. Thus, the greater the threshold for success, the fewer problems you can be successful on, regardless of the algorithm. The expected per-query probability of success can also be used to prove a version of the No Free Lunch theorem, demonstrating that all algorithms perform the same averaged over all target sets and information resources, as is done in Theorem 1 of the current manuscript.
3.3 Bias
Using the search framework, Montañez defined a measure of bias between a distribution over information resources and a fixed target (Montañez et al., 2019). For a distribution $\mathcal{D}$ over a collection of possible information resources $\mathcal{F}$, with $F \sim \mathcal{D}$, and a fixed k-hot¹ target $\mathbf{t}$, the bias between the distribution and the target is defined as

$$\mathrm{Bias}(\mathcal{D}, \mathbf{t}) = \mathbb{E}_{\mathcal{D}}\left[\mathbf{t}^\top \overline{P}_F\right] - \frac{k}{|\Omega|} \quad (3.2)$$
$$= \mathbf{t}^\top \mathbb{E}_{\mathcal{D}}\left[\overline{P}_F\right] - \frac{\|\mathbf{t}\|^2}{|\Omega|} \quad (3.3)$$
$$= \mathbf{t}^\top \int_{\mathcal{F}} \overline{P}_f \, \mathcal{D}(f)\,df - \frac{\|\mathbf{t}\|^2}{|\Omega|}. \quad (3.4)$$
Recall from above that $\overline{P}_f$ is the averaged probability distribution over Ω from a search. The bias term measures the performance of an algorithm in expectation (over a given distribution of information resources) compared to uniform sampling. Mathematically, this is computed by taking the difference between the expected value of the average performance of an algorithm and the performance of uniform sampling. The distribution $\mathcal{D}$ captures what information resources (e.g., datasets) one is likely to encounter.
For a non-mathematical example of the effect of bias, suppose we are searching for a parking space within a parking lot. If we randomly choose parking spaces to check, we are searching without bias. However, if we consider the locations of the parking spaces, we may find that the parking spaces furthest from the entrance are usually free, and could find an open parking space with a higher probability. Here, the information resource telling us the distance of each parking space from the entrance, together with our belief that parking spaces further from the entrance tend to be open, creates a distribution over possible parking spaces, favoring those that are further away to be checked first.
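The bias computation itself is a small linear-algebra exercise. A minimal numeric sketch (with a hypothetical four-element search space, two equally likely information resources, and made-up averaged distributions) takes the difference between the expected averaged distribution and uniform sampling:

```python
import numpy as np

n, k = 4, 2                       # |Omega| = 4, target of size k = 2
t = np.array([1.0, 1.0, 0.0, 0.0])

# Hypothetical averaged distributions P_bar_f induced by two information
# resources, each encountered with probability 1/2 under D.
P_bar = {
    "f1": np.array([0.4, 0.3, 0.2, 0.1]),
    "f2": np.array([0.5, 0.3, 0.1, 0.1]),
}
D = {"f1": 0.5, "f2": 0.5}

expected_P = sum(D[f] * P_bar[f] for f in P_bar)   # E_D[P_bar_F]
bias = t @ expected_P - k / n                      # Bias(D, t), Eq. (3.2)
# Positive bias: these resources favor the target over uniform sampling.
```

Here both resources concentrate mass on the target elements, so the bias comes out positive (0.25 in this toy setup); a negative value would mean the resources steer the search away from the target.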
4 PRELIMINARIES
In this section, we introduce a new property of success metrics called decomposability, which allows us to generalize concepts of success and bias. We provide a number of preliminary lemmata, with full proofs given in the Appendix (available online (Sam et al., 2020)).
4.1 Decomposability
We now give a formal definition for a decomposable probability-of-success metric, which will be used throughout the rest of the paper.

¹ k-hot vectors are binary vectors of length $|\Omega|$ with exactly k ones.

Definition 4.1. A probability-of-success metric $\phi$ is decomposable if and only if there exists a $P_{\phi,f}$ such that

$$\phi(t, f) = \mathbf{t}^\top P_{\phi,f} = P_\phi(X \in t \mid f), \quad (4.1)$$

where $P_{\phi,f}$ is not a function of $t$, being conditionally independent of it given $f$.

As we stated previously, what makes the expected per-query probability of success particularly useful is that it can be represented as a linear function of a probability distribution. This definition allows us to reference any probability-of-success metric having this property.
As a first example, we show that the expected per-query probability of success is a decomposable probability-of-success metric.

Lemma 4.2 (Decomposability of the Expected Per-Query Probability of Success). The expected per-query probability of success is decomposable, namely,

$$q(t, f) = \mathbf{t}^\top \overline{P}_f. \quad (4.2)$$
Our goal is to show that the theorems proved for the expected per-query probability of success hold for all decomposable metrics. Showing that the expected per-query probability of success is decomposable suggests that these theorems may be generalizable to any metrics sharing that property.
4.1.1 The General Probability of Success
While the expected per-query probability of success averages the probability of success over each of the queries in a search history, we may care more about a specific query in the search history, e.g., the final query of a sequence. Thus, we can generalize the expected per-query probability of success by replacing the averaging with an arbitrary distribution $\alpha$ over the probability distributions in the search history. We define the General Probability of Success as

$$q_\alpha(t, f) = \mathbb{E}_{\tilde{P}, H}\left[\sum_{i=1}^{|\tilde{P}|} \alpha_i P_i(\omega \in t) \,\middle|\, f\right] = P_\alpha(X \in t \mid f) \quad (4.3)$$

where $P_\alpha$ is a valid probability distribution on the search space and $\alpha_i$ is the weight allocated to the $i$th probability distribution in our sequence. This formula allows us to consider a wide variety of success metrics as being instances of the general probability of success metric. For example, the expected per-query probability of success is equivalent to setting $P_{\alpha,f} = \overline{P}_f$, with $\alpha_i = 1/|\tilde{P}|$. Similarly, a metric of success which only cares about the final query can be represented by letting $P_{\alpha,f} = \overline{P}_{n,f}$, where $n$ is the length of the sequence of queries and $\overline{P}_{n,f}$ is the average of the distributions from the $n$th iteration of our search.
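The two special cases above can be sketched numerically (hypothetical distributions and target, not from the paper): a uniform $\alpha$ recovers the expected per-query probability of success, while putting all mass on the final index scores only the last query.

```python
import numpy as np

# Hypothetical sequence of per-iteration distributions and a singleton target.
t = np.array([0.0, 1.0, 0.0])
P_tilde = np.array([
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.8, 0.1],
])

def q_alpha(alpha, P_tilde, t):
    """General probability of success: t^T (alpha-weighted mix of the P_i)."""
    P_alpha = alpha @ P_tilde      # weighted average distribution over Omega
    return t @ P_alpha

n_steps = len(P_tilde)
uniform = np.full(n_steps, 1.0 / n_steps)   # alpha_i = 1/|P~|
last_step = np.eye(n_steps)[-1]             # all mass on the final query

q_per_query = q_alpha(uniform, P_tilde, t)  # (0.2 + 0.5 + 0.8) / 3 = 0.5
q_final = q_alpha(last_step, P_tilde, t)    # 0.8
```

In a transfer learning setting, `q_final` is the quantity of interest: it reflects the state of the search at its last step rather than its average behavior.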
ICAART 2020 - 12th International Conference on Agents and Artificial Intelligence
788
It should be noted that $\alpha$ within the expectation will be random, being defined over the random number of steps within $\tilde{P}$. Our operative definition of the $\alpha$ distribution, however, will allow us to generate the corresponding distribution for the needed number of steps, such as when we place all mass on the $n$th iteration of the search. With a slight abuse of notation, we thus let $\alpha$ signify both the process by which the distribution is generated as well as the particular distribution produced for a given number of steps.
As the general probability of success provides a layer of abstraction above the expected per-query probability of success, if we prove that results about the expected per-query probability of success also hold for the general probability of success, we gain a more powerful tool set. To do so, we must first demonstrate that the general probability of success is a decomposable probability-of-success metric.

Lemma 4.3 (Decomposability of the General Probability of Success Metric). The general probability of success is decomposable, namely,

$$q_\alpha(t, f) = \mathbf{t}^\top P_{\alpha,f}. \quad (4.4)$$
These lemmata allow us to apply later theorems about decomposable metrics to these two useful metrics. Given a metric of interest, performing a similar proof of decomposability will allow for the application of the subsequent theorems.

Lemma 4.4 (Decomposability Closed under Expectation). Given a set $S = \{\phi_i\}$ of decomposable probability-of-success metrics and a distribution $\mathcal{D}$ over $S$, it holds that

$$\phi'(t, f) = \mathbb{E}_{\mathcal{D}}[\phi(t, f)] \quad (4.5)$$

is also a decomposable probability-of-success metric.
Lemma 4.4 gives us an easy way to construct a new decomposable metric from a set of known decomposable metrics. Note that not every success metric is decomposable; we can create non-decomposable success metrics by taking non-convex combinations of decomposable probability-of-success metrics.
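The mechanism behind Lemma 4.4 can be illustrated with made-up numbers: mixing two decomposable metrics metric-by-metric gives the same value as applying $\mathbf{t}^\top$ to the correspondingly mixed distribution, which is exactly the $P_{\phi',f}$ witnessing decomposability.

```python
import numpy as np

t = np.array([1.0, 0.0, 0.0, 1.0])

# Two hypothetical decomposable metrics, each represented by its P_{phi_i, f}.
P_phi1 = np.array([0.4, 0.1, 0.1, 0.4])
P_phi2 = np.array([0.25, 0.25, 0.25, 0.25])

w = np.array([0.7, 0.3])           # distribution D over the two metrics

# E_D[phi(t, f)], computed metric-by-metric...
phi_prime = w[0] * (t @ P_phi1) + w[1] * (t @ P_phi2)

# ...equals t^T applied to the mixed distribution, so phi' is decomposable.
P_mixed = w[0] * P_phi1 + w[1] * P_phi2
assert np.isclose(phi_prime, t @ P_mixed)
```

The convexity of the weights matters: `P_mixed` stays a valid probability distribution only because `w` is nonnegative and sums to one, which is why non-convex combinations can break decomposability.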
4.2 Generalization of Bias
Our definition of decomposability allows us to redefine bias in terms of any decomposable metric, $\phi(T, F)$. We replace $\overline{P}_F$ with $P_{\phi,F}$ and obtain

$$\mathrm{Bias}_\phi(\mathcal{D}, \mathbf{t}) = \mathbb{E}_{\mathcal{D}}\left[\mathbf{t}^\top P_{\phi,F}\right] - \frac{k}{|\Omega|} \quad (4.6)$$
$$= \mathbf{t}^\top \mathbb{E}_{\mathcal{D}}\left[P_{\phi,F}\right] - \frac{\|\mathbf{t}\|^2}{|\Omega|} \quad (4.7)$$
$$= \mathbf{t}^\top \int_{\mathcal{F}} P_{\phi,f} \, \mathcal{D}(f)\,df - \frac{\|\mathbf{t}\|^2}{|\Omega|}. \quad (4.8)$$

Because $\phi(t, f)$ is decomposable, it is equal to $\mathbf{t}^\top P_{\phi,f}$. This makes results about the bias particularly interesting, since they relate directly to any probability-of-success metric we create, so long as the metric is decomposable.
5 RESULTS
Montañez proved a number of results and bounds on the success of machine learning algorithms relative to the expected per-query probability of success, along with its corresponding definition of bias (Montañez, 2017b; Montañez et al., 2019). We now generalize these to apply to any decomposable probability-of-success metric, with full proofs given in the Appendix (available online (Sam et al., 2020)).
5.1 No Free Lunch for Search
First, we prove a version of the No Free Lunch theorems for any decomposable probability-of-success metric within the search framework.
Theorem 1 (No Free Lunch for Search and Machine Learning). For any pair of search/learning algorithms $A_1$, $A_2$ operating on a discrete finite search space $\Omega$, any set of target sets $\tau$ closed under permutation, any set of information resources $B$, and any decomposable probability-of-success metric $\phi$,

$$\sum_{t \in \tau}\sum_{f \in B} \phi_{A_1}(t, f) = \sum_{t \in \tau}\sum_{f \in B} \phi_{A_2}(t, f). \quad (5.1)$$
This means that performance, in terms of our decomposable probability-of-success metric, is conserved in the sense that increased performance of one algorithm over another on some information resource-target pair comes at the cost of a loss in performance elsewhere.
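One way to glimpse the conservation, in a deliberately simplified single-distribution sketch (not the full proof), is a counting identity: for any distribution $P$ an algorithm might induce, summing $\mathbf{t}^\top P$ over the permutation-closed set of all size-k targets yields $\binom{|\Omega|-1}{k-1}$, a value independent of $P$ and hence of the algorithm.

```python
import numpy as np
from itertools import combinations
from math import comb

n, k = 5, 2
rng = np.random.default_rng(0)

def total_success(P):
    """Sum of t^T P over every size-k target (a permutation-closed set)."""
    total = 0.0
    for idx in combinations(range(n), k):
        t = np.zeros(n)
        t[list(idx)] = 1.0
        total += t @ P
    return total

# Two "algorithms", modeled here as two arbitrary distributions over Omega.
P1 = rng.dirichlet(np.ones(n))
P2 = rng.dirichlet(np.ones(n))

# Each element of Omega appears in C(n-1, k-1) targets, so both totals equal
# C(n-1, k-1) regardless of how P concentrates its mass.
assert np.isclose(total_success(P1), comb(n - 1, k - 1))
assert np.isclose(total_success(P2), comb(n - 1, k - 1))
```

Concentrating mass on a few elements helps on the targets containing them and hurts on all the others, and over a permutation-closed collection those effects cancel exactly.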
5.2 The Fraction of Favorable Targets
Montañez proved that for a fixed information resource, a given algorithm $A$ will perform favorably relative to uniform random sampling on only a few target sets, under the expected per-query probability of success (Montañez, 2017b). We generalize this result with a decomposable probability-of-success metric and define a version of active information of expectations for decomposable metrics, $I_{\phi(t,f)} := -\log_2 \frac{p}{\phi(t,f)}$. This transforms the ratio of success probabilities into bits, where $p = |t|/|\Omega|$ is the per-query probability of success for uniform random sampling with replacement. $I_{\phi(t,f)}$ denotes the advantage $A$ has over uniform random sampling with replacement, in bits.
Theorem 2 (The Fraction of Favorable Targets). Let $\tau = \{t \mid t \subseteq \Omega\}$ and $\tau_b = \{t \mid \emptyset \neq t \subseteq \Omega, I_{\phi(t,f)} \geq b\}$ for decomposable probability-of-success metric $\phi$. Then for $b \geq 3$,

$$\frac{|\tau_b|}{|\tau|} \leq 2^{-b}. \quad (5.2)$$
Thus, the scarcity of b-bit favorable targets still holds for any decomposable probability-of-success metric.
5.3 The Famine of Favorable Targets
Following up on the previous result, we can show a similar bound in terms of the success of a given algorithm, for targets of a fixed size.
Theorem 3 (The Famine of Favorable Targets). For fixed $k \in \mathbb{N}$, fixed information resource $f$, and decomposable probability-of-success metric $\phi$, define

$$\tau = \{T \mid T \subseteq \Omega, |T| = k\}, \text{ and}$$
$$\tau_{q_{\min}} = \{T \mid T \subseteq \Omega, |T| = k, \phi(T, f) \geq q_{\min}\}.$$

Then,

$$\frac{|\tau_{q_{\min}}|}{|\tau|} \leq \frac{p}{q_{\min}} \quad (5.3)$$

where $p = \frac{k}{|\Omega|}$.
Here, we compare success not against uniform sampling but against a fixed constant $q_{\min}$. This theorem thus upper bounds the proportion of targets for which the probability of success of the search is greater than $q_{\min}$.
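Theorem 3 has the flavor of a Markov inequality, and the intuition can be checked by brute force on a small hypothetical example: averaged over all size-k targets, any decomposed metric $\mathbf{t}^\top P_{\phi,f}$ equals exactly $p = k/|\Omega|$, since each element of $\Omega$ lands in the same fraction of targets, so at most a $p/q_{\min}$ fraction of targets can score $q_{\min}$ or better.

```python
import numpy as np
from itertools import combinations

n, k, q_min = 8, 2, 0.5
rng = np.random.default_rng(1)

# A hypothetical decomposed metric: some fixed distribution P_{phi,f} over Omega.
P_phi_f = rng.dirichlet(np.ones(n))

# Enumerate every size-k target and count those with phi(T, f) >= q_min.
targets = list(combinations(range(n), k))
favorable = sum(1 for idx in targets if P_phi_f[list(idx)].sum() >= q_min)

p = k / n
# The bound of Theorem 3 holds for any choice of P_{phi,f}.
assert favorable / len(targets) <= p / q_min
```

Rerunning with other seeds or other distributions never violates the bound; an adversarially peaked `P_phi_f` can approach it but not exceed it.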
5.4 Famine of Forte
We generalize the Famine of Forte (Montañez, 2017b), showing a bound that holds in the k-sparse case using any decomposable probability-of-success metric.
Theorem 4 (The Famine of Forte). Define

$$\tau_k = \{T \mid T \subseteq \Omega, |T| = k \in \mathbb{N}\}$$

and let $B_m$ denote any set of binary strings, such that the strings are of length $m$ or less. Let

$$R = \{(T, F) \mid T \in \tau_k, F \in B_m\}, \text{ and}$$
$$R_{q_{\min}} = \{(T, F) \mid T \in \tau_k, F \in B_m, \phi(T, F) \geq q_{\min}\},$$

where $\phi(T, F)$ is the decomposable probability-of-success metric for algorithm $A$ on problem $(\Omega, T, F)$. Then for any $m \in \mathbb{N}$,

$$\frac{|R_{q_{\min}}|}{|R|} \leq \frac{p}{q_{\min}}. \quad (5.4)$$
This demonstrates that for any decomposable metric there is an upper bound on the proportion of problems an algorithm is successful on. Here, we measure success as being above a certain threshold with respect to a decomposable metric, and the upper bound is inversely related to this threshold.
5.5 Learning Under Dependence
While the previous theorems highlight cases where an algorithm is unlikely to succeed, we now consider the conditions that make an algorithm likely to succeed. To begin, we consider how the target and information resource can influence an algorithm's success by generalizing the Learning Under Dependence theorem (Montañez, 2017a).
Theorem 5 (Learning Under Dependence). Define $\tau_k = \{T \mid T \subseteq \Omega, |T| = k \in \mathbb{N}\}$ and let $B_m$ denote any set of binary strings (information resources), such that the strings are of length $m$ or less. Define $q$ as the expected decomposable probability of success under the joint distribution on $T \in \tau_k$ and $F \in B_m$ for any fixed algorithm $A$, such that $q := \mathbb{E}_{T,F}[\phi(T, F)]$, namely,

$$q = \mathbb{E}_{T,F}\left[P_\phi(\omega \in T \mid F)\right] = \Pr(\omega \in T; A).$$

Then,

$$q \leq \frac{I(T; F) + D(P_T \,\|\, \mathcal{U}_T) + 1}{I_\Omega} \quad (5.5)$$

where $I_\Omega = -\log k/|\Omega|$, $D(P_T \,\|\, \mathcal{U}_T)$ is the Kullback-Leibler divergence between the marginal distribution on $T$ and the uniform distribution on $T$, and $I(T; F)$ is the mutual information. Alternatively, we can write

$$\Pr(\omega \in T; A) \leq \frac{H(\mathcal{U}_T) - H(T \mid F) + 1}{I_\Omega} \quad (5.6)$$

where $H(\mathcal{U}_T) = \log \binom{|\Omega|}{k}$.
The value of $q$ defined here represents the expected single-query probability of success of an algorithm relative to a randomly selected target and information resource, distributed according to some joint distribution. The probability of success for a single query (marginalized over information resources) is equivalent to the expectation of the conditional probability of success, conditioned on the random information resource. Upper bounding this value states that regardless of the choice of decomposable probability-of-success metric, the probability of success depends on the amount of information regarding the target contained within the information resource, as measured by the mutual information.
5.6 Famine of Favorable Information
Resources
We now demonstrate the effect of the general bias term defined earlier on the probability of success of an algorithm. We begin with a generalization of the Famine of Favorable Information Resources (Montañez et al., 2019).
Theorem 6 (Famine of Favorable Information Resources). Let $B$ be a finite set of information resources and let $t$ be an arbitrary fixed k-size target set with corresponding target function $\mathbf{t}$. Define

$$B_{q_{\min}} = \{f \mid f \in B, \, \phi(t, f) \geq q_{\min}\},$$

where $\phi(t, f)$ is an arbitrary decomposable probability-of-success metric for algorithm $A$ on search problem $(\Omega, t, f)$, and $q_{\min} \in (0, 1]$ represents the minimally acceptable probability of success. Then,

$$\frac{|B_{q_{\min}}|}{|B|} \leq \frac{p + \mathrm{Bias}_\phi(B, \mathbf{t})}{q_{\min}} \quad (5.7)$$

where $p = \frac{k}{|\Omega|}$.
.
This result demonstrates the mathematical effect
of bias, of which we have previously provided one
hypothetical example (car parking). In particular, we
can show that the bias of our expected information re-
sources towards the target will upper bound the prob-
ability of a given information resource leading to a
successful search.
5.7 Futility of Bias-Free Search
We can also use our definition of bias to generalize the Futility of Bias-Free Search (Montañez, 2017b), which demonstrates the inability of an algorithm to perform better than uniform random sampling without bias, defined with respect to the expected per-query probability of success. Our generalization proves that the theorem holds for bias defined with respect to any decomposable probability-of-success metric.
Theorem 7 (Futility of Bias-Free Search). For any fixed algorithm $A$, fixed target $t$ with corresponding target function $\mathbf{t}$, and distribution over information resources $\mathcal{D}$, if $\mathrm{Bias}_\phi(\mathcal{D}, \mathbf{t}) = 0$, then

$$\Pr(\omega \in t; A) = p \quad (5.8)$$

where $\Pr(\omega \in t; A)$ represents the expected decomposable probability of successfully sampling an element of $t$ using $A$, marginalized over information resources $F \sim \mathcal{D}$, and $p$ is the single-query probability of success under uniform random sampling.
This result demonstrates that, regardless of how
we measure the success of an algorithm with respect
to a decomposable metric, it cannot perform better
than uniform random sampling without bias.
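A toy construction (hypothetical distributions, not from the paper) makes the mechanism concrete: two information resources skewed in opposite directions can average out to exactly zero bias, at which point the marginal success probability collapses to the uniform baseline $p = k/|\Omega|$.

```python
import numpy as np

n, k = 4, 2
t = np.array([1.0, 1.0, 0.0, 0.0])

# Two hypothetical resources whose decomposed distributions are skewed in
# opposite directions; each is drawn with probability 1/2 under D.
P_f1 = np.array([0.4, 0.3, 0.2, 0.1])
P_f2 = np.array([0.1, 0.2, 0.3, 0.4])
D = [0.5, 0.5]

expected_P = D[0] * P_f1 + D[1] * P_f2
bias = t @ expected_P - k / n
assert np.isclose(bias, 0.0)              # the skews cancel: bias-free

# With zero bias, marginal success equals uniform sampling's p = k/n.
assert np.isclose(t @ expected_P, k / n)
```

Either resource alone would be favorable or unfavorable to the target; it is only in expectation over $\mathcal{D}$ that the advantage vanishes, which is precisely what the theorem addresses.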
5.8 Famine of Favorable Biasing
Distributions
Montañez proved that the percentage of minimally favorable distributions (biased over some threshold towards some specific target) is inversely proportional to the threshold value and directly proportional to the bias between the information resource and target function (Montañez, 2017b). We will show that this scarcity of favorable biasing distributions holds, in general, for bias under any decomposable probability-of-success metric.
Theorem 8 (Famine of Favorable Biasing Distributions). Given a fixed target function $\mathbf{t}$, a finite set of information resources $B$, a distribution over information resources $\mathcal{D}$, and a set $\mathcal{P} = \{\mathcal{D} \mid \mathcal{D} \in \mathbb{R}^{|B|}, \sum_{f \in B} \mathcal{D}(f) = 1\}$ of all discrete $|B|$-dimensional simplex vectors,

$$\frac{\mu(\mathcal{G}_{\mathbf{t}, q_{\min}})}{\mu(\mathcal{P})} \leq \frac{p + \mathrm{Bias}_\phi(B, \mathbf{t})}{q_{\min}} \quad (5.9)$$

where $\mathcal{G}_{\mathbf{t}, q_{\min}} = \{\mathcal{D} \mid \mathcal{D} \in \mathcal{P}, \mathrm{Bias}_\phi(\mathcal{D}, \mathbf{t}) \geq q_{\min}\}$, $p = \frac{k}{|\Omega|}$, and $\mu$ is the Lebesgue measure.
This result shows that the more bias there is between our set of information resources $B$ and the target function $\mathbf{t}$, the easier it is to find a minimally favorable distribution, and the higher the threshold for what qualifies as a minimally favorable distribution, the harder our search becomes. Thus, unless we want to suppose that we begin with a set of information resources already favorable towards our fixed target, finding a highly favorable distribution is difficult.
6 CONCLUSION
Casting machine learning problems as search provides a common formalism within which to prove bounds and impossibility results for a wide variety of learning algorithms and tasks. In this paper, we introduce a property of probability-of-success metrics called decomposability, and show that the expected per-query probability of success and the general probability of success are decomposable. To demonstrate the value of this property, we prove that a number of existing algorithmic search framework results continue to hold for all decomposable probability-of-success metrics. These results provide a number of useful insights: we show that algorithmic performance is conserved with respect to all decomposable probability-of-success metrics, that favorable targets are scarce no matter your decomposable probability-of-success metric, and that without the generalized bias defined here, an algorithm will not perform better than uniform random sampling.

The goal of this work is to offer additional machinery within the search framework, allowing for more general application. To that end, we can develop decomposable probability-of-success metrics for problems concerned with the state of an algorithm at specific steps, and leverage existing results as a foundation for additional insight into those problems. One application is transfer learning, where we use a decomposable probability-of-success metric that utilizes only the state of the algorithm at the last step to represent the information learned from a source problem.
ACKNOWLEDGEMENTS
This work was supported by the Walter Bradley Center for Natural and Artificial Intelligence. We thank Dr. Robert J. Marks II (Baylor University) for providing support and feedback. We also thank Harvey Mudd College's Department of Computer Science for their continued resources and support.
REFERENCES
Dembski, W. A., Ewert, W., and Marks II, R. J. (2013). A general theory of information cost incurred by successful search. In Biological Information: New Perspectives, pages 26-63. World Scientific.

Dembski, W. A. and Marks II, R. J. (2009). Conservation of information in search: measuring the cost of success. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 39(5):1051-1061.

Dembski, W. A. and Marks II, R. J. (2010). The search for a search: Measuring the information cost of higher level search. JACIII, 14(5):475-486.

Gülçehre, Ç. and Bengio, Y. (2016). Knowledge matters: Importance of prior information for optimization. The Journal of Machine Learning Research, 17(1):226-257.

Kumagai, W. and Kanamori, T. (2019). Risk bound of transfer learning using parametric feature mapping and its application to sparse coding. Machine Learning, 108:1975-2008.

Mitchell, T. M. (1980). The need for biases in learning generalizations. Technical report, Computer Science Department, Rutgers University, New Brunswick, NJ.

Mitchell, T. M. (1982). Generalization as Search. Artificial Intelligence, 18(2):203-226.

Montañez, G. D. (2017a). The Famine of Forte: Few Search Problems Greatly Favor Your Algorithm. In Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on, pages 477-482. IEEE.

Montañez, G. D. (2017b). Why Machine Learning Works. PhD thesis, Carnegie Mellon University.

Montañez, G. D., Hayase, J., Lauw, J., Macias, D., Trikha, A., and Vendemiatti, J. (2019). The Futility of Bias-Free Learning and Search. In 32nd Australasian Joint Conference on Artificial Intelligence, pages 277-288. Springer.

Sam, T., Williams, J., Tadesse, A., Sun, H., and Montañez, G. (2020). Decomposable probability-of-success metrics in algorithmic search. CoRR, abs/2001.00742.

Schaffer, C. (1994). A Conservation Law for Generalization Performance. Machine Learning Proceedings 1994, 1:259-265.

Sterkenburg, T. F. (2019). Putnam's Diagonal Argument and the Impossibility of a Universal Learning Machine. Erkenntnis, 84(3):633-656.

Valiant, L. (1984). A Theory of the Learnable. Communications of the ACM, 27:1134-1142.

Wolpert, D. H. and Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Trans. Evolutionary Computation, 1:67-82.