Nonparametric Bayesian Line Detection
Towards Proper Priors for Robotic Computer Vision
Anne C. van Rossum (1,3), Hai Xiang Lin (2,3), Johan Dubbeldam (2) and H. Jaap van den Herik (3)
(1) Almende B.V. and Distributed Organisms B.V. (DoBots), Rotterdam, The Netherlands
(2) Delft University of Technology, Delft, The Netherlands
(3) Leiden University, Leiden, The Netherlands
Keywords:
Bayesian Nonparametrics, Line Detection.
Abstract:
In computer vision there are many sophisticated methods to perform inference over multiple lines; however, they are quite ad hoc. In this paper a fully Bayesian approach is used to fit multiple lines to a point cloud simultaneously. Our model extends a linear Bayesian regression model to an infinite mixture model and uses a Dirichlet process as a prior for the partition. We perform Gibbs sampling over non-unique parameters as well as over clusters to fit lines of a fixed length, a variety of orientations, and a variable number of data points. The performance is measured using the Rand Index, the Adjusted Rand Index, and two other clustering performance indicators. This paper is mainly meant to demonstrate that general Bayesian methods can be used for line estimation: given a model and a noise definition, Bayesian methods perform optimal inference over the data. Moreover, rather than only demonstrating the concept as such, the first results are promising with respect to the described clustering performance indicators. Further research is required to extend the method to inference over multiple line segments and multiple volumetric objects, which will need to be built on the mathematical foundation laid down in this paper.
1 INTRODUCTION
In computer vision, and particularly in robotics, the task of line detection has traditionally been performed through sophisticated but ad-hoc methods. We will give two examples of such methods. RANSAC (Bolles and Fischler, 1981) is a method that iteratively tests a hypothesis. A line is fitted through a subset of points. Then other points that are in consensus with this line (according to a certain loss function) are added to the subset. This procedure is repeated until a certain performance level is obtained. The Hough transform (Hough, 1962) is a deterministic approach which maps points in the image space to curves in the so-called Hough space of slopes and intercepts. A line is extracted by finding the maximum in the Hough space.
There are four main problems with these methods. First, the extension of RANSAC or the Hough transform to the detection of multiple lines is nontrivial (Zhang and Košecká, 2007; Gallo et al., 2011; Chen et al., 2001). Second, the noise level is hardcoded into model parameters and it is not possible to incorporate knowledge about the nature of the noise. Third, it is hard to extend the model to hierarchical forms, for example, to lines that form more complicated structures such as squares or volumetric forms. Fourth, there are no results known with respect to any form of optimality of the mentioned algorithms.
Bayesian methods (Fienberg et al., 2006) are nowadays commonplace to solve ill-posed problems. A problem is defined by a likelihood function and by postulating a prior. Bayes' rule subsequently gives the unique, optimal way of combining the likelihood with the prior to obtain the posterior, from the viewpoint of information processing (Zellner, 1988). Note that optimality of the inference procedure does not say anything about the correctness of the likelihood function or the postulated prior.
The detection task of multiple lines might seem a rather straightforward problem, but a proper definition will be useful for many application domains. In robotics, depth sensors generate large point clouds of data that are difficult to process in raw form. Compression of this data into lines, planes, and volumetric objects (Kwon et al., 2004) is of paramount importance to accelerate the inference in, for example, simultaneous localization and mapping (Vasudevan
et al., 2007). A method that is able to infer multiple lines simultaneously can be extended to perform inference over multiple planes and objects. Moreover, the Bayesian approach allows for setting intriguing priors that, for example, introduce a prevalence for certain horizontal and vertical angles in man-made environments compared to more natural scenes (as seems to be the case for the number of unique objects (Sudderth and Jordan, 2009)).
In this paper we postulate a method to perform inference over the number of lines and over the assignment of points to those lines. To achieve this, we require methods from the field of Bayesian nonparametrics. Probabilistic and even Bayesian extensions to the Hough transform exist (Bonci et al., 2005; Dahyot, 2009), but until now researchers have not separated the model used to infer an individual line from the model used to infer the number of lines.
2 BAYESIAN NONPARAMETRICS
In machine learning there are many methods that require a predefined figure for the number of items to be recognized. The best-known example is the parameter "k" in k-means clustering, which fixes the number of clusters to search for. The Bayesian approach towards a multi-object estimation problem is to provide a prior on the number of clusters that allows this number to range (in theory) from one to infinity. A naive interpretation would require an integral over an infinite number of models. This can be prevented by performing inference over partitions of the data. The data will be finite for all practical applications.
Apart from a prior over the number of partitions,
there should also be a prior formulated with respect to
the distribution of points over these partitions. Note
that our line detection task is actually a partition problem. We are interested in which points belong to
which line and we want to know the parameters of
each line. However, we have no preferred index for
the lines themselves. They are neither ordered in a
specific manner, nor do they have labels. This prop-
erty of a partition is called exchangeability.
2.1 Dirichlet Process
The exchangeability property (de Finetti, 1992) is related to conditional independence by de Finetti's theorem. The theorem states that for exchangeable observations there is some hidden random variable that makes the observations conditionally independent (and gives them the same joint probability distribution). De Finetti's theorem only reveals the existence of this random variable, nothing more. In our case we will see that for infinitely exchangeable sequences the so-called Dirichlet process provides a set of such random variables.
The Dirichlet process is a distribution over function spaces in which these function spaces are probability measures in their own right. Suppose we have a parameter set Θ = {θ_0, ..., θ_N}, with θ_i corresponding to observation w_i (in our case an observation w_i consists of a tuple {x_i, y_i}); then we describe the Dirichlet process as follows:

$$G \sim \mathrm{DP}(\alpha, H) \quad (1)$$
This means that for every (finite measurable) partition {A_0, ..., A_k} of the parameter set Θ, the random distribution G is a Dirichlet process with base distribution H and concentration parameter α:

$$\{G(A_0), \ldots, G(A_k)\} \sim \mathrm{Dir}\big(\alpha H(A_0), \ldots, \alpha H(A_k)\big) \quad (2)$$
It is important to pay close attention to indices. The Dirichlet process samples from a continuous base distribution H. However, the samples themselves can be discrete in the sense that the parameter θ_j tied to observation j can be exactly the same as the parameter θ_k tied to observation k.
2.2 Dirichlet Mixture Model
The Dirichlet process can be used as a mixture model (Antoniak, 1974; Escobar and West, 1995; MacEachern and Müller, 1998) in which it generates (non-unique) parameters that subsequently generate observations:

$$
\begin{aligned}
G &\sim \mathrm{DP}(\alpha, H) \\
\theta_i \mid G &\sim G \\
w_i \mid \theta_i &\sim F(\theta_i)
\end{aligned} \quad (3)
$$

Here F describes the mapping from parameters θ_i to observations w_i. It is possible to integrate over G and sample the parameters directly from the base distribution H.
2.3 Gibbs Sampling of Parameters
Gibbs sampling requires the conditional probabilities
of all entities involved (Geman and Geman, 1984).
Gibbs sampling, just as other Markov chain Monte Carlo methods, generates a sequence of correlated samples. Subsequently, if necessary, the Maximum A Posteriori estimate of a value can be found by picking the mode (the most commonly occurring value) of a parameter.
Figure 1: The Bayesian linear regression model for multiple lines in plate notation (Buntine, 1994); a nice name might be the Infinite Line Model. The Dirichlet process is defined at the left with concentration parameter α. It generates the partitions (π_1, ..., π_k) with assignment parameters z_i that denote which observation i belongs to which cluster k. Each cluster is summarized through the parameter set θ_k and has λ_0 as its hyperparameter.
The derivation of the conditional probabilities of parameters with respect to the remaining parameters has been described in the literature (Neal, 2000). Such a derivation uses an important property of the Dirichlet process, namely that it is the conjugate prior of the multinomial distribution. Thanks to conjugacy the following equations have closed-form descriptions. The conditional probabilities are sampled from the base distribution G_0 and the other parameters θ_i in the following way:

$$\theta_{n+1} \mid \theta_1, \ldots, \theta_n \sim \frac{1}{\alpha + n}\left(\alpha G_0 + \sum_{i=1}^{n} \delta_{\theta_i}\right) \quad (4)$$
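To make the urn view of Eq. 4 concrete, the following Python sketch draws a sequence of (non-unique) parameters from this predictive; the helper `sample_base()` is a hypothetical stand-in for a draw from G_0 (for the line model it would be a draw from the NIG prior introduced in Sect. 3).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_base():
    # Hypothetical stand-in for a draw from the base distribution G_0;
    # for the line model this would be a draw from the NIG prior.
    return rng.standard_normal(2)

def polya_urn_draw(existing_thetas, alpha):
    """Draw theta_{n+1} | theta_1..theta_n according to Eq. 4:
    with probability alpha/(alpha + n) sample a fresh value from G_0,
    otherwise reuse one of the n existing parameters uniformly."""
    n = len(existing_thetas)
    if rng.uniform() < alpha / (alpha + n):
        return sample_base()                    # new (unique) parameter
    return existing_thetas[rng.integers(n)]     # copy an existing parameter

# Example: generate 10 (non-unique) parameters from the urn scheme.
thetas = [sample_base()]
for _ in range(9):
    thetas.append(polya_urn_draw(thetas, alpha=1.0))
```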
If we include the observations themselves, we need to include the likelihood as well:

$$\theta_i \mid \theta_{-i}, w_i \sim C\left(\sum_{j \neq i} F(w_i, \theta_j)\,\delta_{\theta_j} + \alpha H_i \int F(w_i, \theta)\, dH(\theta)\right) \quad (5)$$

The constant C is a normalization factor to make the above a proper probability density (summing to one). The entity H_i is the posterior density of θ given H as prior and w_i as observation. The notation θ_{-i} describes the set of all parameters Θ with θ_i excluded. The integral over dH(θ) is a Lebesgue-Stieltjes integral that weighs the contribution of F(w_i, θ) with the base distribution H(θ).
Equation 5 can be used to perform inference directly with all (non-unique) parameters θ_i tied to observations w_i. Details on inference will be provided in Sect. 3.
2.4 Gibbs Sampling of Clusters
It is also possible to iterate only over the clusters. The derivation takes a few steps (Neal, 2000) but leads to a simple update for the component indices that only depends on the number of data items per cluster, the parameter α, and the data at hand.
The probability to sample an existing cluster depends on the number of items in that cluster (excluding the data item at hand). This is expressed in Equation 6.
$$p(c_i = c_j = c \text{ for some } j \neq i \mid c_{-i}, w_i, \alpha, \theta) \propto \frac{n_{-i,c}}{\alpha + n - 1}\, F(w_i \mid \theta_c) \quad (6)$$
The probability to sample a new cluster only depends on α and the total number of data items. This is described in Equation 7.

$$p(c_i \notin (c) \mid c_{-i}, w_i, \alpha) \propto \frac{\alpha}{\alpha + n - 1} \int F(w_i \mid \theta)\, dH(\theta) \quad (7)$$

Here (c) denotes the set of all currently admitted values for c_i.
The importance of conjugacy is obvious from Eq. 7: it leads to an analytic form of the integral. The inference method using Equations 6 and 7 is described in Section 3.
3 MODEL
The proposed model extends Bayesian linear regression to multiple lines using a Dirichlet process as a prior for the partitioning of points over lines and for the number of lines overall. We will name this model the "Infinite Line Mixture Model". This name follows the naming convention for other models in the nonparametric Bayesian literature (Rasmussen, 1999; Ghahramani and Griffiths, 2005; Gael et al., 2009). In particular, "infinite" means that the number of lines to be inferred is in principle unbounded (see Figure 1).
3.1 Bayesian Linear Regression Model
Let us first reiterate the Bayesian linear regression model for a single line (Box and Tiao, 2011). A line is assumed to have Gaussian noise. For an individual point i we can write this as a Normal distribution:

$$y_i \sim \mathcal{N}(x_i \beta, \sigma^2) \quad (8)$$

The coordinate (column) vector β maps the (row) vector with independent variables x_i to the dependent variable y_i. The noise is normally distributed with standard deviation σ along the dimension of the dependent variable.
Algorithm 1: Gibbs sampling over parameters θ_i.
1: procedure GIBBSALGORITHM1(w, λ_0, α)  ▷ Accepts points w, hyperparameters λ_0, α; returns k line coordinates
2:   for all t = 1 : T do
3:     for all i = 1 : N do
4:       for all j = 1 : N, j ≠ i do
5:         L_j = likelihood(w_i, θ_j)  ▷ Update likelihood for all θ_j (except θ_i) given observation w_i
6:       end for
7:       P_i = post_pred(w_i, λ_0)  ▷ Posterior predictive of w_i given the hyperparameters
8:       p(new) = αP_i / (αP_i + Σ_{j≠i} L_j)  ▷ Sample new or old?
9:       if p(new) then  ▷ Informal notation for sampling with probability p(new)
10:        λ_temp = update(w_i, λ_0)  ▷ Update sufficient statistics with observation w_i
11:        θ_i ∼ NIG(λ_temp)  ▷ Sample θ_i from the NIG
12:      else
13:        θ_i sampled from existing clusters  ▷ Sample an old cluster
14:      end if
15:    end for
16:  end for
17:  return summary on θ_k for k lines
18: end procedure
In a computer vision task with images, x_i = [1, x_value] and y_i is the y value. The x-coordinate is augmented with a constant 1 so that β_0 captures the intercept.
All observations that belong to the same single line lead to a likelihood function that corresponds to a normally distributed random variable with y and X as parameters:

$$p(y \mid X, \beta, \sigma^2) \propto \sigma^{-n} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)^T (y - X\beta)\right) \quad (9)$$

The dependent variable is now a column vector of values y and each observation has a row of independent variables in X. The coordinate vector β and the standard deviation σ are shared across all observations.
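As an illustration of Eq. 9, the following sketch stacks a handful of 2D points into the design matrix X (with the intercept column described above) and evaluates the log-likelihood of a candidate line. The function names and the example points are ours, not part of a reference implementation.

```python
import numpy as np

def design_matrix(points):
    """Stack observations w_i = (x_i, y_i) into X (with intercept column) and y."""
    pts = np.asarray(points, dtype=float)
    X = np.column_stack([np.ones(len(pts)), pts[:, 0]])   # x_i = [1, x_value]
    y = pts[:, 1]
    return X, y

def line_loglik(points, beta, sigma2):
    """Log-density of Eq. 9: Gaussian residuals of y around X beta with variance sigma2."""
    X, y = design_matrix(points)
    r = y - X @ beta
    n = len(y)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - 0.5 * (r @ r) / sigma2

# Example: points roughly on y = 1 + 2x.
pts = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2)]
print(line_loglik(pts, beta=np.array([1.0, 2.0]), sigma2=0.1))
```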
3.2 Conjugate Prior for the Bayesian Linear Regression Model

The conjugate prior takes a form similar to Eq. 9 and can be composed of a separate prior for the variance, p(σ²), and the conditional probability of the line coefficients given the variance, p(β | σ²):

$$p(\sigma^2, \beta) = p(\sigma^2)\, p(\beta \mid \sigma^2) \quad (10)$$
The variance σ² is given an Inverse-Gamma (IG) prior:

$$p(\sigma^2) \propto (\sigma^2)^{-(\nu_0/2 + 1)} \exp\left(-\frac{1}{2\sigma^2}\nu_0 s_0^2\right) \quad (11)$$

This is an IG(a, b) with a = ν_0/2 and b = ν_0 s_0^2 / 2.
The conditional prior on the line coefficients is a Normal distribution:

$$p(\beta \mid \sigma^2) \propto \sigma^{-d} \exp\left(-\frac{1}{2\sigma^2}(\beta - \mu_0)^T \Lambda_0 (\beta - \mu_0)\right) \quad (12)$$

with d the dimension of β.
3.2.1 Sufficient Statistics
Because the prior is conjugate, we have a simple closed-form description for updating the parameters at once, given a set of observations. The sufficient statistics are updated (Minka, 2000) according to:

$$
\begin{aligned}
\Lambda_n &= X^T X + \Lambda_0 \\
\mu_n &= \Lambda_n^{-1}(\Lambda_0 \mu_0 + X^T y) \\
a_n &= a_0 + n/2 \\
b_n &= b_0 + \tfrac{1}{2}\left(y^T y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n\right)
\end{aligned} \quad (13)
$$
Naturally, removing observations leads to similar updates for the sufficient statistics:

$$
\begin{aligned}
\Lambda_0 &= \Lambda_n - X^T X \\
\mu_0 &= \Lambda_0^{-1}(\Lambda_n \mu_n - X^T y) \\
a_0 &= a_n - n/2 \\
b_0 &= b_n - \tfrac{1}{2}\left(y^T y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n\right)
\end{aligned} \quad (14)
$$
Later on we will use the term “downdate” to refer
to this adjustment by removing observations.
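A minimal numpy sketch of the update and downdate rules in Eqs. 13 and 14, assuming the hyperparameters are carried around as a tuple (Λ, µ, a, b); the function names are ours.

```python
import numpy as np

def nig_update(stats, X, y):
    """Add observations (X, y) to the NIG sufficient statistics (Eq. 13)."""
    L0, m0, a0, b0 = stats
    Ln = L0 + X.T @ X
    mn = np.linalg.solve(Ln, L0 @ m0 + X.T @ y)
    an = a0 + len(y) / 2.0
    bn = b0 + 0.5 * (y @ y + m0 @ L0 @ m0 - mn @ Ln @ mn)
    return Ln, mn, an, bn

def nig_downdate(stats, X, y):
    """Remove observations (X, y) again (Eq. 14, the inverse of nig_update)."""
    Ln, mn, an, bn = stats
    L0 = Ln - X.T @ X
    m0 = np.linalg.solve(L0, Ln @ mn - X.T @ y)
    a0 = an - len(y) / 2.0
    b0 = bn - 0.5 * (y @ y + m0 @ L0 @ m0 - mn @ Ln @ mn)
    return L0, m0, a0, b0

# Example: updating with two observations and removing them again
# recovers the prior up to round-off.
prior = (np.eye(2), np.zeros(2), 1.0, 1.0)
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([1.1, 2.9])
assert np.allclose(nig_downdate(nig_update(prior, X, y), X, y)[1], prior[1])
```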
3.2.2 Posterior Predictive
The posterior predictive of the Normal-Inverse-Gamma (NIG) describes the probability of a new observation y* given all previous observations y, which can be computed directly from the sufficient statistics:

$$
\begin{aligned}
p(y^* \mid y) &= \int p(y^* \mid \beta, \sigma^2)\, p(\beta, \sigma^2 \mid y)\, d\beta\, d\sigma^2 \\
&= \mathrm{Student}_{2a_n}\!\left(X\mu_n,\ \frac{b_n}{a_n}\left(I + X\Lambda_n^{-1}X^T\right)\right)
\end{aligned} \quad (15)
$$
The Student-t distribution is of the multivariate type:

$$\mathrm{Student}_\nu(\beta; \mu, \Sigma) = \frac{\Gamma\!\left(\tfrac{1}{2}(\nu + d)\right)}{\Gamma(\nu/2)\,\pi^{d/2}\,|\nu\Sigma|^{1/2}} \left[1 + \frac{1}{\nu}(\beta - \mu)^T \Sigma^{-1} (\beta - \mu)\right]^{-\frac{1}{2}(\nu + d)} \quad (16)$$

Here d is the dimension of β and µ.
For completeness' sake, the log of the Student-t density amounts to:

$$\log p = \log\Gamma\!\left(\tfrac{1}{2}(\nu + d)\right) - \log\Gamma(\nu/2) - \tfrac{d}{2}\log\pi - \tfrac{1}{2}\log\det(\nu\Sigma) - \tfrac{1}{2}(\nu + d)\log\!\left[1 + \tfrac{1}{\nu}(\beta - \mu)^T \Sigma^{-1}(\beta - \mu)\right] \quad (17)$$
Note that in our case Σ is not a matrix, but a scalar. Also observe that we consistently collect the independent and dependent variables (x_i, y_i) of a single observation into one random variable w_i.
3.2.3 Sample from the NIG Distribution
To sample from a Normal-Inverse-Gamma distribution, we first sample the precision from a Gamma distribution with a and b as hyperparameters:

$$\tau \sim \mathrm{G}(a, b) \quad (18)$$

Then σ = τ^{-1/2}. The line coefficients are subsequently sampled from a Normal distribution:

$$\beta \sim \mathcal{N}(\mu_0, \sigma^2 \Lambda_0^{-1}) \quad (19)$$
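Below is a sketch of the posterior predictive (Eqs. 15 and 17, specialized to a single new observation so that Σ is a scalar, as noted above) and of the NIG sampler (Eqs. 18 and 19). The (Λ, µ, a, b) tuple convention and the function names are again our own.

```python
import numpy as np
from scipy.special import gammaln

def log_posterior_predictive(x_row, y_new, stats):
    """Log of Eq. 15 for a single new observation, using the Student-t
    log-density of Eq. 17 with scalar Sigma."""
    Ln, mn, an, bn = stats
    nu = 2.0 * an
    mean = x_row @ mn
    scale = (bn / an) * (1.0 + x_row @ np.linalg.solve(Ln, x_row))  # scalar Sigma
    z = (y_new - mean) ** 2 / (nu * scale)
    return (gammaln(0.5 * (nu + 1.0)) - gammaln(0.5 * nu)
            - 0.5 * np.log(np.pi) - 0.5 * np.log(nu * scale)
            - 0.5 * (nu + 1.0) * np.log1p(z))

def sample_nig(stats, rng):
    """Draw (beta, sigma2) from the NIG (Eqs. 18-19): tau ~ Gamma(a, rate=b),
    sigma2 = 1/tau, beta ~ N(mu, sigma2 * Lambda^{-1})."""
    Ln, mn, an, bn = stats
    tau = rng.gamma(shape=an, scale=1.0 / bn)
    sigma2 = 1.0 / tau
    beta = rng.multivariate_normal(mn, sigma2 * np.linalg.inv(Ln))
    return beta, sigma2
```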
3.3 Extension to Multiple Lines
The extension of Bayesian linear regression can be visualized (Fig. 1) through plate notation (Buntine, 1994). There is a Bayesian line regression in parallel for each of k lines (with k in theory up to infinity). The joint density of the full model is:

$$p(\pi \mid \alpha) \prod_{i=1}^{N} p(z_i \mid \pi)\, p(w_i \mid \theta_{z_i}) \prod_{k=1}^{K} p(\theta_k \mid \lambda_0) \quad (20)$$
In the plate model it can be seen that the cluster proportions π are not integrated out. The Dirichlet process generates a partition π. The partition consists of indices z_0, ..., z_N that link the observations w_0, ..., w_N with the parameters θ_0, ..., θ_K. The probability p(w_i | θ_k) corresponds to the likelihood of Equations 8 and 9, with w_i the tuple of x_i and y_i and θ_k the line parameters σ_k² and β_k. The probability p(θ_k | λ_0) corresponds to the prior of Equation 10. The parameters θ_k (that is, σ_k² and β_k) are generated from the hyperparameters λ_0. The hyperparameters λ_0 = {µ_0, Λ_0, a, b} are the parameters of the Normal-Inverse-Gamma prior.
3.4 Gibbs Sampling Parameters
We now consider the Gibbs sampling of the parameters, by which we mean the sampling of all parameters tied to the observations (not just the unique ones tied to each cluster). The individual steps are described in detail in Algorithm 1. This Gibbs sampler is known as Algorithm 1 in (Neal, 2000).
We perform a loop in which, for T iterations, each θ_i belonging to observation w_i is updated in sequence. First, the likelihood L_j for every other θ_j given w_i is calculated. Second, the posterior predictive for w_i given the hyperparameters, p(w_i | λ_0), is calculated. The fraction with the Dirichlet process concentration parameter α subsequently decides whether θ_i will be sampled from a new cluster or from one of the existing clusters. If a new cluster is sampled, the sufficient statistics are updated with the information in w_i and thereafter θ_i is sampled from a Normal-Inverse-Gamma distribution with the updated hyperparameters.
3.5 Gibbs Sampling Clusters
Directly sampling over the clusters is known as Algorithm 2 in (Neal, 2000).
Rather than updating each θ_i per observation w_i, an entire cluster θ_k is updated. In Algorithm 1 the update of a cluster would require a first observation to generate a new cluster at θ_j and then moving all observations of the old cluster θ_i to θ_j.
Algorithm 2 follows the same procedure in excluding w_i from the calculation of the likelihood. This requires the previously mentioned "downdate" of the corresponding sufficient statistics. In Algorithm 2, after all observations have been iterated over and assigned to a cluster k, an outer loop iterates over all clusters to obtain new parameters θ_k from the NIG with each cluster's updated sufficient statistics.
Algorithm 2: Gibbs sampling over clusters c_k.
1: procedure GIBBSALGORITHM2(w, λ_0, α)  ▷ Accepts points w and hyperparameters λ_0 and α; returns k line coordinates
2:   for all t = 1 : T do
3:     for all i = 1 : N do
4:       c = cluster(w_i)  ▷ Get cluster c currently assigned to observation w_i
5:       λ_c = downdate(w_i, λ_c)  ▷ Adjust sufficient statistics for cluster c by removing observation w_i
6:       m_c = m_c − 1  ▷ Adjust cluster size m_c (removing observation w_i reduces it by one)
7:       for all k = 1 : K do
8:         L_k = m_k · likelihood(w_i, θ_k)  ▷ Update likelihood for cluster k given observation w_i
9:       end for
10:      P_i = post_pred(w_i, λ_0)  ▷ Posterior predictive of w_i given the hyperparameters
11:      p(new) = αP_i / (αP_i + Σ_k L_k)  ▷ Sample new or old?
12:      if p(new) then
13:        λ_k = update(w_i, λ_0)  ▷ Update sufficient statistics with observation w_i
14:        θ_i ∼ NIG(λ_k)  ▷ Sample θ_i from the NIG
15:      else
16:        k sampled from existing clusters
17:        λ_k = update(w_i, λ_k)  ▷ Restore sufficient statistics with observation w_i
18:      end if
19:      m_k = m_k + 1  ▷ Increment cluster size m_k
20:    end for
21:    for all k = 1 : K do
22:      θ_k ∼ NIG(λ_k)  ▷ Sample θ_k from the NIG
23:    end for
24:  end for
25:  return summary on θ_k for k lines
26: end procedure
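The core of Algorithm 2 is the choice between an existing cluster and a new one (lines 8-16). A hedged sketch of that single step is given below, assuming the per-cluster likelihoods F(w_i | θ_k), the cluster sizes m_k, and the posterior predictive under the prior hyperparameters have already been computed (for example with the helpers sketched earlier); in a real implementation one would stay in log space with a log-sum-exp for numerical stability.

```python
import numpy as np

def sample_assignment(log_liks, counts, log_post_pred, alpha, rng):
    """One draw of the cluster index for observation w_i:
    an existing cluster k is proposed with weight m_k * F(w_i | theta_k),
    a new cluster with weight alpha * p(w_i | lambda_0).
    Returns K (meaning 'open a new cluster') or an existing cluster index."""
    counts = np.asarray(counts, dtype=float)
    K = len(counts)
    weights = np.empty(K + 1)
    weights[:K] = counts * np.exp(log_liks)      # m_k * F(w_i | theta_k)
    weights[K] = alpha * np.exp(log_post_pred)   # alpha * posterior predictive
    return rng.choice(K + 1, p=weights / weights.sum())

# Example with two existing clusters of sizes 5 and 3:
rng = np.random.default_rng(2)
k = sample_assignment(log_liks=np.array([-3.0, -10.0]), counts=[5, 3],
                      log_post_pred=-6.0, alpha=1.0, rng=rng)
```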
4 RESULTS
The Infinite Line Mixture Model (see Section 3) is able to fit an infinite number of lines through a point cloud in two dimensions. These lines are not line segments, but infinite lines. However, to test the model, a variable number of lines is generated of a length that is considerably larger than the spread caused by the standard deviation of points around each line.
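The paper does not spell out the exact generator, but a data set matching this description (lines of fixed length, noise much smaller than that length) can be produced with a sketch such as the following; all parameter ranges are assumptions of ours.

```python
import numpy as np

def generate_lines(n_lines, n_points, length, sigma, rng):
    """Generate a labelled 2D point cloud: each line has a random slope and
    intercept, points are spread uniformly over a segment of the given length
    and perturbed with Gaussian noise of standard deviation sigma in y."""
    points, labels = [], []
    for k in range(n_lines):
        intercept, slope = rng.uniform(-5, 5), rng.uniform(-2, 2)
        x0 = rng.uniform(-5, 5)
        x = x0 + rng.uniform(0, length, size=n_points)
        y = intercept + slope * x + rng.normal(0, sigma, size=n_points)
        points.append(np.column_stack([x, y]))
        labels.append(np.full(n_points, k))
    return np.vstack(points), np.concatenate(labels)

pts, z_true = generate_lines(n_lines=3, n_points=50, length=10.0,
                             sigma=0.1, rng=np.random.default_rng(1))
```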
As described before, Gibbs sampling leads to correlated samples. We choose to obtain the Maximum A Posteriori estimates for our clusters by picking the median values of all the parameters involved.
4.1 Clustering Performance
The results are measured using conventional metrics for clustering performance. For example, the Rand Index describes the accuracy of cluster assignments (Rand, 1971):

$$R = \frac{a + b}{a + b + c + d} \quad (21)$$

Here a counts the pairs of points that belong to the same cluster both in the ground truth and after the inference procedure. Likewise, b counts the pairs of points that belong to different clusters in both sets. The values c and d count the discrepancies between the ground truth and the results after inference. A Rand Index of one means that there have been no mistakes.
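A direct transcription of Eq. 21, counting agreements over all pairs of points, is shown below; for the Adjusted Rand Index an off-the-shelf implementation such as sklearn.metrics.adjusted_rand_score can be used.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Eq. 21 over all point pairs: a counts pairs together in both
    clusterings, b pairs apart in both; c and d are the disagreements."""
    a = b = c_plus_d = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif not same_true and not same_pred:
            b += 1
        else:
            c_plus_d += 1
    return (a + b) / (a + b + c_plus_d)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: cluster labels are exchangeable
```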
The clustering performance is separate from the line estimation performance. If the points are not properly assigned, the line will not be estimated correctly. Because line estimation is such a secondary effect, its performance is not measured separately. Moreover, for lines that generated only a single point, or very few points, we can extract point assignments, but line coefficients are impossible to derive. Accounting for this would require introducing a threshold for the number of points per cluster, and the performance would then need to be measured by weighting the fitting against the assignment.
The performance of Algorithm 1 can be seen in Fig. 2 and is rather disappointing. On average the inference procedure agrees with the ground truth in 75% of the cases according to the Rand Index. Moreover,
if we adjust for chance, as with the Adjusted Rand Index, the performance drops to only 25% correct!
Figure 2: The performance of Algorithm 1 with respect to clustering, measured using the Rand Index, the Adjusted Rand Index, the Mirkin metric, and the Hubert metric. A value of 1 means perfect clustering for all metrics, except the Mirkin metric, where 0 denotes perfect clustering.
Algorithm 2 leads to stellar performance measures
(Fig. 3). Apparently updating entire clusters at once
with respect to their parameter values leads at times to
perfect clustering, bringing the performance metrics
close to their optimal values.
Figure 3: The performance of Algorithm 2 with respect to clustering, measured using the Rand Index, the Adjusted Rand Index, the Mirkin metric, and the Hubert metric. A value of 1 means perfect clustering for all metrics, except the Mirkin metric, where 0 denotes perfect clustering.
The lack of performance of Algorithm 1 is not only caused by slower mixing (the time required to reach the stationary distribution). Even when allowed ten times the number of iterations of Algorithm 2, it still does not reach the same performance levels. A line seems to form local regions of high probability, making it difficult for individual points to postulate slightly changed line coordinates.
4.2 Some Examples
In the following we show a few examples to understand the inference process better. Figure 4 shows the assignment after a single Gibbs step in Algorithm 1. There is a single line that is represented by two clusters. Algorithm 1 does not have merge or split steps to regroup these clusters at once; it thus has to move each data point one by one. There are, by the way, split-merge algorithms that take such more sophisticated Gibbs steps into account (Jain and Neal, 2004).
Figure 4: One of the Gibbs steps in the inference of two particular lines. The points are more or less distributed according to the lines, but one line consists of two large clusters. The line coordinates are visualized by a double circle: the x-coordinate is the y-intercept of the line, the y-coordinate is the slope.
The example in Fig. 5 shows that a single outlying point is not a problem for our method. A single point might throw off Bayesian linear regression, but because there are multiple lines to be estimated in our Infinite Line Mixture Model, this single point is assigned its own line.
The extension to more outlier points would of course require us to postulate a distribution for these outlier points as well. A uniform distribution might, for example, be used in tandem with the proposed model. This, however, would lead to a non-conjugate model and hence to different inference methods.
Figure 5: The assignment of a line to a single point. There
are three clusters found, rather than only the obvious two.
5 CONCLUSIONS
The proposed Infinite Line Mixture Model extends the familiar Bayesian linear regression model to an infinite number of lines using a Dirichlet process as prior. The model is a fully Bayesian method to detect multiple lines. A fully Bayesian method, in contrast to ad-hoc methods such as the Hough transform or RANSAC, means optimal inference (Zellner, 1988) given the model and the noise definition.
The results in Section 4 show high values for different clustering performance metrics, such as the Rand Index, the Adjusted Rand Index, and other metrics. The Bayesian model is solved through two types of algorithms. Algorithm 1 iterates over all observations and suffers from slow mixing. The individual updates make it hard to reassign a large number of points at the same time. Algorithm 2 iterates over entire clusters. This allows updates for groups of points, leading to much faster mixing. Note that even optimal inference results in occasional misclassifications. The dataset is generated by a random process; hence, occasionally two lines are generated with almost the same slope and intercept. Points on these lines are impossible to assign to the proper line.
The essential contribution of this paper is the introduction of a fully Bayesian method to infer lines, and there are two ways in which the postulated model can be extended for full-fledged inference in computer vision as required in robotics. First, the extension from lines in 2D to planes in 3D. This is quite a trivial extension that does not change anything in the model except the dimension of the data points. Second, a prior needs to be incorporated to restrict the lines of infinite length to line segments. To restrict points to a uniform distribution over a line segment, a symmetric Pareto distribution can be used as prior (for the end points). This would subsequently allow for a hierarchical model in which these end points are in turn part of more complicated objects. Hence, the Infinite Line Mixture Model is an essential step towards the use of Bayesian methods (and thus properly formulated priors) for robotic computer vision.
REFERENCES
Antoniak, C. E. (1974). Mixtures of Dirichlet processes
with applications to Bayesian nonparametric prob-
lems. The annals of statistics, pages 1152–1174.
Bolles, R. C. and Fischler, M. A. (1981). A RANSAC-based
approach to model fitting and its application to finding
cylinders in range data. In IJCAI, volume 1981, pages
637–643.
Bonci, A., Leo, T., and Longhi, S. (2005). A bayesian ap-
proach to the hough transform for line detection. Sys-
tems, Man and Cybernetics, Part A: Systems and Hu-
mans, IEEE Transactions on, 35(6):945–955.
Box, G. E. and Tiao, G. C. (2011). Bayesian inference in
statistical analysis, volume 40. John Wiley & Sons.
Buntine, W. L. (1994). Operations for learning with graph-
ical models. JAIR, 2:159–225.
Chen, H., Meer, P., and Tyler, D. E. (2001). Robust regres-
sion for data with multiple structures. In Computer
Vision and Pattern Recognition, 2001. CVPR 2001.
Proceedings of the 2001 IEEE Computer Society Con-
ference on, volume 1, pages I–1069. IEEE.
Dahyot, R. (2009). Statistical hough transform. Pattern
Analysis and Machine Intelligence, IEEE Transac-
tions on, 31(8):1502–1509.
de Finetti, B. (1992). Foresight: Its logical laws, its sub-
jective sources. In Breakthroughs in statistics, pages
134–174. Springer.
Escobar, M. D. and West, M. (1995). Bayesian density es-
timation and inference using mixtures. Journal of the
american statistical association, 90(430):577–588.
Fienberg, S. E. et al. (2006). When did Bayesian inference
become “Bayesian”? Bayesian analysis, 1(1):1–40.
Gael, J. V., Teh, Y. W., and Ghahramani, Z. (2009). The
infinite factorial hidden markov model. In Advances in
Neural Information Processing Systems, pages 1697–
1704.
Gallo, O., Manduchi, R., and Rafii, A. (2011). CC-
RANSAC: Fitting planes in the presence of multiple
surfaces in range data. Pattern Recognition Letters,
32(3):403–410.
Geman, S. and Geman, D. (1984). Stochastic relaxation,
gibbs distributions, and the bayesian restoration of
images. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, (6):721–741.
Ghahramani, Z. and Griffiths, T. L. (2005). Infinite la-
tent feature models and the indian buffet process. In
Advances in neural information processing systems,
pages 475–482.
Hough, P. V. (1962). Method and means for recognizing
complex patterns. Technical report.
Jain, S. and Neal, R. M. (2004). A split-merge markov chain
monte carlo procedure for the dirichlet process mix-
ture model. Journal of Computational and Graphical
Statistics, 13(1).
Kwon, S.-W., Bosche, F., Kim, C., Haas, C. T., and Liapi,
K. A. (2004). Fitting range data to primitives for rapid
local 3D modeling using sparse range point clouds.
Automation in Construction, 13(1):67–81.
MacEachern, S. N. and Müller, P. (1998). Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics, 7(2):223–238.
Minka, T. (2000). Bayesian linear regression. Technical
report, Citeseer.
Neal, R. M. (2000). Markov chain sampling methods for
Dirichlet process mixture models. Journal of compu-
tational and graphical statistics, 9(2):249–265.
Rand, W. M. (1971). Objective criteria for the evaluation of
clustering methods. Journal of the American Statisti-
cal association, 66(336):846–850.
Rasmussen, C. E. (1999). The infinite gaussian mixture
model. In NIPS, volume 12, pages 554–560.
Sudderth, E. B. and Jordan, M. I. (2009). Shared segmen-
tation of natural scenes using dependent Pitman-Yor
processes. In Advances in Neural Information Pro-
cessing Systems, pages 1585–1592.
Vasudevan, S., Gächter, S., Nguyen, V., and Siegwart, R. (2007). Cognitive maps for mobile robots - an object based approach. Robotics and Autonomous Systems, 55(5):359–371.
Zellner, A. (1988). Optimal information processing
and Bayes’s theorem. The American Statistician,
42(4):278–280.
Zhang, W. and Košecká, J. (2007). Nonparametric estimation of multiple structures with outliers. In Dynamical Vision, pages 60–74. Springer.