A Convex Framework for High Resolution 3D Reconstruction
Min Li, Changyu Diao, Song Lv and Dongming Lu
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China
Keywords:
Structured-light, Photometric Stereo, Convex Optimization, 3D Reconstruction.
Abstract:
We present a convex framework to acquire high resolution surfaces. It is typical to couple a structure-light
setup and a photometric method to reconstruct a high resolution 3D surface. Previous methods often get
stuck in a local minima for the appearance of occasional outliers. To address this issue, we develop a convex
variational model by incorporating a total variation (TV) regularization term with a data term to generate
the surface. Through relaxing the model to an equivalent high dimensional variational problem, we obtain a
global minimizer of the proposed problem. Results on both synthetic and real-world data show an excellent
performance by utilizing our convex variational model.
1 INTRODUCTION
The highly detailed reconstruction of 3D shape has
been one of the classic topics in computer vision,
from computer graphics, to reverse rendering, and to
the digital preservation of cultural heritage materials.
It is a challenging task especially when we have to
reconstruct millions of 3D points. In our paper, we
intend to reconstruct high resolution detailed surfaces
via a convex framework, which make us avoid get-
ting stuck in local minima and obtain a high quality
surface.
The fundamental difficulty of highly detailed sur-
face reconstruction is that it is impossible to acquire
dense and accurate samples of a surface via only one
method. While methods of laser scanners or struc-
tured light often obtains accurate surfaces, it is dif-
ficult to perform highly detailed reconstructions due
to limitations of hardware. In contrast, surface recon-
struction from gradient fields is capable of obtaining
detailed surfaces, however, there are still some prob-
lems with them. Typically, it is possible to capture
high-resolution images via methods such as photo-
metric stereo or shape from shading, while the re-
sulting gradient filed of the above methods is usually
non-integrable due to gradient manipulations, pres-
ence of noises or outliers in gradient estimation. Con-
sequently, it is difficult to reconstruct an accurate sur-
face only through photometric stereo or shape from
shading on account of the lack of global information.
To overcome limitations of structured light and
photometric stereo, hybrid methods have been pro-
posed for a decade. Similar to (Lu et al., 2010),
We adopt an approach which combines structured
light and normal information estimated by photomet-
ric stereo. Rather than employing the classical `-2
methods such as the least square method, we present
a new convex framework to overcome the problem
of `-2 methods and maintain high-quality surface de-
tails, through which we avoid the influence of outliers
and improve the results substantially. Existing work
dealt with sample differences of only 4× resolution
between the detailed surface and the low-resolution
geometry (Nehab et al., 2005), while our resolution
differences are much more than that. A recent work
(Lu et al., 2010) presented a framework to deal with
the ultra-high-resolution 3D reconstruction, however,
the algorithm is sensitive to outliers and the estimate
is skewed by outliers for the reason that they have em-
ployed the least square method in a multi-resolution
surface reconstruction scheme.
In contrast with `-2 methods such as the least
quare method, global optimization strategies such as
convex optimization overcome the problem of local
minima by a global optimization. Classic computer
vision problems are usually defined on a discrete do-
main, keeping them away from convex properties.
Nonetheless, a recent work (Pock et al., 2008) demon-
strate that mulit-label problems such as stereo match-
ing and image restoration in computer vision areas
can utilize convex optimization by relaxing the orig-
inal problem from a discrete domain to a continuous
one.
Inspired by the idea of matching, we consider the
317
Li M., Diao C., Lv S. and Lu D..
A Convex Framework for High Resolution 3D Reconstruction.
DOI: 10.5220/0005306503170324
In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 317-324
ISBN: 978-989-758-091-8
Copyright
c
2015 SCITEPRESS (Science and Technology Publications, Lda.)
problem of fusing geometry information and normal
resolution as a multi-label problem. By incorporating
the TV regularization term, we develop a convex vari-
ational framework that takes advantage of both global
geometry information and local normal information.
Previous the least square method has regarded the low
resolution geometry as an initialization of the surface
reconstruction from gradient method. Instead, while
we observe that we are able to acquire gradients from
both high resolution normal information and low res-
olution geometry information, we propose a convex
framework that match gradient information between
the high resolution normal and the low resolution ge-
ometry, which turns into a variational problem. In or-
der to solve the variational problem, we employ the
level-sets method through which we lift the original
problem to a higher dimensional space, leading to that
our algorithm is more robust and effective than the ex-
isting ones.
The remainder of this paper is organized as fol-
lows: Section 2 discusses related works; Section 3
presents our main surface construction problem; Sec-
tion 4 presents our results. We give our conclusion in
Section 5.
2 RELATED WORK
A variety of approaches (Herbort and W
¨
ohler, 2011;
Scharstein and Szeliski, 2002; Seitz et al., 2006; Salvi
et al., 2010) have been developed for 3D reconstruc-
tion. We usually classify the 3D reconstruction meth-
ods as passive methods such as conventional stereo
(Bernardini et al., 2002) and shape from shading
(Horn and Brooks, 1989), and active methods such
as structured light (Salvi et al., 2010) and photomet-
ric stereo (Woodham, 1980). Moreover, hybrid meth-
ods, focusing on combining position information with
normal information are presented to acquire dense 3D
points, such as ((Nehab et al., 2005; Lu et al., 2010;
Banerjee et al., 1992; Bernardini et al., 2002; Lange,
1999; Aliaga and Xu, 2010; Wu et al., 2011; Birkbeck
et al., 2006)) or visual hull with normals (Hern
´
andez
et al., 2008), and so on.
Among the above approaches, a popular one was
presented by (Nehab et al., 2005) which fused posi-
tion information and normal information into a lin-
ear formulation. Although they have obtained bet-
ter results compared with the single reconstruction
approach, the position data and the normal informa-
tion have approximately the same resolution in their
work. As a result, it is impossible to apply their ap-
proach into the high resolution reconstruction prob-
lem directly. To overcome the asymmetry between
the high resolution normal information and low reso-
lution position information, (Lu et al., 2010) proposed
a multi-resolution pyramid framework and used the
least square method in every layer of the pyramid. We
already discussed that the least square method is sen-
sitive to outliers, leading to a local minima problem.
However, errors are also magnified during the propa-
gation from one layer to another in Lu’s method.
Opposite to the proposed methods, global opti-
mization such as graph cuts (Sinha and Pollefeys,
2005; Hornung and Kobbelt, 2006; Ladikos et al.,
2008; Higo et al., 2009; Yu et al., 2006; Vogiatzis
et al., 2007) or convex optimization (Kolev et al.,
2010; Pock et al., 2008) overcomes the limitation
of local minima problem. Here, only representative
examples are mentioned. We reconsider the hybrid
method as a multi-label problem, which in general
cannot be globally minimized. However, (Ishikawa,
2003) showed that one can compute the exact solu-
tion of the multi-label problem if the pairwise inter-
actions are convex in terms of a linearly ordered la-
bel set. Based on this, researchers shifted the discrete
multi-label problem to its continuous counterpart, the
variational approach (Yuan et al., 2010; Pock et al.,
2008). It is well known that if the energy functional
is convex and the minimization is carried out on a
convex set, the globally optimal solution can be com-
puted. Thus (Pock et al., 2008) shifted the original
variational model to a higher dimensional space and
developed a convex formulation.
There is great potential for all the above men-
tioned hybrid methods to implement global optimiza-
tion by adding a regularization term. Using the
setup which is similar to (Lu et al., 2010), we in-
tend to present a convex framework and adopt the
sub-pixel continuous formulation proposed by (Pock
et al., 2008), which makes use of continuous opti-
mization techniques.
3 SURFACE RECONSTRUCTION
WITH A CONVEX
FRAMEWORK
3.1 Conventional Surface
Reconstruction from a Gradient
Field
Estimating Surface Normals. According to conven-
tional photometric stereo (Woodham, 1980), given a
lambertian surface, we are able to estimate surface
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
318
Figure 1: Our setup. Our experimental setup consists of a
digital camera, four lights and a digital light projector. We
use the camera for both photometric stereo and structured-
light scan.
normals from the following equation:
I = ρL · n (1)
where n is the normal we want to compute, I is the im-
age intensity, ρ is the surface albedo which is a con-
stant here, and L is the light source direction which
can be calibrated by a mirrored sphere.
Surface from Gradient Field. Consider the equa-
tion of an object as Z = f (x,y) and R is the im-
age domain. x = (x,y)
T
is the pixel coordinate.
Let (p,q) denote the observed gradient field over this
surface. Then we can easily get n = (
Z
x
,
Z
y
,1) =
(p,q,1). We estimate p and q by photometric
stereo. A common approach to obtain surface Z is to
minimize the LS function by (Horn, 1990; Agrawal
et al., 2006). Then we get:
D(p,q, Z) =
Z
((
Z
x
p)
2
+ (
Z
y
q)
2
)dx (2)
The associated Euler-Lagrange differential equation
of (2) is the poisson equation
2
Z = div(p,q), where
div is the divergence operator. This is the well-known
poisson solver method. However, we can discretize
the problem as a least problem and solve it using
Gauss-Seidel iteration (Lu et al., 2010).
3.2 Low Geometry Constraints with a
Convex Framework
In practice, the gradient field obtained from photo-
metric stereo is rarely integrable due to the inherent
noise in the estimation process, manipulation of gra-
dient fields or ambiguities in the solution. Conse-
quently, many surface from gradient algorithms do
not reconstruct the surface geometry accurately. As
Figure 2: Representation of u. Red dots represent locations
of each point on the computed by the surface-from-gradient
algorithm (left red dot) and structured-light (right one).
discussed before, we can overcome this by incorporat-
ing positional information in the reconstruction pro-
cess. Specifically, we estimate the orientation consis-
tency between the gradient field acquired from photo-
metric stereo and the real orientation.
Typically, we are able to acquire accurate surface
orientation via photometric stereo as well as accurate
position information through triangular methods such
as stereo (Bernardini et al., 2002) and structured light
(Salvi et al., 2010). As has mentioned in section 3.1,
we consider the surface as Z = f (x,y) while (p, q)
denote the observed gradient filed over the surface.
Inspired by other computer vision problems such
as stereo estimation and image segmentation, which
are usually treated as labeling problems, we convert
our fusing problem into a labeling problem, which in-
cludes The regularization term R (.) and the data term
D(.). Our goal is to minimize the following energy
functional:
min
u
{R (u)+ D(u)} (3)
Specifically, for our problem, we utilize the offset u
in the z axis position, with which the depth of each
pixel on the image domain shifts up and down in or-
der to match the related gradients (Fig.2). We utilize
the TV regularization term as the regularization term
in order to obtain smooth results while we update (2)
as the Data term D(p, q, u). Then our low resolution
constraint is modeled as the following variational es-
timation:
min
u
E(u) =
Z
|
u(x)
|
dx + λ
Z
ρ(x,u)dx (4)
where the right term is D(u) to measure the orienta-
tion consistency and defined as:
ρ(x,u) =
(Z + u)
x
d(p)
+
(Z + u)
y
d(q)
(5)
In (4) and (5), due to the prior of low resolution
geometry Z(x, y), we take the depth to shift along z
AConvexFrameworkforHighResolution3DReconstruction
319
(a) image of object (b) surface normals (c) 3D reconstruction (Our) (d) 3D reconstruction (LS)
Figure 3: 3D reconstruction of the venus figure. The reconstructions show our method is more robust to outliers than the LS
method.
coordinate up and down to match the orientation ob-
tained from photometric stereo. u(x) denote the off-
sets of the depth, which is somewhat similar to the
disparity in stereo. d(·) is the downsample operator to
match the low geometry and the high resolution and
| · | is the `-1 norm.
Generally, the variational model (4) is not convex
due to the non-convex of the data term D(u). How-
ever, we can develop a convex formulation via lift
variational model (4) to a higher dimensional space
by representing u in terms of its level sets, which al-
lows us to compute the exact solution of the original
non-convex problem.
We utilize the functional lifting method of (Pock
et al., 2008). For simplify, We employ the expres-
sions of the paper in (Pock et al., 2008), where l
u>γ
is
the indicator for the γ-super-levels of u and φ(x, γ) =
l
u>γ
(x) denotes the binary function to resemble the
graph of u. The variational model (4) is equivalent to
the following high dimensional variational problem:
min
φD
{|∇φ(x,γ)| + ρ(x,γ)|
γ
φ(x,γ)|dΣ} (6)
In (6), D is the relaxed feasible set of φ from binary
interval{0,1} to [0,1] as:
D = {φ : Σ [0,1]|φ(x,γ
min
) = 1,
φ(x,γ
max
) = 0}
(7)
Consequently, (6) is convex in φ and minimization is
carried over D, which is convex, the overall problem
is convex. (Pock et al., 2008)
Then we solve the associated Euler-Lagrange
function of (4), to find its global minimizer, which
we compute as:
div
∇φ
|
∇φ
|
γ
ρ
γ
φ
γ
φ
!
= 0,s.t. φ D (8)
To avoid that denominators of (6) become zeros,
we replace them with the robust function:
f (|s|) =
q
s
2
+ ξ
2
(9)
where ξ is a small constant.
3.3 Discretization
Considering a three-dimensional regular cartesian
grid in our numerical implementation, we get:
D
s
=
n
(i · x, j · y,k · ∆γ)
0 i < M,0 j < N, 0 k < O
o
(10)
where M × N denotes the grids of image domain and
O is the range of depth value. We utilize standard
forward differences to approximate the gradient oper-
ator, which is:
(
3
φ)
i, j,k
=
φ
i+1, j,k
φ
i, j,k
x
,
φ
i, j+1,k
φ
i, j,k
y
,
φ
i, j,k+1
φ
i, j,k
∆γ
T
(11)
and the divergence operator is:
(div
3
(
3
φ))
i, j,k
=
(
3
φ)
1
i, j,k+1
(
3
φ)
1
i, j,k
x
+
(
3
φ)
2
i, j,k+1
(
3
φ)
2
i, j,k
y
+
(
3
φ)
3
i, j,k+1
(
3
φ)
3
i, j,k
∆γ
(12)
and we use center differences to approximate the par-
tial derivative as the following:
γ
φ =
φ
i, j,k+1/2
φ
i, j,k1/2
∆γ
(13)
where x, y denote the width of spatial discretization
and ∆γ denotes the height of the depth discretization.
4 RESULTS
We estimate the high-resolution surface normals
through photometric stereo images captured by four
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
320
(a) image of object (b) surface normals (c) 3D model (Our) (d) 3D model (LS)
Figure 5: 3D reconstruction of the vase figure.
(a) zoom1 (Our) (b) zoom1 (LS) (c) Zoom2 (Our) (d) Zoom2 (LS)
Figure 6: Details of vase.
(a) Zoom (Our) (b) Zoom (LS)
Figure 4: “Nose” details of venus. The zoom details show
our method acquires better results than the LS method.
LED light sources. We estimate the low-resolution
geometry using a typical structured-light system,
which includes a Benq MP624 projector and a M5D
camera (Fig.1). This section shows results of both
synthetic data-set and data captured by our sys-
tem including a 5D camera and a Benq projector
(1024×768).
In order to address the difference between the
surface normals and the geometry, we also employ
the multi-resolution pyramid approach. However, we
handle the initial depth manually in a certain degree
to ensure the position of every point stands not so
far away from its accurate position. We find that re-
sults of least square (LS) method are nosier than our
method since the LS method is much more sensitive
to outliers than ours. After the simple deposing step,
we confine the offset u of z axis value to a particu-
lar range instead of shifting the unknown values from
zero. Besides, we discretize depth value with a sub-
pixel interval so as to acquire the final reconstructed
surface as accurate as possible.
Fig.3 shows a 3D reconstruction of the “venus”
statue which is approximately 30cm wide and Fig.5
shows a china of about 35cm high, both captured by
our system. The initial model contains tens of thou-
sands points and the final results is more than twenty
times than that. All of these results shows that our
method is much less sensitive to the outliers than the
LS method. Meanwhile we also retain the local sur-
face details excellently. In Fig.4 and Fig.6, it reveals
that we obtain more delicate details especially in the
sharp positrons.
Finally, in order to demonstrate that our method
is also able to achieve ultra-high resolution, we make
(a) Our method (b) LS method
Figure 7: The top of the head of venus.
AConvexFrameworkforHighResolution3DReconstruction
321
(a) image of object (b) surface normals
(c) 3D model (Our) (d) 3D model (LS)
(e) zoom 1 (Our) (f) zoom 2 (Our) (g) zoom 1 (LS) (h) zoom 2 (LS)
(i) double zoom 1 (Our) (j) double zoom 2 (Our) (k) double zoom 1 (LS) (l) double zoom 2 (LS)
Figure 8: 3D reconstruction of the buddha figure. And we show “zoom” and “double-zoom” surfaces.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
322
use 3DMAX to render data of a 14cm-high buddha.
We compare the results in Fig.8. The original 3D
model we use to render is sampled about 150 sam-
ples per mm
2
and we promote the sampling rate of
our synthetic data to 510 samples per mm
2
. The ob-
ject is reconstructed about 5.7 millions of 3D points
and we double zoom the buddha to show that both
our method and LS method can reveal the delicate de-
tails. However, our results are much insensitive to the
outliers, especially on some boundary areas, which is
shown in Fig.7 for example.
5 CONCLUSIONS
We demonstrate that we are able to reconstruct high-
quality surfaces through global optimized framework
and to represent good surface details at the same time.
Although certain hybrid systems have been presented
to address the problem of fusing normal information
and geometry data, only few of them attempted to sur-
pass the resolution of structured-light systems to the
ultra-high resolution level. However, the results of
the ultra-high resolution construction are inevitable
to influence by outliers due to the inherent nature of
LS method. To overcome these issues, we introduce
a convex framework to ensure that the high-quality
surface is reconstructed progressively and the error is
reasonably under control. Consequently, we are able
to implement ultra high resolution 3D reconstruction
while retaining subtle details close to local methods
such as the LS method.
Even our method do handle quite a vast of materi-
als, the normal estimation process still needs to be im-
proved for the extremely specular surfaces. In future
we will improve the normal information by utilizing
an alternative BRDF model and incorporate color in-
formation to make the estimation more reliable. In
addition, we also consider to develop a more effi-
cient numerical algorithm to minimize our variational
model, so as to avoid the “zero-denominator” prob-
lem of the fixed point algorithm and to accelerate the
convergence rate.
ACKNOWLEDGEMENTS
This work is supported by Issue of National Science
and Technology Support Plan (No. 2012BAH03F02,
No. 2013BAK08B08), Important Project Subject to
Tender of The National Social Science Fund (No.
12&ZD32), and Science and Technology Project
on Heritage Conservation (Detection and Simulation
Technology for Heritage Preservation Environment in
Archaeological Sites), Zhejiang Province.
REFERENCES
Agrawal, A., Raskar, R., and Chellappa, R. (2006). What
is the range of surface reconstructions from a gradient
field? In ECCV, pages 578–591. Springer.
Aliaga, D. G. and Xu, Y. (2010). A self-calibrating method
for photogeometric acquisition of 3d objects. IEEE
Transactions. Pattern Analysis and Machine Intelli-
gence, 32(4):747–754.
Banerjee, S., Sastry, P., and Venkatesh, Y. (1992). Surface
reconstruction from disparate shading: An integration
of shape-from-shading and stereopsis. IAPR, 1:141–
144.
Bernardini, F., Rushmeier, H., Martin, I. M., Mittleman, J.,
and Taubin, G. (2002). Building a digital model of
michelangelo’s florentine pieta. IEEE Transactions.
Computer Graphics and Applications, 22(1):59–67.
Birkbeck, N., Cobzas, D., Sturm, P., and Jagersand, M.
(2006). Variational shape and reflectance estimation
under changing light and viewpoints. In ECCV, pages
536–549. Springer.
Herbort, S. and W
¨
ohler, C. (2011). An introduction to
image-based 3d surface reconstruction and a survey of
photometric stereo methods. 3D Research, 2(3):1–17.
Hern
´
andez, C., Vogiatzis, G., and Cipolla, R. (2008). Multi-
view photometric stereo. IEEE Transactions. Pattern
Analysis and Machine Intelligence, 30(3):548–554.
Higo, T., Matsushita, Y., Joshi, N., and Ikeuchi, K. (2009).
A hand-held photometric stereo camera for 3-d mod-
eling. In ICCV, pages 1234–1241. IEEE.
Horn, B. K. (1990). Height and gradient from shading. In-
ternational Journal of Computer Vision, 5(1):37–75.
Horn, B. K. and Brooks, M. J. (1989). Shape from shading.
MIT press.
Hornung, A. and Kobbelt, L. (2006). Hierarchical volu-
metric multi-view stereo reconstruction of manifold
surfaces based on dual graph embedding. In CVPR,
volume 1, pages 503–510. IEEE.
Ishikawa, H. (2003). Exact optimization for markov random
fields with convex priors. IEEE Transactions. Pat-
tern Analysis and Machine Intelligence, 25(10):1333–
1336.
Kolev, K., Pock, T., and Cremers, D. (2010). Anisotropic
minimal surfaces integrating photoconsistency and
normal information for multiview stereo. In ECCV,
pages 538–551. Springer.
Ladikos, A., Benhimane, S., and Navab, N. (2008). Multi-
view reconstruction using narrow-band graph-cuts
and surface normal optimization. In BMVC, pages 1–
10.
Lange, H. (1999). Advances in the cooperation of shape
from shading and stereo vision. In Second Interna-
tional Conference on 3-D Digital Imaging and Mod-
eling, pages 46–58. IEEE.
AConvexFrameworkforHighResolution3DReconstruction
323
Lu, Z., Tai, Y.-W., Ben-Ezra, M., and Brown, M. S. (2010).
A framework for ultra high resolution 3d imaging. In
CVPR, pages 1205–1212. IEEE.
Nehab, D., Rusinkiewicz, S., Davis, J., and Ramamoor-
thi, R. (2005). Efficiently combining positions and
normals for precise 3d geometry. ACM Transactions.
Graphics, 24(3):536–543.
Pock, T., Schoenemann, T., Graber, G., Bischof, H., and
Cremers, D. (2008). A convex formulation of contin-
uous multi-label problems. In ECCV, pages 792–805.
Springer.
Salvi, J., Fernandez, S., Pribanic, T., and Llado, X. (2010).
A state of the art in structured light patterns for surface
profilometry. Pattern recognition, 43(8):2666–2680.
Scharstein, D. and Szeliski, R. (2002). A taxonomy and
evaluation of dense two-frame stereo correspondence
algorithms. International Journal of Computer Vision,
47(1-3):7–42.
Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and
Szeliski, R. (2006). A comparison and evaluation
of multi-view stereo reconstruction algorithms. In
CVPR, volume 1, pages 519–528. IEEE.
Sinha, S. N. and Pollefeys, M. (2005). Multi-view recon-
struction using photo-consistency and exact silhouette
constraints: A maximum-flow formulation. In ICCV,
volume 1, pages 349–356. IEEE.
Vogiatzis, G., Hern
´
andez, C., Torr, P. H., and Cipolla, R.
(2007). Multiview stereo via volumetric graph-cuts
and occlusion robust photo-consistency. IEEE Trans-
actions. Pattern Analysis and Machine Intelligence,
29(12):2241–2246.
Woodham, R. J. (1980). Photometric method for determin-
ing surface orientation from multiple images. Optical
Engineering, 19(1):191139–191139.
Wu, C., Liu, Y., Dai, Q., and Wilburn, B. (2011). Fus-
ing multiview and photometric stereo for 3d re-
construction under uncalibrated illumination. IEEE
Transactions. Visualization and Computer Graphics,
17(8):1082–1095.
Yu, T., Ahuja, N., and Chen, W.-C. (2006). Sdg cut: 3d
reconstruction of non-lambertian objects using graph
cuts on surface distance grid. In CVPR, volume 2,
pages 2269–2276. IEEE.
Yuan, J., Bae, E., and Tai, X.-C. (2010). A study on con-
tinuous max-flow and min-cut approaches. In CVPR,
pages 2217–2224. IEEE.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
324