A Convex Framework for High Resolution 3D Reconstruction

Min Li, Changyu Diao, Song Lv and Dongming Lu

College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China

Keywords:

Structured-light, Photometric Stereo, Convex Optimization, 3D Reconstruction.

Abstract:

We present a convex framework to acquire high resolution surfaces. It is typical to couple a structure-light

setup and a photometric method to reconstruct a high resolution 3D surface. Previous methods often get

stuck in a local minima for the appearance of occasional outliers. To address this issue, we develop a convex

variational model by incorporating a total variation (TV) regularization term with a data term to generate

the surface. Through relaxing the model to an equivalent high dimensional variational problem, we obtain a

global minimizer of the proposed problem. Results on both synthetic and real-world data show an excellent

performance by utilizing our convex variational model.

1 INTRODUCTION

The highly detailed reconstruction of 3D shape has

been one of the classic topics in computer vision,

from computer graphics, to reverse rendering, and to

the digital preservation of cultural heritage materials.

It is a challenging task especially when we have to

reconstruct millions of 3D points. In our paper, we

intend to reconstruct high resolution detailed surfaces

via a convex framework, which make us avoid get-

ting stuck in local minima and obtain a high quality

surface.

The fundamental difﬁculty of highly detailed sur-

face reconstruction is that it is impossible to acquire

dense and accurate samples of a surface via only one

method. While methods of laser scanners or struc-

tured light often obtains accurate surfaces, it is dif-

ﬁcult to perform highly detailed reconstructions due

to limitations of hardware. In contrast, surface recon-

struction from gradient ﬁelds is capable of obtaining

detailed surfaces, however, there are still some prob-

lems with them. Typically, it is possible to capture

high-resolution images via methods such as photo-

metric stereo or shape from shading, while the re-

sulting gradient ﬁled of the above methods is usually

non-integrable due to gradient manipulations, pres-

ence of noises or outliers in gradient estimation. Con-

sequently, it is difﬁcult to reconstruct an accurate sur-

face only through photometric stereo or shape from

shading on account of the lack of global information.

To overcome limitations of structured light and

photometric stereo, hybrid methods have been pro-

posed for a decade. Similar to (Lu et al., 2010),

We adopt an approach which combines structured

light and normal information estimated by photomet-

ric stereo. Rather than employing the classical `-2

methods such as the least square method, we present

a new convex framework to overcome the problem

of `-2 methods and maintain high-quality surface de-

tails, through which we avoid the inﬂuence of outliers

and improve the results substantially. Existing work

dealt with sample differences of only 4× resolution

between the detailed surface and the low-resolution

geometry (Nehab et al., 2005), while our resolution

differences are much more than that. A recent work

(Lu et al., 2010) presented a framework to deal with

the ultra-high-resolution 3D reconstruction, however,

the algorithm is sensitive to outliers and the estimate

is skewed by outliers for the reason that they have em-

ployed the least square method in a multi-resolution

surface reconstruction scheme.

In contrast with `-2 methods such as the least

quare method, global optimization strategies such as

convex optimization overcome the problem of local

minima by a global optimization. Classic computer

vision problems are usually deﬁned on a discrete do-

main, keeping them away from convex properties.

Nonetheless, a recent work (Pock et al., 2008) demon-

strate that mulit-label problems such as stereo match-

ing and image restoration in computer vision areas

can utilize convex optimization by relaxing the orig-

inal problem from a discrete domain to a continuous

one.

Inspired by the idea of matching, we consider the

317

Li M., Diao C., Lv S. and Lu D..

A Convex Framework for High Resolution 3D Reconstruction.

DOI: 10.5220/0005306503170324

In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 317-324

ISBN: 978-989-758-091-8

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

problem of fusing geometry information and normal

resolution as a multi-label problem. By incorporating

the TV regularization term, we develop a convex vari-

ational framework that takes advantage of both global

geometry information and local normal information.

Previous the least square method has regarded the low

resolution geometry as an initialization of the surface

reconstruction from gradient method. Instead, while

we observe that we are able to acquire gradients from

both high resolution normal information and low res-

olution geometry information, we propose a convex

framework that match gradient information between

the high resolution normal and the low resolution ge-

ometry, which turns into a variational problem. In or-

der to solve the variational problem, we employ the

level-sets method through which we lift the original

problem to a higher dimensional space, leading to that

our algorithm is more robust and effective than the ex-

isting ones.

The remainder of this paper is organized as fol-

lows: Section 2 discusses related works; Section 3

presents our main surface construction problem; Sec-

tion 4 presents our results. We give our conclusion in

Section 5.

2 RELATED WORK

A variety of approaches (Herbort and W

ohler, 2011;

Scharstein and Szeliski, 2002; Seitz et al., 2006; Salvi

et al., 2010) have been developed for 3D reconstruc-

tion. We usually classify the 3D reconstruction meth-

ods as passive methods such as conventional stereo

(Bernardini et al., 2002) and shape from shading

(Horn and Brooks, 1989), and active methods such

as structured light (Salvi et al., 2010) and photomet-

ric stereo (Woodham, 1980). Moreover, hybrid meth-

ods, focusing on combining position information with

normal information are presented to acquire dense 3D

points, such as ((Nehab et al., 2005; Lu et al., 2010;

Banerjee et al., 1992; Bernardini et al., 2002; Lange,

1999; Aliaga and Xu, 2010; Wu et al., 2011; Birkbeck

et al., 2006)) or visual hull with normals (Hern

andez

et al., 2008), and so on.

Among the above approaches, a popular one was

presented by (Nehab et al., 2005) which fused posi-

tion information and normal information into a lin-

ear formulation. Although they have obtained bet-

ter results compared with the single reconstruction

approach, the position data and the normal informa-

tion have approximately the same resolution in their

work. As a result, it is impossible to apply their ap-

proach into the high resolution reconstruction prob-

lem directly. To overcome the asymmetry between

the high resolution normal information and low reso-

lution position information, (Lu et al., 2010) proposed

a multi-resolution pyramid framework and used the

least square method in every layer of the pyramid. We

already discussed that the least square method is sen-

sitive to outliers, leading to a local minima problem.

However, errors are also magniﬁed during the propa-

gation from one layer to another in Lu’s method.

Opposite to the proposed methods, global opti-

mization such as graph cuts (Sinha and Pollefeys,

2005; Hornung and Kobbelt, 2006; Ladikos et al.,

2008; Higo et al., 2009; Yu et al., 2006; Vogiatzis

et al., 2007) or convex optimization (Kolev et al.,

2010; Pock et al., 2008) overcomes the limitation

of local minima problem. Here, only representative

examples are mentioned. We reconsider the hybrid

method as a multi-label problem, which in general

cannot be globally minimized. However, (Ishikawa,

2003) showed that one can compute the exact solu-

tion of the multi-label problem if the pairwise inter-

actions are convex in terms of a linearly ordered la-

bel set. Based on this, researchers shifted the discrete

multi-label problem to its continuous counterpart, the

variational approach (Yuan et al., 2010; Pock et al.,

2008). It is well known that if the energy functional

is convex and the minimization is carried out on a

convex set, the globally optimal solution can be com-

puted. Thus (Pock et al., 2008) shifted the original

variational model to a higher dimensional space and

developed a convex formulation.

There is great potential for all the above men-

tioned hybrid methods to implement global optimiza-

tion by adding a regularization term. Using the

setup which is similar to (Lu et al., 2010), we in-

tend to present a convex framework and adopt the

sub-pixel continuous formulation proposed by (Pock

et al., 2008), which makes use of continuous opti-

mization techniques.

3 SURFACE RECONSTRUCTION

WITH A CONVEX

FRAMEWORK

3.1 Conventional Surface

Reconstruction from a Gradient

Field

Estimating Surface Normals. According to conven-

tional photometric stereo (Woodham, 1980), given a

lambertian surface, we are able to estimate surface

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

318

Figure 1: Our setup. Our experimental setup consists of a

digital camera, four lights and a digital light projector. We

use the camera for both photometric stereo and structured-

light scan.

normals from the following equation:

I = ρL · n (1)

where n is the normal we want to compute, I is the im-

age intensity, ρ is the surface albedo which is a con-

stant here, and L is the light source direction which

can be calibrated by a mirrored sphere.

Surface from Gradient Field. Consider the equa-

tion of an object as Z = f (x,y) and Ω ∈ R is the im-

age domain. x = (x,y)

∈ Ω is the pixel coordinate.

Let (p,q) denote the observed gradient ﬁeld over this

surface. Then we can easily get n = (

∂Z

∂x

∂Z

∂y

,1) =

(p,q,−1). We estimate p and q by photometric

stereo. A common approach to obtain surface Z is to

minimize the LS function by (Horn, 1990; Agrawal

et al., 2006). Then we get:

D(p,q, Z) =

Ω

((

∂Z

∂x

− p)

+ (

∂Z

∂y

− q)

)dx (2)

The associated Euler-Lagrange differential equation

of (2) is the poisson equation ∇

Z = div(p,q), where

div is the divergence operator. This is the well-known

poisson solver method. However, we can discretize

the problem as a least problem and solve it using

Gauss-Seidel iteration (Lu et al., 2010).

3.2 Low Geometry Constraints with a

Convex Framework

In practice, the gradient ﬁeld obtained from photo-

metric stereo is rarely integrable due to the inherent

noise in the estimation process, manipulation of gra-

dient ﬁelds or ambiguities in the solution. Conse-

quently, many surface from gradient algorithms do

not reconstruct the surface geometry accurately. As

Figure 2: Representation of u. Red dots represent locations

of each point on the computed by the surface-from-gradient

algorithm (left red dot) and structured-light (right one).

discussed before, we can overcome this by incorporat-

ing positional information in the reconstruction pro-

cess. Speciﬁcally, we estimate the orientation consis-

tency between the gradient ﬁeld acquired from photo-

metric stereo and the real orientation.

Typically, we are able to acquire accurate surface

orientation via photometric stereo as well as accurate

position information through triangular methods such

as stereo (Bernardini et al., 2002) and structured light

(Salvi et al., 2010). As has mentioned in section 3.1,

we consider the surface as Z = f (x,y) while (p, q)

denote the observed gradient ﬁled over the surface.

Inspired by other computer vision problems such

as stereo estimation and image segmentation, which

are usually treated as labeling problems, we convert

our fusing problem into a labeling problem, which in-

cludes The regularization term R (.) and the data term

D(.). Our goal is to minimize the following energy

functional:

min

{R (u)+ D(u)} (3)

Speciﬁcally, for our problem, we utilize the offset u

in the z axis position, with which the depth of each

pixel on the image domain shifts up and down in or-

der to match the related gradients (Fig.2). We utilize

the TV regularization term as the regularization term

in order to obtain smooth results while we update (2)

as the Data term D(p, q, u). Then our low resolution

constraint is modeled as the following variational es-

timation:

min

E(u) =

Ω

∇u(x)

dx + λ

Ω

ρ(x,u)dx (4)

where the right term is D(u) to measure the orienta-

tion consistency and deﬁned as:

ρ(x,u) =



∂(Z + u)

∂x

− d(p)



∂(Z + u)

∂y

− d(q)



(5)

In (4) and (5), due to the prior of low resolution

geometry Z(x, y), we take the depth to shift along z

AConvexFrameworkforHighResolution3DReconstruction

319

(a) image of object (b) surface normals (c) 3D reconstruction (Our) (d) 3D reconstruction (LS)

Figure 3: 3D reconstruction of the venus ﬁgure. The reconstructions show our method is more robust to outliers than the LS

method.

coordinate up and down to match the orientation ob-

tained from photometric stereo. u(x) denote the off-

sets of the depth, which is somewhat similar to the

disparity in stereo. d(·) is the downsample operator to

match the low geometry and the high resolution and

| · | is the `-1 norm.

Generally, the variational model (4) is not convex

due to the non-convex of the data term D(u). How-

ever, we can develop a convex formulation via lift

variational model (4) to a higher dimensional space

by representing u in terms of its level sets, which al-

lows us to compute the exact solution of the original

non-convex problem.

We utilize the functional lifting method of (Pock

et al., 2008). For simplify, We employ the expres-

sions of the paper in (Pock et al., 2008), where l

u>γ

the indicator for the γ-super-levels of u and φ(x, γ) =

u>γ

(x) denotes the binary function to resemble the

graph of u. The variational model (4) is equivalent to

the following high dimensional variational problem:

min

φ∈D

{|∇φ(x,γ)| + ρ(x,γ)|∂

φ(x,γ)|dΣ} (6)

In (6), D is the relaxed feasible set of φ from binary

interval{0,1} to [0,1] as:

D = {φ : Σ → [0,1]|φ(x,γ

min

) = 1,

φ(x,γ

max

) = 0}

(7)

Consequently, (6) is convex in φ and minimization is

carried over D, which is convex, the overall problem

is convex. (Pock et al., 2008)

Then we solve the associated Euler-Lagrange

function of (4), to ﬁnd its global minimizer, which

we compute as:

−div



∇φ



− ∂

∂



∂



= 0,s.t. φ ∈ D (8)

To avoid that denominators of (6) become zeros,

we replace them with the robust function:

f (|s|) =

+ ξ

(9)

where ξ is a small constant.

3.3 Discretization

Considering a three-dimensional regular cartesian

grid in our numerical implementation, we get:

(i · ∆x, j · ∆y,k · ∆γ)



0 ≤ i < M,0 ≤ j < N, 0 ≤ k < O

(10)

where M × N denotes the grids of image domain and

O is the range of depth value. We utilize standard

forward differences to approximate the gradient oper-

ator, which is:

(∇

φ)

i, j,k



i+1, j,k

− φ

i, j,k

∆x

i, j+1,k

− φ

i, j,k

∆y

i, j,k+1

− φ

i, j,k

∆γ



(11)

and the divergence operator is:

(div

(∇

φ))

i, j,k

(∇

φ)

i, j,k+1

− (∇

φ)

i, j,k

∆x

(∇

φ)

i, j,k+1

− (∇

φ)

i, j,k

∆y

(∇

φ)

i, j,k+1

− (∇

φ)

i, j,k

∆γ

(12)

and we use center differences to approximate the par-

tial derivative as the following:

∂

φ =

i, j,k+1/2

− φ

i, j,k−1/2

∆γ

(13)

where ∆x, ∆y denote the width of spatial discretization

and ∆γ denotes the height of the depth discretization.

4 RESULTS

We estimate the high-resolution surface normals

through photometric stereo images captured by four

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

320

(a) image of object (b) surface normals (c) 3D model (Our) (d) 3D model (LS)

Figure 5: 3D reconstruction of the vase ﬁgure.

(a) zoom1 (Our) (b) zoom1 (LS) (c) Zoom2 (Our) (d) Zoom2 (LS)

Figure 6: Details of vase.

(a) Zoom (Our) (b) Zoom (LS)

Figure 4: “Nose” details of venus. The zoom details show

our method acquires better results than the LS method.

LED light sources. We estimate the low-resolution

geometry using a typical structured-light system,

which includes a Benq MP624 projector and a M5D

camera (Fig.1). This section shows results of both

synthetic data-set and data captured by our sys-

tem including a 5D camera and a Benq projector

(1024×768).

In order to address the difference between the

surface normals and the geometry, we also employ

the multi-resolution pyramid approach. However, we

handle the initial depth manually in a certain degree

to ensure the position of every point stands not so

far away from its accurate position. We ﬁnd that re-

sults of least square (LS) method are nosier than our

method since the LS method is much more sensitive

to outliers than ours. After the simple deposing step,

we conﬁne the offset u of z axis value to a particu-

lar range instead of shifting the unknown values from

zero. Besides, we discretize depth value with a sub-

pixel interval so as to acquire the ﬁnal reconstructed

surface as accurate as possible.

Fig.3 shows a 3D reconstruction of the “venus”

statue which is approximately 30cm wide and Fig.5

shows a china of about 35cm high, both captured by

our system. The initial model contains tens of thou-

sands points and the ﬁnal results is more than twenty

times than that. All of these results shows that our

method is much less sensitive to the outliers than the

LS method. Meanwhile we also retain the local sur-

face details excellently. In Fig.4 and Fig.6, it reveals

that we obtain more delicate details especially in the

sharp positrons.

Finally, in order to demonstrate that our method

is also able to achieve ultra-high resolution, we make

(a) Our method (b) LS method

Figure 7: The top of the head of venus.

AConvexFrameworkforHighResolution3DReconstruction

321

(a) image of object (b) surface normals

(e) zoom 1 (Our) (f) zoom 2 (Our) (g) zoom 1 (LS) (h) zoom 2 (LS)

(i) double zoom 1 (Our) (j) double zoom 2 (Our) (k) double zoom 1 (LS) (l) double zoom 2 (LS)

Figure 8: 3D reconstruction of the buddha ﬁgure. And we show “zoom” and “double-zoom” surfaces.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

322

use 3DMAX to render data of a 14cm-high buddha.

We compare the results in Fig.8. The original 3D

model we use to render is sampled about 150 sam-

ples per mm

and we promote the sampling rate of

our synthetic data to 510 samples per mm

. The ob-

ject is reconstructed about 5.7 millions of 3D points

and we double zoom the buddha to show that both

our method and LS method can reveal the delicate de-

tails. However, our results are much insensitive to the

outliers, especially on some boundary areas, which is

shown in Fig.7 for example.

5 CONCLUSIONS

We demonstrate that we are able to reconstruct high-

quality surfaces through global optimized framework

and to represent good surface details at the same time.

Although certain hybrid systems have been presented

to address the problem of fusing normal information

and geometry data, only few of them attempted to sur-

pass the resolution of structured-light systems to the

ultra-high resolution level. However, the results of

the ultra-high resolution construction are inevitable

to inﬂuence by outliers due to the inherent nature of

LS method. To overcome these issues, we introduce

a convex framework to ensure that the high-quality

surface is reconstructed progressively and the error is

reasonably under control. Consequently, we are able

to implement ultra high resolution 3D reconstruction

while retaining subtle details close to local methods

such as the LS method.

Even our method do handle quite a vast of materi-

als, the normal estimation process still needs to be im-

proved for the extremely specular surfaces. In future

we will improve the normal information by utilizing

an alternative BRDF model and incorporate color in-

formation to make the estimation more reliable. In

addition, we also consider to develop a more efﬁ-

cient numerical algorithm to minimize our variational

model, so as to avoid the “zero-denominator” prob-

lem of the ﬁxed point algorithm and to accelerate the

convergence rate.

ACKNOWLEDGEMENTS

This work is supported by Issue of National Science

and Technology Support Plan (No. 2012BAH03F02,

No. 2013BAK08B08), Important Project Subject to

Tender of The National Social Science Fund (No.

12&ZD32), and Science and Technology Project

on Heritage Conservation (Detection and Simulation

Technology for Heritage Preservation Environment in

Archaeological Sites), Zhejiang Province.

REFERENCES

Agrawal, A., Raskar, R., and Chellappa, R. (2006). What

is the range of surface reconstructions from a gradient

ﬁeld? In ECCV, pages 578–591. Springer.

Aliaga, D. G. and Xu, Y. (2010). A self-calibrating method

for photogeometric acquisition of 3d objects. IEEE

Transactions. Pattern Analysis and Machine Intelli-

gence, 32(4):747–754.

Banerjee, S., Sastry, P., and Venkatesh, Y. (1992). Surface

reconstruction from disparate shading: An integration

of shape-from-shading and stereopsis. IAPR, 1:141–

144.

Bernardini, F., Rushmeier, H., Martin, I. M., Mittleman, J.,

and Taubin, G. (2002). Building a digital model of

michelangelo’s ﬂorentine pieta. IEEE Transactions.

Computer Graphics and Applications, 22(1):59–67.

Birkbeck, N., Cobzas, D., Sturm, P., and Jagersand, M.

(2006). Variational shape and reﬂectance estimation

under changing light and viewpoints. In ECCV, pages

536–549. Springer.

Herbort, S. and W

ohler, C. (2011). An introduction to

image-based 3d surface reconstruction and a survey of

photometric stereo methods. 3D Research, 2(3):1–17.

Hern

andez, C., Vogiatzis, G., and Cipolla, R. (2008). Multi-

view photometric stereo. IEEE Transactions. Pattern

Analysis and Machine Intelligence, 30(3):548–554.

Higo, T., Matsushita, Y., Joshi, N., and Ikeuchi, K. (2009).

A hand-held photometric stereo camera for 3-d mod-

eling. In ICCV, pages 1234–1241. IEEE.

Horn, B. K. (1990). Height and gradient from shading. In-

ternational Journal of Computer Vision, 5(1):37–75.

Horn, B. K. and Brooks, M. J. (1989). Shape from shading.

MIT press.

Hornung, A. and Kobbelt, L. (2006). Hierarchical volu-

metric multi-view stereo reconstruction of manifold

surfaces based on dual graph embedding. In CVPR,

volume 1, pages 503–510. IEEE.

Ishikawa, H. (2003). Exact optimization for markov random

ﬁelds with convex priors. IEEE Transactions. Pat-

tern Analysis and Machine Intelligence, 25(10):1333–

1336.

Kolev, K., Pock, T., and Cremers, D. (2010). Anisotropic

minimal surfaces integrating photoconsistency and

normal information for multiview stereo. In ECCV,

pages 538–551. Springer.

Ladikos, A., Benhimane, S., and Navab, N. (2008). Multi-

view reconstruction using narrow-band graph-cuts

and surface normal optimization. In BMVC, pages 1–

10.

Lange, H. (1999). Advances in the cooperation of shape

from shading and stereo vision. In Second Interna-

tional Conference on 3-D Digital Imaging and Mod-

eling, pages 46–58. IEEE.

AConvexFrameworkforHighResolution3DReconstruction

323

Lu, Z., Tai, Y.-W., Ben-Ezra, M., and Brown, M. S. (2010).

A framework for ultra high resolution 3d imaging. In

CVPR, pages 1205–1212. IEEE.

Nehab, D., Rusinkiewicz, S., Davis, J., and Ramamoor-

thi, R. (2005). Efﬁciently combining positions and

normals for precise 3d geometry. ACM Transactions.

Graphics, 24(3):536–543.

Pock, T., Schoenemann, T., Graber, G., Bischof, H., and

Cremers, D. (2008). A convex formulation of contin-

uous multi-label problems. In ECCV, pages 792–805.

Springer.

Salvi, J., Fernandez, S., Pribanic, T., and Llado, X. (2010).

A state of the art in structured light patterns for surface

proﬁlometry. Pattern recognition, 43(8):2666–2680.

Scharstein, D. and Szeliski, R. (2002). A taxonomy and

evaluation of dense two-frame stereo correspondence

algorithms. International Journal of Computer Vision,

47(1-3):7–42.

Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and

Szeliski, R. (2006). A comparison and evaluation

of multi-view stereo reconstruction algorithms. In

CVPR, volume 1, pages 519–528. IEEE.

Sinha, S. N. and Pollefeys, M. (2005). Multi-view recon-

struction using photo-consistency and exact silhouette

constraints: A maximum-ﬂow formulation. In ICCV,

volume 1, pages 349–356. IEEE.

Vogiatzis, G., Hern

andez, C., Torr, P. H., and Cipolla, R.

(2007). Multiview stereo via volumetric graph-cuts

and occlusion robust photo-consistency. IEEE Trans-

actions. Pattern Analysis and Machine Intelligence,

29(12):2241–2246.

Woodham, R. J. (1980). Photometric method for determin-

ing surface orientation from multiple images. Optical

Engineering, 19(1):191139–191139.

Wu, C., Liu, Y., Dai, Q., and Wilburn, B. (2011). Fus-

ing multiview and photometric stereo for 3d re-

construction under uncalibrated illumination. IEEE

Transactions. Visualization and Computer Graphics,

17(8):1082–1095.

Yu, T., Ahuja, N., and Chen, W.-C. (2006). Sdg cut: 3d

reconstruction of non-lambertian objects using graph

cuts on surface distance grid. In CVPR, volume 2,

pages 2269–2276. IEEE.

Yuan, J., Bae, E., and Tai, X.-C. (2010). A study on con-

tinuous max-ﬂow and min-cut approaches. In CVPR,

pages 2217–2224. IEEE.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

324