Global Patch Search Boosts Video Denoising

Thibaud Ehret, Pablo Arias and Jean-Michel Morel

CMLA, ENS Cachan, Cachan, France

{thibaud.ehret, pablo.arias, morel}@cmla.ens-cachan.fr

Keywords:

Video Denoising, Patch-based Methods, Patch Search, Nearest Neighbors Search.

Abstract:

With the increasing popularity of mobile imaging devices and the emergence of HdR video surveillance, the need for fast and accurate denoising algorithms has also increased. Patch-based methods, which are currently state-of-the-art in image and video denoising, search for similar patches in the signal. This search is generally performed locally around each target patch for complexity reasons. We propose a new and efficient approximate patch search algorithm. It makes it possible, for the first time, to evaluate the impact of a global search on video denoising performance. A global search is particularly justified in video denoising, where strong temporal redundancy is often available. We first verify that the patches found by our new approximate search are far more concentrated than those obtained by exact local search, and are obtained in comparable time. To demonstrate the potential of the global search in video denoising, we take two patch-based image denoising algorithms and apply them to video. While their performance is poor with a classical local search, with the proposed global search they even outperform the latest state-of-the-art video denoising methods.

1 INTRODUCTION

Patch-based methods are among the state-of-the-art

both in image (Dabov et al., 2007d; Lebrun et al.,

2013a; Mairal et al., 2009) and video denoising

(Dabov et al., 2007a; Maggioni et al., 2012a; Li et al.,

2011; Buades et al., 2016). These methods exploit the

self-similarity of images and videos and ﬁlter together

groups of similar patches which are then aggregated

to create an estimate of the clean signal. Most contri-

butions in the area have focused on how to model and

ﬁlter the groups of similar patches, but little attention

has been given to how these groups are built. Typi-

cally similar patches are grouped by selecting a refer-

ence patch and searching exhaustively for the similar

ones in a local 2D, or 3D for videos, neighborhood.

The size of the search region is a parameter of the al-

gorithm which trades off quality of the result for com-

putational cost.

In the case of a single image, a local search region

is justiﬁed by the fact that similar patches are likely

to be close to each other in the image domain. Videos

however, have an additional strong source of redun-

dancy given by the temporal consistency. A patch is

expected to have similar patches along its motion tra-

jectory, even in distant frames. It seems intuitive that

patch-based methods should beneﬁt from this larger

set of similar exemplars. Some methods estimate the

motion in the video to tackle this problem. A mo-

tion compensated search window can track the patch

trajectories for a certain number of frames (Liu and

Freeman, 2010b; Buades et al., 2016). Nevertheless,

the size of these search regions is still limited by the

computational cost and the accumulation of errors in

the estimated motion.

In this work we focus on the patch search. We

present an efﬁcient global approximate search tech-

nique and demonstrate its impact on video denoising.

To that end we take two patch-based image denoising

methods, namely BM3D (Dabov et al., 2007d) and

NL-Bayes (Lebrun et al., 2013a) and adapt them to

video (simply by searching similar patches in multi-

ple frames instead of just the current one). We provide

an extensive experimental evaluation in grayscale and

color sequences. Our results show that substantial

gains in performance are obtained by searching glob-

ally in the video sequence, indicating that video de-

noising still has signiﬁcant room for improvement

by using clever global search methods. In partic-

ular, the NL-Bayes method with global search out-

performs state-of-the-art methods such as V-BM3D

(Dabov et al., 2007b) and V-BM4D (Maggioni et al.,

2012b) by a signiﬁcant margin, and the recently pro-

posed SPTWO (Buades et al., 2016) by a lower mar-

gin.

In recent years several efficient techniques for approximate nearest neighbor search have been proposed, after the introduction of the PatchMatch algorithm (Barnes et al., 2009). These methods compute a nearest neighbor field for patches located in a dense or semi-dense grid, and use heuristics that benefit from the overlap of adjacent patches in the grid. Most of these works focus on finding the nearest neighbor, but they can be extended to handle k nearest neighbors (Barnes et al., 2010). In practice k is kept small since even with these efficient techniques, computing a large number k of nearest neighbors for a dense grid of patches remains too costly.

Figure 1: The plots show the position in the spatio-temporal video domain of the matches found for a sample patch query for different search methods. From left to right: the best matches found with a global exhaustive search, with a local exhaustive search in a window centered at the query, and with the VPLR search, the heuristic proposed in this paper. Notice how the latter discovers patch trajectories similar to those found by the global exhaustive search.

Ehret T., Arias P. and Morel J.

Global Patch Search Boosts Video Denoising.

DOI: 10.5220/0006175601240134

In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), pages 124-134

ISBN: 978-989-758-225-7

In this work we focus on a signiﬁcantly different

problem: compute a large number, namely k, of near-

est neighbors for a single query patch. By allowing

independent queries, our technique is more ﬂexible,

and is straightforward to apply to patch-based denois-

ing methods. In particular, this is useful for certain de-

noising algorithms which save computations by pro-

cessing a sparse set of reference patches dynamically

determined during the execution of the algorithm, and

also makes parallelization easier.

The rest of the paper is organized as follows: in

§2 we describe brieﬂy two state-of-the-art image de-

noising algorithms (NL-Bayes (Lebrun et al., 2013a)

and BM3D (Dabov et al., 2007c)) which we selected

to demonstrate our global search. In §3 we pro-

pose a new heuristic to accelerate approximate near-

est neighbor search for patches in images and videos.

A comparison with state-of-the-art methods is per-

formed in §4. Concluding remarks are given in §5.

2 NL-Bayes AND BM3D

BM3D and NL-Bayes are patch-based methods which follow the same overall framework to denoise an image. For each patch in a set of patches to be denoised, they first search for similar patches inside the image. The search region is usually rectangular and centered on the query patch. The patches found during the search are then processed to compute an estimate of the corresponding clean patches. The denoised patches are then aggregated into a basic estimate of the clean image. This process is then repeated once: in the second step, the basic estimate is used as a pilot for the patch search and the processing of the similar patches.

To test the proposed global search, we consider

extensions to video of these algorithms by searching

for similar 2D patches in a spatio-temporal volume

in the video, as proposed in (Dabov et al., 2007b),

(Arias and Morel, 2015). This framework is presented

in Algorithm 1.

The main difference between BM3D and NL-

Bayes algorithms lies in the processing of the set of

similar patches. NL-Bayes learns a Gaussian a priori

model for the set of patches and computes the patches

as the maximum a posteriori estimate. BM3D stacks

the patches in a 3D signal which is denoised using

shrinkage on a transformed domain.

We use a slight modiﬁcation of video NL-Bayes

(Arias and Morel, 2015) which caps the rank of the

patch covariance matrices for the groups of similar

patches. This improves both performance and speed

of this algorithm, and adds a rank parameter r to each

step.
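The Gaussian MAP estimate with a capped covariance rank can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lowrank_gaussian_map`, the use of the sample covariance of the noisy group, and the subtraction of the noise variance from its leading eigenvalues are assumptions made for illustration only.

```python
import numpy as np

def lowrank_gaussian_map(group, sigma, rank):
    """Denoise a group of similar patches with a rank-capped Gaussian prior.

    group : (n, d) array, one flattened noisy patch per row.
    sigma : noise standard deviation (assumed > 0).
    rank  : maximum rank r kept for the prior covariance (hypothetical name).
    """
    mean = group.mean(axis=0)
    centered = group - mean
    # Sample covariance of the noisy patches (d x d).
    cov_noisy = centered.T @ centered / max(len(group) - 1, 1)
    # eigh returns eigenvalues in ascending order; keep the r largest.
    eigvals, eigvecs = np.linalg.eigh(cov_noisy)
    eigvals, eigvecs = eigvals[::-1][:rank], eigvecs[:, ::-1][:, :rank]
    # Prior variances: noisy variances minus the noise variance, clipped at 0.
    prior_var = np.maximum(eigvals - sigma**2, 0.0)
    # Wiener-type shrinkage of each retained principal coefficient.
    gains = prior_var / (prior_var + sigma**2)
    coeffs = centered @ eigvecs
    return mean + (coeffs * gains) @ eigvecs.T
```

Capping the rank means that only `rank` eigenpairs are used and all remaining directions are set to the group mean, which is one way a rank cap can reduce both the cost and the amount of noise kept in the estimate.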

We modified the search (corresponding to steps 2 and 7 in Algorithm 1) by considering three different approaches: a local approach which uses the local search in a spatio-temporal volume centered at the reference patch for both denoising steps; a global approach which searches in the full video volume for both steps of the algorithm; and a mixed approach which uses the local search in the first step and the global search in the second step (step 7 in the pseudocode). The global patch search heuristic is presented in Section 3.

In Section 4 we compare these approaches with one another and with other video extensions of BM3D and NL-Bayes which use sophisticated local patch search regions, V-BM3D and SPTWO.

Algorithm 1: Image/video denoising framework.

Require: Noisy image/video $v$, noise level $\sigma$

Ensure: Estimate of noiseless image/video $\hat{v}$

1: for all patch $q$ in $v$ do

2: Retrieve $n$ nearest neighbors to $q$

3: Process the set of similar patches and compute a denoised estimate $q^{\text{basic}}$ of $q$

4: Aggregate estimated patches on $v^{\text{basic}}$ to compute the basic estimate

5: end for

6: for all patch $q$ in $v$ do

7: Retrieve $n$ nearest neighbors to $q$

8: Process the set of similar patches and compute a denoised estimate $\hat{q}$ of $q$

9: Aggregate estimated patches on $\hat{v}$ to compute the final estimate

10: end for

11: return $\hat{v}$
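The two-step framework of Algorithm 1 can be sketched as follows. This is a toy illustration under stated assumptions, not the BM3D or NL-Bayes code: the helper names (`extract_patches`, `denoise_step`, `denoise`) are hypothetical, the search is a plain exhaustive global search used as a placeholder, and the processing of each group is delegated to a user-supplied `filter_group` function.

```python
import numpy as np

def extract_patches(video, s):
    """Return all s x s 2D patches of a (T, H, W) video and their (t, y, x) origins."""
    T, H, W = video.shape
    coords = [(t, y, x) for t in range(T)
              for y in range(H - s + 1) for x in range(W - s + 1)]
    patches = np.stack([video[t, y:y + s, x:x + s].ravel() for t, y, x in coords])
    return patches, coords

def denoise_step(noisy, pilot, s, n, filter_group):
    """One step: search on `pilot`, filter the patches of `noisy`, aggregate."""
    noisy_p, coords = extract_patches(noisy, s)
    pilot_p, _ = extract_patches(pilot, s)
    acc = np.zeros_like(noisy, dtype=float)
    weight = np.zeros_like(noisy, dtype=float)
    for q in range(len(coords)):
        # Placeholder search: exhaustive comparison with every patch.
        dist = ((pilot_p - pilot_p[q]) ** 2).sum(axis=1)
        idx = np.argsort(dist)[:n]
        estimates = filter_group(noisy_p[idx])
        # Aggregation: add every estimated patch back at its own location.
        for est, i in zip(estimates, idx):
            t, y, x = coords[i]
            acc[t, y:y + s, x:x + s] += est.reshape(s, s)
            weight[t, y:y + s, x:x + s] += 1.0
    return acc / np.maximum(weight, 1.0)

def denoise(noisy, s, n, filter_group):
    """Two-step framework: the basic estimate is the pilot of the second step."""
    basic = denoise_step(noisy, noisy, s, n, filter_group)   # steps 1-5
    return denoise_step(noisy, basic, s, n, filter_group)    # steps 6-10
```

Passing the search and the group filter as separate pieces mirrors the point made in the text: the local, global and mixed variants differ only in how the neighbors of steps 2 and 7 are retrieved.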

V-BM3D (Dabov et al., 2007b) is the direct ex-

tension of BM3D to video using predictive block

matching to deﬁne the search region. An initial set

of matches is ﬁrst found in a 7 × 7 neighborhood

of the reference patch. Then, for the adjacent for-

ward/backward frames, the search is carried out in

the union of 5 × 5 neighborhoods centered around

the position of the matches found in the preced-

ing/subsequent frame. This scheme can track patches

on moving objects as long as the displacement be-

tween frames is smaller than two pixels while keeping

the search region small. V-BM4D (Maggioni et al.,

2012b) is an extension of V-BM3D for 3D patches.

SPTWO (Buades et al., 2016) is based on NL-Bayes but performs a more elaborate search. First, the optical flow towards the adjacent 6 frames is computed and used to warp them to the reference frame. An s × s × 13 3D patch is then associated to each s × s patch in the reference frame by extending its temporal dimension on the volume defined by the warped frames. Then, the k extended patches closest to the reference one are searched for in a local neighborhood. The final set of matches is given by the 13k 2D slices from the newly found extended patches (some of them might be discarded by an occlusion detection step). The use of these extended patches reduces the noise in the patch distance while still keeping a small spatial patch (the spatial patch size is s = 5). Furthermore, the patches are warped by the optical flow, which is useful in cases where the motion is not translational. On the downside, this method relies heavily on the optical flow.

3 HEURISTICS FOR GLOBAL

PATCH SEARCH

There are already a number of efﬁcient patch search

techniques which exploit the so-called image co-

herency: Neighboring query patches, since they over-

lap, have high chances of having neighboring matches

in the database image. Thus knowing the position of

a good match for a patch helps in determining good

matches for its neighbors. This idea was ﬁrst ap-

plied by (Barnes et al., 2009) to compute a nearest

neighbor ﬁeld (NNF), assigning the closest k patches

to each patch in the image. The original algorithm

was presented for images but can easily be extended

to video. More recent works reported signiﬁcant im-

provements (between one and two orders of magni-

tude) by combining PatchMatch with more classic

search data structures such as partition trees (more

speciﬁcally KD-trees) (Olonetsky and Avidan, 2012;

He and Sun, 2012) and locality sensitive hashing (Ko-

rman and Avidan, 2011; Barnes et al., 2015).

All these methods compute a dense k-NNF, typically for a small k. The reason is that when k is large (e.g. k > 20) computing a dense NNF is too costly. In such cases it is preferable (if the application allows it) to compute the k nearest neighbors for a small set of patches. In particular, for image/video denoising, a common speed-up strategy of patch-based methods is to reduce the number of query patches. For instance, in (Dabov et al., 2007d) the query patches form a regular subgrid, and in (Lebrun et al., 2013a) the query patches are irregularly located and are determined during the execution of the algorithm. To handle such cases, it is desirable to be able to conduct independent patch queries. There is a vast literature on data structures for nearest neighbor search in generic metric spaces or vector spaces, but these classical tools do not exploit image coherency. In this section we briefly review one such approach, namely partition trees, and show a simple yet effective modification for a fast approximate k-nearest neighbor search of image patches.

3.1 Partition Trees

A partition tree is an inductive data structure encoding

the position of a set of n points in $\mathbb{R}^d$ (the database).

Once the partition tree has been built it is used to

search for the nearest neighbors of a point (the query).

Nodes in the tree can either be leaves (also called

bins) containing a maximum number of elements, or

a split value between two subtrees: the “left” subtree

and the “right” subtree. A partition tree splits recur-

sively the data space by applying a simple split at each

VISAPP 2017 - International Conference on Computer Vision Theory and Applications


node (with the exception of the leaves). At each split-

ting operation, the set of elements in the current sub-

tree is split in two equally sized subsets which are then

used to construct the child subtrees. The construction

of the tree is fully speciﬁed by a split value function

and a split function. The split value function assigns

a split value to a set of elements, whereas the split

function assigns one of two groups to an element.

A partition tree can be directly used to search for the exact k-nearest neighbors with expected complexity of O(log(n)), where n is the number of points. In practice, when the dimensionality of the elements is too large, its performance drops and becomes comparable to that of a linear search; this problem was shown for image patches in (Kumar et al., 2008). This is indeed the case for image and video patches, so we rule out exact search and settle for the so-called first bin heuristic: the candidates for the k-nearest neighbors are taken only from the bin of the query patch. On its own, the first bin heuristic does not suffice to provide good quality matches, but this will be solved by combining it with other heuristics that exploit the fact that patches lie on images.

3.2 Partition Tree Search with Local

Reﬁnement

We propose to use the first bin search in a partition tree (or a forest) to obtain a first set of m ≈ k initial candidate matches, and to refine this set of candidates as follows: for each element of the candidate list, we search in a small local region in the image centered at the candidate. We call the resulting strategy Partition Tree with Local Refinement (PTLR). The complete pseudocode for this technique is presented in Algorithm 2.
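The local refinement stage can be sketched as below. This is an illustrative sketch rather than the authors' code: the names `patch_at` and `local_refinement` are hypothetical, the candidates are assumed to come from any first bin search, and the window is taken in the frame of each candidate only, extending `kappa // 2` pixels on each side (so an odd `kappa` gives exactly a κ × κ window).

```python
import numpy as np

def patch_at(video, pos, s):
    """Flattened s x s patch of a (T, H, W) video with top-left origin pos = (t, y, x)."""
    t, y, x = pos
    return video[t, y:y + s, x:x + s].ravel()

def local_refinement(video, query, candidates, k, s, kappa):
    """Refine candidate matches by searching a window around each of them.

    video      : (T, H, W) array.
    query      : flattened s x s query patch.
    candidates : iterable of (t, y, x) patch origins, e.g. from a first bin search.
    Returns the k best (squared distance, position) pairs, sorted by distance.
    """
    T, H, W = video.shape
    half = kappa // 2
    best = {}
    for t, cy, cx in candidates:
        for y in range(max(cy - half, 0), min(cy + half, H - s) + 1):
            for x in range(max(cx - half, 0), min(cx + half, W - s) + 1):
                pos = (t, y, x)
                if pos not in best:  # windows overlap: compute each distance once
                    diff = patch_at(video, pos, s) - query
                    best[pos] = float(diff @ diff)
    ranked = sorted((d, pos) for pos, d in best.items())
    return ranked[:k]
```

Because each candidate is refined independently, a query can recover a good match even when the tree only returned a patch that is a few pixels away from it.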

Note that the proposed approximate search heuristic can in principle be applied in conjunction with other data structures for nearest neighbor search such as those based on hashing (Andoni and Indyk, 2006; Andoni et al., 2014). We do not pursue this in the present work.

3.3 Search Parameters

The choice of the partition tree has a strong impact on

the performance. The most common partition tree is

the KD-tree (Bentley, 1975). However, it has been

shown that VP-trees (Yianilos, 1993) produce bet-

ter results when working with image patches (Kumar

et al., 2008).

The VP-tree is characterized by the split value be-

ing a hyperball and the split function being the indi-

cator function of this ball. The VP-tree splits the

Algorithm 2: PTLR search heuristic.

Require: $v$ an input video, $F$ a partition tree forest constructed with the patches from $v$, $p$ a query patch from $v$, $\kappa \times \kappa$ the size of the local search region

Ensure: A list of matches for $p$

1: Retrieve the list $\{\varphi_1, \ldots, \varphi_k\}$ of $k$ best matches from the forest $F$ using the retrieval algorithm from a partition tree forest

2: for $i = 1$ to $k$ do

3: Search in the image region of size $\kappa \times \kappa$ centered at $\varphi_i$ for better matches

4: end for

5: return the list $\{\varphi_1, \ldots, \varphi_k\}$ of $k$ best matches after the update using the PTLR search

data set according to the distance of each point to a

vantage point. The vantage point is one of the data

points, chosen according to some criteria. By ran-

domizing the construction of these trees, forests can

be built, therefore improving the quality of the el-

ements retrieved using the ﬁrst bin search. Forests

of VP-trees where the vantage point is chosen at ran-

dom have been shown to have a good retrieval power

(O’Hara et al., 2013).
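A randomized VP-tree forest with first bin retrieval can be sketched as follows. This is a minimal illustration and not the implementation of (O’Hara et al., 2013) or of the authors: the dictionary-based node layout, the function names, and the choice of the median distance as ball radius (which yields two roughly equally sized subsets) are assumptions.

```python
import numpy as np

def build_vp_tree(points, indices, bin_size, rng):
    """Recursively build a VP-tree over points[indices]; leaves hold at most bin_size indices."""
    if len(indices) <= bin_size:
        return {"bin": indices}
    vantage = points[rng.choice(indices)]             # random vantage point
    dist = np.linalg.norm(points[indices] - vantage, axis=1)
    radius = np.median(dist)                          # split value: ball radius
    inside = dist <= radius                           # split function: ball indicator
    if inside.all() or not inside.any():              # degenerate split, stop here
        return {"bin": indices}
    return {"vantage": vantage, "radius": radius,
            "left": build_vp_tree(points, indices[inside], bin_size, rng),
            "right": build_vp_tree(points, indices[~inside], bin_size, rng)}

def first_bin(tree, query):
    """Descend to the leaf containing the query and return its indices."""
    while "bin" not in tree:
        inside = np.linalg.norm(query - tree["vantage"]) <= tree["radius"]
        tree = tree["left"] if inside else tree["right"]
    return tree["bin"]

def forest_first_bin_knn(points, forest, query, k):
    """k best candidates among the union of the query's bins over all trees."""
    cand = np.unique(np.concatenate([first_bin(t, query) for t in forest]))
    dist = np.linalg.norm(points[cand] - query, axis=1)
    order = np.argsort(dist)[:k]
    return cand[order], dist[order]
```

A forest is obtained by calling `build_vp_tree` several times with the same random generator, for example `forest = [build_vp_tree(points, np.arange(len(points)), bin_size, rng) for _ in range(4)]`. Since each tree draws different vantage points, the bins of a query differ from tree to tree and their union gives a richer candidate set than a single bin.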

For our experiments, we used a forest of four VP-trees, randomized as in (O’Hara et al., 2013). We set the size of the bins to 2n, where n is the number of nearest neighbors requested for the query. The trees are constructed using all patches in the video. For the query, the local refinement area is of size 8×8×3. We found these parameters to give a reasonable trade-off between computational cost and search accuracy. In the following, we use the abbreviation VPLR instead of PTLR as a reminder that the partition tree is specifically a VP-tree.

0

0.01

0.02

0.03

0.04

0.05

0.06

0 50 100 150 200 250 300

Globalexhaustivesearch

Localsearch

VP-treeforestsearch

VPLRsearch

Figure 2: Comparison of the quality of the matches. The

plots show the normalized distance to the ith nearest neigh-

bor i = 1,...,300, averaged over 1000 query patches sam-

pled randomly in the bus video.

In Figure 2 we compare the results obtained with


the local search, the “first bin” search, and the proposed VPLR search for the classical test sequence bus. The plot shows the Euclidean distances to the n = 300 nearest neighbors averaged over 1000 query patches randomly chosen in the same video. The patches are of size s × s with s = 10, RGB (thus their dimensionality is 300) and their distance is rescaled between 0 and 1 as $d(p,q) = \|p - q\| / (255\sqrt{3}\,s)$.
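The normalization can be checked with a short sketch. The function name `normalized_patch_distance` is hypothetical and pixel values are assumed to lie in [0, 255]: an s × s RGB patch has 3s² entries, so the largest possible Euclidean distance is 255·√(3s²) = 255√3·s.

```python
import numpy as np

def normalized_patch_distance(p, q, s):
    """Euclidean distance between two s x s RGB patches (values in [0, 255]), rescaled to [0, 1]."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    return np.linalg.norm(p - q) / (255.0 * np.sqrt(3.0) * s)
```

With this scaling, an all-black and an all-white patch are at distance 1, and identical patches are at distance 0.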

The ﬁrst conclusion is that the local exhaustive

search ﬁnds worse matches than the approximate

search methods (the search window of the local search

is of size 45 ×45 ×5). The best method is the VPLR

search, performing much better than the basic VP-

forest “ﬁrst bin” approximate search. This result

shows the effectiveness of the proposed search heuris-

tic. We computed the corresponding plot on the other

videos and always found the same qualitative behav-

ior.

It is also interesting to visualize the position of the

matches found by each method in the spatio-temporal

domain of the video. Figure 1 presents the position of

the nearest patches found for a speciﬁc query in the

bus video. Note how the matches found by searching

globally are organized in trajectories.

For the same parameters, the first step of the algorithm requires 10125 distance computations per query with the local search (45 × 45 × 5 = 10125) and around 20000 with the VPLR search. A local search region with an equivalent cost would be close to 63 × 63 × 5.

4 EXPERIMENTAL RESULTS

We evaluated the effect of the global search on the

BM3D and NL-Bayes image denoising algorithms.

Since the source code of BM3D is not public we used the implementation available in (Lebrun, 2012). We adapted it to process image sequences and modified only the patch search as explained in §2. For NL-Bayes (referred to as NLB in the following), we built our implementation upon (Lebrun et al., 2013b), and modified it to limit the rank of the a priori covariance matrix (see §2).

For each method we considered the three versions

depending on the type of search used in each step:

the local and global versions use the correspond-

ing search in both denoising steps, and an additional

mixed approach, which uses the local search in the

ﬁrst step and the global one in the second. The reason

for this will be explained later.

We compared these methods against V-BM3D

(Dabov et al., 2007b), V-BM4D (Maggioni et al.,


2012b) and SPTWO (Buades et al., 2016), which rep-

resent the current state-of-the-art in video denoising.

Regarding the parameters, we considered 2D

patches of size 10 ×10 for NL-Bayes and 8 ×8 for

BM3D. For the local search we used a window of size

45 ×45 ×5 for NL-Bayes and 32 ×32 ×5 for BM3D.

The remaining parameters for NL-Bayes are the number of similar patches used in each step, $n_1$ and $n_2$, and the maximum ranks of the a priori covariance matrix, $r_1$ and $r_2$. For BM3D we also needed to specify $n_1$ and $n_2$, in addition to the hard threshold in the first step, $\lambda$, as well as the distance thresholds in both steps, $\tau_1$ and $\tau_2$. The values for $\lambda$, $\tau_1$ and $\tau_2$ are the same as the ones in V-BM3D. The rest of the parameters were tuned by optimizing the PSNR on a training set consisting of short videos. The optimal parameters depend on the noise level. Table 1 summarizes the parameters as a function of the noise level.

Figure 3: Comparison of the PSNR frame by frame for different methods on the grayscale bus sequence with noise 10. Curves shown: VBM3D, VBM4D, NL-Bayes local, NL-Bayes global, BM3D local, BM3D mix, SPTWO.

Figure 4: Comparison of the PSNR frame by frame for different methods on the color mobile sequence with noise 20. Curves shown: VBM4D, NL-B local, NL-B global, BM3D local, BM3D mix.

These patches are somewhat larger than the typ-

ical sizes used by patch-based methods. The reason

for this is that the global search is more sensitive to

the noise in the patch distance. Consider for example

a noisy ﬂat image. Since the image is ﬂat, the closest


Table 1: Parameters used for each algorithm.

Method                  σ     n_1           n_2           r_1                   r_2

NLB-local grayscale     all   160           40 + (σ − 6)  max{8, 48 − (σ − 6)}  max{8, 16 − (σ − 6)}

NLB-global grayscale    ≤ 10  100           60            16                    16

NLB-global grayscale    > 10  100           60            16                    8

NLB-local/global color  all   350           150           full rank             40 − (σ − 10)/3

BM3D-mix                ≤ 20  4 × 2^(σ/10)  32            -                     -

BM3D-mix                > 20  32            32            -                     -

BM3D-global             all   32            32            -                     -

neighbors to a patch are the ones with the closest noise

pattern. If enough nearest neighbors share the same

noise pattern, it will be interpreted as a signal com-

ponent and will not be ﬁltered out. The global search

increases the probability of ﬁnding a large number of

matches with a similar noise pattern. In practice, this

becomes an issue for patches in homogeneous regions

and high noise values. This problem can be mitigated

by involving a larger number of similar patches. An

interesting aspect of global search is that it is still pos-

sible to ﬁnd many similar patches even for large patch

sizes.

Our quantitative comparison criterion was the PSNR. We first evaluated the effect of the global search for NL-Bayes and BM3D. Tables 2 and 3 show the gain in PSNR with a global search, on grayscale and color sequences. In almost all cases the global search performed significantly better than the local search for NL-Bayes. This is also true, but to a lesser extent, for BM3D. For NL-Bayes the highest gain was obtained using the global search in both steps of the denoising algorithm: the average gain between NLB-global and NLB-local is around 1 dB for grayscale sequences and 1.5 dB for color sequences. This gain is consistent across the different noise levels we used in our tests. For BM3D the best alternative is BM3D-mix, which uses the global search only in the second step. The performance of BM3D-global is superior for σ = 10, but drops severely when the noise increases, becoming comparable to or even worse than BM3D-local for the highest levels of noise.
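For reference, the PSNR used as comparison criterion can be computed with the standard definition sketched below. The function name `psnr` and the peak value of 255 (8-bit data) are assumptions; the paper does not spell out its exact evaluation code.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)
```

Since the PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to reducing the mean squared error by a factor of 10^0.1 ≈ 1.26, that is, by roughly 21%.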

A possible reason for this is that BM3D uses a

much smaller number of similar patches n than NL-

Bayes. As explained before, for patches with low

SNR, the global search increases the risk of ﬁnding

a set of nearest neighbors sharing a similar noise pat-

tern, particularly for small n.

The color sequences are from https://media.xiph.org/video/derf/. The grayscale sequences are from http://www.cs.tut.fi/~foi/GCF-BM3D/, except for football and mobile, which have been obtained by averaging the channels from the corresponding RGB sequences.

Note also that the performance of BM3D is worse than that of NL-Bayes, and in most cases worse than the authors’ implementation, denoted V-BM3D (see Table 4). Our version of BM3D is an adaptation to video of the one published in (Lebrun, 2012). In particular, the search strategies of our BM3D-local and V-BM3D differ, since V-BM3D uses the predictive block matching described in §2.

We compared the performance of NLB-local and NLB-global with the state-of-the-art methods V-BM3D (Dabov et al., 2007b) and V-BM4D (Maggioni et al., 2012b) for grayscale (Table 4) and color (Table 5) videos. The sequences used in these tables were the same as in Tables 2 and 3. The results of V-BM3D and V-BM4D were computed using the authors’ implementation.

For grayscale sequences and σ = 10, NLB-global has the best performance. The exception is tennis, where both NLB variants show problems in reconstructing the texture of a wallpaper present in the scene (possibly due to the use of patches that are too large). On average NLB-global has a PSNR 0.78 dB higher than V-BM4D. When the noise increases, the gap between NLB-global and the V-BMxD methods closes. For most sequences, better results can be obtained with NLB using a larger patch for higher noise levels. This suggests that the problem comes from the distance estimation in these high noise cases.

We also include a comparison with SPTWO in Table 6. We computed the result of our algorithm for some of the sequences used in (Buades et al., 2016). Note that the sequences in Table 6 have 30 frames and that the values shown correspond to the PSNR of the central frame of the sequence. The results depend largely on the sequence. In particular SPTWO performs better on tennis and bus. The fact that the bus sequence has very fast motion which can be easily estimated might explain the performance gap in favor of motion estimation on this sequence.

For V-BM3D we show only grayscale results since

there are no Linux binaries for the color version of V-

BM3D.

http://www.cs.tut.fi/~foi/GCF-BM3D/


Table 2: Comparison between search strategies on grayscale sequences. For each sequence and noise level, we show the

PSNR obtained with the local search, and the difference in PSNR between each global search strategy and the local search.

σ Method Bus Fore. Sales. Tennis Foot. Mobi. Ave.

NLB-local 34.85 36.33 35.87 33.94 35.29 34.29 35.10 ± 0.92

NLB-mix 0.54 0.36 0.90 0.42 0.06 1.30 0.60 ± 0.44

NLB-global 0.94 0.50 2.01 0.64 -0.02 1.60 0.95 ± 0.75

BM3D-local 34.25 35.77 35.79 33.55 35.15 32.97 34.58 ± 1.18

BM3D-mix 0.15 0.55 1.08 0.09 -0.08 0.81 0.43 ± 0.46

BM3D-global 0.09 0.92 1.66 -0.03 -0.19 1.54 0.67 ± 0.82

NLB-local 30.75 32.59 32.06 30.12 31.36 29.80 31.11 ± 1.09

NLB-mix 0.53 0.62 1.15 0.50 0.11 1.91 0.80 ± 0.64

NLB-global 0.70 0.69 1.96 0.70 -0.05 2.46 1.08 ± 0.94

BM3D-local 30.28 32.17 31.84 29.90 31.23 29.02 30.74 ± 1.21

BM3D-mix 0.17 0.52 0.74 0.19 0.05 0.81 0.41 ± 0.32

BM3D-global 0.05 0.50 0.78 0.13 -0.14 1.52 0.47 ± 0.61

NLB-local 28.46 30.53 29.86 27.99 29.26 26.95 28.84 ± 1.30

NLB-mix 0.46 0.06 0.55 0.63 -0.14 2.24 0.63 ± 0.84

NLB-global 0.41 0.04 0.84 0.73 -0.33 2.68 0.73 ± 1.05

BM3D-local 28.06 29.92 29.38 28.12 29.18 26.47 28.52 ± 1.24

BM3D-mix 0.23 0.48 0.63 0.45 0.15 0.65 0.43 ± 0.20

BM3D-global 0.00 0.07 0.38 0.34 -0.19 0.92 0.25 ± 0.39

Table 3: Comparison between search strategies on color sequences. For each sequence and noise level, we show the PSNR

obtained with the local search, and the difference in PSNR between each global search strategy and the local search.

σ Method Bus City Cont. Mobile Tennis Fore. Coast. Ave.

NLB-local 36.47 37.35 38.29 34.76 35.30 38.34 36.60 36.73 ± 1.38

NLB-mix 0.64 1.37 1.16 1.46 0.74 1.18 0.52 1.01 ± 0.37

NLB-global 0.83 2.06 1.72 2.25 0.97 1.64 0.75 1.46 ± 0.61

BM3D-local 35.57 36.50 37.26 33.59 34.61 37.60 35.73 35.84 ± 1.43

BM3D-mix 0.22 0.73 1.08 0.44 0.40 0.57 0.22 0.52 ± 0.31

BM3D-global 0.12 1.02 1.44 0.76 0.19 0.70 0.21 0.63 ± 0.49

NLB-local 32.42 33.31 34.47 30.74 31.52 35.09 32.69 32.89 ± 1.54

NLB-mix 0.73 1.76 1.77 1.58 0.67 1.04 0.67 1.17 ± 0.51

NLB-global 0.90 2.34 2.34 2.58 0.84 1.37 0.86 1.60 ± 0.79

BM3D-local 31.72 32.75 33.68 29.75 30.87 34.39 31.99 32.16 ± 1.60

BM3D-mix 0.21 0.71 1.30 0.55 0.24 0.72 0.26 0.57 ± 0.39

BM3D-global -0.01 0.43 1.68 1.21 -0.04 0.62 0.17 0.58 ± 0.65

NLB-local 28.52 29.24 30.79 26.62 28.25 32.18 29.09 29.24 ± 1.80

NLB-mix 0.71 1.32 2.33 1.93 0.64 0.83 0.75 1.22 ± 0.67

NLB-global 0.73 1.28 2.78 2.93 0.59 0.82 0.82 1.42 ± 1.00

BM3D-local 27.89 28.45 30.23 26.07 27.62 31.04 28.34 28.52 ± 1.66

BM3D-mix 0.35 0.48 1.49 0.80 0.53 1.00 0.47 0.73 ± 0.40

BM3D-global -0.13 -0.54 1.79 1.28 0.25 0.54 0.11 0.47 ± 0.81

Two examples of denoised results are presented in

Figures 5 and 6. Comparing the different results of

denoising, we can see that the VPLR search allows

a better detail reconstruction. In particular, in Fig-

ure 5 the numbers of the calendar do not show a blur

around them compared to the other methods based on


Table 4: Comparison with V-BM3D and V-BM4D on grayscale sequences. PSNR of the full sequence. See text for details.

Results with a star were computed using the binary provided by the author.

σ Method Bus Fore. Sales. Tennis Foot. Mobi. Ave.

V-BM3D* 33.32 36.02 37.21 34.68 34.82 34.09 35.02 ± 1.39

V-BM4D-mp* 33.85 36.36 37.48 34.78 34.95 34.11 35.26 ± 1.40

NLB-local 34.85 36.33 35.87 33.94 35.29 34.29 35.09 ± 0.91

NLB-global 35.79 36.83 37.88 34.58 35.27 35.89 36.04 ± 1.07

V-BM3D* 29.57 32.87 34.04 31.20 31.04 30.35 31.51 ± 1.65

V-BM4D-mp* 30.00 33.11 33.46 30.70 31.06 30.49 31.47 ± 1.45

NLB-local 30.75 32.59 32.06 30.12 31.36 29.80 31.11 ± 1.09

NLB-global 31.45 33.28 34.02 30.82 31.31 32.26 32.19 ± 1.24

V-BM3D* 27.59 30.85 31.68 29.22 29.04 27.85 29.37 ± 1.62

V-BM4D-mp* 27.96 31.06 31.02 28.74 28.98 27.99 29.29 ± 1.41

NLB-local 28.46 30.53 29.86 27.99 29.26 26.95 28.84 ± 1.30

NLB-global 28.87 30.57 30.70 28.72 28.93 29.63 29.57 ± 0.88

Table 5: Comparison with V-BM4D on color sequences. PSNR of the full sequence. See text for details. Results with a star were computed using the binary provided by the author.

σ   Method        Bus     City    Cont.   Mobile  Tennis  Fore.   Coast.  Ave.

    V-BM4D-mp*    35.39   37.14   38.78   34.18   35.91   37.95   36.05   36.49 ± 1.57
    NLB-local     36.47   37.35   38.29   34.76   35.30   38.34   36.60   36.73 ± 1.38
    NLB-global    37.30   39.41   40.01   37.01   36.27   39.98   37.35   38.19 ± 1.56

    V-BM4D-mp*    31.35   33.41   34.94   30.47   31.99   34.53   32.14   32.69 ± 1.66
    NLB-local     32.42   33.31   34.47   30.74   31.52   35.09   32.69   32.89 ± 1.54
    NLB-global    33.32   35.65   36.81   33.32   32.36   36.46   33.55   34.50 ± 1.77

    V-BM4D-mp*    29.04   31.04   32.63   28.35   29.73   32.54   29.97   30.47 ± 1.66
    NLB-local     30.10   30.92   32.32   28.33   29.53   33.37   30.55   30.73 ± 1.69
    NLB-global    30.92   32.79   34.97   31.17   30.24   34.42   31.38   32.27 ± 1.83

    V-BM4D-mp*    27.44   29.31   30.94   26.79   28.15   31.08   28.49   28.89 ± 1.65
    NLB-local     28.52   29.24   30.79   26.62   28.25   32.18   29.09   29.24 ± 1.80
    NLB-global    29.25   30.52   33.57   29.55   28.84   33.00   29.91   30.66 ± 1.87

    V-BM4D-mp*    26.24   27.97   29.60   25.52   27.03   29.90   27.34   27.66 ± 1.63
    NLB-local     27.33   27.98   29.59   25.29   27.31   31.22   27.99   28.10 ± 1.88
    NLB-global    27.99   28.77   32.39   28.22   27.86   31.87   28.79   29.41 ± 1.90

Table 6: Comparison with SPTWO. As in (Buades et al., 2016), only the first 30 frames of each sequence are considered, and the PSNRs shown correspond to frame 15 of each sequence (indexing starts at 1). The three values in each cell correspond to SPTWO, NLB-local and NLB-global, in that order.

σ    Bus                     Tennis                  Salesman                Bike                    Average

10   36.07 / 34.89 / 35.60   34.69 / 32.86 / 34.31   36.38 / 35.92 / 37.10   36.74 / 37.10 / 38.30   35.97 / 35.19 / 36.33
20   32.24 / 30.75 / 31.02   30.59 / 28.23 / 28.71   32.95 / 32.02 / 34.45   33.01 / 33.39 / 35.56   32.20 / 31.10 / 32.44
30   30.05 / 28.48 / 28.87   27.48 / 26.24 / 26.63   30.95 / 29.92 / 31.49   31.62 / 31.17 / 33.28   30.02 / 28.95 / 30.07

a local search. In the grayscale bus example, the improvement is most visible on the woman inside the bus, who is more distinct with the global search than with the local search, and on the advertisement, where most details of the singer's face are better reconstructed.

In Section 3.3, we briefly discussed the computational complexity (in number of distance computations) of each search method per query. When these searches are integrated into the denoising algorithm, the full computation time (including the construction of the VP-tree) is of the same order of magnitude as with the local search. Nevertheless, NL-Bayes based methods are moderately slower than V-BM3D and V-BM4D when using the "normal profile".

Figure 5: Results of denoising for mobile (zoom on frame 37, noise 20). Top: ground truth, NL-Bayes local, BM3D local, V-BM4D; bottom: noisy, NL-Bayes global and BM3D mix.

Figure 6: Results of denoising for grayscale bus (zoom on frame 70, noise 20). Top: ground truth, NL-Bayes local, BM3D local, V-BM4D; bottom: noisy, NL-Bayes global, BM3D mix and V-BM3D.

5 CONCLUSIONS

We studied the performance gain obtained by expanding the local patch search into a global one for patch-based video denoising algorithms. To the best of our knowledge, this is the first time that denoising results using global patch search have been reported on videos with hundreds of frames. With the global search, the patches found can follow long trajectories in the video, thus fully benefiting from the temporal redundancy of videos.

Our analysis of the most common patch search algorithms showed that an approach based on a global tree structure, more specifically a VP-tree, performed very well compared to the local search. Exact global search in the VP-tree is still too costly for the denoising application, which is why we proposed a simple heuristic for efficient approximate search, the VPLR search (VP-tree search with local refinement).
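For readers unfamiliar with the data structure, the following is a minimal sketch of a VP-tree (Yianilos, 1993) with exact k-nearest-neighbor search, that is, the costly baseline mentioned above and not the VPLR heuristic itself, whose local refinement step is not reproduced here. The function names, the tuple-based node layout, the random choice of vantage point and the use of the Euclidean distance on flattened patches are all assumptions made for illustration.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two flattened patches (tuples of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_vp_tree(points, rng):
    """Build a VP-tree; each node is (vantage, radius, inside, outside)."""
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return (vantage, 0.0, None, None)
    dists = [dist(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]  # median distance to the vantage point
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return (vantage, radius, build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def knn_search(node, query, k, heap=None):
    """Exact k-NN search; `heap` stores (-distance, point) for the k best so far."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    vantage, radius, inside, outside = node
    d = dist(query, vantage)
    if len(heap) < k:
        heapq.heappush(heap, (-d, vantage))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, vantage))
    # Visit the side containing the query first, then the other side only if
    # the triangle inequality cannot rule it out.
    first, second = (inside, outside) if d < radius else (outside, inside)
    knn_search(first, query, k, heap)
    tau = -heap[0][0] if len(heap) == k else math.inf  # current k-th best distance
    if abs(d - radius) <= tau:
        knn_search(second, query, k, heap)
    return heap
```

A call such as `knn_search(build_vp_tree(patches, random.Random(0)), query, 5)` returns the five nearest patches together with their negated distances. The pruning test on `abs(d - radius)` is what lets the search skip whole subtrees; in high dimension it succeeds less often, which is consistent with the need for an approximate search.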


We then applied it to extend BM3D and NL-Bayes, two image denoising algorithms, to video. We obtained a significant boost in denoising performance. This boost comes at a cost only slightly higher than that of a local exhaustive search, including the time spent building the tree, thanks to an easy parallelization.

Recent contributions in video denoising advocate the use of 3D patches as a mechanism to impose temporal consistency in the video (Protter and Elad, 2009; Liu and Freeman, 2010a; Maggioni et al., 2012a). Yet, in this work we showed that state-of-the-art results can be obtained with 2D patches, using global search. The results obtained are visually better frame-by-frame, but can suffer from a flickering artifact due to the lack of temporal consistency. This is most noticeable for higher values of noise. Ongoing work focuses on extending the current results to 3D patches and video-specific algorithms. One of the current limiting factors associated with the global search is that it increases the risk of matching the noise pattern for patches with low SNR. We were able to alleviate this problem in most cases by using large 2D patches, but this causes problems with random, low-contrast textures, which are better denoised with small patches. 3D patches can reduce the spatial patch size while still keeping accurate distances (same dimension as the 2D patches), and may therefore be more appropriate for these types of textures.
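The role of the patch dimension in the reliability of distances can be made explicit with a short calculation. This is only a sketch, under the assumption of additive white Gaussian noise of standard deviation σ that is independent across pixels and patches; the symbols p, q (clean patches), n, m (noise realizations) and d (number of pixels per patch) are introduced here for illustration.

```latex
\begin{align*}
\tilde p &= p + n, \qquad \tilde q = q + m, \qquad
  n, m \sim \mathcal{N}(0, \sigma^2 I_d) \ \text{independent}, \\
\mathbb{E}\,\|\tilde p - \tilde q\|^2 &= \|p - q\|^2 + 2 d \sigma^2, \\
\operatorname{Var}\,\|\tilde p - \tilde q\|^2 &= 8 \sigma^2 \|p - q\|^2 + 8 d \sigma^4 .
\end{align*}
```

For two noisy copies of the same clean patch (p = q), the normalized distance (1/d)‖p̃ − q̃‖² therefore has mean 2σ² and standard deviation 2σ²√(2/d). Its fluctuation decreases as 1/√d, whether the d pixels are arranged in 2D or in 3D.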

The proposed heuristics for approximate global patch search are not limited to the denoising application, and could be useful for other applications that require a large number of nearest neighbors but not a dense or semi-dense NNF.

ACKNOWLEDGEMENTS

This work is supported by the "IDI 2016" project funded by the IDEX Paris-Saclay, ANR-11-IDEX-0003-02. This work is also partly funded by BPIFrance and Région Ile de France in the FUI 18 Plein Phare project, and by the Office of Naval Research (ONR grant N00014-14-1-0023).

REFERENCES

Andoni, A. and Indyk, P. (2006). Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on, pages 459–468. IEEE.

Andoni, A., Indyk, P., Nguyen, H. L., and Razenshteyn, I. (2014). Beyond locality-sensitive hashing. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1018–1028. SIAM.

Arias, P. and Morel, J.-M. (2015). Towards a Bayesian video denoising method. In ACIVS, volume 9386 of Lecture Notes in Computer Science, pages 107–117. Springer.

Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. (2009). PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics-TOG, 28(3):24.

Barnes, C., Shechtman, E., Goldman, D. B., and Finkelstein, A. (2010). The generalized PatchMatch correspondence algorithm. In Computer Vision–ECCV 2010, pages 29–43. Springer.

Barnes, C., Zhang, F.-L., Lou, L., Wu, X., and Hu, S.-M. (2015). PatchTable: Efficient patch queries for large datasets and applications. In ACM Transactions on Graphics (Proc. SIGGRAPH).

Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509–517.

Buades, A., Lisani, J. L., and Miladinović, M. (2016). Patch-based video denoising with optical flow estimation. IEEE Transactions on Image Processing, 25(6):2573–2586.

Dabov, K., Foi, A., and Egiazarian, K. (2007a). Video denoising by sparse 3D transform-domain collaborative filtering. In EUSIPCO, pages 145–149.

Dabov, K., Foi, A., and Egiazarian, K. (2007b). Video denoising by sparse 3D transform-domain collaborative filtering. In Proc. 15th European Signal Processing Conference, volume 1, page 7.

Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. (2007c). Image denoising by sparse 3-D transform-domain collaborative filtering. Image Processing, IEEE Transactions on, 16(8):2080–2095.

Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K. (2007d). Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. on IP, 16(8):2080–2095.

He, K. and Sun, J. (2012). Computing Nearest-Neighbor Fields via Propagation-Assisted KD-trees. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 111–118. IEEE.

Korman, S. and Avidan, S. (2011). Coherency sensitive hashing. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1607–1614. IEEE.

Kumar, N., Zhang, L., and Nayar, S. (2008). What is a good nearest neighbors algorithm for finding similar patches in images? In Computer Vision–ECCV 2008, pages 364–378. Springer.

Lebrun, M. (2012). An Analysis and Implementation of the BM3D Image Denoising Method. Image Processing On Line, 2:175–213.

Lebrun, M., Buades, A., and Morel, J.-M. (2013a). A Nonlocal Bayesian Image Denoising Algorithm. SIAM Journal on Imaging Sciences, 6(3):1665–1688.

Lebrun, M., Buades, A., and Morel, J.-M. (2013b). Implementation of the “Non-Local Bayes” (NL-Bayes) Image Denoising Algorithm. Image Processing On Line, 3:1–42.

Li, W., Zhang, J., and Dai, Q.-H. (2011). Video denoising using shape-adaptive sparse representation over similar spatio-temporal patches. Signal Processing: Image Communication, 26:250–265.

Liu, C. and Freeman, W. T. (2010a). A high-quality video denoising algorithm based on reliable motion estimation. In ECCV, pages 706–719.

Liu, C. and Freeman, W. T. (2010b). A high-quality video denoising algorithm based on reliable motion estimation. In Computer Vision–ECCV 2010, pages 706–719. Springer.

Maggioni, M., Boracchi, G., Foi, A., and Egiazarian, K. (2012a). Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms. IEEE Transactions on Image Processing, 21(9):3952–3966.

Maggioni, M., Boracchi, G., Foi, A., and Egiazarian, K. (2012b). Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms. Image Processing, IEEE Transactions on, 21(9):3952–3966.

Mairal, J., Bach, F., Ponce, J., Sapiro, G., and Zisserman, A. (2009). Non-local sparse models for image restoration. In Computer Vision, 2009 IEEE 12th International Conference on, pages 2272–2279.

O’Hara, S., Draper, B., et al. (2013). Are you using the right approximate nearest neighbor algorithm? In Applications of Computer Vision (WACV), 2013 IEEE Workshop on, pages 9–14. IEEE.

Olonetsky, I. and Avidan, S. (2012). TreeCANN - kd-tree Coherence Approximate Nearest Neighbor algorithm. In Computer Vision–ECCV 2012, pages 602–615. Springer.

Protter, M. and Elad, M. (2009). Image sequence denoising via sparse and redundant representations. IEEE Transactions on Image Processing, 18(1):27–35.

Yianilos, P. N. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. In SODA, volume 93, pages 311–321.

VISAPP 2017 - International Conference on Computer Vision Theory and Applications
