NOISE REDUCTION AND VOICE SEPARATION ALGORITHMS
APPLIED TO WOLF POPULATION COUNTING
B. Dugnol, C. Fern´andez, G. Galiano and J. Velasco
Dpto. de Matemticas, Universidad de Oviedo, c/ Calvo Sotelo s/n, 33007 Oviedo, Spain
Keywords:
Time-frequency distribution, instantaneous frequency, signal separation, noise reduction, chirplet transform,
partial differential equation, population counting.
Abstract:
We use signal and image theory based algorithms to produce estimations of the number of wolves emitting
howls or barks in a given field recording as an individuals counting alternative to the traditional trace collecting
methodologies. We proceed in two steps. Firstly, we clean and enhance the signal by using PDE based image
processing algorithms applied to the signal spectrogram. Secondly, assuming that the wolves chorus may be
modelled as an addition of nonlinear chirps, we use the quadratic energy distribution corresponding to the
Chirplet Transform of the signal to produce estimates of the corresponding instantaneous frequencies, chirp-
rates and amplitudes at each instant of the recording. We finally establish suitable criteria to decide how such
estimates are connected in time.
1 INTRODUCTION
Wolf is a protected specie in many countries around
the world. Due to their predator character and to
their proximity to human settlements, wolves often
kill cattle interfering in this way in farmers’ econ-
omy. To smooth this interference, authorities reim-
burse the cost of these lost to farmers. Counting the
population of wolves inhabiting a region is, therefore,
not only a question of biological interest but also of
economic interest, since authorities are willing to es-
timate the budget devoted to costs produced by wolf
protection, see for instance (Skonhoft, 2006). How-
ever, estimating the population of wild species is not
an easy task. In particular, for mammals, few and not
very precise techniques are used, mainly based on the
recuperation of field traces, such as steps, excrements
and so on. Our investigation is centered in what it
seems to be a new technique, based on signal and
image theory methods, to estimate the population of
species which fulfill two conditions: living in groups,
for instance, packs of wolves,and emitting some char-
acteristic sounds, howls and barks, for wolves. The
basic initial idea is to produce, from a given record-
ing, some time-frequencydistribution which allows to
identify the different howls corresponding to different
individuals by estimating the instantaneous frequency
(IF) lines of their howls.
Unfortunately, the real situation is somehow more
involved due mainly to the following two factors. On
one hand, since natural sounds, in particular wolf
howling, are composed by a fundamental pitch and
several harmonics, direct instantaneous frequency es-
timation of the multi-signal recording leads to an
over-counting of individuals since various IF lines
correspond to the same individual. Therefore, more
sophisticated methods are indicated for the analysis
of these signals, methods capable of extracting addi-
tional information such as the slope of the IF, which
allows to a better identification of the harmonics of a
given fundamental tone. The use of a Chirplet type
transform (S. Mann, 1995; L. Angrisani, 2002) is in-
vestigated in this article, although an equivalent for-
mulation in terms of the Fourier fractional transform
(H. M. Ozaktas, 2001) could be employed as well.
On the other hand, despite the quality of recording
devices, field recordings are affected for a variety of
undesirable signals which range from low amplitude
broad spectrum long duration signals, like wind, to
signals localized in time, like cattle bells, or localized
77
Dugnol B., Fernández C., Galiano G. and Velasco J. (2008).
NOISE REDUCTION AND VOICE SEPARATION ALGORITHMS APPLIED TO WOLF POPULATION COUNTING.
In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, pages 77-84
DOI: 10.5220/0001057000770084
Copyright
c
SciTePress
in spectrum, like car engines. Clearly, the addition
of all these signals generates an unstructured noise
in the background of the wolves chorus which im-
pedes the above mentioned methods to work properly,
and which should be treated in advance. We accom-
plish this task by using PDE-based techniques which
transforms the image of the signal spectrogram into
a smoothed and enhanced approximation to the reas-
signed spectrogram introduced in (K. Kodera, 1978;
F. Auger, 1995) as a spectrogram readability improv-
ing method.
2 SIGNAL ENHANCEMENT
In previous works (B. Dugnol, 2007a; B. Dugnol,
2007d), we investigated the noise reduction and edge
(IF lines) enhancement on the spectrogram image by a
PDE-based image processing algorithm. For a clean
signal, the method allows to produce an approxima-
tion to the reassigned spectrogram through a pro-
cess referred to as differential reassignment, and for a
noisy signal this process is modified by the introduc-
tion of a nonlinear operator which induces isotropic
diffusion (noise smoothing) in regions with low gra-
dient values, and anisotropic diffusion (edge-IF en-
hancement) in regions with high gradient values.
Let x L
2
(R) denote an audio signal and consider
the Short Time Fourier transform (STFT)
G
ϕ
(x;t,ω) =
Z
R
x(s)ϕ(s t)e
iωs
ds, (1)
corresponding to the real, symmetric and normalized
window ϕ L
2
(R). The energy density function or
spectrogram of x corresponding to the window ϕ is
given by
S
ϕ
(x;t,ω) = |G
ϕ
(x;t,ω)|
2
, (2)
which may be expressed also as (Mallat, 1998)
S
ϕ
(x;t,ω) =
Z
R
2
WV(ϕ;
˜
t,
˜
ω)WV(x;t
˜
t,ω
˜
ω)d
˜
td
˜
ω,
(3)
with WV(y;·,·) denoting the Wigner-Ville distribu-
tion of y L
2
(R),
WV(y;t,ω) =
Z
R
y(t +
s
2
)y(t
s
2
)e
iws
ds.
The Wigner-Ville (WV) distribution has received
much attention for IF estimation due to its excel-
lent concentration and many other desirable math-
ematical properties, see (Mallat, 1998). However,
it is well known that it presents high amplitude
sign-varying cross-terms for multi-component signals
which makes its interpretation difficult. Expression
(3) represents the spectrogram as the convolution of
the WV distribution of the signal, x, with the smooth-
ing kernel defined by the WV distribution of the
window, ϕ, explaining the mechanism of attenuation
of the cross-terms interferences in the spectrogram.
However, an important drawback of the spectrogram
with respect to the WV distribution is the broadening
of the IF lines as a direct consequence of the smooth-
ing convolution. To override this inconvenient, it was
suggested in (K. Kodera, 1978) that instead of assign-
ing the averaged energy to the geometric center of the
smoothing kernel, (t,ω), as it is done for the spectro-
gram, one assigns it to the center of gravity of these
energy contributions, (
ˆ
t,
ˆ
ω), which is certainly more
representative of the local energy distribution of the
signal. As deduced in (F. Auger, 1995), the gravity
center may be computed by the following formulas
ˆ
t(x;t,ω) = t
G
Tϕ
(x;t,ω)
G
ϕ
(x;t,ω)
,
ˆ
ω(x;t,ω) = ω +
G
Dϕ
(x;t,ω)
G
ϕ
(x;t,ω)
,
where the STFT’s windows in the numerators are
Tϕ(t) = tϕ(t) and Dϕ(t) = ϕ
(t). The reassigned
spectrogram, RS
ϕ
(x;t,ω), is then defined as the ag-
gregation of the reassigned energies to their cor-
responding locations in the time-frequency domain.
Observe that energy is conserved through the reas-
signment process. Other desirable properties, among
which non-negativity and perfect localization of lin-
ear chirps, are proven in (Auger, 1991). For our ap-
plication, it is of special interest the fact that the re-
allocation vector, r(t,ω) = (
ˆ
t(t,ω) t,
ˆ
ω(t,ω) ω),
may be expressed through a potential related to the
spectrogram (E. Chassandre-Mottin, 1997),
r(t,ω) =
1
2
log(S
ϕ
(x;t,ω)), (4)
when ϕ is a Gaussian window of unit variance. Let
τ 0 denote an artificial time and consider the dy-
namical expression of the reassignment given by
Φ(t,ω,τ) = (t,ω)+τr(t,ω) which, for τ = 0 to τ = 1,
connects the initial point (t,ω) with its reassigned
point (
ˆ
t,
ˆ
ω). Rewriting this expression as
1
τ
(Φ(t, ω,τ) Φ(0,ω,τ)) = r(t,ω),
and taking the limit τ 0, we may identify the dis-
placement vector r as the velocity field of the trans-
formation Φ. In close relation with this approach
is the process referred to as differential reassignment
(E. Chassandre-Mottin, 1997), defined as the transfor-
mation given by the dynamical system corresponding
BIOSIGNALS 2008 - International Conference on Bio-inspired Systems and Signal Processing
78
to such velocity field,
(
dχ
dτ
(t,ω,τ) = r(χ(t,ω,τ)),
χ(t, ω,0) = (t,ω),
(5)
for τ > 0. Observe that, in a first order approximation,
we still have that χ connects (t,ω) with some point in
a neighborhood of (
ˆ
t,
ˆ
ω), since
χ(t, ω,1) χ(t,ω, 0) + r(χ(t,ω,0))
= (t,ω) + r(t,ω) = (
ˆ
t,
ˆ
ω).
In addition, for τ , each particle (t,ω) con-
verges to some local extremum of the potential
log(S
ϕ
(x;·,·)), among them the maxima and ridges
of the original spectrogram. The conservative energy
reassignation for the differential reassignment is ob-
tained by solving the following problem for u(t, ω,τ)
and τ > 0,
u
∂τ
+ div(ur) = 0, (6)
u(·,·,0) = u
0
, (7)
where we introduced the notation u
0
= S
ϕ
(x;·,·) and,
consequently, r =
1
2
log(u
0
). Since in applications
both signal and spectrogram are defined in bounded
domains, we assume (6)-(7) to hold in a bounded
time-frequency domain, , in which we assume non
energy flow conditions on the solution and the data
u· n = 0, r· n = 0 on ∂Ω× R
+
, (8)
being n the unitary outwards normal to ∂Ω. Finally,
observe that the positivity of the spectrogram (Mal-
lat, 1998) and the fact that it is obtained from a
convolution with a C
kernel implies the regularity
u
0
,r C
and, therefore, problem (6)-(8) admits a
unique smooth solution.
As noted in (E. Chassandre-Mottin, 1997), differ-
ential reassignment can be viewed as a PDE based
processing of the spectrogram image in which the en-
ergy tends to concentrate on the initial image ridges
(IF lines). As mentioned above, our aim is not only
to concentrate the diffused IF lines of the spectrogram
but also to attenuate the noise present in our record-
ings. It is clear that noise may distort the reassigned
spectrogram due to the change of the energy distribu-
tion and therefore of the gravity centers of each time-
frequency window. Although even a worse situation
may happen to the differential reassignment, due to
its convergence to spectrogram local extrema (noise
picks among them) an intuitive way to correct this ef-
fect comes from its image processing interpretation.
As shown in (B. Dugnol, 2007a; B. Dugnol, 2007d),
when a strong noise is added to a clean signal better
results are obtained for approximating the clean spec-
trogram if we use a noise reduction edge enhancement
Spectrogram
Time
Frequency
0 0.5 1
300
750
1200
Image transformation
Time
Frequency
0 0.5 1
300
750
1200
Spectrogram
Image transformation
Figure 1: First row: Spectrogram and its transformation
with the PDE model. Subsequent plots: detail of the howl
contained within the range 300750 Hz. We observe the IF
concentration and smoothing effect of the PDE algorithm.
PDE based algorithm than if we simply threshold the
image spectrogram. This is due to the local applica-
tion of gaussian filters in regions of small gradients
(noise, among them) while anisotropic diffusion (in
the orthogonal direction to the gradient) is applied in
regions of large gradients (edges-IF lines). Therefore,
a possible way to improve the image obtained by the
differential reassigned spectrogram is modifying (6)
by adding a diffusive term with the mentioned prop-
erties.
Let us make a final observation before writing the
model we work with. In the derivation of both the re-
assigned and the differential reassigned spectrogram
the property of energy conservation is imposed, im-
plying that energy values on ridges increase. Indeed,
let B be a neighborhood of a point of maximum for
u
0
, in which divr = logu
0
< 0, and let (t
0
,ω
0
) B.
Let χ
0
(t,ω,τ) denote the characteristic defined by (5)
starting at (t
0
,ω
0
). Evaluating Eq. (6) along χ
0
we
obtain
d
dτ
u =
u
∂τ
+ r· u = udivr, (9)
implying that u experiments exponential increase in
NOISE REDUCTION AND VOICE SEPARATION ALGORITHMS APPLIED TO WOLF POPULATION COUNTING
79
B. For image processing, it is desirable the maximum
principle to hold, i.e., that the bounds minu
0
u
maxu
0
hold for any (t,ω,τ) × R
+
, ensuring that
the processed image lies within the range of image
definition ([0,255], usually). A simple way, which we
shall address, to ensure this property is by dropping
the right hand side term of Eq. (9), i.e., replacing Eq.
(6) by the transport equation
u
∂τ
+ r· u = 0. (10)
However, no energy conservation law will apply any-
more (note that u is constant along the character-
istics). The combination of the differential reas-
signment problem with the edge-detection image-
smoothing algorithm (L.
´
Alvarez, 1992) is written as
u
∂τ
+
ε
2
log(u
0
) · u g(|G
s
u|)A(u) = 0, (11)
in × R
+
, together with the boundary data (8) and
the initial condition (7). Parameter ε 0 allows us
to play with different balances between transport and
diffusion effects. The diffusion operator is given by
A(u) = (1h(|u|))u+h(|u|)
j=1,...,n
f
j
(
u
|u|
)
2
u
x
2
j
.
Let us briefly remind the properties and meaning of
the diffusive term components in equation (11):
Function G
s
is a Gaussian of variance s. The vari-
ance is a scale parameter which fixes the minimal
size of the details to be kept in the processed im-
age.
Function g is non-increasing with g(0) = 1 and
g() = 0. It is a contrast function, which allows
to decide whether a detail is sharp enough to be
kept.
The composition of G
s
and g on u rules the
speed of diffusion in the evolution of the image,
controlling the enhancement of the edges and the
noise smoothing.
The diffusion operator A combines isotropic and
anisotropic diffusion. The first smoothes the
image by local averaging while the second en-
forces the diffusion only on the orthogonal di-
rection to u (along the edges). More precisely,
for θ
j
= ( j 1) π/n, j = 1,... , n we define x
j
as the orthogonal to the direction θ
j
, i.e., x
j
=
tsinθ
j
+ ωcosθ
j
. Then, smooth non-negative
functions f
j
(cosθ,sinθ) are designed to be ac-
tive only when θ is close to θ
j
. Therefore, the
anisotropic diffusion is taken in an approximated
direction to the orthogonal of u. The combina-
tion of isotropic and anisotropic diffusions is con-
trolled by function h(s), which is nondecreasing
with
h(s) =
0 for s h
0
,
1 for s 2h
0
,
(12)
being h
0
the enhancement parameter.
In Fig. 1 we show an example of the outcome of our
algorithm for a signal composed by two howls. See
(B. Dugnol, 2007d; B. Dugnol, 2007c) for more de-
tails and other numerical experiments.
3 HOWL TRACKING AND
SEPARATION
A wolves chorus is composed, mainly, by howls and
barks which, from the analytical point of view, may
be regarded as chirp functions. The former has a long
time support and a small frequency range variation,
while the latter is almost punctually localized in time
but posses a large frequency spectrum. It is conve-
nient, therefore, adopting a parametric model to rep-
resent the wolveschorus as an addition of chirps given
by the function f : [0,T] C,
f(t) =
N
n=1
a
n
(t)exp[iφ
n
(t)], (13)
with T the length of the chorus emission, a
n
and φ
n
the chirps amplitude and phase, respectively, and with
N, the number of chirps contained in the chorus. We
notice that N is not necessarily the number of wolves
since, for instance, harmonics of a given fundamental
tone are counted separately.
To identify the unknowns N, a
n
and φ
n
we pro-
ceed in two steps. Firstly, for a time discretization of
the time interval [0,T], sayt
j
, for j = 0,..., J, we pro-
duce estimates of the amplitude a
n
(t
j
) and the phase
φ
n
(t
j
) of the chirps contained at such discrete times.
Secondly, we establish criteria which allow us decid-
ing if the computed estimates at adjacent times do be-
long to the same global chirp or do not.
For the first step we use the Chirplet transform de-
fined by
Ψf (t
o
,ξ,µ;λ) =
Z
f (t)ψ
t
o
,ξ,µ,λ
(t)dt, (14)
with the complex window ψ
t
o
,ξ,µ,λ
given by
ψ
t
o
,ξ,µ,λ
(t) = v
λ
(t t
o
)exp
h
i(ξt +
µ
2
(t t
o
)
2
)
i
.
(15)
Here, v L
2
(R) denotes a real window, v
λ
(·) =
v(·/λ), with λ > 0, and the parameters t
o
,ξ,µ R,
stand for time, instantaneous frequencyand chirp rate,
respectively. The quadratic energy distribution corre-
sponding to the chirplet transform (14) is given by
P
Ψ
f(t
o
,ξ,µ;λ) = |Ψf (t
o
,ξ,µ;λ)|
2
. (16)
BIOSIGNALS 2008 - International Conference on Bio-inspired Systems and Signal Processing
80
For a linear chirp of the form
f(t) = a(t) exp[i(
α
2
(t t
o
)
2
+ β(t t
o
) + γ)],
it is straightforward to prove that the energy dis-
tribution (16) has a global maximum at (α,β), al-
lowing us to determine the IF and chirp rate of a
given linear chirp by localizing the maxima of the en-
ergy distribution. For more general forms of mono-
component chirps we have the following localization
result (B. Dugnol, 2007b)
Theorem 1 Let f (t) = a(t)exp[iφ(t)], with a
L
2
(R) non-negative and φ C
3
(R). For all ε > 0
and ξ,µ R there exists L > 0 such that if λ < L then
P
Ψ
f (t
o
,ξ,µ;λ) 6 ε+ P
Ψ
f
t
o
,φ
(t
o
),φ
′′
(t
o
);λ
.
(17)
In addition,
lim
λ0
P
Ψ
f
t
o
,φ
(t
o
),φ
′′
(t
o
);λ
= a(t
o
)
2
. (18)
In other words, for a general mono-component chirp
the energy distribution maximum provides an arbi-
trarily close approximation to the IF and chirp rate
of the signal. Moreover, its amplitude may also be es-
timated by shrinking the window time support at the
maximum point.
Finally, for a multi-component chirp f (t) =
N
n=1
a
n
(t)exp[iφ
n
(t)] the situation is somehow more
involved since although the energy distribution still
has maxima at (φ
n
(t
o
),φ
′′
n
(t
o
)) for all n such that
a
n
(t
o
) 6= 0, these are now of local nature, and in fact,
spurious local maxima not corresponding to any chirp
may appear due to the energy interaction among the
actual chirps.
4 NUMERICAL EXPERIMENTS
According to the recording quality, we start our algo-
rithm enhancing the signal with the PDE algorithm
explained in Section 2 or directly with the separa-
tion algorithm introduced in Section 3. For details
about the implementation of the former, we refer the
reader to (B. Dugnol, 2007c). Following, we briefly
comment about the separation algorithm implementa-
tion. We start by computing the energy distribution,
P
Ψ
f (τ
m
,ξ,µ;λ) at a set of discrete times τ
m
= m τ
of constant time step, τ, and for a fixed window width
λ. Next we compute the maxima of the energy at each
of these times. When the signal is mono-component
or the various components of the signal are far from
each other relative to the window width, the maxima
of P
Ψ
correspond to some (φ
n
(τ
m
),φ
′′
n
(τ
m
)) which are
0.5 1 1.5 2 2.5
0
1000
2000
3000
4000
Time
Frequency
Figure 2: Above: STFT of a field recorded signal. Be-
low: quadratic energy distribution of the chirplet transform
at t
0
= 2. Maxima correspond to IF and chirp rate chirps
locations. We observe the different behavior in the ξ and µ
directions at these maxima.
then identified as the IF and chirp-rate of a chirp can-
didate. However, when multi-component signals are
close to each other or are crossing, some spurious lo-
cal maxima are produced which do not correspond to
any actual chirp. Therefore, some criterium must be
used to select the correct local maxima at each τ
m
.
Although we lack of an analytical proof, there are ev-
idences suggesting that maxima produced by chirps,
i.e., at points of the type (φ
n
(τ
m
),φ
′′
n
(τ
m
)), decrease
much faster in the ξ direction than in the µ direction,
see Fig. 2, a phenomenon that does not occur at spuri-
ous maxima. We use this fact to choose the candidates
first by selecting ξ
k
, for k = 1,. ..,K, which are max-
ima for
sup
µ
P
Ψ
f (τ
m
,ξ,µ;λ),
and, among them, selecting the maxima with re-
spect to µ of P
Ψ
f (τ
m
,ξ
k
,µ;λ). We finally establish
a threshold parameter to filter out possible local max-
NOISE REDUCTION AND VOICE SEPARATION ALGORITHMS APPLIED TO WOLF POPULATION COUNTING
81
ima located at points that do not correspond to any
φ
n
(τ
m
) but which are close to two of them. We set
this threshold such that the existence of two consecu-
tive maxima is avoided.
In this way we obtain, for each τ
m
, a set of points
(µ
i
m
,ξ
i
m
), for i
m
= 1,..., I
m
, which correspond to the
IF’s and chirp-rates of chirps with time support in-
cluding τ
m
.
The next step is the chirp separation. We note that
if the time step τ = τ
m+1
τ
m
is small enough, then
ξ
j
m+1
τ
µ
j
m+1
2
ξ
i
m
+ τ
µ
i
m
2
.
Introducing a new parameter, ν, we test this property
by imposing the condition
1
v
<
2ξ
j
m+1
τµ
j
m+1
2ξ
i
m
+ τµ
i
m
< ν, (19)
for two points to be in the same chirp. In the experi-
ments we take ν = 2
1/13
1.0548.
Finally, in the case in which test (19) is satisfied
by more than one point, i.e., when there exist points
ξ
j
m+1
,µ
j
m+1
and
ξ
k
m+1
,µ
k
m+1
such that (19) holds
for the same (ξ
i
m
,µ
i
m
), we impose a regularity cri-
terium and choose the point with a closer chirp-rate
to that of (ξ
i
m
,µ
i
m
). This is a situation typically aris-
ing at chirps crossings points.
Summarizing, the chirp separation algorithm is
implemented as follows:
Each point (ξ
i
1
,µ
i
1
), for i
1
= 1,. ..,I
1
, is assumed
to belong to a distinct chirp.
For k = 2,3,..., we use the described criteria to
decide if
ξ
i
k
,µ
i
k
, for : i
k
= 1, ...,I
k
, belongs to
an already detected chirp. On the contrary, it is
established as the starting point of a new chirp.
When the above iteration is finished and to avoid
artifacts due to numerical errors, we disregard
chirps composed by a unique point.
Finally, once the chirps are separated, we use the fol-
lowing approximation, motivated by Theorem 1, to
estimate the amplitude
a(τ
m
)
2
1
λ[bv(0)]
2
P
Ψ
f
t
o
,φ
(τ
m
),φ
′′
(τ
m
);λ
.
Again, to avoid artifacts due to numerical discretiza-
tion, we neglect portions of signals with an amplitude
lower than certain relative threshold , ε (0,1), of the
maximum amplitude of the whole signal, considering
that in this case no chirp is present.
4.1 Experiment 1. A Synthetic Signal
In this first experiment we test our algorithm with a
synthetic signal, f, composed by the addition of three
Time
Frequency
Spectrogram
0 2 4 6 8 10
0
250
500
750
1000
1250
1500
0 2 4 6 8 10
0
250
500
750
1000
1250
1500
Time
Frequency
0 2 4 6 8 10
0
250
500
750
1000
1250
1500
Time
Frequency
Figure 3: Spectrogram of the clean signal and results of
the chirp localization and separation algorithm for clean and
noisy signals, respectively.
nonlinear chirps (spectrogram shown in Fig. 3) and
with the same signal corrupted with an additive noise
of similar amplitude than that of f , i.e., with SNR= 0.
We used the same time step, τ = 0.2 sec, and
window width, λ = 0.1 sec, to process both signals,
while we set the relative threshold amplitude level to
ε = 0.01 for the clean signal and to ε = 0.1 for the
noisy signal. The results of our algorithm of denois-
ing, detection and separation is shown in Fig. 3. We
observe that for the clean signal all chirps are cap-
tured with a high degree of accuracy even at cross-
ing points. We also observe that the main effect of
noise corruption is the lose of information at chirps
BIOSIGNALS 2008 - International Conference on Bio-inspired Systems and Signal Processing
82
low amplitude range. However, the number of them
is correctly computed.
In Fig. 4 we show the amplitude, IF and chirp-
rate estimations of the chirp which is more affected by
the noise corruption, for both clean and noisy signals.
The main effect of noise corruption is observed in the
amplitude computation and in the lose of information
in the tails of the three quantities.
0 2 4 6 8 10
0
0.1
0.2
0.3
0.4
0.5
Chirp 3
Time
Analytic amplitude
0 2 4 6 8 10
0
0.1
0.2
0.3
0.4
0.5
Chirp 3
Time
Analytic amplitude
0 2 4 6 8 10
0
250
500
750
1000
1250
1500
Chirp 3
Time
Instantaneous frequency
0 2 4 6 8 10
−2000
−1000
0
1000
2000
Chirp 3
Time
Chirp−rate
Figure 4: Amplitude, IF and chirp rate for the clean signal
(left column) and noisy signal (right column). Solid lines
correspond to exact values and crosses to computed values.
1 2 3 4
0
500
1000
1500
2000
2500
Time
Frequency
Spectrogram
0 1 2 3
0
500
1000
1500
2000
2500
Time
Frequency
Howl−tracking result
1 1.5 2 2.5
0
500
1000
1500
2000
2500
Time
Frequency
Howl−tracking result
Figure 5: First row: spectrogram of the field recorded signal
utilized in Experiment 2. Second row: result of the chirp lo-
calization and separation algorithm. Third row: a zoom of
the previous plot showing six separated chirps correspond-
ing to five wolves howls.
4.2 Experiment 2. A Field Recorded
Wolves Chorus
In this experiment we analyze a rather complex signal
obtained from field recordings of wolves choruses in
wilderness, (L. Llaneza, ). Due to the noise present
in the recording, we first use the PDE algorithm to
enhance the signal and reduce the noise, see (B. Dug-
nol, 2007c) for details. For the separation algorithm,
we fixed the time step as τ = 0.03 sec, the relative am-
NOISE REDUCTION AND VOICE SEPARATION ALGORITHMS APPLIED TO WOLF POPULATION COUNTING
83
plitude threshold as ε = 0.01 and the window width as
λ = 0.0625 sec.
The algorithm output is composed by 32 chirps
which should correspond to the howls and barks (with
all their harmonics) emitted by the wolves along the
duration of the recording (about five sec). The result
is shown in Fig. 5. Since our aim is giving an estimate
of how many individuals are emitting in a recording,
we plot a zoom of the separating algorithm result for
the time interval (1, 2.5). Here, the number of chirps
reduces to six. However, it seems that one couple of
them are harmonics, the couples formed by the chirp
around 1000 Hz and the highest IF chirp. Therefore,
we may conclude that at least five wolves are emitting
in this interval of time. A similar analysis is carried
out with other time subintervals until all the recorded
signal is analyzed.
5 CONCLUSIONS
A combined algorithm for signal enhancement and
voice separation is utilized for wolf population count-
ing. Although field recorded wolf chorus signals
posses a complex structure due to noise corruption
and nonlinear multi-component character, the out-
come of our algorithm provides us with accurate es-
timates of the number of individuals emitting in a
given recording. Thus, the algorithm seems to be a
good complement or, even, an alternative to existent
methodologies, mainly based in wolf traces collection
or in the intrusive attaching of electronic devices to
the animals. Clearly, our algorithm is not limited to
wolves emissions but to any signal which may reason-
ably be modelled as an addition of chirps, opening its
utilization to other applications. Drawbacks of the al-
gorithm are related to the expert dependent election of
some parameters, such as the amplitude threshold, or
to the execution time when denoising and separation
of long duration signals must be accomplished. We
are currently working in the improvement of these as-
pects as well as in the recognition of components cor-
responding to the same emissor, such as harmonics of
a fundamental chirp, pursuing the full automatization
of the counting algorithm.
ACKNOWLEDGEMENTS
The first three authors are supported by Project
PC0448, Gobierno del Principado de Asturias, Spain.
Third and fourth authors are supported by the Spanish
DGI Project MTM2004-05417.
REFERENCES
Auger, F. (1991). Repr´esentation temps-fr´equence des sig-
naux non-stationnaires: Synthe`ese et contributions.
th`ese de doctorat, Ecole Centrale de Nantes.
B. Dugnol, C. Fern´andez, G. G. (2007a). Wolves counting
by spectrogram image processing. Appl. Math. Com-
put., 186:820–830.
B. Dugnol, C. Fern´andez, G. G. J. V. (2007b). Implementa-
tion of a chirplet transform method for separating and
counting wolf howls. Preprint 1, Dpt. Mathematics,
Univ. of Oviedo, Spain.
B. Dugnol, C. Fern´andez, G. G. J. V. (2007c). Implemen-
tation of a diffusive differential reassignment method
for signal enhancement. an application to wolf pop-
ulation counting. Appl. Math. Comput. To appear.
E-version: doi:10.1016/j.amc.2007.03.086.
B. Dugnol, C. Fern´andez, G. G. J. V. (2007d). On pde-
based spectrogram image restoration. application to
wolf chorus noise reduction and comparison with
other algorithms. In E. Damiani, A. Dipanda, K.
Y. L. L. P. S., editor, Signal processing for image
enhancement and multimedia processing, volume 34
of Multimedia systems and applications. Springer
Verlag. To appear. (e-version in http://www.u-
bourgogne.fr/SITIS/06/Proceedings/index.htm).
E. Chassandre-Mottin, I. Daubechies, F. A. P. F. (1997).
Differential reassignment. IEEE Signal Processing
Letters, 4(10):293–294.
F. Auger, P. F. (1995). Improving the readability of
time-frequency and time-scale representations by the
method of reassignment. IEEE Trans. Signal Process-
ing, 43(5):1068–1089.
H. M. Ozaktas, Z. Zalevsky, M. A. K. (2001). The Frac-
tional Fourier Transform with Applications in Optics
and Signal Processing. Wiley, Chichester.
K. Kodera, R. Gendrin, C. d. V. (1978). Analysis of time-
varying signals with small bt values. IEEE Trans-
actions on Acoustics, Speech and Signal Processing,
26(1):64–76.
L.
´
Alvarez, P. L. Lions, J. M. M. (1992). Image selective
smoothing and edge detection by nonlinear diffusion.
ii. SIAM J. Numer. Anal., 29(3):845–866.
L. Angrisani, M. D. (2002). A measurement method based
on a modified version of the chirplet transform for
instantaneous frequency estimation. IEEE Trans. In-
strum. Meas., 51:704–711.
L. Llaneza, V. P. Field recordings obtained in wilderness
in asturias (spain) in the 2003 campaign. Asesores en
Recursos Naturales, S.L.
Mallat, S. (1998). A wavelet tour of signal processing. Aca-
demic Press, London.
S. Mann, S. H. (1995). The chirplet transform: Physi-
cal considerations. IEEE Trans. Signal Processing,
43(11):2745–2761.
Skonhoft, A. (2006). The costs and benefits of ani-
mal predation: An analysis of scandinavian wolf re-
colonization. Ecological Economics, 58(4):830–841.
BIOSIGNALS 2008 - International Conference on Bio-inspired Systems and Signal Processing
84