Perceptual Active Equalization of Multi-frequency Noise

Juan Estreder, Gema Pi

nero

, Miguel Ferrer

, Maria de Diego

and Alberto Gonzalez

Inst. Telecommunication and Multimedia Applications (iTEAM),

Universitat Polit

ecnica de Val

encia, Spain

Keywords:

Active Noise Equalization, Frequency Masking, Perceptual Equalization, Multi-channel Adaptive Filtering.

Abstract:

In this paper we propose a novel multi-channel active noise equalizer (ANE) when music or speech signals

are present inside the same room. Our perceptual ANE (PANE) can beneﬁt from the masking effect of the

music emitted carrying out a perceptual equalization (PEQ) of the undesired ambient noise. Our PEQ strategy

automatically adapts the spectral proﬁle of the ambient noise recorded at the error microphones to the masking

threshold of the audio signal recorded at that same point. We present a real-time experiment carried out in

our laboratory that simulates the position of a driver in a car to test the PANE with different audio signals.

The experimental results are compared with two alternative strategies: the full cancellation (FC) proﬁle that

corresponds to an active noise cancellation strategy, and the hearing threshold (HT) proﬁle that corresponds

to an ANE system whose gains mimic the human audibility threshold. Both FC and HT proﬁles are indepen-

dent of the music presented in the room. Results show that the noise power measured at the microphones is

higher for the PEQ proﬁle, but always below the masking threshold of the music, getting almost unnoticeable.

However, the emitted anti-noise power in the case of PEQ is 15 dB lower compared to HT and FC proﬁles for

frequencies above 300 Hz. This performance leads to a reduction of noise pollution in the room and a lower

power consumption of the system loudspeakers. In addition, the PEQ proﬁle provided by the novel PANE sys-

tem is a versatile approach that can reduce the perceived noise as much as the user decides, even reaching the

same performance than the HT or FC proﬁles if needed. Therefore, the PANE system is a versatile real-time

alternative to the classic active noise cancellation systems for multi-frequency noise.

1 INTRODUCTION

The way we perceive sounds has been studied from

a systematic point of view over the last few decades

giving rise to the ﬁeld of psychoacoustics (Fastl and

Zwicker, 2007; Pickles, 2012). This approach to

the human hearing system models the process that a

sound undergoes from the moment it enters our outer

ear until it reaches our neurological system and be-

yond. This knowledge has been widely used in sev-

eral signal processing areas, such as coding (Branden-

burg and Johnston, 1990), equalization (V

alim

aki and

Reiss, 2016), personal sound zones (Donley et al.,

2016), or noise cancellation (Wang et al., 2018;

Mosquera-S

anchez et al., 2018). Generally speaking,

the ﬁeld of pyschoacoustics provides a more realis-

tic approach to any sound or noise processing since it

https://orcid.org/0000-0002-8719-8106

https://orcid.org/0000-0002-8743-1887

https://orcid.org/0000-0001-9948-3396

https://orcid.org/0000-0002-6984-3212

takes into account how our hearing system perceives

sounds (Fletcher, 1940).

From this perceptual point of view, we revisit

in this paper the active noise equalizer that was

ﬁrst proposed by Kuo (Kuo and Tsai, 1994). This

very ﬁrst idea of an active noise control system that

could shape the spectrum of the residual noise was

further developed for multi-frequency noise and

named “adaptive noise equalizer” (Kuo and Ji, 1995).

Afterwards it was extended for a multi-channel active

noise equalizer (ANE) in (Gonzalez et al., 2006),

where the equalization parameters that controlled

each single frequency were arbitrarily selected. In

the present paper we revisit the algorithm proposed

in (Gonzalez et al., 2006) and go a step further

introducing a perceptual strategy to control the

ANE. This strategy beneﬁts from the masking effect

that a sound (music or speech) produces over the

ambient noise, as for example it happens within a

car where its entertainment system can emit music

and can also be used to actively control the car noise.

Estreder, J., Piñero, G., Ferrer, M., de Diego, M. and Gonzalez, A.

Perceptual Active Equalization of Multi-frequency Noise.

DOI: 10.5220/0010648100390047

In Proceedings of the 18th International Conference on Signal Processing and Multimedia Applications (SIGMAP 2021), pages 39-47

ISBN: 978-989-758-525-8

The idea of using the human perception model to

improve some performance of an active noise can-

cellation (ANC) system has been previously studied.

In (Wang et al., 2012), a two-stage ANC approach

is proposed where a carefully designed masking sig-

nal is used to mask the residual noise, whereas (Wang

et al., 2018) takes advantage of the temporal post-

masking effect in order to improve the noise loud-

ness at the ANC output. The patent by Doclo (Doclo,

2016) controls the ﬁlter shape of the ANC system ac-

cording to some audio signal, which is the same idea

that the one explored in this paper. However, in (Do-

clo, 2016) the design of the ANC ﬁlters are obtained

through an iterative optimization process, which is

difﬁcult to achieve in real-time systems (increasing

the active performance in a certain frequency region

typically reduces the active performance in another

frequency region).

In this paper, a multi-channel real-time ANE sys-

tem (Gonzalez et al., 2006) that shape the residual

noise according to a certain masking threshold is pro-

posed, and its performance is compared to other two

proﬁles for different types of musical sounds. We

have selected excerpts from two different music gen-

res, one Latino-type and one Jazz-type. The noise is

a multi-frequency noise similar to that generated by a

diesel engine and other rotating machines (Ho et al.,

2020). The system performance in terms of noise at-

tenuation, mean squared error of the recorded music

signal and anti-noise power emitted by the loudspeak-

ers are analysed. It can be appreciated how the per-

ceptual equalization based on the ambient music can

reduce the emitted anti-noise power without compro-

mising the perceived quality of the music.

The rest of the paper is organized as follows: In

Section 2, relevant concepts regarding active noise

control and the computation of the masking thresh-

old are reviewed. In Section 3, the strategy to imple-

ment a perceptual ANE is stated. In Section 4, the

real-time ANE experiment mounted in our laboratory

is described and their results are shown, together with

a discussion on the system performance obtained for

the different equalization proﬁles and types of music

signals. Finally, Section 5 presents the main conclu-

sions of the paper.

2 SYSTEM MODEL

This section states the model of the ANE system and

describes the process to compute the masking thresh-

old of a sound in the frequency domain, also known

as simultaneous masking (Fastl and Zwicker, 2007).

Once these previous concepts are explained, the prin-

ciples of the perceptual equalization strategy will be

described in Section 3.

2.1 Active Noise Equalization

As said before, the ANE system presented in this pa-

per is based on the multi-channel model described

in (Gonzalez et al., 2006). Fig. 1 shows the block

diagram of the ANE system designed to equalize

the multi-frequency noise. The multi-channel sys-

tem is formed by J secondary sources and L error

sensors (or microphones), although Fig. 1 describes

the signal processing for one generic error sensor

l and one generic loudspeaker j. The output of

the sinusoidal generator is the reference signal x(n)

composed by K sine waves denoted by x

(n) with

k = 1,... ,K. Therefore, the adaptive ﬁlter W

(z) is

formed only by two-coefﬁcients for each component

and secondary source (Kuo and Ji, 1995): W

(z) =

j1,1

j1,2

,. .. ,w

jK,1

jK,2

The noise arriving at the lth microphone is de-

noted by d

(n). Then, the recorded signal at the lth

error sensor is expressed as

(n) = d

(n) +

∑

k=1

(1 − β

)

∑

j=1

l j

(n) ∗ y

(n)], (1)

where β

is the attenuation associated to reference

signal x

(n), s

l j

(n) is the impulse response of the FIR

ﬁlter that models the electroacoustic path (or room

impulse response, RIR) between the jth loudspeaker

and the lth microphone, and y

(n) is the kth com-

ponent of the adaptive ﬁlter output y

(n). Notice

that the RIRs s

l j

(n) must be estimated for the de-

sign of the adaptive ﬁlters, being their estimates de-

noted by ˜s

l j

(n) in Fig. 1. We assume that the esti-

mated RIRs have a negligible estimation error, that is,

˜s

l j

(n) ≈ s

l j

(n),∀l, j.

It was shown in (Gonzalez et al., 2006) that the

system transfer function in steady state for the kth ref-

erence signal is given by

jω

) =

jω

)

jω

)

= β

, l = 1, .. ., L , (2)

which will be intuitively shown in the following

Let us express (1) in the frequency domain for

every ω

such that k = 1,. .. ,K, and taking into ac-

count that Y

jω

) = W

jω

)X(e

jω

) according to

the model shown in Fig. 1:

jω

) = D

jω

) + (1 − β

)×

∑

j=1

l j

jω

)X(e

jω

(3)

For the proof of convergence, see Section V of (Gon-

zalez et al., 2006)

SIGMAP 2021 - 18th International Conference on Signal Processing and Multimedia Applications

LMS

P(z)

(n)d

(n)

x(n)

<latexit sha1_base64="VPSLYXDMl786iIiZCkmXAmi46XU=">AAAB5XicbZDNSsNAFIVv6l+tf1WXbgaLUDclqYIuC25cVrA/0IYymd40QyeTMDMRS+gj6ErUnS/kC/g2JjULbT2rb+45A/dcLxZcG9v+skpr6xubW+Xtys7u3v5B9fCoq6NEMeywSESq71GNgkvsGG4E9mOFNPQE9rzpTe73HlBpHsl7M4vRDelEcp8zavLRY12ej6o1u2EvRFbBKaAGhdqj6udwHLEkRGmYoFoPHDs2bkqV4UzgvDJMNMaUTekEBxlKGqJ208Wuc3LmR4qYAMni/Tub0lDrWehlmZCaQC97+fA/b5AY/9pNuYwTg5JlkczzE0FMRPLKZMwVMiNmGVCmeLYlYQFVlJnsMJWsvrNcdhW6zYZz0WjeXdZazeIQZTiBU6iDA1fQgltoQwcYBPAMb/BuTawn68V6/YmWrOLPMfyR9fENZZ2LYg==</latexit>

Sinusoidal

Generator

Multi-frequency

noise

(n)

𝟏 − 𝜷

𝜷

<latexit sha1_base64="MzmGKS199YmGpBa0KvCUOcSr8wU=">AAAB9HicbZDJTgJBFEVfOyJOjS7dVCQmuCHdhIisJHHjEqMMCRBSXTygpHpIVTUGO/yJroy680v8AXd+ik1DjNNdnXr3VvLedQLBlbasd2NpeWV1bT21kd7c2t7ZNTN7deWHkmGN+cKXTYcqFNzDmuZaYDOQSF1HYMMZnc/8xhil4r53rScBdlw68HifM6rjUdfMtDUXPYyupt1I3Exzd8ddM2vlrUTkL9gLyJ59QKJq13xr93wWuuhpJqhSLdsKdCeiUnMmcJpuhwoDykZ0gK0YPeqi6kTJ6lNy1Pcl0UMkyft7NqKuUhPXiTMu1UP125sN//Naoe6fdiLuBaFGj8WR2OuHgmifzBogPS6RaTGJgTLJ4y0JG1JJmY57SifnlxOROZSKCyjbX+fXC3n7JF+4LGYrhXkPkIIDOIQc2FCCClxAFWrA4BYe4BlejLFxbzwaT/PokrH4sw8/ZLx+At0Kk0E=</latexit>

(z)

Figure 1: ANE system model where only the signal processing within the jth secondary source is shown.

Substituting E

jω

) = β

jω

) from (2) in (3)

and assuming that D

jω

) = P

jω

)X(e

jω

), where

(z) is the primary acoustic path between the noise

source and the lth microphone shown in Fig. 1, then

(3) can be rewritten as

−(1 − β

jω

)X(e

jω

) = (1 − β

)×

∑

j=1

l j

jω

)X(e

jω

(4)

which leads to

∑

j=1

l j

jω

) = −P

jω

), (5)

that is, the primary path of the unwanted noise is com-

pensated by the combined response of all the adaptive

ﬁlters convolved with their respective secondary paths

in time. Therefore, when the adaptive ﬁlters designed

in the ANE system fulﬁll (5) in steady state, the sig-

nals captured by the error sensors are expressed as

jω

) = β

jω

), (6)

for l = 1, .. .,L, j = 1,. .. ,J and k = 1, ...,K.

The parameter β

is the attenuation of the kth com-

ponent of the unwanted noise with respect to its orig-

inal amplitude. In Fig. 1 the parameters β

are gath-

ered in the vector

β =



.. . β



. (7)

2.2 Masking Threshold Computation

In this section we will brieﬂy explain how the mask-

ing threshold of a sound is computed. This concept

has been previously used in (R

o et al., 2012; R

et al., 2013; Estreder et al., 2018) to carry out the per-

ceptual equalization of audio signals in the presence

of an unwanted ambient noise.

The simultaneous (or frequency) masking effect

occurs within the human hearing system when two

sounds are presented at the same time, one of them be-

ing the masker signal and the other the maskee signal.

This effect causes the maskee signal to be perceived

less louder than if it were presented alone (Fastl

and Zwicker, 2007; ISO Central Secretary, 2014).

Moreover, depending on the masking threshold of the

masker signal, the maskee can even get unnoticed by

the listener (Johnston, 1988; Moore, 1985).

Fig. 2 shows the block diagram (R

o et al.,

2013; Estreder et al., 2018) to estimate the masking

threshold of the mth time frame of a broadband sig-

nal r

(n) composed by L

samples. The “Spectral

Analysis” block estimates its Power Spectral Den-

sity (PSD), P

(k), over frequencies f

, where

k = 0, ·· · ,N/2, being N the size of the fast Fourier

transform (FFT) and f

= 44100 Hz the sampling rate.

In our case N = 2L

, being L

/ f

approximately 200

ms.

Once P

(k) is obtained, the “Map to Bark Scale”

block maps the frequency bins into the Bark domain

using (Zwicker and Terhardt, 1980):

ν = 13 arctan



0.76 f

1000



+ 3.5arctan



7500



, (8)

where ν denotes the index of the Bark band, ν =

1,2, .. ., N

, and N

is the number of Bark bands to

process, in our case N

= 25. This block obtains the

energy per Bark band as (Johnston, 1988)

(ν) =

sup(ν)

∑

k=inf(ν)

(k), ν = 1,· ·· , N

, (9)

where inf(ν) and sup(ν) correspond to the frequency

bin of the lower and upper boundary of the νth Bark

band, respectively.

Perceptual Active Equalization of Multi-frequency Noise

(n)

Map to

Bark Scale

Tonality Estimator and

Offset Function

(k)

Spreading

Function

(n)

Spectral

Analysis

Figure 2: Procedure to estimate the masking threshold T

(ν) of a time frame of a sound r

(n).

The “Spreading Function” block models the

masking caused by the energy of a certain Bark band

over the rest of the bands. From the set of previ-

ously proposed spreading functions (Bosi and Gold-

berg, 2012), we have used that of (Schroeder et al.,

1979). In order to compute the overall spread mask-

ing function S

(ν), we showed in (Estreder et al.,

2018) that a linear addition of the curves produced

by the spreading function in (Schroeder et al., 1979)

obtains similar results to the exponential addition

of (R

o et al., 2012), which has also been conﬁrmed

in (Moore, 1985) for multi-frequency signals. There-

fore, for each Bark band ν, their masking is computed

(ν) =

∑

η=1

(ν), (10)

where b

(ν) is the masking intensity produced by

Bark band η over Bark band ν computed from their

corresponding energies (9) as in (Estreder et al.,

2018).

Regarding the lower branch of Fig. 2, it calculates

the spectral offset O

(ν), which models the inﬂuence

of the type of spectral content on the ﬁnal masking:

if most of the energy is concentrated in a small por-

tion of the band (tone-like masking), this value will

be high, whereas if the energy is equally spread along

the band (noise-like masking), this parameter will be

low. In (Gray and Markel, 1974) the Spectral Flat-

ness Measure (SFM) is used to model the tone-like or

noise-like masking (for further details see also (John-

ston, 1988; R

o et al., 2013)). Finally the masking

threshold T

(ν) of time frame r

(n) at the νth Bark

band is estimated as

(ν) = 10

log

(ν))−

(ν)

. (11)

3 PERCEPTUAL ACTIVE NOISE

EQUALIZER

As we stated in Section 1, our goal is to design a per-

ceptual equalizer as in (R

o et al., 2012; Estreder

et al., 2018), but with an alternative strategy: instead

of equalizing the audio signal to mask the noise, the

ambient noise will be equalized by the ANE system

of Fig. 1 in order to get masked by the audio signal.

Fig. 3 shows the block diagram of the Perceptual

Active Noise Equalizer (PANE). The subscript m in

all the signals of Fig. 3 denote the mth time frame

of duration L

samples. The audio signal (speech or

music) emitted by all the speakers is a(n), whereas,

according to (1), the anti-noise signal emitted by the

jth loudspeaker of the ANE is expressed as

(n) =

∑

k=1

(1 − β

(n). (12)

The values of vector β

β are computed in the “At-

tenuation Estimation” block according to the masking

threshold of the music and the current level of the am-

bient noise at each error sensor. The measured signal

at the lth error sensor (1) is now expressed as

(n) = d

(n) +

∑

j=1

l j

(n) ∗ y

(n)] + a

(n)

= d

(n) + a

(n),

(13)

where d

(n) is the total ambient noise measured by

microphone l and

(n) = a(n) ∗

∑

j=1

l j

(n), (14)

is the resulting audio signal propagated from the J

loudspeakers to the lth microphone through the cor-

responding electroacustic paths s

l j

(n) of Fig. 1.

As said before, s

l j

(n) has been previously esti-

mated in the ANE system. Since ˜s

l j

(n) ≈ s

l j

(n), we

can obtain the estimate of the time frames of both sig-

nals, a

(n) and d

(n) in (13), as

˜a

l,m

(n) = a

(n) ∗

∑

j=1

˜s

l j

(n),

l,m

(n) = e

l,m

(n) − ˜a

l,m

(n).

(15)

Notice that the frame of the audio signal chosen to

compute the masking threshold at the input of Fig. 3

is a

(n), whereas the one that will be emitted by the

loudspeakers is a

m+1

(n). In a real-time experiment as

the one presented in Section 4, the audio signal can

SIGMAP 2021 - 18th International Conference on Signal Processing and Multimedia Applications

Active

Noise

Equalizer

1,m

(n)

(m+1)

(n)

1,m

(n)a

1,m

(n) ,

J,m

(n)

Attenuation

Estimation

L,m

(n)a

L,m

(n) ,

Figure 3: Perceptual Active Noise Equalizer.

be originated from a streaming web service or a ra-

dio station, thus, the masking threshold computation

must be carried out on the same audio frame that is

being reproduced (m), and its resulting β

will affect

the next time frame (m + 1). In case the audio sig-

nal is originated from a local memory or disk, then

the estimate ˜a

l,m

(n) can be pre-computed prior to the

reproduction, and a

(n) will be emitted by the loud-

speakers instead of a

m+1

(n).

Finally, the computation of attenuation vector β

for the PANE together with other two possible strate-

gies will be explained in the following.

3.1 Perceptual Equalization (PEQ)

The perceptual equalization (PEQ) proﬁle is the at-

tenuation proﬁle obtained by the “Attenuation Esti-

mation” block of the PANE shown in Fig. 3. Con-

sidering expressions (6) and (13), the noise term at

each error sensor for frequency f

can be expressed

as D

l,m

jω

) = β

k,m

l,m

jω

). Therefore, the values

of β

m,k

are estimated as:

20log

(β

m,k

) =

min

h

l,m

(ν) − G

l,m

(ν)

− g



, ∀ f

∈ B

(16)

where T

l,m

(ν) is the masking threshold of the audio

signal and G

l,m

(ν) is the energy per Bark band of the

noise signal, both referred to error sensor l and ex-

pressed in dB, and B

is the bandwidth (8) of the Bark

band ν in Hertz. The parameter g

is an extra gain

that can be used to force the noise power to fall g

dB below the masking threshold of the audio signal.

This value is usually set to 2 −3 dB in practice to take

into account the power variations of the audio signal

within the time of a frame. Nevertheless, g

can also

be considered as a control parameter that can shape

the noise PSD according to the user preferences.

Notice that the strategy stated in (16) computes

m,k

as the minimum value considering all the error

sensors. An alternative strategy could be to compute

the attenuation for each sensor as 20 log

(β

m,k

(l)) =

l,m

(ν) − G

l,m

(ν)

− g

and then compute β

m,k

their average over l. Additional strategies or con-

straints can be used depending on the music genre and

the locations of the error sensors with respect to the

noise source.

3.2 Hearing Threshold (HT)

The hearing threshold (HT) proﬁle aims to force the

noise power to remain below the standard curve of the

human hearing threshold established in (ISO Central

Secretary, 2014). Therefore, the values of β

m,k

are

estimated as:

20log

(β

m,k

) =

min



T H

jω

) − G

l,m

jω

),0



(17)

where L

T H

jω

) is the value in dB of the hearing

threshold at frequency f

and G

l,m

jω

) is the same

noise term that in (16), but expressed in the frequency

domain.

3.3 Full Cancellation (FC)

The main objective of the full cancellation (FC) pro-

ﬁle is to cancel the noise, or at least attenuate it as

much as possible regardless the existence of the audio

signal. Therefore,

m,k

= 0 , ∀m, k (18)

4 REAL-TIME EXPERIMENT

A multi-channel PANE system has been built in the

listening room located at the Audio Lab

of the Insti-

tute of Telecommunications and Multimedia Applica-

tions (iTEAM) of the Technical University of Valen-

cia. Fig. 4 shows a picture of the 2 × 2 multi-channel

system formed by L = 2 microphones acting as error

sensors, and J = 2 loudspeakers placed as if they were

mounted at the headrest sides of a car seat. A third

loudspeaker (not shown in the picture), placed oppo-

https://gtac.webs.upv.es/

Perceptual Active Equalization of Multi-frequency Noise

MIC #2

error signal

(n)

MIC #1

error signal

(n)

SPK #1

anti-noise

signal y

(n)

SPK #2

anti-noise

signal y

(n)

Figure 4: ANE Prototype mounted in the iTEAM Audio

Lab.

site to the seat, was used to generate the unwanted

noise formed by K = 11 frequency components such

that f

= (k + 1) · 47.05 Hz, k = 1, ...,11, i.e., being

its last frequency f

= 564.6 Hz.

A computer equipped with an Intel i7-8700 pro-

cessor and 32 GB of RAM has been used to run

the algorithms in real-time. The signal processing

is entirely computed by Matlab 2018b, whereas its

audio processing toolbox communication drives the

record and generation of signals through a Roland

Octa-Capture soundcard. The sample rate was 44100

Hz. The microphones are the Behringer ECM8000

model, the loudspeaker for the primary noise is an

Event Project Studio 6 and the two loudspeakers of

the ANE system are model SDQ5P from Apart.

4.1 Calibration and Estimation of

Secondary Paths

The masking threshold computation needs a precise

calibration of the whole audio system previous to

carry out the experiments. Firstly we have calibrated

both microphones with the aid of a pistonphone sig-

nal (Frederiksen, 2008). Secondly, we have estimated

the secondary paths, s

l j

(n), l, j = 1, 2, from each

loudspeaker to each microphone using a maximum

length sequence (MLS) signal (Vanderkooy, 1994) of

16384 samples. The estimated secondary paths have

been trimmed to a length of 4096 coefﬁcients, keep-

ing the 99.95% of their total energy. Finally, every

estimated ˜s

l j

(n) has been normalized according to the

real SPL of the corresponding measured signal in mi-

crophone l. The estimated secondary paths ˜s

l j

(n) in-

clude the electroacoustic characteristics of the loud-

speakers, microphones and audio sound card plus the

room impulse response.

4.2 Experiments

In this section a study has been carried out in order

to compare the different proﬁles of Section 3. For the

PEQ proﬁle a value of g

= 3 dB has been used for all

the frequencies.

0 200 400 600 800 1000

Frequency (Hz)

Power (dB SPL)

FC profile

HT profile

PEQ profile

Recorded Music

(a)

0 200 400 600 800 1000

Frequency (Hz)

Power (dB SPL)

FC profile

HT profile

PEQ profile

Recorded Music

(b)

Figure 5: PSD of the recorded noise at Mic #1 for the three

equalization proﬁles (a) when the “Latino” song is playing,

and (b) when the “Jazz” song is playing. The PSD of the

recorded song is also shown.

As audio signals, we have used two excerpts of

different musical genres. The ﬁrst audio signal is a

30 s excerpt from the song “Con altura” performed

by Rosalia, J Balvin and El Guincho (2019) that we

have labelled as “Latino” song. The second audio is

an instrumental 36 s excerpt from the song “Alfon-

sina y el mar” performed by The Jazz Chamber Trio

(2005) that we have labelled as “Jazz” song. The au-

dio signals are reproduced with a level such that they

are recorded at 83 dB SPL on average, while the noise

signal is reproduced such that it is recorded at 76 dB

SPL on average.

4.3 Results and Discussion

Fig. 5 shows the PSD of noise signal d

(n) measured

at the microphone labelled as Mic #1 in Fig. 4 for

every equalization proﬁle, and the PSD of its corre-

sponding audio excerpt, a

(n). All the PSDs are given

SIGMAP 2021 - 18th International Conference on Signal Processing and Multimedia Applications

in dB SPL. We have considered only the last 10 s of all

the signals in order to assure that the adaptive ﬁlters

(z) of the ANE system of Fig. 1 have converged.

Although the results are only referred to Mic #1, the

noise and music PSDs measured at Mic #2 were very

similar. The frequency is represented only from 0 Hz

to 1 kHz in order to focus on the part of the spectrum

where the noise is present, being its last component

located at f

= 564.6 Hz.

As expected, the proﬁle that provides a lower

recorded noise power for both music signals is the FC,

whose strategy is to cancel the noise in all the frequen-

cies. Regarding the HT proﬁle in Fig. 5, it presents

a similar proﬁle for both “Latino” and “Jazz” sig-

nals, with slight variations for the components around

200 Hz due to experimental variations. Compared to

the FC proﬁle, it allows for more noise content at the

lowest frequencies where the human hearing system

is less sensitive.

Regarding the PEQ proﬁle in Fig. 5, the amount

of noise power recorded at the error sensor is in the

range of 10 − 20 dB higher than the HT proﬁle for

most of the components and for both audio signals.

Comparing the PEQ noise curve to the music PSD,

it can be appreciated how the PEQ is far below the

music in some frequencies, but not in all of them.

This is due to the fact that their attenuation values,

, are designed with respect to the masking thresh-

old of the music, not with respect to its PSD. The

masking threshold is computed through a non-linear

process over the music PSD as Fig. 2 describes, and

it is represented in the Bark domain, thus, compar-

ing the music masking threshold with the noise PSD

would have no sense. Nevertheless, from the PEQ

curves in Fig. 5, we can state that the PEQ strat-

egy achieves its goal of actively equalizing the am-

bient noise to get masked by the music. All the sig-

nals involved in this experiment have been saved as

“wav” ﬁles and are available to be listened at http:

//gpinyero.webs.upv.es/Experiments-sigmap2021/.

Fig. 6, by contrast, shows the PSDs of the anti-

noise signal y

(n) emitted by the speaker labelled as

Spk #1 in Fig. 4. As in Fig. 5, the curves represent the

PSD considering only the last 10 s of the correspond-

ing signals. The power is represented in dB full scale

(FS), where FS means that the power level is referred

to a signal amplitude range limited to [−1,1].

It can be appreciated that FC and HT proﬁles emit

the same power for all the frequencies, although the

noise power received at the error sensors shown in

Fig. 5 is higher for HT in most of the frequency com-

ponents. Therefore, a certain relaxation in the noise

cancellation strategy apparently does not correspond

to the same amount of anti-noise power saved by the

0 200 400 600 800 1000

Frequency (Hz)

-120

-100

-80

-60

-40

-20

Power (dB FS)

FC profile

HT profile

PEQ profile

(a)

0 200 400 600 800 1000

Frequency (Hz)

-120

-100

-80

-60

-40

-20

Power (dB FS)

FC profile

HT profile

PEQ profile

(b)

Figure 6: PSD of the anti-noise signal emitted by Spk #1 for

the three equalization proﬁles (a) when the “Latino” song is

playing, and (b) when the “Jazz” song is playing.

ANE system. Notice that Fig. 5 represents the PSD

of d

(n) = d

(n)+

∑

j=1

1 j

(n)∗ y

(n)] in (13), which

depends on the level of the ambient noise captured at

Mic #1, d

(n), and the combined response of the anti-

noise signals, y

(n), j = 1,2, propagated through their

corresponding secondary paths, s

1 j

(n). Therefore, it

can be thought that, due to particular experimental

conditions, a minimum anti-noise power is needed to

actively affect the undesired ambient noise.

Regarding the curve of the PEQ proﬁle in Fig. 6,

The anti-noise PSD is only 3 − 5 dB lower than HT

PSD for all the components below 300 Hz. How-

ever, for the components above 300 Hz, the saved

power reaches signiﬁcant values around 10 dB for the

“Latino” song and 15 dB for the “Jazz” song. This

is a relevant result for our proposed PANE scheme,

since the anti-noise emitted within a reduced room, as

a car cabin, propagates all over the space and can neg-

atively enhance the ambient noise at positions differ-

Perceptual Active Equalization of Multi-frequency Noise

ent from the seat where the noise is being controlled.

In this sense, any reduction on the amount of energy

noise emitted by the secondary sources would help to

actively control the noise over multiple regions within

a room (Zhang et al., 2020).

Finally, Tables 1 and 2 show the Mean Square Er-

ror (MSE) and the Perceptual Evaluation of speech

quality (PESQ) values of the recorded audio signal at

Mic #1 respectively, being the reference signal in both

cases the audio excerpt recorded at Mic #1 when the

ambient noise is not present. Apart from the results

obtained by the three equalization strategies (FC, HT,

PEQ), the results obtained when no noise equalization

is carried out are presented in the row labelled “ANE-

OFF”.

The MSE is an objective measure whereas the

PESQ is intended to measure the subjective quality of

a speech signal through an index ranging from 0.5 to

4.5 (ITU-T, 2001). Comparing the PESQ to the Mean

Opinion Score (MOS) that ranges from 1 to 5, a PESQ

value of 4.5 can be considered of “excellent” quality,

whereas PESQ values below 3 are usually considered

“fair” for speech communications, but “poor” for au-

dio reproduction. Nevertheless, although the PESQ is

not validated for other applications than speech com-

munication (ITU-T, 2001), it is widely used in many

speech and audio applications as echo cancellation,

audio coding and speech enhancement.

Table 1: MSE of the recorded audio signal at Mic #1.

Latino Jazz

FC -27.3 dB -28.84 dB

HT -27.8 dB -28.3 dB

PEQ -15.13 dB -15.75 dB

ANE-OFF -5.59 dB -6.93 dB

Table 2: PESQ of the recorded audio signal at Mic #1.

Latino Jazz

FC 4.43 4.49

HT 4.41 4.48

PEQ 3.51 4.17

ANE-OFF 2.86 3.63

Table 1 shows that the FC and HT proﬁles achieve

an MSE around −27 dB, whereas Table 2 shows that

their corresponding PESQ values are close to 4.5,

meaning that the audio signals are perceived as if

no ambient noise were present. On the other hand,

the PEQ proﬁle achieves PESQ values of 3.5 for

the “Latino” song and 4.17 for the “Jazz” song, al-

though their respective MSE values are both 12 − 13

dB higher than their corresponding HT values. Never-

theless, the PEQ proﬁle presents a relevant reduction

of 9 − 10 dB of the MSE with respect to the “ANE-

OFF” condition, and a “Good / Excellent” quality in

the PESQ scale.

As a ﬁnal remark, the results here obtained for the

PEQ proﬁle, even if they are related to the particu-

lar conditions of the experimental study, provide very

interesting insights regarding the novel PANE system

shown in Fig. 3:

1. The PEQ can be implemented in real-time without

any additional latency on the audio reproduced by

the entertainment system.

2. The PEQ obtains a “Good / Excellent” audio qual-

ity compared to the same audio signals measured

in absence of the ambient noise.

3. The PEQ achieves a signiﬁcant reduction in the

anti-noise power emitted by the loudspeakers, al-

though the amount of saved power depends on the

particular frequency range considered.

4. The PEQ is a versatile approach that allows to ex-

change the amount of ambient noise power mea-

sured at the microphones with the amount of anti-

noise power emitted by the loudspeakers, just

by controlling the parameter g

in the design of

(16): the more the value of g

is increased, the

more the PEQ proﬁle get close to HT or FC pro-

ﬁles.

5 CONCLUSIONS

In this paper a novel perceptual active noise equalizer

(PANE) has been presented. The PANE intends to re-

duce the power of a multi-frequency ambient noise in

such a way that it cannot be perceived when a music

signal is present thanks to the masking effect of the

human hearing system. A real-time experiment has

been presented and the PANE performance has been

compared to other strategies of the ANE system as the

so-called full cancellation (FC) and hearing threshold

(HT) proﬁles. Although the measured noise power

level is higher for the PEQ proﬁle, the emitted anti-

noise power is lower compared to HT and FC pro-

ﬁles. This performance leads to a reduction of noise

pollution in the room and a lower power consump-

tion of the system loudspeakers. In addition, the PEQ

proﬁle is a versatile approach that can reduce the per-

ceived noise as much as the user decides, reaching

the same performance than the HT or FC proﬁles. Fi-

nally, the experiment presented in this paper has vali-

dated the PANE system as a real-time versatile alter-

native to the classic active noise cancellation systems

for multi-frequency noise. Further research is being

carried out in our laboratory to extend the PEQ to the

SIGMAP 2021 - 18th International Conference on Signal Processing and Multimedia Applications

case of broadband noises, as road and wind noises in

cars.

ACKNOWLEDGEMENTS

This work has been partially supported by EU

together with Spanish Government through RTI2018-

098085-BC41 (MCIU/AEI/FEDER) and RED2018-

102668-T, and Generalitat Valenciana through

PROMETEO/2019/109.

REFERENCES

Bosi, M. and Goldberg, R. (2012). Introduction to digi-

tal audio coding and standards. Springer Science &

Business Media.

Brandenburg, K. and Johnston, J. (1990). Second gener-

ation perceptual audio coding: the hybrid code. In

Audio Engineering Society Convention 88.

Doclo, S. (2016). Active noise reduction method using per-

ceptual masking, Patent US20110026724A1.

Donley, J., Ritz, C., and Kleijn, W. (2016). Improving

speech privacy in personal sound zones. In 2016 IEEE

Intl. Conf. on Acoustics, Speech and Signal Proc.,

pages 311–315.

Estreder, J., Pi

nero, G., Aguirre, F., de Diego, M., and Gon-

zalez, A. (2018). On perceptual audio equalization

for multiple users in presence of ambient noise. In

2018 IEEE 10th Sensor Array and Multichannel Sig-

nal Proc. Workshop (SAM), pages 445–449.

Fastl, H. and Zwicker, E. (2007). Psychoacoustics. Springer

Berlin Heidelberg, Berlin, Heidelberg.

Fletcher, H. (1940). Auditory patterns. Reviews of modern

physics, 12(1):47.

Frederiksen, E. (2008). Microphone calibration. In Hand-

book of Signal Processing in Acoustics, pages 1293–

1312. Springer New York, New York, NY.

Gonzalez, A., de Diego, M., Ferrer, M., and Pi

nero, G.

(2006). Multichannel active noise equalization of in-

terior noise. IEEE Trans. on Audio, Speech and Lan-

guage Proc., 14(1):110–122.

Gray, A. and Markel, J. (1974). A spectral-ﬂatness measure

for studying the autocorrelation method of linear pre-

diction of speech analysis. IEEE Trans. on Acoustics,

Speech, and Signal Processing, 22(3):207–217.

Ho, C.-Y., Shyu, K.-K., Chang, C.-Y., and Kuo, S. M.

(2020). Efﬁcient narrowband noise cancellation sys-

tem using adaptive line enhancer. IEEE/ACM Trans.

on Audio, Speech, and Language Proc., 28:1094–

1103.

ISO Central Secretary (2014). Acoustics – normal equal-

loudness-level contours. Standard, International Or-

ganization for Standardization.

ITU-T (2001). Perceptual evaluation of speech quality

(PESQ): An objective method for end-to-end speech

quality assessment of narrow-band telephone net-

works and speech codecs. ITU-T Recommendation

P.862.

Johnston, J. (1988). Transform coding of audio signals us-

ing perceptual noise criteria. IEEE Journal on Se-

lected Areas in Communications, 6(2):314–323.

Kuo, S. and Ji, M. (1995). Development and analysis of an

adaptive noise equalizer. IEEE Trans. on Speech and

Audio Proc., 3(3):217–222.

Kuo, S. and Tsai, J. (1994). Residual noise shaping tech-

nique for active noise control systems. The Journal of

the Acoustical Society of America, 95(3):1665–1668.

Moore, B. (1985). Additivity of simultaneous masking, re-

visited. The Journal of the Acoustical Society of Amer-

ica, 78(2):488–494.

Mosquera-S

anchez, J., Sarrazin, M., Janssens, K.,

de Oliveira, L., and Desmet, W. (2018). Multiple

target sound quality balance for hybrid electric pow-

ertrain noise. Mechanical Systems and Signal Proc.,

99:478–503.

Pickles, J. (2012). An introduction to the physiology of

hearing. Emerald Group Pub Bingley, Bingley, UK,

4th ed. edition.

o, J., V

alim

aki, V., Alanko, M., and Tikander, M.

(2012). Perceptual frequency response simulator for

music in noisy environments. In AES Conference:

45th Intl. Conf.: Applications of Time-Frequency

Proc. in Audio.

o, J., V

alim

aki, V., and Tikander, M. (2013). Percep-

tual headphone equalization for mitigation of ambient

noise. In 2013 IEEE Int. Conf. on Acoustics, Speech

and Signal Proc., pages 724–728.

Schroeder, M., Atal, B., and Hall, J. (1979). Optimizing

digital speech coders by exploiting masking proper-

ties of the human ear. The Journal of the Acoustical

Society of America, 66(6):1647–1652.

alim

aki, V. and Reiss, J. (2016). All about audio equaliza-

tion: Solutions and frontiers. Applied Sciences, 6(5).

Vanderkooy, J. (1994). Aspects of MLS measuring systems.

J. Audio Eng. Soc, 42(4):219–231.

Wang, T., Gan, W.-S., and Chong, Y.-K. (2012). Psy-

choacoustic active noise control system with auditory

masking. In Proc. 2012 Asia Paciﬁc Signal and Infor-

mation Proc. Assoc., pages 1–6.

Wang, Y., Feng, T., Wang, X., Guo, H., and Qi, H. (2018).

An improved LMS algorithm for active sound-quality

control of vehicle interior noise based on auditory

masking effect. Mechanical Systems and Signal Pro-

cessing, 108:292–303.

Zhang, J., Sun, H., Samarasinghe, P. N., and Abhayapala,

T. D. (2020). Active noise control over multiple re-

gions: Performance analysis. In 2020 IEEE Int. Conf.

on Acoustics, Speech and Signal Proc. (ICASSP),

pages 8409–8413.

Zwicker, E. and Terhardt, E. (1980). Analytical expressions

for critical-band rate and critical bandwidth as a func-

tion of frequency. The Journal of the Acoustical Soci-

ety of America, 68(5):1523–1525.

Perceptual Active Equalization of Multi-frequency Noise