Bioplausible Multiscale Filtering in Retinal to Cortical Processing as

a Model of Computer Vision

Nasim Nematzadeh, Trent W. Lewis and David M. W. Powers

School of Computer Science, Engineering and Mathematics, Flinders University, Adelaide, Australia

College of Computing, Beijing University of Technology, Beijing, China

Keywords: Visual Perception, Cognitive Systems, Pattern Recognition, Biological Neural Networks, Self-organising

Systems, Geometrical Illusions, Tilt Effects, Difference of Gaussian.

Abstract: Visual illusions emerge as an attractive field of research with the discovery over the last century of a variety

of deep and mysterious mechanisms of visual information processing in the human visual system. Among

many classes of visual illusion relating to shape, brightness, colour and motion, “geometrical illusions” are

essentially based on the misperception of orientation, size, and position. The main focus of this paper is on

illusions of orientation, sometimes referred to as “tilt illusions”, where parallel lines appear not to be

parallel, a straight line is perceived as a curved line, or angles where lines intersect appear larger or smaller.

Although some low level and high level explanations have been proposed for geometrical tilt illusions, a

systematic explanation based on model predictions of both illusion magnitude and local tilt direction is still

an open issue. Here a neurophysiological model is expounded based on Difference of Gaussians

implementing a classical receptive field model of retinal processing that predicts tilt illusion effects.

1 INTRODUCTION

Our visual perception of the world is the result of the

underlying processing of both parallel and

progressive (multilevel) visual information, starting

from the low level visual processing done in the

retina and passing information through multiple

levels of processing in the visual system. Visual

illusions are some of the visual distortion

experiences we encounter due to the limitations of

our visual information processing. It is likely these

effects emerge in specific processing stages either in

low level processing done in the retina or higher

level information processing in the cortex. Visual

illusions are often evident near or beyond the

thresholds of what our visual system can handle. So

by studying the visual illusions, we can better

understand the underlying mechanism and

limitations, and more generally the processing done

in our visual system. In the process we can also

develop new understanding and techniques for

computer and robot vision.

There are many approaches to the study of

illusion perception such as Gestalt psychology

(Gregory & Heard, 1979; Gilchrist et al., 1999),

computational models (Fermüller & Malm, 2004;

Robinson et al., 2007), neuro-biological, and

cognitive neuro-science approaches (Grossberg &

Todorovic, 1988; Penacchio & Otazu, 2013). Our

model is a bioplausible computational model

inspired by the low level multiscale filtering

performed in the retina itself.

The patterns explored are tilt illusions involving

enhancement of texture backgrounds such as

Checkerboard, Café Wall and bulging checkerboard

illusions. These types of illusions have been explained
in three different ways: the theory of
‘contrast and assimilation’ (Smith et al., 2001),

‘perceptual inferences and junctions analysis’

providing high level explanations (Gilchrist et al.,

1999; Grossberg & Todorovic, 1988; Anderson,

1997, 2005), or ‘low level spatial filtering’

(Jameson, 1985; Blakeslee & McCourt, 2004).

For high-level explanation models, the ‘Scission

Theory’ proposed by Anderson (1997, 2005) triggers

the parsing of targets into multiple layers of

reflectance, transparency and illumination and

predicts that erroneous decomposition leads to

brightness illusions. Another high-level model is

‘Anchoring Theory’ (Gilchrist et al., 1999) based on

‘grouping factors’ that signal depth information.

Nematzadeh N., Lewis T. W. and Powers D. M. W.
Bioplausible Multiscale Filtering in Retinal to Cortical Processing as a Model of Computer Vision.
DOI: 10.5220/0005186203050316
In Proceedings of the International Conference on Agents and Artificial Intelligence (ICAART-2015), pages 305-316
ISBN: 978-989-758-074-1
© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

Low-level theories are based on the mechanisms
in early visual processing, e.g. simple image features

such as contrast edges rather than global scene

interpretation. For instance, Jameson (1985)
proposed the ‘Contrast/Assimilation Model’, which
qualitatively modelled both brightness contrast and
assimilation based on parallel processing at multiple
spatial scales by ‘Difference of Gaussians’ (DoG)
filters. Another example is the Oriented-DoG (ODoG)
model proposed by Blakeslee and McCourt (1999,
2004), which applies multiple-scale, oriented DoG
filters to address many brightness/lightness illusions.

There is evidence that visual processing in the

retina is based on many resolutions simultaneously

(ter Haar Romeny, 2003). Marr and Hildreth (1980)
suggested decades ago that retinal processing carries
‘signatures’ of the ‘three-dimensional structure’,
although the idea had no physiological support at that
time. It seems that retinal low level multiscale processing
provides band pass visual information about the scene,
which is an important factor in our quick, real time
visual processing.



As to how close these different explanations are,
Dixon et al. (2013) argued for connections
between the ‘ODoG model’ (Blakeslee & McCourt,
1999) and higher level models such as the ‘Anchoring
Theory’ of Gilchrist et al. (1999). The common principle
they identified across multiscale and inference based
accounts of brightness/lightness perception is
high pass filtering tuned to the object size.

In this paper we explore a multiscale model

based on the circular centre and surround

mechanism of classical receptive field (CRF) in the

retina relying on difference of Gaussian (DoG)

filters while assuming some limited number of

scales for the filter. The model’s output is a

multiscale pyramid of DoG filtered outputs in which

each scale of the filter creates a new layer of visual

information. The amount of information and its
accuracy depend on the neighbourhood size around
the edges, which is defined by the surround size of the retinal
receptive field (RF) and the chosen scales of the DoG
filters.

The outputs from different scales of the DoG

filter can then be integrated. This multilayer

representation has a significant power in revealing

the underlying structure of the percept. It provides us

with enough information to start processing and

getting some preliminary 3D percept of the pattern,
containing edges, shades, some textures and perhaps
even some cues about the depth information. This
multiscale DoG filtering representation might be the
underlying mechanism that connects our model to some
high level explanations (e.g. Gilchrist et al., 1999).

This paper is organised as follows. Section Two

explains multiscale representation and spectral

analysis in computer vision (CV). Section Three
seeks biological connections between these
mathematical transformations and our vision, mainly
relying on the mechanism of retinal RFs and
classical receptive field (CRF) models. Section

Four includes the details of our model, the

experimental results on some tilt illusion patterns

and a roadmap for our ongoing and future research.

2 FILTERING AND VISION

There is considerable physiological evidence for
frequency and orientation tuned cells in our visual
system (e.g. Hubel & Wiesel, 1962), and image
spectral analysis provides important clues about the
final percept that results from our visual processing.

2.1 Multiscale Representation

Experimental research in psychophysics and
physiology has suggested multiscale
transforms as models of the processing and
projections in the visual cortex of mammals. Examining
the cat’s visual cortex, Hubel and Wiesel (1962)
discovered a class of cells they called simple cells,
whose response depends on the frequency and
orientation of the visual stimuli.
The physiological experiments showed that
their response could be modelled with linear filters,
whose impulse response has been measured at
different locations of the visual cortex. Daugman
(1980) approximated the impulse
response of these cortical cells by Gaussian
windows modulated by a sinusoidal wave, in which the
spatial orientation tuning of these cells is modelled by
dilation of modulated Gaussians (e.g. Gabor
functions).

In the 1970s and 1980s, the need to extract

multiscale image information was established by

many researchers (Rosenfeld, 1971; Marr, 1982;

Burt & Adelson, 1983; Witkin, 1983) and some of

their ideas have later been subsumed by the wavelet

paradigm. The use of a multiresolution sensor

provides high-resolution information (fine scales) at

selected locations and a large field of view with

relatively little data (coarse scale) at the same time.

Multiresolution algorithms can be implemented

using the multiresolution pyramid introduced by

Burt and Adelson (1983). Among many recent

studies on wavelets, Mallat (1996) was one of the

first to show the impact of wavelets for low-level

ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence

306

vision by concentrating on three major applications

of wavelets, including multiresolution search,

multiscale edge detection and texture discrimination.

Pyramidal image representations

 such as scale

invariant transforms (Lowe, 1999) are better

matched to human visual encoding than JPEG-DCT,

and in particular don’t need to partition an image

into blocks before processing. Scale-space analysis
can be performed based on image decomposition by
finding the differences between a pair of scaled
filters with different parameterizations; for example,
Laplacian or Difference of Gaussian filters create a
pyramidal scale hierarchy (Lindeberg, 2011). A
comprehensive comparison of a diverse range of
geometric representations for different multiscale
spatial, directional and frequency selectivity
techniques is given by Jacques et al. (2011).

Although pyramidal representation with

additional scales is arguably over-complete, it has

the potential to provide a lower error model of the

data, and is more likely to provide the information at

the level of detail required for a particular image or

application. We further connect this to Marr’s idea

of 3D structure above the edge map (Marr &

Hildreth, 1980). We will present illusion processing
results that show evidence for this primitive causal
effect in low level retinal visual processing. Currently the
simulations of these high-level explanations for
illusion magnitude and error predictions result in
very complex CV models, which tend not to
generalize.

Note further that self-organization models of

repeated patterns of edge detectors at particular

angles are well established (von der Malsburg,

1973). Higher level spatial aggregation of regularly

spaced spots or edges in turn automatically gives

rise to analogues of DCT and DWT type bases, the

latter with localization determined by the higher

level lateral interaction functions or the constraints

of an underlying probabilistic connectivity model

(Powers, 1983).

2.2 Image Spectral Analysis in CV

Image processing in spatial (pixel) domain and in

spectral (frequency) domain have specific

applications in CV, though frequency analysis of the

visual scene seems more biologically plausible. The

more popular discretised spectral transforms
include the DCT (Discrete Cosine Transform), DFT
(Discrete Fourier Transform), STFT (Short-Time
Fourier Transform), and DWT (Discrete Wavelet
Transform).

Such families of functions include not only

bioplausible interaction functions as discussed in the

previous section, but are also fundamental to JPEG

and JPEG2000 compression. Those that are based on

true sinusoidals and/or Gaussians are perhaps not

directly bioplausible, but usefully approximate those

that are bioderived.

One of the main advantages of Fourier

transformation is facilitating image filtering and

convolution (Smith, 2003). The high/low pass

filtering function clearly can contribute to a

multiresolution model, as well as image sharpening

and noise removal, and we can also model edge

detection and texture matching in these terms.

DFT/DCT are intrinsically global and also suffer

from a “Leakage” problem (Merry & Steinbuch,

2005) due to periodically extending the signal.

Whenever localization either in space or time of

spectral components is needed, windowed or

enveloped versions can be used. For example, the STFT
is calculated by taking the DFT after multiplication by
a window function that slides over the entire
image. A main drawback of the STFT arises from the
Nyquist-Heisenberg uncertainty principle (Merry &
Steinbuch, 2005), meaning that finding an
appropriate window size is a trade-off between time
and frequency resolution.

Wavelets are a more general approach, and the DWT
has had a high impact on signal and image processing. By
dilation and translation of a mother wavelet,
extraction of very low frequency components at
large scales and very high frequency components at
small scales is possible. The conventional wavelet
has some limitations such as shift sensitivity, poor
directionality and lack of phase information, and
newer techniques have been introduced to address them (Führ
et al., 2006).

Gabor wavelets are the product of an elliptical Gaussian
and a complex plane wave, which provides directionality.
Based on dilations and rotations of this generating
function, a set of self-similar Gabor filters is generated
for different orientations and scales. This is a reliable
technique for direction and scale tuneable edge and
line detection. The Gabor wavelet can
characterize the underlying texture and image
characteristics because it finds local
features in small windows, with additional
directional information (Xie et al., 2008; Ali &
Powers, 2014). In our biological model, a Gabor-like

family of wavelets is implied, at different positions

in the retinal map, and at different frequencies at

different levels of processing. Figure 1 illustrates

three different oriented filter banks on a sample

scale.
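As an illustration of the Gabor construction described in this section, the following minimal Python sketch builds a complex Gabor kernel as the product of an elliptical Gaussian envelope and a complex plane wave, and rotates it to six orientations. The function name `gabor_kernel` and all numeric values (kernel size, wavelength, sigmas) are illustrative assumptions, not parameters reported in this paper.

```python
import cmath
import math

def gabor_kernel(size, wavelength, theta, sigma_x, sigma_y):
    """Complex Gabor kernel on a size x size grid (size should be odd).

    Hypothetical helper: an elliptical Gaussian envelope multiplied by a
    complex plane wave travelling along direction theta (radians).
    """
    half = size // 2
    rows = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the wave travels along theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
            carrier = cmath.exp(2j * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        rows.append(row)
    return rows

# One sample scale at six orientations, 30 degrees apart (assumed values).
bank = [gabor_kernel(21, 8.0, k * math.pi / 6, 4.0, 6.0) for k in range(6)]
```

The real part of each kernel is an even (line-like) detector and the imaginary part an odd (edge-like) detector; other scales would follow by dilating the wavelength and the two sigmas together.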

Although there is physiological evidence for

BioplausibleMultiscaleFilteringinRetinaltoCorticalProcessingasaModelofComputerVision

307

frequency and orientation tuning cells both in the

retina and cortex and the ‘self-organization map’ of

orientation sensitivity (von der Malsburg, 1973),

there is still no specific evidence about the

bioplausibility of particular mathematical

transformations in our visual system, or

demonstration that specific models are more likely

than others.

Figure 1: One sample scale of 6 oriented filter banks (top)

DoG, (middle) LoG, and (bottom) Gabor wavelet.

3 BIOLOGY OF THE RETINA

3.1 Receptive Fields from Retina to

Cortex

The retina is a layer of nerve tissue arranged in three
main layers: photoreceptors (rods and
cones), bipolar cells and ganglion cells (GCs). These
layers are connected through two intermediate
layers of horizontal cells and amacrine cells (Fig. 2).

The photoreceptors are the only retinal cells
that directly convert light into nerve impulses. They
then transmit the impulses to layers two and three of
the retina, the bipolar neurons and ganglion neurons
respectively. Ganglion cell axons exit the eye and
carry the visual signals to the visual cortex. The
neurons in the intermediate layers also contribute to
the visual processing. Horizontal cells transmit the
photoreceptor outputs to a few surrounding bipolar
neurons, and the amacrine cells activate the GCs
that are in their vicinity.

Figure 2: Retina layers including three main layers of

photoreceptors, bipolar cells and ganglion cells, and two

intermediate layers of horizontal cells and amacrine cells.

The figure is reproduced using (McGill, 2014).

ON-centre and OFF-centre bipolar cells respond

differentially to the light stimuli on their receptive

field centres by either depolarization or

hyperpolarization. Like bipolar cells, the GCs have
concentric receptive fields with centre surround
antagonism and, in response to stimuli, increase or
decrease their rate of action potential discharge
(McGill, 2014). The excitation and inhibition
produced by light stimuli on the centre of an
ON-centre or OFF-centre GC can be easily
implemented by DoG filters.



Recent physiological findings have identified

further features of retinal ganglion cells (RGCs)

dramatically expanding our understanding of retinal

processing. Field and Chichilnisky (2007) published

a detailed study about circuitry and coding of the

information processing inside the retina, mentioning

that there are at least 17 distinct retinal ganglion cell

types and explained how they contribute to
visual information encoding. Biological findings in

size variation of RGCs due to eccentricity and

dendritic field size (Shapley & Hugh Perry, 1986)

have been implemented in neuro-computational eye

models (e.g. Lourens, 1995; ter Haar Romeny,

2003).

A few types of RGCs have been found to have orientation
selectivity similar to that of cortical cells (Barlow &
Hill, 1963; Weng et al., 2005). For horizontal
and amacrine cells, too, neurobiological evidence shows
an elongated surround well beyond the CRF size,
believed to be responsible for orientation detection
in the retina, which has been modelled by retinal non-CRF
(nCRF) models (Carandini, 2004; Cavanaugh et al.,
2002; Wei et al., 2011).

All of this evidence indicates that, given the
diversity of intra-retinal circuits, the different types of
RGCs (Field & Chichilnisky, 2007), and the
variation in the size of individual RGCs with
retinal eccentricity (Lourens, 1995), the retinal
cells have the underlying mechanics for multiscale
processing from fine to coarse scales, supporting
Marr’s indication of a full primal sketch in the early
stages of vision.

3.2 Retinal Low Level Visual

Processing

Linear filtering has many applications in CV such as

techniques for image improvement by sharpening

the edges and reducing noise. These procedures take

place by convolving the original image with an

appropriate kernel. In convolution, a rectangular grid

of coefficients (weights) known as the kernel is

multiplied by the activations of the neighbourhood

ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence

308

elements of a particular pixel, and summed (or

averaged or integrated). We now explain the

relationship between the convolution operator and

the point spread function inside retina.

3.3 Lateral Inhibition and Point

Spread

Images can be viewed as a summation of impulses:
a set of scaled and shifted delta
functions can generate any image. A linear system
is characterised by its impulse
response, so the output image is
equal to the input image convolved with the system's
impulse response. The impulse response is often
called the point spread function (PSF) (Smith, 2003).
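The relationship between impulses, the PSF and convolution can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions: the three-tap centre-surround `psf` and the helper `convolve_same` are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def convolve_same(signal, psf):
    """Discrete 1D convolution with zero padding; output has len(signal)."""
    half = len(psf) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(psf):
            m = n - (k - half)  # input sample read by this PSF tap
            if 0 <= m < len(signal):
                acc += weight * signal[m]
        out.append(acc)
    return out

# Hypothetical PSF: excitatory centre with inhibitory flanks.
psf = [-0.25, 1.0, -0.25]

# A single impulse returns a copy of the PSF centred on the impulse.
impulse = [0, 0, 0, 1, 0, 0, 0]
impulse_response = convolve_same(impulse, psf)

# A signal made of two scaled, shifted impulses returns the same two
# scaled, shifted copies of the PSF, summed (linearity).
signal = [0, 2, 0, 0, 3, 0, 0]
response = convolve_same(signal, psf)
```

Here `response` is twice the PSF centred on sample 1 plus three times the PSF centred on sample 4, which is the sense in which the impulse response fully characterises a linear system.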

The human visual system is an excellent example

of this concept. The first layer of the retina

transforms an input of a pattern’s light image into

another pattern consisting of nerve impulses. The

middle layer of the eye passes the bright spike, and

produces a circular region of increased darkness.

This process, known as ‘lateral inhibition’, means

that if a nerve cell in the middle layer is activated, it

decreases the ability of its nearby neighbours to

become active. This biological convolution with its

specific PSF improves the ability of the eye to

understand the world.

Recognising objects and identifying
nearby objects in the visual system is based on
distinguishing regions by their brightness and
colour. The mechanism in layer 2 of the retina
facilitates this task by sharpening the edges. When a
poorly defined, blurry edge with a
gradual change from dark to light is processed, as in the ramp
Mach bands illusion, the brightness profile
appearing on the optic nerve as the output from
layer 2 shows overshoot and undershoot, which
makes the transition between the light and
dark areas appear more abrupt (Smith, 2003).
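The overshoot and undershoot on a ramp edge can be reproduced with a one-dimensional centre-surround filter, as in the minimal sketch below. The PSF used here, twice a narrow Gaussian minus a wide Gaussian, is an assumption chosen so that the weights sum to one and flat regions pass through unchanged; the sigma values, the ramp width and the helper names are likewise illustrative.

```python
import math

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalised so the weights sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def filter_replicate(signal, kernel):
    """Filter with a symmetric kernel, repeating the end samples at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [sum(w * signal[min(max(n + k - radius, 0), last)]
                for k, w in enumerate(kernel))
            for n in range(len(signal))]

# Assumed centre and surround widths (in samples).
sigma_centre, sigma_surround = 2.0, 4.0
radius = int(3 * sigma_surround)
centre = gaussian_1d(sigma_centre, radius)
surround = gaussian_1d(sigma_surround, radius)

# Unit-gain lateral inhibition: excitatory centre minus inhibitory surround.
psf = [2.0 * c - s for c, s in zip(centre, surround)]

# Dark plateau, linear ramp over 20 samples, bright plateau.
ramp = [0.0] * 40 + [i / 20.0 for i in range(21)] + [1.0] * 39
profile = filter_replicate(ramp, psf)
```

In `profile` the values dip slightly below 0 just before the foot of the ramp and rise slightly above 1 just after its top, which corresponds to the dark and bright Mach bands.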

The lateral inhibition mechanism in layer 2 of the
retina seems to be the underlying mechanism of low
level models for addressing brightness/lightness
illusions. The middle layer of the retina acts as an edge
enhancement or high-pass filter, whereas the first layer of
the retina applies a nonlinear mechanism, approximately
taking the logarithm of the incoming image for
retinal gain control. This nonlinearity
flattens the illumination component and makes it
possible for the eye to see under poor light conditions
(Smith, 2003). Both the nonlinearity and the filtering in
layer 2 of the retina seem to be important clues for
addressing brightness/lightness illusions (Kingdom,
2011), as well as the tilt illusions, which have been
less well studied but are our present focus.

3.4 Classical Receptive Field Models

Classical receptive field (CRF) models mainly

emphasize the contrast sensitivity of the retinal

ganglion cells and are modelled based on the

circular centre and surround antagonism using

differences and second differences of Gaussians

(DoG) or Laplacian of Gaussian (LoG) (Ghosh et

al., 2007) to reveal the edge information.

The retinal CRF models date back to the 1960s,
when Rodieck & Stone (1965) and Enroth-Cugell
(1966) used the DoG model to implement the RFs
of the RGCs based on their contrast sensitivity, with
the centre having a smaller Gaussian variance than
the surround. Marr and Hildreth (1980) proposed
replacing DoG with LoG and claimed the
equivalence of DoG and LoG for a certain
ratio of σ (sigma) of the centre and surround
Gaussians. Powers (1983) showed that DoG models
can themselves result from a simple biophysical
model of ontogenesis and can usefully approximate
the interaction functions proposed in a variety of
neural models.

Jameson (1985) developed an early model of

brightness assimilation and contrast based on DoG

filters with multiple spatial scales. In a later study

(Jameson & Hurvich, 1989) they pointed out that

this processing occurs in parallel and accounts for

the simultaneous appearance of sharp edges and

blended colour that define delimited regions. They
attributed the contrast effect to
stimulus components that are relatively large compared
to the centre of the filter, and the assimilation effect
to stimulus components that are small compared
to the filter centre. Similar explanations have been
proposed for the checkerboard illusion by modelling
multichannel analysis of human contrast sensitivity
based on the pattern’s spatial frequency (De Valois &
De Valois, 1988).

Our visual perception of a scene starts by

extracting the edge map of the scene and DoG is a

bioplausible implementation to model this process.

The extracted edge map is an essential and primitive

task in most image processing applications, but edge

map alone cannot provide any information about the

shades, lights, and also three dimensional structure

of the image (Ghosh et al., 2007). Therefore

according to Marr’s ‘raw primal sketch’, there is a

need for further processing to get the ‘full primal

sketch’ for a 3D view of the world (Marr & Hildreth,

1980).

BioplausibleMultiscaleFilteringinRetinaltoCorticalProcessingasaModelofComputerVision

309

Applying LoG (Marr & Hildreth, 1980) instead of

DoG shows the possibility of involvement of higher

order Gaussian derivatives in the filtering functions

in retinal visual processing. Young (1985, 1987)

introduced modelling of the retinal and cortical RFs

of many neurons based on linear combination of

Gaussians and higher derivatives of Gaussian. In a

recent study, Ghosh et al. (2007) used the 4th and 6th
order derivatives of Gaussians to extract the shade

information next to the edges. Still there is no

biological evidence on the structure of these

functions.

The discovery of new features in retinal cells
revealed more delicate retinal information processing,
which introduced the concept of non-classical
receptive fields (nCRFs) of RGCs. The experimental
findings of Passaglia et al. (2001) indicated
that the surround extends well beyond the
CRF. Based on nCRF implementations (Chao-Yi & Wu, 1994; Wei et al., 2012), Blakeslee and

McCourt (2004) proposed a directional multiscale

DoG filter model for explaining the magnitude of

various White’s effect patterns. There are also

approaches for nCRFs implementation of the

cortical cells (Rao & Ballard, 1999; Grigorescu et

al., 2003; Tanaka & Ohzawa, 2009).

4 OUR MODEL

It has been shown that GC excitation is best
described by a centre surround organization (Mangel,
1991), which can be modelled by the difference of two
Gaussians (Linsenmeier et al., 1982). A ‘neuro-physiological
model’ is proposed here based
on multiscale DoG filtering to implement the retinal
RFs. Our goal is to explore
the mechanism of, and the outputs coming from,
layer 2 of the retina, and to analyse whether this low
level visual representation can provide
enough information to reveal the tilt illusion
effect.

4.1 Multiscale Implementations of

Difference of Gaussians (DoG)

Difference of Gaussians is a filtering technique for
identifying edges, and a multiscale set
of DoG filters can produce a multiscale edge map.
The DoG edge detection process starts by
performing a Gaussian blurring with a specified
sigma (σ) on the original image, which results in a blurred
version of the image. Then another blurring with a
second Gaussian with a smaller sigma (finer scale)
produces a second output with less blurring.
The final result is the difference
between the two blurred versions of the original
image. The zero crossings of the final result define
the edges, that is, the pixels whose values vary
within their surrounding neighbourhood.
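The steps above can be sketched on a single image row as follows. This is a minimal illustration, not our implementation: the helper names, the sigma values, the border handling (end samples repeated) and the tolerance `eps` used to ignore rounding noise in flat regions are all assumptions.

```python
import math

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalised so the weights sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma, radius):
    """Gaussian blur of a 1D signal, repeating the end samples at the borders."""
    kernel = gaussian_1d(sigma, radius)
    last = len(signal) - 1
    return [sum(w * signal[min(max(n + k - radius, 0), last)]
                for k, w in enumerate(kernel))
            for n in range(len(signal))]

def dog_zero_crossings(signal, sigma_fine, sigma_coarse, eps=1e-9):
    """Return the DoG response and the indices n where it changes sign
    between samples n and n + 1 (ignoring values smaller than eps)."""
    radius = int(math.ceil(3 * sigma_coarse))
    fine = blur(signal, sigma_fine, radius)      # less blurred version
    coarse = blur(signal, sigma_coarse, radius)  # more blurred version
    dog = [f - c for f, c in zip(fine, coarse)]
    crossings = [n for n in range(len(dog) - 1)
                 if (dog[n] < -eps and dog[n + 1] > eps)
                 or (dog[n] > eps and dog[n + 1] < -eps)]
    return dog, crossings

# One row of a dark-to-light step edge between samples 29 and 30.
row = [0.0] * 30 + [1.0] * 30
dog, edges = dog_zero_crossings(row, sigma_fine=2.0, sigma_coarse=4.0)
```

The response is negative on the dark side of the step, positive on the bright side, and zero (up to rounding) in the flat regions, so the single sign change marks the edge location.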

For a 2D signal such as pattern I, the DoG output

of our retinal GCs model with centre surround

organization is given by:

,





I∗

2πσ











/





I ∗

2π











/









(1)

where x and y are the distances from the origin along the horizontal
and vertical axes respectively, σ is the
sigma of the centre Gaussian, and Kσ is the
sigma of the surround Gaussian. The
K factor therefore defines the ratio of the surround Gaussian to the
centre Gaussian. This is the retinal PSF
introduced in section 3.3, modelling lateral inhibition
in layer 2 of the retina.

Figure 3: Difference of Gaussian filter withsigma of 7 for

the centre and 14 for the surround. Window size is

112×112 pixels.

A3D graph of a sample DOG filter is shown in

Figure 3. The value of K in our model set to 2 as

used in the ODoG model (Blakeslee & McCourt,

2004), but our model have a circular centre surround

organization instead of the oriented elongated

surround used in the ODoG model. By increasing

the K factor in Eq. (1), the surround suppression

affects more on the final predicted output. Rather

than the K factor, the DoG window size is another

parameter in the model. Very large windows result

in long computation, and very small windows are

just approximating a box blur filter not weighted

Gaussian one. We set a parameter to define the

window size based on the sigma of the centre

Gaussian and tested different ratios from 3 to 20. For

the experimental results in section 4.4 the window

size set to 6 times larger than the centre Gaussian to

both capture the excitation and inhibition effect.

ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence

310

We found that the model is not sensitive to
the exact parameter setting. Ideally the model’s
parameters are set so that at the finest scale it can
capture high frequency texture details and at coarse
scales the kernel has an appropriate size relative to the
objects within the scene.
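A circular centre surround kernel with the parameters discussed above (centre sigma, K factor and a window size tied to the centre sigma) might be built as in the following sketch. The function names are hypothetical, each Gaussian uses the standard 2D normalisation 1/(2πσ²), and forcing an odd window width so that the kernel has a centre pixel is our assumption for this illustration.

```python
import math

def gaussian_2d(x, y, sigma):
    """Standard 2D Gaussian with 1 / (2 * pi * sigma^2) normalisation."""
    return (math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            / (2.0 * math.pi * sigma * sigma))

def dog_kernel(sigma_centre, k_ratio=2.0, window_ratio=6):
    """Centre Gaussian minus surround Gaussian on a square window.

    k_ratio is the K factor (surround sigma / centre sigma) and
    window_ratio sets the window width as a multiple of the centre sigma.
    """
    size = int(round(window_ratio * sigma_centre))
    if size % 2 == 0:
        size += 1  # assumption: odd width so there is a centre pixel
    half = size // 2
    sigma_surround = k_ratio * sigma_centre
    return [[gaussian_2d(x, y, sigma_centre) - gaussian_2d(x, y, sigma_surround)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]

# Small example: centre sigma 2, K = 2, window 6 times the centre sigma.
kernel = dog_kernel(sigma_centre=2.0)
```

For this example the kernel is 13 × 13, positive in the excitatory centre and negative from a radius of roughly four pixels outwards, so the window just reaches into the inhibitory surround; a larger `window_ratio` captures more of it at a higher computational cost.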

The involvement of higher order
Gaussian derivatives was first suggested by Marr’s LoG
approximation of retinal image processing (Marr &
Hildreth, 1980), and the idea has been used in much research,
such as Young (1985, 1987), who applied a linear
combination of Gaussian and LoG instead of DoG
(Figure 1), but there is still no biological evidence
for the structure of these functions.

Powers (1983) also proposed an ontogenetic

Bernoulli-like model showing that an appropriate

lateral interaction function can self-organize, and

can approximate many existing mathematical

models, including DoG models and LoG models

(emergent as two levels of DoG processing) noting

that processing is not particularly sensitive to the

parameterization or shape of the filter function.

Indeed cluster-level aggregates of Powers’ Bernoulli

model approximate to Poisson and Gaussian models.

The idea of scale-space analysis is based on
image decomposition by finding the differences
between a pair of scaled filters (e.g. DoG or LoG)
with different parameterizations, which are then used to
create a pyramidal scale hierarchy (Lindeberg,
2011). Our model has a multiscale stack of filtered
outputs to reveal the final percept.
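A multiscale stack of DoG outputs of this kind can be sketched as below, with one layer per centre sigma. This is a minimal illustration under stated assumptions: the blurs are computed separably with unit-sum sampled Gaussians and repeated border pixels, and the checkerboard size, tile size and list of scales are arbitrary example values rather than the settings used in our experiments.

```python
import math

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalised so the weights sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Filter every row with a symmetric kernel, repeating border pixels."""
    radius = len(kernel) // 2
    last = len(image[0]) - 1
    return [[sum(w * row[min(max(x + k - radius, 0), last)]
                 for k, w in enumerate(kernel))
             for x in range(len(row))]
            for row in image]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma, radius):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_1d(sigma, radius)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def dog_stack(image, centre_sigmas, k_ratio=2.0):
    """One DoG layer (centre blur minus surround blur) per centre sigma."""
    stack = []
    for sigma in centre_sigmas:
        radius = int(math.ceil(3 * k_ratio * sigma))
        centre = gaussian_blur(image, sigma, radius)
        surround = gaussian_blur(image, k_ratio * sigma, radius)
        stack.append([[c - s for c, s in zip(c_row, s_row)]
                      for c_row, s_row in zip(centre, surround)])
    return stack

# Example input: a 36 x 36 checkerboard with 6 x 6 tiles (values 0 and 1).
tile = 6
checkerboard = [[float(((x // tile) + (y // tile)) % 2) for x in range(36)]
                for y in range(36)]
stack = dog_stack(checkerboard, centre_sigmas=[1.0, 2.0, 4.0])
```

Each entry of `stack` has the same size as the input, so the layers can be inspected side by side or integrated across scales; fine scales respond mainly along tile edges, while coarser scales respond to larger groupings of tiles.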

Building a pyramid with additional scales or

multiple models is over-complete but has the

potential to provide a lower error model of the data,

and in particular is more likely to provide the

information at the level of detail required for a

particular image or application. This would in turn

support the connections of Marr’s raw primal to full

primal sketch and his speculation of 3D structure

above the edge map (Marr & Hildreth, 1980). Our

results show evidence for this primitive causal effect

in low level retinal visual processing in terms of

perceptual illusion models. These effects can in turn

be expected to contribute to higher level models of

depth and motion processing.

4.2 Investigated Patterns

The patterns we have investigated here are given in

Figure 4. All of the patterns in this class have a

background effect (such as checkerboards) as well as

other cues such as mortar lines in the Café Wall

illusion or superimposed dots on complex bulge

patterns, which all affect the final percept. From
now on, we refer to this type of tilt illusion as ‘2nd
order’ tilt effects. The complex bulge pattern and
similar patterns generated in the present project with
different shapes of inducing dots (Figure 5) belong
to the 2nd order tilt effect illusions, and the
superimposed dots on their backgrounds give some
impression of a foreground/background percept.
Different positions of the dots on the textured
background result in some tilt, bow or wave
perception along the edges as well as expansion and
contraction at the checkers’ corners.

Figure 4: Investigated patterns (top): Café Wall, simple
bulge Checkerboard, and (bottom) Complex bulge pattern
(Kitaoka: “A Bulge”) - http://www.ritsumei.ac.jp/~akitaoka/index-e.html.

4.3 Alternate Explanations

Results from psychophysical and computational

research have shown that the low level visual

processing models are able to explain some low

level visual illusions. As an example, the ODoG
model presented by Blakeslee and McCourt (2004) is
claimed to be a parsimonious model for brightness
induction illusions (Kingdom, 2011) with the ability
to predict both the illusion magnitude and its
orientation. To improve the global
normalization step in the ODoG model, two
extensions of the model were proposed by Robinson et al.
(2007) that implement local normalization of the
multiscale oriented outputs. There are other similar

BioplausibleMultiscaleFilteringinRetinaltoCorticalProcessingasaModelofComputerVision

311

models based on higher order derivatives of

Gaussians (e.g. Ghosh et al., 2007).

Figure 5: Similar complex bulge patterns with circular and rectangular superimposed dots on a checkerboard result in ‘2nd order’ tilt effects.

There have been attempts to explain geometrical illusion patterns, as well as brightness illusion patterns, using high level visual models, such as the perceptual inference and filling-in models proposed by Grossberg and Todorovic (1988), as well as Gestalt grouping and junction analysis (Gilchrist et al., 1999). We instead explore the explanatory power of retinal level filtering, showing that this low level processing mechanism can provide enough information to explain a significant family of geometrical illusion effects without requiring the high computational and training cost of high level visual models.

There has also been experimental research (e.g. Jameson, 1985; Westheimer, 2007) that specifically connects ‘brightness induction’ illusions and ‘geometric illusions’. For instance, some explanations for ‘SBC’ (Simultaneous Brightness Contrast illusion: Figure 6-left), where a gray test patch looks darker on a white background than an identical patch on a black background, have suggested the involvement of neurons with a small excitatory center and an elongated surround (nCRFs), which could be modeled with either ‘wavelet based modeling’ (Otazu et al., 2008) or ‘DoG based models’ (Blakeslee & McCourt, 1999, 2004).

Figure 6: (left) SBC (Simultaneous Brightness Contrast) illusion, where identical gray test patches are misperceived in brightness, and (right) Irradiation pattern, where equal sized black and white test patches are misperceived in size.

Another similar illusion is the ‘irradiation pattern’ (Figure 6-right), in which two equal sized test patches, one white and one black, are each positioned on a background of the opposite colour. This results in a misperception of size: the white patch on the black background appears larger. Westheimer (2007) explained the irradiation effect and the Café Wall illusion by addressing the border shift in those patterns due to ‘retinal light spread’, ‘compressive nonlinearity’ and the ‘centre-surround organization’ of retinal cells. He then mentioned other factors involved in the final percept, such as cortical stages for straight and sharp borders, pointed corners, slope of lines, and angle shifts.

Therefore the illusion percept in these 2nd order tilt patterns seems to be affected by ‘brightness assimilation and contrast’ as well as by some ‘border shifts’ similar to those in our investigated patterns. So for these categories of illusion, the final percept is not only affected by brightness induction, but is also influenced by the bulging effect at the corners of the test patch, which is essentially a geometrical measure rather than an intensity measure. We are investigating whether our multiscale retinal model is able to address both ‘brightness induction’ and ‘geometrical clues’ at the same time. The model analysis could potentially be extended to patterns related to other brightness induction illusions with some minor changes to the model, such as additional multi-orientation information.

Some researchers suggest a connection between brightness induction and geometrical illusions under other names, such as ‘brightness contrast and assimilation’ by Jameson (1985), ‘encroachment of bright regions into dark ones’ and the ‘corner effect’ in Westheimer (2007), ‘diagonal grouping’ along checkerboard tiles connecting brightness assimilation to contrast by Gilchrist et al. (1999), and ‘diagonal components’ by Ninio and Pinna (2006), which are claimed to be the missing clue for tilt illusion explanations. There may thus be interacting or related mechanisms affecting these two supposedly distinct illusion categories, and multiscale oriented spatial filtering could explain the basic mechanism underlying the appearance of these effects. In a comprehensive review of lightness, brightness, and transparency (LBT) covering a quarter century of new ideas, Kingdom (2011) noted that among the most promising developments in LBT are models of brightness coding based on ‘multiscale filtering’ in conjunction with ‘contrast normalization’.

The contribution of our work to the current

studies is to highlight the multiscale edge map

information derived from a bioplausible modelling

ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence

312

of CRFs by multiscale DoG filters, and to use this multiscale edge representation as a basic neural model that explains low level illusion percepts.



Figure 7: Café Wall results with centre/surround/window in the ratio 1:2:6. (a) Multiscale output for scales varying from fine to coarse, starting from a centre Gaussian of scale 1 (top-left) and increasing 1 pixel at a time to 12 pixels (bottom-right). Original image of the Café Wall illusion (b), with enlarged output (c) for scale 6.

4.4 Model’s Predictions and Results

A common assumption is that information in the visual system is processed at multiple levels of resolution, perhaps simultaneously, perhaps sequentially in some sense. Each scale of our pyramidal bioplausible representation creates a new layer of visual information, and examining this pyramidal output across the different scales provides us with a multiscale edge map containing edges, shades around edges, some textures and perhaps even some clues about depth information, as mentioned in the full primal sketch of the 3D scene by Marr and Hildreth (1980).

Figure 8: (a) Multiscale output from scale 1 to 7 for a simple bulge pattern (b), with enlarged output (c) for scale 4.

The results of our current experiments show that low level visual processing in layer 2 of the retina is able to reveal and explain many visual illusion percepts that remain unexplained. We have shown the simulation results of our simple multiscale CRF model, which is based on a circular centre and surround organization and uses a multiscale DoG based filtering representation. We are currently exploring adding orientation resolution to our model and extending it to an nCRF model based on more recent physiological findings related to orientation based multiscale filtering in the retina (Carandini, 2004; Cavanaugh et al., 2002; Passaglia et al., 2001; Tanaka & Ohzawa, 2009).
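A minimal sketch of a circular centre/surround DoG filter may help make the model concrete. It assumes that the 1:2:6 centre/surround/window ratio quoted for Figure 7 means a surround Gaussian with twice the centre scale and a square window about six times the centre scale wide. The names `gaussian_2d` and `dog_kernel`, and the normalisation of each Gaussian to unit sum, are illustrative choices and not necessarily those of the implementation used here.

```python
import numpy as np


def gaussian_2d(size, sigma):
    """Unit-sum 2D Gaussian sampled on a size x size grid centred on zero."""
    half = (size - 1) / 2.0
    ax = np.arange(size) - half
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def dog_kernel(sigma_c, surround_ratio=2.0, window_ratio=6.0):
    """Difference of Gaussians: excitatory centre minus inhibitory surround.

    The ratios are assumptions read off the 1:2:6 figure caption.
    """
    size = int(round(window_ratio * sigma_c))
    if size % 2 == 0:
        size += 1  # force an odd width so the kernel has a central pixel
    centre = gaussian_2d(size, sigma_c)
    surround = gaussian_2d(size, surround_ratio * sigma_c)
    return centre - surround


if __name__ == "__main__":
    k = dog_kernel(2.0)
    print(k.shape, float(k.sum()))
```

Because both Gaussians are normalised over the same window, the kernel sums to approximately zero, so a uniform region produces no response and only local contrast (edges, shading, texture) is passed on.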



The outputs for the 2nd order tilt patterns investigated here are organised in Figures 7, 8 and 9 from low to high scale of the DoG filter, starting from the top-left corner, moving to the right along each row, and then down to the next row. We have laid out the multiscale representation of our bioplausible retinal model so that the output can easily be seen as a sequence of increasing scales. Each figure also shows an enlarged output for a specific scale of the DoG filter that highlights the illusion effect well. The outputs for the 2nd order tilt patterns of the Café Wall, the simple 3×3 Bulge pattern, and the complex bulge pattern show that a simple multiscale DoG filtering analysis, based on the classical model of RFs, not only reveals the sharp edges when small scale filters are used but also, as the filter scale increases, reveals other hidden information such as local texture. These results not only add weight to the findings behind Jameson’s (1985) ‘contrast and assimilation theory’, but also indicate that there are many geometrical clues that can be revealed by this bioplausible multiscale representation.
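The sequence of increasing scales described above can be sketched as a stack of DoG outputs, one layer per scale. This is a hedged illustration only: the function names, the separable implementation, the edge padding at the image border, and the surround and window ratios (taken here as 2 and 6 times the centre scale) are all assumptions rather than details confirmed by the experiments reported here.

```python
import numpy as np


def gaussian_1d(sigma, radius):
    """Unit-sum 1D Gaussian truncated at +/- radius pixels."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()


def blur(img, sigma, radius):
    """Separable Gaussian blur; borders are handled by edge replication."""
    k = gaussian_1d(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def multiscale_dog(img, scales, surround_ratio=2.0, window_ratio=6.0):
    """Return an array of shape (len(scales), H, W) of DoG responses."""
    outputs = []
    for s in scales:
        radius = int(np.ceil(window_ratio * s / 2.0))
        centre = blur(img, s, radius)
        surround = blur(img, surround_ratio * s, radius)
        outputs.append(centre - surround)
    return np.stack(outputs)


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    step = np.zeros((32, 32))
    step[:, 16:] = 1.0
    stack = multiscale_dog(step, [1, 2, 4])
    print(stack.shape)
```

On a simple step edge, the finest scale responds only in a narrow band around the edge, while coarser scales spread the response over a wider neighbourhood, which mirrors the fine-to-coarse progression shown in the figures.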

Of those geometrical cues, our model highlights the perception of divergence and convergence of the mortar lines in the “Café Wall” illusion shown in Figure 7. Similar explanations for the Café Wall illusion have been given by other researchers in the field based on low level filtering models (Tani et al., 2006; McCourt, 1983), although there are some psychological explanations for it as well (Gregory & Heard, 1979).

The experimental results show that on the “Bulge patterns” in Figures 8 and 9, a bulge effect occurs in both the simple pattern and the complex one. We assume this is due to a few visual clues: for instance, the brightness perception of the checkerboard background causing a simple outward border shift for the white tiles, the expansions occurring at the intersection angles, and some further clues related to the local position of the dots, which may, through frequency discharge or emission, result in local border tilts or bows. This might be addressed by high level effects

or psychological explanations for bulge effect patterns, such as uncertainties in both the formation and processing of image features such as points and lines (Fermüller & Malm, 2004), and the categorization of edges based on the different intensity values around them (Gregory & Heard, 1979; Kitaoka, 2007), but there is also a biological neural explanation, which is the one we are interested in.



The pyramidal outputs from our model seem to connect readily to ‘Gestalt grouping’ principles, that is, to a psychological point of view that takes the grouping rules as the basic building blocks for perception of the world. Our model suggests grouping principles, as we find that different perceptual groupings occur at different scales of the DoG filter applied to the pattern.

Figure 9: (a) Multiscale output for scales 1, 2, 3 (first row) and 4, 6, 8 (second row) for the complex bulge pattern (b), with enlarged output (c) for scale 3.

For example, in the “Complex Bulge” pattern (Fig 9), at lower scale filters we first see the central bulge, which gradually expands to a level at which another grouping principle dominates and the pattern appears as an X rather than a bulge. In the “Café Wall” illusion (Fig 7), diverging and converging mortar lines appear when the DoG filter has a mid-range scale; as the scale increases, the effect of the mortar lines disappears and another perceptual grouping of tiles starts to appear along the vertical direction, opposite to the direction of the near horizontal mortar effect. It is quite likely that this multiscale representation is the underlying mechanism not only of perceptual grouping but also of some of the higher level illusion explanatory models.

5 CONCLUSIONS

We have presented our preliminary investigation of a variant of the classical retinal receptive field (CRF) model that implements a circular centre and surround mechanism and uses DoG to explain some tilt illusion patterns, such as the Café Wall and both the Simple and Complex Bulge patterns, all of which we refer to as ‘2nd order’ tilt patterns. We focus on low level processing based on what takes place in the retinal ganglion cells. We further expect that these retinal filter models will prove to play a significant role in higher level models of depth and motion processing. Currently, simulations of high-level explanations for illusion magnitude and error prediction result in very complex CV models, which tend not to generalize. In our future work we are extending the model to a non-classical receptive field (nCRF) model with circular centre and elongated surrounds, inspired by our visual system, and moving towards identifying angles of orientation and motion quantitatively.

The experimental results showed that the output of the model provides not only a multiscale edge map, with indications of some shades around the edges, but also other information hidden in the pattern, such as local texture. In this multiscale representation, the information from each scale of DoG filtering creates a new layer of visual information.

The outputs from different scales of the DoG filter can then be integrated to generate a multiscale pyramid of the outputs of the DoG model. This multiscale pyramidal representation provides us with enough information to start processing and to obtain an initial 3D percept of the pattern, including information about edges, shades, some textures and perhaps even some preliminary clues about depth, as in Marr’s speculation about the full primal sketch completing our 3D view of the world.

This multiscale filtering representation can be used for predicting illusion perception. Our future study will address efficient data representation as well as systematic analysis for predicting both illusion magnitude and local shift direction by adding orientation tuning to the model. We are also keen to make a connection between our bioplausible model

and the psychological aspects of ‘Gestalt grouping principles’.

ACKNOWLEDGEMENTS

This work was supported in part by the Chinese

Natural Science Foundation under Grant No.

61070117, the Beijing Natural Science Foundation

under Grant No. 4122004, and the Australian

Research Council under ARC Thinking Systems

Grant No. TS0689874, as well as the Importation

and Development of High-Caliber Talents Project of

Beijing Municipal Institutions.

REFERENCES

Ali, H. B., Powers, D. M., 2014. "Facial Expression

Recognition Based On WAPA AND OEPA FASTICA".

International Journal of Artificial Intelligence &

Applications, 5(3).

Anderson, B. L., 1997. "A theory of illusory lightness and

transparency in monocular and binocular images: The

role of contour junctions". Perception, 26,

419-454.

Anderson, B. L., Winawer, J., 2005. "Image segmentation

and lightness perception". Nature, 434(7029), 79-83.

Barlow, H. B., Hill, R. M., 1963. "Selective sensitivity to

direction of movement in ganglion cells of the rabbit

retina". Science, 139(3553), 412-414.

Blakeslee, B., McCourt, M. E., 1999. "A multiscale spatial

filtering account of the White effect, simultaneous

brightness contrast and grating induction". Vision

research, 39(26), 4361-4377.

Blakeslee, B., McCourt, M. E., 2004. "A unified theory of

brightness contrast and assimilation incorporating

oriented multiscale spatial filtering and contrast

normalization", Vision Research, 44, 2483–2503.

Burt, P., Adelson, E., 1983. "The Laplacian pyramid as a

compact image code", IEEE Trans. Comm., vol.

COM-31, pp. 532-549, Apr.

Carandini, M., 2004. "Receptive fields and suppressive

fields in the early visual system". The cognitive

neurosciences, 3, 313-326.

Cavanaugh, J. R., Bair, W., Movshon, J. A., 2002. "Nature

and interaction of signals from the receptive field

center and surround in macaque V1 neurons". Journal

of neurophysiology, 88(5), 2530-2546.

Chao-Yi, L., Wu, L., 1994. "Extensive integration field

beyond the classical receptive field of cat's striate

cortical neurons—classification and tuning

properties". Vision research, 34(18), 2337-2355.

Daugman, J. G., 1980. "Two-dimensional spectral

analysis of cortical receptive field profile", Vision

Res., vol. 20, pp. 847-856.

DeValois, R. L., DeValois, K. K., 1988. "Spatial vision".

New York: Oxford University Press.

Enroth-Cugell, C., Robson, J.G., 1966. "The contrast

sensitivity of the retinal ganglion cells of the cat",

Journal of Physiology (London) 187 517–552.

Field, G. D., Chichilnisky, E. J., 2007. "Information

processing in the primate retina: circuitry and

coding". Annu. Rev. Neurosci., 30, 1-30.

Führ, H., Demaret, L., Friedrich, F., 2006. "Beyond

wavelets: New image representation paradigms".

Document and image compression, 7, 179-206.

Fermüller, C., Malm, H., 2004. "Uncertainty in visual

processes predicts geometrical optical illusions".

Vision research, 44(7), 727-749.

Ghosh, K., Sarkar, S., Bhaumik, K., 2007. "Understanding

image structure from a new multiscale representation

of higher order derivative filters". Image and Vision

Computing 25(8): 1228-1238.

Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T.,

Cataliotti, J., Li, X., et al. 1999. "An anchoring theory

of lightness perception". Psychological Review, 106,

795–834.

Gregory, R. L., Heard, P., 1979. "Border locking and the Café Wall illusion". Perception, 8(4), 365-380.

Grigorescu, C., Petkov, N., Westenberg, M. A., 2003.

"Contour detection based on nonclassical receptive

field inhibition". Image Processing, IEEE Transactions

on, 12(7), 729-739.

Grossberg, S., Todorovic, D., 1988. "Neural dynamics of

1-D and 2-D brightness perception: A unified model of

classical and recent phenomena". Perception &

Psychophysics, 43, 241–277.

Hubel, D., Wiesel, T., 1962. "Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex", J. Physiol., vol. 160.

Jacques, L., Duval, L., Chaux, C., Peyré, G., 2011. "A

panorama on multiscale geometric representations,

intertwining spatial, directional and frequency

selectivity". Signal Processing, 91(12), 2699-2730.

Jameson, D., 1985. "Opponent-colours theory in the light

of physiological findings". In D. Ottoson, & S. Zeki

(Eds.), Central and peripheral mechanisms of colour

vision (pp. 83–102). London: MacMillan.

Jameson, D., Hurvich, L. M., 1989. "Essay concerning

color constancy". Annual review of psychology,

40(1), 1-24.

Kingdom, F. A. A., 2011. "Lightness, brightness and

transparency: A quarter century of new ideas,

captivating demonstrations and unrelenting

controversy". Vision Research, 51, 652–673.

Kitaoka, A., 2007. "Tilt illusions after Oyama (1960): A review". Japanese Psychological Research, 49(1), 7-19.

Lourens, T., 1995. "Modelling retinal high and low

contrast sensitivity filters". In From Natural to

Artificial Neural Computation (pp. 61-68). Springer

Berlin Heidelberg.

Lowe, D. G., 1999. "Object recognition from local scale-

invariant features". Proceedings of the International

Conference on Computer Vision 2. pp. 1150–1157.

BioplausibleMultiscaleFilteringinRetinaltoCorticalProcessingasaModelofComputerVision

315

doi:10.1109/ICCV.1999.790410.

Lindeberg, T., 2011. "Generalized Gaussian scale space

axiomatics comprising linear scale-space, affine

scale-space and spatio-temporal scale-space", Journal

of Mathematical Imaging and Vision, Volume 40,

Number 1, 36-81.

Linsenmeier, R. A., Frishman, L. J., Jakiela, H. G.,

Enroth-Cugell, C., 1982. "Receptive field properties of

X and Y cells in the cat retina derived from contrast

sensitivity measurements". Vision research, 22(9),

1173-1183.

Mallat, S., 1996. "Wavelets for Vision". Proceedings of the

IEEE, vol 84, no. 4, april 1996.

von der Malsburg, C., 1973. "Self-organization of

orientation sensitive cells in the striate cortex".

Kybernetik, 14(2), 85-100.

Mangel, S. C., 1991. "Analysis of the horizontal cell

contribution to the receptive field surround of

ganglion cells in the rabbit retina". The Journal of

physiology, 442(1), 211-234.

Marr, D., Hildreth, E., 1980. "Theory of edge detection",

Proc. of Royal Society of London B 207, 187–217.

Marr, D., 1982. "Vision", W.H. Freeman and Company,

New York.

McCourt, M. E., 1983. "Brightness induction and the von

der Malsburg illusion". Perception 12: 131-142.

McGill C. A., 2014. Le cerveau à tous les niveaux.

[ONLINE] Available at: http://thebrain.mcgill.ca.

(Including links vision, and retina).

Merry, R. J. E., Steinbuch, M., 2005. "Wavelet theory and

applications". A literature study, Eindhoven

University of Technology.

Ninio, J., Pinna, B., 2006. "Orthogonal expansion: a

neglected factor in tilt illusions". Psychologia, 49(1),

23-37.

Otazu, X., Vanrell, M., Alejandro Parraga, C., 2008.

"Multiresolution wavelet framework models brightness

induction effects". Vision Research, 48(5), 733-751.

Penacchio, O., Otazu, X., Dempere-Marco, L., 2013. "A

Neurodynamical Model of Brightness Induction in

V1". PloS one, 8(5), e64086.

Passaglia, C. L., Enroth-Cugell, C., Troy, J. B., 2001.

"Effects of remote stimulation on the mean firing rate

of cat retinal ganglion cells", Journal of Neuroscience

21, 5794–5803.

Powers, D. M. W., 1983. "Lateral Interaction Behaviour

Derived from Neural Packing Considerations",

Technical Report No 8317, Department of Computer

Science, University of NSW, Australia.

Rao, R. P., Ballard, D. H., 1999. "Predictive coding in the

visual cortex: a functional interpretation of some

extra-classical receptive-field effects". Nature

neuroscience, 2(1), 79-87.

Robinson, A. E., Hammon, P. S., de Sa, V. R., 2007.

"Explaining brightness illusions using spatial filtering

and local response normalization". Vision research,

47(12), 1631-1644.

Rodieck, R. W., Stone, J., 1965. "Analysis of receptive

fields of cat retinal ganglion cells", Journal of

Neurophysiology 28, 833–849.

Rosenfeld, A., Thurston, M., 1971. "Edge and curve

detection for visual scene analysis". Computers, IEEE

Transactions on, 100(5), 562-569.

Shapley, R., Hugh Perry, V., 1986. "Cat and monkey

retinal ganglion cells and their visual functional

roles". Trends in Neurosciences, 9, 229-235.

Smith, R. G., Freed, M. A., Sterling, P., 1986.

"Microcircuitry of the dark-adapted cat retina:

functional architecture of the rod-cone network". The

Journal of neuroscience, 6(12), 3505-3517.

Smith, S. W., 2003. "Digital signal processing: a practical

guide for engineers and scientists". Newnes.

Smith, V. C., Jin, P. Q., Pokorny, J., 2001. "The role of

spatial frequency in color induction". Vision

Research, 41, 1007–1021.

Tanaka, H., Ohzawa, I., 2009. "Surround suppression of

V1 neurons mediates orientation-based representation

of high-order visual features". Journal of

neurophysiology, 101(3), 1444-1462.

Tani, Y., Maruya, K., Sato, T., 2006. "Reversed Café Wall

illusion with missing fundamental gratings". Vision

research, 46(22), 3782-3785.

ter Haar Romeny, B. M., 2003. "A scale-space model for

the retinal sampling. Front-End Vision and Multi-

Scale Image Analysis". Multi-Scale Computer Vision

Theory and Applications, written in Mathematics,

167-177.

Wei, H., Wang, Z. Y., Zuo, Q. S., 2012. "A model of

image representation based on non-classical receptive

fields". In Advances in Neural Networks–ISNN 2012

(pp. 297-306). Springer Berlin Heidelberg.

Wei, H., Zuo, Q., Lang, B., 2011. "Multi-scale image

analysis based on non-classical receptive field

mechanism". In Neural Information Processing (pp.

601-610). Springer Berlin Heidelberg.

Westheimer, G., 2007. "Irradiation, border location, and

the shifted-chessboard pattern". Perception, 36(4),

483.

Witkin, A. P., 1983. "Scale-space filtering", in:

Proceedings of International Joint Conferences on

Artificial Intelligence, Karlsruhe, pp. 1019–1022.

Xie, X., Lam, K. M., Zhao, H., Dai, Q., 2008. "Efficient

rotation-and scale-invariant texture classification

method based on Gabor wavelets". Journal of

Electronic Imaging, 17(4), 043026-043026.

Young, R.A., 1985. "The Gaussian derivative theory of

spatial vision: analysis of cortical cell receptive field

line weighting profiles". General Motors Research

Publication GMR-4920.

Young, R. A., 1987. "The Gaussian derivative model for

spatial vision: I. Retinal mechanisms", Spatial Vision

2, 273–293.

ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence

316