Orientation and Mobility with Prosthetic Vision
Combination of Luminosity and Depth Information for Scene Representation
Guillaume Tatur 1,4, Isabelle Marc 2,3, Gerard Dupeyron 1,4 and Michel Dumas 3,4
1 Centre Hospitalier Universitaire de Nîmes, Place du Pr R. Debré, 30000 Nîmes, France
2 LGI2P, Ecole des Mines d'Alès, Parc Scientifique et Technique Georges Besse, 30035 Nîmes cedex 1, France
3 Institut d'Electronique du Sud, UMR 584, 2 Place Eugène Bataillon, 34095 Montpellier cedex 5, France
4 Institut ARAMAV, 12 Chemin du Belvédère, 30900 Nîmes cedex, France
Keywords: Prosthetic Vision, Mobility, Depth-based Representation.
Abstract: Recent advances in visual prostheses raise hope of improving the performance of late-blind people in
daily-life tasks. Autonomy in mobility is a major factor of quality of life, and ongoing research aims to
develop new image-processing methods for environment representation and to evaluate mobility performance.
We present a novel approach for generating a scene representation devoted to mobility tasks, which
may complement current prosthetic vision research. In this work, done in collaboration with low vision
rehabilitation specialists, depth cues as well as contrast perception are made accessible through a composite
representation. After presenting the advantages and drawbacks of scene representations based solely on
captured depth or luminosity information, we introduce our method, which combines both types of information
in a single representation based on a temporal scanning of depth layers.
1 INTRODUCTION
Regaining some autonomy in mobility in unknown
environments is a critical step in improving the
quality of life for blind people, and reduces the
social handicap that they often experience. As a
consequence, numerous studies in the framework of
prosthetic vision are devoted to the issue of mobility
(Dagnelie et al., 2007; Parikh, Humayun and
Weiland, 2010; Van Rheede, Kennard and Hicks,
2010). Prostheses currently under development only
use a small number of electrodes to stimulate the
retina, and the implanted persons receive very
impoverished visual information, consisting of
patterns of a few tens of bright spots called
phosphenes. Recent clinical trials have proved that
implanted people are able to perform simple tasks
such as tracking a line on the ground, or locating
strongly contrasted objects. However, these tasks are
performed far from real-life conditions, and work
remains to be done to optimize the informational
content of visual prostheses. This content is
constrained first by the limited number of
stimulation electrodes available. But one can also
question the nature of the information to be
transmitted, which so far is based only on a
grayscale representation of the environment. In
natural vision (Bruce and Green, 1990) distances are
perceived by means of different mechanisms, some of
them using binocular cues such as stereopsis and
convergence, and others based on monocular cues,
such as motion parallax, occlusion, etc. This is made
possible by exploiting a set of inferences and
assumptions acquired by learning and experiencing
the structure of the world that surrounds us. This
approach is also used during training sessions for
low vision rehabilitation, when visually impaired
people are taught to use every resource possible to
understand the geometrical layout of the
environment and build a mental map of the space in
which they move (Markowitz, 2006). We therefore
assume that it is possible to change the nature of the
information transmitted by a retinal implant without
hindering its understanding by the implantee,
provided they have been taught to use this information.
For orientation and mobility tasks, we propose a new
method of representation of the scene, combining
depth and brightness information. This paper will
first discuss the advantages and limits of depth-
based representation in the case of prosthetic vision.
A method for composite representation, which
superimposes luminosity and depth data, is then
detailed, along with its expected contribution to
improving mobility performance.
2 PROSTHETIC VISION
SIMULATION
The figures illustrating this paper are phosphene
images aimed at simulating prosthetic vision. The
simulated phosphene characteristics were derived
from data gathered during the Argus II project
(Humayun, 2009). The phosphene image consists of
a square lattice of 9 * 9 circular light dots. Each
Gaussian-shaped phosphene occupies a field of view
of 31.2 minutes of arc (arcmin), and the
center-to-center spacing between phosphenes is 36
arcmin. For information coding, N = 10 gray levels
may be applied with 100% contrast (black
background).
Luminosity and depth data were obtained from a
stereo pair of cameras (STH-MDCS3-C from
VIDERE DESIGN) under rectified geometric
conditions.
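To make these simulation parameters concrete, the sketch below shows one way such a phosphene image could be rendered from a grid of intensity levels. The arcmin-to-pixel scale, the Gaussian spread, and all function names are illustrative assumptions of ours, not a description of the actual simulator.

```python
import numpy as np

# Illustrative assumptions: 1 arcmin is mapped to 1 pixel, and the Gaussian
# spread is chosen so the visible dot roughly covers the 31.2 arcmin field.
GRID = 9          # 9 * 9 phosphene lattice
SPACING = 36      # center-to-center spacing (arcmin, here also pixels)
SIGMA = 31.2 / 4  # Gaussian spread of one phosphene (arcmin)
N_LEVELS = 10     # number of usable gray levels

def render_phosphene_image(levels):
    """levels: (GRID, GRID) integer array in [0, N_LEVELS - 1].
    Returns a float image in [0, 1] simulating the phosphene percept."""
    size = GRID * SPACING
    ys, xs = np.mgrid[0:size, 0:size]
    img = np.zeros((size, size))
    for i in range(GRID):
        for j in range(GRID):
            cy, cx = (i + 0.5) * SPACING, (j + 0.5) * SPACING
            amp = levels[i, j] / (N_LEVELS - 1)   # normalized brightness
            img += amp * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2)
                                / (2 * SIGMA ** 2))
    return np.clip(img, 0.0, 1.0)
```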
3 REPRESENTATION BASED
ON LUMINOSITY DATA
Current studies devoted to mobility tasks with
prosthetic vision mainly use information gained
from grayscale images of the environment: the input
image is usually split into blocks of pixels, and the
average gray value of each block is used to compute
the visual characteristics of the corresponding
phosphene. For a complete review of techniques for
phosphene generation, see (Chen et al., 2009).
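A minimal sketch of this block-averaging scheme follows; the block geometry and all names are our illustrative assumptions. The resulting grid of levels could then be fed to a phosphene renderer such as the one sketched in Section 2.

```python
import numpy as np

def luminosity_based_levels(gray, grid=9, n_levels=10):
    """Split a grayscale image (float array in [0, 1]) into grid * grid
    blocks and quantize each block's mean onto n_levels gray levels."""
    h, w = gray.shape
    levels = np.zeros((grid, grid), dtype=int)
    for i in range(grid):
        for j in range(grid):
            block = gray[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            levels[i, j] = int(round(block.mean() * (n_levels - 1)))
    return levels
```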
The ability to detect the different entities that
compose the environment, and to estimate the
distances to them, is of paramount importance for
safe travel. A luminosity-based representation
(LBR) with a few tens of phosphenes does not allow
for this spatial discrimination. A possible way to
make small objects detectable is to decrease the size
of each phosphene receptive field. However, in this
case, the entire field of view provided by the
phosphene image is dramatically reduced, and will
no longer match the minimum requirements for
mobility tasks, as estimated in (Cha, Horch and
Normann, 1992; Sommerhalder et al., 2006).
Moreover, this type of representation is strongly
influenced by the lighting conditions and may also
yield some ambiguities (for example a shadow or a
dark spot on the ground can be misinterpreted as an
obstacle), which may increase the cognitive load
during mobility and orientation tasks.
4 REPRESENTATION BASED
ON DEPTH DATA
These observations led us to propose the use of
depth information instead of luminosity (Tatur et al.,
2011). In a depth-based representation (DBR), the
brightness of each phosphene is defined in relation
to the distance of the surrounding objects, its
intensity increasing as the distance decreases. The
information obtained is independent of the texture
and reflectivity of entities, and of lighting conditions. In
the following, a method for DBR generation is
briefly described, and a comparison between the two
representations is made.
The transfer function that converts distance to
subject into intensity value for each phosphene is
established considering the maximum and minimum
distances observable: as the number of gray levels
for the representation is limited to N distinct values,
the range of distances to be rendered must also be
limited, otherwise fine spatial discrimination will not
be possible. In addition, for mobility tasks, it is
important to highlight nearby objects. For this
purpose, a nonlinear transfer function has been
defined that emphasizes short distances (Figure 1):
these are represented by a larger number of gray
levels, improving the contrast perception between
depth layers, to the detriment of the visibility of
more distant areas.
For $D_{max}$ and $D_{min}$ the maximum and minimum observable distances, and $A_{max}$ the highest gray level value that can be presented on our display (255), the gray level associated with the distance $D$ is given by:

$A(D) = \mathrm{rnd}\left(\alpha\, e^{\gamma D} + \beta\right)$   (1)

with

$\alpha = \dfrac{A_{max}}{e^{\gamma D_{min}} - e^{\gamma D_{max}}}$   (2)

$\beta = -\alpha\, e^{\gamma D_{max}}$   (3)

where $\mathrm{rnd}()$ is the rounding function and $\gamma$ is a control parameter, which is empirically chosen. Finally, a uniform quantization step associates $A(D)$ with one of the $N$ possible brightness levels.
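As a sketch of this distance-to-intensity conversion, assuming the exponential form of Equations (1)-(3) above; all names are ours, and the default values are taken from Figure 1.

```python
import numpy as np

def depth_to_level(D, d_min=0.0, d_max=6.0, gamma=-1.7, n_levels=10):
    """Map a distance D (meters, scalar or array) to a brightness level in
    [0, n_levels - 1]; a negative gamma emphasizes short distances."""
    alpha = 255.0 / (np.exp(gamma * d_min) - np.exp(gamma * d_max))
    beta = -alpha * np.exp(gamma * d_max)
    D = np.clip(D, d_min, d_max)
    a = np.rint(alpha * np.exp(gamma * D) + beta)            # Eq. (1): 0..255
    return np.rint(a / 255.0 * (n_levels - 1)).astype(int)   # uniform quantization
```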
The D parameter in the previous equations stands
for a representative distance between the subject and
the objects lying in the receptive field of a
given phosphene.
OrientationandMobilitywithProstheticVision-CombinationofLuminosityandDepthInformationforScene
Representation
123
Figure 1: DBR generation using linear and non-linear transfer functions. $N = 10$, $D_{max} = 6$ m and $D_{min} = 0$ m. A) Full resolution image. B) DBR with the linear transfer function $A(D) = \mathrm{rnd}\left(255\, \frac{D_{max} - D}{D_{max} - D_{min}}\right)$, and C) DBR with the non-linear transfer function with $\gamma = -1.7$.
This distance can be calculated in different ways,
for example by using the arithmetic mean of the
distance values in the studied area, the median of
these values, or the minimum value. Figure 2 shows
three phosphene representations of the same scene
based on different definitions of this D parameter.
The use of the median value seems to provide better
separability between the entities in the scene, and
should mitigate the influence of local values
resulting from depth estimation errors. It is this last
definition of D which is used in
the remainder of this paper.
Figure 2: DBR generation using three distance calculations. A: Full resolution image, B: DBR using the mean value of distances, C: DBR using the minimum value, D: DBR using the median value. E: LBR.
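A sketch of how the representative distance D might be computed per receptive field; the array layout, the NaN handling, and the names are our assumptions. Swapping the reducer between np.nanmean, np.nanmin and np.nanmedian reproduces the three variants compared in Figure 2.

```python
import numpy as np

def representative_distances(depth_map, grid=9, reducer=np.nanmedian):
    """depth_map: (H, W) array of distances in meters, NaN where the stereo
    matcher gave no estimate. Returns a (grid, grid) array with one
    representative distance per phosphene receptive field."""
    h, w = depth_map.shape
    D = np.full((grid, grid), np.nan)
    for i in range(grid):
        for j in range(grid):
            cell = depth_map[i * h // grid:(i + 1) * h // grid,
                             j * w // grid:(j + 1) * w // grid]
            if np.any(np.isfinite(cell)):
                D[i, j] = reducer(cell)
    return D
```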
A similar approach is found in (Lieby et al, 2012). In
a navigation task through a maze with overhanging
obstacles using a 30 * 30 phosphene array, depth-
based representation appears more efficient than
LBR in terms of preferred walking speed.
If the number of phosphenes is decreased in
order to match the current capacity of implants,
and the amount of available information is therefore
decreased too, the DBR should be even more
interesting, as it makes it possible to adjust the data
to be transmitted. In the case of a real environment,
with many elements present in the vicinity, it is
possible to filter the information on the basis of
distance and thus present only the entities contained
in a restricted volume around the subject, as sketched
below. This should limit the attentional effort required
to decode the scene by presenting only the information
relevant for safe travel.
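A minimal sketch of such distance-based filtering (the radius value and the names are illustrative): phosphenes whose representative distance lies beyond the chosen volume are simply turned off.

```python
import numpy as np

def filter_by_distance(levels, distances, max_radius=3.0):
    """Keep only phosphenes whose representative distance (meters) falls
    within max_radius of the subject; the others are set to black."""
    keep = np.isfinite(distances) & (distances <= max_radius)
    return np.where(keep, levels, 0)
```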
This type of representation also presents some
limitations. First, the number of gray levels N is
limited and it is necessary to achieve some trade-off
between resolution and range of distances to be
converted. Even more importantly, when navigating
through an unknown environment, it is of course
necessary to avoid obstacles, but it is also important
to be able to orient oneself in order to choose a
suitable (safe and non-erratic) path. Luminosity
information is therefore needed to collect visual cues
such as light sources and contrasting areas
(a pedestrian crossing, for instance).
We present in the next section a method to
provide the two types of information in a unified
representation.
5 COMBINING DEPTH AND
LUMINOSITY INFORMATION:
A COMPOSITE
REPRESENTATION
As the intensity A of a phosphene is the only
parameter that can be controlled in order to convey
information, transmitting two different types of
information (luminosity and depth) is not
straightforward. For this purpose, we propose a
method based on the temporal scanning of the
successive depth layers of a scene. In this
representation, the presented phosphene image
corresponds initially to the LBR; then a successive
highlighting of objects occurs, depending on their
distance to the observer, until a previously set
maximum scanning distance $D_{max}$ is reached.
For a given scanning speed $S$ and a maximum distance $D_{max}$, we define a distance $p$ measured from the observer at time $t$:

$p(t) = S \cdot t$, with $S \cdot t < D_{max}$   (4)
For each phosphene we define a scanning-related value $A_s \in [0, 255]$ such that:

$A_s(t) = 255\, \exp\left(-\dfrac{(D - p(t))^2}{2\sigma^2}\right)$   (5)

where $\sigma$ is a parameter controlling the transitional behavior between depth layers, and $D$ is the associated median distance.
We can then determine a composite representation which associates with each phosphene the value $A_c$ such that:

$A_c = \begin{cases} A_s & \text{if } A_s \geq A_{th} \\ A_{LBR} & \text{otherwise} \end{cases}$   (6)

where $A_{th}$ is a threshold value directly related to the depth layer thickness, and $A_{LBR}$ denotes the luminosity-based value of the phosphene. The $A_{th}$ parameter is arbitrarily chosen to ensure the best visibility.
In order to help distinguish between the two
different types of information, we propose to
enhance the visual contrast between them by
quantizing the LBR values over $n$ levels while
maintaining a quantization over $N$ levels for $A_s$.
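A sketch of the composite image at a given instant t, combining the elements above under the forms reconstructed in Equations (4)-(6); the scanning speed, sigma, threshold and the coarser n-level LBR quantization are illustrative values of ours.

```python
import numpy as np

def composite_levels(t, lbr_levels, D, scan_speed=1.0, d_max=6.0,
                     sigma=0.3, a_th=30.0, n_lbr=4, n_levels=10):
    """Composite phosphene levels at time t (seconds).
    lbr_levels: (9, 9) luminosity-based levels in [0, n_levels - 1].
    D: (9, 9) representative distances in meters (NaN = unknown)."""
    p = min(scan_speed * t, d_max)                                # Eq. (4)
    a_s = 255.0 * np.exp(-((D - p) ** 2) / (2 * sigma ** 2))      # Eq. (5)
    a_s = np.where(np.isfinite(D), a_s, 0.0)
    # Quantize the LBR part onto n_lbr coarser levels so that the two types
    # of information remain visually distinct.
    lbr = np.rint(lbr_levels / (n_levels - 1) * (n_lbr - 1)) \
          / (n_lbr - 1) * (n_levels - 1)
    scan = np.rint(a_s / 255.0 * (n_levels - 1))
    # Eq. (6): the scanning highlight replaces the luminosity value wherever
    # it exceeds the visibility threshold a_th.
    return np.where(a_s >= a_th, scan, lbr).astype(int)
```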
Figure 3 presents simulated prosthetic images
obtained with this composite representation.
Information needed for safe navigation, such as the
existence of the obstacle in the foreground of the
image, is not accessible through the representation
based solely on luminosity, whereas small objects
can be detected with the depth-based representation.
The scanning process is also helpful for
discriminating between objects close to each other.
Figure 3: Composite representation: visualisation of 3 images extracted from a video during the scanning process. Top row: full resolution image (left), representation based on luminosity data (middle), and depth-based representation (right). Bottom row: composite representation at times $t_1 < t_2 < t_3$ (from left to right).
Clues about the environment geometrical layout
might be inferred by the observer when using the
scanning technique. This phenomenon is illustrated
in figure 4, which displays a corridor-like scene. We
observe that the scanning method should lead to a
better understanding of the presented scene: depth
information indicates the presence of lateral
obstacles that border what seems to be a passage.
Moreover, the dark region on the right side of the
image, which could have been interpreted as a
passage with an LBR, appears to be part of the wall
and can then be correctly interpreted as a
contrasted area of this wall. Interestingly, vanishing
points can be estimated by observing the
convergence of the successive depth layers
(Figure 4, from $t_1$ to $t_3$).
Figure 4: Representation of a corridor-like scene. Top row: full resolution image. Bottom row: composite representation from time $t_1$ to $t_3$.
The composite representation is expected to present
other interesting characteristics. The possibility of
triggering the scanning and controlling its
parameters can be given to the implantee via a
dedicated interface. Optimal values for the maximum
distance and the scanning rate may be chosen in
relation to the subject's walking speed or the scene
content.
In this case, the subject should become more
efficient in exploration tasks, as it is known from
psychological studies that active participation
facilitates the construction of mental representations
(Wexler and Van Boxtel, 2005). Another possible
benefit of this composite representation may arise
when considering one of the shortcomings of
epiretinal prostheses: it is conceivable that fading
(Zrenner et al., 2010), i.e. the gradual disappearance
of the sensation when a continuous stimulus is
applied, may be at least partially overcome by the
scanning technique, by the simple fact that it causes
a periodic update of the phosphene image.
6 CONCLUSIONS
In this paper, a method for the generation of a
OrientationandMobilitywithProstheticVision-CombinationofLuminosityandDepthInformationforScene
Representation
125
composite scene representation based on depth and
luminosity information is presented. This
representation should allow for safer mobility while
preserving luminosity contrast perception, which is
useful for orientation. Orientation and mobility tests
with normally sighted subjects wearing head-mounted
displays simulating prosthetic vision are underway
in order to evaluate this method and to determine
efficient values for all parameters, notably for the
scanning velocity. These tests should validate the
supposed advantages of the composite
representation. A first expected benefit is that it
provides a means to assess the presence and position
of surrounding obstacles, independently of their
appearance and of lighting conditions. Scanning as
presented here can help remove possible ambiguities
between obstacles when they are in close proximity
to each other. Moreover, this method can provide
a solution to the classic dilemma between field of
view and acuity: with the scanning method,
transmitting the entire camera field of view should
be possible because thin objects can still be detected.
Finally, in our view, the major advantage of this
technique is the possibility given to the subject of
choosing the scanning parameters in relation to their
current actions and expectations. Thus, visual
exploration tasks such as landmark detection and
mental map building could be facilitated. Optimal
use of this new kind of representation, particularly
the distinction between depth and luminosity
information, relies on a complete mental
assimilation of the technique
through dedicated training sessions. As a
consequence, using low vision rehabilitation
concepts, one of our future aims is to develop
pertinent learning strategies.
ACKNOWLEDGEMENTS
This research was supported by the French
Federation of the Blind and Visually Impaired
(FAF).
REFERENCES
Bruce V., Green P. (1990). Visual perception: physiology,
psychology and ecology. Lawrence Erlbaum,
Hillsdale, NJ.
Cha, K., Horch, K. W., Normann, R. A. (1992). Mobility
Performance With A Pixelized Vision System. Vision
research, 32, 1367-1372.
Chen, S. C., Suaning, G. J., Morley, J. W., Lovell, N. H.
(2009). Simulating prosthetic vision: I. Visual models
of phosphenes. Vision Research, 49, 1493-1506.
Dagnelie, G., Keane, P., Narla, V., Yang, L., Weiland, J.,
& Humayun M. (2007). Real And Virtual Mobility
Performance In Simulated Prosthetic Vision. Journal
of Neural Engineering, 4(1):92-101
Humayun, M. S. (2009). Preliminary results from Argus II
feasibility study: a 60 electrode epiretinal prosthesis.
Investigative Ophthalmology & Visual Science, 50:
e-abstract 4744.
Lieby, P., Barnes, N., McCarthy, C., Liu, N., Dennett, H.,
Walker, J. G., Botea, V., Scott, A. F. (2012). Substituting
depth for intensity and real-time phosphene rendering:
Visual navigation under low vision conditions. ARVO
2012, Fort Lauderdale, Florida, USA.
Markowitz, S. N. (2006). Principles of modern low vision
rehabilitation. Canadian Journal of Ophthalmology,
vol. 41, pp. 289-312.
Parikh, N., Humayun, M. S., Weiland, J. D. (2010).
Mobility Experiments With Simulated Vision and
Peripheral Cues. ARVO 2010, Fort Lauderdale,
Florida, USA.
Sommerhalder, J. R., Perez Fornos, A., Chanderli, K.,
Colin, L., Schaer, X., Mauler, F., Safran, A. B.,
Pelizzone, M. (2006). Minimum requirements for
mobility in unpredictable environments. Investigative
Ophthalmology & Visual Science, vol. 47.
Tatur G., Marc I., Lafon D., Dupeyron G., Bardin F.,
Dumas M. (2011) Une approche fonctionnelle en
vision prothétique : étude préliminaire dans le contexte
de la mobilité. ASSISTH'2011, Paris, France
Van Rheede J. J., Kennard C., Hicks S., (2010).
Simulating prosthetic vision: optimizing the
information content of a limited visual display.
Journal of Vision, vol 10(14):32, pp1-15.
Wexler, M., Van Boxtel, J. J. A. (2005). Depth perception
by the active observer. Trends in cognitive sciences,
9(9), 431-438.
Zrenner, E., Benav, H., Bruckmann, A., Greppmaier, U.,
Kusnyerik, A., Stett, A., Stingl, K., Wilke, R. (2010).
Electronic Implants Provide Continuous Stable
Percepts in Blind Volunteers Only if the Image
Receiver is Directly Linked to Eye Movement. ARVO
2010, Fort Lauderdale, Florida, USA.
BIOSIGNALS2014-InternationalConferenceonBio-inspiredSystemsandSignalProcessing
126