A PORTABLE LOW VISION AID BASED ON GPU
R. Ureña, P. Martinez-Cañada, J. M. Gómez-López, C. Morillas and F. Pelayo
Departamento de Arquitectura y Tecnología de los Computadores, CITIC, ETSIIT
Universidad de Granada, C/ Periodista Daniel Saucedo Aranda s/n, Granada, Spain
Keywords: Low vision, Head-mounted display, Real time, Video processing, GPU, CUDA, GPGPU.
Abstract: The purpose of this work is to describe a customizable GPU-based aid system for low vision. The system transforms images taken from the patient's environment and tries to convey the best possible information through their residual vision, applying various transformations to the input image and projecting the processed image on a head-mounted display (HMD). The system makes it easy to implement and test different kinds of vision enhancements adapted to the pathologies of each low-vision patient, their particular visual field, and the evolution of their disease. We have implemented several types of visual enhancement based on edge extraction and overlay, image filtering, and contrast enhancement. We have developed a complete image processing library for CUDA-compatible GPUs so that the system can perform real-time processing on a lightweight netbook with an integrated NVIDIA ION2 GPU. We briefly summarize the computational cost of these modules (in processed frames per second) on three different NVIDIA GPUs.
1 INTRODUCTION
An estimated 38 million people worldwide suffer from blindness, and this number is expected to double over the next 25 years. An additional 110 million people have severely impaired vision (Foster and Resnikoff, 2005).
Low Vision (LV) is the term commonly used to describe partial sight, or sight which is not fully correctable with conventional methods such as glasses or refractive surgery (Peláez-Coca et al., 2009).
Low vision pathologies can be divided mainly into two categories: those dominated by a loss of visual acuity due to macular degeneration, and those dominated by a reduction of the overall visual field, such as Retinitis Pigmentosa. In many countries there is an increasing prevalence of diabetic retinopathy and an ageing population, with 1 in 3 people over the age of 75 affected by some form of age-related macular degeneration (Al-Atabany et al., 2010). Patients who suffer from a loss of visual acuity lose their foveal vision; they can therefore walk easily and avoid objects, but they struggle to read or watch TV. On the other hand, those who suffer from a reduction of their visual field, such as Retinitis Pigmentosa patients, can properly perform static tasks that require only a relatively small visual field, but their ability to walk and avoid obstacles is very limited. Moreover, they experience a progressive loss of contrast sensitivity and therefore have great difficulty managing in low-illumination environments or, in general, in environments where the illumination is not controlled.
There are several LV aids that try to improve visual capabilities by taking advantage of residual vision. Some of these devices employ an opaque, immersive HMD to project the enhanced images; examples include the LVES system (Massof and Rickman, 1992) and the JORDY system by Enhanced Vision. Portable aid systems have also been developed (Peli, 2001; Peláez-Coca et al., 2009) based on see-through displays, which overlay the edges of the whole scene on the patient's useful visual field. These systems are especially oriented to people affected by Retinitis Pigmentosa and use DSP devices and/or FPGAs for real-time processing.
The aid systems described above perform transformations of the input image, amplifying it in size, intensity or contrast. These transformations are mainly based on digital zooming and edge overlay. Systems based on magnification of the image are very useful in static and controlled
illumination environments, such as watching television or reading, but they are of little use in mobile environments, since they reduce the visual field and present unrealistic images that prevent the user from getting a real sense of the distance to obstacles.
Most LV pathologies are characterized by a slow progression, with residual vision deteriorating gradually over time; the patients therefore have requirements that change as the disease advances. Moreover, LV diseases affect different areas of the visual field unevenly, so non-uniform processing adapted to each patient's needs and visual field may be useful.
The systems mentioned above do not allow the processing to be fully customized to the patient's visual needs and disease progression.
In this context, the main contribution of the present system is a new platform that allows implementing and testing different kinds of image enhancements adapted to the visual needs of each patient, their visual field, and the evolution of their disease. The system provides a graphical user interface to customize the enhancements. Moreover, we have developed several kinds of image enhancements that improve image contrast even in low-light environments, where low-vision patients experience severe difficulties. The designed system achieves real-time image processing (above the 25 frames per second video rate) using a recent Graphics Processing Unit (GPU) integrated in a lightweight netbook.
Even though embedded solutions based on DSPs and/or FPGAs may provide higher raw performance, modern GPUs integrated in small portable computers can also provide the required latency and frame rate, as they contain multiple scalar processors. The main advantage of GPU-based systems is that they are easier and faster to customize to the needs of each visually impaired user than other implementations. They also facilitate rapid development and testing of new image enhancements.
2 SYSTEM SPECIFICATIONS
The proposed system can be viewed as a SW/HW platform for low vision support, which aims to make it easy to implement and test different types of visual enhancements tailored to the needs and visual field of each patient. The system therefore transforms images taken from the patient's environment and tries to convey the best possible information through their residual vision, applying different transformations to the input image.
The main characteristics are:
(1) Customizability: The system can perform a sequence of transformations fully adapted to the visual requirements and visual field of each low-vision patient.
(2) Portability: The image processing device can be carried by the patient during mobile activities such as walking.
(3) Real-time Processing: The system performs the different image enhancements in real time using a low-power GPU embedded in a lightweight netbook.
(4) Flexibility: The system can combine several types of visual enhancement, including digital zooming, spatial filtering, edge extraction and tone mapping, and works properly under non-uniform illumination.
2.1 Architecture
The developed platform runs on an ASUS EeePC 1201PN netbook, using the netbook's CPU and an NVIDIA ION2 GPU connected via PCI Express.
The main application runs on the CPU, where the user defines, through a graphical user interface (UI), the processing to be performed according to the visual needs of each LV patient. The UI is based on the RETINER system (Morillas et al., 2007) and on a platform for speeding up non-uniform image processing (Ureña et al., 2010). The application performs algebraic optimizations, based on the properties of convolution, to simplify the filtering stages.
After this optimization, the application determines which tasks run on the GPU and which on the CPU. The tasks assigned to the CPU are invoked directly by the application, whereas the GPU tasks are invoked through MEX modules (NVIDIA Corporation, 2007), which let us both configure the processing to be performed and handle the image transfers.
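As a rough illustration of this integration, a GPU task invoked from MATLAB through MEX might look like the following minimal gateway; launchEdgeKernel is a hypothetical host function (it would copy the data to the GPU, run the kernel and copy the result back), not the actual API of our modules.

/* Minimal sketch of a MEX gateway handing an image to a CUDA launcher. */
#include "mex.h"

extern void launchEdgeKernel(const double *in, double *out, int h, int w);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs < 1) mexErrMsgTxt("Expected an input image.");
    int h = (int)mxGetM(prhs[0]);                  /* image height */
    int w = (int)mxGetN(prhs[0]);                  /* image width  */
    plhs[0] = mxCreateDoubleMatrix(h, w, mxREAL);  /* output image */
    launchEdgeKernel(mxGetPr(prhs[0]), mxGetPr(plhs[0]), h, w);
}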
Figure 1 shows a diagram that summarizes the functional architecture of the implemented system.
Our system uses the GPU to speed up the image processing, since current GPUs have a multiprocessor architecture well suited to pixel-wise processing. Most GPUs, given their size and high power consumption, are not suitable for portable applications. However, the GPU used in this system, the NVIDIA ION2, has 16 processors integrated on a low-power platform with its own battery, providing about 4 hours of use.
Moreover, the system takes advantage of the netbook's Intel Atom N450 processor, which is faster than the processors typically built into FPGAs, such as the PowerPC.
The system can receive as input live video captured from a camera, as well as still images and videos in AVI format. In all cases the input can be in grayscale or color; the color scheme can be RGB, YCbCr or HSV, with conversions between these color spaces performed automatically. The system output is a processed image in grayscale or RGB format, depending on the input image and the particular characteristics of the processing chain. The processed image is projected on a head-mounted display (HMD). Figure 2 shows a person using the system: a USB camera and an HMD, both connected to the ASUS EeePC 1201PN netbook.
Figure 1: System Architecture.
Figure 2: Example of a person using the system.
2.2 Available Image Enhancements
The various image enhancements that can be applied
in this version of the platform are:
Edge Detection: Edge detection and overlay have been used widely in low vision rehabilitation with patients with central and/or peripheral vision loss (Peli et al., 2007). The effect of this enhancement on task performance and on the perceived quality of motion video has been assessed; the results indicate that adaptive enhancement (individually tuned using a static image) adds significantly to perceived image quality when viewing motion video.
Contrast Enhancement: Most low-vision patients experience a noticeable loss of contrast sensitivity, resulting in an almost complete loss of vision in low-light environments or in environments where the illumination is not fully controlled (sudden changes in lighting conditions, for example). One of the main objectives of this system is to help patients precisely in these environments. We have therefore developed a new method to improve image contrast, based on converting the image to the HSV color space. The system automatically computes the histogram of the V component to detect whether the captured image is too dark, too light or well contrasted. The V channel is then equalized if the image is too dark, or the S channel if it is too light. Finally, the enhanced channels are combined linearly with the original ones using a weighting factor set by the user according to the desired degree of enhancement. We have also included a tone-mapping operator (Biswas and Pattanaik, 2005).
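The following sketch outlines this logic on the GPU-resident channels. meanGPU, equalizeGPU and blendGPU stand in for the library's histogram, equalization and linear-combination modules, and the darkness/lightness thresholds are illustrative assumptions, not the values actually used by the system.

// Assumed helper wrappers around the library's GPU kernels:
float meanGPU(const float *d_chan, int n);                  // mean of a channel
void  equalizeGPU(const float *d_in, float *d_out, int n);  // histogram equalization
void  blendGPU(const float *d_a, const float *d_b,          // out = (1-a)*A + a*B
               float *d_out, float alpha, int n);

// Sketch of the HSV-based contrast enhancement described above.
void enhanceContrastHSV(float *d_S, float *d_V, float *d_tmp,
                        int n, float alpha /* user weighting factor */)
{
    float meanV = meanGPU(d_V, n);            // summary of the V histogram
    if (meanV < 0.35f) {                      // image judged too dark (assumed)
        equalizeGPU(d_V, d_tmp, n);           // d_tmp = equalized V channel
        blendGPU(d_V, d_tmp, d_V, alpha, n);  // V = (1-alpha)*V + alpha*eq(V)
    } else if (meanV > 0.65f) {               // image judged too light (assumed)
        equalizeGPU(d_S, d_tmp, n);           // d_tmp = equalized S channel
        blendGPU(d_S, d_tmp, d_S, alpha, n);  // S = (1-alpha)*S + alpha*eq(S)
    }                                         // otherwise: well contrasted
}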
Figure 3(a) shows a sunset in which many of the characteristics of the image have been lost, while in the enhanced images all the details of the landscape, such as the trees, can be appreciated (see Figures 3(b) and 3(c)). Figure 3(d) shows a man driving: the details of the face, such as the ear, have been lost, although the scene outside the car remains clearly visible. In the enhanced images (Figures 3(e) and 3(f)) all the details of the face can be appreciated, and the details of the street are still clearly visible. Comparing the Biswas and Pattanaik algorithm with the one presented in this article, the former lightens the picture more but distorts the color in some places (see the sky tone in Figure 3(b) and the face tone in Figure 3(e)), whereas the method presented here even enhances the colors.
Other Image Enhancements: The system can also perform other image processing tasks such as digital zooming and spatial filtering with several types of masks (Gaussian, Difference of Gaussians, Laplacian of Gaussian, unsharp). The unsharp mask is of special interest in the low-vision context, since it provides both edge and contrast enhancement. The system can also perform histogram calculation and equalization; these transformations are useful for contrast enhancement and for automatic thresholding.
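As an example of how such a mask can be prepared as a single filtering stage, the sketch below builds an unsharp mask U = (1 + a)*delta - a*G from a Gaussian G; gaussianMask is an assumed helper returning a normalized (2r+1) x (2r+1) mask in row-major order.

#include <vector>

std::vector<float> gaussianMask(int radius, float sigma);  // assumed helper

// U = (1 + a)*delta - a*G, so that unsharp sharpening runs as one convolution.
std::vector<float> unsharpMask(int radius, float sigma, float amount)
{
    int side = 2 * radius + 1;
    std::vector<float> m = gaussianMask(radius, sigma);
    for (float &v : m) v *= -amount;               // -a * G
    m[radius * side + radius] += 1.0f + amount;    // + (1 + a) * delta (center)
    return m;
}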
Figure 3: Contrast enhancement examples. Columns, from left to right: original image, Biswas and Pattanaik enhancement, our contrast enhancement.
2.3 Non-uniform Processing and Simplification
Many LV pathologies affect different regions of the visual field unevenly. Therefore, in some cases it can be useful to perform different kinds of visual enhancement depending on the specific region of the visual field. For example, a person affected by macular degeneration suffers a partial loss of foveal vision, which varies with the evolution of the disease, whereas their peripheral vision is undamaged (see Figure 4(a)).
Figure 4: Example of non-uniform processing.
Therefore, in certain situations, they may need the central region of their visual field to be enhanced while no processing is applied to the peripheral region. Figure 4(b) shows an example of non-uniform processing in which the edges are enhanced only in the region where the LV patient presents vision loss.
Each LV patient has different regions of interest (ROIs) according to the specific characteristics and evolution of their disease. Consequently, our system includes a graphical tool that enables defining different types of ROIs adapted to the visual field of each patient.
All the image enhancements available in the system and explained in Section 2.2 can be combined. To define a complete processing chain, the user enters a text string that specifies how the different transformations are combined and the ROI to which each transformation must be applied (an illustrative example follows Table 1).
To combine the transformations, the system provides the three operators explained in Table 1.
Table 1: Available operators.

Operator   Function
+          Sums the outputs of the transformations involved.
-          Subtracts the outputs of the transformations involved.
,          Concatenates transformations.
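For illustration only (the exact syntax accepted by the interface is not reproduced here), a hypothetical chain such as "gaussian(ROI1), edges(ROI1) + original" would smooth the central region, extract its edges, and add them back onto the original frame.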
Once the processing chain is defined, the system performs an algebraic simplification, if necessary, to minimize the number of filtering stages. The simplification is based on the properties of convolution and reduces N consecutive or parallel filtering stages to a single stage whose mask is, respectively, the convolution or the sum/subtraction of the N masks. To apply all possible simplifications, the system reorders the transformations when possible, taking into account that contrast transformations do not commute with filters.
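A minimal sketch of the mask-level simplification for the consecutive case, assuming row-major float masks: two sequential filter stages collapse into one mask via full 2D convolution (parallel stages would simply be added or subtracted element-wise after zero-padding to a common size).

#include <vector>

// Full 2D convolution of an (ha x wa) mask with an (hb x wb) mask yields an
// (ha+hb-1) x (wa+wb-1) mask; filtering once with the result is equivalent
// to filtering with the two masks in sequence.
std::vector<float> convolveMasks(const std::vector<float> &a, int ha, int wa,
                                 const std::vector<float> &b, int hb, int wb)
{
    int H = ha + hb - 1, W = wa + wb - 1;
    std::vector<float> out(H * W, 0.0f);
    for (int i = 0; i < ha; ++i)
        for (int j = 0; j < wa; ++j)
            for (int k = 0; k < hb; ++k)
                for (int l = 0; l < wb; ++l)
                    out[(i + k) * W + (j + l)] += a[i * wa + j] * b[k * wb + l];
    return out;
}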
2.4 Real-time Processing using the GPU
To perform all the image enhancements mentioned above in real time, we have developed an extensive library of processing modules for the GPU in CUDA (NVIDIA Corporation, 2009).
Our target GPU, the NVIDIA ION2, consists of two streaming multiprocessors. Each streaming multiprocessor has one instruction unit, eight stream
processors (SPs) and one local (shared) memory of 16 KB; the GPU thus has 16 SPs in total. The eight SPs in a streaming multiprocessor are connected to one instruction unit, which means that they execute the same instruction stream on different data (threads). To extract the maximum performance from the SPs by hiding memory access latency, we need to provide four threads per SP, which are interleaved on the SP; therefore, we have to provide at least 32 threads per streaming multiprocessor.
To optimize the use of the available multiprocessors, the parameters to be determined are the number of threads per block and the shared memory space allocated to the threads of each block.
To size the modules accurately, we have used the CUDA Occupancy Calculator tool, which shows the occupation of each multiprocessor's resources and its percentage of utilization. The thread block size is chosen in all cases so that multiprocessor occupancy is 100%. The size of the grid (the number of processing blocks to be executed by the kernel) is set dynamically according to the size of the image, as sketched below.
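A minimal sketch of this sizing step, assuming a 16 x 16 thread block (256 threads, which can reach full occupancy on compute capability 1.x for kernels with modest register use); the kernel and buffer names are placeholders.

// width, height, d_in, d_out defined elsewhere; filterKernel is a placeholder.
dim3 block(16, 16);                        // fixed for 100% occupancy (assumed)
dim3 grid((width  + block.x - 1) / block.x,
          (height + block.y - 1) / block.y);  // grid follows the image size
filterKernel<<<grid, block>>>(d_in, d_out, width, height);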
The streaming multiprocessors are connected to a large global memory (512 MB in the ION2), which is the interface between the CPU and the GPU. This DRAM memory is slower than the shared memory; therefore, at the beginning of each module, all the threads of a block load into shared memory the fragment of the image that the block needs. Depending on how the data are encoded in GPU global memory, each thread loads one element when working with 4-byte data, or four elements when working with 1-byte data. Global memory accesses, for both reading and writing, are performed so that in one clock cycle all the threads of a warp (K threads) access 4K bytes of RAM, where K = 32 for GPUs of CUDA compute capability 1.x.
Before moving on to the processing stage, all the threads of the block must wait at a barrier to ensure that every thread has loaded its corresponding data. The calculation step may be followed by a second synchronization of the block's threads before writing to GPU global memory. The general structure of the GPU processing modules is illustrated in Figure 5 and sketched in the code below.
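The sketch assumes a single-channel float image and a 16 x 16 block; the per-pixel computation is left as a placeholder.

// General structure of a GPU module (cf. Figure 5): load a tile into shared
// memory, synchronize, compute, optionally synchronize again, write back.
#define TILE 16

__global__ void moduleSkeleton(const float *in, float *out, int w, int h)
{
    __shared__ float tile[TILE][TILE];        // per-block shared memory
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    bool inside = (x < w) && (y < h);

    if (inside)                               // stage 1: cooperative load
        tile[threadIdx.y][threadIdx.x] = in[y * w + x];
    __syncthreads();                          // barrier: tile fully loaded

    float v = inside ? tile[threadIdx.y][threadIdx.x] : 0.0f;
    // ... per-pixel computation on the cached fragment goes here ...

    __syncthreads();                          // optional second barrier
    if (inside)                               // final stage: write to global memory
        out[y * w + x] = v;
}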
The interface between the host and GPU global memory is the bottleneck of the application, so each image datum is encoded as a 1-byte unsigned integer; a color pixel therefore takes 3 bytes. If more precision is needed (when working with the HSV color space, for example), a conversion to float is performed once the image is stored in GPU global memory, exploiting the parallelism provided by the GPU.
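A sketch of that unpacking step, assuming pixels are normalized to [0, 1] for the subsequent HSV arithmetic:

// Expand 1-byte pixel data to float once it is already in GPU global memory,
// so only one byte per channel crosses the host-GPU link.
__global__ void u8ToFloat(const unsigned char *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * (1.0f / 255.0f);   // normalize to [0, 1] (assumed)
}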
Figure 5: General structure of the GPU modules.
2.4.1 GPU Modules Performance
In this section we present the performance of the GPU modules in terms of frames per second (fps). For this evaluation we used three different platforms, to verify the scalability with the number of multiprocessors available in each GPU:
1. NVIDIA ION2, 512 MB DDR3, 2 streaming multiprocessors.
2. NVIDIA GeForce 8800GT, 512 MB DDR3, 14 streaming multiprocessors.
3. NVIDIA GeForce 9200M GS, 256 MB DDR3, 1 streaming multiprocessor.
The measurements do not include the image transfer delays from host to GPU global memory and vice versa, which are approximately 2.6 ms for the NVIDIA GeForce 8800GT, 4.1 ms for the NVIDIA GeForce 9200M GS and 12 ms for the NVIDIA ION2 when working with 800x600 RGB images and applying the transformation to the whole image.
As Table 2 shows, in the case of the NVIDIA ION2 all the developed modules work in real time, at more than 25 frames per second. When several transformations are combined, the total processing delay is the sum of the processing delays of the individual transformations plus the transfer delay. For example, chaining histogram equalization (126.9 fps, about 7.9 ms) with edge detection (50.4 fps, about 19.8 ms) on the ION2 and adding the 12 ms transfer gives roughly 39.7 ms per frame, still at the 25 fps video rate.
3 DISCUSSION
AND FUTURE WORK
We have presented a portable system that makes it easy and effective to combine and test a wide range of visual enhancements useful for low-vision patients who can benefit from on-line, real-time image processing.
One of the main advantages of the system is that it can be fully customized to the particular user's requirements, such as their visual field or the evolution of
the pathology, covering a wide range of visual disabilities. To adapt the system to the visual field of each LV patient, the platform can apply specific processing to each region of the visual field. Furthermore, it works properly in uncontrolled or even low-illumination environments, since it can run real-time contrast enhancement algorithms, and it allows the incorporation of other visual enhancements (which might be proposed and tested by other authors).
Table 2: Performance of the GPU modules (frames per second).

Module                               9200M GS   8800GT   ION2
Filtering (mask size 7x7)            25.71      200      54.26
Histogram equalization               71.68      625      126.9
Edge detection                       28.22      370.37   50.4
LUT substitution                     221.73     333.33   389.11
RGB to HSV                           54.14      312.5    98.14
RGB to YCbCr                         91.83      476.19   161.55
Digital zooming                      20.96      270.27   37.01
Tone mapping (Biswas)                19.67      229.89   34.65
Contrast enhancement based on HSV    34.6       338.98   60.06
The system can be used in mobile environments such as walking, since it performs real-time processing on a lightweight netbook. To achieve real-time processing we have developed an image processing library for CUDA-compatible GPUs. The performance of each GPU module, in terms of frames per second, has been measured on three different GPUs: our target GPU, the NVIDIA ION2, and two others, to show the scalability of the developed modules with the number of multiprocessors available in each GPU.
To demonstrate the usefulness of this visual aid system, we are going to conduct a series of tests with a group of patients affected by Retinitis Pigmentosa. Specifically for this group, the platform will be used to enhance image features in low-contrast environments, where these patients experience severe difficulties.
Furthermore, we plan to implement some of the most useful visual enhancements on other embedded devices based on ARM processors or FPGAs, in order to obtain real-time processing on devices smaller and lighter than the netbook employed so far.
ACKNOWLEDGEMENTS
This work has been supported by the Junta de
Andalucía Project P06-TIC-02007, and the Spanish
National Grants RECVIS (TIN2008-06893-C03-02)
and DINAM-VISION (DPI2007-61683).
REFERENCES

Al-Atabany, W., Memon, M., Downes, S. M., Degenaar, P. A., 2010. Designing and testing scene enhancement algorithms for patients with retina degenerative disorders. BioMedical Engineering OnLine.
Biswas, K., Pattanaik, S., 2005. A simple spatial tone mapping operator for high dynamic range images. In Proceedings of IS&T/SID's 13th Color Imaging Conference, CIC 2005.
Foster, A., Resnikoff, S., 2005. The impact of Vision 2020 on global blindness. Eye, 19:1133-1135.
Massof, R. W., Rickman, D. L., 1992. Obstacles encountered in the development of the low vision enhancement system. Optom Vis Sci, 69:32-41.
Morillas, C., Romero, S., Martínez, A., Pelayo, F. J., Ros, E., Fernández, E., 2007. A design framework to model retinas. BioSystems, 87:156-163.
NVIDIA Corporation, 2007. Accelerating MATLAB with CUDA using MEX files.
NVIDIA Corporation, 2009. NVIDIA CUDA C Programming Best Practices Guide 2.3.
Peláez-Coca, M. D., Vargas-Martín, F., Mota, S., Díaz, J., Ros-Vidal, E., 2009. A versatile optoelectronic aid for low vision patients. Ophthalmic and Physiological Optics, 29:565-572.
Peli, E., 2001. Vision multiplexing: an engineering approach to vision rehabilitation device development. Optom Vis Sci, 78:304-315.
Peli, E., Luo, G., Bowers, A., Rensing, N., 2007. Applications of augmented-vision head-mounted systems in vision rehabilitation. Journal of the SID, 15/12.
Ureña, R., Morillas, C., Gómez-López, J. M., Pelayo, F., Cobos, J. P., 2010. Plataforma Hw/Sw de aceleración del procesamiento de imágenes. In Actas del I Simposio de Computación Empotrada (SiCE).