Smartphone-based Approach for Automatic Focus Assessment in NIR

Fundus Images Targeted at Handheld Devices

Tudor-Ionut Nedelcu

, Francisco Veiga

, Miguel Santos

, Marcos Liberal

and Filipe Soares

Fraunhofer Portugal AICOS, Rua Alfredo Allen 455/461, 4200-135 Porto, Portugal

Keywords:

Retina, Smartphone, Image Acquisition, Autofocus, Feature Extraction, Machine Learning.

Abstract:

Mobile fundus imaging devices can play an important role in the decentralization of eye diseases screening

methods, increasing the accessibility of telemedicine solutions in this area. Since image focusing is crucial to

obtain an optimal retinal image, this work presents a smartphone-based approach for automatic focus assess-

ment of NIR retinal images, acquired by a prototype of a handheld fundus camera device called EyeFundusS-

cope (EFS) A009. A DCT-based focus metric is proposed and compared against a group of Gradient-based,

Statistical-based, and Laplacian-based functions in the same experimental setup. The paper also presents the

EFS image acquisition logic and the protocol for creating the necessary NIR dataset with the optic disc region

around the centre of the image. The results were obtained within 853 images acquired from 8 volunteers. The

developed method combined with other features, and a SVM classiﬁer in a Machine Learning approach which

attained an AUC of 0.80, has shown to be a viable solution to integrate into the EFS mobile application.

1 INTRODUCTION

Globally, it is estimated that approximately 1.3 bil-

lion people live with some form of vision impairment.

As the prevalence of diseases that affect the eyes

is expected to increase, more and more people will

have potentially blinding conditions (WHO, 2019).

Glaucoma and Diabetic Retinopathy are emerging

causes of visual problems and represent around 14%

of the total cases of blindness in the world. For both

diseases, early diagnosis and treatment are essential

for slowing their progression and prevent irreversible

damage, which makes fundus imaging extremely im-

portant (Rani et al., 2018). Despite the current aware-

ness on the importance of regular examinations, many

patients do not have the possibility to have a frequent

follow-up due to the high-cost of the equipment and

shortage of trained personnel, making it difﬁcult to

distribute them in isolated and less developed areas.

Retinal cameras can be mydriatic or non-

mydriatic. Mydriatic cameras require pharmacolog-

ical eye dilation which can bring burdensome side

effects to allow wide-ﬁeld photographs of the ocu-

https://orcid.org/0000-0002-2047-2254

https://orcid.org/0000-0001-6118-2600

https://orcid.org/0000-0001-8994-5104

https://orcid.org/0000-0002-8727-2636

https://orcid.org/0000-0002-2881-313X

lar fundus (P

erez et al., 2012). On the other hand,

non-mydriatic cameras use near-infrared (NIR) illu-

mination systems to exploit patient’s natural dilation

in ambients with low light levels. Usually, NIR illu-

mination is used to preview the retina on the device

screen, and, once the monitor’s image is focused and

aligned, a ﬂash of visible light is enabled for the im-

age capture moment.

Recent technological developments have enabled

use of smartphones in designing small-sized and af-

fordable biomedical imaging devices, since they can

perform imaging, processing, and wireless commu-

nication tasks. Thus, the creation of mobile retinal

screening devices can play a fundamental role in over-

coming these barriers allowing the feasibility of oph-

thalmological telemedicine solutions (Karakaya and

Hacisoftaoglu, 2020). In this way, the EyeFundusS-

cope (EFS) A009 is a non-mydriatic handheld fundus

camera prototype that consists of a mobile device that

can illuminate the human eye fundus and captures im-

ages of it through a smartphone camera (Melo et al.,

2019). The illumination of the device consists of a

NIR system that is used during the alignment of the

prototype with the eye, being useful for the examiner

guidance, and a white visible system that is used in a

short period of time, during the ﬁnal moment of image

acquisition (Melo et al., 2018). The NIR illumination

is useful to ensure the mentioned non-mydriatic ac-

Nedelcu, T., Veiga, F., Santos, M., Liberal, M. and Soares, F.

Smartphone-based Approach for Automatic Focus Assessment in NIR Fundus Images Targeted at Handheld Devices.

DOI: 10.5220/0010325100730081

In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 2: BIOIMAGING, pages 73-81

ISBN: 978-989-758-490-9

quisition, since this wavelength light is not perceived

by the human eye, and therefore, does not lead to a

decrease of the pupil size (Toslak et al., 2018).

Regarding image focusing process, retinal cam-

eras can be equipped with a mechanical focusing

system that consists in displacing a compensation

lens that, when combined with the optics of the eye,

matches the image plane to the retina. This focus con-

trol mechanism is conceived to compensate for possi-

ble refractive errors in subject’s eyes (which can be

different for each eye). The EFS presents in its opti-

cal system an aspheric objective lens that minimizes

dioptre errors and the level of optical aberrations pro-

duced in the retinal image (Melo et al., 2018). How-

ever, this manual focusing process is error prone, es-

pecially when performed by inexperienced examin-

ers operating a handheld device, and may lead to sub-

optimal images which is not desirable for eye screen-

ing purposes.

This work presents an automatic focus assessment

approach for non-mydriatic NIR fundus images that

can be executed by a smartphone on a handheld de-

vice. Our method allows the system to ﬁnd the best

focus value based on the NIR preview images during

the manual alignment step, optimising the retinal im-

age quality at the moment of image capture.

We will also present an extensive comparative

analysis of focus features and measures, proposing a

new DCT-based function against a group of Gradient-

based, Statistical-based and Laplacian-based func-

tions in the same experimental setup. The ﬁrst ap-

proach consisted of a discriminative analysis of all

metrics performance and later, the authors decided to

observe the impact of the implementation of Machine

Learning models in the analysis, trying to take into

account the balance between performance and com-

putational processing time.

This paper is structured as follows: Section 1

presents the motivation and objectives of this work;

Section 2 summarizes the related work and applica-

tions found on the literature; Section 3 provides an

overview of the system architecture including the data

collection, followed by the methodologies studied for

the retinal focus assessment; in Section 4, the results

and discussion are presented; and ﬁnally, the conclu-

sions and future work are drawn in Section 5.

2 RELATED WORK

Unlike tabletop retinal cameras, EFS is a handheld

device and despite the help of its image acquisition

logic, it requires a low level of training. Therefore,

our research group developed strategies to maximize

the quality of the acquisition, minimizing reﬂections

associated with poor alignment of the device and non-

focused images due to errors in manual focus process.

Through the implementation of a ﬂexible eyecup and

an internal luminous ﬁxation system, the examiner

can ask the patient to ﬁx his gaze on a certain point,

stabilizing the acquisition device and making all the

images consistent in terms of imaged area (Soares

et al., 2020). In addition, the authors sought to imple-

ment other strategies that could enable a more robust

imaging system, which is fundamental for medical

screening purposes. Since the native digital camera

applications of state-of-the-art smartphones are not

able to use their automatic focus tool in NIR images,

the team felt the need to create a suitable focus assess-

ment pipeline that will be integrated into a previously

developed EFS mobile application.

To the best of our knowledge, the number of pub-

lished works on autofocusing in retinal imaging is

scarce. In the work of (Moscaritolo et al., 2009), it

is proposed an algorithm to assess optic nerve sharp-

ness with generation of a quantitative index. How-

ever, the authors use images captured with conven-

tional tabletop mydriatic devices in the visible spec-

trum, and does not present an extensive study compar-

ing focus metrics with the same experimental setup.

Regarding automatic focus algorithms for non-

mydriatic retinal imaging with NIR illumination,

(Marrugo et al., 2012) proposes a passive auto-focus

measure based on the directional variance of the nor-

malized discrete cosine transform (DCT). A focusing

window is selected such that there are retinal struc-

tures within for computation of the normalized DCT.

Consequently, a weighted directional sampling on the

normalized DCT is calculated and ﬁnally the focus

measure is the variance from all considered direc-

tions. Although a comparative analysis of the results

with other metrics is presented, the data is from table-

top retinal cameras (Marrugo et al., 2014), unlike the

present work that refers to a mobile and handheld

imaging system. The proposed approach performs the

focus score by using the ratio between high and low

frequencies of the DCT image. By using pre-deﬁned

masks, there are avoided regions from the DCT im-

age related to the noise component and others related

with the basic frequencies. In addition, the work aims

to study the impact of feature-based machine learning

with a set of classiﬁers evaluated in the EFS use case.

BIOIMAGING 2021 - 8th International Conference on Bioimaging

Light Control

Turn on

Fixation Point

Enable continuous

NIR LED

White LED ﬂash

EFS Device

Handheld

Alignment

Is best focus

locked?

Focus value

variation

Image Storage

Yes

Camera API

Get frame preview

Image capture

Start EFS App

Start

End

Figure 1: EFS Image Acquisition Logic with focus assessment.

3 METHODOLOGY AND

EXPERIMENTAL SETUP

The proposed software and hardware systems allows

the automatic focus assessment of NIR fundus images

targeted at handheld devices. The architecture of the

developed approach, illustrated in Figure 1, is divided

into three main modules: Light Control, EFS Device

and Camera API. When starting the mobile applica-

tion, the examiner has the possibility to choose a lu-

minous ﬁxation point of his choice to ﬁx the patient’s

eye gaze, which for this use case was the point that

maximized the centrality of the optic disc. After that,

the continuous light from the NIR LED is turned on,

which allows the handheld alignment of the device in

low levels of ambient light, getting several preview

frames of the retina. This induces the system to go

through different focus values under NIR illumina-

tion, a process that ends when the focus assessment

algorithm locks the best one. Finally, a white LED

is turned on, enabling the capture and storage of the

retinal image.

3.1 Data Collection

In order to study multiple approaches to evaluate the

best focus of NIR fundus images captured by the EFS,

a dataset of images with discrete steps of focus vari-

ation was collected using an Android application. A

session is considered by a set of images acquired from

the same eye, with a variation of focus from an initial

focus distance to a ﬁnal one. From the entire dataset a

number of 115 sessions were acquired. The dataset

is composed of a total of 853 NIR fundus images

from both eyes of 8 different volunteers (32 years-

old in average, with brown or blue eyes, 50% female,

5 of them wearing glasses that were removed for the

study). The written informed consents were obtained.

The EFS prototype was the only imaging device, in

order to assure overall consistency of the proposed so-

lution. Moreover, the optical focus distance was man-

ually determined by a panel of image quality experts

(see Figure 2) for each acquisition session (one posi-

tive label, among all the focus variation steps).

3.2 Focus Metrics

A set of metrics regarding image focus assessment

were evaluated considering feature extraction meth-

ods, followed by a dimensionality reduction approach

to compute the ﬁnal score (Table 1). Several measures

are used to reduce the dimension (mean, maximum

value, minimum values, sum of the values, skewness,

kurtosis, standard deviation) for a better usage of the

extracted features. By using feature functions which

were designed based on different principles, we gain

more insight regarding the possible capabilities of an

autofocusing algorithm. A wide range of features and

combined measures were selected in this study to pro-

Smartphone-based Approach for Automatic Focus Assessment in NIR Fundus Images Targeted at Handheld Devices

(a) Unfocused NIR. (b) Focused NIR. (c) Captured White.

Figure 2: Examples of images in the dataset.

vide a better overview of the focusing performances,

in the present use case of low contrast NIR images.

For each score computed using the feature metrics

in Table 1 and the one proposed in Section 3.3, the

focus distance is predicted by selecting the minimum

or maximum values within a session. Since the iden-

tiﬁcation of the exact focus distance is a challenging

task, in this work it is also evaluated the mean squared

error (MSE) based on the computed focus distance.

Considering f the ground-truth focus distance and f

the computed value, the MSE score s for the dataset

is computed as:

s =

∑

n=0

( f

− f

)

, (1)

where N = 115 represents the number of sessions.

3.3 DCT-HLM Presentation and Mean

Square Error Strategy

The proposed metric is based on a DCT ratio of high

and low frequencies, and the authors call it DCT-

HLM (High and Low Masks). Instead of applying

the algorithm over the entire image, to provide a fast

and reliable metric due to lower optical quality at the

edges of the illuminated ﬁeld-of-view, the DCT-HLM

is applied in a small region of interest (ROI) (Figure

3a). From the initial preview NIR image with the di-

mensions of 640 by 480 pixels, a ROI is centred in the

middle of the image and cropped for 200 by 200 pix-

els. The optic disc is usually centred due to the afore-

mentioned ﬁxation points feature and only the blue

channel is used. Over this image I (with the width

and length M = N = 200) the DCT transform is ap-

plied:

= α

M−1

∑

m=0

N−1

∑

n=0

cos

π(2m + 1)p

cos

π(2n + 1)q

(2)

Table 1: Summary of focus functions and measures.

Group Feature function Measure

Brenner Function

Gaussian Derivative

Gradient- Squared Gradient mean, std, min

based Thresholded Absolute Grad max, sum, L2norm

Gradient Energy skew, kurt

Tenenbaum Grad

Tenenbaum Grad Variance

Variance

Normalized Variance

Statistics Histogram Entropy entropy b

-based

Vollath’s F4 mean, std, min

Vollath’s F5 max, sum, L1norm

L2norm, skew, kurt

Modiﬁed Laplacian

Laplacian- Energy of Laplacian

based Diagonal Laplacian

Variance of Laplacian

Laplacian Filter

DCT Energy Ratio mean, std, min

DCT-based DCT Reduced Energy Ratio max, sum, L2norm

Modiﬁed DCT skew, kurt

Image Curvature

Miscellaneous Spatial Frequency

Image Contrast

Helmli & Scheres Mean

where

(

√

, p = 0

, 1 ≤ p ≤ M −1

(3)

and

(

√

, q = 0

, 1 ≤q ≤ N −1

(4)

Over the image in DCT domain D, the absolute values

are computed:

= |D| (5)

To compute the ratio of high and low frequencies,

specialised masks are used (M

and M

). In Figure 4

BIOIMAGING 2021 - 8th International Conference on Bioimaging

(a) Image ROI. (b) DCT representation.

Figure 3: Image ROI (a) and DCT representation (b).

(a) Low frequency

mask.

(b) High frequency

mask.

Figure 4: Low (a) and high (b) DCT masks used to compute

the metric score.

the masks for high and low coefﬁcients of image in

the DCT domain are presented.

The metric score s is computed as the ratio of DCT

low frequency component over the high frequency

one:

s =

∑

M−1

m=0

∑

N−1

n=0

M

∑

M−1

m=0

∑

N−1

n=0

M

(6)

where M and N represents the image dimensions of

the cropped image (M = N = 200), and () is the

element-wise product.

Since it is desirable to obtain a sharp image re-

gardless of the existing noise, the ratio of the high and

low frequencies is used, eliminating the top and bot-

tom frequencies (see masks in Figure 4). By remov-

ing the last lines and columns from the high frequency

mask, the noise related components are omitted while

keeping most of the information related to that spe-

ciﬁc frequency. A similar procedure is adopted to re-

move the base frequencies for the mask of low fre-

quencies.

The DCT-HLM focus metric in Equation 6 was

tested multiple times with sets of 500 previews, re-

sulting in an average computational time of 0.0034

seconds running in a Smartphone Galaxy S8 (G950F)

with a Octa-core CPU (4x2.3 GHz Mongoose M2 &

4x1.7 GHz Cortex-A53).

3.4 Study of Discriminatory Metrics

with Optimal Threshold

Most of the focus functions found in the literature

(Pertuz et al., 2013), the higher the value of the focus

metric the more focused the image is. To understand

the discriminatory ability of a single metric, we could

specify a threshold or cut-off probability at which an

image is classiﬁed as focused or unfocused. The se-

lection of a threshold can have a dramatic effect on a

focus metric Accuracy.

The selection of an optimal threshold for each fo-

cus metric was performed by using the optimal thresh-

old module of the Image Focus Assessment (IFA)

component developed at Fraunhofer Portugal. The

IFA component is composed by two modules: the fea-

ture extraction and the optimal threshold module. The

feature extraction module, given a set of images, ex-

tracts the focus metrics values of each of the focus

functions (see Table 1), outputting a dataset where

the features are these focus metrics values. The op-

timal threshold module was based on an open source

R package (Freeman and Moisen, 2008) and, given

the dataset outputed by the feature extraction module,

it determines the discriminatory ability of the focus

metrics by ﬁnding the optimal threshold for each of

the metrics.

Having the collected data as described in Section

3.1, we used the IFA component to ﬁnd the most dis-

criminatory metrics, following the steps enumerated

below:

1. Dataset Creation: The feature extraction module

was used to create the dataset of the focus metrics

of the focused and unfocused images. One of the

options chosen when producing the dataset was to

output the normalized values of the focus metrics,

using the Scaling normalization. This is impor-

tant because for the next steps the values of the

features must be normalized between 0 and 1.

2. Dataset Split: The dataset was split into train and

test samples by a ratio of 70/30 (see Table 2). This

step is important to validate the optimal threshold

found for each metric. Also, the same train and

test data was used in Section 3.5 to have a more

accurate comparison between methods.

3. Find the Optimal Threshold: There are multi-

ple criteria by which we can calculate the opti-

mal threshold (e.g. to maximize the Area Under

the ROC curve (ROC-AUC), to deﬁne minimum

Sensitivity or Speciﬁcity, to ﬁnd the best sum of

Sensitivity and Speciﬁcity). After testing differ-

ent criteria, the one which yield better results in

test data was to set the minimum Sensitivity as

0.7, which allows to ﬁnd the optimal threshold

that meets that requirement (e.g ﬁnd the highest

speciﬁcity while meeting the required sensitivity).

4. Produce Tables and Plots: To summarize the

process of ﬁnding the most discriminatory met-

rics, the optimal threshold module outputs a table

Smartphone-based Approach for Automatic Focus Assessment in NIR Fundus Images Targeted at Handheld Devices

with the performance metrics for each of the focus

metrics (see Table 3) and plots: bar plot of the ob-

served values as predicted probability, ROC plot,

Error Rate vs Threshold (e.g. bar plot Figure 5).

Table 2: Distribution of train/test datasets with a 70/30 split.

Dataset

Focused

Images

Non Focused

Images

Total

Train 80 517 597

Test 35 221 256

Total 115 738 853

As mentioned previously, for most of the focus

function in the literature the higher the value of the

metric, the more focused the image is. In these cases

after normalizing the focus metrics values, a perfectly

focused image would have a value of 1 and a perfectly

unfocused image would have a value of 0. However,

this is not always the case since there are some met-

rics that the distribution of the data does not follow

this pattern. In these situations we create a symmetric

representation of the data.

A focus metric with a good discriminatory power

will have all or most of its focused values close to

the extremes. Otherwise, it means that this metric by

itself does not discriminate well. A good indicator

that it does, is when all of the optimised thresholds

fall into the center of the double humped histogram

(see Figure 5).

Figure 5: Example of a focus metric where the optimized

threshold falls near the center (see mark) of the double

humped histogram.

3.5 Evaluation with Machine Learning

In order to train a model that correctly predicts if an

image is focused or unfocused, one must ﬁnd and op-

timise an appropriate machine learning pipeline. For

instance, feature engineering and model parameter

tuning are some of the most time consuming tasks

when developing machine learning and usually re-

quire some expertise.

In this work, the selection and optimisation of

the machine learning pipeline was performed using

a Feature-based Machine Learning (FbML) frame-

work introduced by (Gonc¸alves et al., 2019). This

framework enables fast exploration of machine learn-

ing models and has an optimisation tool which in-

cludes features such as search space initialization via

meta-learning (search for similar datasets and ini-

tialize hyper-parameter optimisation algorithm with

the found conﬁguration), data pre-processing (balanc-

ing, imputation of missing values, re-scaling), feature

transformation, and feature and classiﬁer selection.

To ﬁnd the best machine learning pipeline several

options were explored in the FbML framework:

1. Feature Transformation/Selection: Principal

component analysis (PCA); Univariate Feature

Selection; Classiﬁcation Based Selection (L1-

regularized Linear SVM); None.

2. Classiﬁers: K-Nearest Neighbors; Decision

Trees, Random Forest, AdaBoost, Linear and

Non-linear Support Vector Machines.

3. Validation Strategy: 5-Fold Cross Validation.

4. Optimisation Metric: ROC-AUC.

The data used to train and test the model was the

same used in Section 3.4 (see Table 2) plus the pro-

posed DCT-HLM. To evaluate the machine learning

models, k-fold cross-validation was performed. Also,

the resulting model was furthered tested on the test

dataset to ensure the model Accuracy.

4 RESULTS AND DISCUSSION

One may think that the number of subjects in the

present study was small. However, the image ac-

quisition protocol applied with the EFS prototype in-

troduces a natural variability in the acquisitions, be-

cause the examiner (photographer) can achieve a good

alignment with the pupil with slightly different angles

of light incidence. It can also approximately achieve

the centrality of the optic disc while activating the in-

ternal ﬁxation point (that the patient eye should aim

for). This was manually veriﬁed for all the images

from all the sessions, as well as the natural variability

of focus.

The MSE presents useful results regarding the

overall performance of each individual focus metric.

The focus functions provided satisfactory predictions

of the focus distance as it can be seen in the initial

results in Figure 6 (the smaller the value, the bet-

ter). The last metric of the plot (number 177) rep-

resents the score of the DCT-HLM. Although the re-

sults are not outstanding, since the DCT-HLM is a

BIOIMAGING 2021 - 8th International Conference on Bioimaging

lightweight metric (low complexity) it is able to run

in real time (CPU runtime in a Hexa-core Intel



Core

i7-10710U on a laptop: 0.00167 seconds) be-

ing a suitable approach for this problem (Figure 7). In

spite of the Laplacian ﬁlter is providing the best MSE

score, we can notice that it requires 0.013 seconds to

process an image being also the slowest of them all.

Figure 6: MSE score of the focus distance prediction of

each individual metric.

Figure 7: MSE score with respect to the computational

time. In red is represented the result for DCT-HLM.

For the task of predicting the exact focus distance,

the Precision usually provides an interesting overview

since it can show the correctly identiﬁed focus dis-

tance for each session (considering 115 as the total

number instances). Because the Precision was never

above 40% for each measure individually, the authors

decided to optimise thresholds using the full range of

images and evaluate the impact of adding Machine

Learning to the problem.

As explained before, the MSE offers interesting

information about the performance of the focus func-

tions, regarding the variation between the ground-

truth focus distance and the detected one. In the initial

stage the Laplacian ﬁlter with the absolute values of

the sum (LapFiltSUMVALABS) was already provid-

ing the best value of MSE and DCT-HLM was per-

forming reasonably.

For these two speciﬁc features, the optimised

thresholds are shown in Figure 8 and Figure 9. As it

can be seen there is a better separation for Laplacian-

based than the DCT-based one. From all the individ-

ual features, a subset of the ones achieving better per-

formance after the threshold optimisation procedure

are shown in Table 3.

Table 3: Top performing features after threshold

optimisation: (1) LapFilt SUMVAL ABS; (2) Lap-

Filt L2NORM ABS; (3) ModDCT SUMVAL ABS; (4)

TenGrad MEAN ABS; (5) BrennFunc SUMVAL ABS;

(6) DCT HLM centred.

Metric Threshold Acc. Sens. Spec.

1 0,57 0,69 0,69 0,69

2 0,49 0,68 0,66 0,69

Test 3 0,47 0,69 0,57 0,71

4 0,47 0,64 0,66 0,64

5 0,41 0,61 0.60 0,62

6 0,43 0,44 0,86 0,37

Figure 8: LapFilt SUMVAL ABS test data Histogram plot.

Figure 9: DCT HLM centred test data Histogram plot.

The top performing models among the multiple

classiﬁer architectures are presented in Table 4, us-

ing the ROC-AUC as optimisation metric and feature

selection. We can see that the SVM was not the best

in the training phase but it generalized well for the

test set, achieving better AUC and higher Sensitiv-

ity than the others. The best performing model was

obtained using the optimization pipeline described in

Section 3.5 with a linear kernel and C=20, using 43

features selected. The computational time need to run

the SVM model was 2.97 ×10

−6

seconds, evaluated

Smartphone-based Approach for Automatic Focus Assessment in NIR Fundus Images Targeted at Handheld Devices

in the inference job with the selected test data.

Table 4: Classiﬁcation results for the two best performing

models for preview NIR image focus assessment.

Model Data Acc. Sens. Spec. AUC

AdaBoost Train 0.94 1 0.93 0.97

Test 0.79 0.71 0.80 0.76

Random Train 0.95 1 0.95 0.97

Forest Test 0.87 0.60 0.91 0.75

Linear Train 0.79 0.86 0.78 0.82

SVM Test 0.77 0.83 0.76 0.80

In the present use case, if the Sensitivity is too low

it may result that not a single frame is selected as well

focused in a given eye session, making the user repeat

the acquisition and hampering the usability of the so-

lution in daily life. For this reason and taking into ac-

count the unbalanced dataset, the authors were ﬁrstly

optimising the models for the best possible AUC and

then selecting the one with best Sensitivity (among

the 30 top models of each type of classiﬁer). On the

other hand, the False Positives cases may be not so

critical as long as they are near the unique optimal

distance that was selected by the experts. For this

reason, the authors went to investigate the 52 False

Positives cases provided by the best performing SVM

model and concluded that 87% of them were cases ad-

jacent to the selected ground-truth value of each ses-

sion. Since in general these adjacent values in the fo-

cus variation levels show very close sharpness to the

human expert, the authors consider these as very rel-

evant results.

5 CONCLUSION

Mobile screening devices have the potential to play a

key role in the decentralization of ophthalmological

screening actions, making the range of telemedicine

solutions greater and, consequently, contributing to

the early diagnosis of diseases such as Diabetic

Retinopathy and Glaucoma in underserved areas.

Since manual focusing process is error prone, espe-

cially when performed by inexperienced examiners, it

may lead to unfocused images which is not desirable

for medical screening purposes.

In this paper, a new smartphone-based approach

for automatic focus assessment in NIR fundus images

targeted at handheld devices has been proposed. An

acquisition pipeline was developed and implemented

into the mobile application of a non-mydriatic fundus

camera developed in previous works, the EFS. This

approach allows the device to search for the best fo-

cus value when the examiner is previewing the retinal

image, under NIR illumination.

A new focus measure was presented, DCT-HLM,

which is based on the ratio between high and low

frequency values of the image DCT, by using pre-

computed masks. Despite not having the best score,

the proposed measure is suitable for this use case due

to the short time of computational power that requires.

Besides the proposed method, a second study was per-

formed considering the best performing metrics ex-

tracted by using a machine learning approach. Al-

though this approach can be more time consuming, it

can be performed after a set of images is acquired to

perform the focus assessment. By using a machine

learning approach, namely with SVM classiﬁers, the

results are improved considerably as described in the

Section 4.

In future work, the outcomes of the top perform-

ing models found in this work, will be veriﬁed again

after running the respective classiﬁers integrated in

the Android application. The balance between com-

putational performance and autofocus performance

will be calibrated. The application front-end may be

adapted to provide highly visual feedback of the fo-

cused retina, as one of the ways to simplify the us-

ability and adoption of EFS prototype by non-experts

in ophthalmology. These developments will be tested

during a pilot study in a private hospital.

ACKNOWLEDGEMENTS

This work was supported by EyeFundusScopeNEO:

Demonstration of EyeFundusScope with Non-Expert

Ophthalmology users, co-funded by Portugal 2020,

framed under the COMPETE 2020 (Operational Pro-

gram Competitiveness and Internationalization) and

European Regional Development Fund from Euro-

pean Union, with operation code POCI-01-0247-

FEDER-038400.

A special acknowledgement to all participants in

the data collection, and to Ruben Moutinho and Cris-

tiana Braga for helping with raw materials used dur-

ing the data collection.

REFERENCES

Freeman, E. A. and Moisen, G. (2008). PresenceAbsence:

An R package for PresenceAbsence analysis. Journal

of Statistical Software, 23(11):1–31.

Gonc¸alves, J., Conceic¸

ao, T., and Soares, F. (2019). Inter-

observer reliability in computer-aided diagnosis of di-

abetic retinopathy. In Proceedings of the 12th Inter-

national Joint Conference on Biomedical Engineering

Systems and Technologies. SCITEPRESS - Science

and Technology Publications.

BIOIMAGING 2021 - 8th International Conference on Bioimaging

Karakaya, M. and Hacisoftaoglu, R. E. (2020). Comparison

of smartphone-based retinal imaging systems for dia-

betic retinopathy detection using deep learning. BMC

Bioinformatics, 21(S4):259.

Marrugo, A., Mill

an, M., Cristobal, G., Gabarda, S., and

Abril, H. (2012). Anisotropy-based robust focus mea-

sure for non-mydriatic retinal imaging. Journal of

biomedical optics, 17.

Marrugo, A. G., Millan, M. S., and Abril, H. C. (2014).

Implementation of an image based focusing algorithm

for non-mydriatic retinal imaging. In 2014 III Inter-

national Congress of Engineering Mechatronics and

Automation (CIIMA). IEEE.

Melo, D., Costa, J., Soares, F., and Vieira, P. (2018). Op-

tical Design of a Compact Image Acquisition De-

vice for Mobile Diabetic Retinopathy Screening:. In

Proceedings of the 11th International Joint Confer-

ence on Biomedical Engineering Systems and Tech-

nologies, pages 63–70, Funchal, Madeira, Portugal.

SCITEPRESS - Science and Technology Publications.

Melo, D., Soares, F., Felgueiras, S., Gonc¸alves, J., and

Vieira, P. (2019). A new compact optical system pro-

posal and image quality comparison against other af-

fordable non-mydriatic fundus cameras. In Biomedi-

cal Engineering Systems and Technologies, pages 26–

48. Springer International Publishing.

Moscaritolo, M., Knezevich, F., Jampel, H., and Zeimer, R.

(2009). An Image Based Auto-Focusing Algorithm

for Digital Fundus Photography. IEEE Transactions

on Medical Imaging 28(11):1703-7.

Pertuz, S., Puig, D., and Garcia, M. A. (2013). Analysis of

focus measure operators for shape-from-focus. Pat-

tern Recognition, 46(5):1415–1432.

erez, M. A., Bruce, B. B., Newman, N. J., and Biousse, V.

(2012). The Use of Retinal Photography in Nonoph-

thalmic Settings and Its Potential for Neurology. The

Neurologist, 18(6):350–355.

Rani, P., Nangia, V., Murthy, K., Khanna, R., and Das, T.

(2018). Community care for diabetic retinopathy and

glaucoma in India: A panel discussion. Indian Journal

of Ophthalmology, 66(7):916.

Soares, F., Gonc¸alves, J., Felgueiras, S., Peixoto, R., Mene-

ses, R., and Melo, D. (2020). Smartphone-based hand-

held optical device and method for capturing non-

mydriatic retinal images, Patent EP3695775A1.

Toslak, D., Liu, C., Alam, M. N., and Yao, X. (2018).

Near-infrared light-guided miniaturized indirect oph-

thalmoscopy for nonmydriatic wide-ﬁeld fundus pho-

tography. Optics Letters, 43(11):2551–2554.

WHO (2019). World report on vision. World Health Orga-

nization, Geneva. Licence: CC BY-NC-SA 3.0 IGO.

Smartphone-based Approach for Automatic Focus Assessment in NIR Fundus Images Targeted at Handheld Devices