A First-person Database for Detecting Barriers for Pedestrians
Zenonas Theodosiou¹, Harris Partaourides¹, Tolga Atun¹, Simoni Panayi¹ and Andreas Lanitis¹,²
¹Research Centre on Interactive Media Smart Systems and Emerging Technologies, Nicosia, Cyprus
²Department of Multimedia and Graphic Arts, Cyprus University of Technology, Limassol, Cyprus
Keywords:
Pedestrians, Safety, Visual Lifelogging, Egocentric Vision, First-person View, Dataset.
Abstract:
Egocentric vision, which relates to the continuous interpretation of images captured by wearable cameras, is increasingly being utilized in several applications to enhance the quality of citizens' lives, especially for those with visual or motion impairments. The development of sophisticated egocentric computer vision techniques requires the automatic analysis of large databases of first-person point-of-view visual data collected through wearable devices. In this paper, we present our initial findings regarding the use of wearable cameras for enhancing pedestrians' safety while walking on city sidewalks. For this purpose, we create a first-person database annotated with common barriers that may put pedestrians in danger. Furthermore, we derive a framework for collecting visual lifelogging data and define 24 different categories of sidewalk barriers. Our dataset consists of 1796 annotated images covering 1969 instances of barriers. The analysis of the dataset by means of object classification algorithms depicts encouraging results for further study.
1 INTRODUCTION
Walking is the most basic and most popular form of transportation, yet it is evidently becoming more dangerous. According to the World Health Organization (WHO, 2019), 270K pedestrians lose their lives around the world every year. Contemporary cities have to deal with the various problems caused by the increasing number of technical barriers and damages that occur on footpaths and endanger the lives of pedestrians (Sas-Bojarska and Rembeza, 2016). Guaranteeing everyday urban safety has always been a central theme for local authorities, with considerable human, social, and economic aspects. The need for clear paths on urban sidewalks that are free of barriers, continuous, and well maintained is of great importance. Thus, the automatic detection of obstructions and damages can have a positive impact on the sustainability and safety of citizens' communities. Pedestrian detection (Szarvas et al., 2005) is one of the main research areas, with the ultimate aim of developing efficient systems that eliminate deaths in traffic accidents. Road safety has attracted considerable interest in recent years, and a number of studies have been presented for both pedestrians (Nesoff et al., 2018; Wang et al., 2012) and drivers (Timmermans et al., 2019). A study on pothole detection was presented by Prathiba et al. (Prathiba et al., 2015) for the identification of different types of cracks on road pavements. Wang et al. (Wang et al., 2012) developed WalkSafe, a smartphone application for vehicle recognition that helps pedestrians cross roads safely. Jain et al. (Jain and Gruteser, 2017) presented an approach based on smartphone images for recognizing the texture of surfaces along pedestrians' routes, to be used for safety purposes. A mobile application that uses phone sensors was also presented to enhance the safety of distracted pedestrians (Tung and Shin, 2018). On the other hand, Maeda et al. (Maeda et al., 2018) proposed an approach for the detection of several types of road damage on smartphones using convolutional networks.
In the realm of safety, the practicality and efficient use of wearable cameras can effectively help increase the safety of pedestrians. Continuous visual data acquisition can lead to the real-time detection of obstructions, warning the wearers of potential dangers and alerting the authorities to take maintenance or corrective actions that eliminate dangerous spots for pedestrians. Due to the broad use of deep learning algorithms in analysing visual lifelogging data, the existence of large annotated datasets is more essential than ever before. Although there are available datasets created with wearable or smartphone cameras that refer to road safety, none of them is dedicated specifically to the safety of
pedestrians on sidewalks. This work outlines a first-person database which can be used for the development of techniques for the automatic detection of barriers and other damages that pose safety issues to pedestrians. In addition, we present initial results on the performance of the dataset in a classification scheme using a well-known deep Convolutional Neural Network (CNN) as a baseline and elaborate on the promising outcomes.
The structure of the rest of the paper is as follows: Section 2 presents the current state of the art, focusing on visual lifelogging, datasets and image interpretation techniques. Section 3 is dedicated to the created dataset, explaining the method we followed to collect and annotate the lifelogging data. The methodology we used to evaluate the performance, along with the initial results, is presented in Section 4. Finally, conclusions and future work are drawn in Section 5.
2 BACKGROUND
2.1 Visual Lifelogging
Visual lifelogging is broadly used nowadays due to the advances in wearable and sensing technologies (Theodosiou and Lanitis, 2019). The small size and light weight of wearable cameras, in addition to the broad use of smartphone devices, allow the 24/7 uninterrupted acquisition of the wearer's daily life (Bolaños et al., 2015). The interpretation of lifelogs can lead to useful results which can be exploited to enhance health, protection and security, and to analyze lifestyle and daily habits. The automatic analysis of visual lifelogging data, combining both computer vision and machine learning techniques, is known as Egocentric or First-person Vision.
Several image analysis methodologies dedicated to visual lifelogging have been proposed for both indoor and outdoor applications. In (Bolaños and Radeva, 2016) a two-step method is presented for food detection and recognition in lifelogging images. The recognition of personal locations in the daily activities of the wearer is studied in (Furnari et al., 2017), while social interactions and lifestyle patterns are analyzed in (Bano et al., 2018) and (Herruzo et al., 2017), respectively. Visual lifelogging has also been used in ambient assisted living applications (Climent-Pérez et al., 2020) such as fall detection, monitoring, etc.
Wearable cameras can play the role of a digital memory, helping people with memory problems to improve the quality of their daily life (Oliveira-Barra et al., 2019). Thus, the recording, storage and retrieval of inaccessible memories through visual lifelogging has been extensively studied in recent years (Silva et al., 2016). In addition, navigation and safety applications have also been developed using wearable cameras, enhancing the quality of living for several groups of people. Jiang et al. (Jiang et al., 2019) proposed a vision-sensor-based system for assisting people with vision impairments, while in (Maeda et al., 2018) a smartphone application for road damage detection is presented for driver protection. A variation of the latter system can be adopted to detect and recognize barriers on sidewalks for pedestrians' safety.
2.2 Egocentric Databases
The creation of annotated databases is a crucial step in the development of new egocentric vision techniques. The current trends in the automatic analysis and interpretation of lifelogs collected from wearable camera devices relate to deep learning methods, which perform better when trained and tested on data of sufficient quality and quantity. Mayol and Murray (Mayol and Murray, 2005) created the first egocentric vision dataset, with 600 frames captured by a wearable camera installed on the left shoulder of the wearer. The dataset was used for training systems to recognize hand actions. The abundance and availability of wearable cameras and smartphones have led to the creation of several first-person datasets in recent years (Bolaños et al., 2015), including datasets for object recognition (Bullock et al., 2015), activity recognition (Gurrin et al., 2016), social interaction analysis (Bano et al., 2018), etc. However, not all datasets are publicly available to the academic community.
Epic-Kitchens (Damen et al., 2018) is the largest publicly available egocentric dataset, with a total of 55 hours of recordings collected by a head-mounted high-definition camera. The dataset consists of 11.5 million frames covering daily activities in 32 kitchens. KrishnaCam (Singh et al., 2016) is another example of a large available egocentric dataset, dedicated to daily outdoor activities captured using Google Glass. It consists of 460 unique video recordings, each ranging in length from a few minutes to about half an hour, making up 7.6 million frames in total.
Concerning road safety, several datasets have been created for road and pavement crack detection (Gopalakrishnan, 2018). The dataset presented by Zhang et al. (Zhang et al., 2016) was created using a low-cost smartphone. It consists of 500 images and was used to detect cracks on pavements with the aid of deep convolutional networks.
A large-scale dataset focused on road damages was created by Maeda et al. (Maeda et al., 2018). This dataset is made up of 9,053 road damage images captured with a smartphone installed on a car. A large dataset was also created by merging recordings captured by 8 pedestrians while walking in 4 large cities (Jain and Gruteser, 2017). The dataset was used to detect different surfaces in daily walking paths. Although the dataset could be used in several applications, it is not annotated according to the relevant barriers and it is not publicly available yet.
2.3 Image Interpretation
The interpretation of visual lifelogs requires flexible algorithms that can address their specific features, such as a large number of objects, blurring, motion artifacts, lighting variations, etc. Due to the difficulty of traditional algorithms in coping with these limitations, deep learning algorithms have been successfully used in recent years to analyze the visual content of data collected through wearable devices.
CNNs have been established as the most prominent strain of neural networks within the field of computer vision due to their efficiency in capturing spatial dependencies in images. They have made great strides in fundamental image interpretation tasks such as classification, localization and object detection. The radical advancements in CNNs have been made possible by the abundance of large public image repositories such as ImageNet, Pascal VOC and MS COCO (Russakovsky et al., 2015; Lin et al., 2014; Everingham et al., 2007), which serve as platforms for enhancing generations of architectures in a bid to achieve state-of-the-art performance (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014; He et al., 2016).
Since their conception (LeCun et al., 1998), CNNs have evolved into numerous different architectures which allow for increased network depth and complexity but, at their core, their basic components have remained very similar. Even though ResNet is approximately 20 times deeper than AlexNet and 8 times deeper than VGGNet (Gu et al., 2018), all three architectures consist of three types of core layers: convolutional, pooling and fully connected layers.
In general, a CNN can be considered a hierarchical feature extractor, where the convolutional layers are responsible for creating feature maps, the pooling layers are used for reducing the resolution of the feature maps, and the fully connected layers perform the high-level feature learning, i.e. the image interpretation task.
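To make the roles of the three layer types concrete, the following minimal sketch in Keras stacks convolutional, pooling and fully connected layers; the filter counts, layer depths and class count are illustrative assumptions, not the architecture of any specific published model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Minimal illustrative CNN: convolutional layers build feature maps,
# pooling layers reduce their spatial resolution, and fully connected
# layers perform the final high-level classification.
def build_minimal_cnn(input_shape=(256, 256, 3), num_classes=15):
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),            # high-level feature learning
        layers.Dense(num_classes, activation="softmax"),  # image interpretation task
    ])

model = build_minimal_cnn()
model.summary()
```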
3 BARRIER DETECTION
DATASET
Based on our background research, we observed that there is no available dataset to effectively tackle issues regarding the safety of pedestrians on city sidewalks. The detection and identification of barriers that may endanger pedestrians' lives face several challenges, such as: the barrier type (i.e. cracks, objects, etc.), the small differences between the harmless and dangerous barriers a pedestrian encounters on sidewalks, the fact that a harmless object can become dangerous in the case of pedestrians, and the fact that a barrier is considered dangerous only when it is in the vicinity of the pedestrian. To this end, we design a methodology for collecting and annotating such data, which we then utilize to populate our dataset. In this section, we present in more detail the proposed methodology and the data acquisition and annotation process, followed by an overview of the created dataset.
3.1 Proposed Methodology
Our approach for populating the pedestrians' barriers dataset consists of three core tasks. Initially, a person walks along city sidewalks, collecting snapshots at frequent time intervals with the rear camera of a mid-range smartphone. The smartphone is placed on the chest of the wearer with a slight downwards angle to capture our target area, which is near the pedestrian's feet. Our approach considers a mid-range smartphone, which is used by the majority of pedestrians. This is followed by a pre-processing step that blurs faces, license plates and brands to address privacy issues and comply with the EU General Data Protection Regulation (GDPR).
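As an illustration of how such anonymization could be implemented, the sketch below uses OpenCV's Haar cascade face detector followed by Gaussian blurring. This is a minimal stand-in under stated assumptions, not the exact pre-processing pipeline we used; equivalent detectors would be needed for license plates and brands, and the file names are hypothetical.

```python
import cv2

# Hypothetical anonymization step: detect faces and blur them before
# the images enter the annotation pipeline (GDPR compliance).
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def anonymize(image_path, output_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = img[y:y + h, x:x + w]
        img[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    cv2.imwrite(output_path, img)

anonymize("snapshot_0001.jpg", "snapshot_0001_anon.jpg")
```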
Finally, the annotation process consists of placing bounding boxes around the categories derived by our analysis of the common barriers existing in urban areas. Specifically, the annotator must consider the danger area around a pedestrian, which translates to a region of 2 m around the pedestrian. Considering the downwards angle of the camera, this broadly corresponds to the lower half of the collected images. Additionally, the annotation process considers up to three (most imminent) barriers, which are placed up to the middle of the image. The barriers are categorized into 24 classes spanning 3 main categories and 7 barrier types, covering a broad range of possible barriers on city sidewalks that affect pedestrians' safety. See Table 1 for more details.
3.2 Dataset
During the first stage of the acquisition process, an individual collected 1796 first-person images by walking 12.3 km in the city center of Nicosia, Cyprus. The smartphone used was an iPhone 7 with a snapshot resolution of 1512x2016 pixels. The collected images were manually annotated using the VGG Image Annotator tool (Dutta and Zisserman, 2019) into the 24 different classes. A total of 1969 bounding boxes were found: 472 of the bounding boxes are related to barriers of the Infrastructure category, 909 are related to barriers of Physical Condition and the remaining 588 to the category indicating Temporary barriers. Tree was the most popular barrier class with 360 bounding boxes, while Boulder, Chair, Table, Mail Box and Bench were the least popular classes (Table 1). Examples of the annotation of different types of barriers are depicted in Fig. 1.
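For illustration, the following minimal sketch tallies bounding boxes per class from a VIA JSON export. The file name, the assumption of rectangular regions, and the region attribute named "class" are all assumptions about the export settings rather than part of the published pipeline.

```python
import json
from collections import Counter

# Tally bounding boxes per barrier class from a VIA JSON export.
# Assumptions: rectangular regions and a region attribute called "class"
# holding the barrier code (e.g. "B01"); adapt the keys to the actual export.
with open("via_export.json") as f:
    data = json.load(f)

# A full VIA project file nests images under "_via_img_metadata";
# a plain annotation export keys them at the top level.
images = data.get("_via_img_metadata", data)

counts = Counter()
for entry in images.values():            # one entry per annotated image
    for region in entry.get("regions", []):
        if region["shape_attributes"].get("name") == "rect":
            counts[region["region_attributes"].get("class", "unknown")] += 1

for barrier_class, n in counts.most_common():
    print(f"{barrier_class}: {n} bounding boxes")
```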
Looking at the confusion matrix, we can observe a general trend which suggests that as the number of examples within a class increases, so do its performance metrics. This is because the more examples a classifier is given, the more robust it can become in identifying the features of a specific class. It can also be seen that for some classes our baseline model has difficulty distinguishing them and consequently mislabels most of their examples. For example, class B64 has most of its images misclassified as B01, which suggests that either there are not enough B64 examples or that the two barriers have very similar features. Similarly, B63 has approximately half of its examples classified as B01.
4 EXPERIMENTAL RESULTS
To evaluate our dataset, we deploy a typical deep CNN using TensorFlow with Keras (Chollet et al., 2015) and train it in an end-to-end manner using stochastic gradient descent. Specifically, we utilize a variant of the VGG-16 model architecture with 13 convolutional layers and 5 max-pooling layers, followed by two dense layers and a dropout layer with a ratio set to 0.5, initialized with the pretrained weights of ImageNet.
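A plausible Keras reading of this classifier is sketched below: the VGG-16 convolutional base (13 convolutional and 5 max-pooling layers) with ImageNet weights, topped by two dense layers and a 0.5 dropout. The hidden width of 256, the optimizer hyperparameters and the exact placement of the dropout layer are assumptions, not values reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 15            # classes kept after removing rare barriers
INPUT_SHAPE = (256, 256, 3)

# VGG-16 convolutional base initialized with ImageNet weights.
base = tf.keras.applications.VGG16(include_top=False,
                                   weights="imagenet",
                                   input_shape=INPUT_SHAPE)

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # hidden width is an assumption
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# End-to-end training with stochastic gradient descent.
model.compile(optimizer=optimizers.SGD(learning_rate=1e-3, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```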
As part of the classification process, image regions
containing barriers, as indicated in the annotation pro-
cess, were cropped in order to isolate the objects as
individual images. We then perform a simple clas-
sification on each individual object found within our
dataset using the VGG network.
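The cropping step could look like the sketch below, which cuts each annotated box out of its source image and files it under a per-class directory ready for classification. The `boxes` mapping (filename to a list of class code and box coordinates) is a hypothetical intermediate structure derived from the annotation export, not an artifact of the published pipeline.

```python
from pathlib import Path
from PIL import Image

# Crop each annotated bounding box into its own image so the barriers
# can be classified as individual objects. `boxes` is assumed to map an
# image filename to a list of (class_code, x, y, width, height) tuples.
def crop_barriers(image_dir, boxes, out_dir):
    out_dir = Path(out_dir)
    for filename, regions in boxes.items():
        img = Image.open(Path(image_dir) / filename)
        for i, (cls, x, y, w, h) in enumerate(regions):
            target = out_dir / cls
            target.mkdir(parents=True, exist_ok=True)
            crop = img.crop((x, y, x + w, y + h))
            crop.save(target / f"{Path(filename).stem}_{i}.jpg")

# Example: crop_barriers("images/", boxes, "crops/")
```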
Before the images are passed through the network,
we perform preprocessing steps to avoid overfitting
Table 1: Barrier types in our dataset.

Category             Barrier Type   Detail               Class  #
Physical Condition   Damage         Crack                B00    28
                                    Hole/Pot-hole        B01    323
                                    Paver (broken)       B02    27
                     Layout         Narrow Pavement      B10    54
                                    No Pavement          B11    40
Infrastructure       Furniture      Bench                B20    2
                                    Street Light         B21    80
                                    Bin                  B22    70
                                    Parking Meter        B23    53
                                    Plant Pot            B24    88
                     Street Decor   Tree                 B30    360
                                    Shrub                B31    52
                                    Parking Prevention   B32    2
                                    Mail Box             B33    202
Temporary            Vehicles       4-wheels             B50    85
                                    2-wheels             B51    61
                     Construction   Boulder              B61    8
                                    Safety Sign          B62    74
                                    Fence                B63    161
                                    Traffic Cone         B64    151
                     Other          Litter               B70    16
                                    Chair                B71    6
                                    Table                B72    9
                                    Advert. Sign         B73    17
during training. The preprocessing steps include subsampling to a fixed size of 256x256 pixels, shuffling the training examples, normalizing the images to the range [0, 1], varying the brightness of the images, as well as applying geometric transformations such as random rotations, width and height shifts, horizontal flips and image magnifications.
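A pipeline of this kind can be expressed with Keras' ImageDataGenerator as sketched below; the exact parameter values and the directory layout of cropped barriers are illustrative assumptions rather than the settings used in our experiments.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation and normalization mirroring the preprocessing described above;
# the parameter values are illustrative assumptions.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # normalize pixel values to [0, 1]
    rotation_range=15,            # random rotations
    width_shift_range=0.1,        # width shifts
    height_shift_range=0.1,       # height shifts
    horizontal_flip=True,         # horizontal flips
    zoom_range=0.2,               # image magnifications
    brightness_range=(0.8, 1.2),  # brightness variation
)

train_generator = train_datagen.flow_from_directory(
    "crops/train",                # hypothetical directory of cropped barriers
    target_size=(256, 256),       # subsample to a fixed size
    batch_size=64,
    class_mode="categorical",
    shuffle=True,                 # shuffle the training examples
)
```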
In order to balance our dataset, we conducted training and evaluation using a reduced version of our dataset in which we removed any class containing 30 examples or fewer. Our revised dataset includes 15 of the original 24 barrier classes, which subsequently yields a more balanced class representation. Additionally, we randomly divided our data into a ratio of 70% training data (1297 images) and 30% validation data (557 images).
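This filtering and splitting step could be implemented as in the minimal sketch below, assuming the cropped barriers are available as a list of (path, class code) pairs; the data structure and the fixed random seed are assumptions for illustration.

```python
import random
from collections import Counter

MIN_EXAMPLES = 31   # classes with 30 examples or fewer are dropped

# `samples` is assumed to be a list of (crop_path, class_code) pairs.
def filter_and_split(samples, train_ratio=0.7, seed=0):
    counts = Counter(cls for _, cls in samples)
    kept = [s for s in samples if counts[s[1]] >= MIN_EXAMPLES]

    random.Random(seed).shuffle(kept)
    split = int(len(kept) * train_ratio)
    return kept[:split], kept[split:]   # training set, validation set

# train_set, val_set = filter_and_split(samples)
```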
For this experiment, our classifier was trained for 200 epochs with a batch size of 64. During training the model achieved an accuracy of 65% and a weighted average of 59%, whereas during validation it reached 55% for both accuracy and weighted average.
Figure 1: Sample images from the created dataset: (a) No Pavement, (b) Parking Meter, (c) 2-Wheels, (d) 4-Wheels.
Table 2: Confusion matrix produced when the baseline model is used to make predictions on the validation set.
B01 B10 B11 B21 B22 B23 B24 B30 B31 B33 B50 B51 B62 B63 B64 Recall
B01 105 0 0 0 0 0 0 33 0 10 1 2 3 24 52 0.46
B10 0 5 1 0 0 0 0 11 0 2 0 0 0 18 0 0.14
B11 0 0 18 0 0 0 0 2 0 0 8 0 0 0 0 0.64
B21 0 0 0 16 0 0 0 12 1 0 14 3 4 0 0 0.32
B22 2 0 0 1 20 0 0 12 0 1 8 1 2 0 0 0.43
B23 0 0 0 1 0 18 0 5 1 3 0 0 6 0 0 0.53
B24 0 0 4 1 0 1 16 19 0 9 1 11 4 0 0 0.24
B30 2 0 0 1 12 8 0 214 0 6 2 1 1 0 0 0.87
B31 0 0 0 3 0 0 0 12 12 1 0 1 5 0 0 0.35
B33 10 0 1 0 1 0 0 20 0 100 4 3 0 1 0 0.71
B50 0 0 3 0 0 0 0 8 0 0 45 3 0 0 0 0.76
B51 0 0 0 0 0 0 0 7 0 0 3 32 3 0 0 0.71
B62 1 0 3 1 0 0 0 13 0 0 0 5 26 0 2 0.51
B63 26 1 0 0 0 0 0 6 0 1 1 0 3 66 11 0.57
B64 55 0 0 0 0 0 0 4 0 0 0 0 0 3 52 0.46
Precision 0.52 0.83 0.60 0.67 0.61 0.67 1.00 0.57 0.86 0.75 0.52 0.52 0.46 0.59 0.44
Table 2 depicts the confusion matrix on the training data. We enhance the matrix by including the precision and recall for each class.
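The per-class precision and recall reported alongside the confusion matrix can be reproduced with scikit-learn as in the hedged sketch below; the label list and the ground-truth and prediction arrays are hypothetical placeholders, and scikit-learn is used here as a convenient tool rather than the toolkit stated in the paper.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Given ground-truth and predicted class codes for the evaluated split,
# compute the confusion matrix and per-class precision/recall.
def summarize(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    precision = precision_score(y_true, y_pred, labels=labels,
                                average=None, zero_division=0)
    recall = recall_score(y_true, y_pred, labels=labels,
                          average=None, zero_division=0)
    for lbl, p, r in zip(labels, precision, recall):
        print(f"{lbl}: precision={p:.2f} recall={r:.2f}")
    return cm

# Example with hypothetical arrays of class codes:
# cm = summarize(y_true, y_pred, labels=["B01", "B10", "B11"])
```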
5 CONCLUSIONS & FUTURE
WORK
This work presents the preliminary results of our ongoing study on creating a new first-person dataset of barriers encountered by pedestrians while walking on sidewalks. To the best of our knowledge, this is the first dataset dedicated to sidewalk barriers. Currently the dataset consists of 1796 images including 3156 instances of barriers categorized into 24 different classes. The performance of the dataset was evaluated using convolutional networks for object classification. The VGG-16 architecture was used as a baseline classifier for 15 barrier classes.
Future work involves collecting more images, especially for classes with a small number of instances, repeating the annotation process with two different annotators, and using segmentation masks to finalize the dataset. Additionally, more experiments will be conducted to evaluate the performance of the final dataset as a training set in deep learning schemes for both detection and recognition tasks. Last but not least, future work includes preparing the dataset to become publicly available and disseminating it to the research community.
The ultimate aim of this work is to develop an accurate real-time pedestrian barrier detection system that will be incorporated into an integrated smart city platform aiming to provide services and enhanced safety for pedestrians. Our belief is that this system can be adopted by municipalities and used as an immediate reporting system that communicates safety concerns for pedestrians so that the reported barriers can
be repaired, thus working towards achieving the goals
set by the World Health Organization on reducing the
deaths of pedestrians.
ACKNOWLEDGEMENTS
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 739578, complemented by the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.
REFERENCES
Bano, S., Suveges, T., Zhang, J., and McKenna, S. (2018).
Multimodal egocentric analysis of focused interac-
tions. IEEE Access, 6:1–13.
Bolaños, M., Dimiccoli, M., and Radeva, P. (2015). Toward storytelling from visual lifelogging: An overview. IEEE Transactions on Human-Machine Systems, 47:77–90.
Bolaños, M. and Radeva, P. (2016). Simultaneous food localization and recognition. In 23rd International Conference on Pattern Recognition (ICPR), pages 3140–3145.
Bullock, I. M., Feix, T., and Dollar, A. M. (2015). The yale
human grasping dataset: Grasp, object, and task data
in household and machine shop environments. The In-
ternational Journal of Robotics Research, 34(3):251–
255.
Chollet, F. et al. (2015). Keras. https://keras.io.
Climent-Pérez, P., Spinsante, S., Michailidis, A., and Flórez-Revuelta, F. (2020). A review on video-based active and assisted living technologies for automated lifelogging. Expert Systems with Applications, 139:112847.
Damen, D., Doughty, H., Farinella, G. M., Fidler, S.,
Furnari, A., Kazakos, E., Moltisanti, D., Munro, J.,
Perrett, T., Price, W., and Wray, M. (2018). Scaling
egocentric vision: The epic-kitchens dataset. In Euro-
pean Conference on Computer Vision (ECCV).
Dutta, A. and Zisserman, A. (2019). The VIA annotation
software for images, audio and video. arXiv preprint
arXiv:1904.10699.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J.,
and Zisserman, A. (2007). The pascal visual object
classes challenge 2007 (voc2007) results.
Furnari, A., Farinella, G. M., and Battiato, S. (2017). Rec-
ognizing personal locations from egocentric videos.
IEEE Transactions on Human-Machine Systems,
47(1):6–18.
Gopalakrishnan, K. (2018). Deep learning in data-driven
pavement image analysis and automated distress de-
tection: A review. Data, 3(3):28.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B.,
Liu, T., Wang, X., Wang, G., Cai, J., et al. (2018). Re-
cent advances in convolutional neural networks. Pat-
tern Recognition, 77:354–377.
Gurrin, C., Joho, H., Hopfgartner, F., Zhou, L., and Albatal,
R. (2016). Overview of ntcir-12 lifelog task.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Herruzo, P., Portell, L., Soto, A., and Remeseiro, B.
(2017). Analyzing first-person stories based on so-
cializing, eating and sedentary patterns. In Battiato,
S., Farinella, G. M., Leo, M., and Gallo, G., edi-
tors, New Trends in Image Analysis and Processing
ICIAP 2017.
Jain, S. and Gruteser, M. (2017). Recognizing textures
with mobile cameras for pedestrian safety applica-
tions. IEEE Transactions on Mobile Computing, PP.
Jiang, B., Yang, J., Lv, Z., and Song, H. (2019). Wearable
vision assistance system based on binocular sensors
for visually impaired users. IEEE Internet of Things
Journal, 6(2):1375–1383.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Advances in neural information process-
ing systems, pages 1097–1105.
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. (1998).
Gradient-based learning applied to document recogni-
tion. Proceedings of the IEEE, 86(11):2278–2324.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755.
Maeda, H., Sekimoto, Y., Seto, T., Kashiyama, T., and
Omata, H. (2018). Road damage detection and classi-
fication using deep neural networks with smartphone
images: Road damage detection and classification.
Computer-Aided Civil and Infrastructure Engineer-
ing, 33.
Mayol, W. W. and Murray, D. W. (2005). Wearable hand
activity recognition for event summarization. In Ninth
IEEE International Symposium on Wearable Comput-
ers (ISWC’05), pages 122–129.
Nesoff, E., Porter, K., Bailey, M., and Gielen, A. (2018).
Knowledge and beliefs about pedestrian safety in an
urban community: Implications for promoting safe
walking. Journal of Community Health, 44.
Oliveira-Barra, G., Bolaños, M., Talavera, E., Gelonch, O., Garolera, M., and Radeva, P. (2019). Lifelog retrieval for memory stimulation of people with memory impairment, pages 135–158.
Prathiba, T., Thamaraiselvi, M., Mohanasundari, M., and
Veerelakshmi, R. (2015). Pothole detection in road us-
ing image processing. International Journal of Man-
agement, Information Technology and Engineering,
3(4):13–20.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M., et al. (2015). Imagenet large scale visual
recognition challenge. International journal of com-
puter vision, 115(3):211–252.
Sas-Bojarska, A. and Rembeza, M. (2016). Planning the
city against barriers. enhancing the role of public
spaces. Procedia Engineering, 161:1556 – 1562.
Silva, A. R., Pinho, M., Macedo, L., and Moulin, C. (2016).
A critical review of the effects of wearable cameras on
memory. Neuropsychological rehabilitation, 28:1–25.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Singh, K. K., Fatahalian, K., and Efros, A. A. (2016). Kr-
ishnacam: Using a longitudinal, single-person, ego-
centric dataset for scene understanding tasks. In 2016
IEEE Winter Conference on Applications of Computer
Vision (WACV), pages 1–9.
Szarvas, M., Yoshizawa, A., Yamamoto, M., and Ogata, J.
(2005). Pedestrian detection with convolutional neural
networks. pages 224 – 229.
Theodosiou, Z. and Lanitis, A. (2019). Visual lifelogs re-
trieval: State of the art and future challenges. In 2019
14th International Workshop on Semantic and Social
Media Adaptation and Personalization (SMAP).
Timmermans, C., Alhajyaseen, W., Reinolsmann, N., Naka-
mura, H., and Suzuki, K. (2019). Traffic safety culture
of professional drivers in the state of qatar. IATSS Re-
search.
Tung, Y. and Shin, K. G. (2018). Use of phone sensors to
enhance distracted pedestrians safety. IEEE Transac-
tions on Mobile Computing, 17(6):1469–1482.
Wang, T., Cardone, G., Corradi, A., Torresani, L., and
Campbell, A. T. (2012). Walksafe: A pedestrian safety
app for mobile phone users who walk and talk while
crossing roads. In Proceedings of the Twelfth Work-
shop on Mobile Computing Systems & Applica-
tions, HotMobile ’12, pages 5:1–5:6.
WHO (2019). Pedestrian safety: a road safety manual for decision-makers and practitioners. https://www.who.int/roadsafety/projects/manuals/pedestrian/en/. Accessed: 2019-07-30.
Zhang, L., Yang, F., Zhang, Y., and Zhu, Y. (2016). Road
crack detection using deep convolutional neural net-
work.