Development of a Realistic Crowd Simulation Environment for Fine-Grained Validation of People Tracking Methods

Paweł Foszner (1,a), Agnieszka Szczęsna (1,b), Luca Ciampi (3,c), Nicola Messina (3,d), Adam Cygan (5), Bartosz Bizoń (5), Michał Cogiel (4,e), Dominik Golba (4,f), Elżbieta Macioszek (2,g) and Michał Staniszewski (1,h,∗)

1 Department of Computer Graphics, Vision and Digital Systems, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 2A, 44-100 Gliwice, Poland
2 Department of Transport Systems, Traffic Engineering and Logistics, Faculty of Transport and Aviation Engineering, Silesian University of Technology, Krasińskiego 8, 40-019 Katowice, Poland
3 Institute of Information Science and Technologies, National Research Council, Via G. Moruzzi 1, 56124 Pisa, Italy
4 Blees sp. z o.o., Zygmunta Starego 24a/10, 44-100 Gliwice, Poland
5 QSystems.pro sp. z o.o., Mochnackiego 34, 41-907 Bytom, Poland

a https://orcid.org/0000-0001-5491-9096, b https://orcid.org/0000-0002-4354-8258, c https://orcid.org/0000-0002-6985-0439, d https://orcid.org/0000-0003-3011-2487, e https://orcid.org/0000-0002-9776-9654, f https://orcid.org/0000-0002-4542-3547, g https://orcid.org/0000-0002-1345-0022, h https://orcid.org/0000-0001-9659-7451
∗ Corresponding author
Keywords: Crowd Simulation, Realism Enhancement, People and Car Simulation, People Tracking, Deep Learning.
Abstract: Generally, crowd datasets can be collected or generated from real or synthetic sources. Real data is gathered using infrastructure-based sensors (such as static cameras or other sensors). Simulation tools can significantly reduce the time required to generate scenario-specific crowd datasets, facilitate data-driven research, and support building functional machine learning models. The main goal of this work was to develop an extension of our crowd simulator (named CrowdSim2) and to prove its usefulness for validating people-tracking algorithms. The simulator is developed using the popular Unity 3D engine, with particular emphasis on realism in the environment, weather conditions, traffic, and the movement and models of individual agents. Finally, three tracking methods were used to validate the generated dataset: IOU-Tracker, Deep-Sort, and Deep-TAMA.
1 INTRODUCTION
Using real crowd datasets can produce effective and reliable learning models, useful in applications such as object tracking (Cafarelli et al., 2022; Lin et al., 2017), image segmentation (Bolya et al., 2019; Chen et al., 2018), visual object counting (Ciampi et al., 2022c; Avvenuti et al., 2022; Ciampi et al., 2022a), individual activity or violence recognition (Ciampi et al., 2022b; Foszner et al., 2022), crowd anomaly detection and prediction, and wider crowd management solutions. However, acquiring
real crowd data faces several challenges, including the expensive installation of a sensory infrastructure, the data pre-processing costs, and the lack of real datasets that cover particular crowd scenarios. Consequently, simulation tools have been adopted for generating synthetic datasets that overcome the challenges associated with their real counterparts. Simulation tools can significantly reduce the time required to generate scenario-specific crowd datasets, mimic observed crowds in a realistic environment, facilitate data-driven research, and support building functional machine learning models (Khadka et al., 2019; Ciampi et al., 2020) on the generated data. Simulation also offers flexibility in adjusting scenarios and in generating and reproducing datasets with defined requirements.
The main motivation for this work was to implement a more realistic crowd simulation with additional features that can be applied in many modern artificial intelligence approaches (including the evaluation of people-tracking algorithms). The proposed crowd simulator has the following advantages:
Figure 1: Visualization of tracking multiple pedestrians in images generated with CrowdSim2.
• realism enhancement through the application of motion matching and people and car generation,
• automatic generation of ground truth and detections (Figure 1) in the well-known format of the MOT Challenge (Dendorfer et al., 2020),
• simulation placed in 3 different locations (with 3 views each) for people movement and 2 locations for cars,
• introduction of 4 realistic weather conditions (sun, fog, rain, and snow) and different times of day,
• many possible applications, including object detection and tracking, and action detection and recognition.
2 RELATED WORKS
Databases containing visual data are needed to develop detection and tracking methods for cars and pedestrians. For that purpose, annotations of tracked objects (such as pedestrians), including approximate bounding boxes, are necessary. The Mall dataset (Chen et al., 2012) was collected from a publicly accessible webcam, with ground truth consisting of over 60,000 annotated pedestrians. NWPU (Wang et al., 2020) includes approximately 5,000 images and 2,133,375 annotated heads. JHU-CROWD++ (Sindagi et al., 2020) is another crowd dataset, captured in different scenarios and geographical locations, under weather conditions such as fog, haze, snow, and rain; it provides head-level labeling that includes an approximate bounding box. GTA5 Crowd Counting (GCC) (Wang et al., 2019) is an example of a large-scale visual synthetic dataset (15,212 images, 7,625,843 persons) generated using the well-known video game GTA5 (Grand Theft Auto 5). AGORASET (Courty et al., 2014) is also a visual synthetic dataset for crowd video analysis. For a comprehensive overview of databases and simulators, see the following review articles (Bamaqa et al., 2022; Lemonari et al., 2022; Van Toll and Pettré, 2021; Yang et al., 2020). An analysis of the statistical properties of real-world datasets is available in (Amirian et al., 2020). Recent advancements in crowd simulation unravel a wide range of functionalities for virtual agents, delivering highly realistic, natural virtual crowds.
In the following work, simulated data is used for the evaluation of different tracking algorithms (Staniszewski et al., 2016). The concept of tracking by detection is used here, which means that detections are available along with the simulated data, and tracking algorithms are afterwards applied to join the detections into tracks. Additionally, instead of whole-person detection, facial recognition could be applied (Pęszor et al., 2016). The first considered algorithm, the IOU-tracker, was presented in (Bochinski et al., 2018); it does not use any image information, which makes it a much simpler tracker that needs far less computing power than other trackers (a minimal sketch of this idea is given below). The authors of the Deep-Sort method (Wojke and Bewley, 2018) presented Simple Online and Realtime Tracking with a Deep Association Metric as a tracking-by-detection method. Deep-Sort is an extension of the SORT algorithm (Bewley et al., 2016), made to integrate appearance information based on a deep appearance descriptor (Wojke et al., 2017). The Deep-TAMA method (Yoon et al., 2019), Deep Temporal Appearance Matching Association, contrary to Deep-Sort and the IOU-tracker, performs tracking together with the evaluation of the results in a single stage. Another SORT variant, Observation-Centric SORT (OC-SORT), is a multiple-object tracker built to fix limitations of the Kalman filter in the SORT algorithm; it is an online tracker with improved handling of non-linear motion and robustness to occlusion. For wider applications, the MMTracking framework (Contributors, 2020) was established, an open-source video perception toolbox based on PyTorch.
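To make the tracking-by-detection idea concrete, the following minimal sketch shows the core of an IoU-based tracker in the spirit of (Bochinski et al., 2018): detections are greedily linked to the track whose last box overlaps them the most, with no image information involved. This is our simplified illustration under assumed data layout and threshold naming, not the reference implementation.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def iou_track(frames, sigma_iou=0.5):
    """frames: list of per-frame lists of (x1, y1, x2, y2) detections."""
    active, finished = [], []
    for detections in frames:
        remaining = list(detections)
        still_active = []
        for track in active:
            # Greedily link the detection that overlaps this track's last box the most.
            best = max(remaining, key=lambda d: iou(track[-1], d), default=None)
            if best is not None and iou(track[-1], best) >= sigma_iou:
                track.append(best)
                remaining.remove(best)
                still_active.append(track)
            else:
                finished.append(track)  # no sufficient overlap: the track ends here
        # Unmatched detections start new tracks.
        active = still_active + [[d] for d in remaining]
    return finished + active
```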
Figure 2: Exemplary views from CrowdSim2: a junction and a park with moving pedestrians and cars, along with examples of the snow and rain weather conditions that can be generated.
Table 1: Summary of the number of folders, seconds, and frames of data for the different weather conditions.

Weather   Folders   Seconds   Frames
Sun       2899      86 970    2 174 250
Rain      1633      48 990    1 224 750
Fog       1653      49 590    1 239 750
Snow      1646      49 380    1 234 500
3 CROWD SIMULATOR

The proposed crowd simulator CrowdSim2 (the generated dataset is freely available in the Zenodo repository at https://doi.org/10.5281/zenodo.7262220) is the next version of the crowd simulator CrowdSim (Staniszewski et al., 2020), designed especially for testing multi-object tracking algorithms, but also for action and object detection. It uses microscopic (or 'agent-based') crowd simulation methods that model the behavior of each person, from which collective behavior can then emerge (Saeed et al., 2022; Van Toll and Pettré, 2021). The simulator is developed using the very popular Unity 3D engine, with particular emphasis on realism in the environment, weather conditions, traffic, and the movement and models of individual agents. The proposed system can be used to generate sequences of images (datasets) for the evaluation of tracking and object detection algorithms, but also for crowd and car counting and other crowd and traffic analysis tasks. The generated output data is in the format of the MOT Challenge. The most important components of CrowdSim2 that support realism when rendering the resulting image are described below.
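The sketch below shows one way to read such output; it assumes the public MOT Challenge text layout, where each row stores frame, id, bb_left, bb_top, bb_width, bb_height, followed by confidence and annotation fields (the file name is hypothetical):

```python
import csv
from collections import defaultdict

def load_mot(path):
    """Read a MOT Challenge-style file into {frame: [(object_id, (x, y, w, h)), ...]}."""
    per_frame = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, obj_id = int(row[0]), int(row[1])
            box = tuple(map(float, row[2:6]))  # bb_left, bb_top, bb_width, bb_height
            per_frame[frame].append((obj_id, box))
    return per_frame

gt = load_mot("gt.txt")  # hypothetical path to one generated sequence
```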
3.1 Agents Motion and Interactions

The component necessary for producing lively and realistic virtual crowds is animating the characters, thus creating 3D moving agents. Data-driven approaches include methods utilizing motion capture data for skeleton-based animation of 3D human models (Wereszczyński et al., 2021). This approach requires many variations of data to represent movements in different activities. To ensure the universality of the system and to generate animations based on real human motion data, a motion matching algorithm was used (Clavet, 2016). Motion matching is an alternative animation system that does not require a state machine; the feature vectors it operates on are shown in Figure 3.
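As a minimal sketch of the idea (our illustration, not the production Unity implementation), motion matching keeps a database of feature vectors extracted from motion capture frames (e.g., foot positions, hip velocity, future trajectory) and periodically jumps playback to the database frame whose features best match the character's current query:

```python
import numpy as np

class MotionMatcher:
    """Nearest-neighbour search over a motion-capture feature database."""

    def __init__(self, features, weights):
        self.features = np.asarray(features)  # (n_frames, n_dims) feature matrix
        self.weights = np.asarray(weights)    # per-dimension importance weights

    def query(self, current):
        """Return the database frame index whose features best match the query."""
        diff = (self.features - np.asarray(current)) * self.weights
        cost = np.einsum("ij,ij->i", diff, diff)  # weighted squared distance per frame
        return int(np.argmin(cost))

# Usage: every N frames, re-query and continue playback from the best match.
# matcher = MotionMatcher(database_features, feature_weights)
# next_frame = matcher.query(build_query(agent))  # build_query is hypothetical
```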
Thanks to motion matching, it is possible to perform different activities at the agent level, including dancing or fighting. In the future, it is planned to use the learned motion matching algorithm (Holden et al., 2020) with additional styling (Aberman et al., 2020; Holden et al., 2017). Currently, we only have two styles of movement, male and female, which rely on separate motion databases. Interactions are carried out based on interaction zones. These zones are placed in the city and define the type of interaction (for example, dance or fight). When an agent enters a zone, a set of conditions is checked and, depending on the situation, the interaction is either ignored, the agent is added to a queue, or the interaction starts (see Figure 6). An agent is added to the queue when there are not enough agents in the zone to start the interaction. The agent moves normally while waiting in the queue, and if it moves too far away from the zone before the interaction begins, it is removed from the zone queue. A minimal sketch of this zone logic follows.
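The sketch below is our simplification of the behavior described above; the agent interface, thresholds, and names are illustrative assumptions, not the simulator's actual code:

```python
from dataclasses import dataclass

def distance(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

@dataclass
class Agent:
    position: tuple
    busy: bool = False

    def start_interaction(self, zone, participants):
        self.busy = True  # playback of the interaction animation would start here

class InteractionZone:
    """Zone that starts a group interaction once enough agents have queued up."""

    def __init__(self, center, required_agents=2, max_wait_distance=10.0):
        self.center = center
        self.required = required_agents    # e.g., 2 agents for a dance or a fight
        self.max_wait = max_wait_distance  # leaving this radius drops an agent from the queue
        self.queue = []

    def on_agent_enter(self, agent):
        if agent.busy or agent in self.queue:
            return  # the interaction is ignored for this agent
        self.queue.append(agent)
        if len(self.queue) >= self.required:
            participants, self.queue = self.queue[:self.required], self.queue[self.required:]
            for a in participants:
                a.start_interaction(self, participants)

    def update(self):
        # Queued agents keep moving normally; drop any that wandered too far away.
        self.queue = [a for a in self.queue
                      if distance(a.position, self.center) <= self.max_wait]
```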
Figure 3: Agent movement based on the concept of motion matching and the system of features.
3.2 Photo-Realistic Rendering and Traffic

The main element is physically based volumetric light that responds adequately to dynamically changing surroundings. There is dynamic volumetric fog in the simulation, and the simulation also includes snow and rain based on particle effects (Figure 2). Finally, thanks to the use of the High Definition Render Pipeline (HDRP) in the Unity engine and physical cameras, it is possible to map the lens and sensor settings of a real camera to create photo-realistic output images. For global agent movement on the macro scale, Unity's built-in NavMesh was used. On the micro scale, emergent human navigation behavior was created by the application of the unique system of features. Cars can park in randomly selected parking bays, as visible in Figure 4, and they also choose a random direction at each crossing. Traffic is based on a system of nodes located on the roads, particularly at crossings, sharp turns, and parking places; a minimal sketch of this node-based navigation is given after Figure 4.
Figure 4: The navigation system for car movement, with parking areas and pedestrian stops.
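As an illustration of this idea only (the node layout, probability, and names below are our assumptions, not the simulator's actual implementation), a node-based road network can be sketched as:

```python
import random

class RoadNode:
    """A waypoint on the road network (crossing, sharp turn, or parking bay)."""

    def __init__(self, position, is_parking=False):
        self.position = position
        self.is_parking = is_parking
        self.successors = []  # nodes reachable from here along the road

def next_node(current, parking_probability=0.05):
    """Pick where the car drives next: a random branch, occasionally a parking bay."""
    if not current.successors:
        return None  # dead end
    parking = [n for n in current.successors if n.is_parking]
    if parking and random.random() < parking_probability:
        return random.choice(parking)  # pull into a randomly selected parking bay
    driving = [n for n in current.successors if not n.is_parking] or current.successors
    return random.choice(driving)      # random direction at crossings
```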
4 RESULTS AND DISCUSSION
4.1 Collected Dataset
CrowdSim2 was used as the main tool for the generation of many different simulated situations in 3 main places in the virtual city. All places were observed by 3 cameras directed at the same position to get a general view from different angles. All videos were recorded at a resolution of 800x600 and 25 frames per second. Each situation lasted 30 seconds, i.e., 750 frames were recorded. The number of pedestrians varied from 1 to 160, and the situations were influenced by weather conditions such as sun, rain, fog, and snow. The dataset was generated in the format of the MOT Challenge. The number of generated folders, seconds, and frames is presented in Table 1.
4.2 Results
The obtained dataset was validated in two different scenarios: 1) by verifying the influence of crowd density on the accuracy of the tracking methods, and 2) by applying 4 different weather conditions: clear sunny weather (with only the resulting sun reflections), rain and snow with cloudy weather that also influences the background of the scene, and a foggy day. Finally, 3 tracking methods were used to test the generated dataset: IOU-Tracker (Bochinski et al., 2018), Deep-Sort (Wojke and Bewley, 2018), and Deep-TAMA (Yoon et al., 2019). The methods were chosen under two conditions: first, the availability of open-source code and, second, a finite execution time. All methods were evaluated with the MOT Challenge metrics. The evaluation was divided into two separate parts: 1) validation of the tracking methods while changing the number of people in the simulation, and 2) verification of the different weather conditions.
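One way to reproduce such an evaluation is with the open-source py-motmetrics package, which implements the MOT Challenge metrics. The sketch below is our illustration, not necessarily the exact tooling used here; it assumes gt_per_frame and hyp_per_frame map frame numbers to lists of (id, (x, y, w, h)) entries, e.g., as produced by the loader sketched in Section 3:

```python
import motmetrics as mm

acc = mm.MOTAccumulator(auto_id=True)
for frame in sorted(gt_per_frame):
    gt = gt_per_frame[frame]            # ground-truth (id, box) pairs
    hyp = hyp_per_frame.get(frame, [])  # tracker output (id, box) pairs
    # IoU distance matrix between boxes; pairs overlapping less than 0.5 are no-match.
    dists = mm.distances.iou_matrix([b for _, b in gt], [b for _, b in hyp], max_iou=0.5)
    acc.update([i for i, _ in gt], [i for i, _ in hyp], dists)

mh = mm.metrics.create()
print(mh.compute(acc, metrics=["mota", "num_switches"], name="sequence"))
```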
Figure 5: Results of the investigated tracking methods (IOU-tracker, Deep-Sort, and Deep-TAMA) on varying crowd density data (from 1 to 160 agents) in terms of the MOTA metric: (a) MOTA for varying crowd density; (b) varying density for MOTA on the IOU-tracker; (c) varying density for MOTA on Deep-Sort; (d) varying density for MOTA on Deep-TAMA.
Figure 6: Random animated interaction between agents
generated in CrowdSim2.
The methods were tested on selected metrics used in the MOT Challenge: a) MOTA (Multiple-Object Tracking Accuracy) and b) IDs (ID switches).
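For reference, MOTA follows the standard MOT Challenge definition, aggregating false negatives, false positives, and identity switches over all frames relative to the number of ground-truth objects:

```latex
\mathrm{MOTA} = 1 - \frac{\sum_{t} \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_{t} \mathrm{GT}_t}
```

where FN_t, FP_t, and IDSW_t are the false negatives, false positives, and identity switches in frame t, and GT_t is the number of ground-truth objects in frame t; MOTA can therefore become negative when the total error count exceeds the number of ground-truth objects.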
The crowd simulation was first run with different numbers of people, varying from 1 to 160. Thanks to that, it was possible to verify the influence of the number of people on the tracking results. The result of the comparison is presented in Figure 5, in the form of the influence on the MOTA metric, together with the distribution of results over the number of people. In the second step, the data were divided by weather condition: sun, rain, fog, and snow. Here it is also possible to distinguish differences in the results of the methods. The final score is presented in a few forms: as boxplots for each weather condition
Table 2: Mean and standard deviation of the tracking method results for the exemplary evaluation metrics MOTA (which should be high) and IDs (which should be low).

Method        MOTA Avg   MOTA STD   IDs Avg   IDs STD
Deep-Sort     96.20      2.64       362.05    436.04
IOU-tracker   99.74      0.36       13.07     159.64
Deep-TAMA     74.58      13.36      241.41    349.84
on the MOTA metric (Figure 7), and for different metrics in Figure 8 and Table 2.
4.3 Discussion
The generated data was used for two different validations of the tracking methods. On the one hand, crowd density can be the first point of analysis. For the MOTA metric, it can be observed which method obtains better results. In all cases, the IOU-tracker generates the best results, which relies mainly on the assumption that all detections are automatically given at once. Deep-Sort has slightly worse results because it also takes the image context into consideration. Deep-TAMA fails in the case of the simulated data due to the presence of very small detections. It has to be mentioned that the number of people is not uniformly distributed: the generation was run with a specific number of pedestrians, but not all detections were always present, so fewer tracks could be obtained for the 160-pedestrian setting.

Figure 7: The influence of different weather conditions ((a) fog, (b) rain, (c) snow, and (d) sun) on the MOTA metric for exemplary data from CrowdSim2, presented in the form of boxplots.
In terms of weather conditions, the ordering of the methods' accuracy matches the crowd density analysis. The most challenging condition is snow, which may result from the influence of the snow remaining on the background. The same trend is visible for the other MOT Challenge metrics. In that case, too, the best results are achieved by the IOU-tracker, which takes only the bounding box position into consideration. In fact, both approaches can be used for validation, and a conclusion about which method gives better tracking is still possible. In all cases, in terms of MOTA, a better method generates higher values; for IDs, on the other hand, the value should be as low as possible.
Crowd simulation with a direct connection to the graphics engine allows the generation of very accurate detections (bounding boxes) for all visible pedestrians in each frame. After the validation, it could be noticed that generating all detections has some disadvantages. The first drawback lies in the number of detections and their size: very small pedestrians can also be generated and put into the ground truth. Moreover, in practice, obtaining a detection in every frame is usually not possible. To keep the evaluation protocol unchanged, no modifications were introduced here, but in the future the generated data could be randomly disturbed in terms of the number of detections. Alternatively, other detection methods could be applied to the generated data to produce non-ideal detections.
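A minimal sketch of such a random perturbation (our illustration; the drop probability and jitter scale are hypothetical parameters, not values used in this work) could drop detections and add box noise to mimic an imperfect detector:

```python
import random

def perturb_detections(boxes, drop_prob=0.1, jitter=2.0):
    """Randomly drop detections and jitter the surviving (x1, y1, x2, y2) boxes."""
    noisy = []
    for box in boxes:
        if random.random() < drop_prob:
            continue  # simulate a missed detection
        noisy.append(tuple(c + random.gauss(0.0, jitter) for c in box))
    return noisy
```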
5 SUMMARY
In the presented work, an extension of our crowd simulator, CrowdSim2, was introduced with many advanced features for enhancing the realism of the generated results. To show the practical application of the generated simulated data, tracking methods were run for evaluation purposes. The algorithms were tested with respect to crowd density and weather conditions, showing differences in the final results and in the accuracy ordering of the methods. The obtained results confirmed that synthetic data from CrowdSim2 can be used in the validation process for many scenarios without the need for real data. Besides tracking algorithms, it can be applied to object detection, action detection, and recognition as a part of a testing procedure, and also in the training of machine learning algorithms. In the future, the generated simulated dataset can be enhanced by the use of post-processing methods to further improve realism.
Figure 8: The impact of weather conditions on the validation of tracking methods for different evaluation metrics on the dataset from CrowdSim2: (a) MOTA (Multiple-Object Tracking Accuracy), whose values should be maximized, and (b) IDs (ID switches), whose values should be minimized.
ACKNOWLEDGMENTS
This work was supported by: European Union funds awarded to Blees Sp. z o.o. under grant POIR.01.01.01-00-0952/20-00 "Development of a system for analysing vision data captured by public transport vehicles interior monitoring, aimed at detecting undesirable situations/behaviours and passenger counting (including their classification by age group) and the objects they carry"; the EC H2020 project "AI4Media: A Centre of Excellence delivering next generation AI Research and Training at the service of Media, Society and Democracy" under GA 951911; the research project (RAU-6, 2020) and projects for young scientists of the Silesian University of Technology (Gliwice, Poland); and the research project INAROS (INtelligenza ARtificiale per il mOnitoraggio e Supporto agli anziani), Tuscany POR FSE CUP B53D21008060008. Publication supported under the Excellence Initiative - Research University program implemented at the Silesian University of Technology, year 2022. This research was also supported by the European Union from the European Social Fund in the framework of the project "Silesian University of Technology as a Center of Modern Education based on research and innovation" POWR.03.05.00-00-Z098/17. We are thankful to the students who participated in the design of the crowd simulator: Piotr Bartosz, Stanisław Wróbel, Marcin Wola, Angelika Gluch, and Marek Matuszczyk.
REFERENCES
Aberman, K., Weng, Y., Lischinski, D., Cohen-Or, D., and
Chen, B. (2020). Unpaired motion style transfer from
video to animation. ACM Transactions on Graphics
(TOG), 39(4):64–1.
Amirian, J., Zhang, B., Castro, F. V., Baldelomar, J. J.,
Hayet, J.-B., and Pettré, J. (2020). Opentraj: As-
sessing prediction complexity in human trajectories
datasets. In Proceedings of the Asian Conference on
Computer Vision.
Avvenuti, M., Bongiovanni, M., Ciampi, L., Falchi, F., Gen-
naro, C., and Messina, N. (2022). A spatio-temporal attentive network for video-based crowd counting.
In IEEE Symposium on Computers and Communica-
tions, ISCC 2022, Rhodes, Greece, June 30 - July 3,
2022, pages 1–6. IEEE.
Bamaqa, A., Sedky, M., Bosakowski, T., Bastaki, B. B., and
Alshammari, N. O. (2022). Simcd: Simulated crowd
data for anomaly detection and prediction. Expert Sys-
tems with Applications, 203:117475.
Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B.
(2016). Simple online and realtime tracking. In 2016
IEEE International Conference on Image Processing
(ICIP), pages 3464–3468.
Bochinski, E., Senst, T., and Sikora, T. (2018). Extend-
ing iou based multi-object tracking by visual infor-
mation. In IEEE International Conference on Ad-
vanced Video and Signals-based Surveillance, pages
441–446, Auckland, New Zealand.
Bolya, D., Zhou, C., Xiao, F., and Lee, Y. J. (2019).
YOLACT: Real-time instance segmentation. In 2019
IEEE/CVF International Conference on Computer Vi-
sion (ICCV). IEEE.
Cafarelli, D., Ciampi, L., Vadicamo, L., Gennaro, C.,
Berton, A., Paterni, M., Benvenuti, C., Passera, M.,
and Falchi, F. (2022). MOBDrone: A drone video
dataset for man OverBoard rescue. In Image Anal-
ysis and Processing ICIAP 2022, pages 633–644.
Springer International Publishing.
Chen, K., Loy, C. C., Gong, S., and Xiang, T. (2012). Fea-
ture mining for localised crowd counting. In Bmvc,
volume 1, page 3.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and
Yuille, A. L. (2018). DeepLab: Semantic image seg-
mentation with deep convolutional nets, atrous con-
volution, and fully connected CRFs. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
40(4):834–848.
Ciampi, L., Carrara, F., Totaro, V., Mazziotti, R., Lupori, L., Santiago, C., Amato, G., Pizzorusso, T., and Gen-
naro, C. (2022a). Learning to count biological struc-
tures with raters’ uncertainty. Medical Image Analy-
sis, 80:102500.
Ciampi, L., Foszner, P., Messina, N., Staniszewski, M.,
Gennaro, C., Falchi, F., Serao, G., Cogiel, M., Golba,
D., Szczęsna, A., and Amato, G. (2022b). Bus vio-
lence: An open benchmark for video violence detec-
tion on public transport. Sensors, 22(21).
Ciampi, L., Gennaro, C., Carrara, F., Falchi, F., Vairo, C.,
and Amato, G. (2022c). Multi-camera vehicle count-
ing using edge-AI. Expert Systems with Applications,
207:117929.
Ciampi, L., Messina, N., Falchi, F., Gennaro, C., and Am-
ato, G. (2020). Virtual to real adaptation of pedestrian
detectors. Sensors, 20(18):5250.
Clavet, S. (2016). Motion matching and the road to next-
gen animation. In Proc. of GDC, volume 2016.
Contributors, M. (2020). MMTracking: OpenMMLab
video perception toolbox and benchmark. https://github.com/open-mmlab/mmtracking.
Courty, N., Allain, P., Creusot, C., and Corpetti, T. (2014).
Using the agoraset dataset: Assessing for the quality
of crowd video analysis methods. Pattern Recognition
Letters, 44:161–170.
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers,
D., Reid, I., Roth, S., Schindler, K., and Leal-Taixé, L.
(2020). Mot20: A benchmark for multi object track-
ing in crowded scenes. arXiv:2003.09003 [cs].
Foszner, P., Staniszewski, M., Szczęsna, A., Cogiel, M.,
Golba, D., Ciampi, L., Messina, N., Gennaro, C.,
Falchi, F., Amato, G., and Serao, G. (2022). Bus Vio-
lence: a large-scale benchmark for video violence de-
tection in public transport.
Holden, D., Habibie, I., Kusajima, I., and Komura, T.
(2017). Fast neural style transfer for motion data.
IEEE computer graphics and applications, 37(4):42–
49.
Holden, D., Kanoun, O., Perepichka, M., and Popa, T.
(2020). Learned motion matching. ACM Transactions
on Graphics (TOG), 39(4):53–1.
Khadka, A. R., Oghaz, M., Matta, W., Cosentino, M., Re-
magnino, P., and Argyriou, V. (2019). Learning how
to analyse crowd behaviour using synthetic data. In
Proceedings of the 32nd International Conference on
Computer Animation and Social Agents, pages 11–14.
Lemonari, M., Blanco, R., Charalambous, P., Pelechano,
N., Avraamides, M., Pettré, J., and Chrysanthou, Y.
(2022). Authoring virtual crowds: A survey. In Com-
puter Graphics Forum, volume 41, pages 677–701.
Wiley Online Library.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollar, P.
(2017). Focal loss for dense object detection. In 2017
IEEE International Conference on Computer Vision
(ICCV). IEEE.
Pęszor, D., Staniszewski, M., and Wojciechowska, M. (2016). Facial reconstruction on the basis of video surveillance system for the purpose of suspect identification. In Nguyen, N. T., Trawiński, B., Fujita,
H., and Hong, T.-P., editors, Intelligent Information
and Database Systems, pages 467–476, Berlin, Hei-
delberg. Springer Berlin Heidelberg.
Saeed, R. A., Recupero, D. R., and Remagnino, P. (2022).
Simulating crowd behaviour combining both micro-
scopic and macroscopic rules. Information Sciences,
583:137–158.
Sindagi, V., Yasarla, R., and Patel, V. M. (2020). Jhu-
crowd++: Large-scale crowd counting dataset and a
benchmark method. IEEE Transactions on Pattern
Analysis and Machine Intelligence.
Staniszewski, M., Foszner, P., Kostorz, K., Michalczuk, A., Wereszczyński, K., Cogiel, M., Golba, D., Wojciechowski, K., and Polański, A. (2020). Application
of crowd simulations in the evaluation of tracking al-
gorithms. Sensors, 20(17):4960.
Staniszewski, M., Kloszczyk, M., Segen, J., Wereszczyński,
K., Drabik, A., and Kulbacki, M. (2016). Recent de-
velopments in tracking objects in a video sequence. In
Intelligent Information and Database Systems, pages
427–436. Springer Berlin Heidelberg.
Van Toll, W. and Pettré, J. (2021). Algorithms for micro-
scopic crowd simulation: Advancements in the 2010s.
In Computer Graphics Forum, volume 40, pages 731–
754. Wiley Online Library.
Wang, Q., Gao, J., Lin, W., and Li, X. (2020). Nwpu-crowd:
A large-scale benchmark for crowd counting and lo-
calization. IEEE transactions on pattern analysis and
machine intelligence, 43(6):2141–2149.
Wang, Q., Gao, J., Lin, W., and Yuan, Y. (2019). Learn-
ing from synthetic data for crowd counting in the
wild. In Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition, pages 8198–
8207.
Wereszczyński, K., Michalczuk, A., Foszner, P., Golba, D.,
Cogiel, M., and Staniszewski, M. (2021). Elsa: Euler-
lagrange skeletal animations - novel and fast motion
model applicable to vr/ar devices. In Computational
Science ICCS 2021, pages 120–133, Cham. Springer
International Publishing.
Wojke, N. and Bewley, A. (2018). Deep cosine metric learn-
ing for person re-identification. In 2018 IEEE Win-
ter Conference on Applications of Computer Vision
(WACV), pages 748–756. IEEE.
Wojke, N., Bewley, A., and Paulus, D. (2017). Simple on-
line and realtime tracking with a deep association met-
ric. In 2017 IEEE International Conference on Image
Processing (ICIP), pages 3645–3649. IEEE.
Yang, S., Li, T., Gong, X., Peng, B., and Hu, J. (2020). A
review on crowd simulation and modeling. Graphical
Models, 111:101081.
Yoon, Y., Kim, D. Y., Yoon, K., Song, Y., and Jeon,
M. (2019). Online multiple pedestrian tracking us-
ing deep temporal appearance matching association.
CoRR, abs/1907.00831.