PanoTherm: Panoramic Thermal Imaging for Object Detection and Tracking

Thomas Kernbauer (https://orcid.org/0009-0000-7539-644X), Philipp Fleck (https://orcid.org/0000-0002-9827-2594) and Clemens Arth (https://orcid.org/0000-0001-6949-4713)
ICG, Graz University of Technology, Inffeldgasse 16/2, 8010 Graz, Austria
{thomas.kernbauer, philipp.fleck, arth}@icg.tugraz.at
Keywords: Thermal Imaging, Camera Calibration, Object Detection.
Abstract:
Visible-light cameras are traditionally used in object detection and tracking. Thermal imaging can equally be
used for this purpose, however at the cost of additional calibration efforts, expenses, and limitations concerning
the field of view. Still, thermal imaging is advantageous in various scenarios and basically the only plausible
technology to apply in harsh outdoor environments, in which the use of standard RGB cameras is prohibitive due to low-light conditions or complete darkness. While panoramic imaging using visible-light cameras
is becoming more popular for advanced photography or action recording, limited work has been done on
developing panoramic thermal cameras. In this work, we present the first panoramic thermal camera targeting
the constant 360° monitoring of the environment. We describe the calibration and stitching process in detail
and demonstrate how to use the camera in a vehicular scenario. Finally, we give an example of the detection
and tracking of objects and discuss the advantages and disadvantages of thermal imaging for this purpose.
1 INTRODUCTION
In recent years, visible-light cameras have become the
state-of-the-art in detecting and tracking humans and
objects. Following a vast amount of development in
the area of Computer Vision and through the hype
around autonomous driving, visible-light cameras are
nowadays the standard to apply in vehicles to increase
safety in road traffic. However, digital aids still lack
reliability, particularly under adverse environmental
conditions, leading to thousands of, partially fatal, ac-
cidents per year.
Additional mirrors on trucks or other heavy ma-
chinery, blind spot notifiers on side mirrors, and spec-
ified driver training have not led to the hoped-for decrease
in accidents. Studies reveal alarming numbers for,
e.g., Germany (Luis Technology, 2023), while more
recent numbers for Austria indicate no significant im-
provements concerning blind spot accidents in recent
years (Spitzer, 2023):
- 37 thousand accidents a year with 47 thousand people involved,
- on average one hundred accidents a day, involving 128 people on average.
While these numbers are arguably high, they only concern regular traffic accidents and exclude other similar accidents with heavy machinery like reach stackers, snow groomers, equipment on construction sites, and others.
Measures increasing visibility, like bright colors (e.g., a warning vest), mirrors, or visible-light cameras, seem to improve this situation, but their usefulness is strongly related to the environmental conditions. Fog, rain, darkness, snow, dust, and sand will drastically reduce the ability to quickly identify human beings in or close to a blind spot.
In this work, we present a novel take on this cat-
egory of problems using a low-cost panoramic ther-
mal camera, PanoTherm, to remove blind spots to
the greatest extent possible. Using a uniform and cal-
ibrated view, we can easily detect and track objects
with certain heat signatures, especially humans. The
application of our proposed development ranges from
disaster scenarios, e.g., identifying fire hazards and
finding humans (conscious and unconscious), to the
application in areas with bad visibility, e.g., smoky
buildings or off-road, low-light navigation. Mounted
on the roof of a vehicle, persons in typical blind spot
areas are immediately recognized. Furthermore, it is
equally suited for surveillance situations like trespass-
ing or maintenance tasks like pipe inspection. In a
more futuristic use case, our panoramic thermal im-
age can also be used within virtual reality (VR) appli-
cations, e.g., being overlaid with a color image of a
traditional panoramic camera.
Thermal and panoramic imaging have been exten-
sively, yet separately studied. With PanoTherm, we
close the gap between these fields and introduce a
panoramic thermal camera, assembled with low-cost
longwave infrared thermal sensors. Compared to ex-
isting, very expensive panoramic thermal hardware
with a rotating sensor, the major advantage of Pano-
Therm is the uninterrupted omnidirectional view at
comparably high frame rates, coming at a price tag in
the low 5-digit USD area. The proposed algorithms
are largely independent of the underlying hardware
and can therefore be seamlessly applied to a wide
range of devices. Our main contributions are:
- a thermal image calibration workflow that allows the calibration of multiple rigidly assembled thermal sensors with very little preparation;
- a working prototype for an omnidirectional thermal camera, including the 3D-printed and assembled camera case as well as the software framework for stitching the thermal images in real-time without the need for re-calibration;
- a methodology to register the panoramic thermal camera rig with respect to its environment, giving measurements a context in the 3D world; and
- an example of object detection and tracking in the 360° thermal domain in a vehicular setup.
Finally, we demonstrate our contributions in the
field and track targets not only across the individual
camera views but also estimate the real-world posi-
tion, visualize it in a radar-like manner, and discuss
our findings in the latter part of this work.
2 RELATED WORK
In the following, we quickly review the areas of cam-
era calibration, panoramic imaging, and thermal ob-
ject detection and tracking.
2.1 Camera Calibration
Intrinsically and extrinsically calibrated cameras are
the foundation of any image-based real-world measurement. Being a well-studied field, almost
every camera model features its own methodology.
Modern takes on single and multi-camera array cal-
ibration were presented by Zhang (Zhang, 2000).

Figure 1: Exemplary thermal imagery taken with one of the PanoTherm off-the-shelf Seek Thermal cameras.

The foundation of all algorithms is to establish correspondences between real-world patterns (e.g., a checkerboard) and their imagery. By reducing
the measured error, the camera parameters can be es-
timated. In the thermal domain, geometric features
blocking infrared (IR) light have to be exploited to
derive such correspondences (Lagüela et al., 2011; Yang et al., 2011; Ellmauthaler et al., 2013). Wired
metal structures in combination with resistors (Ng and
Du, 2005), patterns printed from aluminum (Usamen-
tiaga et al., 2017), heated (Saponaro et al., 2015), or
cooled (Herrmann et al., 2019) structures, lead to ac-
ceptable results for thermal cameras. However, with
PanoTherm we implement a circular cutout board
for the calibration of all cameras, described in Sec-
tion 3.1.
2.2 Thermal Imagery Applications
Thermal cameras are a commonly used alternative
to visible-light cameras due to their independence
from visible light sources and their capacity to sense
through obstructions such as fog or smoke. Gade
and Moeslund (Gade and Moeslund, 2014) demon-
strate advantages in complex environmental scenar-
ios. Furthermore, thermal imaging technology has
led to the development of smaller and more affordable
thermal cameras (Systems, 2011), expanding their us-
ability and impact across different sectors such as in-
dustry, medical, or home automation. Such devices
utilize uncooled microbolometer sensors to capture
long-wave infrared (LWIR) emissions.
Microbolometers contain vanadium oxide or
amorphous silicon resistors, which change their elec-
trical resistance depending on the impact of the elec-
tromagnetic radiance of certain wavelengths.
More specifically, LWIR sensors can capture infrared radiation with wavelengths between 8 and 14 µm. Jewett and Serway (Jewett and Serway, 2008) mention that, according to Wien's law, the emitted electromagnetic radiance of room-temperature (20 °C) objects peaks at roughly 9 µm. Applications
reach from agronomy (Yanmaz et al., 2007; Halachmi et al., 2019), agriculture (Vadivambal and Jayas, 2011; Messina and Modica, 2020), maintenance (Hoegner and Stilla, 2009; Venegas et al., 2022), medical diagnostics (Ring and Ammer, 2012; Sousa et al., 2017), robotics (Fehlman and Hinders, 2009), surveillance (Krišto et al., 2020), to autonomous driving (Rosique et al., 2019).

Figure 2: Assembly of the panoramic thermal camera in a rectangular shape. Each camera has about 105° horizontal FOV, leaving an overlapping area between the individual cameras of approximately 15°.
2.3 Panoramic Imaging
The process of combining images into mosaics is a
long-standing computer vision problem, elaborated
by Milgram (Milgram, 1975) and others. Due to the
advantages of panoramic images, i.e., the increased field-of-view (FOV), higher resolutions, and a common frame of reference, the applications are manifold. These include the creation of panoramic scenes
for augmented reality (AR) (Wagner et al., 2010),
scene understanding in the autonomous driving do-
main (Zhang et al., 2021), depth estimation (Zioulis
et al., 2018), visual odometry (Zhang et al., 2016),
or other robotics applications (Cheng et al., 2019).
This versatility is also reflected in underlying cam-
era systems. Besides the utilization of smartphones
to take scenic photos, panoramic or 360° images can be captured by vastly different sensor setups,
e.g., rigid multi-camera arrays (Lin et al., 2018), fast
rotating single cameras (Thomas et al., 2019), fish-
eye lenses (Ho and Budagavi, 2017), or the usage of
panoramic annular lenses (Powell, 1994).
In the context of thermal imaging, sensor se-
tups resort to rotating single cameras (Thomas et al.,
2019), the application of a catadioptric mirror (Zeng
et al., 2020), or, similar to our approach, a multi-
camera setup (Cowan et al., 2019).
However, all approaches rely on either knowing
rigid transformations between individual cameras or
the pan-tilt-yaw state of acquired images. Such infor-
mation is leveraged to obtain a single panorama.
2.4 Thermal Object Detection
Numerous (semi-) autonomous applications like col-
lision avoidance, obstacle detection, pedestrian detec-
tion, and more, benefit from object detection and ob-
ject tracking. Since this is a well-researched area, we
will only highlight a few relevant works within the
thermal domain. With the success of deep learning-
based object detection, single-stage object detection
methods such as you only look once (YOLO) (Red-
mon et al., 2016) or single shot detector (SSD) (Liu
et al., 2016) were applied in the thermal domain.
Namely, Krišto et al. (Krišto et al., 2020) employ YOLO-based methods, and Dai et al. (Dai et al., 2021) implement a method based on the SSD architecture. This method, called TIRNet, modifies the original SSD model architecture by extending it with residual branches to extract features from the low-resolution IR images.
The tracking-by-detection (TBD) paradigm suc-
cessfully dominates various multi-object tracking
benchmarks, such as MOT17 (Milan et al., 2016)
or MOT20 (Dendorfer et al., 2020). TBD methods, e.g., simple online and realtime tracking (SORT) (Bewley et al., 2016)-based trackers or ByteTrack (Zhang
et al., 2022) associate bounding boxes resulting from
the object detection method to create tracks. In Sec-
tion 6, we demonstrate the use of YOLO and Byte-
Track with PanoTherm to precisely detect, track, and
localize humans for accident avoidance.
3 MULTI CAMERA ARRAY
Traditional thermal cameras for small-scale monitoring feature a very limited horizontal field-of-view (FOV) of 15° to 60°. The PanoTherm assembly requires FOVs of at least 105° with four cameras. One capable and cost-efficient candidate is the Seek Thermal Mosaic Core C312NPX (https://www.thermal.com/mosaic-core-320x240-2mm.html), featuring a 2.2 mm lens and 320 × 240 pixels at a frame rate of about 27 Hz. The horizontal FOVs measure 105°, yielding a 15° overlap between the views. In Figure 1 we showcase an exemplary image taken with one of the
utilized thermal cameras. Figure 2 depicts the custom
3D printed enclosure, the rectangular arrangement of
the cameras, and their approximate FOV projections.
Each camera is connected to a USB3 hub mounted on the assembly, such that only a single power and
a single data cable connect the device to a comput-
ing platform (e.g., laptop computer, single board com-
puter). Figure 4 depicts a panoramic image cap-
tured with PanoTherm consisting of the four individ-
ual cameras. Due to the comparably low amount of
data (i.e., only a fraction of modern color images) a
serial USB connection is more than sufficient to cap-
ture images simultaneously.
3.1 Camera Calibration
Because of their shared characteristics and compara-
ble lens geometry, the pinhole camera model (Hartley
and Zisserman, 2003) can be applied to commonly
used thermal cameras, allowing us to apply the same geometric and mathematical properties to compute their intrinsics. Points in images $x_I \in \mathbb{R}^3$ and 3D points in the world $x_W \in \mathbb{R}^4$ are expressed as homogeneous coordinates, i.e., $x_I = (x_i, y_i, 1)^\top$ and $x_W = (x_w, y_w, z_w, 1)^\top$. Following the pinhole camera model, the relation between image and world points is formalized by the camera projection matrix $P \in \mathbb{R}^{3 \times 4}$ as follows:

$$\mu \, x_I = P \, x_W \quad \text{with} \quad P = K \, [R \mid t]. \tag{1}$$
Here, $\mu$ corresponds to an arbitrary scale factor. The projection matrix $P$ is composed of the extrinsic parameters, i.e., the rotation $R \in \mathbb{R}^{3 \times 3}$ and the translation vector $t \in \mathbb{R}^3$, as well as the intrinsic parameters, i.e., the camera matrix $K \in \mathbb{R}^{3 \times 3}$. Camera calibration with a planar calibration pattern uses this relation by assuming the $z$ coordinate of 3D world points to be 0.
Expressing this assumption with Equation (1) yields

$$\mu \, x_I = K \begin{bmatrix} r_{00} & r_{01} & r_{02} & t_0 \\ r_{10} & r_{11} & r_{12} & t_1 \\ r_{20} & r_{21} & r_{22} & t_2 \end{bmatrix} \begin{pmatrix} x_w \\ y_w \\ 0 \\ 1 \end{pmatrix} = H \begin{pmatrix} x_w \\ y_w \\ 1 \end{pmatrix}. \tag{2}$$

Here, $H \in \mathbb{R}^{3 \times 3}$ corresponds to the homography, which relates the real-world calibration points with the detected positions in pixel space. Since $H$ is defined up to a scale factor, at least 4 point-to-point correspondences are necessary to obtain a solution.
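As a concrete numeric illustration of Equations (1) and (2), the following sketch projects a planar calibration point with made-up intrinsics and pose (the values are illustrative, not PanoTherm's calibration results):

```python
# Sketch of Equations (1)-(2): projecting a planar (z = 0) world point with
# P = K [R | t]; K, R, and t below are illustrative values only.
import numpy as np

K = np.array([[320.0,   0.0, 160.0],
              [  0.0, 320.0, 120.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera aligned with the world axes
t = np.array([0.0, 0.0, 1.0])          # pattern one meter in front of the camera

P = K @ np.hstack([R, t[:, None]])     # 3x4 projection matrix

x_w = np.array([0.05, 0.02, 0.0, 1.0]) # homogeneous point on the z = 0 plane
x_i = P @ x_w
x_i /= x_i[2]                          # divide out the arbitrary scale mu
print(x_i[:2])                         # -> pixel coordinates [176.0, 126.4]
```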
Real cameras deviate from the optimal pinhole camera model due to imperfections introduced by their lenses, i.e., the pinhole aperture, as well as $x_W$ and $x_I$ not being co-linear. Hence, the mathematical model has to account for lens distortion. Following (Hartley and Zisserman, 2003; Heikkila and Silvén, 1997), radial and tangential distortion can be approximately corrected by

$$x_c = x_d + F_D(x_d, \delta), \tag{3}$$

with $x_d \in \mathbb{R}^2$ being the distorted image pixel, $x_c \in \mathbb{R}^2$ being the corrected image pixel, and $F_D$ being a higher-order polynomial with coefficients

$$\delta = (k_1, k_2, \ldots, p_1, p_2, \ldots). \tag{4}$$

Given $n$ images with $m$ calibration points each, Zhang (Zhang, 2000) proposed to apply the Levenberg-Marquardt algorithm (Moré, 1977) for the maximum-likelihood estimation (MLE), i.e., minimizing

$$\sum_{i}^{n} \sum_{j}^{m} \left\| x_{ij} - \hat{x}(K, \delta, R_i, t_i, x_{W,j}) \right\|^2. \tag{5}$$

Figure 3: (Left) The common image calibration approach utilizing an asymmetric grid in front of an LCD screen. (Right) Thermal profile of the asymmetric circular calibration pattern in front of an LCD.
3.2 Pattern Calibration
Pattern-based calibration approaches have successfully been shown by Vidas et al. (Vidas et al., 2012), being a direct inspiration to our approach. We uti-
lize an asymmetric circular calibration pattern. To
detect the pattern in the thermal image, the IR light
is blocked by simple irregular circular cut-outs. The
resulting plate is placed in front of the liquid-crystal
display (LCD) to block off light around the irregular
circles. Figure 3 (left) shows the detected pattern and
Figure 3 (right) the thermal response, where the peaks
are the passed-through IR radiation of the monitor.
Backlights used in modern computer screens are
significantly warmer than the ambient room tempera-
ture. Hence, the temperature difference between the
calibration board in front and the screen itself is vis-
ible in the thermal image, consequently allowing us
to apply traditional calibration methods. Commonly used tools like OpenCV (Bradski, 2000) can be directly applied to the calibration process.
Implementation Details. The pattern is detected
using OpenCV’s findCirclesGrid function. How-
ever, if the blob detection algorithm is not successful,
we simply apply the adaptiveThreshold function.
Passing the resulting image to findCirclesGrid
usually succeeds in extracting the calibration pattern
from the thermal image. Hence, we can make use of
the calibrateCamera function to retrieve a calibra-
tion.
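The following is a minimal sketch of the described detection chain, assuming 8-bit grayscale thermal captures in a list `thermal_frames`; the 4×11 asymmetric grid and the circle spacing are illustrative values, not our exact board dimensions:

```python
# Hedged sketch of the calibration pipeline: findCirclesGrid first, the
# adaptiveThreshold fallback second, calibrateCamera at the end.
import cv2
import numpy as np

PATTERN = (4, 11)     # circles per row / number of rows (assumed layout)
SPACING = 0.02        # circle spacing in meters (assumed)

def detect_pattern(gray):
    """Plain detection first, then the adaptive-threshold fallback."""
    ok, centers = cv2.findCirclesGrid(gray, PATTERN,
                                      flags=cv2.CALIB_CB_ASYMMETRIC_GRID)
    if not ok:
        binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY, 31, 5)
        ok, centers = cv2.findCirclesGrid(binary, PATTERN,
                                          flags=cv2.CALIB_CB_ASYMMETRIC_GRID)
    return ok, centers

# 3D coordinates of the asymmetric grid on the z = 0 plane
objp = np.array([[(2 * j + i % 2) * SPACING, i * SPACING, 0.0]
                 for i in range(PATTERN[1]) for j in range(PATTERN[0])],
                np.float32)

obj_pts, img_pts = [], []
for frame in thermal_frames:          # assumed: 8-bit thermal captures
    ok, centers = detect_pattern(frame)
    if ok:
        obj_pts.append(objp)
        img_pts.append(centers)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, thermal_frames[0].shape[::-1], None, None)
```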
3.3 Image Acquisition
Acquiring images in a fixed multi-camera arrange-
ment poses specific challenges that do not arise within a single-camera panoramic pipeline,
e.g., synchronization of the utilized camera heads or
bandwidth considerations. Especially in the context
of panoramic images that capture moving objects,
synchronization is highly important to avoid ghosting
and other artifacts.
PanoTherm relies on event-driven camera call-
backs and polling to achieve synchronization. Since
each camera supports frame rates of up to 27 Hz, a
registered call-back function for each camera head
updates the separate data structures once a frame
is ready in an asynchronous manner. To create a
panorama, the data structures are naively polled and
processed as depicted in Figure 6. On modern com-
puter hardware, the cost of storage and processing of
a panorama sequence is negligible. Hence, in our ex-
periments, the synchronicity of the acquired frames
proved to be sufficient as we did not witness artifacts
between frames for moving objects.
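A minimal sketch of this buffering and polling pattern is given below; the camera SDK and its callback registration are intentionally left abstract (any API that invokes `on_frame` per camera head would do), so only the synchronization structure is shown:

```python
# Sketch: one asynchronously updated buffer per camera head, naively polled
# to assemble the inputs of a panorama.
import threading
import numpy as np

class FrameBuffer:
    """Latest-frame store, updated by a camera callback."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def on_frame(self, frame: np.ndarray):
        # Registered as the per-head callback; fires at up to 27 Hz.
        with self._lock:
            self._frame = frame

    def latest(self):
        with self._lock:
            return None if self._frame is None else self._frame.copy()

buffers = [FrameBuffer() for _ in range(4)]    # one buffer per camera head

def poll_panorama_inputs():
    """Naive polling step: grab the most recent frame of every head."""
    frames = [b.latest() for b in buffers]
    return frames if all(f is not None for f in frames) else None
```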
4 IMAGE STITCHING
Image stitching is a fundamental computer vision task
that involves combining multiple overlapping images
to create a panoramic scene. The number of different algorithms and approaches showcases the long-standing research interest in this problem (Szeliski et al., 2007).
In many image stitching workflows, e.g., the case
of panoramic imaging from a single camera (Wag-
ner et al., 2010), the camera positions are unknown
and therefore feature tracking and matching, as well as global image alignment, are necessary to produce a
setup. Once calibrated, the panoramic image can be
composed by simply utilizing the previously derived
camera matrices.
4.1 Stereo Calibration
To find the extrinsic parameters between cameras,
feature matching is usually employed (Hartley and
Zisserman, 2003). However, in cases with narrow, distorted, or low-resolution overlapping regions, feature-extraction methods such as SIFT (Lowe, 1999) and ORB (Rublee et al., 2011) are insufficient to extract congruent features. Therefore, we
utilize easily detectable patterns, i.e., the calibration
pattern depicted in Figure 3, and place them in the
overlapping FOV of adjacent cameras to find corre-
spondences. With this, standard stereo calibration can
be applied to estimate the rotation and translation be-
tween the cameras.
Given a point $x_i = (x_i, y_i, 1)^\top$ in image $i$ and the corresponding point $x_j = (x_j, y_j, 1)^\top$ in image $j$ (i.e., the pixel coordinates in image $j$ that correspond to the point $x_i$), the relation

$$x_i^\top F \, x_j = 0 \tag{6}$$

is leveraged to calculate the extrinsic parameters. $F \in \mathbb{R}^{3 \times 3}$ corresponds to the fundamental matrix, which encapsulates the algebraic representation of the epipolar geometry of the scene. We refer the reader to Hartley and Zisserman (Hartley and Zisserman, 2003) for more details.
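A sketch of this procedure with OpenCV is shown below, reusing the circle-grid detector (`detect_pattern`, `objp`) from Section 3.2 and assuming per-camera intrinsics `K1, d1` and `K2, d2` from `cv2.calibrateCamera`; the synchronized frame pairs `overlap_pairs` and the `image_size` tuple are assumed inputs:

```python
# Pairwise extrinsic calibration sketch: the board is shown in the overlap
# of two adjacent heads, intrinsics stay fixed.
import cv2

obj_pts, pts_cam1, pts_cam2 = [], [], []
for f1, f2 in overlap_pairs:            # simultaneous frames of the board
    ok1, c1 = detect_pattern(f1)
    ok2, c2 = detect_pattern(f2)
    if ok1 and ok2:
        obj_pts.append(objp)            # same planar grid coordinates as before
        pts_cam1.append(c1)
        pts_cam2.append(c2)

# Only the rotation R and translation T between the heads (plus the
# essential/fundamental matrices E, F) are estimated.
_, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, pts_cam1, pts_cam2, K1, d1, K2, d2,
    image_size, flags=cv2.CALIB_FIX_INTRINSIC)
```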
4.2 Cylindrical Warping
The obtained extrinsics are utilized in cylindrical image warping. We chose to warp and blend our obtained scenes due to the advantages for visualization and registration, including the possibility to easily unwrap the map onto a planar surface and to utilize the geometric properties for our registration purposes.
To warp the images onto the cylindrical canvas, we apply backward mapping. Following (Wagner et al., 2010), we apply

$$R = O \cdot \mu(x_{cyl}) \tag{7}$$

$$x_{camera} = K \cdot \delta(\pi(R)) \tag{8}$$

$$x_{pixel} = \mathrm{interpolate}(x_{camera}) \tag{9}$$

to project points on the cylinder $x_{cyl}$ into the pixel coordinates $x_{pixel}$. First, we apply the function $\mu$ to project the point on the cylinder to a ray. This ray is then rotated according to an orientation $O$. With the function $\pi$, we convert $R$ into the image plane. After applying radial distortion with $\delta$, we can project the point into the camera frame by multiplying it with the intrinsic matrix $K$. Finally, the pixel value is chosen by interpolation. This yields $x_{pixel}$, the color value for the panoramic coordinate $x_{cyl}$.
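The sketch below implements this backward mapping for a single camera head, omitting the distortion step $\delta$ for brevity; the intrinsics `K`, head orientation `O`, input `camera_image`, and the panorama dimensions are assumed inputs with illustrative values:

```python
# Backward mapping of Equations (7)-(9): panorama pixel -> cylinder ray ->
# rotated camera ray -> projected pixel, sampled with cv2.remap.
import cv2
import numpy as np

def cylinder_to_camera_maps(pano_w, pano_h, f, K, O):
    # mu step: panorama pixel -> angle theta and cylinder height h
    theta = (np.arange(pano_w) - pano_w / 2) / f
    h = (np.arange(pano_h) - pano_h / 2) / f
    th, hh = np.meshgrid(theta, h)

    # Ray on the unit cylinder, rotated into the camera frame (R = O mu(x_cyl))
    rays = np.stack([np.sin(th), hh, np.cos(th)], axis=-1) @ O.T

    # pi step: perspective division, then projection with K
    z = rays[..., 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        u = K[0, 0] * rays[..., 0] / z + K[0, 2]
        v = K[1, 1] * rays[..., 1] / z + K[1, 2]
    u[z <= 0] = -1                     # rays behind the camera are invalid
    v[z <= 0] = -1
    return u.astype(np.float32), v.astype(np.float32)

# cv2.remap performs the interpolation step of Equation (9)
map_u, map_v = cylinder_to_camera_maps(1280, 240, 320.0, K, O)
pano_part = cv2.remap(camera_image, map_u, map_v, cv2.INTER_LINEAR,
                      borderMode=cv2.BORDER_CONSTANT, borderValue=0)
```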
Figure 4: A naively stitched panorama (top) in comparison with a calibrated stitched panorama (bottom) of the same scene. The improvements are visible in the overlapping regions, e.g., there are no duplicate persons in the center of the panorama and the structure of objects is consistent across cameras.
4.3 Limitations
The number of cameras we employ is minimal with respect to our goal of capturing a full 360° panorama. While Figure 4 shows good geometric accuracy, seams in the overlap areas are visible. Our use case targets the detection of thermal signatures, which means that the real temperature values, and with them the visual appeal of the panorama, are less important. Each camera automatically adapts to spread the available value range, i.e., [0, 255], to maximize the preserved information content. This implies that the individual ranges correspond to different absolute temperature ranges, resulting in a panoramic view similar to one created from multiple RGB cameras missing brightness and contrast adaption.
As PanoTherm is constructed out of separate cam-
eras, the parallax between frames is apparent. These
limitations could be addressed with geometrical algorithms, e.g., the multi-perspective plane sweep (MPPS) algorithm (Kang et al., 2004; Uyttendaele et al., 2004), or image-based solutions, e.g., Stretching Stereo (Kang et al., 2004).
5 CAMERA-TO-WORLD
REGISTRATION
In order to give measurements in the panoramic view meaning in terms of direction and distance with regard to the given environment, the camera rig has to be registered in a coordinate system. We employ a modified version of the known Perspective-3-Point (P3P) algorithm (Fischler and Bolles, 1981) used for panorama localization in Arth et al. (Arth et al., 2011).

Figure 5: Illustration of the image to 3D world point mapping through the cylindrical coordinate system. Image coordinates $x_i, y_i$ are mapped into cylindrical coordinates $\alpha_i, \beta_i$. Knowing the 3D positions of the marker corners, i.e., $x_w, y_w, z_w$, we can solve the PnP problem and resolve the transformation between $C_c$ and $C_w$.
PanoTherm is placed in a given environment with one of the cameras facing to the front to reduce the impact of image distortion. In the field of view of this camera, a flat ARToolKit (ARTK+) marker (Wagner and Schmalstieg, 2007) is placed to form the world coordinate center $C_w$. As our panoramic image is a mapping onto a cylindrical surface, each pixel in the panorama can be represented as a ray emanating from the center of the cylinder through the cylindrical surface with two angles, $\alpha$ and $\beta$. More
specifically, a point in the panoramic image domain $x_{pano} = (x_{pano}, y_{pano})$ is converted into cylindrical coordinates $x_{cyl} = (x_{cyl}, y_{cyl})$ as follows:

$$x_{cyl} = s\,\alpha = s \tan^{-1}\!\frac{x_{pano}}{f}, \tag{10}$$

$$y_{cyl} = s\,\beta = s\,\frac{y_{pano}}{\sqrt{x_{pano}^2 + f^2}}. \tag{11}$$

Here, $s$ is a scaling factor and $f$ corresponds to the focal length (Szeliski et al., 2007).

Figure 6: Block diagram representation of the proposed concept around PanoTherm. The images acquired by the individual sensors (top left) are registered and stitched into a panoramic view (top middle). Similarly, MOT is used to detect humans in the actual imagery (top right). Using the detection results, the registration, and the cylindrical representation, we can represent the position and motion of detected objects with respect to the registered vehicle (bottom).
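A direct transcription of Equations (10) and (11), assuming pixel coordinates that are already centered on the panorama's principal point:

```python
# Panorama pixel -> cylindrical coordinates (and viewing angles alpha, beta).
import numpy as np

def pano_to_cylindrical(x_pano, y_pano, f, s=1.0):
    alpha = np.arctan2(x_pano, f)            # Eq. (10): horizontal angle
    beta = y_pano / np.hypot(x_pano, f)      # Eq. (11): vertical angle
    return s * alpha, s * beta               # x_cyl, y_cyl
```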
We can thereby convert image measurements to rays emanating from the center of $C_c$ and going through individual points $x_{cyl}$, and measure their pairwise angle. The corresponding 3D points of the marker corners all have coordinates $(\pm l_{artk}/2, \pm l_{artk}/2, h_{artk})$. In a general setup, we assume $h_{artk} = 0$, which results in the marker center being the center of the world coordinate system $C_w$.

Figure 5 depicts an illustration of these relationships. Reformulating the P3P problem, we solve the polynomial equation system similar to the one solved in the pinhole case, giving us the transformation between the coordinate systems $C_c$ and $C_w$ in the form of $[R_{pt} \mid t_{pt}]$.
5.1 Real World Position Estimator
For the composition of the panoramic image, we uti-
lize backward warping (see Section 4.2). To project
a point from the panoramic image into the real world,
we apply forward warping.
In a practical scenario, we can assume that the observed environment is mostly flat and can be approximated through a ground plane $P$. Putting the ARTK+ marker upright on the ground during registration, $P$ is trivially given as $z = 0$. First, we can convert $x_{cyl}$ following Equations (10) and (11). Then, we convert it into a ray emanating from the coordinate center $C_c$ and going through $x_{cyl}$. Note that the 3D world coordinates of the center of $C_c$ are given by

$$c_c = -R_{pt}^\top \, t_{pt}. \tag{12}$$

Lastly, we intersect this ray with the plane $P$ to determine the approximate 3D position, i.e., the distance and heading, relative to the world coordinate system $C_w$.
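A sketch of this estimator is given below, assuming the registration $[R_{pt} \mid t_{pt}]$ from above and the viewing angles $\alpha$, $\beta$ of the previous sketch; the axis convention (y pointing forward, z pointing up) is our assumption for illustration:

```python
# Intersect the viewing ray of a panorama point with the ground plane z = 0.
import numpy as np

def ray_ground_intersection(alpha, beta, R_pt, t_pt):
    # Camera center in world coordinates (Equation 12): c_c = -R_pt^T t_pt
    c_c = -R_pt.T @ t_pt

    # Unit ray in the rig frame (y forward, z up; assumed convention),
    # rotated into the world frame.
    d_rig = np.array([np.sin(alpha) * np.cos(beta),
                      np.cos(alpha) * np.cos(beta),
                      np.sin(beta)])
    d_world = R_pt.T @ d_rig

    # Solve c_c + lam * d_world on the plane z = 0.
    if abs(d_world[2]) < 1e-9:
        return None                          # ray parallel to the ground
    lam = -c_c[2] / d_world[2]
    return c_c + lam * d_world if lam > 0 else None
```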
An illustration of this concept is later shown in our experimental setup in Figure 10. The relative heading $\alpha$ determines the relative orientation of an object to $C_w$; respectively, the distance $d$ can be calculated from $\beta_1$. From the assumption that the object in question is oriented upwards, we can additionally use $\beta_2$ to determine the approximate object height.
6 EXPERIMENTS
To evaluate PanoTherm, we employ the sensor sys-
tem on a real car. This includes the registration of our
system with respect to the car itself, the detection and
tracking of pedestrians in proximity to the car in the
panoramic image, and the utilization of the cylindri-
cal geometry to re-project the detected objects with respect to the car into real-world coordinates. Figure 6 displays the workflow of our use case.
Figure 7: Registration with respect to a given environment. As the camera rig is internally calibrated and represents the coordinate center $C_c$, it is sufficient to use a standard ARTK+ marker with a known size on the engine hood at a known approximate height above ground.
6.1 Registration
As we are mainly interested in blind-spot avoidance on a vehicle, we perform an illustrative setup as shown in Figure 7 and leave out exact measurements for now. PanoTherm is mounted on the roof of a vehicle with one of the cameras facing to the front. A flat ARTK+ marker (Wagner and Schmalstieg, 2007) made from steel with a known edge length $l_{artk}$ is mounted on the engine hood pointing upwards. For simplicity, we measure the approximate height above ground, $h_{artk}$. Similarly, we measure the approximate distance of the marker from the front end of the car, $v_{artk}$, with a standard meter. Mounting the marker in the middle of the hood, we can further infer the lateral distance of the marker center as $w/2$, with $w$ denoting the known width of the car.
becomes visible in the image of the front-facing cam-
era, allowing us to detect it in the thermal image
2
. An
example image is shown in Figure 8. By using h
artk
for the z-coordinate of the marker corners in the reg-
istration step described in Section 5, we can place the
world coordinate center C
w
perpendicularly below the
marker center on the ground plane P, such that our as-
sumption of P given by z = 0 holds. We finally arrive
at a registration of PanoTherm with respect to C
w
, be-
ing a coordinate system anchored to the vehicle.
2
Note that we used manual thresholding for simplicity,
however, a targeted implementation of ARTK+ could detect
the marker automatically as well.
Figure 8: Thermal signature of the marker as seen from the
front-facing camera.
6.2 Object Detection and Tracking
We utilize a YOLOv8 model and train it on the
KAIST (Hwang et al., 2015) dataset before applying
fine-tuning on a small custom dataset. As described
previously, the thermal range between single cameras
is not equalized over the whole panorama to improve
the visible range of each image. Therefore, we abstain
from using image-based re-identification models and
leverage motion-based predictions in the multi-object
tracking (MOT) module. Note that our main goal is to prove the proposed concept, not to evaluate different detection and tracking methods in terms of
performance in the thermal domain. An example of
human detection using this setup is shown in the red
frame in Figure 9.
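A minimal sketch of this pipeline using the ultralytics YOLOv8 API with its built-in ByteTrack configuration is shown below; the weights file name and the panorama stream are placeholders, not artifacts we ship:

```python
# Detection and motion-based tracking on stitched thermal panoramas.
from ultralytics import YOLO

model = YOLO("panotherm_kaist.pt")        # placeholder for the fine-tuned weights

for pano in panorama_stream:              # assumed iterable of stitched panoramas
    result = model.track(pano, persist=True, tracker="bytetrack.yaml",
                         classes=[0])     # class 0: person
    if result[0].boxes.id is None:
        continue                          # no confirmed tracks in this frame
    for box, track_id in zip(result[0].boxes.xyxy, result[0].boxes.id):
        x1, y1, x2, y2 = box.tolist()
        # The bottom-center of the box is later re-projected onto the ground.
        foot = ((x1 + x2) / 2.0, y2)
```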
6.3 Reprojection and Height Estimation
The MOT framework leaves us with bounding boxes $B \in \mathbb{R}^6$, each defined by identifier, class, height,
width, and location in the image domain. To project
the lower edge of the bounding box into real-world 3D
coordinates, we utilize the geometrical properties of
the cylindrical projection (see Section 5.1). The posi-
tion of the bounding box in 3D is further processed to
obtain the height of the detected object. For this, we
again intersect the vertical plane of the detection with
the ray resulting from the upper edge of the bounding
box. Since PanoTherm is fully registered, the conversion between the 3D world coordinate system $C_w$ and the cylindrical coordinate system $C_c$ is solved. Hence, we can simply apply $[R_{pt} \mid t_{pt}]$ to convert the coordinates from one frame into another.
Figure 10 showcases the geometrical relation of
these two coordinate frames in our use case. This
leaves us not only with the 3D position in world coor-
dinates but also with the height of the object.
Figure 9: Tracking a human across several frames in the panoramic space. Left, the panorama, the center bottom point, and the raw output of the MOT are shown. Middle, the ray from the center of $C_c$ through the panorama wrapped onto the cylindrical surface is shown. Right, the motion between individual frames with respect to $C_w$, respectively the vehicle, is depicted.

Figure 10: Left, the top-down view to determine the heading of an object relative to the vehicle, i.e., $\alpha$, is depicted. Right, the side view to determine the distance $d$ of an object through the angle $\beta_1$ is shown. If we assume an object to be upright, we can additionally use $\beta_2$ to determine the approximate height of the object as well.

Since the viewing angle is obstructed by the car roof, we apply a manually picked threshold to the
height and width of the bounding box. With this, we
can guess if the feet of the pedestrian correspond with
the lower edge of the detected bounding box. If this
is not the case, we assume an average body height of
1.7 m. Therefore, we are again able to infer the position of the pedestrian by intersecting the ray through the upper edge of the bounding box with the bounding box plane, which is perpendicular to the ground. Note that our workflow is not restricted to the detection and tracking of pedestrians.
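The two geometric cases reduce to elementary trigonometry; below is a sketch assuming a camera height h_c above ground and angles β_1 (to the feet) and β_2 (to the head) measured downward from the horizontal (the sign conventions and the fallback plane are our assumptions):

```python
# Height and distance estimates corresponding to Figure 10.
import math

def object_height(h_c, beta_1, beta_2):
    """Feet visible: angles measured downward from the horizontal."""
    d = h_c / math.tan(beta_1)            # ground distance from the feet angle
    return h_c - d * math.tan(beta_2)     # head height from the head angle

def distance_from_head(h_c, beta_2, body_height=1.7):
    """Feet occluded by the roof: assume an average body height and
    intersect the head ray with the horizontal plane at that height."""
    return (h_c - body_height) / math.tan(beta_2)

# Example: rig 2.0 m above ground, feet 20 deg and head 3 deg below the
# horizontal -> roughly 1.71 m body height at about 5.7 m distance.
print(object_height(2.0, math.radians(20.0), math.radians(3.0)))
print(distance_from_head(2.0, math.radians(3.0)))
```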
6.4 Discussion
During our experiments, we noticed a set of issues that need further investigation. First, the use of approximate measurements certainly harms the accuracy of our registration method. Depending on the actual use case, mounting the marker in a different setup w.r.t. the vehicle and adapting the assumptions about the ground plane P might be advantageous. Second, the accuracy of our setup was only empirically evaluated; measuring the end-to-end accuracy using different technologies, e.g., laser distance meters, is required to create more trustworthy results. Third, the current hardware setup technically cannot deliver results at full frame rate. This is an inherent flaw of certain parts of our implementation and the intermediate hardware used, which can be resolved. Finally, we neither evaluated the performance of the MOT module nor tested PanoTherm under adverse environmental conditions yet. This is subject to a pending in-depth investigation.
7 CONCLUSION
In this work, we introduce PanoTherm, a concept
to build a thermal panoramic camera from low-cost
longwave infrared thermal sensors. We describe an
algorithm for simple calibration of the individual sen-
sors into a camera rig. Similarly, we outline a method
to register the camera to its environment to establish
real-world measurements. Finally, we demonstrate
the plausibility of this concept by detecting and tracking a person in the near environment of the vehicle.
While we demonstrate this as a proof-of-concept,
several open points are subject to investigation in the
future, both in the algorithmic and the technical do-
main. The calibration routines we use still suffer from
inaccuracies related to the simplicity of the method
and the very narrow field of overlap between individ-
ual cameras. Additional improvements in this domain
would automatically enhance both the visual appeal of a stitched panorama and, more importantly, the geometric accuracy of any subsequent calculation. In terms of the technical components we use,
the real outdoor applicability of the current setup will
depend on the creation of a more ruggedized housing
for the sensor, together with proper mounting options
on a given vehicle. Future improvements will focus
on the development of further concepts in the domain
of off-highway machinery, such as excavators, snow
groomers or reach stackers.
ACKNOWLEDGEMENTS
Funded by the European Union under Grant Agree-
ment No. 101092861. Views and opinions expressed
are however those of the author(s) only and do not
necessarily reflect those of the European Union or the
European Commission. Neither the European Union
nor the granting authority can be held responsible for
them.
REFERENCES
Arth, C., Klopschitz, M., Reitmayr, G., and Schmalstieg, D.
(2011). Real-Time Self-Localization From Panoramic
Images on Mobile Devices. In Int. Symposium on
Mixed and Augmented Reality (ISMAR).
Bewley, A., Ge, Z., Ott, L., Ramos, F. T., and Upcroft, B.
(2016). Simple Online and Realtime Tracking. In Int.
Conference on Image Processing (ICIP).
Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Jour-
nal of Software Tools.
Cheng, R., Wang, K., Lin, S., Hu, W., Yang, K., Huang, X.,
Li, H., Sun, D., and Bai, J. (2019). Panoramic Annular
Localizer: Tackling the Variation Challenges of Out-
door Localization Using Panoramic Annular Images
and Active Deep Descriptors. In Intelligent Trans-
portation Systems Conference.
Cowan, L. V., Babington, J., Carles, G., Perciado, M. A.,
Wood, A., and Harvey, A. R. (2019). 360° snapshot
imaging with a convex array of long-wave infrared
cameras. In Proc. of Propagation Through and Char-
acterization of Atmospheric and Oceanic Phenomena.
Dai, X., Yuan, X., and Wei, X. (2021). Tirnet: Object De-
tection in Thermal Infrared Images for Autonomous
Driving. Applied Intelligence, 51:1244–1261.
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cre-
mers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taixé, L. (2020). MOT20: A Benchmark for Multi-Object Tracking in Crowded Scenes. arXiv CoRR, abs/2003.09003.
Ellmauthaler, A., da Silva, E. A., Pagliari, C. L., Gois, J. N.,
and Neves, S. R. (2013). A Novel Iterative Calibration
Approach for Thermal Infrared Cameras. In Int. Con-
ference on Image Processing (ICIP).
Fehlman, W. L. and Hinders, M. K. (2009). Mobile Robot
Navigation With Intelligent Infrared Image Interpre-
tation. Springer.
Fischler, M. A. and Bolles, R. C. (1981). Random Sample
Consensus: A Paradigm for Model Fitting With Ap-
plications to Image Analysis and Automated Cartog-
raphy. Communications of the ACM, 24(6):381–395.
Gade, R. and Moeslund, T. B. (2014). Thermal Cameras
and Applications: A Survey. Machine vision and Ap-
plications, 25:245–262.
Halachmi, I., Guarino, M., Bewley, J., and Pastell, M.
(2019). Smart Animal Agriculture: Application of
Real-Time Sensors to Improve Animal Well-Being
and Production. Annual Review of Animal Bio-
sciences, 7:403–425.
Hartley, R. and Zisserman, A. (2003). Multiple View Geom-
etry in Computer Vision. Cambridge University Press.
Heikkila, J. and Silvén, O. (1997). A Four-Step Camera Calibration Procedure With Implicit Image Correction. In Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1106–1112. IEEE.
Herrmann, T., Migniot, C., and Aubreton, O. (2019). Ther-
mal Camera Calibration With Cooled Down Chess-
board. In In Proc. of the Quantitative InfraRed Ther-
mography Conference.
Ho, T. and Budagavi, M. (2017). Dual-Fisheye Lens Stitch-
ing for 360-Degree Imaging. In Int. Conf. on Acous-
tics, Speech, and Signal Processing.
Hoegner, L. and Stilla, U. (2009). Thermal Leakage De-
tection on Building Facades Using Infrared Textures
Generated by Mobile Mapping. In Proc. of the Joint
Urban Remote Sensing Event. IEEE.
Hwang, S., Park, J., Kim, N., Choi, Y., and Kweon, I. S.
(2015). Multispectral Pedestrian Detection: Bench-
mark Dataset and Baselines. In Conf. on Computer
Vision and Pattern Recognition (CVPR).
Jewett, J. W. and Serway, R. (2008). Physics for Scientists
and Engineers with Modern Physics. Vectors, 7th edi-
tion.
Kang, S. B., Szeliski, R., and Uyttendaele, M. (2004).
Seamless Stitching Using Multi-Perspective Plane
Sweep. Technical Report MSR-TR-2004-48, Mi-
crosoft Research.
Krišto, M., Ivasic-Kos, M., and Pobar, M. (2020). Thermal Object Detection in Difficult Weather Conditions Using YOLO. IEEE Access, 8:125459–125476.
Lagüela, S., González-Jorge, H., Armesto, J., and Arias, P. (2011). Calibration and Verification of Thermographic Cameras for Geometric Measurements. Infrared Physics & Technology, 54(2):92–99.
Lin, H.-S., Chang, C.-C., Chang, H.-Y., Chuang, Y.-Y.,
Lin, T.-L., and Ouhyoung, M. (2018). A Low-Cost
Portable Polycamera for Stereoscopic 360 Imaging.
IEEE Transactions on Circuits and Systems for Video
Technology, 29(4):915–929.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016). SSD: Single Shot
Multibox Detector. In Europ. Conf. on Computer Vi-
sion (ECCV).
Lowe, D. G. (1999). Object Recognition From Local Scale-
Invariant Features. In Int. Conf. on Computer Vision
(ICCV).
Luis Technology (2023). Unfälle im toten Winkel: Studie zu Unfalltypen. https://www.abbiegeassistent.de/studie-zu-unfalltypen/. (Accessed Oct. 20, 2023).
Messina, G. and Modica, G. (2020). Applications of Uav
Thermal Imagery in Precision Agriculture: State of
the Art and Future Research Outlook. Remote Sens-
ing, 12(9):1491.
Milan, A., Leal-Taixé, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A Benchmark for Multi-Object Tracking. arXiv CoRR, abs/1603.00831.
Milgram, D. L. (1975). Computer Methods for Creat-
ing Photomosaics. IEEE Transactions on Computers,
100(11):1113–1119.
Moré, J. J. (1977). The Levenberg-Marquardt algorithm: implementation and theory. In Proc. of the Conference on Numerical Analysis, pages 105–116.
Ng, Y.-M. H. and Du, R. (2005). Acquisition of 3D Surface
Temperature Distribution of a Car Body. In IEEE Int.
Conference on Information Acquisition, pages 16–20.
Powell, I. (1994). Panoramic lens. Applied Optics,
33(31):7356–7361.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You Only Look Once: Unified, Real-Time
Object Detection. In Conf. on Computer Vision and
Pattern Recognition (CVPR).
Ring, E. and Ammer, K. (2012). Infrared Thermal Imaging
in Medicine. Physiolog. Measurement, 33(3):33–46.
Rosique, F., Navarro, P. J., Fernández, C., and Padilla, A. (2019). A Systematic Review of Perception System and Simulators for Autonomous Vehicles Research. Sensors, 19(3):648–677.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G.
(2011). ORB: An efficient alternative to SIFT or
SURF. In Int. Conf. on Computer Vision (ICCV).
Saponaro, P., Sorensen, S., Rhein, S., and Kambhamettu,
C. (2015). Improving Calibration of Thermal Stereo
Cameras Using Heated Calibration Board. In Int.
Conference on Image Processing (ICIP), pages 4718–
4722. IEEE.
Sousa, E., Vardasca, R., Teixeira, S., Seixas, A., Mendes,
J., and Costa-Ferreira, A. (2017). A Review on the
Application of Medical Infrared Thermal Imaging in
Hands. Infrared Physics & Technology, 85(4):315–
323.
Spitzer, P. (2023). Sehen und Gesehen werden: Unfälle im toten Winkel und aufgrund von Sichtbehinderungen, Fokusreport 2023. https://grosse-schuetzen-kleine.at/publikationen/sehen-und-gesehen-werden-unfaelle-im-toten-winkel-und-aufgrund-von-sichtbehinderungen-fokusreport-2023. (Accessed Oct. 20, 2023).
Systems, F. C. V. (2011). Uncooled Detectors for Thermal Imaging Cameras. http://www.flirmedia.com/MMC/CVS/Appl_Stories/AS_0015_EN.pdf.
Szeliski, R. et al. (2007). Image alignment and stitching:
A tutorial. Foundations and Trends® in Computer
Graphics and Vision, 2(1):1–104.
Thomas, A., Leboucher, V., Cotinat, A., Finet, P.,
and Gilbert, M. (2019). UAV Localization Using
Panoramic Thermal Cameras. In Conf. on Computer
Vision Systems (ICVS), pages 754–767. Springer.
Usamentiaga, R., Garcia, D., Ibarra-Castanedo, C., and
Maldague, X. (2017). Highly Accurate Geometric
Calibration for Infrared Cameras Using Inexpensive
Calibration Targets. Measurement, 112:105–116.
Uyttendaele, M., Criminisi, A., Kang, S., Winder, S.,
Szeliski, R., and Hartley, R. (2004). Image-Based
Interactive Exploration of Real-World Environments.
Computer Graphics and Applications, 24(3):52–63.
Vadivambal, R. and Jayas, D. S. (2011). Applications of
Thermal Imaging in Agriculture and Food Industry—
A Review. Food and Bioprocess Technology, 4:186–
199.
Venegas, P., Ivorra, E., Ortega, M., and Sáez de Ocáriz, I. (2022). Towards the Automation of Infrared Thermography Inspections for Industrial Maintenance Applications. Sensors, 22(2):613.
Vidas, S., Lakemond, R., Denman, S., Fookes, C., Sridha-
ran, S., and Wark, T. (2012). A Mask-Based Approach
for the Geometric Calibration of Thermal-Infrared
Cameras. IEEE Trans. on Instrumentation and Mea-
surement, 61(6):1625–1635.
Wagner, D., Mulloni, A., Langlotz, T., and Schmalstieg, D.
(2010). Real-Time Panoramic Mapping and Track-
ing on Mobile Phones. In Lok, B., Klinker, G., and
Nakatsu, R., editors, IEEE Conf. on Virtual Reality
and 3D User Interfaces.
Wagner, D. and Schmalstieg, D. (2007). ARToolKitPlus
for Pose Tracking on Mobile Devices. In Proc. 12th
Computer Vision Winter Workshop (CVWW), 2007.
Yang, R., Yang, W., Chen, Y., and Wu, X. (2011). Geomet-
ric Calibration of IR Camera Using Trinocular Vision.
Journal of Lightwave Technology, 29(24):3797–3803.
Yanmaz, L. E., Okumus, Z., and Dogan, E. (2007). Instru-
mentation of Thermography and Its Applications in
Horses. Journal of Animal and Veterinary Advances,
6(7):858–862.
Zeng, X., Lan, J., and Gao, X. (2020). The Design of a
Catadioptric Omnidirectional Thermal Imaging Sys-
tem Based on the Combination of Genetic Algorithm
and Gradient Descent. Optics & Laser Technology,
122:1058–1061.
Zhang, J., Ma, C., Yang, K., Roitberg, A., Peng, K., and
Stiefelhagen, R. (2021). Transfer Beyond the Field of
View: Dense Panoramic Semantic Segmentation via
Unsupervised Domain Adaptation. Intelligent Trans-
portation Systems Conference, 23(7):9478–9491.
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan,
Z., Luo, P., Liu, W., and Wang, X. (2022). Byte-
Track: Multi-Object Tracking by Associating Every
Detection Box. In Europ. Conf. on Computer Vision
(ECCV).
Zhang, Z. (2000). A Flexible New Technique for Camera
Calibration. IEEE Trans. on Pattern Analysis and Ma-
chine Intelligence, 22(11):1330–1334.
Zhang, Z., Rebecq, H., Forster, C., and Scaramuzza, D.
(2016). Benefit of Large Field-Of-View Cameras for
Visual Odometry. In Int. Conf. on Robotics and Au-
tomation (ICRA).
Zioulis, N., Karakottas, A., Zarpalas, D., and Daras, P.
(2018). Omnidepth: Dense Depth Estimation for In-
doors Spherical Panoramas. In Europ. Conf. on Com-
puter Vision (ECCV).