Video based Swimming Analysis for Fast Feedback

Paavo Nevalainen, Antti Kauhanen, Csaba Raduly-Baka, Mikko-Jussi Laakso and Jukka Heikkonen

Department of Information Technology, University of Turku, FI-20014 Turku, Finland

Sport Academy of Turku Region, Kaarinankatu 3, 20500 Turku, Finland

Keywords:

Athletics, Swimming, Motion Tracking, Camera Calibration, Signal Smoothing, Movement Cycle Registration.

Abstract:

This paper proposes a low-budget swimming analysis system for athletic use, based on digital cameras. Recreational swimming can continue during the analysis phase, and no alterations of the pool environment are needed. The system is of minimum complexity, has a real-time feedback mode, uses only underwater cameras, is flexible and can be installed in many types of public swimming pools. Inaccurate camera placement poses no problem. Both commercially available and tailor-made software were used for video signal collection, for computational analysis and for providing fast visual feedback that helps swimmers improve their athletic performance. The small number of cameras with a narrow overlapping view makes conventional stereo calibration inaccurate, so a direct planar calibration method is proposed in this paper instead. The calibration method is presented and its accuracy is evaluated. Quick feedback is a key issue in improving athletic performance. We have developed two indicators which are easy to visualize. The first is the swimming speed, measured from the video signal by tracking a marker band at the waist of the swimmer; the second is a rudimentary swimming cycle analysis focusing on the regularity of the cycle.

1 INTRODUCTION

This paper describes the swimming analysis system being developed at the Impivaara public swimming center in Turku, Finland. Starting a new site for swimming analysis usually requires considerable resources, and our economical approach, with a budget of 5-7 k€ for hardware and software licenses, should be of interest to any swimming coach considering basic computerized real-time feedback at a local site.

Budget reasons forced us to use 3 cameras only

and video coverage of 18 m. Another major constraint

was to get the system up and running with no special

initial procedure and without disturbing recreational

swimmers. The system can be expanded in the future by a fourth camera at the grey dot depicted in Fig. 1. This setup will cover the whole 25 m pool length.

We use the video image series of the light-

reﬂective marker on the waist of the swimmer to

record the movement of the swimmer. The marker

moves along the tracking plane, which resides 200

mm aside towards the cameras from the centerline of

the swimming lane. The distance has been chosen so

that it approximates the dimensions of the pelvis of an

average-sized adult male and female. The real move-

ment of the marker is naturally a non-planar one, but

the planar approximation is a useful ﬁrst step to sim-

plify the swimmer movement analysis. The tracking

plane of swimming lane 7 is depicted in Fig. 1. The

tracking plane has c. 1 m depth at the shallow end and

c. 2 m depth in the deep end of the pool. All the cali-

bration measurements were constrained on this plane,

and the calibration result is a geometric mapping from

the image pixels to global coordinates of the tracking

plane. Two lanes with numbers 7 and 8 were cal-

ibrated. Lane 8 is used occasionally for swimming

analysis purposes, but camera views do not cover the

whole length of the tracking plane as seen in Fig. 4.

The camera positions are constrained to win-

dowsills at the sides of the pool at the depth of 560

mm. The image mapping was constructed directly in

relation to the tracking plane, and this method does

not require usual camera model, camera locations and

orientations. The stereo calibration method proved in-

ferior because of very limited overlap between cones

of visibility, see Fig. 4.

The design emphasizes fast feedback. Thus there are features which are designed

to operate in real-time during the session of athletic

Nevalainen, P., Kauhanen, A., Raduly-Baka, C., Laakso, M-J. and Heikkonen, J.

Video based Swimming Analysis for Fast Feedback.

DOI: 10.5220/0005753704570466

In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2016), pages 457-466

ISBN: 978-989-758-173-1


Figure 1: The general layout of the site seen from above.

The tracking plane of lane 7 is emphasized.

performance. There are also some features which are

based on the post-session phase. The forms of the

feedback implemented are:

1. Real-time marker movement tracking embedded

on the video.

2. Different performances presented side-by-side for

visual comparison and evaluation. It can be real-

time and post-session.

3. Stroke variation visualization, which is designed

so that it can be monitored by the athlete in the

pool. It can be real-time or post-session.

4. Geometric transformation of the video stream

from pixels to global coordinates. It can be real-

time or post-session.

The geometric mapping algorithm can project the

raw video to a real-time 25 fps mono-color visualiza-

tion on the tracking plane. The full quality color video

has processing speed of c. 5 fps and cannot be per-

formed in real-time. The geometric mapping will be

a crucial part for a seamless swimmer-focused view

after the swimmer detection has been implemented.

The marker tracking routine introduces both

stochastic and algorithmic noise to the signal. After

the pixel signal is transformed to global coordinates,

the signal needs to be smoothed to eliminate the noise.

The Kalman smoothing uses a basic dynamical model

of a swimmer body. Smoothing requires the record of

the whole performance as input and thus is a post-

session step.

So far, the coaching routine with verbal and video feedback has been established; the procedure is already used on a weekly basis and requires no extra technical personnel on site.

The rest of the paper is organized as follows.

Sec. 2 is a short presentation about the current re-

search. Sec. 3 documents the architecture of the sys-

tem, tracking and swimming cycle registration. Sec. 4

presents the used in-plane calibration method in de-

tail, since this aspect is heavily dictated by the budget

limitations yet opens possibilities for future research

as well. Sec. 5 is about the real-time tracking visual-

ization.

The swimming cycle registration and comparison presented in Sec. 6 has been an important early facility for the coaching. The post-session analysis phase of Sec. 7 is one adaptation to the budget limitations.

Sec. 8 summarizes the design choices made to achieve

the real-time system response. Sec. 9 has conclu-

sions and discussion about the possible future devel-

opments.

2 LITERATURE REVIEW

There are several approaches to swimming analysis.

The oldest one uses a wire. (Jean-Claude, 2003) reports on measuring the force in the wire while some object is dragged behind; another method measures the swimmer speed directly using the wire. The mechanical method is used especially to verify the video installments.

Video analysis is the dominant mode of perfor-

mance analysis nowadays, see e.g. a review of the

ﬁeld in (Kirmizibayrak et al., 2011). A typical ap-

proach is:

• to produce the continuous video stream from mul-

tiple cameras

• and trace anchor points of the body (marked or

nonmarked) and then

• combine the acquired information to a biome-

chanical or 3D visualization model.

There are several commercial tools available,

many of them summarized in (Kirmizibayrak et al.,

2011). Typical examples are Dartﬁsh (Dartﬁsh, 2015)

and Sports Motion (Sportsmotion, 2015).

Wearable accelerometers are relatively new tools. They are becoming smaller and lighter, and their operation time is increasing due to lower power requirements and increased battery capacity. (Dadashi et al., 2013) shows an arrangement with only one accelerometer to record the swimming performance over the full length of the pool. The data is usually transmitted by radio link in bursts, as in (James et al., 2011), or at the end of the performance, as in (Dadashi et al., 2013).

A typical large scale system design can be found

from (Mullane et al., 2010). They provide an ex-

cellent analysis of what feedback should be provided

real-time and what at post-session phase.

Swedish Center for Aquatic Studies has AIM

(Athletes in Motion) system which can combine

views from submerged and above-water cameras,


see (Haner et al., 2015). The calibration process resembles our approach, although they use striped poles while we use a chessboard pattern. AIM has been developed through the efforts of Chalmers and Lund Universities.

Chalmers University has a multiple accelerometer

arrangement, which is coupled with a video analysis

and a biomechanical model. Accelerometers can be

placed by suction cups to various areas of the body.

One study shows how a relatively low frequency still

provides adequate biomechanical modeling (Siirtola

et al., 2011).

Another video technique is the virtual camera technique, where a moving viewpoint is synthesized between two adjacent cameras. It is possible to interpolate the view between stationary spots, as in (Makoto et al., 2002).

The head is a popular place for attaching a sensor device to the athlete. One can wear a colored cap, or swimming goggles with an accelerometer, see (Pansiot et al., 2010).

A relatively new approach is video analysis without markers (Ceseracciu, 2011). From the athlete's point of view it is much less intrusive, and it enables automation of the analysis process.

The trend in research seems to be towards 3D

visualization and increasing usage of biomechanical

models. Actual analysis is quite developed and re-

maining goals are at quick performance feedback and

well visualized and conceptually simple performance

measures.

As a summary, existing systems are well-

developed and serve the coaching activities well. Of-

ten the implementation is rather involved requiring

technical assistance, set-up times and high initial and

running costs. Our aim was to produce a cheap and

simple non-intrusive alternative with stable basis for

further improvement.

3 SYSTEM DESCRIPTION

The system consists of:

• one 2-core 3.2 GHz 64-bit computer with 2 TB of disc space

• 3 permanently placed 50 fps cameras at the side

wall of the 25 m pool. The maximum image size

is 750 × 2044. The camera placement is dictated

by the construction of the pool.

• one movable extra camera for above-water usage

and one movable underwater camera. Usage of

the extra cameras is just for visual observation and

verbal feedback only.

• movement marker and band at the hip of the

swimmer.

The cameras and computer record and store over 50 fps high resolution digital video in uncompressed format. The image size is 750 × 2044 pixels. An individual pixel of the geometrically transformed image corresponds to 4.0 × 4.0 mm² and 2.3 × 2.3 mm² at lanes 7 and 8, respectively.

The uncompressed data from three cameras

amounts to about 1GB for a 10 second clip. All cam-

eras are synchronized so that they capture images at

the same time. The time stamps are stored in the video

ﬁles and they can be used in determining how to stitch

the tracking results.

The marker tracking algorithm utilizes the OpenCV package (Bradski, 2000). Camera calibration was done with self-developed software.

The process is divided into real-time and post-session

phases. Figure 2 illustrates the various steps of the

process. The recorded video is stored in a raw uncom-

pressed ﬁle format speciﬁc to the camera manufac-

turer and is later accessed by the post-session phase.

Figure 2: The processes and data ﬂow. Post-session steps

are indicated by the dashed outline.

4 CAMERA CALIBRATION

Camera calibration is a preliminary measurement pro-

cess delivering either the camera model (mapping

from pixels to normal vectors of the pinhole camera

idealization) or the direct geometric mapping from

pixels to global positions.

Three calibration methods, stereocamera

(Bouguet, 2008), mono-camera (Bouguet, 2008)

and our own direct planar calibration were tested.

The stereo-calibration is an industry standard

method since it is able to produce depth information

(3D) and is not limited to the tracking plane G of

Fig. 1. It calibrates the full camera view just like the mono-camera method. It also provides an early quality check in the form of relative camera positions depicted in Fig. 3. The position error was c. 85 mm even after the best possible calibration measurements described in (Bouguet, 2008).

Figure 3: The camera positions from stereo-calibration.

The reason for the low accuracy is the difficult geometry of the camera positions dictated by the location of

the windowsills of the pool, see Fig. 1. The amount of

overlap of the camera views is only c. 22%, see Fig. 4

with refraction included, whereas the overlap ratio in

a usual stereo calibration analysis is above 50%.

Figure 4: The area of visibility per each camera on the

tracking plane. Colors (red/green/blue) correspond to cam-

eras (1/2/3). Only c. 22% of the view is overlapping at lane

7. The lane 8 is closer to cameras, and there is no overlap-

ping anymore.

The small overlap also rules out the homography

approach described in (Chum et al., 2005) applied to

all the sample points at the whole tracking plane at

once. Otherwise that method would have been excel-

lent, since it is able to use the expected location errors.

The mono-camera approach is very close to

stereo-calibration, except the location and orientation

of each camera is a separate subject of the match-

ing process, when a ﬁt is made to the tracking plane

data. The mono-camera method is also close to the

direct planar calibration presented in Sec. 4.1. The

main difference is that the direct calibration requires

no camera model (not even the camera location) and

that the mapping from pixels to global locations can

be arbitrarily chosen. The mono-camera calibration

produces better mapping quality at the image borders

than the direct planar calibration. The difference is

aesthetic only, since the accurate zone of the direct

calibration can be made large enough to accommo-

date all the swimmer motions. Also the mono-camera

approach has been omitted from this presentation.

The stereo and mono calibrations were done with the Matlab Camera Calibration Toolbox, see (Bouguet, 2008). The theory of the toolbox is given in (Zhang, 1999) and (Heikkilä and Silven, 1997).

The most accurate method was the direct planar

calibration proposed in Sec. 4.1. This method can be categorized as an ad hoc answer to two constraints: a sparsely placed camera array and the potential for real-time video transformation. The nearest reference is (Luo et al., 2006), which uses a camera model and

requires the co-planarity of the camera image plane

and the tracking plane. Our method requires no cam-

era model. The direct planar method is presented in

the following.

4.1 Direct Plane Calibration

The geometric calibration was done for lanes 7 and

8 of ten available lanes. The calibration data for the

direct method was gathered by ﬂoating a calibration

chessboard along the surface at the tracking plane and

recording its position at each picture, see Fig. 5. The

chessboard had buoys at the top and weight at the bot-

tom. The global position x

of the board was mea-

sured within 10 mm accuracy std.

Figure 5: Direct calibration on the tracking plane. Above: an individual x position of the calibration board at camera 2 view. Below: the cumulated observation set U of camera 1 consisting of corner point pixels p and global positions $g \in \mathbb{R}^3$ from all images of lane 7 at camera 1. Only the x component of the global position g is depicted.


To ease the presentation, some definitions are needed. An image $I = (P, J)$ of size $n \times m$ is a pair of a set of pixels $p \in P = [1, n] \times [1, m] \subset \mathbb{Z}^2$ and their intensities $J = \{ j(p) \mid p \in P \}$, where an individual pixel $p$ has intensity $j(p)$. Images come in three varieties: source images $I_s = (P_s, J_s)$, geometrically transformed target images $I_t = (P_t, J_t)$, and calibration images.

The tracking plane $G = \{ g \in \mathbb{R}^3 \mid (g - g_0) \cdot n_0 = 0 \}$ is defined by one incident point $g_0 = (0, 0, z_0)$, where $z_0 = 5050$ mm in the case of lane 7. The plane lies 200 mm aside of the center of the swimming lane. The normal vector $n_0$ is a unit vector aligned with the global $z$ axis. The marker is assumed to move along this tracking plane. Fig. 1 depicts the tracking plane $G$ and the global coordinates $x, y, z$.

A chessboard corner pixel and its corresponding global position form a measurement pair $(p, g) \in U$. The calibration data set $U$ is cumulated over all calibration images. One measurement image is depicted in the upper part of Fig. 5. The lower part shows the measurement set $U$ (only the x component of $g$ shown). The pixel samples of $U$ cover only a part of the image pixels $P$, whereas the end result of the direct plane calibration maps all pixels of the source image onto the tracking plane, $P_s \to G$. In that sense this is an interpolation problem.

In the following, only the mapping of the x com-

ponent is detailed. The y component has a similar

treatment and is omitted for brevity. There would

be some advantage to use a coupled mapping p →

(x,y) e.g. a mono-camera model for interpolation but

even this rudimentary approach with two independent

mappings produced encouraging results.

A piecewise bilinear smoothing function

$$x = f(p, \alpha_x^*) \qquad (1)$$

with shape parameters $\alpha_x^* \in \mathbb{R}^d$ and automatic tiling heuristics is used to set a map $p \to x$ from pixels $p$ to the global coordinate $x$. The shape parameter dimension $d \in \mathbb{N}$ varies case by case because of the implementation (D'Errico, 2006) but is in this application $d \approx 80$. The regularization parameter $\lambda_x \in \mathbb{R}^+$ controls the smoothness. The definition below uses functionals $A$ for the error penalty and $B$ for the non-smoothness penalty:

$$\begin{aligned}
\alpha_x^* &= \arg\min_{\alpha} \; A(f(\cdot,\alpha)) + \lambda_x B(f(\cdot,\alpha)) \\
A(f(\cdot,\alpha)) &= \sum_{(p,g) \in U} \big( f(p,\alpha) - x \big)^2 \qquad (2) \\
B(f(\cdot,\alpha)) &= \sum_{(p,g) \in U} \Big( \operatorname{mean}_{q \in N_p} f(q,\alpha) - f(p,\alpha) \Big)^2
\end{aligned}$$

In the above, $x$ is the x component of $g$ and $N_p$ is the set of neighboring pixels of $p$ at pixel radius $r = 2$. The function $f(\cdot,\cdot)$ is implemented as Matlab gridfit.m with the 'bilinear' and 'laplacian' options, see (D'Errico, 2006). Values $\lambda_x = 120$, $\lambda_y = 180$ were chosen to keep the non-smoothness measure $B(\cdot)/A(\cdot)$ tolerable.
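The regularized fit of Eq. 2 can be illustrated with a small least-squares sketch. This is not gridfit.m: it assumes a regular grid of nodes, bilinear interpolation inside each grid cell, and a non-smoothness penalty evaluated at interior grid nodes rather than at the data points. The names `bilinear_row`, `fit_grid_surface` and `evaluate` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def bilinear_row(u, v, xs, ys):
    """Node indices and bilinear weights of point (u, v) on the grid xs x ys."""
    i = int(np.clip(np.searchsorted(xs, u) - 1, 0, len(xs) - 2))
    j = int(np.clip(np.searchsorted(ys, v) - 1, 0, len(ys) - 2))
    a = (u - xs[i]) / (xs[i + 1] - xs[i])
    b = (v - ys[j]) / (ys[j + 1] - ys[j])
    ny = len(ys)
    idx = [i * ny + j, (i + 1) * ny + j, i * ny + j + 1, (i + 1) * ny + j + 1]
    w = [(1 - a) * (1 - b), a * (1 - b), (1 - a) * b, a * b]
    return idx, w

def fit_grid_surface(pixels, values, xs, ys, lam):
    """Minimize sum of squared errors + lam * squared Laplacian-like penalty."""
    nx, ny = len(xs), len(ys)
    n = nx * ny
    A = np.zeros((len(pixels), n))          # error term: one row per sample
    for r, (u, v) in enumerate(pixels):
        idx, w = bilinear_row(u, v, xs, ys)
        A[r, idx] = w
    rows = []                               # penalty: mean of neighbors - node
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            row = np.zeros(n)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                row[(i + di) * ny + (j + dj)] = 0.25
            row[i * ny + j] = -1.0
            rows.append(row)
    B = np.array(rows).reshape(-1, n)
    lhs = np.vstack([A, np.sqrt(lam) * B])
    rhs = np.concatenate([np.asarray(values, dtype=float), np.zeros(len(B))])
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha.reshape(nx, ny)

def evaluate(alpha, u, v, xs, ys):
    """Evaluate the fitted surface f(p, alpha) at pixel p = (u, v)."""
    idx, w = bilinear_row(u, v, xs, ys)
    return float(np.dot(alpha.ravel()[idx], w))
```

One such surface would be fitted for the x component and another for the y component, each with its own regularization weight, mirroring the two independent mappings described above.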

4.2 Precomputed Mapping

Let us combine Eqs. 2 and 1 for further treatment. The source image pixels $p_s$ are mapped to the tracking plane $G$ by:

$$F_s(p_s) = \big( f(p_s, \alpha_x^*),\; f(p_s, \alpha_y^*),\; z_0 \big) \in G, \quad p_s \in P_s \qquad (3)$$

The image of $I_s$ on $G$ is now $F_s(P_s)$. The target image is mapped to the tracking plane $G$ by:

$$F_t(p_t) = g_{t0} + \gamma \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{bmatrix} p_t \in G, \quad p_t \in P_t \qquad (4)$$

The target image pixel size is $\gamma = 4.0$ mm for lane 7 and $\gamma = 2.3$ mm for lane 8. The image location $g_{t0}$ is specific for each camera view on each swimming lane. Pixel $p = (i, j)$ has row and column indices as depicted in Fig. 6. It is now possible to interpolate the intensity value of $p_t$ using the Shepard interpolation of Eq. 6 at the tracking plane $G$.

First, some definitions. Let $|P_t|$ be the number of pixels in the image $I_t$ and $M \in \mathbb{N}^{|P_t| \times 4}$ be a reference pixel matrix. One row $M_{p_t}$, specified in Eq. 5, holds 4 reference pixels from $P_s$ for a pixel $p_t \in P_t$. The nearest neighbor operator $NN_k(x, X)$ selects the $k$ nearest neighbors of $x$ from a set $X$. Sets are treated as vectors whenever there is a unique enumeration of the set. The location of $p_t$ is $g_t$. $N$ is the set of the 4 nearest neighbors of $g_t$ among the source image locations at $G$.

$$\begin{aligned}
g_t &= F_t(p_t) \\
N &= NN_k(g_t, F_s(P_s)) \subset G, \quad k = 4 \\
M_{p_t, i} &= F_s^{-1}(N_i) \in P_s, \quad i = 1 \ldots 4 \qquad (5) \\
W_{p_t, i} &= \frac{s(\| g_t - N_i \|)}{\sum_{l=1}^{4} s(\| g_t - N_l \|)}, \quad i = 1 \ldots 4 \qquad (6) \\
s(r) &= 1 / \max(r,\; 0.05\ \text{mm})
\end{aligned}$$

where $W \in \mathbb{R}^{|P_t| \times 4}$ is a radial interpolation weight matrix with the same indexing as matrix $M$. Function $s(r)$ is the radial weight used. The normalization in Eq. 6 is by the $L_1$ norm, $w / \sum_i |w_i|$ for a general weight row $w$.

Considering pixel intensities $J_s$ and $J_t$ as vectors indexed by pixels, the transformation becomes:

$$J_t(p_t) := \sum_{i=1}^{4} W_{p_t, i}\, J_s(M_{p_t, i}) \qquad (7)$$

By selecting $k = 1$ one gets the real-time transformation:

$$J_t(p_t) := J_s(M_{p_t, 1}) \qquad (8)$$


Eq. 8 corresponds to the Nearest Neighbor refer-

encing which can be computed at 25 fps (measured

with Matlab). The case k = 4 of Eq. 7 can be com-

puted at 5 fps and thus is not usable in real-time. The

quality of the k = 1 case is adequate for a video stream, see Fig. 6. If quality on par with the original is required, one can use Eq. 7. The balance between speed and quality can be tuned further by choosing k = 2 or k = 3.
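The precomputed reference and weight tables can be sketched as follows. This is a minimal illustration under stated assumptions: pixel locations on the tracking plane are given as (x, y) pairs in mm, the neighbor search is brute force (a spatial index would be needed for full-size images), and the function names are hypothetical.

```python
import math

def radial_weight(r_mm):
    """Radial weight s(r) = 1 / max(r, 0.05 mm)."""
    return 1.0 / max(r_mm, 0.05)

def precompute_mapping(source_locations, target_locations, k=4):
    """Build the reference table M and the L1-normalized weight table W.

    source_locations[s] is the plane location (x, y) of source pixel s,
    target_locations[t] is the plane location (x, y) of target pixel t.
    """
    M, W = [], []
    for gx, gy in target_locations:
        # k nearest source locations, sorted by distance (ties by index)
        nearest = sorted(
            (math.hypot(gx - sx, gy - sy), s)
            for s, (sx, sy) in enumerate(source_locations)
        )[:k]
        weights = [radial_weight(r) for r, _ in nearest]
        total = sum(weights)
        M.append([s for _, s in nearest])
        W.append([w / total for w in weights])
    return M, W

def transform(source_intensity, M, W):
    """Apply the tables: weighted sum of the referenced source intensities."""
    return [
        sum(w * source_intensity[s] for s, w in zip(m_row, w_row))
        for m_row, w_row in zip(M, W)
    ]
```

The tables depend only on the calibration, so they are built once; every frame then costs one table lookup and at most k multiplications per target pixel. With k = 1 the weight row is simply [1.0] and the transform reduces to nearest-neighbor referencing.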

The geometric image mapping is efﬁcient and

simple, see Eqs. 8 and 7. The formulation used also

makes it possible to combine the three separate video

signals accurately to one single video. This feature

will be implemented when an automated swimmer

targeting is added to the system.

Figure 6: Quality of the fast mapping, a detail at the opposite wall. Above: the source image $I_s$ with pixel coordinates i and j. Below: the target image $I_t$ with global coordinates x and y.

4.3 Error Analysis

The measurement points U in Fig. 5 are in approx-

imate horizontal rows. There is c. 150 mm vertical

gap between rows and c. 50 mm average horizontal

distance between points. This requires the interpolant

to have rather high penalty for non-smoothness.

The pixel detection was done with the Matlab detectCheckerBoard.m function, the theory of which is contained in (Zhang, 2000). The pixel detection error is $\Delta p \approx (1, 1)$. The mechanical placement accuracy of the measured points $(p, g) \in U$ is $\Delta g \approx (10, 10, 10 + 0.01 z)^T$ mm as an approximate std. The error is unbiased and the final accuracy of $F_s(p)$ is much better.

A pixel-wise geometric mapping error measure $e(p)$ is formulated by Eq. 9 and depicted in Fig. 7:

$$e(p) = \| g - F_s(p) \|, \quad (p, g) \in U \qquad (9)$$

Since the sample set U is of rather good quality and since the function $F_s$ is rather smooth, the error stays almost constant even if the tuning of the shape parameters $\alpha_x, \alpha_y$ in Eq. 2 is subjected to cross-validation over subsets of U. The error is largest at occasional points at the border and grows rapidly when extrapolating. The border areas are seldom occupied by a swimmer, though, and the problem is more of an aesthetic nature. The border error can be eliminated in the future by applying a different interpolant instead of the one in Eq. 3.

[Figure 7 plots: distributions of the geometric error ||Δg|| (mm) against frequency for lanes 7-8 and cameras 1-3, with std(Δg) = 1.68 mm (= 0.6 pixels); map of the geometric error for lane 8, camera 3 over x (mm) and y (mm).]

Figure 7: The geometric mapping error $e(p)$, $p \in P_s$, defined by Eq. 9. Large errors happen occasionally at the borders of the sampled area U.

5 REAL-TIME TRACKING

The swimmer tracking method is based on marker

tracking using the blob tracking facilities of OpenCV

(Bradski, 2000). The marker is placed on a colored

ﬂexible band worn by the athlete in order to improve

accuracy and reduce noise. The band is selected and

installed so that it will not hinder the performance of

the swimmer. The yellow color of the band is chosen

to increase its visibility and identiﬁability in the envi-

ronment. The band can be occluded by the swimmer’s

hand, or by bubbles in the water.

The color of the band is selected based on the color hue distribution of the environment. Tests on the site showed that the environment colors (water, swimmer, light, walls, etc.) lie within a hue interval that ends at 270°, leaving the rest of the hue circle open. We selected a yellow color band. The choice leaves room for one or two extra markers, if needed in some future analysis.
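The idea of separating the marker from the environment by hue can be illustrated with a small stand-in for the blob tracker. The sketch below is not the OpenCV routine used on site. It assumes an RGB image given as nested lists, a hue window of 40° to 80° around yellow and a minimum saturation of 0.5, all of which are illustrative choices; the function names are hypothetical.

```python
import colorsys

def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB color in degrees on the 0-360 hue circle."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0

def marker_centroid(image, hue_lo=40.0, hue_hi=80.0, min_saturation=0.5):
    """Centroid (row, col) of the pixels whose hue falls in the marker window.

    image is a list of rows, each row a list of (r, g, b) tuples.
    Returns None when no pixel qualifies, e.g. when the band is occluded.
    """
    row_sum = col_sum = count = 0
    for i, row in enumerate(image):
        for j, (r, g, b) in enumerate(row):
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            if hue_lo <= h * 360.0 <= hue_hi and s >= min_saturation:
                row_sum += i
                col_sum += j
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count
```

Returning None for an empty match gives the caller a simple way to skip frames where the band is hidden by a hand or by bubbles.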

The pixel trajectory of the marker is being visu-

alized for the user in real-time, see Fig. 8. The pixel

trace is then being mapped to global coordinates on

the tracking plane using the geometric mapping de-

scribed in Sec. 4. Tracking performs well with the

current 50 fps speed allowing the real-time rendering.

The visualized points are referring to image pixel

coordinates and are not suitable for speed analy-

sis. The swimmer speed is calculated by converting

these points using the geometric mapping described

in Sec. 4.

The real-time swimmer trace is provided as an

overlay curve at the video area, see Figure 8. The

current frame position is highlighted. The recorded

session can be replayed immediately.

Figure 8: Presenting the tracking results to the user. The

emphasised square is for the user only and it does not cor-

respond to the tracking plane. The track is based on pixel

information for reducing response time.

6 SWIMMING CYCLE REGISTRATION

A swimming cycle is the state history over a time in-

terval during which the swimmer state returns rela-

tively close to the initial state. The maximum per-

formance requires rather monotonic strokes, yet the

rhythm may vary based on metabolical optimum. De-

tecting the regularity of swimming strokes is of im-

portance. Cycles seem to have a distinctive decelera-

tion phase just before each hand stroke. This enables

a simple cycle registration by ﬁnding a local spike

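heuristically in the phase signal at times $t_i \in \mathbb{R}$, $i \in \mathbb{N}$. Using a piecewise linear parameter $\tau$:

$$t(\tau) = t_i\,(i + 1 - \tau) + t_{i+1}\,(\tau - i), \quad i = \lfloor \tau \rfloor, \quad \tau \in \mathbb{R}^+ \qquad (10)$$

so that $t(i) = t_i$, one can compare the shapes of two cycles $i$ and $j$ directly in a duration-invariant way on their own relative time scale $t_i + \tau$. Let us define the duration of a cycle $i$ as $T_i = t_{i+1} - t_i$. Now, the dissimilarity $d_{ij}$ between two cycles $i$ and $j$ can be defined:

$$d_{ij} = \left[ \int_0^1 \big( v(t(i + \tau)) - v(t(j + \tau)) \big)^2 \, d\tau \right]^{1/2} + \lambda\, |T_i - T_j| \qquad (11)$$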

The horizontal velocity v

in Eq. 11 is based on pixel

information with a moving average smoothing, since

we noticed the raw pixel signal is enough for the cycle

detection. The last summand of Eq. 11 sets a weight

on the duration difference between two strokes. The

duration difference penalty is open to experimenta-

tion, currently we use value: λ = 4. We have used

the vertical velocity v

(t) as the target signal for sim-

ilarity analysis. The target signal could be also the

horizontal velocity or a vector combination of both,

in which case a vector norm should be used in Eq. 11.
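The dissimilarity of Eq. 11 can be sketched numerically as below. The sketch assumes that the velocity signal is given as samples (ts, vs), that the cycle start times are already detected, that the piecewise linear time satisfies t(i) = t_i, and that the integral runs over one normalized cycle, approximated with the midpoint rule. The function names and the number of integration steps are illustrative choices.

```python
from bisect import bisect_right

def interp(ts, vs, t):
    """Linearly interpolate the samples (ts, vs) at time t (ts ascending)."""
    if t <= ts[0]:
        return vs[0]
    if t >= ts[-1]:
        return vs[-1]
    k = bisect_right(ts, t)
    w = (t - ts[k - 1]) / (ts[k] - ts[k - 1])
    return vs[k - 1] * (1 - w) + vs[k] * w

def cycle_time(starts, tau):
    """Piecewise linear time t(tau) with t(i) = starts[i]."""
    i = min(int(tau), len(starts) - 2)
    return starts[i] * (i + 1 - tau) + starts[i + 1] * (tau - i)

def dissimilarity(ts, vs, starts, i, j, lam=4.0, steps=100):
    """RMS velocity difference over normalized phase plus lam * |T_i - T_j|."""
    acc = 0.0
    for n in range(steps):
        tau = (n + 0.5) / steps  # midpoint rule on [0, 1]
        vi = interp(ts, vs, cycle_time(starts, i + tau))
        vj = interp(ts, vs, cycle_time(starts, j + tau))
        acc += (vi - vj) ** 2 / steps
    T_i = starts[i + 1] - starts[i]
    T_j = starts[j + 1] - starts[j]
    return acc ** 0.5 + lam * abs(T_i - T_j)
```

Two strokes with the same shape and the same duration give a value near zero; the second term grows with the difference in stroke duration even when the shapes match.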

The swimming cycle registration is a post-session

process, which will be implemented as a real-time

feature in the future. A similarity matrix is cumu-

lated from last few strokes (last 5 in cases depicted

in Fig. 9). The visualization is designed to be seen

directly from the pool. The colors are scaled so

that black is a serious deviation from allowed, white

means identical strokes. Each stroke is compared to

others and no judgement is made towards the quality

of the swimming performance in general. The gray-

scale used is an arbitrary choice at the moment.
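The gray-scale display of the similarity matrix can be sketched as a clipped linear map. The linear scale and the 8-bit output range are assumptions; the black level of 0.2 m/s follows the caption of Fig. 9, and the function names are hypothetical.

```python
def gray_level(d, d_black=0.2):
    """Map a dissimilarity d (m/s) to gray: 255 is white, 0 is black."""
    clipped = min(max(d / d_black, 0.0), 1.0)
    return round(255 * (1.0 - clipped))

def gray_matrix(d_matrix, d_black=0.2):
    """Gray levels for a whole matrix of pairwise stroke dissimilarities."""
    return [[gray_level(d, d_black) for d in row] for row in d_matrix]
```

Values above the black level are clipped, so a single badly deviating stroke does not compress the contrast of the remaining cells.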

7 POST-SESSION PHASE

The post-session phase occurs when there is a pause

in the athletic performance. First, the recorded video

is stored on the hard disc. The trainer opens the video

ﬁle with the tracking software and it is able to provide

feedback to the coach and swimmer in a reasonable

time (at most a few minutes).

The software allows the overlaying of multiple

tracking results of different athletes. The trainer can


Figure 9: Swimming stroke regularity visualization. Rows and columns are individual strokes. White means zero difference and black a difference $d_{ij} = 0.2$ m/s.

use these data overlays to compare a trainee with a

reference (trained) swimmer performance.

The speed graphs acquired by geometric trans-

form of the original pixel trace are displayed in a sep-

arate area of the screen under the video frame. The

graphs span the whole observed length.

A number of quantitative measures are displayed

on the current swimmer performance, like aver-

age speed, distance, time, minimum and maximum

speeds. A speciﬁc time period can be highlighted

in the speed graph, to restrict the numeric display to

measurements on this area.
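The quantitative measures listed above can be computed from the transformed trace with simple finite differences. The sketch below assumes times in seconds and positions in metres along the pool in tracking-plane coordinates; the function name, the dictionary keys and the optional window arguments are illustrative.

```python
def speed_summary(times, xs, t_start=None, t_end=None):
    """Distance, time and speed statistics over an optional time window."""
    pts = [
        (t, x)
        for t, x in zip(times, xs)
        if (t_start is None or t >= t_start) and (t_end is None or t <= t_end)
    ]
    if len(pts) < 2:
        raise ValueError("need at least two samples in the window")
    steps = list(zip(pts, pts[1:]))
    # speed between consecutive samples; works for non-regular sampling
    speeds = [abs(x1 - x0) / (t1 - t0) for (t0, x0), (t1, x1) in steps]
    distance = sum(abs(x1 - x0) for (_, x0), (_, x1) in steps)
    duration = pts[-1][0] - pts[0][0]
    return {
        "distance": distance,
        "time": duration,
        "average_speed": distance / duration,
        "min_speed": min(speeds),
        "max_speed": max(speeds),
    }
```

Passing `t_start` and `t_end` corresponds to highlighting a period in the speed graph. On raw tracking data the minimum and maximum are sensitive to noise, so a smoothed trace is the more sensible input.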

There will be further experiments on visualizing

various swimming characteristics. Preference will be

given to the real-time feedback.

7.1 Kalman Smoothing

Kalman smoothing (Hartikainen and Särkkä, 2011) is applied to the pixel trace to get smoothed plots of the position and velocity components over time. Fig. 10 depicts the smoothed position and velocity history. Further swimming style analysis and movement registration operations can be based on this signal.

The marker observations are described as coming from a linear dynamical system of Eq. 12 with Gaussian noise $w \sim N(0, \operatorname{diag}(\sigma_x, \sigma_y))$ as the driving force component. Our numerical choice was $\sigma_x = 10$ N, $\sigma_y = 20$ N. Other constants of Eq. 12 are specified in Eq. 13:

$$m\, \ddot{g}(t) + c\, \dot{g}(t) + k\, g(t) = w(t) \qquad (12)$$

The system is further discretized to the given non-regular observation times and transformed to a discrete-time linear dynamical model with a Gaussian noise term, see details in (Hartikainen and Särkkä, 2011).
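A one-axis version of the smoother can be sketched as a Kalman filter followed by a Rauch-Tung-Striebel backward pass. This is not the toolbox implementation: it assumes a single coordinate with state (position, velocity), a truncated Taylor series for the state transition, a first-order approximation of the process noise in which the force noise level is treated as a spectral density, and a measurement noise std of 0.018 m. The function names and default values are illustrative, and NumPy is assumed to be available.

```python
import numpy as np

def transition(F, dt, terms=10):
    """Truncated Taylor series of the matrix exponential exp(F * dt)."""
    A = np.eye(F.shape[0])
    term = np.eye(F.shape[0])
    for n in range(1, terms):
        term = term @ (F * dt) / n
        A = A + term
    return A

def rts_smooth(times, obs, m=60.0, c=11.7, k=0.0, sigma=10.0, r_std=0.018):
    """Smooth position observations of m*g'' + c*g' + k*g = w on one axis."""
    F = np.array([[0.0, 1.0], [-k / m, -c / m]])
    L = np.array([[0.0], [1.0 / m]])      # force enters through acceleration
    H = np.array([[1.0, 0.0]])            # only the position is observed
    R = np.array([[r_std ** 2]])
    I2 = np.eye(2)
    x_f, P_f, x_p, P_p, A_s = [], [], [], [], []
    x, P = np.array([obs[0], 0.0]), np.diag([r_std ** 2, 1.0])
    for i, z in enumerate(obs):           # forward Kalman filter
        if i == 0:
            A = I2
        else:
            dt = times[i] - times[i - 1]  # non-regular sampling is allowed
            A = transition(F, dt)
            x = A @ x
            P = A @ P @ A.T + (L @ L.T) * sigma ** 2 * dt
        x_p.append(x); P_p.append(P); A_s.append(A)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([z]) - H @ x)
        P = (I2 - K @ H) @ P
        x_f.append(x); P_f.append(P)
    x_s = list(x_f)
    for i in range(len(obs) - 2, -1, -1):  # backward RTS pass
        G = P_f[i] @ A_s[i + 1].T @ np.linalg.inv(P_p[i + 1])
        x_s[i] = x_f[i] + G @ (x_s[i + 1] - x_p[i + 1])
    return np.array(x_s)                   # columns: position, velocity
```

The backward pass needs the whole record, which is why this step belongs to the post-session phase; the forward filter alone could run during the performance.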

The numerical values of the swimmer model of Eq. 12 are chosen for an average swimmer, and the dampening parameters approximate the observed speed resistance of swimming. The model will be improved later, e.g. by using a Gaussian process formulation instead of Kalman smoothing and by adding more biomechanical authenticity and physically interesting latent forces to the model, see e.g. (Hartikainen et al., 2012). The constants of Eq. 12 are:

$$\begin{aligned}
m &= I_2 \times 60\ \text{kg} && \text{(mass)} \\
c &= \operatorname{diag}(11.7,\ 16.3)\ \text{kg/s} && \text{(dampening)} \qquad (13) \\
k &= \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \times 0.05\ \text{kg/s}^2 && \text{(spring)}
\end{aligned}$$

[Figure 10 plots: Position, y(t) (m) against x(t) (m), measured and smoothed; Velocity (m/sec), measured and smoothed; Moving deviation of location (m), σx and σy, against t (sec); Moving deviation of velocity (m/s), σv, against t (sec).]

Figure 10: The smoothed position and velocity of the marker tracking. Above: Measured and smoothed signals. The current tracking brings a large procedural noise component to the velocity. Below: The moving deviation estimates. Only one camera view was used in this demonstration.

The average error (std.) of the procedure is at the moment $\Delta x = 0.018$ m, $\Delta y = 0.008$ m, $\Delta v_x = 0.2$ m/s, $\Delta v_y = 0.2$ m/s for positions and velocities along the x and y axes. Improvements will be made by applying a better tracking method that is less sensitive to bubbles.

Kalman smoothing is a post-processing task, too.

8 SYSTEM PERFORMANCE

A principal objective of the system is to provide im-

mediate trainer feedback. To achieve that, the imple-

mentation is based on the following principles:

• The frames are processed and tracked in their

uncalibrated shape (containing all camera distor-

tions, refraction etc.). The physical meaning for

the tracking signal can be attached only after the

geometric mapping, which is a postprocessing

step, see Fig. 2.

• The tracked area is limited manually, see the highlighted area in Fig. 8. Swimmers usually occupy only a narrow band on the screen, so limiting the tracking to this area of interest reduces computation significantly. At the moment the tracking box (see Fig. 8) is selected manually, but in the future there will be a few prerecorded tracking areas for different swimming styles.

• The dissimilarity measure of Eq. 11 is also based on the raw pixel information. The formula is computationally cheap, and it has to be evaluated on a separate processor once, when a swimmer passes a camera. A full real-time indicator will be implemented when a second computer and a monitor are added to the system.

• The geometric mapping of video images uses reduced quality to deliver real-time performance.

• Seamless (combined from 3 cameras), geometrically accurate visualizations such as video, stroke regularity indication and physical speed analysis are all left to post-processing. At that stage, the geometric mapping has already been applied to the images and data.
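
The saving from a limited tracking area can be illustrated with a short Python sketch. The sum-of-absolute-differences template search below is only a stand-in for the tracker, which this section does not specify, and the box layout and the function names are hypothetical.

```python
import numpy as np


def box_fraction(frame_shape, box):
    """Share of the frame's pixels that lie inside the tracking box.

    box = (top, bottom, left, right) in raw (uncalibrated) pixel coordinates.
    """
    top, bottom, left, right = box
    return (bottom - top) * (right - left) / (frame_shape[0] * frame_shape[1])


def track_in_box(frame, template, box):
    """Exhaustive template search restricted to the tracking box.

    Returns the (row, col) of the best match in full-frame coordinates.
    The sum of absolute differences is an assumed, generic score.
    """
    top, bottom, left, right = box
    roi = frame[top:bottom, left:right].astype(np.int32)
    tpl = template.astype(np.int32)
    th, tw = tpl.shape
    best, best_pos = None, None
    for r in range(roi.shape[0] - th + 1):
        for c in range(roi.shape[1] - tw + 1):
            sad = int(np.abs(roi[r:r + th, c:c + tw] - tpl).sum())
            if best is None or sad < best:
                best, best_pos = sad, (top + r, left + c)
    return best_pos
```

Because the search runs on raw pixels inside the box only, its cost scales with the area of the box rather than with the full frame, and the resulting pixel position can be mapped to physical coordinates later in post-processing.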

9 CONCLUSIONS

We have presented a simple video-based swimming analysis system which is low-cost, easy to install and simple to calibrate without any technical assistance. It can be installed in a wide variety of pool types. It is maintenance free and, based on our experience so far, can be operated by a single person. In ordinary use no technical assistance is needed.

The proposed system provides swimming speed analysis and instant visual feedback. The system is a good basis for further expansion, e.g. with swimming gait analysis or biomechanical modeling.

The current system can easily be upgraded with a fourth camera at the location indicated by a grey circle in Fig. 1. The video monitoring would then span the whole length of the pool. A second video screen will be added in the future to serve the athletes better.

There are many off-the-shelf analysis systems with a wide spectrum of functionality available today. Usually these systems are much more complex and expensive than the one presented here. Our choice was to implement the real-time pixel trace of the marker and the swimming cycle regularity visualization.

The tracking system needs to be improved in the near future. At the moment it loses the track too often, especially when a hand movement occludes the marker.

The current system has been used by the Finnish national swimming teams at both senior and junior level since autumn 2014. Automated tracking has made it possible to give faster and more accurate feedback to athletes. Thus it has been possible to test a large number of athletes in a relatively short time during national team camps, whereas previously only a few of the top swimmers were able to get the service because of the time investment required by the older version of the system. According to the national team coach, the system has been a major asset in developing the technical skills of national team athletes. The findings have also been used in national coaches’ education to provide insight into swimming performance.

The proposed direct planar calibration method is aimed at efficient real-time transformation of the video stream. The efficiency is possible because the method is restricted to projection onto the 2D tracking plane only. The same formulation could potentially be generalized to 3D motion capture in the overlapping view zones (2 × 2 m in the current system, 3 × 2 m after one more camera is added). The proposed calibration method may be of use in other applications where the conditions of camera placement rule out stereo calibration and where planar observations suffice.

The most important future goals are reliable markerless tracking and a record database with automated input from the site and support for rudimentary searches and comparisons.

The swimming gait registration based on the profile shape of the body of the swimmer is a potential development.

Automated detection of the different phases of the swimming performance remains the last goal. It is the hardest one, since there are many swimming styles, each with somewhat differing phases, and female and male swimming costumes differ.

ACKNOWLEDGEMENTS

The project is a joint venture of the University of Turku IT department and the Sports Academy of Turku region, and it has been funded by the City of Turku, the National Olympic Committee, the Finnish Swimming Federation, Urheiluopistosäätiö and the University of Turku.

REFERENCES

Bouguet, J. Y. (2008). Camera calibration toolbox for Matlab.

Bradski, G. (2000). OpenCV. Dr. Dobb’s Journal of Software Tools.

Ceseracciu (2011). New frontiers of markerless motion capture: application to swim biomechanics and gait analysis. PhD thesis, Padova University.

Chum, O., Pajdla, T., and Sturm, P. (2005). The geometric error for homographies. Comput. Vis. Image Underst., 97(1):86–102.


Dadashi, F., Millet, G., and Aminian, K. (2013). Inertial measurement unit and biomechanical analysis of swimming: an update. Sportmedizin, 61:21–26.

Dartfish (2011-2015). Dartfish video analysis tool. http://www.sportmanitoba.ca/page.php?id=116.

D’Errico, J. (2006). Surface fitting using gridfit. Technical report, MATLAB Central File Exchange.

Haner, S., Svärm, L., Ask, E., and Heyden, A. (2015). Joint under and over water calibration of a swimmer tracking system. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, pages 142–149. SciTePress.

Hartikainen, J., Seppänen, M., and Särkkä, S. (2012). State-space inference for non-linear latent force models with application to satellite orbit prediction. CoRR.

Hartikainen, J., Solin, A., and Särkkä, S. (2011). Optimal filtering with Kalman filters and smoothers, a manual for the Matlab toolbox EKF/UKF. Technical report, Dept. of Biomedical Eng. and Comp. Sci., Aalto University School of Science.

Heikkilä, J. and Silven, O. (1997). A four-step camera calibration procedure with implicit image correction. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 1106–1112.

James, D. A., Burkett, B., and Thiel, D. V. (2011). An unobtrusive swimming monitoring system for recreational and elite performance monitoring. In Procedia Engineering, 5th Asia-Pacific Congress on Sports Technology (APCST), volume 13, pages 113–119.

Jean-Claude, C., editor (2003). Biomechanics and Medicine in Swimming IX. IXth International World Symposium on Biomechanics and Medicine in Swimming, Université de Saint-Etienne.

Kirmizibayrak, J., Honorio, J., Xiaolong, J., Mark, R., and Hahn, J. K. (2011). Digital analysis and visualization of swimming motion. The International Journal of Virtual Reality, 10(3):9–16.

Luo, H., Zhu, L., and Ding, H. (2006). Camera calibration with coplanar calibration board near parallel to the imaging plane. Sensors and Actuators A: Physical, 132:480–486.

Makoto, H. S., Kimura, M., Yaguchi, S., and Inamoto, N. (2002). View interpolation of multiple cameras based on projective geometry. In International Workshop on Pattern Recognition and Understanding for Visual Information.

Mullane, S. L., Justham, L. M., West, A. A., and Conway, P. P. (2010). Design of an end-user centric information interface from data-rich. In Procedia Engineering, volume 2, pages 2713–2719. 8th Conference of the International Sports Engineering Association (ISEA).

Pansiot, J., Lo, B., and Guang-Zhong, Y. (2010). Swimming stroke kinematic analysis with BSN. In Body Sensor Networks (BSN), 2010 International Conference on, pages 153–158.

Siirtola, P., Laurinen, P., Roning, J., and Kinnunen, H. (2011). Efficient accelerometer-based swimming exercise tracking. In IEEE Symp. on Computational Intelligence and Data Mining (CIDM), pages 156–161. IEEE.

Sportsmotion (2011-2015). motion analysis system. http://www.sportsmotion.com/.

Zhang, Z. (1999). Flexible camera calibration by viewing a plane from unknown orientations. In ICCV, pages 666–673.

Zhang, Z. (2000). A flexible new technique for camera calibration. In IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 22, pages 1330–1334.

ICPRAM 2016 - International Conference on Pattern Recognition Applications and Methods
