LOW LATENCY RECOGNITION AND REPRODUCTION OF
NATURAL GESTURE TRAJECTORIES

Ulf Großekathöfer (1), Amir Sadeghipour (1), Thomas Lingner (2),
Peter Meinicke (2), Thomas Hermann (1) and Stefan Kopp (1)

(1) Center of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
(2) Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University, Göttingen, Germany

Keywords: Ordered means models, Time series prototyping, Time series reproduction.
Abstract: In human-machine interaction scenarios, low latency recognition and reproduction are crucial for successful communication. For the reproduction of general gesture classes, it is important to realize a representation that is insensitive to performer-specific variations of speed along gesture trajectories. Here, we present an approach to learning speed-invariant gesture models that provide fast recognition and convenient reproduction of gesture trajectories. We evaluate our gesture model with a data set comprising 520 examples of 48 gesture classes. The results indicate that the model is able to learn gestures from few observations with high accuracy.
1 INTRODUCTION
In human-human interaction we find that gestures between communicating people are tightly interwoven and thereby successfully support and structure interaction. Obviously, the interactants make sense of their observations incrementally and can foresee the continuation and/or intervene and react gesturally themselves without significant delay. Such a processing scheme seems to be a crucial prerequisite for successful communication, and gives rise to the question of how we can implement or optimize similarly low-latency responses with machine learning approaches, particularly in the case of the continuous multivariate observations that occur in body gestural communication. This application becomes all the more relevant as we have witnessed a dramatic evolution in sensing technology over the past years, starting with high-end time-of-flight cameras and continuing with low-cost systems such as the Microsoft Kinect™, which promise to make gestural communication available as a standard interface.
Gestures can be understood and represented as multivariate trajectories of joint/end effector states over time, and their correct recognition and interpretation are most relevant for multimodal dialogue systems. However, in addition to recognition, it would be most useful if the system could also reproduce (or imitate) gestures, using one and the same model. Imitation in particular is a behavior pattern observed frequently in human-human interaction. Beyond communicative functions, gesture reproduction is also needed if machines are to learn motion patterns from examples, thus allowing users to command future robots or agents by simply showing an interaction. In such a context, an abstraction from the temporal variation of gesture execution enables speed-invariant modeling and reproduction, and ensures maximally flexible applicability.
A common approach for the analysis of gesture trajectories is the use of hidden Markov models (HMMs) (Rabiner, 1989). HMMs provide good representation properties for time series data and reach excellent results in various applications (Rabiner, 1989; Garrett et al., 2003; Kellokumpu et al., 2005). In the case of gesture data, HMMs can not only represent gesture classes but can also be used to generate new gestures (Kulić et al., 2008; Kwon and Park, 2008; Wilson and Bobick, 1999). Furthermore, HMMs have also been applied to imitation learning of body movements (Calinon et al., 2010; Inamura et al., 2003; Amit and Mataric, 2002). However, the training of HMMs usually requires a large number of examples, which complicates their application in gesture learning. In particular, the estimation of transition probabilities requires many observations. In general, supervised learning techniques, such as support vector machines, can successfully be used with a much
smaller number of examples, but they do not provide a model for the reproduction of gestures. Furthermore, a rejection class or criterion would be difficult to realize with a merely discriminative learning approach.
We approach the problem of learning prototype representations from few data examples in the context of learning gestures, i.e., expressive wrist movements executed in free space. We collected a data set that contains 3-dimensional trajectories of the right wrist for 48 gesture classes. For data analysis, we used a simplified, speed-invariant generative model whose parameters are interpretable in data space. Its architecture is similar to that of HMMs, but does not include any transition probabilities. We conduct experiments regarding prototype and generalization properties for gesture trajectories when only few examples are available.
2 SETUP AND DATA
Our setup is optimized towards imitation learning during human-agent interaction. It comprises a time-of-flight camera, marker-free tracking software, and a humanoid virtual agent called Vince (see Figure 1). The time-of-flight camera (a SwissRanger™ SR4000 [1]) captures the scene in 3d at a frequency of 30 fps. The scene data are used by the software iisu™ 2.0 [2] to map a human skeleton onto the user present in the scene. We extract the relevant information from the skeleton, such as the user's height, the spatial positions of the wrists, and the center of mass, to compute the normalized 3d positions of the wrists with respect to the user's body size. Within a body-correspondence-solver submodule, the wrist positions are transformed (rotated and scaled) from the coordinate system of the camera into the egocentric space of the virtual agent, which stands face-to-face with the human demonstrator. In the current study we focus on the right wrist and record these data as a time series for each performed gesture. During data acquisition, Vince imitates the subject's right hand movements in real time. In this way, the demonstrator receives visual feedback on how Vince would perform those gestures. It is worth noting that the ambiguous position of the elbow at each time step is not captured but computed with the aid of inverse kinematics (Tolani et al., 2000).
Overall, 520 examples from 48 different gesture classes were captured in the form of 3d wrist movement trajectories with time stamps. Each trajectory starts from and ends at the rest position of the right hand; the gestures were demonstrated at different velocities, with an average execution time of 4.75 seconds. The performed gestures ranged from conventional communicative gestures ("waving", "come here" and "calm down") through iconic gestures ("circle", "spiky", "square", "surface" and "triangle") to deictic gestures ("pointing" to the left, right and upward). These gestures were performed as 48 different classes, each varying with respect to some of the following features: size (e.g. small and big circle), performing space (e.g. drawing a circle at the right side or in front of oneself), direction (clockwise or counter-clockwise), orientation (horizontal or vertical), and repetition (repeating some subparts of the movement, such as drawing a circle once or twice, or swinging the hand several times during waving). The complete data set is available as supplementary material [3].

[1] http://www.mesa-imaging.ch
[2] http://www.softkinetic.net
[3] http://www.techfak.uni-bielefeld.de/ags/ami/publications/GSLMHK2012-LLR/
Here, we use a simplified gesture model to provide robust recognition and learning from few examples. The essential simplification arises from a speed-invariant representation of gesture trajectories, since the meaning and intention of most gestures are independent of temporal variations in execution speed. Moreover, fluctuations in speed might lead to an over-detailed representation that lacks sufficient generalization.
3 ORDERED MEANS MODELS
In order to learn speed-invariant prototype representations, we use a specialized generative model which we refer to as an ordered means model (OMM). OMMs have been successfully applied to the classification of time series data before (Wöhler et al., 2010; Grosshauser et al., 2010). Similar to HMMs, OMMs are generative state space models that emit a sequence of observation vectors $O = [o_1, \ldots, o_T]$ out of K hidden states. As a distinguishing feature, OMMs do not include any transition probabilities between states. This leads to a simplified model architecture that intrinsically provides a speed-invariant representation of time series such as the gesture trajectories analyzed in this study.
Figure 1: In our setup, demonstrated right hand gestures are captured and preprocessed within the motion capture module, and the resulting 3d trajectories are stored as time series. OMM prototypes are computed from different demonstrations and performed by the virtual agent Vince as the result of the prototype learning process.

Figure 2: This figure shows screenshots from the gesture videos. The first row shows video screenshots of a human demonstrator during data acquisition. In the second row, Vince, a virtual agent, performs the corresponding OMM prototype. The gesture in these videos is from class "waving head 2.5 swings".

3.1 Model Architecture

In OMMs, the network of model states follows a left-to-right topology, i.e. OMMs only allow transitions to
states with equal or higher indices as compared to the current state. The emissions of each state k are modeled as probability distributions $b_k(\cdot)$ and are assumed to be Gaussian with $b_k(o_t) = \mathcal{N}(o_t; \mu_k, \sigma)$. The standard deviation parameter $\sigma$ is identical for all states and is used as a global hyperparameter.
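For concreteness, such an isotropic Gaussian emission density can be evaluated as in the following Python sketch (an illustration only; the function name and interface are ours, not those of the released OMM package):

```python
import numpy as np

def emission_density(o_t, mu_k, sigma):
    """b_k(o_t) = N(o_t; mu_k, sigma): isotropic Gaussian with a
    single, global standard deviation sigma shared by all K states."""
    o_t, mu_k = np.asarray(o_t), np.asarray(mu_k)
    d = o_t.shape[0]                        # here d = 3 (wrist coordinates)
    sq_dist = np.sum((o_t - mu_k) ** 2)
    return np.exp(-0.5 * sq_dist / sigma**2) / (2 * np.pi * sigma**2) ** (d / 2)
```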
With regard to the above model architecture, an OMM is completely defined by an ordered sequence of reference vectors $\lambda = [\mu_1, \ldots, \mu_K]$, i.e. the expectation values of the emission distributions $b_k(\cdot)$.
3.2 Length Distribution
To provide a fully defined generative model, OMMs require the definition of an explicit length distribution P(T), either from domain knowledge or by estimation from the observed lengths in the training data. This, however, may not be possible due to missing knowledge or non-representative lengths of the observations. To circumvent the definition and estimation of a length model, we assume a flat distribution in terms of an improper prior with equally probable lengths.
For a given length T, we define each valid path $q = q_1 \ldots q_T$ through the model to be equally likely:

$$P(q \mid \lambda) = \begin{cases} \frac{1}{M_T} \cdot P(T) & \text{if } q_1 \le q_2 \le \ldots \le q_T \\ 0 & \text{else} \end{cases} \qquad (1)$$

where $M_T$ is the number of valid paths for a time series of length T through a K-state model:

$$M_T = \left|\{ q : q_1 \le q_2 \le \cdots \le q_T \}\right| \qquad (2)$$
$$\;\;\;\;\;\; = \binom{K + T - 1}{T}. \qquad (3)$$

Since all paths are equally likely, there is no equivalent realization in terms of transition probabilities in HMMs.
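For illustration, $M_T$ can be computed directly from Eq. 3; even moderate model and sequence lengths yield an enormous number of equally likely paths (a minimal sketch with hypothetical values):

```python
from math import comb

def num_valid_paths(K, T):
    """M_T = binom(K + T - 1, T): number of non-decreasing state
    sequences of length T through a K-state left-to-right model."""
    return comb(K + T - 1, T)

# e.g. a K = 30 state model and a gesture of T = 143 frames
# (roughly 4.75 s at 30 fps):
print(num_valid_paths(30, 143))
```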
3.3 State Duration Probabilities
The absence of state transition probabilities leads to modified state duration probabilities in OMMs. The state duration probabilities of HMMs depend on the transition probabilities and are geometrically distributed. In OMMs, the probability $P_k(\tau)$ of staying $\tau$ time steps in state k depends on the sequence length T and the number of model states K. Considering the combinatorics of the path generation process (see Eq. 1 and Eq. 2), the duration probability distributions of OMMs follow

$$P_k(\tau) = \frac{\binom{T + K - 2 - \tau}{K - 2}}{\binom{T + K - 1}{K - 1}}. \qquad (4)$$
Note that for OMMs the state duration probabilities depend on T, the length of the analyzed time series, and therefore vary for time series of different lengths. This is also the reason why there exists no equivalent realization of such a model in terms of transition probabilities in HMMs.
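A small numerical sanity check of Eq. 4 (our own sketch; it assumes that a state may also be skipped, i.e. $\tau = 0$ is allowed, which is consistent with the monotone path definition) confirms that the duration probabilities sum to one and have mean T/K:

```python
from math import comb

def duration_prob(tau, K, T):
    """P_k(tau) from Eq. 4: probability of spending tau of the
    T time steps in any particular state k."""
    return comb(T + K - 2 - tau, K - 2) / comb(T + K - 1, K - 1)

K, T = 10, 100
p = [duration_prob(tau, K, T) for tau in range(T + 1)]
print(sum(p))                                    # -> 1.0 (up to rounding)
print(sum(tau * q for tau, q in enumerate(p)))   # -> T / K = 10.0
```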
3.4 Parameter Estimation
In order to estimate the model parameters $\lambda = [\mu_1, \ldots, \mu_K]$ from a set of observations $\mathcal{O} = \{O_1, \ldots, O_N\}$, we maximize the log-likelihood

$$L = \sum_{i=1}^{N} \ln p(O_i \mid \lambda) \qquad (5)$$

with respect to the mean vectors $\mu_k$.
To solve this optimization problem, we use an iterative expectation maximization algorithm (Dempster et al., 1977), similar to the well-known Baum-Welch algorithm for HMMs (Rabiner, 1989). First, we compute the so-called responsibilities

$$r_{i,k,t} = \frac{p(O_i, q_t = k \mid \lambda)}{p(O_i \mid \lambda)} \qquad \text{(E-step)} \qquad (6)$$

and then re-estimate the model parameters according to

$$\mu_k = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} r_{i,k,t} \cdot o_{i,t}}{\sum_{i=1}^{N} \sum_{t=1}^{T} r_{i,k,t}} \qquad \text{(M-step)}. \qquad (7)$$
These steps are repeated until convergence.
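As an illustration of the M-step, consider the following compact Python sketch (our own, not the released package; it assumes the responsibilities $r_{i,k,t}$ have already been computed in the E-step, e.g. with the recursions of Section 3.5):

```python
import numpy as np

def m_step(observations, responsibilities):
    """Re-estimate the reference vectors mu_k as responsibility-weighted
    means of all observation vectors (Eq. 7).

    observations[i]    : array of shape (T_i, d), the time series O_i
    responsibilities[i]: array of shape (K, T_i), entries r_{i,k,t}
    """
    K = responsibilities[0].shape[0]
    d = observations[0].shape[1]
    numer = np.zeros((K, d))
    denom = np.zeros(K)
    for O, r in zip(observations, responsibilities):
        numer += r @ O             # sum_t r_{i,k,t} * o_{i,t}, per state k
        denom += r.sum(axis=1)     # sum_t r_{i,k,t}, per state k
    return numer / denom[:, None]  # new reference vectors, shape (K, d)
```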
3.5 Efficient Computation of Production Likelihoods and Responsibilities
To compute the production likelihoods $p(O_i \mid \lambda)$ and the responsibilities (Eq. 6) in a computationally efficient way, we use a dynamic programming solution that is similar to the forward-backward algorithm known from HMMs (Rabiner, 1989), but omits transition probabilities.
We define the forward variable according to

$$\alpha_{i,k,t} \equiv p(o_{i,1} \ldots o_{i,t} \mid q_t \le k, \lambda). \qquad (8)$$

Since $\alpha_{i,k,t}$ depends only on the variables of the previous state $k-1$ and of the previous point in time $t-1$, this yields a fast dynamic programming solution:

$$\alpha_{i,k,t} = \alpha_{i,k,t-1} \cdot b_k(o_{i,t}) + \alpha_{i,k-1,t} \qquad (9)$$

that is initialized with $\alpha_{i,k,0} = 1$ and $\alpha_{i,0,t} = 0$. Similarly, we compute the backward variable

$$\beta_{i,k,t} = \beta_{i,k,t+1} \cdot b_k(o_{i,t}) + \beta_{i,k+1,t} \qquad (10)$$
$$\;\;\;\;\;\; \equiv p(o_{i,t} \ldots o_{i,T} \mid q_t \ge k, \lambda) \qquad (11)$$

by means of recursion, initialized with $\beta_{i,k,T+1} = 1$ and $\beta_{i,K+1,t} = 0$.

The production likelihood then is

$$p(O_i \mid \lambda) = \alpha_{i,K,T} = \beta_{i,1,1} \qquad (12)$$

and the responsibilities can be computed by

$$r_{i,k,t} = \frac{\alpha_{i,k,t-1} \cdot b_k(o_{i,t}) \cdot \beta_{i,k,t+1}}{\alpha_{i,K,T}}. \qquad (13)$$
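For illustration, the forward recursion can be implemented in a few lines of Python. The following is a sketch under the paper's definitions, not the released OMM package; the per-time-step rescaling against numerical underflow is our addition, a standard device for such recursions:

```python
import numpy as np

def omm_log_likelihood(O, mu, sigma):
    """ln p(O | lambda) via alpha[k,t] = alpha[k,t-1]*b_k(o_t) + alpha[k-1,t]
    (Eq. 9), initialized with alpha[k,0] = 1 and alpha[0,t] = 0.

    O  : array (T, d), one observed trajectory
    mu : array (K, d), the ordered reference vectors lambda
    """
    T, d = O.shape
    K = mu.shape[0]
    norm = (2.0 * np.pi * sigma**2) ** (d / 2.0)
    alpha = np.ones(K + 1)         # index 0 serves as the dummy state
    alpha[0] = 0.0
    loglik = 0.0
    for t in range(T):
        sq = np.sum((O[t] - mu) ** 2, axis=1)     # distances to all mu_k
        b = np.exp(-0.5 * sq / sigma**2) / norm   # emissions b_k(o_t)
        new = np.zeros(K + 1)
        for k in range(1, K + 1):                 # ascending k: uses new[k-1]
            new[k] = alpha[k] * b[k - 1] + new[k - 1]
        c = new.sum()                             # rescaling against underflow
        loglik += np.log(c)
        alpha = new / c
    return loglik + np.log(alpha[K])              # ln alpha[K, T]  (Eq. 12)
```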
3.6 Classification
To use OMMs for classification, i.e. to assign an unseen gesture trajectory to one of J classes, J class-specific models are first estimated from the data. Assuming equal prior probabilities, an unknown gesture O is then assigned to the class associated with the model that yields the highest production likelihood $p(O \mid \lambda_j)$ of all models.
To extend the proposed system to classification in a continuous stream of gesture trajectories, some extensions would be necessary, e.g. detection of the beginning of gestures, a rejection scheme for the case that a user does not perform a gesture, etc. A common approach is to partition the data stream via a sliding window and to reject gestures by thresholds on the posterior probabilities.
3.7 Prototype Property
An OMM is completely represented by an ordered sequence of reference vectors $\lambda = [\mu_1, \ldots, \mu_K]$, which correspond to the expectation values of the emission distributions $b_k(\cdot)$. Since the expectation values are elements of the same data space as the observed data examples, the series of reference vectors is fully interpretable as a time series prototype in data space.
4 EXPERIMENTS
In order to evaluate OMMs for learning speed-invariant gesture prototypes from few data, we designed an experimental setup to investigate the following research questions:
[Figure 3: panels (a) example gesture trajectory, (b) K = 10, (c) K = 30, (d) K = 210; axes show x-, y-, z-coordinates in [-1, 1] and the number of states.]
Figure 3: This figure shows plots of a gesture trajectory together with corresponding OMM prototypes trained with different values for the number of model states K, as 3-dimensional plots (first row) and as x-, y-, z-location coordinates varying in time (second row). In column one, a randomly selected gesture from class "circle, big, front, clockwise, vertical, one time" is plotted; columns two to four show the corresponding OMM prototypes with 10, 30, and 210 model states.
1. Do the learned OMM parameters provide an interpretable prototype for a set of gestures?
2. How do these prototypes perform in terms of generalization accuracy, even if only few training data are available?
3. What influence does the number of model states in OMMs have on the classification accuracy and the computational demands?
To address the first question, we trained an OMM for each gesture class and examined the resulting model parameters. Subsequently, we let the virtual agent Vince execute the trained prototypes and captured these executions on video (cf. supplementary material).
In order to address the second question, we compared OMM classifiers to a standard classification technique in terms of classification accuracy and running times on artificially reduced training data sets. For comparison, we chose nearest neighbor classifiers based on a dynamic time warping (Chiba and Sakoe, 1978) distance function (NN_DTW). We evaluated both classifiers with subsets of the training data set, whereby the amount of training data per class ranged from one to seven examples. Additionally, we conducted classification experiments with all available training data. To obtain the final error rate, we applied the resulting classifiers to the dedicated test data set.

Figure 4: Dependency between the number of OMM states K, the emission distribution parameter σ, and the area-under-curve rate in leave-one-out evaluation on the training data set.
We evaluated the influence of the number of model states K in a similar way. We trained classifiers with a reduced number of model states K and a reduced number of training examples, and tested their generalization capabilities on the complete test data set.
For all experiments, we partitioned the data set into a training set (369 example gestures) and a test set (the remaining 151 examples). All data were normalized to zero mean and unit variance according to the training data.

Figure 5: This figure shows the accuracy (a) and average running times (b) for single trial classification of unseen gestures with the NN_DTW and OMM classifiers, depending on the number of training examples per class. For the OMM classifiers, different values of the number of model states K ∈ {10, 50, 210} are plotted. (a) Classification accuracy on the test data set. (b) Average running time for the classification of one unseen gesture.

We identified optimal OMM
hyperparameters K and σ by means of leave-one-out cross validation on the training data set. As the criterion for hyperparameter optimization, we chose the area under the receiver operating characteristic curve (AUC). We trained a model with the training data of one class (except the example left out), and then used the left-out gesture as a positive example and the gesture data from the remaining classes as negative examples for the AUC analysis. We chose eleven equidistant values for the number of OMM states, K ∈ {10, 30, ..., 230}; the set of values for the standard deviation parameter was σ ∈ {2 · 0.75^y | y = 1, ..., 10}. We initialized the iterative model estimation scheme of OMMs with randomized deterministic assignments of all training data to the model states, i.e. a randomly selected combination of model states for each training example, as sketched below. The resulting random paths are forced to follow the restrictions induced by the left-to-right model topology.
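One plausible reading of this initialization, as a minimal sketch (the exact sampling scheme is ours; the text only specifies that each training example receives a random path respecting the topology):

```python
import numpy as np

def random_monotone_path(T, K, rng=np.random.default_rng()):
    """Draw T state indices from 1..K with replacement and sort them,
    yielding a random non-decreasing path that respects the
    left-to-right topology; each training example gets its own path."""
    return np.sort(rng.integers(1, K + 1, size=T))
```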
We measured the running time in terms of single core CPU time on an Intel Xeon CPU at 2.5 GHz. The OMM and NN_DTW algorithms used in this study were implemented in the Python programming language. Time-critical parts such as the dynamic programming code were realized in the C programming language. We provide an OMM Python package as well as the complete OMM source code for download as supplementary material.
5 RESULTS AND DISCUSSION
Figure 3 shows graphical representations of a randomly selected gesture from class "circle, big, front, clockwise, vertical, one time" together with plots of corresponding OMM prototypes for different values of the number of model states K. The first row shows the data as 3-dimensional plots and the second row shows the same data as location coordinates developing over time. For the prototypes, we chose three different values for the number of states, K ∈ {10, 30, 210}. As the standard deviation parameter, we chose σ = 0.84375, the value that reached the highest AUC value of 0.95 in the leave-one-out validation.
The plots indicate a clear correspondence between the underlying gesture class and the learned OMM parameters, and it is obvious that OMMs are able to extract prototype representations of the gestures. Even a prototype with K = 10 model states reveals a trajectory that is similar to the genuine circle gesture. For K = 30, the plot of the prototype fully represents a circle gesture that, in comparison to a model with K = 210 states, differs only in length.
To underline the abstraction capacity, we executed all gesture prototypes with our virtual agent Vince. In the supplementary material, we attach a video that contains example gestures from all 48 gesture classes, together with recordings of the virtual agent performing the related gesture prototypes. Additionally, Figure 2 shows screenshots of these video recordings. The first row shows video screenshots of a human demonstrator performing a gesture during data acquisition; the screenshots in the second row show how the virtual agent Vince executes the learned OMM parameters of the matching class. In this video, the demonstrator and Vince are performing the gesture "waving head 2.5 swings".
In general, both results (the examination of the learned prototypes as well as the videos of Vince executing these prototype gestures) indicate that the speed-invariant architecture of OMMs is able to deduce essential gesture features from a set of example trajectories. However, some videos (e.g. all "come" and "surface" prototypes) suggest that using only a limited body model, i.e. the right wrist, might not be adequate to fully reproduce a gesture. For example, Vince's performance of the "come" prototype lacks the orientation of his palm. Even though the plain wrist trajectory matches the subject's wrist trajectory, the incorrectly oriented palm might make it difficult for a human user to identify the intended gesture. Presumably, a more detailed body model, as e.g. in (Bergmann and Kopp, 2009), would improve the prototype representation. In contrast, other gesture classes are sufficiently represented by wrist trajectories alone; e.g., Vince's performances of all waving-related classes are easy to comprehend.
Figure 4 illustrates the dependency between the hyperparameters (K, σ) and the leave-one-out accuracy in terms of AUC rates. The figure clearly demonstrates that the accuracy remains stable and is almost independent of the number of states K and the value of the emission distribution parameter σ. Only for values of σ < 0.2 or K = 10 does the AUC substantially decrease.
Figure 5(a) shows the performance results of our evaluation in terms of classification accuracy on the test set, depending on the number of training examples per class, for different classifiers. These include OMMs with a high number of states chosen for maximum classification accuracy (K = 210) and OMMs with a reduced number of states (K = 10, 50). In general, all classifiers are able to recognize unseen gesture trajectories with high accuracy, although the OMM classifier with 10 model states reaches substantially lower accuracy. Using all training examples, NN_DTW as well as the OMMs classify gestures with a high accuracy of 0.94, although the performance of the OMM classifier with only K = 10 is noticeably lower. The plot also shows that a reduction of the number of training examples does not substantially reduce the classification accuracy. Only for OMMs with a low number of states can a degradation be observed below three examples.
The slightly higher recognition performance of the NN_DTW classifiers comes at the cost of substantially increased computational demands. In the scenario with all available training data, OMM classifiers provide an average speed-up factor of at least 3. For a decreasing number of model states, the speed-up factor increases further. OMM classifiers with K = 50 respond in 0.14 seconds; with K = 10, OMMs classify an unseen gesture in 0.04 seconds. In comparison to the average classification times of NN_DTW, this is an acceleration of between 3 and 44 times. This allows low-latency recognition of gesture performances, which is a requirement for interaction with humans.
6 CONCLUSIONS
We applied ordered means models (OMMs) to recognize and reproduce natural gesture trajectories. The results of our classification experiments show that OMMs are able to learn gestures from multivariate time series even if only few observations are available. Furthermore, our run time measurements indicate that OMMs are well suited for low latency gesture recognition. Even though more complex models and methods might further increase the recognition performance, the response time is crucial, particularly in human-computer interaction scenarios. Here, OMMs are able to provide a suitable trade-off between accuracy and computational demands. We showed that OMMs with few model states can still reach competitive accuracy while considerably decreasing computational demands, ensuring low latency capability. The combination of abstracting and reproducing prototypical gesture trajectories, the achievable response times, and the high recognition accuracy even for small training data sets makes OMMs an ideal method for human-computer interaction.
In our ongoing research, we focus on the automatic optimization of classification in online use on continuous interaction streams. Additionally, we are working on porting the gesture tracking system to Microsoft's Kinect™. To further improve discrimination performance in supervised setups, future work in this context will include the use of Fisher kernels (Jaakkola et al., 1999), which are straightforward to derive from OMMs.
ACKNOWLEDGEMENTS
This work is supported by the German Research
Foundation (DFG) in the Center of Excellence for
Cognitive Interaction Technology (CITEC). Thomas
Lingner has been funded by the PostDoc program of
the German Academic Exchange Service (DAAD).
REFERENCES
Amit, R. and Mataric, M. (2002). Learning movement sequences from demonstration. In ICDL '02: Proceedings of the 2nd International Conference on Development and Learning, pages 203–208, Cambridge, Massachusetts. MIT Press.

Bergmann, K. and Kopp, S. (2009). GNetIc: Using Bayesian decision networks for iconic gesture generation. In Proceedings of the 9th Conference on Intelligent Virtual Agents, pages 76–89. Springer.

Calinon, S., D'halluin, F., Sauser, E., Caldwell, D., and Billard, A. (2010). Learning and reproduction of gestures by imitation. Robotics Automation Magazine, IEEE, 17(2):44–54.

Chiba, S. and Sakoe, H. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.

Garrett, D., Peterson, D., Anderson, C., and Thaut, M. (2003). Comparison of linear, nonlinear, and feature selection methods for EEG signal classification. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 11(2):141–144.

Grosshauser, T., Großekathöfer, U., and Hermann, T. (2010). New sensors and pattern recognition techniques for string instruments. In International Conference on New Interfaces for Musical Expression, NIME 2010, Sydney, Australia.

Inamura, T., Toshima, I., and Nakamura, Y. (2003). Acquiring motion elements for bidirectional computation of motion recognition and generation. In Siciliano, B. and Dario, P., editors, Experimental Robotics VIII, volume 5, pages 372–381. Springer-Verlag.

Jaakkola, T., Diekhans, M., and Haussler, D. (1999). Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pages 149–158.

Kellokumpu, V., Pietikäinen, M., and Heikkilä, J. (2005). Human activity recognition using sequences of postures. In Proceedings of the IAPR Conference on Machine Vision Applications (MVA 2005), Tsukuba Science City, Japan, pages 570–573.

Kulić, D., Takano, W., and Nakamura, Y. (2008). Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains. The International Journal of Robotics Research, 27(7):761.

Kwon, J. and Park, F. (2008). Natural movement generation using hidden Markov models and principal components. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 38(5):1184–1194.

Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.

Tolani, D., Goswami, A., and Badler, N. (2000). Real-time inverse kinematics techniques for anthropomorphic limbs. Graphical Models, 62(5):353–388.

Wilson, A. and Bobick, A. (1999). Parametric hidden Markov models for gesture recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(9):884–900.

Wöhler, N.-C., Großekathöfer, U., Dierker, A., Hanheide, M., Kopp, S., and Hermann, T. (2010). A calibration-free head gesture recognition system with online capability. In International Conference on Pattern Recognition, pages 3814–3817, Istanbul, Turkey. IEEE Computer Society.