Subjective Baggage-Weight Estimation from Gait: Can You Estimate How Heavy the Person Feels?

Masaya Mizuno¹, Yasutomo Kawanishi²,¹, Tomohiro Fujita², Daisuke Deguchi¹ and Hiroshi Murase¹
¹Graduate School of Informatics, Nagoya University, Chikusa-ku, Nagoya, Japan
²RIKEN Guardian Robot Project, Souraku-gun, Kyoto, Japan
Keywords: Subjective Baggage-Weight, G2SW (Gait to Subjective Weight), Graph Convolution, Multi-Task Learning.
Abstract: We propose a new computer vision problem of subjective baggage-weight estimation by defining the term subjective weight as how heavy the person feels. We propose a method named G2SW (Gait to Subjective Weight), which is based on the assumption that cues of the subjective weight appear in the human gait, described by a 3D skeleton sequence. The method uses 3D locations and velocities of body joints as input and estimates the subjective weight using a Graph Convolutional Network. It also estimates human body weight as a sub-task, based on the assumption that the strength of a person depends on body weight. For the evaluation, we built a dataset for subjective baggage-weight estimation, consisting of 3D skeleton sequences with subjective weight annotations. We confirmed that the subjective weight can be estimated from a human gait, and that the sub-task of body weight estimation improves the performance of the subjective weight estimation.
1 INTRODUCTION
Robots have been widely developed for various applications. In particular, in daily environments, various kinds of human support robots have been proposed (Yamamoto et al., 2019; Yuguchi et al., 2022). A robot that works in our living space should not only have a function of environmental recognition but also provide proactive support. In this study, we focus on such support provided by a robot to a person carrying heavy baggage.
Robots that can carry baggage should be developed to support a person carrying heavy baggage. Still, it is also important to develop a function that determines whether to support a person. If the robot tries to support a person who does not need the support, the person may get irritated with the robot. In that case, the person will not accept the robot. To avoid this situation, we focus on functions that determine the need for support and provide the support at an appropriate time.
To make such decisions, the robot needs to estimate how heavy the person feels. In this study, we define subjective weight as how heavy a person feels. The greater the subjective weight, the more burdensome the person finds carrying the baggage. To quantify the subjective weight, we employ the New Borg Scale (Gunnar, 1982), which is a measure of subjective load during exercise. By estimating the subjective weight, the robot can decide whether the person needs support. To make the problem setting clearer and simpler, we assume a situation where one person is walking with one piece of baggage, as shown in Fig. 1.

Figure 1: Estimation of the subjective baggage-weight from human gait.

Figure 2: Example of gaits with baggage of the same physical weight (both 15 kg) but different subjective weights (5 and 10).

Since the subjective weight is related to the actual weight of the baggage (which we call the physical weight), the physical weight can be used as a clue for subjective
weight estimation. However, even if the baggage looks the same, the physical weight varies depending on what is inside. Therefore, it is difficult to estimate the physical weight from the appearance of the baggage itself. Additionally, even if the physical weight is the same, the subjective weight varies from person to person. For example, a piece of baggage of a given physical weight may feel heavy to a physically weak person such as a child, while it may feel light to a physically strong person such as a muscular person.
When we humans see someone walking with baggage, we can probably guess how heavy the person feels carrying the baggage from the walking behavior, that is, the human gait. Hyung et al. (Hyung et al., 2016) reported the relationship between the physical weight of baggage and human gait, in which the pelvic tilt increases as the physical weight of the baggage increases. The paper also shows that the relationship between pelvic tilt and physical weight varies from person to person, depending on his/her body weight. As shown in Fig. 2, even when the physical baggage-weights are the same, the gaits differ when the subjective weights differ. In this study, we propose a method that estimates the subjective weight by focusing on the human gait.
A human gait can be observed in the temporal variations of the human skeleton (Kato et al., 2017). These temporal variations are often represented by a skeleton sequence, which has been used in gait recognition, action recognition, etc. (Teepe et al., 2021; Liu et al., 2016; Yan et al., 2018; Liu et al., 2020; Nishida et al., 2020; Temuroglu et al., 2020). These methods use graph representations of human skeleton sequences to realize the recognition tasks. In this study, we use 3D human skeleton sequences during walking as a representation of human gait.
Based on the above, we propose G2SW (Gait to Subjective Weight), a method for estimating the subjective weight from the human gait represented by a skeleton sequence (Fig. 1). We modify a graph-based action recognition method to perform the subjective weight estimation.
We focus on the fact that the gait is a repetition of two steps (one walking cycle) and regard the 3D body joint locations in one walking cycle as representing the gait. Since graph-based action recognition methods usually accept a fixed-length input, we need to normalize walking cycles to a fixed length by resampling frames in each cycle. However, this results in the loss of information on gait speed, since all gait cycles then have the same length. To address this problem, we introduce velocities as additional features for each body joint. This locations-and-velocities representation retains velocity information while having a fixed length.
As noted above, the subjective baggage-weight is affected by the body weight of the person. To take this into account in the estimation, the proposed method simultaneously estimates the body weight of the person as a sub-task in the training phase. Through this sub-task, the network is trained to consider body weight in the subjective baggage-weight estimation.
The main contributions of this work are summarized as follows:
- We propose a new computer vision problem of subjective baggage-weight estimation of a piece of baggage, by defining the subjective baggage-weight as how heavy a person feels, quantified by the New Borg Scale (Gunnar, 1982).
- We propose G2SW, a method for estimating the subjective baggage-weight from the human gait. The method uses velocity information as an additional feature, and body weight estimation is added as a sub-task to account for differences between persons.
- We built a novel dataset of human skeleton sequences with subjective baggage-weight annotations.
The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 presents the proposed method for estimating the subjective weight. Section 4 presents the experimental evaluation. Finally, section 5 concludes the paper and discusses future issues.
2 RELATED WORK
2.1 Baggage Weight Estimation
Yamaguchi et al. (Yamaguchi et al., 2020) proposed a method to estimate baggage weight from body sway, the slight swaying of a person's body even when standing upright and stationary. The method estimates the baggage weight by focusing on the characteristic that the heavier the baggage, the greater the body sway. Because this method requires observing a stationary standing person from a bird's-eye view, it is not directly applicable to a robotic application.
Oji et al. (Oji et al., 2018) proposed a weight estimation method based on lifting motion. The method estimates the weight of an object from the hand motion, focusing on the fact that the hand motion changes depending on the weight of the object being lifted. However, since it requires an object-lifting motion, its applicable situations are limited.
2.2 Action Recognition by a Body Skeleton Sequence
Long Short-Term Memory (LSTM), which can capture temporal information, is often used for action recognition from a skeleton sequence (Liu et al., 2016; Liu et al., 2017; Ullah et al., 2018; Majd and Safabakhsh, 2020).
In recent years, the Graph Convolutional Network (GCN), which consists of graph convolution layers, has become the mainstream of action recognition. It regards a skeleton as a graph: generally, each body joint is represented as a vertex and each limb as an edge. ST-GCN (Yan et al., 2018) is an action recognition method that considers the skeleton sequence as a temporally-connected graph. This method extracts spatial features by applying graph convolution to each frame, followed by temporal convolution along each body joint's temporal sequence to extract temporal features. Owing to this structure, the method can consider both the skeleton structure and its motion in the action recognition task.
There are several methods that extend ST-GCN. One direction of extension is multi-scale modeling, represented by MS-G3D (Liu et al., 2020). The main component of this method is the G3D module, a graph convolution counterpart of the I3D module (Carreira and Zisserman, 2017). The module consists of graph convolutions over a spatio-temporal graph corresponding to a skeleton sequence. The method further extends the module to multiple scales using multiple graphs with different multi-hop connections. The multi-hop connections allow the network to directly connect body joints that are skeletally distant from each other but important for the recognition task.
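As a minimal sketch of the kind of spatial graph convolution these skeleton methods build on, consider the layer below. It implements the standard normalized aggregation X' = D^{-1/2}(A + I)D^{-1/2} X W over the joint graph; real ST-GCN and MS-G3D layers additionally use temporal convolutions, partitioned or multi-hop adjacencies, and residual connections, which are omitted here.

```python
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    """Minimal spatial graph convolution over body joints (illustrative sketch only)."""

    def __init__(self, in_channels: int, out_channels: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalize A + I so each joint averages over its neighbors.
        a_hat = adjacency + torch.eye(adjacency.size(0))
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        self.register_buffer("a_norm", d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :])
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels)
        x = torch.einsum("vw,btwc->btvc", self.a_norm, x)  # mix neighboring joints
        return self.linear(x)                              # per-joint feature transform
```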
3 PROPOSED METHOD
3.1 Overview
This paper proposes a method for estimating the subjective baggage-weight from a human gait, named G2SW (Gait to Subjective Weight). In this study, we use a 3D human skeleton sequence as a representation of human gait. A 3D human skeleton sequence is a set of (X, Y, Z) coordinates of joint locations in the world coordinate system. Here, $(X_t^j, Y_t^j, Z_t^j)$ denotes the location of the j-th body joint in the t-th frame. Additionally, noting that walking is a repetition of two steps, we define two steps as one cycle of walking and use the 3D human skeleton sequence $S_i$ of one walking cycle as input.

Figure 3: The training and estimation processes of the proposed method.
In this section, we describe the details of G2SW, a regression-based method for estimating the subjective weight from the human gait represented by a 3D human skeleton sequence of one walking cycle. Figure 3 shows the flowchart of the training and estimation steps of the proposed method. As pre-processing, the i-th 3D human skeleton sequence $S_i$ is converted into a location and velocity graph $\hat{S}_i$. Then the location and velocity graph is input to the subjective weight estimator (G2SW) to estimate the subjective weight. To train G2SW, we employ multi-task learning in which body weight estimation is a sub-task.
In the following, we first define the subjective weight in section 3.2. Then, the pre-processing of the input is explained in section 3.3. The network architecture and its multi-task learning are explained in section 3.4.
3.2 Definition
In this study, we define subjective weight as how heavy a person feels, and we employ the New Borg Scale (Gunnar, 1982) to quantify it. Originally, the New Borg Scale quantifies how hard an activity feels, as shown in Figure 5. We use the scale to quantify how heavy a person feels, and the proposed method G2SW estimates the value of the New Borg Scale. In the New Borg Scale, there are scale values that do not have subjective descriptions. In the subjective weight assessment, participants may choose these values if they feel that the subjective weight lies between the adjacent subjective descriptions.
Figure 4: The network architecture of the proposed G2SW.
Scale  Description
10     Very, very hard
9
8
7      Very hard
6
5      Hard
4      Somewhat hard
3      Moderate
2      Light
1      Very light
0.5    Very, very light
0      Nothing at all

Figure 5: New Borg Scale.
3.3 Pre-Processing
In the proposed method, a 3D human skeleton sequence of one walking cycle is assumed to be cropped beforehand, based on the frames where the positions of the left and right legs are most distant.
Since the 3D human skeleton sequence of one walking cycle captured by a sensor is described in the world coordinate system, it varies according to the location and orientation of the person. The recognition should be robust to the location and orientation. Also, the lengths of walking cycles differ. For accurate recognition, these variations should be normalized.
If frames of a fixed length are simply sampled from a walking cycle, walking speed information is lost. Therefore, we enhance the input skeleton sequences by adding the velocity information of each body joint.
In summary, the pre-processing consists of i) location and orientation normalization, ii) velocity calculation, and iii) frame sampling for a fixed length. After the pre-processing, an input 3D skeleton sequence is normalized and enhanced. We name the output of the pre-processing the location and velocity graph.
I) Location and Orientation Normalization. First, the location of the skeleton in each frame is normalized by aligning the input skeletons so that the position of the pelvis in each frame becomes the origin (0, 0, 0). Also, the orientation is normalized by rotating the skeleton so that the locations of both hips lie on the X-Z plane, where the horizontal plane is X-Y. After this process, the location of the j-th body joint in the t-th frame is $(X_t^j, Y_t^j, Z_t^j)$.
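As a concrete illustration, here is a minimal NumPy sketch of this normalization, assuming the sequence is an array of shape (T, J, 3) and hypothetical joint indices PELVIS, L_HIP, and R_HIP (the actual indexing depends on the sensor SDK). The rotation is applied about the vertical (Z) axis so that the hip line gets a zero Y component, per the coordinate convention above.

```python
import numpy as np

PELVIS, L_HIP, R_HIP = 0, 18, 22  # hypothetical joint indices; depend on the sensor SDK

def normalize(seq: np.ndarray) -> np.ndarray:
    """Normalize location and orientation of a (T, J, 3) skeleton sequence.
    Z is treated as the vertical axis, so the horizontal plane is X-Y."""
    # Location: translate each frame so the pelvis is at the origin.
    seq = seq - seq[:, PELVIS:PELVIS + 1, :]
    # Orientation: rotate each frame about the vertical (Z) axis so the
    # left-right hip direction has zero Y component, i.e. lies in the X-Z plane.
    hip = seq[:, R_HIP, :] - seq[:, L_HIP, :]
    theta = -np.arctan2(hip[:, 1], hip[:, 0])
    c, s = np.cos(theta), np.sin(theta)
    rot = np.stack([np.stack([c, -s], -1), np.stack([s, c], -1)], axis=-2)  # (T, 2, 2)
    seq[..., :2] = np.einsum("tij,tvj->tvi", rot, seq[..., :2])
    return seq
```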
II) Velocity Calculation. The velocity is defined here as the difference between the locations of each body joint in adjacent frames:

$$(\dot{X}_t^j, \dot{Y}_t^j, \dot{Z}_t^j) = (X_t^j, Y_t^j, Z_t^j) - (X_{t-1}^j, Y_{t-1}^j, Z_{t-1}^j). \quad (1)$$

The calculated velocity of each body joint is appended to the corresponding body joint so that the j-th body joint in the t-th frame has a 6-dimensional feature $(X_t^j, Y_t^j, Z_t^j, \dot{X}_t^j, \dot{Y}_t^j, \dot{Z}_t^j)$.
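A short sketch of Eq. (1) and the feature concatenation in NumPy; setting the first frame's velocity to zero is an assumption, as the paper does not specify the boundary handling.

```python
import numpy as np

def add_velocity(seq: np.ndarray) -> np.ndarray:
    """(T, J, 3) joint locations -> (T, J, 6) location-and-velocity features (Eq. 1)."""
    vel = np.zeros_like(seq)
    vel[1:] = seq[1:] - seq[:-1]  # difference between adjacent frames; frame 0 set to zero
    return np.concatenate([seq, vel], axis=-1)
```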
III) Frame Sampling for a Fixed Length. Since the length of one walking cycle differs among 3D human skeleton sequences, the length should be fixed before inputting them into a graph convolutional network. M frames are sampled from the original sequence at approximately even intervals by interpolating between adjacent frames.
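A sketch of this fixed-length resampling by linear interpolation, using M = 50 as in the experiments (section 4.2); the per-channel use of np.interp is one straightforward way to realize it.

```python
import numpy as np

def resample(feat: np.ndarray, m: int = 50) -> np.ndarray:
    """Resample a (T, J, C) sequence to (m, J, C) frames at even intervals,
    linearly interpolating between adjacent frames."""
    t_src = np.linspace(0.0, 1.0, num=len(feat))
    t_dst = np.linspace(0.0, 1.0, num=m)
    flat = feat.reshape(len(feat), -1)  # (T, J*C)
    out = np.stack(
        [np.interp(t_dst, t_src, flat[:, k]) for k in range(flat.shape[1])], axis=1
    )
    return out.reshape(m, *feat.shape[1:])
```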
3.4 The Proposed G2SW and Its Multi-Task Training
In the proposed G2SW, the subjective weight is estimated from the location and velocity graph $\hat{S}_i$, which is the pre-processed output of the i-th 3D skeleton sequence of one walking cycle $S_i$.
The architecture of the proposed G2SW is shown in Fig. 4. In the proposed G2SW, a feature representation is calculated using a GCN-based feature extractor f as

$$p_i = f(\hat{S}_i; \theta_f, A), \quad (2)$$

where A denotes an adjacency matrix that defines the adjacency of human body joints. The function f consists of multiple graph convolution layers. In this study, two consecutive blocks of the MS-G3D module (Liu et al., 2020) are used as the feature extraction function f, whose parameters are represented by $\theta_f$. After the MS-G3D blocks, the graph-shaped output is reshaped into a 1-dimensional vector $p_i$.
Then, the subjective weight is calculated using fully-connected layers g. At the same time, the body weight is also calculated using fully-connected layers h:

$$w_i^s = g(p_i; \theta_g), \quad (3)$$
$$w_i^b = h(p_i; \theta_h). \quad (4)$$

These two functions g and h each consist of four fully-connected layers, whose parameters are $\theta_g$ and $\theta_h$, respectively. Leaky ReLU (Maas et al., 2013) is used as the activation function for the hidden layers.
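The two-head structure can be sketched as below in PyTorch. The feature extractor f is stubbed as any module producing a flattenable feature, since the MS-G3D blocks themselves are not reproduced here, and the hidden width 256 is a hypothetical choice rather than the paper's setting.

```python
import torch
import torch.nn as nn

def make_head(in_dim: int, hidden: int = 256) -> nn.Sequential:
    """Four fully-connected layers with Leaky ReLU on the hidden layers,
    regressing a single scalar; hidden width 256 is a hypothetical choice."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.LeakyReLU(),
        nn.Linear(hidden, hidden), nn.LeakyReLU(),
        nn.Linear(hidden, hidden), nn.LeakyReLU(),
        nn.Linear(hidden, 1),
    )

class G2SW(nn.Module):
    """Shared GCN feature extractor f with two regression heads g (main) and h (sub)."""

    def __init__(self, feature_extractor: nn.Module, feat_dim: int):
        super().__init__()
        self.f = feature_extractor    # e.g., stacked MS-G3D blocks (not shown here)
        self.g = make_head(feat_dim)  # main task: subjective weight w_s
        self.h = make_head(feat_dim)  # sub-task: body weight w_b

    def forward(self, s_hat: torch.Tensor):
        p = self.f(s_hat).flatten(start_dim=1)  # p_i: flattened graph features
        return self.g(p), self.h(p)
```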
Given a batch of $\hat{S}_i$ and the corresponding ground truth of subjective weight and body weight $(\hat{w}_i^s, \hat{w}_i^b)$, the network is trained in a multi-task learning manner. The parameters $\theta_f$, $\theta_g$, and $\theta_h$ are updated using backpropagation to minimize the loss L, which consists of the squared-error subjective weight loss $L_s$ and body weight loss $L_b$:

$$L = \lambda_s L_s + \lambda_b L_b, \quad (5)$$
$$L_s = \sum_i (w_i^s - \hat{w}_i^s)^2, \quad (6)$$
$$L_b = \sum_i (w_i^b - \hat{w}_i^b)^2, \quad (7)$$

where $\lambda_s$ and $\lambda_b$ are the weights of the losses. Here, the ranges of the subjective weight and the body weight are normalized to a similar scale.
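Eqs. (5) to (7) amount to the following loss function, sketched here with placeholder lambda values rather than the paper's settings; targets are assumed pre-normalized to a similar scale as described above.

```python
import torch

def multitask_loss(w_s, w_b, w_s_gt, w_b_gt, lam_s: float = 1.0, lam_b: float = 1.0):
    """L = lam_s * L_s + lam_b * L_b with squared-error terms (Eqs. 5-7)."""
    loss_s = ((w_s - w_s_gt) ** 2).sum()    # Eq. (6): subjective weight loss
    loss_b = ((w_b - w_b_gt) ** 2).sum()    # Eq. (7): body weight loss
    return lam_s * loss_s + lam_b * loss_b  # Eq. (5): combined multi-task loss
```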
4 EVALUATION
4.1 Dataset
Because there is no publicly available dataset consisting of 3D skeleton sequences with annotations of subjective baggage-weights, we captured our own dataset for the evaluation. This section describes the details of our dataset.
In this study, we assume a situation where one person is walking with a piece of baggage. The 3D human skeleton sequences were collected by observing each participant walking with a piece of baggage using a Microsoft Azure Kinect sensor installed at a height of 2 m. The frame rate was 30 fps. Fig. 6 shows captured images and the 3D human skeleton sequences for each type of baggage.
Since subjective weight may be affected by body size and gender, the set of participants should not be biased in body size or gender. We recruited 30 participants (15 males and 15 females) of diverse heights and weights for the dataset. Figure 7 shows the distribution of the participants' heights and weights.
We prepared five types of baggage: a handbag, a shoulder bag, a backpack, a cardboard box, and a shopping basket. We also prepared six variations of the content weight: 0 kg, 5 kg, 7.5 kg, 10 kg, 12.5 kg, and 15 kg. The subjective weights were annotated through a questionnaire survey of the participants themselves. In the questionnaire, participants scored how hard they felt after walking with each piece of baggage according to the New Borg Scale (Gunnar, 1982) (Figure 5). To prevent the participants from knowing the actual physical weight, the contents of the baggage were hidden from them.
In each session, a participant walked with one piece of prepared baggage, and a short break was inserted after each session to avoid carry-over effects from the previous session. In this experiment, 30 patterns (five baggage types × six weights) of 3D human skeleton sequences were captured for each participant.
All the participants consented to the use and dis-
closure of their captured data for research purposes. It
should be noted that the Ethics Committee at Nagoya
University has approved this experiment.
4.2 Evaluation Protocol and Metrics
In this experiment, we performed 5-fold cross-validation on the dataset, using 5 of the 30 participants for evaluation and the remaining 25 for training in each fold.
Because the total number of pre-processed 3D human skeleton sequences in the dataset was only 24,015 walking cycles, data augmentation was applied. From an input 3D skeleton sequence of one walking cycle, three frames are randomly dropped. We performed this ten times for each walking cycle, thus increasing the data volume to 240,150 walking cycles. In the experiment, the frame length of a location and velocity graph is set to M = 50 after this data augmentation.
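A minimal sketch of this frame-dropping augmentation, assuming the cycle is an array of shape (T, J, C); the resampling to M = 50 frames is applied afterwards, as described above.

```python
import numpy as np

def augment_by_frame_drop(cycle, n_drop=3, n_copies=10, seed=None):
    """Drop n_drop random frames from a (T, J, C) walking cycle, n_copies times,
    reproducing the 10x augmentation described in the text."""
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        keep = np.sort(rng.choice(len(cycle), size=len(cycle) - n_drop, replace=False))
        copies.append(cycle[keep])
    return copies
```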
We evaluated the G2SW performance on the subjective weight estimation for each type of baggage. As an evaluation metric, we employ the mean absolute error (MAE) of the estimation results:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |w_i^s - \hat{w}_i^s|, \quad (8)$$

where N represents the number of 3D skeleton sequences.
Figure 6: Examples of captured images and 3D human skeletons for each type of baggage (shopping basket, handbag, shoulder bag, backpack, and cardboard box, with various physical and subjective weights).
Figure 7: The distribution of the participants' heights (cm) and weights (kg) (blue: male, red: female).
Figure 8: A plot of body weight estimation (blue points: estimates for each person, blue line: linear fit of the estimates, red line: ground truth; the unit of body weight is kg).
In terms of the application of deciding whether to support a person, we also evaluated the performance of estimation within a tolerance error threshold, named Tolerance Accuracy (TA):

$$TA_\tau = 100 \cdot \frac{NW_\tau}{N}, \quad (9)$$

where $\tau$ is the tolerance error threshold, and $NW_\tau$ represents the number of samples whose estimation error is within $\tau$.
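Both metrics, Eqs. (8) and (9), reduce to a few lines of NumPy, sketched here over arrays of predictions and ground-truth subjective weights.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error over N sequences (Eq. 8)."""
    return float(np.mean(np.abs(pred - gt)))

def tolerance_accuracy(pred, gt, tau):
    """Percentage of estimates whose absolute error is within tau (Eq. 9)."""
    return 100.0 * float(np.mean(np.abs(pred - gt) <= tau))
```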
4.3 Preliminary Experiment: Body Weight Estimation
In this study, body weight estimation was performed as a sub-task of G2SW. However, if the accuracy of body weight estimation from skeleton features is low, it is inappropriate to use it as a sub-task. Therefore, we first confirmed whether it is possible to estimate the body weight from skeleton features. Figure 8 plots the mean of each person's body weight estimates on the test data together with the linear fitting result. It can be seen that the body weight estimates are correlated with the true values. From this finding, we can say that the body weight can be estimated from a 3D skeleton sequence of one walking cycle.

Figure 9: Example of estimation results (the skeleton sequences represent gaits; ground truth 7.0, estimated 6.98; ground truth 10, estimated 4.95).
4.4 Main Experiment: Subjective Baggage-Weight Estimation
Table 1 shows the mean absolute errors of the subjective weight estimation and the Tolerance Accuracy for τ = 1, 2, and 3. Figure 9 shows an example of the estimation results. From Table 1, we confirmed that G2SW can estimate subjective weights with a mean absolute error of 1.33 on the New Borg Scale, averaged over all baggage types. G2SW also achieved Tolerance Accuracies of 50.6% ($TA_1$), 74.7% ($TA_2$), and 88.4% ($TA_3$), averaged over all baggage types.
4.5 Discussion
Through the experiments, we confirmed that G2SW
can estimate subjective weight.
Table 1: G2SW's evaluation of subjective baggage-weight estimation (subjective weight: 0–10).

Type of Baggage   MAE   TA_1   TA_2   TA_3
Handbag           1.42  46.6%  71.1%  87.0%
Shoulder bag      1.09  57.8%  81.6%  93.6%
Backpack          1.38  49.5%  73.8%  87.1%
Cardboard box     1.44  47.0%  72.5%  86.7%
Shopping basket   1.32  52.2%  74.8%  88.0%
Average           1.33  50.6%  74.7%  88.4%
Table 2: Comparison of subjective weight estimation accuracy with and without velocity information.

                  MAE   TA_1   TA_2   TA_3
With velocity     1.23  50.6%  74.7%  88.4%
Without velocity  1.42  47.8%  72.2%  86.6%
4.5.1 Difference Among Baggage Types
Table 1 shows that the subjective weight estimation for carrying a shoulder bag is more accurate than that for carrying other types of baggage. This is because the degree of postural change caused by the subjective weight tends to be larger when carrying a shoulder bag than when carrying other types of baggage.
One possible cause of greater postural change due to subjective weight is gait stability. The more unstable the gait, the more likely the posture is to be changed by external factors such as the weight of the baggage. A handbag, a shoulder bag, and a shopping basket are held with only one shoulder or one hand, making the gait unstable, while backpacks and cardboard boxes are held with both shoulders or both hands, making walking relatively more stable.
Another cause of greater postural change due to subjective weight is the distance between the baggage and the human center of gravity. When the distance is large, the person needs to change his/her posture more significantly to maintain balance than when the distance is small. Among the types of baggage that cause an unstable gait, the shoulder bag is the farthest from the human center of gravity. Therefore, the person needs to change his/her posture more when carrying a shoulder bag than when carrying the other types of baggage.
4.5.2 Effectiveness of the Velocity Feature
In the method, we proposed the location and velocity graph to preserve velocity information in the fixed sequence length of one walking cycle. To confirm the effectiveness of the additional velocity features as input, we compared our method with a method that does not use velocity information. Table 2 shows a comparison of the subjective weight estimation accuracy with and without velocity information. The table confirms that the accuracy of subjective weight estimation was improved by using the velocity information as additional input.
4.5.3 Challenges for Practical Use
In the proposed method G2SW, the estimation is performed on a sequence of one walking cycle cropped from the sequence captured during walking; however, in reality, several walking cycles are obtained from a captured walking sequence, so multiple estimation results are obtained for a single sequence. In the future, it will be necessary to consider how to integrate these multiple estimation results.
5 CONCLUSION
In this study, we proposed a new computer vision problem of subjective baggage-weight estimation when a person is walking with a piece of baggage, and established G2SW, an estimation method for the subjective weight. To quantify the subjective weight, we defined it using the New Borg Scale. Since the subjective weight affects the human gait, we proposed a subjective weight estimation method based on the human gait, represented by a 3D human skeleton sequence. The method uses locations and velocities of body joints as input and estimates the human body weight as a sub-task, based on the assumption that the strength of a person depends on body weight.
Future work includes further improving the gait representation to describe the motion of skeletons more effectively.
ACKNOWLEDGEMENTS
This work was supported in part by the JSPS Grant-
in-Aid for Scientific Research (17H00745).
REFERENCES
Carreira, J. and Zisserman, A. (2017). Quo Vadis, action
recognition? a new model and the kinetics dataset. In
Proceedings of the 2017 IEEE Conference on Com-
puter Vision and Pattern Recognition, pages 4724–
4733, Honolulu, HI. IEEE.
Gunnar, B. (1982). Psychophysical bases of perceived ex-
ertion. Medicine & Science in Sports & Exercise,
14(5):377–381.
Hyung, E.-J., Lee, H.-O., and Kwon, Y.-J. (2016). Influ-
ence of load and carrying method on gait, specifically
pelvic movement. Journal of Physical Therapy Sci-
ence, 28(7):2059–2062.
Kato, H., Hirayama, T., Kawanishi, Y., Doman, K., Ide, I.,
Deguchi, D., and Murase, H. (2017). Toward describ-
ing human gaits by onomatopoeias. In Proceedings of
the 16th IEEE International Conference on Computer
Vision Workshops, pages 1573–1580. IEEE.
Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016).
Spatio-temporal LSTM with trust gates for 3D human
action recognition. In Leibe, B., Matas, J., Sebe, N.,
and Welling, M., editors, Computer Vision – ECCV 2016, volume 9907, pages 816–833. Springer International Publishing.
Liu, J., Wang, G., Hu, P., Duan, L.-Y., and Kot, A. C.
(2017). Global context-aware attention LSTM net-
works for 3D action recognition. In Proceedings of
the 2017 IEEE Conference on Computer Vision and
Pattern Recognition, pages 3671–3680, Honolulu, HI.
IEEE.
Liu, Z., Zhang, H., Chen, Z., Wang, Z., and Ouyang, W.
(2020). Disentangling and unifying graph convolu-
tions for skeleton-based action recognition. In Pro-
ceedings of the 2020 IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pages 140–149,
Seattle, WA, USA. IEEE.
Maas, A. L., Hannun, A. Y., and Ng, A. Y. (2013). Rectifier
nonlinearities improve neural network acoustic mod-
els. In Proceedings of the 30th International Confer-
ence on Machine Learning Workshop, pages 1–6.
Majd, M. and Safabakhsh, R. (2020). Correlational convo-
lutional LSTM for human action recognition. Neuro-
computing, 396:224–229.
Nishida, N., Kawanishi, Y., Deguchi, D., Ide, I., Murase,
H., and Piao, J. (2020). SOANets: Encoder-Decoder
based skeleton orientation alignment network for
white cane user recognition from 2D human skeleton
sequence. In Proceedings of the 15th International
Joint Conference on Computer Vision, Imaging and
Computer Graphics Theory and Applications, pages
435–443, Valletta, Malta. SCITEPRESS - Science and
Technology Publications.
Oji, T., Makino, Y., and Shinoda, H. (2018). Weight esti-
mation of lifted object from body motions using neu-
ral network. In Prattichizzo, D., Shinoda, H., Tan,
H. Z., Ruffaldi, E., and Frisoli, A., editors, Proceed-
ings of the Haptics: Science, Technology, and Appli-
cations, volume 10894, pages 3–13. Springer Interna-
tional Publishing, Cham.
Teepe, T., Khan, A., Gilg, J., Herzog, F., Hörmann, S., and
Rigoll, G. (2021). GaitGraph: Graph convolutional
network for skeleton-based gait recognition. In Pro-
ceedings of the 2021 IEEE International Conference
on Image Processing, pages 2314–2318.
Temuroglu, O., Kawanishi, Y., Deguchi, D., Hirayama, T.,
Ide, I., Murase, H., Iwasaki, M., and Tsukada, A.
(2020). Occlusion-aware skeleton trajectory represen-
tation for abnormal behavior detection. In Ohyama,
W. and Jung, S. K., editors, Proceedings of the
2020 International Workshop on Frontiers of Com-
puter Vision, volume 1212, pages 108–121, Singa-
pore. Springer Singapore.
Ullah, A., Ahmad, J., Muhammad, K., Sajjad, M., and Baik,
S. W. (2018). Action recognition in video sequences
using deep bi-directional LSTM with CNN features.
IEEE Access, 6:1155–1166.
Yamaguchi, Y., Kamitani, T., Nishiyama, M., Iwai, Y., and
Kushida, D. (2020). Extracting features of body sway
for baggage weight classification. In Proceedings of
the 9th IEEE Global Conference on Consumer Elec-
tronics, pages 345–348. IEEE.
Yamamoto, T., Terada, K., Ochiai, A., Saito, F., Asahara,
Y., and Murase, K. (2019). Development of human
support robot as the research platform of a domestic
mobile manipulator. ROBOMECH Journal, 6(1):1–
15.
Yan, S., Xiong, Y., and Lin, D. (2018). Spatial temporal
graph convolutional networks for skeleton-based ac-
tion recognition. In Proceedings of the 32nd AAAI
Conference on Artificial Intelligence, pages 7444–
7452.
Yuguchi, A., Kawano, S., Yoshino, K., Ishi, C. T., Kawan-
ishi, Y., Nakamura, Y., Minato, T., Saito, Y., and Mi-
noh, M. (2022). Butsukusa: A conversational mo-
bile robot describing its own observations and internal
states. In Proceedings of the 2022 17th ACM/IEEE In-
ternational Conference on Human-Robot Interaction,
pages 1114–1118, Sapporo, Japan. IEEE.