Pure Physics-Based Hand Interaction in VR
Mohammed-Bashir Mahdi¹, Erwan Guillou¹, Alexandre Meyer¹ᵃ, Arash Habibi² and Saïda Bouakaz¹ᵇ
¹LIRIS, CNRS, Univ. Lyon, Université Claude Bernard Lyon 1, 69100 Villeurbanne, France
²ICube, CNRS, Université de Strasbourg, 300 bd Sébastien Brant, 67400 Illkirch, France
ᵃhttps://orcid.org/0000-0002-0249-1048
ᵇhttps://orcid.org/0000-0001-7091-7859
Keywords:
Virtual Reality, Interaction, Hooke’s Spring Law, Physics, Physical Interaction, Hands, Natural Interaction,
Visual Feedback.
Abstract:
Interaction in Virtual Reality is still mainly done with controllers. However, since the early 2000s, there has been a desire to interact within the virtual environment using only our hands. Pinch detection was introduced to recognise a grasping action. Since then, hand motion capture has developed considerably, to the point of being directly integrated into VR headsets, and multiple research projects have exploited this technology to build better grasping techniques. Recent works tend to bring physical hand interaction to VR; however, they use physics-based heuristics only to decide whether an object is grasped, and once it is, the motion becomes purely kinematic. In this paper, we introduce a purely physical method based on Hooke's spring law that eliminates the need for a grasping mode. Additionally, we incorporate visual feedback methods to compensate for the absence of the sense of touch. With this approach, we can lift objects, throw them, stack them, and interact with them naturally. We carried out extensive tests with several people who had no previous experience in order to validate our technique.
1 INTRODUCTION
Interacting with virtual environments has long been a hot topic; consider, for instance, how controller usage for interacting in video games on consoles or computers has evolved over the years. With the advent of new technologies such as VR headsets, the question arises again. Virtual reality offers a new way to explore a virtual environment and even to interact with it. However, most commercial applications are still built around controllers. This is likely to change as new interaction methodologies become available.
Indeed, vendors themselves (HTC, Meta) provide SDKs that exploit the headset cameras to capture hand movements (HTC, 2023; Meta, 2022). Once the hands are captured, it becomes possible to play with virtual objects without any wearable such as controllers. However, the methods proposed by the vendors are purely kinematic and rely on predefined hand poses to detect interactions (Buchmann et al., 2004).
Extensive research efforts have been directed
towards achieving physically grounded interactions
within the realm of virtual reality (Kim and Park, 2015; Höll et al., 2018; Liu et al., 2019). However,
none of these approaches are fully physics-based. In-
stead, existing approaches predominantly establish
heuristics for discerning the presence of interactions.
They have two modes of interaction: a physics mode
where an object moves following physical laws and
a grasping mode where a grasped object follows the
hand kinematics. These methods retain the grasping
paradigm established by controller-based approaches,
although with heuristics founded upon physical prin-
ciples for determining whether an object is being held
or not.
Our objective is to introduce a purely physics-
based hand interaction method that avoids any form of
grabbing mechanism altering the mode of interaction.
Additionally, since we do not use additional wearables such as haptic gloves that would provide tactile feedback, we incorporate visual feedback techniques to enhance the user's understanding of the environment.
2 RELATED WORK
This section first introduces the field of hand motion
capture, with the goal of enabling natural interactions
in virtual reality through hand movements. Then, various interaction techniques focusing on the VR domain are presented.
2.1 Hand Capture
In order to capture hand pose data, different sorts of
sensors may be used. We distinguish between ap-
proaches that capture the body and position of the
hand, and those that capture hand movements includ-
ing the fingers.
Complex and expensive solutions such as Opti-
Track (Point, 2023) are capable of tracking and pro-
viding a comprehensive body and hand skeleton but
at the cost of a cumbersome and inconvenient instal-
lation. Inertial measurement units (IMUs) can also be used; these sensors output accelerations and angular velocities. IMUs are used for
instance in PIP (Physical Inertial Poser) (Yi et al.,
2022), coupled with a neural network to infer body
joint positions. However, they reconstruct the body
pose without the fingers. In the same category of cap-
ture methodology, LoBSTr (Yang et al., 2021) and
Quest Sim (Winkler et al., 2022) use the headset and
the controllers position to extrapolate the whole body
pose. Quest Sim uses a physics engine alongside a
neural network in order to find the torques needed
to move the different joints. With such an approach,
they have a physical character which allows for phys-
ical interaction. With the increasing popularity of machine learning, multiple motion capture methods using only one color camera have emerged (Cao et al., 2019; Zhang et al., 2020; Mehta et al., 2020). For example, ControllerPose (Ahuja et al., 2022) proposes mounting wide-angle cameras on the controllers, looking towards the user, in order to capture their movement. As the name suggests, the user has to hold the controllers, so finger poses are missing. Geng et al. (2021) make the pipeline lighter by using a single RGB camera, as does XNect (Mehta et al., 2020).
Still, previous techniques do not provide the hand
skeleton. Sensors such as Leap Motion (ultraleap,
2023) can provide good hand tracking. The device
is equipped with several infrared camera sensors and
LEDs that capture hand and finger movements in real
time. UmeTrack (Han et al., 2022) and MEgATrack (Han et al., 2020) use multiple headset cameras and focus on capturing hand poses; both make use of the four cameras of the Oculus headset. There are also solutions using a single conventional camera. Liu et al. (2022) rely on architectures borrowed from deep learning: a transformer with an attention network helps contextualize the joint rotations and positions with respect to the pose of the arm
and forearm. Google proposes MediaPipe (Zhang et al., 2020), a modern computer vision framework that includes a real-time hand capture system based on a single color camera. Its architecture builds on BlazeFace (Bazarevsky et al., 2019), a neural architecture developed for face detection.
In our use case, the emphasis is on achieving hand
placement with finger motion. We use the body skele-
ton only for precise hand positioning. Machine learn-
ing techniques for hand pose recognition are conve-
nient because they often do not require the use of
complex additional equipment to accomplish the task.
This attribute makes them attractive because they can
be easily deployed in a variety of conditions, includ-
ing in small rooms or at home. Some headsets, such
as the HTC Vive Pro 2, already feature deep learning-
based hand pose reconstruction, eliminating the need
for additional hardware. For the interaction, our aim
is to perform hand capture without any accessory and
to combine this capture with a completely physical in-
teraction mode, with no need to switch to a kinematic
mode.
2.2 Interaction in VR
In this section, we focus on full virtual interaction us-
ing hands to interact with virtual objects.
One of the most popular ways to interact in VR is
done with controllers such as the Oculus Touch or the HTC Vive Pro 2 controllers. Even if the interaction is constrained by the controllers, researchers tend to make it feel more natural by implementing grasping mechanisms (Han et al., 2023; Fernández et al., 2022; Oprea
et al., 2019) that are at least visually more convincing.
(Oprea et al., 2019) use physics to determine hand and
finger positions when holding an object, depending on
how the user approaches it in the virtual scene. (Han
et al., 2023) use a deep neural network to compute
torques applied on hand joints in order to move the
hand more realistically and grab the targeted object
using only a physical approach. This shows that, even with controllers alone, interaction can be subdivided into two categories: geometry-based and physics-based.
Another family of approaches relies on the use of
cameras, often integrated into a modern VR headset,
to capture hand movements and determine interac-
tions within a preconfigured set of available actions.
For instance, Puppeteer (Hung et al., 2022) uses hand motion to control character motion in video games. Regarding hand-object interaction, Buchmann et al. (2004) proposed an early method that introduced the pinch
movement to trigger a grasp. We still see its legacy today, as most headset vendors still use this gesture. However, Schäfer et al. (2022) let the user define their own grasping gesture, as the pinch gesture is neither natural nor well suited to every object the user wants to grab. They showed that letting users define their own gesture increases how natural the interaction feels to them.
Finally, pursuing the goal of making interactions more realistic and natural, much research has been directed towards physics-based hand interactions (Höll et al., 2018; Kim and Park, 2015; Kaminski et al., 2022; Liu et al., 2019). MuJoCo HAPTIX (Kumar and Todorov, 2015) is purely physical, but at the cost of a heavy setup: a haptic glove for force feedback and hand skeleton retrieval, paired with an OptiTrack system to position the hand in the world. The spring model from Borst and Indugula (2006) is interesting as it also performs physics-based interactions, but it still uses CyberGloves and works on hand joint torques. We aim to make the setup simpler, with no additional wearable, and thus adapt the spring model. In the other listed methods, the interaction is physical only as long as the hand is not grasping any object; otherwise, object movement becomes kinematic, which makes these techniques hybrid.
Kim and Park (2015) select 1000 vertices from a hand mesh and attach a sphere collider to each of them. These colliders are used to detect collisions with the objects in the scene. They found that the system was not stable for fast-paced movements, so when grasping is detected they switch the moving object to kinematic mode to make it follow the hand. The same principle of having objects physically interact with the hand and switching to geometric interaction when grasping is detected is also found in (Liu et al., 2019).
In (Höll et al., 2018), the physical approach is only used for the grasping heuristic. It relies on a Coulomb friction model, not to move the object, but to determine whether the force applied to the object satisfies the friction condition; if so, the object goes into a grabbed mode. Moreover, one can see the hand being teleported because collision detection happens earlier than expected: the object collider is enlarged in order to prevent physics engine instabilities. Research on making interaction fully physical in VR is still ongoing, as shown by the work of Kaminski et al. (2022). Unfortunately, they report being able only to slide objects around; grasping is not possible. When interacting in VR, one must also choose a proper hand representation in order to improve the user's experience with respect to the target task. Argelaguet et al. (2016) show that using a simple hand representation increases the feeling of embodiment. Moreover, Dewez et al. (2023) discussed the interest of using a ghost hand representation when handling objects.
As described, current methods rely on a grasp-
ing mechanism. Indeed, most methods are hybrid,
employing a physics heuristic to determine whether
the object is grasped or not; if it is, they switch to a
kinematic movement. Furthermore, these systems fix
the position of the hand’s fingers and palm, allowing
only the wrist to move. Consequently, this limitation
makes it impossible to perform more complex move-
ments. This motivates a purely physical interaction method, working the way interaction does in real life, in order to feel more natural.
3 PHYSICAL INTERACTION
Our goal is to design a method for purely physical
interaction, enabling the exploration of new move-
ments. To accomplish this, we implement a force
model based on principles of spring physics. We sug-
gest leaving the hand bare, without the need for a
controller or glove to capture it. Our method relies
only on a hand motion capture system. Therefore, we
must define new feedback techniques to address the
absence of tactile feedback, a crucial aspect of real-
life interactions with objects.
3.1 Our Approach
Figure 1: Scheme of both hand representations. The left
one, in red, is the physical hand. The right one, in blue, is
the ghost hand.
The best way to achieve completely natural interaction is to embed the whole virtual world, including fingers and hands, in a physical simulation. We base our
approach on a dual representation of the hand: one
physical hand and one ghost hand as shown in Fig-
ure 1. The physical hand interacts physically with the
environment. The ghost hand takes the data received
from the motion capture system, and its joints are po-
sitioned according to the position data received, re-
gardless of the virtual environment. The ghost hand
drives the physical hand. The physical hand can touch an object and move it, or, if the object is too heavy for example, be blocked by it; it cannot pass through objects. The ghost hand, on the contrary, acts like a ghost: it goes through objects and has no physical interaction with its environment.
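To make the dual representation concrete, the sketch below shows one possible, engine-agnostic way to organise the per-joint data (a hypothetical Python structure for illustration only; our actual implementation relies on Unity rigid bodies, see Section 4):

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class GhostJoint:
    """Kinematic target: overwritten every frame by the hand motion capture."""
    position: np.ndarray = field(default_factory=lambda: np.zeros(3))
    prev_position: np.ndarray = field(default_factory=lambda: np.zeros(3))


@dataclass
class PhysicalJoint:
    """Simulated rigid body: moved only by forces, collides with the scene."""
    position: np.ndarray = field(default_factory=lambda: np.zeros(3))
    velocity: np.ndarray = field(default_factory=lambda: np.zeros(3))


@dataclass
class HandJoint:
    """One joint of the dual hand: the ghost joint drives the physical one."""
    ghost: GhostJoint = field(default_factory=GhostJoint)
    physical: PhysicalJoint = field(default_factory=PhysicalJoint)
```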
The key part of the model is how the joints of the
physical hand are moved. The ghost joints will act
like targets for the physical ones. The physical joints try to reach their targets, but they are moved using forces and the physics engine, so they may be blocked if, for example, they encounter an obstacle on their way. Our model is related to the Proportional Derivative (PD) controller approach (Yin and Yin, 2020), except that a PD controller usually outputs torques, whereas our model applies forces directly at joint positions. To build the spring model of our interaction method, we first refer to Hooke's law (Halliday et al., 2013), which states that the force of an elastic spring is the product of its elongation and a stiffness factor, as shown in equation (1).
F_e = k_e × x                                (1)

F_e is the resulting elastic force, k_e the rigidity constant (with k_e > 0) and x the displacement of the spring. The spring's rest length is zero.
This first term F_e attracts the joint position towards its target. To complete the spring model, we add a second term so that the velocity of the joint also targets the velocity of the target joint. This force acting on the derivative is computed by (2).

F_d = k_d × v                                (2)
Here F_d is the damping force, k_d the gain of this force (with k_d > 0) and v the difference between the target velocity and the current joint velocity.
Finally, the total spring force F_s is the sum of the two previous forces (3).

F_s = F_e + F_d                              (3)
As described earlier, the applied force follows a spring model. The x of equation (1) is the difference between the new ghost joint position and the physical joint position; the spring thus seeks a length of zero. The v of equation (2) is the velocity difference between both hand representations. Finally, we add the two forces and obtain the total spring force as in (3).
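Written out with these definitions substituted (our reading of equations (1)-(3), with p and v the positions and velocities of the ghost (g) and physical (p) joints), the force applied to one joint is:

F_s = k_e × (p_g − p_p) + k_d × (v_g − v_p),    with k_e > 0 and k_d > 0.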
Algorithm 1 shows how we implemented the model to compute the spring force applied to the physical joints. During each iteration of the physics engine, this algorithm is performed on each hand joint.

Data: p_p, p_g, k_e, Δt, k_d
Result: Added force to the physical joint.
  d ← p_g − p_p;
  f_e ← k_e × d;
  v_g ← (p_g − p_prev_g) / Δt;
  f_d ← k_d × (v_g − v_p);
  rb.AddForce(f_e + f_d);
  p_prev_g ← p_g;
Algorithm 1: Computing the force.

Here d is the displacement used to compute the elastic force f_e, with p_p the position of the physical joint and p_g the position of the ghost joint. We compute the speed v_g of the ghost joint by dividing the difference between its current position and its previous position p_prev_g by the duration Δt of one iteration of the physics engine. After computing the derivative force (with v_p the speed of the physical joint, provided directly by the physics engine in our case), we sum the two forces to obtain the total spring force, and we add this force to the rigid body rb of the physical joint in order to make it move.
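For readers who prefer code, the following is a minimal sketch of Algorithm 1 (Python for illustration only; the names are hypothetical and the actual implementation runs inside Unity, where the returned force is handed to rb.AddForce):

```python
def spring_force_step(p_p, v_p, p_g, p_prev_g, k_e, k_d, dt):
    """One physics-engine iteration of Algorithm 1 for a single hand joint.

    p_p, v_p  : position and velocity of the physical (simulated) joint
    p_g       : current capture-driven position of the ghost joint
    p_prev_g  : ghost joint position at the previous physics step
    Positions and velocities can be plain floats (one axis) or numpy arrays.
    Returns the force to hand over to the physics engine.
    """
    # Elastic term: pull the physical joint towards its ghost target.
    f_e = k_e * (p_g - p_p)

    # Finite-difference estimate of the ghost joint velocity.
    v_g = (p_g - p_prev_g) / dt

    # Damping term: drive the joint velocity towards the ghost velocity.
    f_d = k_d * (v_g - v_p)

    return f_e + f_d
```

The caller is expected to apply the returned force to the joint's rigid body and to store p_g as p_prev_g for the next iteration, exactly as in Algorithm 1.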
Figure 2: Scheme of how the spring forces (grey spring) are
applied on the physical joint (in red) in order to move an
object (in brown) with the ghost hand (in blue) not being
stopped by the object.
The schemes in Figure 2 show how the forces are applied. After the force is computed for each joint, it is added to the rigid body of the corresponding joint. If the force is high enough, it generates the amount of friction between the joints and the object required to keep the object from sliding out of the fingers. The physics engine handles the friction computation by itself, as well as the force transfer between the joints and the object. Moreover, the force direction adapts to the movement: for example, if the captured joints in the new frame are positioned higher than the current physical joints, the force direction follows the vector from the physical joints to the captured ones.
3.2 Visual Feedback
When we grasp an object in real life, we know we are in contact with it thanks to the sense of touch. Relying only on vision makes the task harder: we only learn that the occluded fingers are touching the object when we see the cup move, or at least when ripples appear in the coffee. To deal with this, one could use a haptic glove as suggested in (Kumar and Todorov, 2015). However, since our goal is to remove wearables as much as possible (controllers, gloves), we believe these problems can be mitigated using visual feedback alone (Prachyabrued and Borst, 2014; Vosinakis and Koutsabasis, 2018; Ahmed et al., 2021).
We introduce several visual notifications. First, when a joint of the ghost hand comes into contact with an object, the corresponding physical joint turns light blue, and it switches back to its original color when the contact is lost (the object falls, or the joint no longer touches it). At the same time, the touched object becomes translucent. This emphasizes the contact information, and it also lets the user see through the object and thus know whether the hand joints originally occluded by the object are in contact or not.
As mentioned previously, we use a double representation of the hand, and we keep both visible. Seeing the ghost hand helps the user infer the amount of pressure they are applying with the joints in contact with the object. Indeed, as described in Section 3.1, the ghost joint penetrates the object, and the distance between the ghost joint and the physical joint determines the amount of force applied. By seeing both joints, the user can therefore judge this distance and estimate how much pressure to apply.
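As an illustration of how this feedback could be wired up (hypothetical names; in practice the logic hangs on the engine's collision enter/exit callbacks and material properties):

```python
CONTACT_TINT = (0.6, 0.8, 1.0)   # light blue shown while a joint touches an object


def on_contact_changed(joint_material, object_material, in_contact, base_color):
    """Tint the physical joint and make the touched object translucent."""
    joint_material["color"] = CONTACT_TINT if in_contact else base_color
    object_material["alpha"] = 0.4 if in_contact else 1.0


def ghost_penetration(p_physical, p_ghost):
    """Distance between ghost and physical joint: the visual cue for pressure."""
    return sum((g - p) ** 2 for g, p in zip(p_ghost, p_physical)) ** 0.5
```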
Figure 3: Scheme of the visual feedback. Left: no contact. Right: contact between the hand and an object. The scheme is reduced to the interaction of a single joint.
Figure 3 illustrates how the visual feedback works: the physical joint changes color, the object becomes transparent, and the ghost joint remains visible so that its level of penetration into the object can be seen.
4 RESULTS
To implement our model, we used the Unity engine and its built-in physics engine, PhysX. The elastic gain k_e is set to 1000 and the derivative gain k_d to 20; both values were determined empirically and fixed to the most stable values found. The VR headset used is the HTC Vive Pro 2, and we use its SDK (HTC, 2023) to perform the hand motion capture. Everything ran on a computer equipped with an Intel Xeon CPU (Xeon E5-1603 at 2.80 GHz), a GTX 1080 GPU with 8 GB of VRAM and 16 GB of RAM. We color the joints red for the left hand and green for the right one.
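As a rough sanity check on these gains (assuming Unity's default SI units, i.e. positions in metres and forces in newtons), typical tracking offsets produce forces of a few newtons per joint:

```python
k_e, k_d = 1000.0, 20.0      # gains used in our experiments
offset = 0.01                # ghost joint 1 cm beyond the physical joint
rel_speed = 0.5              # ghost moving 0.5 m/s faster than the joint

f_elastic = k_e * offset     # 10 N pulling the joint towards its target
f_damping = k_d * rel_speed  # 10 N driving it towards the target speed
print(f_elastic, f_damping)  # 10.0 10.0
```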
For the testing phase, we designed four different
scenes that require different skills:
Throw: the user must throw a cube at a pyramid
in order to destroy it.
Sphere: the user must flick a sphere to make it fly
into a predefined area.
Placement: the user must take a cube and place it into a tilted hole in a wall.
Stack: the user must make a stack with three
cubes.
To validate our model, twelve subjects tested all the scenes. They come from different backgrounds; most had never used VR, or only once. When they had used VR, it was to play video games with controllers or to watch a movie. All of them stated that it was their first time interacting with a virtual environment in VR using only their hands, without any controller.
To compare users' performances, we recorded a reference time established by an additional, experienced user. Table 1 shows the ratio of each novice subject's completion time over the reference completion time (to recover the actual time of subject i, multiply the ratio S_i by the reference time R; for example, S3's throw time is 0.85 × 6.8 s ≈ 5.8 s). One can see that R was beaten by a small margin three times, in three different exercises: S3 in throw by 1 s, S12 in sphere by 0.7 s and S11 in placement by 0.4 s. Moreover, 79% of all trials were successful, with a 100% success rate for the sphere scene. The average times in Table 2 show that between 30 and 110 seconds are required to perform a task without previous training. Finally, the medians in Table 3 show that half of the users were able to perform each task within a minute or less, with the throw and sphere tasks both completed in less than 30 seconds.
Table 1: Ratio of the completion time of each subject over the reference time (S_i / R). FAIL when the subject did not complete the experiment. S_i: subject i. R: the reference user's time, in seconds. T: throw, S: sphere, P: placement, St: stack.

       S1     S2     S3     S4    S5     S6    S7     S8     S9     S10    S11   S12   R
T      FAIL   3.84   0.85   5.65  FAIL   FAIL  FAIL   FAIL   27.24  1.14   3.14  5.19  6.8
S      11.15  16.61  19.51  5.23  5.84   3.05  4.44   17.03  10.47  11.23  6.41  0.77  3.2
P      17.50  3.64   8.98   7.97  23.29  1.55  FAIL   8.84   2.84   FAIL   0.96  4.74  8.6
St     2.68   1.14   FAIL   8.27  FAIL   1.42  12.75  FAIL   2.19   6.02   1.84  1.01  28.3
Table 2: Average times (in seconds), variance and standard deviation, based on the measures of Table 1.

            AVG     Variance   Standard deviation
Throw       45.7    3940.9     62.8
Sphere      29.5    363.0      19.1
Placement   68.8    3859.0     62.1
Stack       110.6   13191.9    114.9
Table 3: Quartiles and median (in seconds) of the measures from Table 1.

            1st quartile   Median   3rd quartile
Throw       14.6           26.1     36.8
Sphere      15.9           26.7     39.8
Placement   26.1           54.5     76.6
Stack       40.0           61.9     170.1
Furthermore, the reference times for throw, sphere and placement are low (respectively 6.8 s, 3.2 s and 8.6 s). For the stack scene, the times are higher because we had to cope with the hand motion capture, which tends to freeze. This problem was pointed out by every user, but they still stated that the interaction itself feels natural.
Figure 4: Joints and objects changing color when interacting with each other.
Figure 5 shows that it is possible to lift objects with our method. It is worth noting that an object such as a cube can be lifted even though it has no handle that would keep it from sliding: we are able to cage the object with the hand to lift it properly. Finally, we can move the hand around without the cube falling, and put it wherever we want.
When holding an object, if we move fast enough and release it, we can throw it: it has gained enough momentum not to fall right in front of us. One can throw an object into others, as shown in Figure 6. We can also adjust the force of the throw by adjusting the speed of our movement, as in real life, which is only feasible with a pure
Figure 5: Lifting a cube to make a stack. Top left: preparing
the hand. Top right: grabbing and lifting the cube. Bottom:
opening the hand to let the cube on the stack.
Figure 6: Throwing an object.
Figure 7: Punching a stack of cubes.
physical approach. Moreover, when trying to put the
cube in a tilted hole, the cube will follow the rotation
of the hand naturally in order to be placed correctly.
Since there is no grasping mode and everything is physical, it is possible to punch objects and make them fly all over the place, as shown in Figure 7. As we are not limited to predefined controller actions or to grasping detection methods, we are free to interact naturally with virtual environments.
Figure 4 shows how the visual feedback works: the joints in contact with the cube change color, and the object itself becomes transparent. The same effect is also visible in Figures 5 and 6.
5 DISCUSSION
The results obtained are promising; nevertheless, there are still areas where different approaches could be tried. In its current state, the system requires a learning phase before the user can exploit the full potential of manipulating objects in VR. As haptic feedback is lacking, the user must learn to manipulate objects without feeling anything with their hands. Further work on the feedback could help cope with this problem and reduce the adaptation time.
Several directions could be investigated. Since we do not want to add wearables such as gloves, we rely strongly on visual feedback. One possible improvement is to make the object the user intends to interact with transparent before it is touched, for instance by enlarging the object's touch detection area without affecting its physics, so that the user can see through it and check the part of the hand located behind it.
6 CONCLUSION
We proposed a method that extends the paradigm of
physics-based hand manipulation of objects in VR by
completely eliminating the notion of kinematic state.
Indeed, we eliminate all sorts of grabbing mecha-
nisms which switch the interaction from physics to
kinematic. In addition, our pipeline avoids all additional accessories such as haptic gloves. To
compensate for the absence of haptic feedback, we
introduced visual feedback to help the user deduce
the missing tactile information. Finally, the results
showed that it is possible to grab objects and even
push them as we would do in real life. The object-
throwing scene would not be possible with a kine-
matic approach: the user would have to press a button to indicate the release, and the interaction would then differ from what we do in real life. A
hybrid approach would require the application to re-
ally detect the release at the right moment, at the
risk of having an unrealistic trajectory. Moreover, in
the scene where the cube has to be placed in a hole
in a sloping wall, the kinematic approach would re-
quire an uncomfortable turn of the wrist and would
surely be less easy than with our physical approach
where the cube is oriented against the surface thanks
to physics.
Our test scenes show that a purely physical ap-
proach is feasible, and that novice users are able to
perform tasks quite quickly, which would not be so
obvious with kinematic or even hybrid interaction.
Nevertheless, the model could be improved. The learning curve to adapt to this interaction system is steep, mainly because hand motion capture is sometimes unstable. Some fine-tuning could still improve grasping. Also, the proposed visual feedback needs to be explored further, and other kinds of feedback, such as audio, could be investigated.
For future work, it could be interesting to focus on feedback development by exploring the use of sound. The program could emit a sound when the hand approaches an object, similar to the parking sensors of modern cars. However, in scenes with multiple closely spaced objects, reaching for one of them could trigger numerous sounds playing simultaneously. Alternatively, a sound could be emitted only when a hand joint comes into contact with an object. Interestingly, such audio feedback can work alongside the visual feedback methods. Moreover, audio feedback does not require any additional wearable: VR headsets tend to include earphones and, if not, any headphones or the computer's built-in speakers would do the job.
REFERENCES
Ahmed, N., Lataifeh, M., and Junejo, I. (2021). Visual
pseudo haptics for a dynamic squeeze / grab gesture
in immersive virtual reality. In 2021 IEEE 2nd In-
ternational Conference on Human-Machine Systems
(ICHMS), pages 1–4.
Ahuja, K., Shen, V., Fang, C. M., Riopelle, N., Kong, A.,
and Harrison, C. (2022). Controllerpose: Inside-out
body capture with vr controller cameras. In Proceed-
ings of the 2022 CHI Conference on Human Factors
in Computing Systems, CHI ’22, New York, NY, USA.
Association for Computing Machinery.
Argelaguet, F., Hoyet, L., Trico, M., and Lécuyer, A. (2016). The role of interaction in virtual embodiment: Effects of the virtual hand representation. IEEE Virtual Reality.
Bazarevsky, V., Kartynnik, Y., Vakunov, A., Raveendran,
K., and Grundmann, M. (2019). Blazeface: Sub-
millisecond neural face detection on mobile gpus.
ArXiv, abs/1907.05047.
Borst, C. W. and Indugula, A. P. (2006). A spring model for
whole-hand virtual grasping. Presence: Teleoperators
& Virtual Environments, 15:47–61.
Buchmann, V., Violich, S., Billinghurst, M., and Cockburn,
A. (2004). Fingartips: gesture based direct manipula-
tion in augmented reality. In Conference on Computer
Graphics and Interactive Techniques in Australasia
and Southeast Asia.
Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., and
Sheikh, Y. A. (2019). Openpose: Realtime multi-
person 2d pose estimation using part affinity fields.
IEEE Transactions on Pattern Analysis and Machine
Intelligence.
Dewez, D., Hoyet, L., Lécuyer, A., and Argelaguet, F. (2023). Do you need another hand? Investigating dual body representations during anisomorphic 3d manipulation. IEEE Transactions on Visualization and Computer Graphics.
Fernández, U. J., Elizondo, S., Iriarte, N., Morales, R., Ortiz, A., Marichal, S., Ardaiz, O., and Marzo, A. (2022). A multi-object grasp technique for placement of objects in virtual reality. Applied Sciences, 12(9).
Geng, Z., Sun, K., Xiao, B., Zhang, Z., and Wang, J.
(2021). Bottom-up human pose estimation via dis-
entangled keypoint regression. 2021 IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 14671–14681.
Halliday, D., Resnick, R., and Walker, J. (2013). Funda-
mentals of physics. John Wiley & Sons.
Han, D., Lee, R., Kim, K., and Kang, H. (2023). Vr-
handnet: A visually and physically plausible hand ma-
nipulation system in virtual reality. IEEE Transactions
on Visualization and Computer Graphics, pages 1–12.
Han, S., Liu, B., Cabezas, R., Twigg, C. D., Zhang, P.,
Petkau, J., Yu, T.-H., Tai, C.-J., Akbay, M., Wang,
Z., Nitzan, A., Dong, G., Ye, Y., Tao, L., Wan, C., and
Wang, R. (2020). Megatrack: Monochrome egocen-
tric articulated hand-tracking for virtual reality. ACM
Trans. Graph., 39(4).
Han, S., Wu, P.-C., Zhang, Y., Liu, B., Zhang, L., Wang,
Z., Si, W., Zhang, P., Cai, Y., Hodan, T., Cabezas, R.,
Tran, L., Akbay, M., Yu, T.-H., Keskin, C., and Wang,
R. (2022). Umetrack: Unified multi-view end-to-end
hand tracking for vr. In SIGGRAPH Asia 2022 Con-
ference Papers, SA ’22, New York, NY, USA. Asso-
ciation for Computing Machinery.
Höll, M., Oberweger, M., Arth, C., and Lepetit, V. (2018). Efficient physics-based implementation for realistic hand-object interaction in virtual reality. In Proc. of Conference on Virtual Reality and 3D User Interfaces.
HTC (2023). Vive hand tracking sdk. Accessed: 2023-10-
03.
Hung, C.-W., Chang, R.-C., Chen, H.-S., Liang, C. H.,
Chan, L., and Chen, B.-Y. (2022). Puppeteer: Explor-
ing intuitive hand gestures and upper-body postures
for manipulating human avatar actions. In Proceed-
ings of the 28th ACM Symposium on Virtual Reality
Software and Technology, VRST ’22, New York, NY,
USA. Association for Computing Machinery.
Kaminski, E., Kepplier, S., and Israel, J. H. (2022). Physics-
based hand-object-interaction for immersive environ-
ments. Mensch und Computer.
Kim, J.-S. and Park, J. M. (2015). Physics-based hand
interaction with virtual objects. 2015 IEEE Inter-
national Conference on Robotics and Automation
(ICRA), pages 3814–3819.
Kumar, V. and Todorov, E. (2015). Mujoco haptix: A virtual
reality system for hand manipulation. 2015 IEEE-RAS
15th International Conference on Humanoid Robots
(Humanoids), pages 657–663.
Liu, H., Zhang, Z., Xie, X., Zhu, Y., Liu, Y., Wan, Y.,
and Zhu, S.-C. (2019). Physics-based hand-object-
interaction for immersive environments. IEEE In-
ternational Conference on Robotics and Automation
(ICRA).
Liu, S., Wu, W.-X., Wu, J., and Lin, Y. (2022). Spatial-
temporal parallel transformer for arm-hand dynamic
estimation. 2022 IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages
24091–24100.
Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib,
M. A., Fua, P., Seidel, H.-P., Rhodin, H., Pons-Moll,
G., and Theobalt, C. (2020). Xnect: Real-time multi-
person 3d motion capture with a single rgb camera. In
International Conference on Computer Graphics and
Interactive Techniques.
Meta (2022). Interaction sdk overview. Accessed: 2023-
10-03.
Oprea, S., Martinez-Gonzalez, P., Garcia-Garcia, A.,
Castro-Vargas, J. A., Orts-Escolano, S., and Garcia-
Rodriguez, J. (2019). A visually realistic grasping sys-
tem for object manipulation and interaction in virtual
reality environments. Comput. Graph., 83(C):77–86.
Point, N. (2023). Optitrack. Accessed: 2023-10-03.
Prachyabrued, M. and Borst, C. W. (2014). Visual feedback
for virtual grasping. 2014 IEEE Symposium on 3D
User Interfaces (3DUI), pages 19–26.
Schäfer, A., Reis, G., and Stricker, D. (2022). The gesture authoring space: Authoring customised hand gestures for grasping virtual objects in immersive virtual environments. Mensch und Computer, pages 1–12.
ultraleap (2023). ultraleap.com. Accessed: 2023-07-23.
Vosinakis, S. and Koutsabasis, P. (2018). Evaluation of
visual feedback techniques for virtual grasping with
bare hands using leap motion and oculus rift. Virtual
Reality, 22:47–62.
Winkler, A., Won, J., and Ye, Y. (2022). Questsim: Human
motion tracking from sparse sensors with simulated
avatars. In SIGGRAPH Asia 2022 Conference Papers,
SA ’22, New York, NY, USA. Association for Com-
puting Machinery.
Yang, D., Kim, D., and Lee, S.-H. (2021). Lobstr: Real-
time lower-body pose prediction from sparse upper-
body tracking signals. Computer Graphics Forum, 40.
Yi, X., Zhou, Y., Habermann, M., Shimada, S., Golyanik,
V., Theobalt, C., and Xu, F. (2022). Physical inertial
poser (pip): Physics-aware real-time human motion
tracking from sparse inertial sensors. In IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion (CVPR).
Yin, Z. and Yin, K. (2020). Linear time stable pd con-
trollers for physics-based character animation. Com-
puter Graphics Forum, 39(8):191–200.
Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A.,
Sung, G., Chang, C.-L., and Grundmann, M. (2020).
Mediapipe hands: On-device real-time hand tracking.
ArXiv, abs/2006.10214.