
directional sound placement within browser environments. However, the integration of these tools with dynamic object tracking from physical space is still limited. Early experimental systems demonstrate the feasibility of combining WebXR, Web Audio, and WebAssembly (WASM) to synchronize object pose with real-time sound field updates (Tomasetti et al., 2023; Boem et al., 2024).
These proposals point to promising directions, but
lack the performance guarantees and hardware com-
patibility required for widespread adoption.
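As a concrete illustration of the pose-to-audio synchronization pattern these systems share, consider the sketch below. It is ours, not code from the cited works: it drives a Web Audio PannerNode from a WebXR pose inside the frame loop, where `objectSpace` is a hypothetical handle to whatever XRSpace tracks the physical object (e.g., an anchor space).

```javascript
// Minimal sketch: drive a Web Audio PannerNode from a WebXR pose.
// `objectSpace` is a placeholder for the XRSpace tracking the object.
const audioCtx = new AudioContext();
const panner = new PannerNode(audioCtx, { panningModel: 'HRTF' });
const osc = new OscillatorNode(audioCtx, { frequency: 440 });
osc.connect(panner).connect(audioCtx.destination);
osc.start();

async function startSync(objectSpace) {
  const session = await navigator.xr.requestSession('immersive-ar');
  const refSpace = await session.requestReferenceSpace('local');

  session.requestAnimationFrame(function onXRFrame(time, frame) {
    const pose = frame.getPose(objectSpace, refSpace);
    if (pose) {
      const { x, y, z } = pose.transform.position;
      const t = audioCtx.currentTime;
      // Smoothed parameter updates avoid audible "zipper" artifacts.
      panner.positionX.setTargetAtTime(x, t, 0.02);
      panner.positionY.setTargetAtTime(y, t, 0.02);
      panner.positionZ.setTargetAtTime(z, t, 0.02);
    }
    session.requestAnimationFrame(onXRFrame);
  });
}
```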
7 CONCLUSION
Spatial audio is a key element of immersive AR, allowing users to experience sound that responds to real-world motion and location. This work introduced a Web browser-based system that converts the rotation and position of a real object into spatial audio feedback within an AR setting.
By combining object tracking with three-dimensional sound rendering, both implemented in JavaScript, we showed that interactive audio features can run efficiently on modern Web platforms. Performance tests on mobile Web browsers confirmed that the system delivers low latency and smooth, efficient audio processing.
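For context, browsers expose standard AudioContext properties that make such latency observable at runtime. The snippet below is an illustrative introspection only, not the benchmark procedure reported in this paper.

```javascript
// Illustrative latency introspection via standard AudioContext
// properties (not the paper's benchmark harness).
const ctx = new AudioContext({ latencyHint: 'interactive' });

// baseLatency: delay added by the audio graph itself;
// outputLatency: estimated delay to the audio hardware
// (not implemented by every browser, hence the fallback).
const graphMs = ctx.baseLatency * 1000;
const outputMs = (ctx.outputLatency ?? 0) * 1000;
console.log(`graph: ${graphMs.toFixed(1)} ms, output: ${outputMs.toFixed(1)} ms`);
```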
To accomplish the research goals, we described object tagging in Section 2, sound mapping for rotation in Section 3, depth estimation in Section 4, and location mapping for position in Section 5. Our results confirm that responsive, real-time audio interaction is achievable directly in the Web browser without external plug-ins or native code.
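As a compact illustration of the rotation and position mappings summarized above, the sketch below maps an object's yaw onto stereo panning and its distance onto gain. The constants and mapping functions here are placeholders; the actual designs are those given in Sections 3 and 5.

```javascript
// Placeholder mapping sketch: yaw -> stereo pan, distance -> gain.
// The paper's actual mappings (Sections 3 and 5) differ in detail.
const ctx = new AudioContext();
const gain = new GainNode(ctx);
const pan = new StereoPannerNode(ctx);
const osc = new OscillatorNode(ctx, { frequency: 440 });
osc.connect(gain).connect(pan).connect(ctx.destination);
osc.start();

function update(yawRad, distanceM) {
  const t = ctx.currentTime;
  // Yaw in [-pi, pi] mapped linearly onto the stereo field [-1, 1].
  pan.pan.setTargetAtTime(yawRad / Math.PI, t, 0.02);
  // Inverse-distance attenuation, clamped to avoid division by zero.
  gain.gain.setTargetAtTime(1 / Math.max(distanceM, 1), t, 0.02);
}
```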
Future studies will build on this foundation by integrating dynamic environmental soundscapes and investigating innovative approaches to perceptual audio design (Schiller et al., 2024; Batat, 2024). Drawing inspiration from previous studies such as Bhowmik (2024), Munoz (2025), and Peng et al. (2025), we also aim to incorporate richer physical object representations and more diverse interaction modalities to further extend the potential of the Web Audio API across virtual, augmented, and mixed reality experiences.
In addition, a further research perspective is usability evaluation with blind participants to validate the system's effectiveness in real contexts, together with the exploration of binaural audio based on HRTFs (head-related transfer functions) (Cheng and Wakefield, 2001) to enhance spatial perception.
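A natural starting point for such a study is the generic HRTF mode that the Web Audio API already exposes on PannerNode, as sketched below; individualized binaural rendering would require additional processing beyond this built-in model.

```javascript
// Switching PannerNode to the browser's built-in (generic) HRTF model.
const ctx = new AudioContext();
const panner = new PannerNode(ctx, {
  panningModel: 'HRTF',      // binaural rendering instead of equal-power
  distanceModel: 'inverse',  // standard distance attenuation
  refDistance: 1,
});
// Individualized HRTFs would instead require convolution with measured
// head-related impulse responses (e.g., via a ConvolverNode).
```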
REFERENCES
Batat, W. (2024). Phygital customer experience in the meta-
verse: A study of consumer sensory perception of
sight, touch, sound, scent, and taste. Journal of Re-
tailing and Consumer Services, 78:103786.
Bhowmik, A. K. (2024). Virtual and augmented reality: Hu-
man sensory-perceptual requirements and trends for
immersive spatial computing experiences. Journal of
the Society for Information Display, 32(8):605–646.
Birkl, R., Ranftl, R., and Koltun, V. (2023). Boost-
ing monocular depth estimation models to high-
resolution via content-aware upsampling. arXiv
preprint arXiv:2306.05423.
Blauert, J. (1997). Spatial hearing: The psychophysics of
human sound localization. MIT Press.
Boem, A., Dziwis, D., Tomasetti, M., Etezazi, S., and
Turchet, L. (2024). “It Takes Two”—Shared and Col-
laborative Virtual Musical Instruments in the Musi-
cal Metaverse. In 2024 IEEE 5th International Sym-
posium on the Internet of Sounds (IS2), pages 1–10.
IEEE.
Chelladurai, P. K., Li, Z., Weber, M., Oh, T., and Peiris,
R. L. (2024). SoundHapticVR: head-based spatial
haptic feedback for accessible sounds in virtual reality
for deaf and hard of hearing users. In Proceedings of
the 26th International ACM SIGACCESS Conference
on Computers and Accessibility, pages 1–17.
Cheng, C. I. and Wakefield, G. H. (2001). Introduction to head-related transfer functions (HRTFs): Representations of HRTFs in time, frequency, and space. Journal of the Audio Engineering Society, 49(4):231–249.
Cho, H., Wang, A., Kartik, D., Xie, E. L., Yan, Y., and
Lindlbauer, D. (2024). Auptimize: Optimal Place-
ment of Spatial Audio Cues for Extended Reality. In
Proceedings of the 37th Annual ACM Symposium on
User Interface Software and Technology, pages 1–14.
Fotopoulou, E., Sagnowski, K., Prebeck, K., Chakraborty, M., Medicherla, S., and Döhla, S. (2024). Use-Cases of the new 3GPP Immersive Voice and Audio Services (IVAS) Codec and a Web Demo Implementation. In 2024 IEEE 5th International Symposium on the Internet of Sounds (IS2), pages 1–6. IEEE.
Hirway, A., Qiao, Y., and Murray, N. (2024). A Quality of Experience and Visual Attention Evaluation for 360° videos with non-spatial and spatial audio. ACM Transactions on Multimedia Computing, Communications and Applications, 20(9):1–20.
Matuszewski, B. and Rottier, O. (2023). The Web Au-
dio API as a standardized interface beyond Web
browsers. Journal of the Audio Engineering Society,
71(11):790–801.
McArthur, A., Van Tonder, C., Gaston-Bird, L., and Knight-
Hill, A. (2021). A survey of 3d audio through the
browser: practitioner perspectives. In 2021 Immer-
sive and 3D Audio: from Architecture to Automotive
(I3DA), pages 1–10. IEEE.
Montero, A., Zarraonandia, T., Diaz, P., and Aedo, I.
(2019). Designing and implementing interactive and