M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment

Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro

2024

Abstract

This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment (CLA). M&M integrates audiovisual cues through a dual-pathway architecture with specialized streams for audio and video inputs. A key innovation is its cross-modality multihead attention mechanism, which fuses the two modalities for synchronized multitasking. Another notable feature is the model’s three specialized branches, each tailored to a specific cognitive load label, enabling nuanced, task-specific analysis. Although its performance is modest compared to AVCAffe’s single-task baseline, M&M demonstrates a promising framework for integrated multimodal processing. This work paves the way for future enhancements in multimodal-multitask learning systems, emphasizing the fusion of diverse data types for complex task handling.
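
The data flow described in the abstract (two modality streams, cross-modality multihead attention, three task branches) can be illustrated with a small sketch. This is not the authors' implementation: the feature sizes, the random untrained weights, the mean pooling, the bidirectional attention, the sigmoid outputs, and every function name below are assumptions chosen only to make the idea concrete with the Python standard library.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def cross_attention(queries, context, num_heads, rng):
    """Multi-head cross-attention: `queries` (one modality) attend to `context` (the other)."""
    dim = len(queries[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = dim // num_heads
    wq, wk, wv, wo = (rand_matrix(dim, dim, rng) for _ in range(4))
    q, k, v = matmul(queries, wq), matmul(context, wk), matmul(context, wv)
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        scores = matmul(qh, [list(c) for c in zip(*kh)])  # qh @ kh^T
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        heads.append(matmul(weights, vh))
    # Concatenate the heads along the feature axis, then apply the output projection.
    concat = [sum((heads[h][t] for h in range(num_heads)), []) for t in range(len(queries))]
    return matmul(concat, wo)

def mean_pool(seq):
    """Average a sequence of feature vectors over time (assumed pooling choice)."""
    return [sum(col) / len(seq) for col in zip(*seq)]

def forward(audio, video, num_heads=2, num_tasks=3, seed=0):
    """Fuse both streams, then feed the shared vector to one branch per label."""
    rng = random.Random(seed)
    a2v = cross_attention(audio, video, num_heads, rng)  # audio queries, video context
    v2a = cross_attention(video, audio, num_heads, rng)  # video queries, audio context
    fused = mean_pool(a2v) + mean_pool(v2a)              # list concatenation
    outputs = []
    for _ in range(num_tasks):
        w = rand_matrix(len(fused), 1, rng)               # hypothetical linear branch
        logit = matmul([fused], w)[0][0]
        outputs.append(1 / (1 + math.exp(-logit)))        # sigmoid score (assumption)
    return outputs

data_rng = random.Random(1)
audio = rand_matrix(5, 4, data_rng)  # 5 audio frames, 4 features each (toy sizes)
video = rand_matrix(3, 4, data_rng)  # 3 video frames, 4 features each (toy sizes)
scores = forward(audio, video)
print(scores)
```

Each branch here reads the same fused vector, which is one common way to share a representation across tasks; the paper itself should be consulted for the actual encoders, fusion order, and branch design.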


Paper Citation


in Harvard Style

Nguyen-Phuoc L., Gaboriau R., Delacroix D. and Navarro L. (2024). M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-679-8, SciTePress, pages 869-876. DOI: 10.5220/0012575100003660


in Bibtex Style

@conference{visapp24,
author={Long Nguyen-Phuoc and Renald Gaboriau and Dimitri Delacroix and Laurent Navarro},
title={M\&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2024},
pages={869-876},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012575100003660},
isbn={978-989-758-679-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment
SN - 978-989-758-679-8
AU - Nguyen-Phuoc L.
AU - Gaboriau R.
AU - Delacroix D.
AU - Navarro L.
PY - 2024
SP - 869
EP - 876
DO - 10.5220/0012575100003660
PB - SciTePress