Data-Efficient Transformer-Based 3D Object Detection

Aidana Nurakhmetova, Jean Lahoud, Hisham Cholakkal

2023

Abstract

Recent 3D detection models rely on the Transformer architecture due to its natural ability to capture global context features. One such model is 3DETR, a pure transformer-based network designed to generate 3D boxes on indoor dataset scans. Transformers are generally known to be data-hungry, yet data collection and annotation in 3D are more challenging than in 2D. Our goal is therefore to study the data-hungriness of the 3DETR-m model and propose a solution that improves its data efficiency. Our methodology is based on the observation that PointNet++ provides more locally aggregated features, which can support 3DETR-m predictions when training data is scarce. We propose three methods of backbone fusion, based on addition (Fusion I), concatenation (Fusion II), and replacement (Fusion III). We use pre-trained weights from the Group-free model trained on the SUN RGB-D dataset. The proposed 3DETR-m outperforms the original model at all data proportions (10%, 25%, 50%, 75%, and 100%). On the full dataset, we improve on the published 3DETR-m results by 1.46% in mAP@25 and 2.46% in mAP@50. We believe our research can provide new insights into the data-hungriness of 3D transformer detectors and encourage the use of pre-trained models in 3D as one path toward data efficiency.
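The three backbone fusion variants named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes that the 3DETR-m backbone and a pre-trained PointNet++ backbone each produce per-point features of the same width C for the same N sampled points. The function `fuse_features` and the projection matrix `proj` (standing in for a learned layer that maps concatenated features back to C channels) are hypothetical names used only for illustration.

```python
import numpy as np


def fuse_features(detr_feats, pn2_feats, mode, proj=None):
    """Fuse per-point features from two backbones (illustrative sketch only).

    Assumptions (not from the paper): both inputs have shape (N, C) and
    row i of each array describes the same sampled point.
    mode: "add" (Fusion I), "concat" (Fusion II), or "replace" (Fusion III).
    proj: optional hypothetical (2C, C) matrix standing in for a learned
          projection applied after concatenation.
    """
    if detr_feats.shape != pn2_feats.shape:
        raise ValueError("backbone features must be aligned to the same points")

    if mode == "add":
        # Fusion I: element-wise sum of the two feature sets.
        return detr_feats + pn2_feats
    if mode == "concat":
        # Fusion II: stack channels, giving (N, 2C); optionally project to (N, C).
        fused = np.concatenate([detr_feats, pn2_feats], axis=-1)
        return fused if proj is None else fused @ proj
    if mode == "replace":
        # Fusion III: use the pre-trained PointNet++ features in place of the original ones.
        return pn2_feats.copy()
    raise ValueError(f"unknown fusion mode: {mode}")


# Usage with random stand-in features for 4 points and 8 channels.
rng = np.random.default_rng(0)
detr = rng.standard_normal((4, 8))
pn2 = rng.standard_normal((4, 8))
for m in ("add", "concat", "replace"):
    print(m, fuse_features(detr, pn2, m).shape)
```

Addition and replacement keep the feature width unchanged, so the downstream transformer can consume them directly; concatenation doubles the channel count, which is why the sketch allows an optional projection back to C.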



Paper Citation


in Harvard Style

Nurakhmetova A., Lahoud J. and Cholakkal H. (2023). Data-Efficient Transformer-Based 3D Object Detection. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP; ISBN 978-989-758-634-7, SciTePress, pages 615-623. DOI: 10.5220/0011673200003417


in Bibtex Style

@conference{visapp23,
author={Aidana Nurakhmetova and Jean Lahoud and Hisham Cholakkal},
title={Data-Efficient Transformer-Based 3D Object Detection},
booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP},
year={2023},
pages={615-623},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011673200003417},
isbn={978-989-758-634-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 4: VISAPP
TI - Data-Efficient Transformer-Based 3D Object Detection
SN - 978-989-758-634-7
AU - Nurakhmetova A.
AU - Lahoud J.
AU - Cholakkal H.
PY - 2023
SP - 615
EP - 623
DO - 10.5220/0011673200003417
PB - SciTePress