Paper

Authors: Kishaan Jeeveswaran; Senthilkumar Kathiresan; Arnav Varma; Omar Magdy; Bahram Zonooz and Elahe Arani

Affiliation: Advanced Research Lab, NavInfo Europe, Eindhoven, The Netherlands

Keyword(s): Vision Transformer, Convolutional Neural Networks, Robustness, Texture-bias, Object Detection, Semantic Segmentation.

Abstract: Convolutional Neural Networks (CNNs), architectures consisting of convolutional layers, have been the standard choice in vision tasks. Recent studies have shown that Vision Transformers (VTs), architectures based on self-attention modules, achieve comparable performance in challenging tasks such as object detection and semantic segmentation. However, the image processing mechanism of VTs is different from that of conventional CNNs. This poses several questions about their generalizability, robustness, reliability, and texture bias when used to extract features for complex tasks. To address these questions, we study and compare VT and CNN architectures as feature extractors in object detection and semantic segmentation. Our extensive empirical results show that the features generated by VTs are more robust to distribution shifts, natural corruptions, and adversarial attacks in both tasks, whereas CNNs perform better at higher image resolutions in object detection. Furthermore, our results demonstrate that VTs in dense prediction tasks produce more reliable and less texture-biased predictions.

CC BY-NC-ND 4.0

Paper citation in several formats:
Jeeveswaran, K.; Kathiresan, S.; Varma, A.; Magdy, O.; Zonooz, B. and Arani, E. (2022). A Comprehensive Study of Vision Transformers on Dense Prediction Tasks. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP; ISBN 978-989-758-555-5; ISSN 2184-4321, SciTePress, pages 213-223. DOI: 10.5220/0010917800003124

@conference{visapp22,
author={Kishaan Jeeveswaran and Senthilkumar Kathiresan and Arnav Varma and Omar Magdy and Bahram Zonooz and Elahe Arani},
title={A Comprehensive Study of Vision Transformers on Dense Prediction Tasks},
booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP},
year={2022},
pages={213-223},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010917800003124},
isbn={978-989-758-555-5},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP
TI - A Comprehensive Study of Vision Transformers on Dense Prediction Tasks
SN - 978-989-758-555-5
IS - 2184-4321
AU - Jeeveswaran, K.
AU - Kathiresan, S.
AU - Varma, A.
AU - Magdy, O.
AU - Zonooz, B.
AU - Arani, E.
PY - 2022
SP - 213
EP - 223
DO - 10.5220/0010917800003124
PB - SciTePress