the model trained with traditional cross-entropy loss, 
the red regions are concentrated mainly in the middle 
of the image, failing to effectively differentiate 
between the two classes of samples. The experimental
results and visual inspection demonstrate that, as
expected, Ordinal Loss enables the model to better
distinguish between adjacent classes, thus improving
performance on ordinal regression tasks.
 
Figure 4: Visual inspection of models trained with two
different loss functions using GradCAM (Photo/Picture
credit: Original).
 
4 CONCLUSIONS 
This study applies transformer models to image
classification on MedMNIST and improves performance
on the ordinal regression subtask with a novel loss
function. The MedViT model, a hybrid architecture
combining CNN and transformer components, is used to
classify all 12 2D datasets in MedMNIST and is
compared against classical CNN models. The
experiments show that MedViT, which captures
multi-scale features, outperforms these traditional
methods on most of the 12 datasets. Ordinal Loss was
developed to address the limited performance that
all models showed on the ordinal regression
subdataset, RetinaMNIST. This loss function combines
traditional cross-entropy loss with Rank Loss,
emphasizing similarity relationships between ordered
categories during model training. Comparative
experiments against unmodified cross-entropy loss
show that models trained with Ordinal Loss achieve
higher accuracy on RetinaMNIST. Visual inspection
using GradCAM further illustrates that Ordinal Loss
helps the model pick out the key features that
distinguish adjacent categories. In fine-grained
recognition, some methods improve performance by
learning from pairs of similar samples, both
intra-class and inter-class. Future research could
integrate this approach into the ordinal regression
task to further improve the model's ability to tell
similar samples apart.
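
To make the idea of combining cross-entropy with a rank-aware term concrete, the following is a minimal sketch in plain Python. It is not the Rank Loss formulation used in this study, which is not restated in this section. As an assumption, the rank term here is the expected absolute distance between the predicted and true class indices under the softmax distribution. The function names, the weight `lam`, and the five-class example are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Standard cross-entropy for a single sample with an integer label."""
    return -math.log(probs[target])


def rank_penalty(probs, target):
    """Assumed rank term: expected |predicted index - true index|.

    Probability mass placed on classes far from the true grade
    contributes more than mass placed on neighbouring grades.
    """
    return sum(p * abs(k - target) for k, p in enumerate(probs))


def ordinal_loss(logits, target, lam=1.0):
    """Cross-entropy plus a weighted rank term (lam is a hypothetical weight)."""
    probs = softmax(logits)
    return cross_entropy(probs, target) + lam * rank_penalty(probs, target)


# Two predictions for a sample whose true class is 1 (five ordered classes).
# Both assign the same probability to the true class, so their cross-entropy
# is identical; they differ only in where the remaining mass is placed.
near_miss = [0.0, 2.0, 1.5, 0.0, 0.0]  # extra mass on adjacent class 2
far_miss = [0.0, 2.0, 0.0, 0.0, 1.5]   # extra mass on distant class 4

ce_near = cross_entropy(softmax(near_miss), 1)
ce_far = cross_entropy(softmax(far_miss), 1)
loss_near = ordinal_loss(near_miss, 1)
loss_far = ordinal_loss(far_miss, 1)
```

In this sketch, cross-entropy alone cannot tell the two predictions apart, whereas the combined loss penalizes the distant error more heavily. This is the kind of order-aware behaviour the rank term is meant to add. Setting `lam=0.0` recovers plain cross-entropy.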