Convergence Rate. Lastly, we plot the cross-entropy (CE) loss when training with CE alone and with sample contrastive regularization (CE+CL2) on the ResNet50 network and the CIFAR-100 dataset to evaluate their convergence rates. For CE+CL2, only the CE component is plotted. Figure 4 shows the two curves. When training is regularized by the sample contrastive loss, the cross-entropy loss converges faster than without regularization; however, the two runs converge to roughly the same level after epoch 35. The same pattern is observed in all other experiments.
 
Figure 4: The cross-entropy loss for CE and CE+CL2 on ResNet50 and the CIFAR-100 dataset. For CE+CL2, only the cross-entropy loss component is plotted. With CL2 regularization, the cross-entropy loss converges faster.
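
The training procedure behind Figure 4 can be sketched as follows. This is a minimal PyTorch illustration, not the paper's implementation: the exact CL2 formulation is the one defined earlier in the paper, and the pairwise regularizer below, the margin, the weight lam, and a model that returns its penultimate features alongside the logits are all assumptions made for the sketch.

    import torch
    import torch.nn.functional as F

    def sample_contrastive_loss(features, labels, margin=1.0):
        # Illustrative CL2-style pairwise regularizer: pull same-class
        # features together, push different-class features apart up to
        # a margin. The exact CL2 form is as defined earlier in the paper.
        dists = torch.cdist(features, features, p=2)           # (n, n) pairwise L2 distances
        same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
        pos = same * dists.pow(2)                               # same-class (positive) pairs
        neg = (1.0 - same) * F.relu(margin - dists).pow(2)      # different-class (negative) pairs
        mask = 1.0 - torch.eye(features.size(0), device=features.device)
        return ((pos + neg) * mask).sum() / mask.sum()          # average over non-self pairs

    def train_step(model, images, labels, optimizer, lam=0.1):
        # One step on the combined CE + CL2 objective; returns the CE
        # component alone, which is the quantity plotted in Figure 4.
        features, logits = model(images)    # assumes the model also exposes penultimate features
        ce = F.cross_entropy(logits, labels)
        loss = ce + lam * sample_contrastive_loss(features, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return ce.item()                    # log only the CE component for the convergence plot

Logging ce.item() rather than the combined loss is what makes the two curves in Figure 4 directly comparable: both plot the same CE quantity, differing only in whether CL2 regularized the training.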
6 CONCLUSION
Deep networks have shown impressive performance on a number of computer vision tasks. However, deeper networks are more susceptible to overfitting, especially when the number of samples per class is small. In this work we introduced the batch contrastive loss, which regularizes the network by comparing samples within a batch. Our experiments show that the batch contrastive loss generalizes well, especially on deeper networks and on datasets with a small number of samples per class. The experiments also reveal a potential issue with the positive loss term for general classification tasks, which is a subject for future investigation. In the future, we plan to perform more evaluations to demonstrate that the technique generalizes well to other datasets and tasks (e.g., video action classification). We will also look into the efficiency issues of the contrastive loss.
ACKNOWLEDGMENT 
This work was supported by an FRGS grant (FRGS/1/2018/ICT02/UTAR/02/03) from the Ministry of Higher Education (MOHE) of Malaysia.