
Figure 4: The above images show the results of detection on novel classes. This indicates that the model is capable of generalizing to novel classes with limited training data.
5 CONCLUSION AND FUTURE SCOPE
The goal of this paper was to achieve object detection for novel classes of objects. Detection is performed using the Detectron2 framework. The two-stage training helps the model generalize better, and the image augmentation applied in the second stage helps the model identify objects even when their appearance varies. The results show that this technique outperforms existing FSOD methods: in the 10-shot setting, the model improves the AP75 score by 18.8% and the AP score by 49%. This shows that the current state-of-the-art scores have been surpassed by a decent margin.
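The two-stage recipe can be illustrated with a minimal, library-free sketch: a full update pass over all parameters in the base-training stage, then a second stage that freezes the backbone and adapts only the detection head on the few-shot data. The parameter names and the single-step update below are illustrative, not the paper's actual Detectron2 configuration.

```python
def fine_tune(params, frozen, grads, lr=0.1):
    """One gradient step that skips any parameter marked as frozen."""
    return {k: (v if k in frozen else v - lr * grads[k])
            for k, v in params.items()}

# Toy stand-in for a detector: a backbone and a box head.
params = {"backbone": 1.0, "box_head": 1.0}
grads = {"backbone": 0.5, "box_head": 0.5}

# Stage 1: base training updates every parameter.
params = fine_tune(params, frozen=set(), grads=grads)

# Stage 2: novel-class fine-tuning freezes the backbone
# and adapts only the head on the few available shots.
params = fine_tune(params, frozen={"backbone"}, grads=grads)
```

After stage 2, the backbone value is unchanged from stage 1 while the head has taken a further step, which is the essence of the two-stage fine-tuning approach.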
Furthermore, the image augmentation techniques can be improved so that the model captures finer details. Techniques such as local feature extraction could also be added to retrieve only the important features from an image, which would result in better classification.
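The kind of augmentation discussed above can be sketched without any imaging library by treating a grayscale image as a nested list of pixel intensities. The transforms and probabilities below are hypothetical examples, not the augmentations the paper actually used:

```python
import random

def hflip(img):
    """Horizontally flip an image given as a list of pixel rows."""
    return [list(reversed(row)) for row in img]

def brightness(img, factor):
    """Scale pixel intensities, clamping to the 0-255 range."""
    return [[min(255, max(0, int(p * factor))) for p in row] for row in img]

def augment(img, rng=random):
    """Randomly compose the transforms, as a training pipeline might."""
    out = img
    if rng.random() < 0.5:           # flip half the time
        out = hflip(out)
    return brightness(out, rng.uniform(0.8, 1.2))

flipped = hflip([[10, 200], [30, 40]])   # [[200, 10], [40, 30]]
```

In a real pipeline these operations would be applied per training sample so that the few novel-class examples are seen under varied appearances, which is what helps the second-stage fine-tuning generalize.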
Few-Shot Object Detection Using Two Stage Fine Tuning Approach with Data Augmentation