
Table 1: Comparison on ”GTAV to Cityscapes” in terms of per-class IoUs and mIoU (%).
Method Base Model road sdwk bldng wall fence pole light sign vegtn trm sky person rider car truck bus train mcycl bcycl mIoU
DPR (Ding et al., 2019) ResNet-101 92.3 51.9 82.1 29.2 25.1 24.5 33.8 33.0 82.4 32.8 82.2 58.6 27.2 84.3 33.4 46.3 2.2 29.5 32.3 46.5
SIBAN (Sibi et al., 2019) ResNet-101 88.5 35.4 79.5 26.3 24.3 28.5 32.5 18.3 81.2 40.0 76.5 58.1 25.8 82.6 30.3 34.4 3.4 21.6 21.5 42.6
AdaptSeg (Tsai et al., 2018) ResNet-101 86.5 36.0 79.9 23.4 23.3 23.9 35.2 14.8 83.4 33.3 75.6 58.6 27.6 73.7 32.5 35.4 3.9 30.1 28.1 42.4
CLAN (Liu et al., 2020a) ResNet-101 87.0 27.1 79.6 27.3 23.3 28.3 35.5 24.2 83.6 27.4 74.2 58.6 28.0 76.2 33.1 36.7 6.7 31.9 31.4 43.2
DISE (Zhao et al., 2019) ResNet-101 91.5 47.5 82.5 31.3 25.6 33.0 33.7 25.8 82.7 28.8 82.7 62.4 30.8 85.2 27.7 45.0 6.4 25.2 24.4 45.4
AdvEnt (Vu et al., 2019b) ResNet-101 89.4 33.1 81.0 26.6 26.8 27.2 33.5 24.7 83.9 36.7 78.8 58.7 30.5 84.8 38.5 44.5 1.7 31.6 32.4 45.5
MSL (Liu et al., 2020b) ResNet-101 89.4 43.0 82.1 30.5 21.3 30.3 34.7 24.0 85.3 39.4 78.2 63.0 22.9 84.6 36.4 43.0 5.5 34.7 33.5 46.4
DLOW (Shaban et al., 2018) ResNet-101 87.1 33.5 80.5 24.5 13.2 29.8 29.5 26.6 82.6 26.7 81.8 55.9 25.3 78.0 33.5 38.7 6.0 22.9 34.5 42.3
Ours ResNet-101 90.51 44.01 79.31 32.61 30.71 49.00 52.60 42.51 79.71 37.61 79.81 78.10 30.50 95.90 46.31 46.11 38.40 24.01 62.60 55.80
43.2 and 42.6, respectively, and relatively lower per-
formance on specific classes like bus and truck.
6 CONCLUSION
We present a novel approach to unsupervised domain
adaptation by leveraging the encoder-decoder frame-
work with a memory-based regularization technique.
Our method utilizes intra-domain knowledge to re-
duce uncertainty during model learning, without in-
troducing additional parameters or external modules.
By using the model itself as a memory module, we
achieve an elegant and efficient regularization of the
training process. Despite its simplicity, our approach
complements existing methods and delivers compet-
itive performance on two prominent synthetic-to-real
benchmarks: GTA5 to Cityscapes.
Our results demonstrate that the proposed model
effectively addresses challenges in domain adapta-
tion, achieving robust segmentation performance by
reducing the domain gap. The integration of memory-
based regularization highlights the potential for lever-
aging inherent model properties to improve training
stability and accuracy.
Future enhancements could focus on designing
models that are inherently robust to environmental
variations, such as changes in lighting, texture, and
adverse conditions. Additionally, advancements in
adversarial learning techniques, such as improved
methodologies inspired by CycleGAN, may further
enhance domain correspondences. Self-supervised
learning approaches could also play a significant
role in reducing dependency on annotated datasets
while fostering the extraction of domain-invariant fea-
tures. Finally, exploring segmentation models based
on transformers and expanding testing across diverse
datasets, including scenarios with low lighting and
adverse weather, can provide deeper insights into the
adaptability of the proposed system.
REFERENCES
Bousmalis, K., Frosio, N. D., Bursuc, D., Hays, J.,
Tsoumakas, G., and Metaxas, D. N. (2016). Domain
separation networks. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 1709–1717.
Chen, W., Wei, Y., Yang, Y., Wang, Z., Li, W., and Wang, X.
(2020). Contrastive learning for unsupervised domain
adaptation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 7568–7577.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., F., R., Y., A., and Schiele, B. (2016). The
cityscapes dataset for semantic urban scene under-
standing. Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR),
pages 3213–3223.
Ding, Z., Wang, Q., Huang, J., Zhang, K., and Xie, L.
(2019). Dpr: Domain propagation network for cross-
domain semantic segmentation. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion (ICCV), pages 3330–3340.
Hoffman, J., Tzeng, E., Park, T., Saenko, K., and Darrell, T.
(2018). Cycada: Cycle-consistent adversarial domain
adaptation. In Proceedings of the International Con-
ference on Machine Learning (ICML), pages 1996–
2005.
Liu, L., Lu, H., Lee, L., Yang, M., Wong, T.-L., Wu, D., and
Lin, Y.-W. (2020a). Clan: Class-wise alignment for
unsupervised domain adaptation in semantic segmen-
tation. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 2249–2257.
Liu, W., Chen, B., Zhang, Z., Li, X., and Li, X. (2020b).
Msl: Multi-scale learning for domain adaptation
in semantic segmentation. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 4706–4715.
Richter, S., D., V. W. W., M., R., R., A., A., M., and A., M.
(2016). Playing for data: Ground truth from computer
games. In Proceedings of the European Conference on
Computer Vision (ECCV), pages 102–118. Springer.
Sankaranarayanan, S., Balaji, Y., Jain, A., Kumar, S. R.,
and Gupta, A. (2018). Learning from synthetic data:
Addressing domain shift for semantic segmentation.
Bridging the Synthetic-Real Gap: Unsupervised Domain Adaptation for Cross-Domain Image Segmentation
261