
Table 1: Accuracy of the proposed method and U-Net. 
 
 
However, the feature maps of the shallow layer in the
encoder and those of the first layer in the decoder
differ in size. We therefore apply pooling so that the
sizes match. Similarly, since the feature maps of the
deep layer in the encoder and those of the final layer
in the decoder differ in size, we apply unpooling so
that the sizes match.
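The resizing step can be illustrated with a small sketch in plain Python. It is not the authors' implementation: the text does not state the pooling type or the scale factor, so max pooling with a factor of 2 is assumed, and nearest-neighbour repetition stands in for unpooling (an index-based unpooling would instead place each value at its recorded maximum position). The function names `max_pool` and `unpool_nearest` are hypothetical.

```python
def max_pool(fmap, k=2):
    """Downsample a 2-D feature map by taking the max of each k x k block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + di][j + dj] for di in range(k) for dj in range(k))
         for j in range(0, w - k + 1, k)]
        for i in range(0, h - k + 1, k)
    ]


def unpool_nearest(fmap, k=2):
    """Upsample a 2-D feature map by repeating each value in a k x k block.

    Assumption: nearest-neighbour repetition is used in place of
    index-based unpooling, which the text does not specify.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(k)]
        out.extend(list(wide) for _ in range(k))
    return out


# A 4 x 4 "shallow" map is pooled down to 2 x 2 so that it can be
# concatenated with a 2 x 2 map; a 2 x 2 "deep" map is unpooled to 4 x 4.
shallow = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
deep = [[1, 2],
        [3, 4]]

pooled = max_pool(shallow)
upsampled = unpool_nearest(deep)
```

After these calls `pooled` has the same spatial size as `deep`, and `upsampled` has the same spatial size as `shallow`, which is the condition needed before concatenation.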
We use batch normalization (Ioffe and Szegedy, 2015)
at each layer, although the original U-Net did not use
it. Class balancing (Badrinarayanan et al., 2016) is
also used to improve the segmentation accuracy of
objects with small area.
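As an illustration of class balancing, the sketch below computes per-class loss weights by median frequency balancing, the scheme used in the cited SegNet work. That this exact weighting is the one applied here is an assumption, since the text only gives the citation. The function name `median_frequency_weights` and the toy label maps are hypothetical.

```python
from collections import Counter
from statistics import median


def median_frequency_weights(label_maps):
    """Return {class_id: weight} from a list of flat label maps.

    freq(c)   = pixels of class c / pixels of all images containing c
    weight(c) = median of all freq values / freq(c)
    """
    pixel_count = Counter()   # pixels of class c over the whole set
    image_pixels = Counter()  # total pixels of the images that contain c
    for labels in label_maps:
        for c, n in Counter(labels).items():
            pixel_count[c] += n
            image_pixels[c] += len(labels)
    freq = {c: pixel_count[c] / image_pixels[c] for c in pixel_count}
    med = median(freq.values())
    return {c: med / f for c, f in freq.items()}


# Toy data: class 0 plays the role of the background, class 2 is rare.
label_maps = [
    [0] * 8 + [1] * 2,
    [0] * 6 + [1] * 3 + [2] * 1,
]
weights = median_frequency_weights(label_maps)
```

Rare classes receive weights above 1 and frequent classes weights below 1, so errors on small objects contribute more to a weighted loss.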
3  EXPERIMENTS 
We show experimental results on semantic
segmentation of Red Relief Image Maps. Section 3.1
describes the dataset used in the experiments,
Section 3.2 explains the methods compared, and
Section 3.3 presents the experimental results.
3.1  Dataset 
In this paper, we use eleven Red Relief Image Maps.
Five images are used for training and the remaining
six images are used for testing. Since training a deep
network requires a sufficient number of training
images, we crop local regions of 256 x 256 pixels with
an overlap ratio of 0.7 from each Red Relief Image Map
of 1,500 x 2,000 pixels. In addition, we rotate the
cropped regions in steps of 90 degrees to increase the
number of training images. As a result, the number of
training images is 7,344. Test regions of 256 x 256
pixels are cropped without overlap from the six test
images. The total number of test regions is 185.
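The cropping and rotation procedure can be sketched as follows in plain Python. Several details are assumptions, not statements from the text: the stride is taken as the crop size times (1 - overlap ratio) rounded to the nearest pixel, partial crops at the image border are dropped, and the unrotated crop counts as one of the four rotations. The helper names `crop_origins`, `rotate90` and `augment` are hypothetical.

```python
def crop_origins(length, crop=256, overlap=0.7):
    """Top-left offsets of crops along one image axis.

    Assumptions: stride = round(crop * (1 - overlap)), and a partial
    crop at the border is dropped.
    """
    stride = max(1, round(crop * (1.0 - overlap)))
    return list(range(0, length - crop + 1, stride))


def rotate90(patch):
    """Rotate a 2-D patch (list of rows) clockwise by 90 degrees."""
    return [list(row) for row in zip(*patch[::-1])]


def augment(patch):
    """Return the patch rotated by 0, 90, 180 and 270 degrees."""
    views = [patch]
    for _ in range(3):
        views.append(rotate90(views[-1]))
    return views


# One hypothetical image of 1,500 x 2,000 pixels.
train_rows = crop_origins(1500, overlap=0.7)
train_cols = crop_origins(2000, overlap=0.7)
test_rows = crop_origins(1500, overlap=0.0)
test_cols = crop_origins(2000, overlap=0.0)

n_train = len(train_rows) * len(train_cols) * 4  # four rotations per crop
n_test = len(test_rows) * len(test_cols)
```

The per-image counts produced by this sketch depend on the assumed stride and border handling and on the exact image sizes, so they are not expected to reproduce the reported totals of 7,344 and 185.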
3.2  Comparison Methods 
We compare three networks, including the original
U-Net. The first method is the U-Net. The second
method is our proposed method, in which feature maps
of different resolutions are resized by pooling and
convolution, or by unpooling and deconvolution,
before they are concatenated. We call this method
“UX-Net1”.
The third method is also our method, but it does not
use convolution or deconvolution when changing the
size of the feature maps. Only pooling and unpooling
are used to change the size. We call this network
“UX-Net2”.
3.3  Experimental Results 
We show the experimental results of all methods. As
evaluation measures, we use pixel-wise accuracy and
class-average accuracy. Pixel-wise accuracy is the
accuracy over all pixels, so it is dominated by
classes of large area such as the background.
Class-average accuracy is the mean of the per-class
accuracies, so classes of small area such as defective
areas by trees, road and river influence it as much as
large ones. In this paper, class-average accuracy is
more important than pixel-wise accuracy because we
want to segment defective areas by trees, road and
river well.
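The two measures can be made concrete with a minimal sketch in plain Python. It assumes that the per-class accuracy is the fraction of ground-truth pixels of a class that are predicted correctly, and that only classes present in the ground truth are averaged; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def pixel_accuracy(pred, truth):
    """Fraction of all pixels whose predicted label equals the ground truth."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)


def class_average_accuracy(pred, truth):
    """Mean over classes of (correct pixels of class c / pixels of class c)."""
    total = Counter(truth)
    correct = Counter(t for p, t in zip(pred, truth) if p == t)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example: class 0 is a large background, class 1 is a small object.
truth = [0] * 8 + [1] * 2
pred = [0] * 8 + [0, 1]  # one of the two object pixels is missed

pa = pixel_accuracy(pred, truth)            # 9 of 10 pixels correct
caa = class_average_accuracy(pred, truth)   # mean of 8/8 and 1/2
```

In this toy case a single missed pixel of the small class lowers the pixel-wise accuracy only to 0.9 but lowers the class-average accuracy to 0.75, which shows why the latter is the more sensitive measure for small classes.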
We show the segmentation results of all methods in
Figures 3 and 4. The first row shows the input image
and the ground truth label. The second row shows the
results of U-Net and UX-Net1. The bottom row shows
the result of UX-Net2.
We show the pixel-wise accuracy and the class-average
accuracy of each method in Table 1. The best result
for each class is shown in red.
We found that our proposed UX-Net has higher accuracy
for defective areas by trees, road and river than the
original U-Net. The pixel-wise accuracy of the
proposed method is lower than that of U-Net because
pixel-wise accuracy is dominated by the background,
which is not the main target.
Note that our proposed method improves the accuracy
for defective areas by trees, which are hard to
segment with U-Net. This is because we use the
“X-path”, in which the fine information obtained at
the shallow layer is used in the deep layer and the
semantic information obtained at the deep layer is
used to generate the final segmentation result. When
we compare UX-Net1 with UX-Net2, UX-Net2 gives better
results than UX-Net1. The main difference is how the
size of the feature maps is changed. The experimental
results show that using only pooling and unpooling is
effective for changing the size. When we use pooling and
Semantic Segmentation in Red Relief Image Map by UX-Net