Enhancing nnU-Net for Improved Medical Image Segmentation: A Comparative Study with TotalSegmentator

Jingyi Wu

Sydney Institute of Intelligent Technology, Northeastern University Qinhuangdao, Qinhuangdao, China

Keywords: Medical Image Segmentation, nnU-Net, Loss Function Optimization, Data Augmentation, Multimodal Imaging.

Abstract: This paper proposes an optimization method built on the no-new-Net (nnU-Net) architecture to
improve the performance of medical image segmentation tasks. Medical image segmentation is an important
component of disease diagnosis, treatment planning, and surgical assistance. Since its introduction in 2018,
nnU-Net has become a fundamental tool in this field by automatically adapting its architecture, preprocessing,
and training strategies to each dataset. However, current models still have shortcomings in handling data
imbalance and multimodal images. To address this, the paper optimizes the loss function and data augmentation
strategy of nnU-Net. Increasing the Dice loss weight lets the model handle small structures and imbalanced
data more effectively, improving segmentation accuracy. Furthermore, incorporating a higher rotation
probability, stronger noise augmentation, and low-resolution simulation into the data augmentation pipeline
increases the model's robustness and capacity for generalization. The experimental results show that the
improved nnU-Net outperforms TotalSegmentator in segmentation accuracy and in handling complicated
boundaries, as measured by Dice Score, IoU, and Hausdorff Distance.

1 INTRODUCTION

A basic task in medical image analysis, medical

image segmentation is essential for many

applications, including disease diagnosis, therapy

planning, and surgical support. One of the approaches

that is most frequently utilized in this field is the U-

Net architecture and its variations. With its self-

configuring framework that automatically adjusts its

architecture, preprocessing, and training algorithms

to each dataset, no-new-Net (nnU-Net), which was

introduced in 2018, revolutionized the domain.

Despite its success, further improvements are

necessary in areas such as data augmentation and loss

function optimization, as specific adjustments could

yield better performance, particularly when handling

diverse datasets.

By utilizing the most recent developments in nnU-

Net, TotalSegmentator expands its capabilities to

multi-class segmentation in magnetic resonance

imaging (MRI) as well as computed tomography

(CT) image modalities, producing impressive

outcomes. Figure 1 illustrates an MRI and CT scan

example. However, there remains room for

improvement, especially in balancing the loss

function and enhancing training data through more

sophisticated augmentation techniques.

Author ORCID: https://orcid.org/0009-0005-3472-5768

Figure 1: Example of CT and MRI (Kumar et al., 2021)

In order to overcome the current obstacles in

medical image segmentation, this research optimizes

two crucial nnU-Net model components: (1)

Adjusting the loss function weights to better balance

segmentation precision across different anatomical

structures and improve performance on imbalanced

datasets; (2) Enhancing the data augmentation

strategy to improve the model's robustness to

variations in medical imaging data, aiming to boost

segmentation accuracy and generalization in real-world

applications. These improvements strengthen

the model’s resilience and ensure broader

applicability in practical scenarios.

Wu, J. Enhancing nnU-Net for Improved Medical Image Segmentation: A Comparative Study with TotalSegmentator.

DOI: 10.5220/0013515100004619

In Proceedings of the 2nd International Conference on Data Analysis and Machine Learning (DAML 2024), pages 246-251

ISBN: 978-989-758-754-2

Building upon the nnU-Net framework, the paper

introduced targeted optimizations to further enhance

its performance in segmentation tasks. Through

extensive experimentation, the results show that these

modifications significantly improve both accuracy

and generalization compared to the original nnU-Net

and TotalSegmentator models. The primary

contributions of this paper are as follows:

• The paper proposes a novel adjustment to the loss function in nnU-Net, optimizing the weight distribution to better handle class imbalance.

• The paper uses more varied and realistic transformations in the data augmentation strategy, improving the model's generalization to new and unseen data.

• The experimental results validate the efficacy of the approach by showing that the improved nnU-Net consistently outperforms TotalSegmentator across key evaluation metrics.

In the following sections, the paper will provide a

detailed description of the methodology,

experimental setup, and the results validating the

proposed improvements.

2 RELATED WORKS

Medical image segmentation, a fundamental task in

medical image analysis, plays an important role in

various applications such as organ localization, lesion

detection, and treatment planning. Early

segmentation methods mainly relied on rule-based or

feature-based techniques such as region growing,

watershed, and level set methods (Fischl et al., 2004).

With the rise of deep learning, convolutional neural

networks (CNNs) emerged as the leading technology

in medical image segmentation, particularly after the

introduction of the U-Net model, which led to

significant advancements (Ronneberger et al., 2015).

The U-Net architecture, as depicted in Figure 2, is

renowned for its U-shaped design, featuring skip

connections between the encoder and decoder, which

greatly enhance segmentation accuracy (Çiçek et al.,

2016). Introduced in 2018, nnU-Net is a self-

adapting version of U-Net that serves as a general

baseline for medical image segmentation by

automating architecture tweaks, preprocessing, and

training procedures to suit various datasets (Isensee et

al., 2021). This model has excelled in multiple

international segmentation challenges, showcasing

high versatility and adaptability.

Figure 2: The architecture of U-Net (Çiçek et al., 2016)

Despite nnU-Net’s success, recent studies suggest

that its performance on specific tasks can be further

optimized. Research shows that introducing adaptive

weighting in loss functions and improving data

augmentation strategies can enhance both robustness

and precision (Roy et al., 2018). This study builds on

nnU-Net’s framework, with a focus on improving

performance in handling imbalanced data and highly

diverse medical imaging datasets.


TotalSegmentator, an open-source model based

on the nnU-Net framework, was initially developed

for CT image segmentation and later extended to

perform multi-structure segmentation in MRI images

(Wasserthal et al., 2023). TotalSegmentator is a

versatile tool for multi-modality segmentation tasks,

thanks to its sequence-independent nature, enabling it

to segment 59 anatomical structures, including

organs, bones, muscles, and vessels (Akinci

D'Antonoli et al., 2023). By integrating large clinical

datasets, TotalSegmentator demonstrates robustness

in various applications, especially in handling

different MRI sequences. However, its performance

is still challenged in the segmentation of fine

structures, such as those in blurred or low-contrast

regions (Hatamizadeh et al., 2021). This opens an

opportunity to enhance segmentation performance by

optimizing nnU-Net’s loss function and data

augmentation strategies.

In addition to TotalSegmentator, other U-Net-

based segmentation models have emerged in recent

years. For example, 3D U-Net (Çiçek et al., 2016)

extends U-Net to process 3D image data, while

SwinUNETR (Hatamizadeh et al., 2021) combines

Transformer architecture with U-Net to capture long-

range dependencies. However, these models often

come with higher computational costs and fall short

in multi-modality and sequence diversity tasks

compared to TotalSegmentator.

The design of loss functions plays a crucial role in

deep learning-based segmentation tasks, particularly

when dealing with class imbalance and small target

segmentation. Traditional cross-entropy loss often

favors large classes, leading to poor performance in

smaller classes (Sudre et al., 2017). To address this

issue, weighted loss functions such as Dice loss

(Milletari et al., 2016) and Tversky loss (Salehi et al.,

2017) have been introduced to handle imbalanced

data and multi-class segmentation tasks more

effectively. By adjusting the weights of different

classes, these methods improve segmentation

accuracy for small classes and boundary regions.

In terms of data augmentation, traditional

techniques such as rotation, scaling, and translation

are commonly used. However, recent studies have

shown that more advanced augmentation techniques,

such as random cropping, brightness and contrast

adjustment, and elastic deformation, can significantly

improve model robustness (DeVries & Taylor, 2017).

These techniques generate more diverse training data,

enabling models to better generalize to unseen

clinical images. Moreover, adaptive data

augmentation techniques based on deep learning are

continuously evolving, allowing dynamic adjustment

of augmentation strategies based on data

characteristics, further enhancing model performance

(Zhang et al., 2018).

The innovation of this study lies in modifying

nnU-Net’s loss function weights and optimizing its

data augmentation strategy to further improve

performance in medical image segmentation tasks.

These modifications build on previous research

findings and demonstrate superior performance

compared to TotalSegmentator in practical

applications.

3 METHODOLOGIES

3.1 Loss Function Adjustment

The loss function plays a critical role in guiding the

optimization of deep learning models, particularly in

medical image segmentation, where it directly

impacts model performance on complex and

imbalanced datasets. Dice loss emphasizes improving

segmentation accuracy for small structures, while

cross-entropy loss focuses on the overall

segmentation accuracy. Balancing the weights of

these two losses is crucial for achieving optimal

model accuracy.

The paper increased the weight of the Dice loss

from 1 to 1.5 and set the cross-entropy loss weight to

0.5. This adjustment directs the model to focus more

on small structures, prioritizing their segmentation

during optimization while maintaining the overall

accuracy of larger structures and global segmentation.

These adjustments help the model perform better

on imbalanced data, particularly for small targets,

allowing for more precise segmentation. This is

crucial in medical image segmentation tasks, such as

tumor or lesion detection, where increasing the Dice

loss weight reduces the model’s tendency to

overemphasize the background or large structures,

thereby improving the segmentation accuracy of

smaller targets. These changes enhance the model’s

sensitivity to small object recognition, ultimately

improving overall segmentation accuracy and

boundary detail handling.
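The weighting described above can be illustrated with a minimal NumPy sketch. It is not the nnU-Net implementation: the function names are hypothetical, the Dice term is written as 1 minus the soft Dice coefficient, and only the two weights (1.5 and 0.5) come from the text.

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-5):
    """Soft Dice loss (1 - Dice) averaged over classes.

    probs, onehot: arrays of shape (num_classes, num_voxels).
    """
    intersection = (probs * onehot).sum(axis=1)
    denominator = probs.sum(axis=1) + onehot.sum(axis=1)
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice_per_class.mean())

def cross_entropy_loss(probs, onehot, eps=1e-12):
    """Mean voxel-wise cross-entropy between predicted probabilities and labels."""
    return float(-(onehot * np.log(probs + eps)).sum(axis=0).mean())

def combined_loss(probs, onehot, weight_dice=1.5, weight_ce=0.5):
    """Weighted sum of both terms; the defaults are the weights reported in the text."""
    return (weight_dice * soft_dice_loss(probs, onehot)
            + weight_ce * cross_entropy_loss(probs, onehot))

# Toy example (assumed data): 2 classes, 4 voxels, a single foreground voxel.
labels = np.array([0, 0, 0, 1])
onehot = np.eye(2)[labels].T                  # shape (2, 4)
probs = np.array([[0.9, 0.8, 0.7, 0.4],
                  [0.1, 0.2, 0.3, 0.6]])      # columns sum to 1
print(combined_loss(probs, onehot))
```

Because the Dice term is averaged per class, the single foreground voxel contributes as much to it as the three background voxels, which is why raising its weight shifts the optimization toward small structures.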

3.2 Data Augmentation Strategy Optimization

In order to enhance the generalization ability of the

model and avoid overfitting, data augmentation

introduces various random transformations

(such as rotation, scaling, and noise) into the training

set. Exposing the model to more diverse data

improves real-world performance and strengthens its


robustness and adaptability in testing or inference

processes. This is particularly important in medical

image segmentation, as data variability arises from

differences in patients, imaging conditions, and noise

levels.

The paper increased the rotation probability to 0.3

to simulate anatomical structures from different

orientations. The paper also extended the variance

range of Gaussian noise to (0, 0.2) and set its

application probability to 0.15 to help the model

handle varying levels of image noise. For low-

resolution simulation, the paper adjusted the scaling

range to (0.7, 1) and increased the application

probability to 0.3, allowing the model to adapt to low-

quality or down-sampled images.

These adjustments significantly improved the

model’s adaptability to data variations. Increasing the

rotation probability allowed the model to handle more

diverse anatomical orientations, while noise

augmentation improved stability in noisy

environments. Low-resolution simulation ensured

that the model could handle varying image

resolutions, maintaining high segmentation accuracy

even with low-quality input. These improvements are

particularly valuable in medical image segmentation,

where models need to be robust and generalizable in

clinical applications.
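As a rough illustration of how these probabilities and ranges interact, the following NumPy sketch applies each transform independently. Only the numeric settings come from the text. The function names are hypothetical, the rotation is simplified to in-plane 90-degree steps (which assumes a square in-plane shape), and the low-resolution step uses nearest-neighbour resampling, so this is not the nnU-Net augmentation pipeline itself.

```python
import numpy as np

# Settings reported in the text; everything else is an illustrative assumption.
P_ROTATION = 0.3
P_NOISE, NOISE_VARIANCE_RANGE = 0.15, (0.0, 0.2)
P_LOWRES, LOWRES_ZOOM_RANGE = 0.3, (0.7, 1.0)

def simulate_low_resolution(volume, zoom):
    """Downsample each axis by `zoom`, then resample back to the original grid."""
    out = volume
    for axis, size in enumerate(volume.shape):
        low_size = max(1, int(round(size * zoom)))
        # Positions of the coarse samples on the original grid.
        coarse = np.linspace(0, size - 1, low_size).round().astype(int)
        # Map every original index to one coarse sample (nearest-neighbour upsampling).
        nearest = coarse[np.minimum((np.arange(size) * low_size) // size, low_size - 1)]
        out = np.take(out, nearest, axis=axis)
    return out

def augment(volume, rng):
    """Apply rotation, noise, and low-resolution simulation with their probabilities."""
    out = volume.astype(np.float32, copy=True)
    if rng.random() < P_ROTATION:
        # Simplification: random multiple of 90 degrees in the last two axes.
        out = np.rot90(out, k=int(rng.integers(1, 4)), axes=(1, 2))
    if rng.random() < P_NOISE:
        variance = rng.uniform(*NOISE_VARIANCE_RANGE)
        noise = rng.normal(0.0, np.sqrt(variance), size=out.shape)
        out = out + noise.astype(np.float32)
    if rng.random() < P_LOWRES:
        out = simulate_low_resolution(out, rng.uniform(*LOWRES_ZOOM_RANGE))
    return out

rng = np.random.default_rng(0)
volume = rng.random((4, 8, 8))      # toy volume, not real MRI data
augmented = augment(volume, rng)
```

Since the three draws are independent, a given sample may receive none, one, or several of the transforms in a single pass.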

3.3 Deep Supervision and Multi-scale Loss

Deep supervision and multi-scale loss help guide the

model at different resolutions, making feature

extraction across various scales more accurate. Deep

supervision enables the model to learn segmentation

information at multiple levels during training, which

is particularly useful for handling complex

anatomical structures with intricate boundaries.

Multi-scale loss weighting ensures that the model

remains effective during fine-grained segmentation.

In the DeepSupervisionWrapper, the paper

adjusted the multi-scale loss weights by assigning

higher weights to high-resolution outputs, thereby

enhancing the model's focus on fine-grained

segmentation. This adjustment ensures that the model

maintains a balance in feature extraction across

different resolutions while emphasizing high-

resolution outputs.

This modification improves the model's ability to

handle complex boundaries, particularly when

segmenting small or blurred anatomical structures.

By increasing the weight of high-resolution outputs,

the model is better equipped to handle anatomical

detail, significantly reducing Hausdorff distance and

producing more precise segmentation boundaries.
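The text does not report the actual multi-scale weights, so the sketch below only illustrates the idea. It assumes a geometric weighting, ordered from the highest-resolution output to the lowest, in which a smaller (hypothetical) `decay` value shifts more weight onto the high-resolution outputs. The function names are illustrative and are not taken from the DeepSupervisionWrapper.

```python
def deep_supervision_weights(num_outputs, decay=0.5):
    """Geometric weights, highest resolution first, normalised to sum to 1."""
    raw = [decay ** level for level in range(num_outputs)]
    total = sum(raw)
    return [w / total for w in raw]

def multi_scale_loss(per_scale_losses, weights):
    """Weighted sum of the losses computed at each decoder resolution."""
    return sum(w * loss for w, loss in zip(weights, per_scale_losses))

baseline = deep_supervision_weights(4, decay=0.5)     # [8/15, 4/15, 2/15, 1/15]
sharpened = deep_supervision_weights(4, decay=0.3)    # more weight on full resolution
print(baseline, sharpened)
print(multi_scale_loss([0.20, 0.25, 0.30, 0.40], sharpened))  # made-up loss values
```

Normalising the weights keeps the overall loss scale comparable when the emphasis is changed, so the learning rate does not need to be retuned for that reason alone.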

4 EXPERIMENTAL SETUP

4.1 Dataset

The brain MRI dataset used in this experiment, which includes the samples required for both training and
testing, was obtained from TotalSegmentator. The data is supplied in .nii.gz format and includes both the
images and their labels. To assess how well the model generalizes, five-fold cross-validation is used. With
its intricate anatomical structures and thorough labeling, this dataset is well suited to evaluating and
verifying the effectiveness of medical image segmentation algorithms. It allows the study to assess the
model's capacity to handle complicated medical images, notably in segmenting small structures and
handling multimodal problems.
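A five-fold split of this kind can be sketched as follows. The case identifiers, the seed, and the function name are hypothetical, and nnU-Net generates its own split file, so this only illustrates the principle that every case is used for validation exactly once.

```python
import random

def five_fold_splits(case_ids, num_folds=5, seed=12345):
    """Shuffle the cases once, then rotate which fold is held out for validation."""
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::num_folds] for i in range(num_folds)]
    splits = []
    for held_out in range(num_folds):
        train = [c for j, fold in enumerate(folds) if j != held_out for c in fold]
        splits.append({"train": train, "val": folds[held_out]})
    return splits

cases = [f"case_{i:03d}" for i in range(23)]   # placeholder identifiers
for fold, split in enumerate(five_fold_splits(cases)):
    print(fold, len(split["train"]), len(split["val"]))
```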

4.2 Evaluation Metrics

The paper evaluated the model's segmentation performance using several metrics. The Dice Score measures
the overlap between the prediction and the ground truth, which makes it well suited to assessing
segmentation accuracy in small regions and on imbalanced data. Intersection over Union (IoU) is the ratio
between the intersection and the union of the predicted and actual regions; it penalizes errors more heavily
than the Dice Score and therefore offers a stricter evaluation. Sensitivity measures the model's capacity to
identify the target regions, and specificity measures its capacity to avoid false positives. These two metrics,
which show how well the algorithm detects lesions while ignoring normal tissue, are crucial for medical
image segmentation. Finally, Hausdorff Distance assesses segmentation boundary precision, indicating how
faithfully the model represents intricate structural elements. These metrics were selected because together
they cover overall segmentation accuracy, boundary handling, and false detection control, all of which
matter when optimizing and assessing medical image segmentation models.
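For binary masks, the four overlap-based metrics reduce to counts of true and false positives and negatives. The sketch below shows this with NumPy, under the assumption that both masks contain foreground and background voxels (otherwise a denominator can be zero). The function name is illustrative, and the Hausdorff Distance is left out because it requires surface distances rather than voxel counts.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Dice, IoU, sensitivity, and specificity for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)      # foreground predicted as foreground
    fp = np.sum(pred & ~truth)     # background predicted as foreground
    fn = np.sum(~pred & truth)     # foreground missed
    tn = np.sum(~pred & ~truth)    # background predicted as background
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy masks: one voxel correct, one false positive, one false negative.
metrics = overlap_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
print(metrics)  # dice 0.5, iou 0.333..., sensitivity 0.5, specificity 0.75
```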

4.3 Experimental Procedure

The model training was conducted on a high-

performance computing environment equipped with

an NVIDIA RTX 3090 GPU, AMD 5800X CPU,

32GB of RAM, and over 200GB of storage space.

The system operated on Python 3.10.12 and the

PyTorch 2.4.0+cu121 deep learning framework,

ensuring efficient training in an optimized hardware

and software environment. An M.2 SSD was utilized for

data storage to maximize data read and write speeds.

The training followed the standard nnU-Net five-fold

cross-validation pipeline. First, preprocessing was

applied to the brain MRI data, including adjusting the

format and resolution. Each fold was trained using

high-resolution 3D data. After completing the

training, five models were generated for performance

evaluation. The training process also incorporated

deep supervision and multi-scale loss strategies,

ensuring the model could learn detailed features at

various scales, thus enhancing segmentation

precision.

The PolyLRScheduler dynamically adjusted the

learning rate during training, starting from an initial

learning rate of 1e-2. The optimizer was SGD with a

weight decay of 3e-5 and a momentum of 0.99. Data

augmentation strategies were adjusted to improve

model generalization by increasing the application

probability of rotation, noise, and low-resolution

simulation. These strategies enabled the model to

better handle real-world complex medical images,

showing robust performance in dealing with noise,

resolution variations, and other challenges.
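A polynomial learning-rate schedule of this kind can be sketched in a few lines. The initial learning rate of 1e-2 is taken from the text, while the exponent of 0.9 and the total of 1000 epochs are assumptions, since the text does not state them.

```python
def poly_lr(step, max_steps, initial_lr=1e-2, exponent=0.9):
    """Polynomial decay from initial_lr at step 0 down to 0 at max_steps."""
    return initial_lr * (1 - step / max_steps) ** exponent

MAX_EPOCHS = 1000  # assumed training length, not reported in the text
for epoch in (0, 250, 500, 750, 999):
    print(epoch, poly_lr(epoch, MAX_EPOCHS))
```

The decay is close to linear for most of the run and steepens near the end, so the model trains at a relatively high rate for a long time before settling.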

5 RESULTS AND DISCUSSION

5.1 Results

The results show that improved nnU-Net significantly

outperforms TotalSegmentator in terms of

segmentation accuracy, as demonstrated by its higher

Dice Score and IoU. The following table summarizes

the performance comparison:

Table 1: Experimental results

Metric                     TotalSegmentator   Improved nnU-Net
Dice Score                 0.6241             0.99967
IoU                        0.4536             0.99935
Sensitivity                0.4600             0.99935
Specificity                0.9973             1.0
95% Hausdorff Distance     26.23              0.0
99.9% Hausdorff Distance   34.67              1.0
100% Hausdorff Distance    55.24              8.31

Based on the comparison in Table 1, the improved nnU-Net outperforms TotalSegmentator across all reported
metrics. The improved nnU-Net achieves a Dice Score of 0.99967, while TotalSegmentator reaches only
0.6241, indicating near-perfect agreement between the improved nnU-Net's segmentation and the ground
truth labels. Its IoU of 0.99935 is likewise much higher than TotalSegmentator's 0.4536, reflecting greater
overlap between predicted and actual regions. In terms of sensitivity, the improved nnU-Net scores 0.99935
against TotalSegmentator's 0.4600, demonstrating a superior ability to detect relevant foreground regions.
Both models perform well in specificity (1.0 versus 0.9973), so both largely avoid false positives in
background regions. The improved nnU-Net also holds a clear advantage in Hausdorff Distance at every
reported percentile: 0.0 versus 26.23 at 95%, 1.0 versus 34.67 at 99.9%, and 8.31 versus 55.24 at 100%.
This shows that the improved nnU-Net delineates the boundaries of anatomical structures far more
accurately. In summary, the improved nnU-Net's adaptive architecture and finely tuned configuration offer
substantial advantages in medical image segmentation tasks, particularly where boundary precision and
sensitivity are critical.
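As a consistency check on the reported values, Dice and IoU computed on the same pair of masks are linked by a fixed identity, shown below in terms of true positives (TP), false positives (FP), and false negatives (FN). The identity holds exactly for a single pair of masks; for scores averaged over several cases it is only approximate, so the check is indicative rather than exact.

```latex
\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\mathrm{IoU} = \frac{TP}{TP + FP + FN}
\quad\Longrightarrow\quad
\mathrm{IoU} = \frac{\mathrm{Dice}}{2 - \mathrm{Dice}}

\frac{0.6241}{2 - 0.6241} \approx 0.4536, \qquad
\frac{0.99967}{2 - 0.99967} \approx 0.9993
```

Both results agree with the IoU values reported for TotalSegmentator and the improved nnU-Net to about four decimal places.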

5.2 Discussion

The improved nnU-Net significantly outperforms

TotalSegmentator across several metrics due to the

optimizations made to its loss function and data

augmentation strategy. By increasing the weight of

Dice loss, the model more effectively handles small

targets and imbalanced data, resulting in greater

precision when segmenting small regions.

Furthermore, adjustments to the data augmentation

strategy increased the model’s robustness to various

image perturbations such as noise, rotation, and

resolution changes. These improvements have led to

superior performance in metrics like Dice Score and

IoU, while significantly reducing Hausdorff

Distance, indicating more accurate boundary

segmentation.

Nevertheless, the improved nnU-Net has several

drawbacks. The model's application in resource-

constrained contexts may be limited by its long

training times and high computational

requirements. Further validation on a range of

datasets is necessary to establish generalizability, as


the efficacy of data augmentation procedures may

also depend on the particular features of the dataset.

6 CONCLUSIONS

Through data augmentation techniques and loss

function tuning, this study greatly enhanced the nnU-

Net model's performance in medical image

segmentation tasks. By increasing the weight of Dice

loss, the model showed enhanced performance in

handling small targets and data imbalance, while the

improvements in data augmentation made the model

more resilient to perturbations like noise and rotation.

These enhancements boosted accuracy, boundary

handling, and robustness, outperforming

TotalSegmentator in metrics like Dice Score, IoU,

and Hausdorff Distance.

Future research will aim to reduce the model's

training time by exploring more efficient

optimization algorithms and ensemble learning

techniques. Additionally, efforts will focus on

validating the model's adaptability and ensuring the

generalizability of its data augmentation strategies

across various types of medical image datasets,

ultimately seeking to enhance performance and

reliability across a broader range of applications.

REFERENCES

Akinci D'Antonoli, T., Yang, S., & Braren, R. F. (2023).

TotalSegmentator MRI: Sequence-independent

segmentation of 59 anatomical structures in MR

images. arXiv preprint. arXiv:2301.10693.

https://doi.org/10.48550/arXiv.2301.10693

Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., ... &

Anandkumar, A. (2021). TransUNet: Transformers

make strong encoders for medical image segmentation.

arXiv preprint. arXiv:2102.04306.

https://doi.org/10.48550/arXiv.2102.04306

Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., &

Ronneberger, O. (2016). 3D U-Net: Learning dense

volumetric segmentation from sparse annotation. In

International Conference on Medical Image Computing

and Computer-Assisted Intervention (pp. 424-432).

Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_49

DeVries, T., & Taylor, G. W. (2017). Improved

regularization of convolutional neural networks with

cutout. arXiv preprint. arXiv:1708.04552.

https://doi.org/10.48550/arXiv.1708.04552

Fischl, B., Salat, D. H., van der Kouwe, A. J. W., Makris,

N., Ségonne, F., Quinn, B. T., & Dale, A. M. (2004).

Sequence-independent segmentation of magnetic

resonance images. NeuroImage, 23(S1), S69-S84.

https://doi.org/10.1016/j.neuroimage.2004.07.016

Hatamizadeh, A., Yin, Y., Kuo, W.-L., & Myronenko, A.

(2021). Swin UNETR: Swin transformers for semantic

segmentation of brain tumors in MRI images. In

Proceedings of the International Conference on

Medical Image Computing and Computer-Assisted

Intervention (pp. 272-284). Springer, Cham.

https://doi.org/10.1007/978-3-030-87237-3_24

Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J., &

Maier-Hein, K. H. (2021). nnU-Net: A self-configuring

method for deep learning-based biomedical image

segmentation. Nature Methods, 18(2), 203-211.

https://doi.org/10.1038/s41592-020-01008-z

Kumar, N., Verma, R., & Arora, S. (2021). Three-stage

segmentation of lung region from CT images using

deep neural networks. BMC Medical Imaging, 21(1), 1-12.

https://doi.org/10.1186/s12880-021-00589-8

Milletari, F., Navab, N., & Ahmadi, S.-A. (2016). V-Net:

Fully convolutional neural networks for volumetric

medical image segmentation. In 2016 Fourth

International Conference on 3D Vision (pp. 565-571).

IEEE. https://doi.org/10.1109/3DV.2016.79

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net:

Convolutional networks for biomedical image

segmentation. In Proceedings of the International

Conference on Medical Image Computing and

Computer-Assisted Intervention (pp. 234-241).

Springer, Cham. https://doi.org/10.1007/978-3-319-24574-4_28

Roy, A. G., Conjeti, S., Navab, N., & Wachinger, C. (2018).

Concurrent spatial and channel 'Squeeze & Excitation'

in fully convolutional networks. In Proceedings of the

International Conference on Medical Image Computing

and Computer-Assisted Intervention (pp. 421-429).

Springer, Cham. https://doi.org/10.1007/978-3-030-00928-1_47

Salehi, S. S. M., Erdogmus, D., & Gholipour, A. (2017).

Tversky loss function for image segmentation using 3D

fully convolutional deep networks. In Proceedings of

the International Conference on Machine Learning in

Medical Imaging (pp. 379-387). Springer, Cham.

https://doi.org/10.1007/978-3-319-67389-9_44

Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S., &

Cardoso, M. J. (2017). Generalised dice overlap as a

deep learning loss function for highly unbalanced

segmentations. In Deep Learning in Medical Image

Analysis (pp. 240-248). Springer, Cham.

https://doi.org/10.1007/978-3-319-67558-9_28

Wasserthal, J., Meyer, M., Breit, H. C., Cyriac, J., Yang, S.,

& Segeroth, M. (2023). Totalsegmentator: Robust

segmentation of 104 anatomic structures in CT images.

Radiology: Artificial Intelligence, 5(2), e220198.

https://doi.org/10.1148/ryai.220198

Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D.

(2018). Mixup: Beyond empirical risk minimization. In

International Conference on Learning Representations

(ICLR). https://doi.org/10.48550/arXiv.1710.09412
