MR to CT Synthesis Using GANs: A Practical Guide Applied to Thoracic Imaging
Arthur Longuefosse¹, Baudouin Denis De Senneville², Gaël Dournes³, Ilyes Benlala³, François Laurent³, Pascal Desbarats¹ and Fabien Baldacci¹
¹LaBRI, Université de Bordeaux, Talence, France
²Institut de Mathématiques de Bordeaux, Université de Bordeaux, Talence, France
³Service d’Imagerie Médicale Radiologie Diagnostique et Thérapeutique, CHU de Bordeaux, France
Keywords:
Generative Adversarial Networks, CT Synthesis, Lung.
Abstract:
In medical imaging, MR-to-CT synthesis has been extensively studied. The primary motivation is to benefit
from the quality of the CT signal, i.e. excellent spatial resolution, high contrast, and sharpness, while avoiding
patient exposure to CT ionizing radiation, by relying on the safe and non-invasive nature of MRI. Recent
studies have successfully used deep learning methods for cross-modality synthesis, notably with the use of
conditional Generative Adversarial Networks (cGAN), due to their ability to create realistic images in a target
domain from an input in a source domain. In this study, we examine in detail the different steps required
for cross-modality translation using GANs applied to MR-to-CT lung synthesis, from data representation and
pre-processing to the type of method and loss function selection. The different alternatives for each step were
evaluated using a quantitative comparison of intensities inside the lungs, as well as bronchial segmentations
between synthetic and ground truth CTs. Finally, a general guideline for cross-modality medical synthesis is
proposed, bringing together best practices from generation to evaluation.
1 INTRODUCTION
In clinical practice, computed tomography (CT) is
typically used to diagnose lung conditions. However,
this modality exposes patients to ionizing radiation,
which may have negative effects on their health. Re-
cently, lung MRI with ultrashort or zero echo-time
(UTE/ZTE) has shown promise for high-resolution
structural imaging of the lung (Dournes et al., 2015;
Dournes et al., 2018). However, the appearance of
images obtained using this technique is substantially
different from those obtained with CT, notably in terms of texture, blurring, and noise, which has limited its adoption in clinical practice (cf. Figure 1). The generation of CT images from MRI may be a good alternative and could improve patient diagnosis by providing high-quality images to radiologists while relying solely on the safe and non-invasive nature of MRI. Over recent
years, deep learning approaches, particularly genera-
tive adversarial networks (GANs) (Goodfellow et al.,
2014), have been extensively studied for image syn-
thesis in medical imaging. This type of network consists of a generator and a discriminator, and is able to produce high-quality synthetic data similar to a given dataset; in our setting, it learns a complex non-linear mapping between MR and CT. Previous research on cross-
modality synthesis has used GANs to synthesize im-
ages in several different regions of the body, such as
the brain (Wolterink et al., 2017; Nie et al., 2017),
pelvic region (Lei et al., 2019), and also in the lungs
using Dixon MRI (Baydoun et al., 2020). Many stud-
ies have been conducted on the development of spe-
cific GAN models, including unpaired methods based
on cycleGAN (Zhu et al., 2017) and paired methods
based on pix2pix (Isola et al., 2017). In addition,
research also focused on the development of various
loss functions, such as cycle consistency (Zhu et al.,
2017), feature-matching (Wang et al., 2018), percep-
tual (Johnson et al., 2016), and contrastive loss (An-
donian et al., 2021). However, most state-of-the-art
studies are limited to these developments and do not
properly address the full range of steps involved in
medical translation tasks, such as preprocessing and
robust evaluation.
In this paper, we present a general guideline for
image-to-image translation applied to thoracic MR
to CT synthesis, covering key topics such as pre-
processing steps, data normalization and quantiza-
tion, and the importance of an adapted resampling
before registering the input. We review the differ-
ent types of GANs and losses and compare their per-
formances in thoracic image-to-image translation. A
quantitative evaluation of the different models and
parameters is presented, using traditional metrics as
well as a comparison of the segmentations of air-
ways in synthesized CT images versus ground truth
CT images, to help identify the factors that have the
biggest impact on the performance of medical image-
to-image translation. Overall, our evaluation helps to
provide a better understanding of the different mod-
els and parameters used for medical image-to-image
translation and can serve as a useful reference for re-
searchers and practitioners in this field.
Figure 1: Visual comparison of thoracic UTE MR (a) and CT (b) of the same patient at a corresponding axial slice. The CT scan shows higher signal quality, greater contrast and sharpness, and fewer artifacts compared to the MRI.
2 METHODS
2.1 Data Acquisition
The dataset used in this study consists of UTE MR
and CT thoracic images of 110 patients. Both modal-
ities were acquired on the same day, from 2018 to
2022. CT images were obtained using a Siemens
SOMATOM Force, in end-expiration, with sharp fil-
ters. The parameters used were a DLP of 10 mGy.cm
and a SAFIRE iterative reconstruction. UTE MR im-
ages were acquired using the SpiralVibe sequence on
a SIEMENS Aera scanner, with the following param-
eters: TR/TE/flip angle=4.1ms/0.07ms/5°. Since the
slice plane is encoded in Cartesian mode, the native
acquisition was performed in the coronal plane, with a field of view extending beyond the anterior and posterior chest edges to prevent aliasing. It should be noted that res-
olutions, voxel spacings, and fields of view are not
identical in CT and MR images. In addition, modal-
ities may have been taken at different points in the
respiratory cycle. To obtain a paired dataset, an adequate resampling and a deformable registration between CT and MR volumes are therefore required.
2.2 Preprocessing
2.2.1 Resampling
In multimodal registration, it is typically advised to
use the image with the highest resolution as the fixed
image and the image with the lower resolution as the
moving image, since a higher level of detail and ac-
curacy in the fixed image can help improve the per-
formance of the registration process. In our case, we have to register the CT volume, with a voxel size of 0.6 × 0.6 × 0.6 mm³, on the MRI, with a voxel size of 1 × 1 × 1 mm³, which implies a resampling of the CT to the MRI resolution, and thus a loss of information, as shown in Figure 2. To avoid this issue, we propose to upsample the MRI voxel size to the CT voxel size, allowing us to keep the initial resolution of the CT, which leads to a better convergence of the registration algorithm as well as better performance of the GAN. The two modalities are therefore resampled on a common grid of 0.6 × 0.6 × 0.6 mm³ using tricubic interpolation. For comparison purposes, CT and MR volumes were also resampled on a 1 × 1 × 1 mm³ grid.
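A minimal sketch of this resampling step is given below, assuming SimpleITK volumes. The 0.6 mm common grid follows the text; cubic B-spline interpolation stands in for tricubic interpolation, and the helper name and file paths are illustrative assumptions.

```python
# Sketch: resample both modalities onto a common 0.6 mm isotropic grid.
import SimpleITK as sitk

def resample_to_spacing(img, spacing=(0.6, 0.6, 0.6), default_value=0.0):
    # New grid size chosen so that the physical extent of the volume is preserved.
    new_size = [int(round(sz * sp / ns))
                for sz, sp, ns in zip(img.GetSize(), img.GetSpacing(), spacing)]
    return sitk.Resample(img, new_size, sitk.Transform(), sitk.sitkBSpline,
                         img.GetOrigin(), spacing, img.GetDirection(),
                         default_value, img.GetPixelID())

ct = sitk.ReadImage("ct.nii.gz")   # hypothetical file paths
mr = sitk.ReadImage("mr.nii.gz")
ct_06 = resample_to_spacing(ct, default_value=-1000.0)  # CT detail preserved
mr_06 = resample_to_spacing(mr)                          # MRI upsampled to 0.6 mm
```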
2.2.2 Multimodal Registration
Accurate alignment of images from different modali-
ties often requires non-rigid registration, especially in
parts of the body subject to severe periodic deforma-
tions, such as cardiac and respiratory motions. Edge-
alignment methods seem particularly well suited for
multimodal medical registration since they don’t rely
on input landmarks and can overcome differences in
intensity and contrast between modalities, by focus-
ing on boundary information. In our dataset, a rigid translation is first estimated to ease convergence, before the EVolution algorithm (Denis de Senneville et al., 2016), a patch-based approach that includes a diffusion regularization term and a similarity term favoring edge alignment, is employed to estimate the elastic deformation. To prevent physically implausible folding of the volumes during the registration process, a diffeomorphic transformation is ensured by minimizing the inverse consistency error (Christensen and Johnson, 2001; Heinrich et al., 2012).
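Since EVolution has no widely distributed reference implementation, the sketch below uses SimpleITK's mutual-information B-spline registration as a stand-in for the deformable stage; the initial translation step follows the text, while all metric, optimizer, and grid choices are illustrative assumptions rather than the setup actually used.

```python
# Sketch of the two-stage registration: translation initialization, then a
# deformable multimodal registration (stand-in for the EVolution algorithm).
import SimpleITK as sitk

def register_ct_to_mr(ct, mr):
    # 1) Rough alignment: the geometric initializer only sets a translation
    #    aligning volume centers, matching the translation step in the text.
    rigid = sitk.CenteredTransformInitializer(
        mr, ct, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)
    ct_rigid = sitk.Resample(ct, mr, rigid, sitk.sitkLinear, -1000.0)

    # 2) Deformable stage: multimodal metric on a B-spline grid.
    bspline = sitk.BSplineTransformInitializer(mr, transformDomainMeshSize=[10, 10, 10])
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetMetricSamplingStrategy(reg.RANDOM)
    reg.SetMetricSamplingPercentage(0.1)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5, numberOfIterations=100)
    reg.SetInitialTransform(bspline, inPlace=True)
    reg.SetShrinkFactorsPerLevel([4, 2, 1])
    reg.SetSmoothingSigmasPerLevel([2, 1, 0])
    deformable = reg.Execute(sitk.Cast(mr, sitk.sitkFloat32),
                             sitk.Cast(ct_rigid, sitk.sitkFloat32))
    return sitk.Resample(ct_rigid, mr, deformable, sitk.sitkLinear, -1000.0)
```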
2.2.3 Intensity Normalization
CT and MR modalities have fundamental differences
that must be taken into account when normalizing
intensity values. CT intensity values are defined in
Hounsfield units (HU) and have a physical meaning,
whereas MR intensity values strongly depend on ac-
quisition parameters. Therefore, methods used for intensity normalization must be tailored to the specific characteristics of each modality.
Figure 2: Visual comparison of identical CT slices at different voxel spacings, (a) 0.6 × 0.6 × 0.6 mm³ and (b) 1 × 1 × 1 mm³, with zoomed regions that highlight bronchi, circled in yellow. Pulmonary bronchi are nearly indistinguishable in the 1 mm³ version due to the lower resolution of the image.
In our study, CT intensities are cropped to a [-1000; 2000] HU window to remove irrelevant values from the scanner table or background, and rescaled to [-1; 1] using the same window limits. MR intensity inhomogeneities, also known as bias field, are first corrected using the popular N4 bias field correction algorithm (Tustison et al., 2010). MR values are then normalized using a z-score, i.e. zero mean and unit variance, cropped to [-3σ; 3σ] to remove outliers, σ being the standard deviation, and rescaled to [-1; 1] based on minimum and maximum intensities. Nyúl histogram matching (Nyúl et al., 2000) was also considered, but the findings of Reinhold et al. (Reinhold et al., 2019) indicated that the synthesis process was robust to the choice of MR normalization method. As a result, we opted for a traditional z-score normalization approach.
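A minimal sketch of the normalization described above is given below, assuming SimpleITK volumes; the HU window and the 3σ clipping follow the text, while the Otsu foreground mask used for N4 and the helper names are assumptions.

```python
# Sketch: CT windowing/rescaling and MR N4 bias correction + z-score normalization.
import numpy as np
import SimpleITK as sitk

def normalize_ct(ct_img, lo=-1000.0, hi=2000.0):
    ct = sitk.GetArrayFromImage(ct_img).astype(np.float32)
    ct = np.clip(ct, lo, hi)                       # crop to the [-1000; 2000] HU window
    return 2.0 * (ct - lo) / (hi - lo) - 1.0       # rescale to [-1; 1]

def normalize_mr(mr_img):
    img = sitk.Cast(mr_img, sitk.sitkFloat32)
    mask = sitk.OtsuThreshold(img, 0, 1, 200)      # rough foreground mask for N4
    corrected = sitk.N4BiasFieldCorrectionImageFilter().Execute(img, mask)
    mr = sitk.GetArrayFromImage(corrected)
    mr = (mr - mr.mean()) / (mr.std() + 1e-8)      # z-score normalization
    mr = np.clip(mr, -3.0, 3.0)                    # remove outliers beyond 3 sigma
    return 2.0 * (mr - mr.min()) / (mr.max() - mr.min()) - 1.0
```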
2.2.4 Field of View Standardization
As shown in Figure 1, modalities may have different
fields of view (FOV). Due to the use of narrow beams
of X-rays to produce images, the CT field of view is typically limited to a small area of the body, whereas MRI can capture a wider field of view. In our dataset,
patients may also be in a different position depending
on the modality. This is reflected by the visibility of
the arms on the MRI, as opposed to the CT image. To
standardize the FOV, which can be useful to speed up cal-
culations and guide training, one common approach
is to identify a region of interest (ROI) using segmen-
tation methods. Few methods for lung segmentation
in MRI have been developed due to the lower sig-
nal and contrast, as well as the lack of data. On the
other hand, many effective methods are available for
CT, such as the U-Net R-231 convolutional network
(Hofmanninger et al., 2020). Since the CT volume is registered on the MRI, the lung segmentation obtained on the CT can be applied to the MRI, yielding the same FOV on both modalities. All axial slices are then either cropped or zero-padded to 512 × 512, depending on the CT lung mask size.
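An illustrative sketch of this FOV standardization is shown below: the CT lung mask defines the region of interest, and each axial slice is center-cropped or zero-padded to 512 × 512. The exact crop policy and function names are assumptions, not necessarily the authors' implementation.

```python
# Sketch: use the lung mask bounding box as ROI, then force each slice to 512 x 512.
import numpy as np

def crop_or_pad_to(slice_2d, size=512):
    out = np.zeros((size, size), dtype=slice_2d.dtype)
    h, w = slice_2d.shape
    # Center-crop dimensions larger than `size`.
    y0, x0 = max((h - size) // 2, 0), max((w - size) // 2, 0)
    cropped = slice_2d[y0:y0 + min(h, size), x0:x0 + min(w, size)]
    # Zero-pad dimensions smaller than `size`.
    py, px = (size - cropped.shape[0]) // 2, (size - cropped.shape[1]) // 2
    out[py:py + cropped.shape[0], px:px + cropped.shape[1]] = cropped
    return out

def standardize_fov(volume, lung_mask, size=512):
    zs, ys, xs = np.nonzero(lung_mask)             # bounding box of the CT lung mask
    roi = volume[zs.min():zs.max() + 1, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return np.stack([crop_or_pad_to(sl, size) for sl in roi])
```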
2.2.5 Impact of Intensity Quantization
In this study, we also investigate the impact of the bit
depth of input medical images on the performance of
a GAN in lung MR to CT translation. We create two
datasets, one in line with most of the current state-
of-the-art papers in the field with 8-bit images, and
another dataset with 16-bit images, and evaluate the
GAN’s performance on each dataset. This allows us
to determine whether using higher bit-depth images
can improve the performance of the GAN for thoracic
CT synthesis.
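As a simple illustration, the two quantization settings can be produced from the normalized [-1; 1] volumes as follows; the function below is a sketch, not the exact conversion used in the study.

```python
# Sketch: store the normalized volumes as either 8-bit or 16-bit unsigned integers.
import numpy as np

def quantize(volume_m1_1, bits=16):
    levels = 2 ** bits - 1
    scaled = (volume_m1_1 + 1.0) / 2.0 * levels    # [-1; 1] -> [0; 2^bits - 1]
    dtype = np.uint8 if bits == 8 else np.uint16
    return np.round(scaled).astype(dtype)
```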
2.3 Image-to-Image Translation
Conditional generative adversarial networks (cGANs) are a variant of GANs trained with additional constraints on a specific input image and have demonstrated significant potential for image-to-image translation tasks. cGANs are typically divided
into two main categories: unpaired methods, often
based on the CycleGAN model (Zhu et al., 2017),
designed for image-to-image translation without
the need for corresponding pairs of images, and
paired methods, based on the pix2pix model, using
corresponding pairs of images.
Since the introduction of the cycle consistency
loss in CycleGAN, many unpaired methods have been
developed, including NICE-GAN (Chen et al., 2020),
a decoupled network training method that uses the
discriminator to encode the image of the target do-
main. As for paired methods, pix2pixHD (Wang et al., 2018) improves on pix2pix by replacing the pixel-wise loss with a feature matching loss, combined with a multi-scale discriminator and a perceptual loss. SPADE (Park et al., 2019) also enhanced
the performance of paired methods by injecting class-
specific information into the generator network. This
model introduces a spatially adaptive normalization
based on the inputs, which improves the performance and reliability of the generator and allows synthesized images to be conditioned on the input class. The SPADE architecture can be integrated into other models, such as pix2pixHD, to apply additional constraints on the inputs and guide training.
Finally, recent works on paired image-to-image
translation developed a new type of bidirectional con-
trastive loss, called PatchNCE loss (Andonian et al.,
2021), that assesses the similarity between two im-
ages based on the mutual information from embedded
patches, unlike the GAN discriminator, which only evaluates the realism of a synthesized image. This con-
trastive loss produces a smooth and interpretable loss
trajectory, which makes it easier to evaluate the con-
vergence of the training process and determine the
number of epochs needed. This is a common chal-
lenge with GANs since their traditional loss functions
tend to be noisy and provide no clear indication of
training progress.
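A minimal PatchNCE-style loss can be sketched as follows, assuming patch features have already been extracted at corresponding locations of the synthesized and real images by an encoder; this InfoNCE formulation is illustrative and not the authors' exact implementation.

```python
# Sketch: contrastive loss over corresponding patch features of real/synthetic images.
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_syn, feat_real, tau=0.07):
    """feat_syn, feat_real: (N, D) features of N spatially corresponding patches."""
    q = F.normalize(feat_syn, dim=1)
    k = F.normalize(feat_real, dim=1)
    logits = q @ k.t() / tau                       # (N, N) patch-to-patch similarities
    targets = torch.arange(q.size(0), device=q.device)
    # Each synthesized patch should match its corresponding real patch (diagonal)
    # and repel all other patches (off-diagonal negatives).
    return F.cross_entropy(logits, targets)
```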
2.4 Assessment of the Synthesis
In order to evaluate the performance of generative
models, past research has proposed several extrin-
sic evaluation measures, most notably Inception Dis-
tances (FID (Heusel et al., 2017), KID (Binkowski
et al., 2018)), which compare the generated images to
a set of real images and assess their quality and sim-
ilarity. Such measures have been proven to be insen-
sitive to global structural problems (Tsitsulin et al.,
2020), and may not be sufficient for the evaluation of
medical image translation.
Traditional image processing metrics, such as
MSE, PSNR, SSIM, are the state-of-the-art reference
metrics for evaluating synthetic images. They can
provide information on how well the model preserves
spatial structure and content of the original images,
but are still highly sensitive to noise and distortion,
and may not accurately reflect the visual quality of an
image with low-level artifacts (Toderici et al., 2017).
Our assumption is that task-specific metrics are re-
quired to accurately quantify synthesized images and
evaluate the performance of a model, by taking into
account the structure and semantics of the images.
In our case, these task-specific metrics are the Dice score, precision, and sensitivity between synthesized and ground truth bronchial tree segmentations, obtained with NaviAirway (Wang et al., 2022), a bronchiole-sensitive airway segmentation pipeline designed for CT data. This
allows us to accurately quantify false positives and
false negatives at the bronchial level for each syn-
thetic CT image. A qualitative evaluation conducted
by radiologists or other medical experts can also be
valuable, to ensure that the translation has preserved
overall fidelity with the ground truth and diagnostic
information.
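Given the binary airway masks of a synthetic CT and of the corresponding ground truth CT, the three task-specific metrics can be computed as sketched below (function names are illustrative).

```python
# Sketch: Dice, precision, and sensitivity between two binary airway segmentations.
import numpy as np

def airway_metrics(seg_synth, seg_gt):
    synth, gt = seg_synth.astype(bool), seg_gt.astype(bool)
    tp = np.logical_and(synth, gt).sum()
    fp = np.logical_and(synth, ~gt).sum()
    fn = np.logical_and(~synth, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn + 1e-8)
    precision = tp / (tp + fp + 1e-8)      # penalizes spurious (false positive) bronchi
    sensitivity = tp / (tp + fn + 1e-8)    # penalizes missed (false negative) bronchi
    return dice, precision, sensitivity
```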
3 EXPERIMENTS AND RESULTS
Figure 3: Comparison between the UTE-MR input (a), the ground truth CT (b), and synthetic CTs from SPADE at 1 × 1 × 1 mm³ (c) and 0.6 × 0.6 × 0.6 mm³ (d), on corresponding axial slices.
The initial dataset of 110 MR-CT thoracic images is split into a training set of 82 patients and a test set of 28 patients. Although 3D GANs can exploit volumetric and neighborhood spatial information, they involve an excessive computational cost and reduce the number of available samples, which can be challenging for some datasets. We therefore choose to train the models on the 2D axial slices of the CT and MR volumes and define datasets that allow us to assess the impact of each preprocessing step:
- unpaired + unregistered 0.6 × 0.6 × 0.6 mm³ CT, 8-bit
- unpaired + registered 0.6 × 0.6 × 0.6 mm³ CT, 8-bit
- paired + registered 0.6 × 0.6 × 0.6 mm³ CT, 8-bit
- paired + registered 0.6 × 0.6 × 0.6 mm³ CT, 16-bit
- paired + registered 1 × 1 × 1 mm³ CT, 16-bit
Table 1: Mean squared error (MSE), cross-correlation (CC) and structural similarity index (SSIM) between synthesized CT and real CT inside the lungs.
Model | MSE | CC | SSIM
NICE-GAN | 88.45 ± 9.91 | 0.9283 ± 0.023 | 0.9725 ± 0.024
NICE-GAN registered | 82.41 ± 9.74 | 0.9406 ± 0.015 | 0.9776 ± 0.022
pix2pixHD | 78.46 ± 13.03 | 0.9499 ± 0.011 | 0.9834 ± 0.032
pix2pixHD w/ contrast | 75.51 ± 12.05 | 0.9557 ± 0.010 | 0.9900 ± 0.030
SPADE | 67.82 ± 8.18 | 0.9635 ± 0.083 | 0.9915 ± 0.016
SPADE w/ contrast | 67.53 ± 7.31 | 0.9646 ± 0.088 | 0.9927 ± 0.017
SPADE 8-bit w/ contrast | 67.76 ± 7.70 | 0.9630 ± 0.096 | 0.9932 ± 0.018
SPADE 1 mm³ w/ contrast | 76.36 ± 8.79 | 0.9505 ± 0.016 | 0.9830 ± 0.026
Table 2: Dice, precision and sensitivity between synthesized and ground truth airways segmentations.
Model | Dice | Precision | Sensitivity
NICE-GAN | 0.590 ± 0.088 | 0.636 ± 0.0752 | 0.583 ± 0.1298
NICE-GAN registered | 0.640 ± 0.071 | 0.665 ± 0.069 | 0.642 ± 0.104
pix2pixHD | 0.707 ± 0.054 | 0.796 ± 0.060 | 0.660 ± 0.102
pix2pixHD w/ contrast | 0.741 ± 0.031 | 0.787 ± 0.052 | 0.715 ± 0.088
SPADE | 0.733 ± 0.068 | 0.829 ± 0.060 | 0.681 ± 0.108
SPADE w/ contrast | 0.743 ± 0.060 | 0.819 ± 0.054 | 0.706 ± 0.104
SPADE 8-bit w/ contrast | 0.742 ± 0.055 | 0.802 ± 0.057 | 0.719 ± 0.098
SPADE 1 mm³ w/ contrast | 0.687 ± 0.078 | 0.766 ± 0.068 | 0.652 ± 0.120
The NICE-GAN model was trained using un-
paired datasets, while the pix2pixHD and SPADE
models were trained using paired datasets. We also
evaluated the performance gain of the contrastive loss
when applied to these paired methods.
All models are trained using the same proce-
dure and architecture defined in the respective papers,
apart from the pix2pixHD/SPADE dataloader and inference parts, which have been adapted to support 16-
bit input and output arrays. Table 1 lists the quan-
titative evaluation using mean squared error, cross-
correlation and structural similarity index between
synthesized CT images and ground truth CT. Calculations are restricted to the intersection of the CT and synthesized CT lung masks, to avoid comparing the backgrounds and to confine the results to the lungs. Figure 3 shows example axial slices of the input MR, ground truth CT, and synthetic CTs from the SPADE model trained with the two resamplings. SPADE
results based on CT with a voxel size of 0.6 × 0.6 × 0.6 mm³ present enhanced contrast and sharpness, and therefore allow a more accurate distinction of vessels and bronchi inside the lungs.
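A sketch of how the Table 1 metrics can be restricted to the lungs is given below; the use of scikit-image for SSIM and its evaluation on the mask bounding box are assumptions about the setup, not the authors' exact procedure.

```python
# Sketch: compare real and synthetic CT only inside the intersection of both lung masks.
import numpy as np
from skimage.metrics import structural_similarity

def masked_metrics(ct_real, ct_synth, mask_real, mask_synth):
    mask = np.logical_and(mask_real, mask_synth)
    real, synth = ct_real[mask], ct_synth[mask]
    mse = np.mean((real - synth) ** 2)
    cc = np.corrcoef(real, synth)[0, 1]            # cross-correlation of intensities
    # SSIM needs a spatial neighborhood, so evaluate it on the mask bounding box.
    zs, ys, xs = np.nonzero(mask)
    box = (slice(zs.min(), zs.max() + 1), slice(ys.min(), ys.max() + 1),
           slice(xs.min(), xs.max() + 1))
    ssim = structural_similarity(ct_real[box], ct_synth[box],
                                 data_range=ct_real[box].max() - ct_real[box].min())
    return mse, cc, ssim
```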
To validate these assumptions, we performed
the airways segmentation of synthesized and ground
truth CT using NaviAirway (Wang et al., 2022),
a bronchiole-sensitive airway segmentation pipeline
designed for CT data, and computed the Dice score, precision, and sensitivity (Table 2). To enable comparison, the SPADE 1 × 1 × 1 mm³ output was resampled to the same resolution as the ground truth CT before calculating
the airways segmentation. Figure 4 shows a com-
parison of ground truth and synthetic CT bronchial
trees, and illustrates the ability of the SPADE method
to produce high-quality airways, with generated branching depths that closely approximate those of the ground truth CT.
4 DISCUSSION
Results from Table 1 based on image processing met-
rics and Table 2 based on the evaluation of airways
segmentation are strongly correlated, with identical
trends. Unpaired methods seem to benefit from the
elastic registration but produce less satisfactory re-
sults than paired methods, which is in agreement with the state of the art (Kaji and Kida, 2019). The paired pix2pixHD method combined with the SPADE conditional normalization layer performs better than pix2pixHD alone, since adding constraints on the inputs helps reduce false positives and false negatives.
Figure 4: Airways segmentation example based on SPADE
with contrastive loss (yellow) and real CT (red) using the
NaviAirway pipeline (Wang et al., 2022).
The introduction of the PatchNCE (Andonian et al., 2021) contrastive
loss has improved the performance of paired meth-
ods, particularly for the pix2pixHD model, which tends
to diverge. This addition had only a minor impact
on the SPADE model, but still provided better control
over convergence during training and a more reliable way to compare epochs. The performance of
the SPADE model with a voxel size of 0.6 × 0.6 × 0.6 mm³ is significantly superior to that of the model with a voxel size of 1 × 1 × 1 mm³, both in terms of signal quality and bronchi reconstruction. These results
confirm our initial hypothesis that input data should
be registered based on the voxel size of the modal-
ity with the highest resolution, since downsampling
the ground truth reference leads to a loss of informa-
tion, especially in fine structures such as vessels and
bronchi. Surprisingly, the intensity quantization of the input images does not appear to have a significant impact on GAN performance; the SPADE 16-bit and SPADE 8-bit models performed similarly. The
reason for this could be that our dataset is composed
of highly contrasted information, such as vessels and
bronchi in the lungs, and the representation in 8-bit
instead of the initial 12-bit would barely impact the
reconstruction using GANs. Future works will aim to
confirm this hypothesis by conducting similar exper-
iments using different medical datasets in other parts
of the body.
5 CONCLUSION
In this paper, we present a comprehensive guide for
medical image translation using GANs. We focus on
the importance of data preprocessing, and its impact
on performance; the benefits of using a resampling
based on the modality with the highest resolution,
as opposed to state-of-the-art statements, have been
demonstrated. We advocate the use of contrastive
loss methods, such as PatchNCE, to address one of
the most significant challenges of GANs, which is
assessing convergence and stability during training.
In addition, we argue that traditional GAN metrics
commonly used in the field, such as FID and KID,
as well as standard image processing metrics, do not
provide sufficient information to adequately evaluate
GAN performances in medical image-to-image trans-
lation tasks. We recommend defining task-specific
quantitative evaluation methods, ideally in conjunc-
tion with a qualitative evaluation by experts, in order
to robustly assess the performance of a model in this
context. In future work, we plan to investigate the
validity of our assumptions on different datasets for
other parts of the body and provide guidance on in-
corporating 3D information into the training process
for medical image-to-image translation.
REFERENCES
Andonian, A., Park, T., Russell, B., Isola, P., Zhu, J.-Y., and
Zhang, R. (2021). Contrastive feature loss for image
prediction. In Proceedings of the IEEE/CVF Interna-
tional Conference on Computer Vision, pages 1934–
1943.
Baydoun, A., Xu, K., Yang, H., Zhou, F., Heo, J., Jones,
R., Avril, N., Traughber, M., Traughber, B., Qian,
P., and Muzic, R. (2020). Dixon-based thorax synthetic CT generation using generative adversarial network. Intelligence-Based Medicine, 3-4.
Binkowski, M., Sutherland, D. J., Arbel, M., and Gretton,
A. (2018). Demystifying MMD GANs. International
Conference on Learning Representations.
Chen, R., Huang, W., Huang, B., Sun, F., and Fang, B.
(2020). Reusing discriminators for encoding: Towards
unsupervised image-to-image translation. In Proceed-
ings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR).
Christensen, G. E. and Johnson, H. J. (2001). Consis-
tent image registration. IEEE Trans Med Imaging,
20(7):568–582.
Denis de Senneville, B., Zachiu, C., Ries, M., and Moo-
nen, C. (2016). EVolution: an edge-based variational
method for non-rigid multi-modal image registration.
Phys Med Biol, 61(20):7377–7396.
Dournes, G., Grodzki, D., Macey, J., Girodet, P. O., Fayon,
M., Chateil, J. F., Montaudon, M., Berger, P., and Lau-
rent, F. (2015). Quiet Submillimeter MR Imaging of
the Lung Is Feasible with a PETRA Sequence at 1.5
T. Radiology, 276(1):258–265.
Dournes, G., Yazbek, J., Benhassen, W., Benlala, I., Blan-
chard, E., Truchetet, M. E., Macey, J., Berger, P.,
and Laurent, F. (2018). 3D ultrashort echo time
MRI of the lung using stack-of-spirals and spherical
k-Space coverages: Evaluation in healthy volunteers
and parenchymal diseases. J Magn Reson Imaging,
48(6):1489–1497.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In Ad-
vances in Neural Information Processing Systems,
volume 27.
Heinrich, M. P., Jenkinson, M., Bhushan, M., Matin, T.,
Gleeson, F. V., Brady, S. M., and Schnabel, J. A.
(2012). MIND: modality independent neighbourhood
descriptor for multi-modal deformable registration.
Med Image Anal, 16(7):1423–1435.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and
Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium.
In Proceedings of the 31st International Conference
on Neural Information Processing Systems, NIPS’17,
page 6629–6640.
Hofmanninger, J., Prayer, F., Pan, J., et al. (2020). Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem. Eur Radiol Exp, 4(1):50.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017).
Image-to-image translation with conditional adversar-
ial networks. CVPR.
Johnson, J., Alahi, A., and Fei-Fei, L. (2016). Perceptual
losses for real-time style transfer and super-resolution.
In Computer Vision – ECCV 2016, pages 694–711.
Kaji, S. and Kida, S. (2019). Overview of image-to-image
translation by use of deep neural networks: denois-
ing, super-resolution, modality conversion, and recon-
struction in medical imaging. Radiol Phys Technol,
12(3):235–248.
Lei, Y., Harms, J., Wang, T., Liu, Y., Shu, H. K., Jani, A. B.,
Curran, W. J., Mao, H., Liu, T., and Yang, X. (2019).
MRI-only based synthetic CT generation using dense
cycle consistent generative adversarial networks. Med
Phys, 46(8):3565–3581.
Nie, D., Trullo, R., Lian, J., Petitjean, C., Ruan, S., Wang,
Q., and Shen, D. (2017). Medical Image Synthe-
sis with Context-Aware Generative Adversarial Net-
works. Med Image Comput Comput Assist Interv,
10435:417–425.
Nyúl, L. G., Udupa, J. K., and Zhang, X. (2000). New variants of a method of MRI scale standardization. IEEE Trans Med Imaging, 19(2):143–150.
Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. (2019).
Semantic image synthesis with spatially-adaptive nor-
malization. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition.
Reinhold, J. C., Dewey, B. E., Carass, A., and Prince, J. L.
(2019). Evaluating the Impact of Intensity Normaliza-
tion on MR Image Synthesis. Proc SPIE Int Soc Opt
Eng, 10949.
Toderici, G., Vincent, D., Johnston, N., Jin Hwang, S.,
Minnen, D., Shor, J., and Covell, M. (2017). Full
resolution image compression with recurrent neural
networks. In Proceedings of the IEEE conference
on Computer Vision and Pattern Recognition, pages
5306–5314.
Tsitsulin, A., Munkhoeva, M., Mottin, D., Karras, P., Bron-
stein, A., Oseledets, I., and Mueller, E. (2020). The
shape of data: Intrinsic distance for data distributions.
In International Conference on Learning Representa-
tions.
Tustison, N. J., Avants, B. B., Cook, P. A., Zheng, Y., Egan,
A., Yushkevich, P. A., and Gee, J. C. (2010). N4ITK:
improved N3 bias correction. IEEE Trans Med Imag-
ing, 29(6):1310–1320.
Wang, A., Tam, T. C. C., Poon, H. M., Yu, K.-C., and Lee,
W.-N. (2022). NaviAirway: a bronchiole-sensitive
deep learning-based airway segmentation pipeline for
planning of navigation bronchoscopy. arXiv preprint
arXiv:2203.04294.
Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., and
Catanzaro, B. (2018). High-resolution image synthe-
sis and semantic manipulation with conditional gans.
In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition.
Wolterink, J., Dinkla, A., Savenije, M., Seevinck, P., van
den Berg, C., and Išgum, I. (2017). Deep MR to CT synthesis using unpaired data. In Simulation and Syn-
thesis in Medical Imaging, Lecture Notes in Computer
Science, pages 14–23.
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017).
Unpaired image-to-image translation using cycle-
consistent adversarial networks. In Computer Vision
(ICCV), 2017 IEEE International Conference on.