Pair-GAN: A Three-Validated Generative Model from Single Pairs of
Biomedical and Ground Truth Images
Clara Brémond-Martin (https://orcid.org/0000-0001-5472-9866), Huaqian Wu (https://orcid.org/0000-0003-1061-675X), Cédric Clouchoux (https://orcid.org/0000-0003-3343-6524) and Kévin François-Bouaou (https://orcid.org/0000-0002-9754-3065)
Witsee, 33 Av. des Champs-Élysées, 75008 Paris, France
Keywords:
GAN, Single Input, Auto-Encoder, Biomedical, Pair, Segmentation.
Abstract:
Generating synthetic pairs of raw and ground truth (GT) images is a strategy to reduce the amount of acquisition and annotation required from biomedical experts. Pair-generation strategies from a single-input pair (SIP) of images rely on a patch pyramid (PP) or on a dual-branch generator, but the resulting synthetic images are not natural. With few input images, adversarial auto-encoders (AAE) synthesise more natural raw images. Here we propose Pair-GAN, a combination of a PP with auto-encoder generators at each level, for biomedical image synthesis from a SIP. The PP enables synthesis from a SIP while the AAE generator renders the image content more natural. We use two biomedical datasets containing raw and GT images. Our architecture is evaluated against seven state-of-the-art methods updated for SIP, using qualitative, similitude and segmentation metrics, Kullback-Leibler divergences between synthetic and original feature representations, computational costs and statistical analyses. Pair-GAN generates the most qualitative and natural outputs, similar to the original pairs and with complex shapes not produced by other methods, however at the cost of increased memory needs. Future work may use this generative procedure for multimodal biomedical dataset synthesis to support their automatic processing, such as classification or segmentation with deep learning tools.
1 INTRODUCTION
Deep learning (DL) tools show powerful capabilities for image segmentation or classification in various computer-vision domains. The success of these tools, which are greedy in training data and resources, relies on the availability of a large number of labeled images. In the case of biomedical images, acquisitions and annotations are difficult to obtain. Indeed, the time required to replicate experiments, or resource constraints due to ethical or sustainability reasons, hinders DL implementations.
Data augmentation is the method of choice to increase the number of dataset images Xun et al. (2022); Iqbal et al. (2022). The classical transformations usually proposed, such as flipping or cropping, do not bring new content. Over the last decade, generative adversarial networks (GAN) have accelerated the automation of image analysis supported by DL in the biomedical field Goodfellow et al. (2014). This
method synthesises a new specimen exhibiting the representative characteristics of the original images, without duplicating any of them. The architecture sustaining this process is based upon two networks constituted of convolutional layers: a generator and a discriminator. The generator creates a new realistic sample aiming at misleading the second network, using an image or noise as input. The discriminator aims at determining whether the image given as input is original or synthetic. Both networks improve each other through a loss function. However, the original GAN requires at least a minimal number of input images to produce significant results Lindner et al. (2019).
To overcome this limitation, many GAN-derived architectures have emerged to synthesise images from a single input (SI) image. These architectures rely most of the time on pyramidal architectures or dual generators. In the first case, the generators of the pyramidal single-input architecture are multiscale patch generators, consisting of convolutional blocks taking as input a noise map and an image at a specific resolution level. The output of this resolution level is then given to the following block, which also takes a noise map and an image as inputs Shaham et al.
(2019). The second strategy consists in implementing
a two-branch generator, one focusing on local and
one on background features, with the same original
GAN architecture Sushko et al. (2021a).
To the best of our knowledge, most SI generative architectures are based upon the GAN architecture, inheriting its limitations such as mode collapse or non-natural generations. Other solutions have appeared and could be integrated, such as deep convolutions, semantic information or a conditional input Iqbal et al. (2022). Only two SI studies propose to replace the GAN generative convolutional layers by variational auto-encoders, which helps synthesising diversified raw images Gur et al. (2020); Yoon et al. (2022). In previous work, some drawbacks have been observed when using the original GAN architecture to generate diversified and natural raw images, based only on a reduced input image dataset. Replacing this GAN architecture by an auto-encoder improves the image generation and has been validated with three strategies Brémond Martin et al. (2022). However, the generation of a pair of images and corresponding ground truth (GT) in SI with an adversarial auto-encoder generator structure has not been tested yet.
Recently, two architectures have been updated to handle the generation of a pair of raw and annotated images from a single input pair (SIP) of raw and GT images. The pyramidal approach has been extended to handle the GT by increasing the number of input channels Shaham et al. (2019). Another model addressing this issue relies on a dual generator: from a noise input, one branch generates the raw image while the second synthesises the mask; an attention mechanism then evaluates the realism of the images Sushko et al. (2023).
The characterisation of the natural aspect of synthesised biological images has not yet been addressed. In the literature, there is no consensus on a metric or a combination of metrics ensuring the naturality of an image, which is a real issue in the biomedical field Borji (2019). Recently, Brémond-Martin et al. (2023) proposed to compare metric and psychovisual evaluations to choose appropriate metrics for an applicative case.
In this paper, we attempt to answer these questions by proposing a new Pair-GAN architecture using pyramidal auto-encoder generation for biomedical image and GT generation from a SIP. The pyramidal structure helps to generate images from a SI and the auto-encoder yields more natural samples. Several improvements have been dedicated to the generative part of SI synthesis. However, none of them identified whether the concomitant generation of a raw and a ground truth image gives similar results, and none of them, to our knowledge, uses a pyramidal auto-encoder generation. We propose to compare the synthetic pairs from Pair-GAN with synthetic pairs from state-of-the-art architectures dedicated to SI synthesis and adapted for pair generation. To avoid the lack of consensus in metrics, we use previously developed and validated metrics and statistical strategies Brémond Martin et al. (2022); Brémond-Martin et al. (2023) and a computational validation.
2 RELATED WORKS
2.1 Single Input Generative Methods
Pyramid Frameworks. To generate images from a
SI image, the first strategy is to use pyramidal archi-
tectures. The model learns internal statistics from
image patches across different image scales with a
Wasserstein loss. Each stage is constituted by con-
volutional layers. SinGAN is the first model to
synthesise images with a SI image strategy Shaham
et al. (2019). Improvements of this architecture may
rely on an attention module (SetGAN) or Gaussian
smoothing Mahendren et al. (2023). Another recent
model, ConSinGAN Hinz et al. (2021), is based upon
a cascaded framework of SinGAN. In this model, a hi-
erarchical module is added with a mixed reconstruc-
tion loss, providing various intensities of image gen-
eration, contrary to SinGAN. The Multi-scale-GAN relies on a cascaded framework such as SinGAN. Similarly, it proposes an input based on noise and input images at various scales, the main difference being the use of PAC-Bayes bound theory to tighten generalisation error bounds and synthesise more realistic images with super-resolution Tang et al. (2022). SinIR uses the same SinGAN cascaded architecture with random pixel shuffling inside the generative part Yoo and Chen (2021). Likewise, Shuffling-SinGAN proposes pixel shuffling at each scale of the pyramidal generative framework, but adds a channel attention module and a spatial attention module. Shuffling pixels strengthens the role of the generator, misleading the discriminator Zheng et al. (2021a). While previous methods focus on textural
information, SaSinGAN proposes to estimate larger
image feature information by positioning attention
modules at different locations according to the pyra-
mid scale Chen et al. (2021b). CCASinGAN proposes
cascades where the input characteristics are sketched
into weighted feature maps, increasing the robust-
ness of the attention module. Contrary to SaSinGAN
where the attention modules are updating their posi-
tion at each scale, this architecture gives two atten-
tion modules at each scale around convolution layers
Wang et al. (2022). Another multi-scale architecture consists of using Markov chain Monte Carlo as an energy-based model with no need for a discriminator network, as demonstrated in PatchGenCN Zheng et al. (2021b).
Discriminator Optimisations. One-Shot-GAN, also known as SIV-GAN, focuses on a double discriminator module with a content and a layout branch to take into account spatial information and improve one-shot image generation. This prevents memorising previous training samples and allows generating content different from the original Sushko et al. (2021b). InGAN is constituted of a multi-scale discriminator composed of fully-convolutional patch discriminators, capturing at each scale specific patch-size statistics, from coarse structures to fine details Shocher et al. (2018). MorphGAN uses a double discriminator containing global and patch discriminators with a generator built around a style encoder Ruiz et al. (2020).
Generator Optimisations. The Generalised One-Shot-GAN focuses on generator optimisations and is composed of a main generative path and an auxiliary branch inherited from the previous generator. The auxiliary branch aims to capture particular elements, such as an object worn by a person, while the main branch focuses on the style, i.e. generating the person Zhang et al. (2022a). The MoGAN architecture chooses the same strategy by creating region-of-interest (ROI) and background generator branches, while the discriminator consists of Markovian chains. The ROI branch uses an affine transformation after the convolutional layers to refine the synthesis Chen et al. (2021a). To maintain diversity and avoid collapsed generation, HP-VAE-GAN uses hierarchical patches with a VAE Gur et al. (2020). Similarly, Our-GAN employs vertical coordinate convolutions to produce more natural contents Yoon et al. (2022). RcGAN takes a random patch as generator input and consists of a cGAN architecture where the output of the antepenultimate layer receives the conditional vector Arantes et al. (2020).
Both Optimisations. In SGAN, the generator and discriminator follow a DCGAN architecture in order to obtain better spatial information Jetchev et al. (2016). GenDa optimises the generation by adding a classifier in the discriminative part and truncating the latent distribution of the generator with a strength factor Yang et al. (2021). PetsGAN avoids the whole multi-stage construction and allows one-step training with internal and external prior knowledge for the generative path and a regularised latent variable model. The internal priors restore high-fidelity textural information in images, and the external priors give the high diversity and layout Zhang et al. (2022b).
GAN Combinations. A particular one-shot image generation approach lies in the combination of several GAN frameworks, such as Ex-Sin-GAN. This framework proposes a three-module assembly of GANs, each one focusing on structural, semantic or textural information. While textural information is retrieved using the SinGAN framework, the structural module is based upon a fully connected discriminator with a Wasserstein loss. Semantic information retrieval is based on a GAN inversion with a patch discriminator and a perceptual loss, where the GAN inversion aims at finding the latent code of a pre-trained model that best reconstructs the image, giving the semantic content alongside the two other modules Zhang et al. (2021). A particular implementation lies in the Adv-GAN framework, which adds a perturbation after the generator to render the generation, called image-to-image translation, more diversified Zhang (2019).
2.2 Pair Single-Input Generative
Networks
To our knowledge, the only works generating a pair of raw and segmented images are from Shaham et al. (2019) and Sushko et al. (2023). The first one consists in the SinGAN architecture updated for four-channel images, with the last channel corresponding to the segmentation mask. The second work is based upon the One-Shot architecture, with a mask branch added inside the generator and an attention module added in the discriminator.
These two architectures manage to obtain images similar to the original pairs. The main drawback of SinGAN lies in its generator containing simple convolutional blocks, which does not allow the synthesis of natural content. The One-Shot architecture proposes to add an attention module to overcome this issue. The strategy we choose to implement lies in a simple update of the generator convolutional blocks without additional modules, as an attempt to render the generative process more natural.
3 METHODS
3.1 Datasets
We select two different biomedical datasets. The labeled and segmented gastrointestinal polyp images are from the HyperKvasir open-source dataset, acquired with standard endoscopy equipment from Olympus (Olympus Europe, Germany) and Pentax (Pentax Medical Europe, Germany). The second dataset consists of a labeled mouse brain histopathological dataset, acquired in bright-field, from the open microscopy project (idr0018-neff-histopathology/experimentA).
3.2 Resources
The scripts we created for this study are in Python 3.11.3 within an Anaconda environment containing PyTorch 2.0.1 and CUDA 11.7. We execute the code on an Intel Core i7-10750H CPU at 2.59 GHz and a Nvidia GeForce GTX 1650 Ti GPU.
3.3 Pair-GAN Framework
Our objective is to propose a hybrid architecture between a multi-scale pyramidal patch framework and auto-encoders (AE) to generate images from a SIP of images. The natural diversity of the generation is given by the AE, while the statistics of a complex single image structure at various scales are captured by the multi-scale architecture.
3.3.1 Multi-Scale Architecture
Our proposed multi-scale architecture, illustrated in Figure 1, consists of a pyramid of patch-GANs, where each level is responsible for capturing the patch distribution at a different scale of the input Shaham et al. (2019).
The pyramid starts at the lowest resolution and ends at the finest resolution, with a Gaussian noise image injected at each input. The discrimination is made against the original pair of images at the corresponding resolution. The full script is adapted, as for the four-channel images of SinGAN-Seg: the input is composed of the RGB raw image concatenated to the segmented image. The pyramid starts by giving a Gaussian noise image to a generator (G_n) which aims to map a noise (z_n) to a pair of raw and segmented images (p_n):

p_n = G_n(z_n)    (1)

The discriminator (D_n) then attempts to decipher whether the given pair of images is original or synthetic.
Table 1: Generator based upon an auto-encoder architecture. The input shapes in the pyramid for an image of 100 × 100 pixels are, from the lowest to the highest resolution level: 25 × 25, 33 × 33, 44 × 44, 58 × 58, 76 × 76 and 100 × 100. f: filters, k: kernel, s: stride, LR: LeakyReLU, m: momentum.

name | shape | parameters
conv | input level shape | (f=4, k=3, s=1, LR=0.2)
conv | max level shape | (f=32, k=3, s=1, LR=0.2)
Batchnorm | - | (m=0.1, affine)
deconv | latent space shape | (f=32, k=3, s=1, LR=0.2)
deconv | max level shape | (f=4, k=3, s=1, LR=0.2)
Batchnorm | - | (m=0.1, affine)
conv | output level shape | (f=1, k=3, s=1, Tanh)
The two networks update each other through an adversarial loss. The single difference between G_n and the next levels [G_{n-1} : G_0] is that each of them receives, in addition to the Gaussian noise image, the upsampled pair of raw and segmented images generated at the previous, coarser resolution:

p_{n-1} = G_{n-1}(z_{n-1}, (p_n)_upsampled)    (2)
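For clarity, a minimal sketch of this coarse-to-fine sampling pass is given below. The names `generators`, `scales` and `noise_amps` are hypothetical, and the combination of the noise with the upsampled coarser pair by addition follows the SinGAN convention; it is an assumption, not a detail stated above.

```python
# Minimal sketch of the sampling pass of Eqs. (1)-(2); `generators`, `scales` and
# `noise_amps` are hypothetical names for the trained per-level modules, their
# output sizes and their noise amplitudes.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_pair(generators, scales, noise_amps, device="cpu"):
    """Generate one 4-channel (RGB + mask) pair, coarsest level first."""
    fake = None
    for G, (h, w), amp in zip(generators, scales, noise_amps):
        z = amp * torch.randn(1, 4, h, w, device=device)   # Gaussian noise image z_n
        if fake is None:
            fake = G(z)                                     # coarsest level, Eq. (1)
        else:
            prev_up = F.interpolate(fake, size=(h, w),      # (p_n) upsampled to the
                                    mode="bilinear",        # current resolution
                                    align_corners=False)
            fake = G(z + prev_up)                           # finer levels, Eq. (2)
    return fake[:, :3], fake[:, 3:]                         # raw RGB image, GT mask
```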
Generator. In the literature, the generator of Patch-GAN or SinGAN and their derivatives is based upon a classical GAN approach containing only convolutional blocks Shaham et al. (2019). Here we use an AE generator architecture, as detailed in Table 1 Brémond Martin et al. (2022); Brémond-Martin et al. (2023). The aim of this architecture is to reconstruct a pair of raw and segmented images from a latent space s, using an encoder e and a decoder d. The encoder transforms the input by applying one or more nonlinear transformations into a new representation of lower dimension. The decoder also uses nonlinear transformations to reconstruct the original image from this lower-dimensional representation.
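A minimal PyTorch sketch of such an auto-encoder generator is given below, under one reading of Table 1 in which f denotes the number of output filters and the 4-channel input is the concatenated raw/GT pair. The channel count of the output head is our interpretation and may differ from the actual implementation.

```python
# Sketch of the AE generator of Table 1 (assumed channel counts; not the released code).
import torch
import torch.nn as nn

class AEGenerator(nn.Module):
    def __init__(self, in_ch=4, latent_ch=32):
        super().__init__()
        # Encoder e: stride-1 convolutions with LeakyReLU(0.2) and BatchNorm(m=0.1)
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_ch, latent_ch, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(latent_ch, momentum=0.1, affine=True),
        )
        # Decoder d: transposed convolutions back to the 4-channel pair
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, latent_ch, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(latent_ch, in_ch, kernel_size=3, stride=1, padding=1),
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(in_ch, momentum=0.1, affine=True),
        )
        # Output head mapping to the synthetic raw/GT pair, bounded by Tanh
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        s = self.encoder(x)               # lower-dimensional latent representation
        return self.head(self.decoder(s))
```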
Training. The chosen discriminator is the same Markovian discriminator as in the Patch-GAN, SinGAN and SinGAN-Seg implementations Shaham et al. (2019). The selected reconstruction loss is from the SinGAN implementation. Instead of using a gradient penalty to optimise the Wasserstein loss, we use the Perceptual-Wasserstein loss from our previous studies to reproduce similar conditions and obtain a better contrast and more natural images Brémond Martin et al. (2022); Brémond-Martin et al. (2023). This loss is applied on whole images and not on patches, to help the network better learn the boundaries. This architecture is of a vanilla kind: at the end of the pyramid training, the 40 pairs of images generated from a single original pair are retrieved for comparisons.
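A hedged sketch of the per-level objective is given below: a Wasserstein-style adversarial term from the Markovian discriminator, a pixel reconstruction term and a perceptual term computed on whole images. The feature extractor `feature_net` and the loss weights are assumptions; the exact Perceptual-Wasserstein formulation is the one of Brémond Martin et al. (2022).

```python
# Sketch of one pyramid level's losses; weights and feature_net are assumptions.
import torch
import torch.nn.functional as F

def generator_loss(D, fake_pair, real_pair, feature_net,
                   lambda_rec=10.0, lambda_perc=1.0):
    adv = -D(fake_pair).mean()                              # Wasserstein generator term
    rec = F.mse_loss(fake_pair, real_pair)                  # whole-image reconstruction
    perc = F.l1_loss(feature_net(fake_pair[:, :3]),         # perceptual distance on the
                     feature_net(real_pair[:, :3]))         # raw (RGB) channels
    return adv + lambda_rec * rec + lambda_perc * perc

def critic_loss(D, fake_pair, real_pair):
    # Markovian (patch) discriminator trained with the Wasserstein criterion
    return D(fake_pair.detach()).mean() - D(real_pair).mean()
```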
Figure 1: Pair-GAN for single input pair generation of raw and segmented biomedical images. This architecture contains the classical cascaded network observed in the literature for single input generation, hybridised with an auto-encoder replacing the simple convolutions at each scale to give natural content. z1 stands for the input, s for the latent space, e(s|z1) for the encoding distribution and d(l|z1) for the decoding distribution.
3.4 Evaluations
3.4.1 State of the Art Comparisons
To verify our results fairly, we compare the raw and segmented pairs resulting from our architecture with pairs resulting from running other architectures on the same datasets: SinGAN Shaham et al. (2019), ConSinGAN Hinz et al. (2021), hp-VAE-GAN Gur et al. (2020), SIV-GAN Sushko et al. (2023), InGAN Shocher et al. (2018), PetsGAN Zhang et al. (2022b) and Ex-Sin-GAN Zhang et al. (2021). The scripts have been adapted to take as input a four-channel image corresponding to an RGB image and its GT, except for SinGAN and SIV-GAN, which already propose an input-pair solution. For each architecture, forty pairs of images are generated from each pair of input images. For a simple visualisation, we show a single sample of the input image, and a single sample pair from each set of forty pairs of images resulting from each tested architecture. On the segmented samples, we colored the added (pink) or eliminated (green) regions of interest (ROI) during the generation, to verify the diversity from the original GT. To observe the shape and variations of the forty generated masks at the same time, a heatmap is produced where the most frequently generated pixels are in pink and the background in black.
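A minimal sketch of the two ingredients mentioned above, assuming PyTorch/torchvision and NumPy: the four-channel SIP tensor fed to every adapted architecture, and the heatmap obtained by accumulating the forty generated masks. File names and shapes are placeholders.

```python
import numpy as np
import torch
from torchvision.io import read_image

# Four-channel single input pair: RGB raw image concatenated with its GT mask.
rgb = read_image("raw.png").float() / 255.0        # (3, H, W); file names are placeholders
gt = (read_image("mask.png").float() / 255.0)[:1]  # keep one mask channel, (1, H, W)
pair = torch.cat([rgb, gt], dim=0).unsqueeze(0)    # (1, 4, H, W)

def mask_heatmap(masks):
    """Per-pixel frequency over the forty generated binary masks (each of shape (H, W))."""
    stack = np.stack([np.asarray(m, dtype=np.float32) for m in masks], axis=0)
    return stack.mean(axis=0)   # 1 = always segmented (pink), 0 = never segmented (black)
```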
In order to verify the interest of using the adversarial auto-encoder in the generative part for single input, we compare the AAE generation Brémond Martin et al. (2022) with a GAN Goodfellow et al. (2014), a DCGAN Wu et al. (2020), and an INFO-GAN Chen et al. (2016). To verify the contribution of the number of input images during the generation, we test various few-input syntheses (20, 15, 10 and 5 input images) and then the single input configuration. To estimate the interest of pair generation under the same conditions, we synthesise raw images and then ground truths separately in a vanilla way and stop the generation at 2000 epochs; see Tables 8, 7, 9 and 3.
3.4.2 Dimensional Reduction
To verify whether the generated raw and segmented images lie in the same feature space as the original image, we apply a t-SNE dimensional reduction analysis. In order to compare the groups constituted by the images synthesised from each architecture, the mean Kullback-Leibler divergence is evaluated. It is first calculated between the representations of the original and synthetic pairs of images, then between the original sample tested and all the other original samples contained in the dataset.
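One possible realisation of this check is sketched below, assuming scikit-learn and SciPy: features of the original and synthetic groups are embedded together with t-SNE, and the KL divergence is estimated from 2-D histograms of the embedded points. The perplexity, the binning and the upstream feature extractor are assumptions.

```python
# Sketch of the t-SNE embedding and KL divergence estimate between two groups.
import numpy as np
from sklearn.manifold import TSNE
from scipy.stats import entropy

def tsne_kl(orig_feats, synth_feats, bins=20, eps=1e-8):
    X = np.concatenate([orig_feats, synth_feats], axis=0)
    emb = TSNE(n_components=2, perplexity=10, init="pca",
               random_state=0).fit_transform(X)                  # shared 2-D space
    n = len(orig_feats)
    rng = [[emb[:, 0].min(), emb[:, 0].max()], [emb[:, 1].min(), emb[:, 1].max()]]
    p, _, _ = np.histogram2d(emb[:n, 0], emb[:n, 1], bins=bins, range=rng, density=True)
    q, _, _ = np.histogram2d(emb[n:, 0], emb[n:, 1], bins=bins, range=rng, density=True)
    return entropy(p.ravel() + eps, q.ravel() + eps)             # KL(original || synthetic)
```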
3.4.3 Metrics
Due to the lack of consensus for natural synthetic image characterisation, various metrics are evaluated to assess the similitude with the original pair of images and their quality Brémond Martin et al. (2022); Borji (2019). All raw and segmented pairs are compared in terms of: blur, structural similarity index (SSIM), mutual information (MI), peak signal-to-noise ratio (PSNR), mean square error (MSE), universal quality metric (UQM), single image Fréchet inception distance (SIFID) and learned perceptual image patch similarity (LPIPS).
For segmented images, the interest is to consider whether the generative networks generate more, fewer, similar or varied segmentation shapes compared to the original dataset. For this reason, we calculate the Jaccard index, the area of segmentation, the difference between the original GT and a synthetic segmentation, accuracy, sensitivity, specificity, Dice and F1 scores. The generation giving the most diversified content, different from the GT, is expected. The mean, median, minimum, maximum and standard deviation are given for each metric and each group of architectures.
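A minimal sketch of part of this evaluation is given below, assuming scikit-image for the similitude metrics and plain NumPy for the segmentation scores; SIFID and LPIPS rely on their dedicated implementations and are omitted here, and an 8-bit data range is assumed.

```python
# Sketch of similitude and segmentation metrics (scikit-image and NumPy assumed).
import numpy as np
from skimage.metrics import (structural_similarity, peak_signal_noise_ratio,
                             mean_squared_error, normalized_mutual_information)

def similitude_metrics(original, synthetic):
    return {
        "SSIM": structural_similarity(original, synthetic, channel_axis=-1, data_range=255),
        "PSNR": peak_signal_noise_ratio(original, synthetic, data_range=255),
        "MSE": mean_squared_error(original, synthetic),
        "MI": normalized_mutual_information(original, synthetic),
    }

def segmentation_metrics(gt, pred):
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return {
        "Jaccard": inter / (union + 1e-8),
        "Dice": 2 * inter / (gt.sum() + pred.sum() + 1e-8),
        "Area": int(pred.sum()),
        "Accuracy": float((gt == pred).mean()),
        "Sensitivity": inter / (gt.sum() + 1e-8),
        "Specificity": np.logical_and(~gt, ~pred).sum() / ((~gt).sum() + 1e-8),
    }
```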
3.4.4 Statistics
The results of the metrics (8 variables: blur, SSIM, MI, PSNR, MSE, UQM, SIFID, LPIPS; and then 8 variables: Jaccard, area, original-synthetic area difference, accuracy, sensitivity, specificity, Dice, F1) by group of architectures (8 factors: SinGAN, ConSinGAN, hp-VAE-GAN, SIV-GAN, InGAN, PetsGAN, Ex-Sin-GAN and Pair-GAN) are then compared statistically, also against the original dataset (9th factor: Original). For this purpose, we choose a Kruskal-Wallis test followed by a Conover post-hoc in order to compare all results from all the architectures against the original pairs of images. The alpha risk is set at 5%.
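A sketch of this statistical comparison, assuming SciPy and the scikit-posthocs package; `df`, the column names and the p-value adjustment are illustrative assumptions.

```python
# Sketch of the Kruskal-Wallis test followed by a Conover post-hoc.
from scipy.stats import kruskal
import scikit_posthocs as sp

def compare_groups(df, metric="SSIM", group="architecture", alpha=0.05):
    # df is a pandas DataFrame with one row per generated pair (metric value + group name)
    samples = [g[metric].values for _, g in df.groupby(group)]
    h_stat, p_value = kruskal(*samples)                     # omnibus test across the 9 factors
    posthoc = sp.posthoc_conover(df, val_col=metric,        # pairwise Conover post-hoc,
                                 group_col=group,           # including the Original group
                                 p_adjust="holm")
    return p_value < alpha, posthoc
```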
3.4.5 Computational
To verify the benefit of using such methods, we calculate the execution time (in seconds) and the memory usage (in Tb).
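A sketch of how these two quantities can be recorded per run, assuming a CUDA device and PyTorch; `train_fn` is a hypothetical callable standing for any of the compared training scripts, and the unit conversion used in Table 4 is not reproduced here.

```python
# Sketch of per-run execution time and peak GPU memory measurement.
import time
import torch

def timed_run(train_fn, *args, **kwargs):
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    result = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start          # execution time, in seconds
    peak_mem = torch.cuda.max_memory_allocated()   # peak memory, in bytes
    return result, elapsed, peak_mem
```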
4 RESULTS
The objective here is to compare, in various ways and on various datasets, whether the hybridisation of an AE with a multi-scale generative network helps to generate more natural and diversified images and their GT from a SIP of biomedical images. In this part, we compare qualitatively, with metrics, statistics and computational measurements, the results of our generative architecture with state-of-the-art results and the original pairs of images.
4.1 Qualitative Comparisons
We visually compare the datasets containing GT labeled images with the synthetic raw and segmented images from all the architectures. In Table 2, a sample experiment from a SIP of images is presented. For the polyp dataset, the generated raw images have the same aspect, color and brightness as the original image, regardless of the method. The only diverging aspect in the synthetic raw results is the red coloration in the middle of the polyp, which disappears in the SinGAN, ConSinGAN and InGAN results, while some red pigmentation is observed in our Pair-GAN architecture.

Concerning the segmented masks, SIV-GAN, Pets-GAN and Ex-Sin-GAN do not create any particular shape variation compared to the original dataset. As shown in the example, some architectures, such as SinGAN, ConSinGAN, InGAN or our Pair-GAN, create structures with fewer regions of interest (ROI). Only hp-VAE-GAN creates an extension of the structure, adding new ROIs.

The heatmap representation of all the segmentation masks generated by each architecture depicts an almost similar ROI synthesis for each GAN.

For the histological dataset, with complex structures, while the mouse brain color and spatial layout are reproduced by the GAN networks, only the Pair-GAN architecture seems to generate all the structures. The other solutions mainly generate similar structures at the peripheral zones of the microscopic acquisition.

Whatever the architecture used, almost the same proportion of added or removed ROIs is present in the synthetic segmentation masks.

The heatmap renders the external brain structure zones more contrasted than the internal brain zones, except for the Pair-GAN images. Our solution visually produces the most similar and robust generation, whatever the brain structure considered, with fewer variations.

In the following subsection, we propose to verify these observations by statistically comparing metrics and computational calculations on the overall datasets.
4.2 Statistical Space Comparison
In this part, we study the representation of the original and synthetic pairs of images in the same optimised statistical space. Dimensional reduction is applied to the features extracted from the images during the generative process. We then calculate the Kullback-Leibler (KL) divergence on the ten t-SNE representations to verify the stability of the representation shown in Figure 2.

To verify which architecture generates the most diversified representation compared with the original single pair of images, the mean KL between the original representations and the generated raw or segmented synthetic image representations is calculated. For the polyp dataset, Pair-GAN generates the raw images that are the most divergent from the original raw images, followed by INGAN and CONSIN. The mean KL between GT and segmented representations is greater than the mean KL between raw and synthetic image representations. For this kind of image, Pair-GAN and CONSIN reach the highest scores and give the most divergence from the original GT image.
Table 2: Biomedical and segmented synthetic data pairs generated with a SIP image with various GANs. In green are represented the missing synthetic regions and in pink the added regions compared to the original GT image. The heatmaps represent all the generated sample segmentations, with the most frequently occurring segmentation in pink and in black the pixels which never correspond to a segmented region.
Dataset Original GT Original GT
Model Synthetic Segmented Heatmap Synthetic Segmented Heatmap
SinGAN
ConSinGAN
hp-VAE-GAN
One-Shot-GAN
INGAN
Pets-GAN
Ex-Sin-GAN
Pair-GAN(ours)
According to the histological dataset, the KL divergence is reduced when the original and Pair-GAN synthetic images are considered. For the raw synthesis, the Ex-Sin-GAN generation gives the second-lowest divergence and, for the segmentation, the ONE-SHOT generation.

To verify whether these divergences are too far from natural images, we then compare the original and synthetic pairs in terms of metrics.
Figure 2: KL divergences on t-SNE on synthetic raw and segmented images in the same space as an original polyp image. a) Raw polyp dataset, b) Raw histological dataset, c) Segmented polyp dataset and d) Segmented histological dataset.
4.3 Metric Comparisons
4.3.1 Qualitative and Similitude Metrics
The summary of the metric calculations is presented in Table 3.

For the polyp dataset, raw synthetic images are different from the original raw images, whatever the architecture and metric considered. No differences can be observed between the synthetic raw images from Pair-GAN and the other generations.

For the histological dataset, raw images are not different from the original images, except when the generation is executed with Pair-GAN. This generation differs from the other generative architectures, whatever the metric considered (p<0.001). Concerning the segmented images, only differences between PAIR-GAN and HP-VAE-GAN can be observed, see Appendix Table 6.

Thus, for simple rounded objects in biological images, the synthesis is qualitative and similar (low PSNR, high SSIM, MI, UQI) whatever the architecture. However, for complex shapes, the synthesis produces a content different from the original image (high SSIM, MI, UQI scores) and loses some quality (increase of PSNR scores), except with Pair-GAN.

For the raw polyp dataset, the blurriest images are synthesised with Consin (1253.98), the noisiest with Ingan (19.13), and the best similitude scores depend on the architecture.

For the raw histological dataset, the blurriest images are synthesised with HP-VAE-GAN and Consin (27.50 and 27.08), and the noisiest and most similar with Pair-GAN (PSNR of 23.90 and lowest MSE of 287.96 for quality; highest SSIM of 0.73, lowest SIFID of 236.94 and LPIPS of 0.29 for the similitude).
4.3.2 Segmentation Metrics
Here we propose to use segmentation metrics to verify the differences between the synthetic segmentations and the GT, and to better characterise the generation. Indeed, we previously observed that the generation can add or eliminate some regions of interest depending on the architecture; these metrics can help to quantify these differences and are summarised in Appendix Table 5.
The Pair-GAN Jaccard scores for segmented polyp images tend toward the variation of the original dataset. All the other scores do not seem relevant to compare, as the comparisons between the original input sample and the other originals are distant. Indeed, between the GT used as input and all the other GTs of the original dataset, the F1 score is 0.28, the Dice 0.37, the sensitivity 0.24, the accuracy 0.66 and the area difference
Table 3: Metrics calculated on one-shot segmented synthetic images from various architectures. The stars correspond to significant differences between a generation group and the corresponding input, with *: p<0.05; **: p<0.01; ***: p<0.001. The best scores, tending toward perfect scores for a quality or similitude metric, are in bold.
Metrics part A
Blur Mi SSIM Psnr
Mean Std Mean Std Mean Std Mean Std
Polyp
SI Vs All Original 1361.07 546.48 1743.76 56.82 0.24 0.03 11.87 3.00
SinGAN 1011.16 *** 106.80 1.06 *** 0.01 0.29 *** 0.02 18.62 *** 0.60
CONSIN 1253.98 ** 89.21 1.13 ** 0.03 0.33 ** 0.02 18.70 ** 0.56
HP-VAE-GAN 576.02 *** 46.50 1.13 *** 0.07 0.35 *** 0.02 18.76 *** 0.99
ONE-SHOT-GAN 687.90 *** 45.20 1.12 *** 0.04 0.34 *** 0.02 19.00 *** 0.68
INGAN 1058.21 *** 65.43 1.15 *** 0.04 0.35 *** 0.02 19.13 *** 0.77
Ex-Sin-GAN 938.56 *** 80.43 1.09 *** 0.04 0.32 *** 0.02 18.47 *** 0.67
PETSGAN 1154.50 ** 73.30 1.16 ** 0.04 0.35 ** 0.02 19.00 ** 0.60
PAIRGAN (Ours) 1055.14 *** 77.67 1.11 *** 0.04 0.29 *** 0.02 18.68 *** 0.59
Histological
SI Vs All Original 46.47 6.64 -1824.43 303.29 0.41 0.14 19.71 18.43
SinGAN 25.52 2.09 -1810.91 20.99 0.52 0.03 19.06 0.73
CONSIN 27.08 1.40 -1802.24 23.29 0.53 0.04 19.31 0.82
HP-VAE-GAN 27.50 1.97 -1820.40 19.12 0.51 0.03 18.78 0.62
ONE-SHOT-GAN 26.91 1.59 -1800.55 25.42 0.54 0.04 19.31 0.80
INGAN 26.50 1.58 -1807.53 21.94 0.53 0.03 19.28 0.76
Ex-Sin-GAN 25.10 1.75 -1798.80 23.57 0.55 0.04 19.35 0.83
PETSGAN 23.35 1.59 -1795.49 22.48 0.56 0.037 19.79 0.82
PAIRGAN (Ours) 17.33 *** 2.89 -1677.16 *** 45.82 0.73 *** 0.07 23.90 *** 1.80
Metrics part B
MSE UQM SIFID LPIPS
Mean Std Mean Std Mean Std Mean Std
Polyp
SI Vs All Original 4540.44 1469.52 0.72 0.08 890.92 117.81 0.57 0.03
SinGAN 902.60 *** 128.73 0.91 *** 0.01 757.64 *** 90 0.46 *** 0.01
CONSIN 885.50 ** 113.84 0.91 *** 0.01 623.48 ** 114.49 0.43 ** 0.01
HP-VAE-GAN 889.90 *** 221.96 0.92 *** 0.01 757.13 *** 100.60 0.42 *** 0.01
ONE-SHOT-GAN 828.93 *** 140.34 0.92 *** 0.01 778.04 *** 106.62 0.44 *** 0.01
INGAN 808.47 *** 153.73 0.92 *** 0.01 637.66 *** 99.32 0.40 ** 0.01
Ex-Sin-GAN 935.92 *** 154.23 0.91 *** 0.01 759.73 *** 139.39 0.44 *** 0.01
PETSGAN 826.10 ** 121.79 0.92 *** 0.01 511.43 ** 90.14 0.40 ** 0.01
PAIRGAN (Ours) 889.18 *** 123.61 0.91 *** 0.01 598.45 *** 92.15 0.44 *** 0.01
Histological
SI Vs All Original 1756.86 436.67 0.96 0.01 377.96 106.28 0.54 0.13
SinGAN 817.77 136.27 0.98 0.01 465.04 95.37 0.45 0.02
CONSIN 775.76 143.97 0.98 0.01 460.51 77.29 0.44 0.03
HP-VAE-GAN 869.64 123.48 0.98 0.01 416.30 98.95 0.46 0.02
ONE-SHOT-GAN 775.42 135.63 0.98 0.01 382.49 77.83 0.44 0.02
INGAN 779.18 133.07 0.98 0.01 466.02 112.45 0.44 0.03
Ex-Sin-GAN 769.15 146.74 0.98 0.01 480.09 109.76 0.44 0.03
PETSGAN 694.36 125.28 0.98 0.01 414.39 99.71 0.44 0.02
PAIRGAN (Ours) 287.96 *** 115.51 0.99 *** 0.01 236.94 *** 100.81 0.29 *** 0.04
around 11000, while the differences between the input GT and the synthetic segmentations take other values, see Appendix Table 5. Thus, for the area, area difference, accuracy, sensitivity, specificity, Dice and F1 score, the Pair-GAN result is different from the Original (p<0.001). There are no statistical differences between the Pair-GAN segmentation synthesis and the other architecture results, except for the Jaccard and area scores (and particularly with the HP-VAE-GAN architecture, with p<0.001).
The histological synthetic segmentations differ from the original GT for all the architectures according to all the scores, except for the accuracy with the Pair-GAN architecture (0.78 versus 0.77, p>0.05). The Pair-GAN Dice and F1 scores are significantly weaker than those of the other architectures (p<0.01, while for the other groups p<0.001). The segmentation synthesis differs for all the scores between Pair-GAN and the other architectures, except for the areas (the only difference concerns Pair-GAN with CONSIN, p<0.001), the total area scores (the only difference concerns Pair-GAN with ONE-SHOT, p<0.01), and the Jaccard (there is no difference between Pair-GAN and PETS-GAN, p>0.05).
Table 4: Computational comparisons.
Architecture Time(s) Memory (Tb)
SinGAN 1944.62 2986.95
ConSinGAN 2047.64 2871.04
hp-VAE-GAN 2020.55 2452.71
One-Shot-GAN 2245.21 582.03
INGAN 1999.47 2482.88
Pets-GAN 3747.69 441.43
Ex-Sin-GAN 8377.86 2785.37
Pair-GAN 2020.25 3034.33
4.4 Computational Comparisons
Pair-GAN runs in the same time lapse as the SinGAN architecture, see Table 4 (2020 s vs 1944 s). Concerning memory consumption, Pair-GAN is the most expensive (3034 Tb) and Pets-GAN needs the least memory. The Ex-Sin-GAN architecture needs almost four times the SinGAN time to be executed.
5 DISCUSSION
We present Pair-GAN, a hybrid framework combining a multi-scale architecture with auto-encoders to help generate pairs of raw and segmented biomedical images from a SIP. This generation gives natural images and an accurate segmentation, which are considered qualitative and similar to the original SI from which they have been generated. They are also in the same statistical space as the original dataset. Contrary to other frameworks, Pair-GAN generates a higher diversity from a SI, as shown in the dimensional reduction and the metric calculation for the polyp dataset. For the histological dataset, with more complex shapes, it renders the most similar and natural representations. However, Pair-GAN requires a large amount of memory to be executed in approximately the same time as the other networks.
To our knowledge, Pair-GAN is the third implementation dedicated to pair generation from a SIP, after those of Shaham et al. (2019) and Sushko et al. (2023). This strategy generates images as natural as the other two frameworks, nevertheless with structures more similar to the original than SinGAN and ConSin-GAN produce. These interesting results may be due to the auto-encoding structure we added in the pyramid layers combined with the perceptual loss Brémond Martin et al. (2022) and Appendix Tables 8, 7, 9, 3. To verify these observations, a psychovisual study of the raw synthetic images may help to decipher whether these images are also considered natural by biological experts, and whether a particular architecture misleads more experts than others, such as in Brémond-Martin et al. (2023). Another interesting approach to verify the segmentation is to try to classify these images (by their physio-pathological content, for instance). Here we observe the results on two datasets with binary segmentation; an interesting project would be to test the generation of multimodal segmentations in order to produce a data augmentation dedicated to multi-label classifications Pandeva and Schubert (2019); Hong et al. (2020).
Pair-GAN generates more diversified content than the other architectures, as seen in the statistical space comparison for simple shapes. This diversity could be a representation of the natural diversity present in an original dataset composed of multiple labeled images. Such diversity of generation is a strongly sought feature in the GAN field in order to improve deep learning training, for image segmentation for instance Xun et al. (2022). For a qualitative segmentation, increasing the number of natural samples with various features helps render the results accurate. Moreover, more natural and diversified augmented samples in a segmentation training process, without augmenting the total number of training images, allow a more precise segmentation Brémond-Martin et al. (2023). An interesting project may be to verify which data augmentation architecture with single or few inputs renders the most accurate segmentations.
Our architecture gives the most similar results (SSIM, UQM, SIFID for instance) for complex biological contents. For the histological dataset, only Pair-GAN seems to produce a realistic image despite the complexity of the multiple histological structures. Indeed, the minima and maxima of the similitude scores for segmented images from Pair-GAN are near the minima and maxima of the GT scores. It is not the case for the other architectures, and these metric results may be linked with the heatmap observations. For the polyp dataset, containing a single object, all architectures except Pair-GAN generate images similar to the original input images. Pair-GAN generates shapes not already produced (none of the similitude scores reach their maximum), as shown in Table 3. Thus, a set of metrics seems to help verify the naturality of the images produced with our architecture, while there is currently no consensus on the use of a particular metric to validate GAN synthesis Borji (2019). Another interesting project would be to verify whether these metrics are really useful in a psychovisual evaluation task Brémond-Martin et al. (2023), particularly whether they help to identify the naturality of rounded or simple objects (such as polyps) or multiple or complex segmented objects (such as brain structures) in images.
Pair-GAN does not require more time to be executed, but needs a substantial amount of memory. The addition of an encoding and decoding part with a latent space increases the memory needs. In comparison, the original SinGAN structure, composed of only convolutional layers, needs less memory Shaham et al. (2019). To reduce the memory requirement, a future work is to lighten the encoding architecture and optimise the latent space. Instead of using the maximal resolution in the second encoding convolutional layer, an idea could be to update the shape according to the resolution level, which may improve the accuracy of the results.
In future directions, other noise inputs may be evaluated for the generative part. Indeed, in previous research, we showed an influence of the noise injection according to the kind of acquisition Brémond Martin et al. (2022). Thus, it could be interesting to update the noise injection according to the microscopic acquisition considered and to test whether the result from a particular injection is still linked with the acquisition. For the polyp dataset, updating the noise input may reproduce particular saturations and the over/under exposure of the polyp topology during imaging Ali et al. (2020). Additionally, the specularities of the generated images may be evaluated to enhance the generated light reflections on smooth objects. The loss function may also be improved: as shown previously, it can improve the contrast between the background and the researched structure Brémond Martin et al. (2022).
6 CONCLUSION
In this article we present Pair-GAN:
- A generative architecture based upon patch-pyramidal auto-encoders;
- Taking as input a single pair of raw and GT biomedical images;
- Which synthesises natural images, similar to and in the same statistical space as the original pairs, compared with state-of-the-art methods.
Such an approach may be interesting to increase minimal datasets, for instance to automate the diagnosis grading of a disease from a single image with deep learning methods. An interesting perspective would be to verify the grade of each generation from a single input pair of images annotated with the grade.
REFERENCES
Ali, S., Zhou, F., Braden, B., Bailey, A., Yang, S., Cheng,
G., Zhang, P., Li, X., Kayser, M., Soberanis-Mukul,
R. D., et al. (2020). An objective comparison of detec-
tion and segmentation algorithms for artefacts in clin-
ical endoscopy. Scientific reports, 10(1):2748.
Arantes, R. B., Vogiatzis, G., and Faria, D. R. (2020). Rc-
gan: learning a generative model for arbitrary size
image generation. In Advances in Visual Comput-
ing: 15th International Symposium, ISVC 2020, San
Diego, CA, USA, October 5–7, 2020, Proceedings,
Part I 15, pages 80–94. Springer.
Borji, A. (2019). Pros and cons of gan evaluation measures.
Computer Vision and Image Understanding, 179:41–
65.
Brémond-Martin, C., Simon-Chane, C., Clouchoux, C., and Histace, A. (2023). Brain organoid data synthesis and evaluation. Frontiers in Neuroscience, 17.
Brémond Martin, C., Simon Châne, C., Clouchoux, C., and Histace, A. (2022). AAEGAN loss optimizations supporting data augmentation on cerebral organoid bright-field images. In VISIGRAPP (4: VISAPP), pages 307–314.
Chen, J., Xu, Q., Kang, Q., and Zhou, M. (2021a).
Mogan: Morphologic-structure-aware generative
learning from a single image. arXiv preprint
arXiv:2103.02997.
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever,
I., and Abbeel, P. (2016). Infogan: Interpretable repre-
sentation learning by information maximizing genera-
tive adversarial nets. Advances in neural information
processing systems, 29.
Chen, X., Zhao, H., Yang, D., Li, Y., Kang, Q., and Lu,
H. (2021b). Sa-singan: self-attention for single-image
generation adversarial networks. Machine Vision and
Applications, 32:1–14.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. Advances
in neural information processing systems, 27.
Gur, S., Benaim, S., and Wolf, L. (2020). Hierarchical patch
vae-gan: Generating diverse videos from a single sam-
ple. Advances in Neural Information Processing Sys-
tems, 33:16761–16772.
Hinz, T., Fisher, M., Wang, O., and Wermter, S. (2021). Im-
proved techniques for training single-image gans. In
Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision, pages 1300–1309.
Hong, D., Yao, J., Meng, D., Xu, Z., and Chanus-
sot, J. (2020). Multimodal gans: Toward cross-
modal hyperspectral–multispectral image segmenta-
tion. IEEE Transactions on Geoscience and Remote
Sensing, 59(6):5103–5113.
Iqbal, A., Sharif, M., Yasmin, M., Raza, M., and Aftab, S.
(2022). Generative adversarial networks and its appli-
cations in the biomedical image segmentation: a com-
prehensive survey. International Journal of Multime-
dia Information Retrieval, 11(3):333–368.
Jetchev, N., Bergmann, U., and Vollgraf, R. (2016). Texture
synthesis with spatial generative adversarial networks.
arXiv preprint arXiv:1611.08207.
Lindner, L., Narnhofer, D., Weber, M., Gsaxner, C.,
Kolodziej, M., and Egger, J. (2019). Using synthetic
training data for deep learning-based gbm segmenta-
tion. In 2019 41st Annual International Conference of
the IEEE Engineering in Medicine and Biology Soci-
ety (EMBC), pages 6724–6729. IEEE.
Mahendren, S., Edussooriya, C. U., and Rodrigo, R. (2023).
Diverse single image generation with controllable
global structure. Neurocomputing, 528:97–112.
Pandeva, T. and Schubert, M. (2019). Mmgan: Gener-
ative adversarial networks for multi-modal distribu-
tions. arXiv preprint arXiv:1911.06663.
Ruiz, N., Theobald, B.-J., Ranjan, A., Abdelaziz, A. H.,
and Apostoloff, N. (2020). Morphgan: One-shot face
synthesis gan for detecting recognition bias. arXiv
preprint arXiv:2012.05225.
Shaham, T. R., Dekel, T., and Michaeli, T. (2019). Singan:
Learning a generative model from a single natural im-
age. In Proceedings of the IEEE/CVF international
conference on computer vision, pages 4570–4580.
Shocher, A., Bagon, S., Isola, P., and Irani, M. (2018). Ingan: Capturing and remapping the "DNA" of a natural image. arXiv preprint arXiv:1812.00231.
Sushko, V., Gall, J., and Khoreva, A. (2021a). One-shot
gan: Learning to generate samples from single images
and videos. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition,
pages 2596–2600.
Sushko, V., Zhang, D., Gall, J., and Khoreva, A. (2021b).
Generating novel scene compositions from single im-
ages and videos. arXiv preprint arXiv:2103.13389.
Sushko, V., Zhang, D., Gall, J., and Khoreva, A.
(2023). One-shot synthesis of images and segmen-
tation masks. In Proceedings of the IEEE/CVF Win-
ter Conference on Applications of Computer Vision,
pages 6285–6294.
Tang, J., Tao, B., Gong, Z., and Yin, Z. (2022). Adap-
tive adversarial training method for improving multi-
scale gan based on generalization bound theory. arXiv
preprint arXiv:2211.16791.
Wang, X., Jiang, W., Zhao, L., Liu, B., and Wang, Y. (2022).
Ccasingan: Cascaded channel attention guided single-
image gans. In 2022 16th IEEE International Confer-
ence on Signal Processing (ICSP), volume 1, pages
61–65. IEEE.
Wu, Q., Chen, Y., and Meng, J. (2020). Dcgan-based data
augmentation for tomato leaf disease identification.
IEEE Access, 8:98716–98728.
Xun, S., Li, D., Zhu, H., Chen, M., Wang, J., Li, J., Chen,
M., Wu, B., Zhang, H., Chai, X., et al. (2022). Gener-
ative adversarial networks in medical image segmen-
tation: A review. Computers in biology and medicine,
140:105063.
Yang, C., Shen, Y., Zhang, Z., Xu, Y., Zhu, J., Wu, Z., and
Zhou, B. (2021). One-shot generative domain adapta-
tion. arXiv preprint arXiv:2111.09876.
Yoo, J. and Chen, Q. (2021). Sinir: Efficient general image
manipulation with single image reconstruction. In In-
ternational Conference on Machine Learning, pages
12040–12050. PMLR.
Yoon, D., Oh, J., Choi, H., Yi, M., and Kim, I. (2022). Our-
gan: One-shot ultra-high-resolution generative adver-
sarial networks. arXiv preprint arXiv:2202.13799.
Zhang, W. (2019). Generating adversarial examples in one
shot with image-to-image translation gan. IEEE Ac-
cess, 7:151103–151119.
Zhang, Z., Han, C., and Guo, T. (2021). Exsingan: Learning
an explainable generative model from a single image.
arXiv preprint arXiv:2105.07350.
Zhang, Z., Liu, Y., Han, C., Guo, T., Yao, T., and Mei, T.
(2022a). Generalized one-shot domain adaptation of
generative adversarial networks. Advances in Neural
Information Processing Systems, 35:13718–13730.
Zhang, Z., Liu, Y., Han, C., Shi, H., Guo, T., and Zhou, B.
(2022b). Petsgan: Rethinking priors for single image
generation. In Proceedings of the AAAI Conference on
Artificial Intelligence, volume 36, pages 3408–3416.
Zheng, M., Zhang, P., Gao, Y., and Zou, H. (2021a).
Shuffling-singan: Improvement on generative model
from a single image. In Journal of Physics: Confer-
ence Series, volume 2024, page 012011. IOP Publish-
ing.
Zheng, Z., Xie, J., and Li, P. (2021b). Patchwise generative
convnet: Training energy-based models from a single
natural image for internal learning. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 2961–2970.
APPENDIX
We implemented the baseline models for single input images with open-source code that we adapted to generation from various channel inputs. These are the links to the open sources we employed:
SinGAN: https://github.com/tamarott/SinGAN;
ConSinGAN: https://github.com/tohinz/ConSinGAN;
HP-VAE-GAN: https://github.com/shirgur/hp-vae-gan;
SIV-GAN: https://github.com/boschresearch/one-shot-synthesis;
InGAN: https://github.com/Caenorst/InGAN/tree/py3;
PETS-GAN: https://github.com/zhangzc21/petsgan;
Ex-Sin-GAN: https://github.com/zhangzc21/ExSinGAN
Table 5: Appendix segmentation metrics on segmented synthetic images compared to the single original segmented image.
The stars correspond to *: p<0.05; **:p<0.01; ***:p<0.001.
Dataset Architecture jaccard area ori-syntheticarea accuracy
Mean Std Mean Std Mean Std Mean Std
Polyp
SI Vs all Original 0.25 0.01 31219.698 25196.51 11004.698 25196.51 0.66 0.09
SinGAN 0.20 *** 0.01 52925.52 *** 8278.16 32710.52 *** 8278.16 0.94 *** 0.03
CONSIN 0.21 *** 0.01 53483.52 *** 5871.89 33268.52 *** 5871.89 0.94 *** 0.02
HP-VAE-GAN 0.21 *** 0.004 50288.1 *** 10491.23 30073.1 *** 10491.23 0.93 *** 0.05
ONE-SHOT-GAN 0.21 *** 0.003 53970.6 *** 4427.57 33755.6 *** 4427.57 0.95 *** 0.02
INGAN 0.22 *** 0.01 51999.66 *** 5983.92 31784.66 *** 5983.92 0.94 *** .03
Ex-Sin-GAN 0.22 *** 0.01 51243.18 *** 5967.71 31028.18 *** 5967.71 0.94 *** 0.03
PETSGAN 0.22 *** 0.007 54153.54 *** 3598.14 33938.54 *** 3598.14 0.95 *** 0.01
PAIRGAN 0.23 *** 0.009 53073.9 *** 5356.36 32858.9 *** 5356.36 0.95 *** 0.02
Histological
SI Vs all Original 0.95 0.02 374394.0 32243.30 254031.0 32243.30 0.78 0.06
SinGAN 0.16 *** 0.01 60167.0 *** 3133.26 -60196.0 *** 3133.25 0.74 *** 0.01
CONSIN 0.17 *** 0.01 60410.62 *** 2968.53 -59952.38 *** 2968.53 0.75 *** 0.01
HP-VAE-GAN 0.18 *** 0.01 61367.82 *** 3512.00 -58995.18 *** 3512.00 0.75 *** 0.01
ONE-SHOT-GAN 0.18 *** 0.01 61623.28 *** 3669.62 -58739.72 *** 3669.62 0.75 *** 0.01
INGAN 0.19 *** 0.01 61246.16 *** 3362.73 -59116.84 *** 3362.73 0.75 *** 0.01
Ex-Sin-GAN 0.19 * 0.01 61790.96 *** 3268.68 -58572.04 *** 3268.68 0.75 *** 0.01
PETSGAN 0.19 *** 0.01 62139.4 *** 2735.36 -58223.6 *** 2735.36 0.75 ** 0.01
PAIRGAN 0.18 *** 0.01 61135.9 *** 1568.16 -59227.1 *** 1568.16 0.77 0.01
Dataset Architecture sensitivity specificity dice f1
Mean Std Mean Std Mean Std Mean Std
Polyp
SI Vs all Original 0.24 0.22 0.87 0.13 0.37 0.22 0.28 0.21
SinGAN 0.84 *** 0.11 0.98 *** 0.02 0.82 *** 0.02 0.89 *** 0.07
CONSIN 0.85 *** 0.080 0.99 *** 0.01 0.83 *** 0.01 0.91 *** 0.04
HP-VAE-GAN 0.81 *** 0.16 0.99 *** 0.01 0.81 *** 0.05 0.88 *** 0.12
ONE-SHOT-GAN 0.87 *** 0.06 0.99 *** 0.006 0.83 *** 0.01 0.92 *** 0.04
INGAN 0.84 *** 0.08 0.99 *** 0.01 0.83 *** 0.01 0.90 *** 0.05
Ex-Sin-GAN 0.83 *** 0.09 0.99 *** 0.006 0.82 *** 0.01 0.90 *** 0.05
PETSGAN 0.88 *** 0.05 0.99 *** 0.01 0.83 *** 0.01 0.92 *** 0.03
PAIRGAN 0.86 *** 0.07 0.99 *** 0.009 0.83 *** 0.01 0.91 *** 0.04
Histological
SI Vs all Original 0.78 0.05 0.78 0.08 0.73 0.04 0.77 0.06
SinGAN 0.47 *** 0.02 0.98 *** 0.01 0.48 *** 0.01 0.63 *** 0.02
CONSIN 0.47 *** 0.02 0.98 *** 0.01 0.48 *** 0.01 0.63 *** 0.02
HP-VAE-GAN 0.48 *** 0.03 0.97 *** 0.01 0.48 *** 0.01 0.63 *** 0.02
ONE-SHOT-GAN 0.48 *** 0.03 0.98 *** 0.01 0.48 *** 0.01 0.64 *** 0.03
INGAN 0.48 *** 0.03 0.98 *** 0.01 0.48 *** 0.01 0.64 *** 0.02
Ex-Sin-GAN 0.48 *** 0.02 0.98 *** 0.01 0.48 *** 0.01 0.64 *** 0.02
PETSGAN 0.49 *** 0.02 0.98 *** 0.01 0.49 *** 0.01 0.65 *** 0.02
PAIRGAN 0.50 *** 0.01 0.99 *** 0.01 0.50 ** 0.01 0.66 ** 0.01
Figure 3: KL divergences on t-SNE from different generator architectures for the generation of biomedical images (left) and
ground-truth (right) from few input to single input generation.
Table 6: Appendix qualitative and similitude metrics on segmented synthetic images compared to the single original seg-
mented image. The stars correspond to *:p<0.05, **:p<0.01, ***:p<0.001.
Blur Mi SSIM Psnr
Mean Std Mean Std Mean Std Mean Std
Polyp
SI Vs All Original 1044.60 463.54 1970.92 25.46 0.61 0.09 5.43 3.26
SinGAN 1393.56 96.55 1815.03 30.91 0.90 0.03 14.64 2.68
CONSIN 1405.06 108.52 1808.72 20.65 0.91 0.02 15.09 2.07
HP-VAE-GAN 1347.30 122.48 1814.70 43.28 0.90 0.05 15.00 3.16
ONE-SHOT-GAN 1378.24 72.50 1798.90 19.78 0.92 0.02 16.07 1.92
INGAN 1369.70 88.71 1806.53 24.77 0.91 0.02 15.56 2.32
Ex-Sin-GAN 1359.27 106.29 1814.48 27.50 0.90 0.03 14.68 2.23
PETSGAN 1395.62 63.91 1796.64 16.41 0.92 0.02 16.42 1.78
PAIRGAN (Ours) 1339.51 92.59 1805.07 20.99 0.91 0.02 15.47 2.02
Histological
SI Vs All Original 1314.28 143.94 -1834.47 130.65 0.46 0.13 9.45 9.95
SinGAN 357.94 27.52 -1765.31 13.14 0.09 0.02 10.31 0.51
CONSIN 380.86 18.98 -1761.32 13.78 0.10 0.02 10.47 0.56
HP-VAE-GAN 392.54 20.80 -1776.11 12.60 0.08 0.0 10.04 0.50
ONE-SHOT-GAN 372.95 27.17 -1763.14 15.86 0.11 0.02 10.52 0.63
INGAN 353.75 24.53 -1759.07 14.44 0.01 0.02 10.44 0.55
Ex-Sin-GAN 326.89 28.06 -1747.43 15.08 0.11 0.03 10.62 0.60
PETSGAN 333.30 27.26 -1751.00 13.28 0.12 0.02 10.76 0.57
PAIRGAN (Ours) 382.87 70.71 -1682.05 29.28 0.24 0.05 14.08 1.26
MSE UQM SIFID LPIPS
Mean Std Mean Std Mean Std Mean Std
Polyp
SI Vs All Original 19838.83 5677.25 0.57 0.10 493.37 141.04 0.30 0.05
SinGAN 2706.06 1798.89 0.89 0.03 260.87 126.65 0.12 0.03
CONSIN 2243.39 1056.01 0.89 0.02 227.91 75.08
HP-VAE-GAN 2827.56 2861.19 0.88 0.05 256.94 136.91 0.11 0.04
ONE-SHOT-GAN 1781.11 890.11 0.90 0.02 221.83 82.55 0.10 0.02
INGAN 2098.39 1270.77 0.90 0.03 215.38 96.59 0.10 0.03
Ex-Sin-GAN 2538.37 1405.72 0.89 0.02 298.78 165.67 0.11 0.03
PETSGAN 1616.76 724.83 0.91 0.02 203.15 71.88 0.09 0.02
PAIRGAN (Ours) 2064.84 1061.74 0.90 0.02 237.87 80.27 0.11 0.02
Histological
SI Vs All Original 11894.75 2939.43 0.46 0.13 200.58 52.96 0.39 0.09
SinGAN 6096.30 722.70 0.27 0.02 355.90 58.27 0.71 0.01
CONSIN 5887.06 767.54 0.27 0.02 370.43 55.47 0.71 0.01
HP-VAE-GAN 6486.44 749.72 0.26 0.02 366.08 63.01 0.72 0.01
ONE-SHOT-GAN 5830.22 811.32 0.28 0.02 336.10 40.16 0.73 0.01
INGAN 5916.38 727.33 0.27 0.02 399.93 58.59 0.70 0.01
Ex-Sin-GAN 5696.20 781.82 0.28 0.02 373.83 43.74 0.71 0.01
PETSGAN 5503.59 706.46 0.28 0.02 329.86 51.47 0.72 0.01
PAIRGAN (Ours) 2652.85 784.44 0.35 0.02 266.79 56.72 0.65 0.02
Table 7: Benchmark on generative architecture based upon
computational comparisons.
Architecture Time ParametersD ParametersG
GAN 676.30 1903875 (7.26 MB) 15492097 (59.10 MB)
DCGAN 11914.20 1903875 (7.26 MB) 433473 (1.65 MB)
INFOGAN 74987.79 1904387 (7.26 MB) 1904387 (7.26 MB)
AAE 1337.86 1904387 (7.26 MB) 142337 (556.00 KB)
Table 8: Sample from each generative architecture at 2000
epochs and for a certain number of input on histological
dataset.
Input Model Raw Seg
original
few input (20)
GAN
DCGAN
INFOGAN
AAE
single input
GAN
DCGAN
INFOGAN
AAE
Table 9: Benchmark on generative architecture based upon metric comparisons on raw images from histological dataset.
Number Input Architecture Blur Mi SSIM Psnr
Mean Std Mean Std Mean Std Mean Std
20 Original 1439.95 226.79 1.28 0.59 0.36 0.16 20.43 18.26
20 GAN 5179.59 2328.42 0.85 0.12 0.16 0.09 13.99 2.95
20 DCGAN 811.27 128.62 1.10 0.06 0.33 0.05 16.82 0.73
20 INFOGAN 32884.89 1097.04 0.87 0.05 0.01 0.01 4.92 0.43
20 AAE 61.75 18.92 0.70 0.09 0.35 0.04 9.45 0.29
15 GAN 4575.59 2621.84 0.94 0.11 0.22 0.08 16.03 2.38
15 DCGAN 824.38 102.77 1.07 0.07 0.31 0.07 16.73 0.77
15 INFOGAN 29011.43 21650.14 0.85 0.05 0.02 0.008 9.76 0.35
15 AAE 57.61 24.92 0.65 0.1 0.35 0.04 9.19 0.27
10 GAN 5466.76 2834.79 0.98 0.11 0.21 0.1 15.32 3.01
10 DCGAN 17726.71 456.91 0.91 0.05 0.04 0.03 8.10 0.19
10 INFOGAN 21180.05 2382.57 0.77 0.06 0.03 0.01 11.42 0.41
10 AAE 40.77 17.47 0.71 0.11 0.36 0.04 9.31 0.35
5 GAN 8928.75 4573.26 0.88 0.12 0.14 0.11 12.58 4.67
5 DCGAN 25499.19 2080.48 0.96 0.05 0.04 0.02 7.59 0.23
5 INFOGAN 12413.56 4570.68 0.75 0.10 0.06 0.04 12.06 0.41
5 AAE 29.72 22.39 0.61 0.09 0.34 0.04 8.70 0.25
1 GAN 50669.93 684.43 0.89 0.05 0.01 0.01 7.20 0.47
1 DCGAN 647.60 328.41 1.10 0.09 0.30 0.04 15.93 0.62
1 INFOGAN 4427.16 4525.23 0.52 0.22 0.17 0.10 9.63 0.53
1 AAE 130.38 5.03 1.17 0.07 0.44 0.06 19.13 0.79
Number Input Architecture MSE UQM
Mean Std Mean Std
20 Original 1480.27 89.55 0.97 0.01
20 GAN 3334.66 2782.71 0.93 0.06
20 DCGAN 1369.91 225.85 0.97 0.01
20 INFOGAN 21065.03 2088.27 0.48 0.06
20 AAE 7389.90 498.28 0.82 0.01
15 GAN 1983.19 1890.19 0.96 0.04
15 DCGAN 1400.04 229.64 0.97 0.01
15 INFOGAN 6902.15 556.23 0.88 0.01
15 AAE 7842.50 494.37 0.81 0.01
10 GAN 2627.49 3185.54 0.94 0.07
10 DCGAN 10085.92 457.35 0.76 0.01
10 INFOGAN 4705.10 450.31 0.93 0.01
10 AAE 7644.53 616.56 0.81 0.01
5 GAN 6597.83 8174.11 0.85 0.18
5 DCGAN 11351.16 600.82 0.72 0.01
5 INFOGAN 4069.32 397.58 0.93 0.012
5 AAE 8780.68 506.66 0.79 0.01
1 GAN 12454.67 1387.32 0.75 0.03
1 DCGAN 1675.49 224.75 0.96 0.01
1 INFOGAN 7129.98 790.86 0.85 0.02
1 AAE 806.59 143.83 0.98 0.01