GeST: A New Image Segmentation Technique based on Graph Embedding

Anthony Perez

Univ. Orléans, INSA Centre Val de Loire, LIFO EA 4022, FR-45067 Orléans, France

Keywords:

Image Segmentation, Superpixels, Graph Embeddings, Complex Networks.

Abstract:

We propose a new framework to develop image segmentation algorithms using graph embedding, a well-studied tool from complex network analysis. So-called embeddings are low-dimensional representations of the nodes of a graph that capture several structural properties such as neighborhoods and community structure. The main idea of our framework is to first consider an image as a set of superpixels, and then compute embeddings for the corresponding undirected weighted Region Adjacency Graph. The resulting segmentation is then obtained by clustering the embeddings. To the best of our knowledge, known complex network-based segmentation techniques rely on community detection algorithms. By introducing graph embedding for image segmentation, we combine two desirable properties of the aforementioned segmentation techniques, namely working on small graphs and with low-dimensional representations. To illustrate the relevance of our approach, we propose GeST, an implementation of this framework using node2vec and agglomerative clustering. We evaluate our algorithm on a publicly available dataset and show that it produces results of comparable quality to state-of-the-art segmentation techniques while requiring low computational complexity and memory.

1 INTRODUCTION

The aim of image segmentation is to partition an im-

age into separate regions of interest, which ideally

correspond to real-world objects. This constitutes a

fundamental process in many image and computer vi-

sion applications. There have been many methods de-

veloped for image segmentation over the years, with

three main categories emerging: pixel-based, region-

based and boundary-based methods. In pixel-based

methods, pixels with similar features are grouped to-

gether without considering spatial relationship, while

region-based methods deﬁne objects as regions of

pixels with homogeneous characteristics. Many ap-

proaches exist in all three categories, including for

instance intra-region uniformity metrics, inter-region

dissimilarity metrics and shape measures. One of

the main differences between intra- and inter-region

techniques is that the former may result in discon-

tiguous objects. We refer the reader to the survey

of Zhang et al. (Zhang et al., 2008) for more infor-

mation on unsupervised image segmentation. Graph-

based segmentation techniques have also been pro-

posed. For instance, the Felzenszwalb and Hut-

tenlocher method (Felzenszwalb and Huttenlocher,

2004) relies on the computation of minimum span-

ning trees for grid-like graphs. More recently, many

approaches have been developed using community

detection algorithms, a well-studied tool from com-

plex network analysis. Since our work also relies on

complex network analysis tools, we give more insight

on such techniques.

Related Work. Community detection algorithms

have been used to produce state-of-the-art segmenta-

tion techniques (Browet et al., 2011; Li and Wu, 2014;

Mourchid et al., 2016; Nguyen et al., 2019). Both

pixel and region-based approaches have been pro-

posed. In the former case, an undirected (un)weighted

graph is derived from pixels of the image at hand, and

community detection algorithms are then applied to

obtain the segmented image. Regarding region-based

methods, an undirected (un)weighted graph (so-called

Region Adjacency Graph) is ﬁrst obtained from an

initial set of regions, called superpixels (Achanta

et al., 2012). Then, a community detection algo-

rithm is applied to obtain the sought segmentation

(see e.g. (Li and Wu, 2014)). Both superpixels and

community detection-based segmentation techniques

are known to have an over-segmentation effect. To

circumvent this issue, Nguyen et al. (Nguyen et al., 2019) additionally use a merging procedure to agglomerate similar regions as computed by communities. Such an idea has already been used by Trémeau and Colantoni (Trémeau and Colantoni, 2000) in a similar context. One may also use additional image features (such as colors and textures) to compute the segmentation (Li and Wu, 2014; Nguyen et al., 2019). Notice that since communities correspond to connected subgraphs, segmentations computed with such methods consist of contiguous regions.

Perez, A.
GeST: A New Image Segmentation Technique based on Graph Embedding.
DOI: 10.5220/0010191502450252
In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 4: VISAPP, pages 245-252
ISBN: 978-989-758-488-6

Our Contribution. We propose a new approach

using a recent complex network analysis technique,

namely graph embedding. The aim of graph embed-

ding is to compute low-dimensional representations

of nodes of a graph that encompass structural prop-

erties such as neighborhoods and community struc-

ture (Grover and Leskovec, 2016; Cai et al., 2018).

A particular feature of graph embedding is that two

nodes of the network may have similar representa-

tions while not being connected. Based on this tech-

nique, we propose a general framework for image seg-

mentation: starting from a set of superpixels, we next

compute embeddings of the Region Adjacency Graph

(RAG) and then use a clustering algorithm to obtain

a set of regions. A merging procedure may ﬁnally be

applied to obtain the sought segmentation. Note that

methods using community detection algorithms either

work on a large graph (Nguyen et al., 2019) or on a

small set of superpixels represented by a large num-

ber of features (Mourchid et al., 2016). We hence

propose a combination of both ideas, that is a low-dimensional representation of a small set of superpixels to produce segmentations using any clustering algorithm. We develop a Python implementation of such a framework, namely GeST (Graph embedding Segmentation Technique). Since our aim is to highlight the relevance of graph embedding for image segmentation, we use very few image features. While our

approach is region-based, the computed regions may

be discontiguous due to the very nature of graph em-

bedding. A set of contiguous regions can however

be easily derived from our results. Using a publicly

available dataset, we emphasize that GeST achieves

state-of-the-art results while requiring low computa-

tional complexity and memory.

Outline. We first give a high-level description of our framework, introducing definitions and notations for superpixels and graph embedding algorithms (Section 2). We also provide details about the Python implementation of such a framework. We next turn our attention to the experimental setup by describing the dataset used as well as the state-of-the-art methods we compare to (Section 3). We then give qualitative and quantitative evaluations of our method (Section 4) and conclude with future research perspectives (Section 5).

GeST implementation: https://github.com/anthonimes/GeST

Figure 1: An image together with its associated RAG.

2 DESCRIPTION OF THE

APPROACH

VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications

Superpixels and Graph Representation. There are several ways to represent a given image as a graph. First, one may consider each pixel of the image as a node of the representing graph, and connect pixels according to some distance function (e.g. color features or Manhattan distance within the image). This is the approach followed by Nguyen et al. (Nguyen et al., 2019), who then apply a community detection algorithm to obtain a first segmentation of the image. While their construction uses a similarity threshold to determine connections between pixels, this is not reflected in the resulting unweighted graph. When considering large images, this approach may lead to graphs with a high number of vertices and edges and thus may not scale. As a workaround, many works (Li and Wu, 2014; Linares et al., 2017; Trémeau and Colantoni, 2000) consider superpixels, which assign a region to each pixel of the original image. This results in a partition of the pixel set, which is considered as an initial segmentation. One can then easily associate a Region Adjacency Graph (RAG) to such a segmentation by taking one node per region and by connecting regions whose pixels share a boundary. Image segmentation techniques using this approach usually require high-dimensional representations of the obtained regions to compute the final segmentation. In this work, we propose to exploit the desirable properties of both approaches by considering RAGs with low-dimensional yet relevant vector representations. We thus consider the superpixels framework, and represent images as undirected weighted graphs G = (V, E, ω), where V is the set of regions computed by the superpixels algorithm, E ⊆ V × V denotes the set of pairs of adjacent regions and ω : E → R is any similarity measure. As observed in previous works (Li and Wu, 2014), the L*a*b* space is the closest to human perception and is hence the one chosen for our implementation. We thus use this color space to define the weight function ω. For every region R we consider the mean of each color channel C, that is:

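$$\mathrm{Mean}(R) = \frac{1}{|R|} \sum_{i=1}^{|R|} C_i \qquad (1)$$

where $C_i$ denotes the value of channel C at the i-th pixel of R.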

As a result, every region is represented by a 3-dimensional color feature vector. We then weight any edge $e_{ij} \in E$ according to the Euclidean distance $d(R_i, R_j)$ between the vectors of the corresponding regions $R_i$ and $R_j$. In order to obtain a similarity measure and thus properly define the weight function ω, we use a Gaussian-type radial basis function:

$$\omega(R_i, R_j) = \exp\left(-\frac{d(R_i, R_j)^2}{2\sigma^2}\right) \qquad (2)$$
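As an illustration of Equations (1) and (2), the following minimal sketch computes region means and Gaussian edge weights for a toy labelled image. It is not the GeST code: the function names `region_means` and `rag_weights`, the 4-connectivity used to decide adjacency, and the squared Gaussian form exp(-d²/(2σ²)) are assumptions made for this example.

```python
import math
from collections import defaultdict

def region_means(image, labels):
    """Mean L*a*b* color of every region (Equation 1).

    image  : 2D list of (L, a, b) tuples
    labels : 2D list of region identifiers, same shape as image
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for row_px, row_lb in zip(image, labels):
        for pixel, region in zip(row_px, row_lb):
            counts[region] += 1
            for c in range(3):
                sums[region][c] += pixel[c]
    return {r: tuple(s / counts[r] for s in sums[r]) for r in sums}

def rag_weights(image, labels, sigma=7.9):
    """Edges of the RAG with Gaussian similarity weights (Equation 2).

    Two regions are adjacent when they share a 4-connected pixel boundary
    (an assumption of this sketch).
    """
    means = region_means(image, labels)
    height, width = len(labels), len(labels[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width and labels[y][x] != labels[ny][nx]:
                    edges.add(tuple(sorted((labels[y][x], labels[ny][nx]))))
    return {
        (a, b): math.exp(-math.dist(means[a], means[b]) ** 2 / (2 * sigma ** 2))
        for a, b in edges
    }
```

Regions with identical mean colors get weight 1, and the weight decays toward 0 as the color distance grows, which is what makes ω a similarity rather than a distance.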

Graph Embedding. The aim of graph embedding

is to represent nodes of a given graph using low-

dimensional vectors (embeddings) that capture struc-

tural properties of the network at hand (e.g. neighbor-

hoods and community structure). Formally:

Definition 1. Let G = (V, E, ω) be a graph. A graph embedding in dimension d is a function $\varphi : V \to \mathbb{R}^d$ mapping every node of G to a d-dimensional vector.

Different frameworks have been developed (see

for instance (Cai et al., 2018) for a recent survey). As

mentioned earlier, one may either consider a weighted

or an unweighted graph when representing images.

Our experiments showed that computing embeddings without considering weights provides less relevant results.

While most graph embedding techniques are designed

to cope with unweighted networks, the state-of-the-art

node2vec framework (Grover and Leskovec, 2016) is

implemented to deal with weights. More details about

this framework are provided at the end of this section.

Clustering. Once the embeddings have been computed, a clustering algorithm is applied to obtain the final image segmentation, either on the embeddings only or with additional image features. The pseudo-code of our approach is given in Algorithm 1.

Algorithm 1: General framework.
Input : An image I in the L*a*b* space
Output: A segmentation S of I
1 P ← initial segmentation using superpixels;
2 G ← weighted RAG from P;
3 emb ← embeddings computed from G plus additional (optional) features;
4 S ← segmentation obtained from a clustering algorithm applied on emb;
5 return S or Merge(S);

Merging Similar Regions on Selected Images. Following ideas proposed in community detection approaches (Li and Wu, 2014; Nguyen et al., 2019; Trémeau and Colantoni, 2000), our algorithm tries as a final step to merge small and similar regions (w.r.t. image features). Whenever a region has a number of pixels below some empirically fixed threshold, it is merged with its most similar adjacent region. Likewise, adjacent regions sharing similar features are merged together. Since color features may differ greatly from one image to another, we did not manage to find a universal threshold for similarity. To circumvent this issue, we follow ideas from Liu et al. (Liu et al., 2011), who proposed a model to estimate image segmentation difficulty. To that end, the authors suggest considering the F-measure of images with known ground-truth segmentations. We thus use this measure to select a set of about 20 images that Algorithm 1 fails to segment properly. We then apply the merging procedure (Algorithm 2) to this set of images only.

Algorithm 2: Merging procedure.
Input : A segmentation S, a pixel threshold t_p ∈ N, a similarity threshold t_s ∈ R
Output: A segmentation S' merged from S
while a merging occurs do
    G' ← RAG from S;
    for every edge R, R' of G' do
        sim ← similarity between R and R';
        if sim > t_s then
            S ← merge regions R and R';
    for every node R of G' do
        if |R| ≤ t_p then
            R' ← closest adjacent region of R;
            S ← merge regions R and R';
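The merging procedure can be sketched on an abstract RAG as follows. This is a simplified illustration rather than the author's implementation: regions are described by hypothetical dictionaries (`sizes`, `feats`, `adj`), the feature vector of a merged region is taken to be the size-weighted average of its two parts (an assumption), and cosine similarity is used as the default similarity measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def merge_regions(sizes, feats, adj, tp, ts, similarity=cosine):
    """Sketch of the merging procedure.

    sizes : dict region -> number of pixels
    feats : dict region -> feature vector (list of floats)
    adj   : dict region -> set of adjacent regions (the RAG)
    tp    : pixel threshold, ts : similarity threshold
    """
    sizes = dict(sizes)
    feats = {r: list(f) for r, f in feats.items()}
    adj = {r: set(n) for r, n in adj.items()}

    def absorb(a, b):
        """Merge region b into region a and update the RAG."""
        total = sizes[a] + sizes[b]
        feats[a] = [(sizes[a] * x + sizes[b] * y) / total
                    for x, y in zip(feats[a], feats[b])]
        sizes[a] = total
        for n in adj.pop(b):
            adj[n].discard(b)
            if n != a:
                adj[n].add(a)
                adj[a].add(n)
        del sizes[b], feats[b]

    merged = True
    while merged:                      # repeat while a merging occurs
        merged = False
        for a in list(adj):            # merge similar adjacent regions
            if a not in adj:
                continue
            for b in list(adj[a]):
                if similarity(feats[a], feats[b]) > ts:
                    absorb(a, b)
                    merged = True
                    break
        for a in list(adj):            # merge small regions into their closest neighbor
            if a in adj and adj[a] and sizes[a] <= tp:
                closest = max(adj[a], key=lambda n: similarity(feats[a], feats[n]))
                absorb(closest, a)
                merged = True
    return sizes, feats, adj
```

The loop stops as soon as a full pass performs no merge, so the number of iterations is bounded by the initial number of regions.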

Implementation of Algorithm 1. To compute the initial segmentation (Line 1), we use the Mean Shift-based EDISON algorithm (Christoudias et al., 2002; see https://github.com/fjean/pymeanshift). Experiments have shown that using the Mean Shift algorithm to compute superpixels provides the best results, even if its computational complexity is a bit higher than that of SLIC (Achanta et al., 2012). We now give more details about the computation of embeddings (Line 3) using node2vec. The presentation follows that of Grover and Leskovec (Grover and Leskovec, 2016). This framework is based on (biased) random walks, with parameters that make it possible to simulate different behaviors with respect to the graph at hand. More formally, given a source node u, a random walk $\{c_0, \dots, c_{l-1}\}$ of fixed length l is simulated. Let $c_0 = u$. The remaining nodes of the sequence are generated by the following distribution:

$$P(c_i = x \mid c_{i-1} = y) = \begin{cases} \dfrac{\pi_{yx}}{Z} & \text{if } yx \in E \\ 0 & \text{otherwise} \end{cases}$$

where $\pi_{yx}$ is the unnormalized transition probability between nodes y and x, and Z is the normalizing constant (Grover and Leskovec, 2016). In order to bias random walks, the node2vec framework introduces two parameters, namely the return parameter p and the in-out parameter q. Formally, a 2nd-order random walk is guided by these two parameters: assuming the state of the random walk is $c_{i-1} = x$ and $c_i = y$, the walk needs to decide its next step and thus evaluates the transition probability $\pi_{yv}$ on edges yv. The unnormalized transition probability $\pi_{yv}$ is set to $\pi_{yv} = \alpha_{pq}(x, v) \cdot \omega(y, v)$ with:

$$\alpha_{pq}(x, v) = \begin{cases} \dfrac{1}{p} & \text{if } d(x, v) = 0 \\ 1 & \text{if } d(x, v) = 1 \\ \dfrac{1}{q} & \text{if } d(x, v) = 2 \end{cases}$$

where d(u, v) denotes the shortest distance be-

tween nodes u and v and ω(u, v) the weight of edge uv.

These two parameters can be adjusted to either sim-

ulate a Breadth-First Search or a Depth-First Search

exploration of the graph. Another way of describ-

ing these parameters is that if q > 1, then structural

equivalence between nodes will be prioritized. This

means that nodes that are far apart in the graph but

share similar structure will lie close in the embedding

space. On the other hand, q < 1 will emphasize graph

connectivity, related to community structure. See Fig-

ure 2 for an illustration of both cases. Finally, features

are learned using stochastic gradient descent (Recht

et al., 2011). We refer the reader to (Grover and

Leskovec, 2016) for a more accurate description of

the node2vec framework.
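The biased transition rule can be made concrete with a short sketch. It assumes the standard node2vec bias (1/p when returning to the previous node, 1 for a common neighbor, 1/q otherwise) and a graph stored as a dictionary of weighted neighbors; the function names `alpha` and `transition_probabilities` are hypothetical and do not come from the node2vec library.

```python
def alpha(p, q, dist_xv):
    """Search bias for a candidate node v at distance dist_xv from the previous node x."""
    if dist_xv == 0:
        return 1.0 / p
    if dist_xv == 1:
        return 1.0
    return 1.0 / q

def transition_probabilities(graph, prev, cur, p=1.0, q=1.0):
    """Normalized transition probabilities of a 2nd-order biased walk.

    graph : dict node -> dict neighbor -> edge weight (undirected graph)
    prev  : previous node x of the walk
    cur   : current node y of the walk
    """
    unnormalized = {}
    for nxt, weight in graph[cur].items():
        if nxt == prev:
            dist = 0                 # going back to x
        elif nxt in graph[prev]:
            dist = 1                 # v is also a neighbor of x
        else:
            dist = 2                 # v moves away from x
        unnormalized[nxt] = alpha(p, q, dist) * weight
    z = sum(unnormalized.values())   # normalizing constant Z
    return {node: value / z for node, value in unnormalized.items()}
```

With q < 1 the walk favors nodes at distance 2 from the previous node and therefore explores further away, while a large p makes immediate returns less likely.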

Figure 2: Two possible outcomes for node2vec depending on the set of parameters (Grover and Leskovec, 2016). The top graph corresponds to p = 1 and q = 0.5, and thus emphasizes community structure. The second graph corresponds to p = 1 and q = 2 and emphasizes structural equivalence.

We tried several clustering algorithms (Line 4), and found that the choice of algorithm has no significant impact. However, the set of parameters used within a given algorithm may have an impact on the results. The most important parameter that needs to be adjusted in all cases is the number of clusters k to be computed. We will discuss such parameters in Section 3. We used Agglomerative Clustering (Ward Jr, 1963) with cosine distance and average linkage. Regarding image features, since our aim is to highlight the relevance of graph embedding in image segmentation, we use only simple features as a way to measure similarity between adjacent regions. In particular, we consider only color features. Hence, in addition to the mean color of each region R (Equation 1), we also consider the standard deviation:

$$\mathrm{Std}(R) = \sqrt{\frac{1}{|R|} \sum_{i=1}^{|R|} \left(C_i - \mathrm{Mean}(R)\right)^2} \qquad (3)$$
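The per-region color features and their combination with an embedding can be sketched as below. This is only an illustration: the helper names `channel_stats` and `feature_vector` are hypothetical, the standard deviation is taken in its population form (division by |R|, an assumption), and the embedding is simply any list of floats supplied by the caller.

```python
import math

def channel_stats(pixels):
    """Per-channel mean and standard deviation of one region.

    pixels : list of (L, a, b) tuples belonging to the region
    """
    n = len(pixels)
    means = tuple(sum(px[c] for px in pixels) / n for c in range(3))
    stds = tuple(
        math.sqrt(sum((px[c] - means[c]) ** 2 for px in pixels) / n)
        for c in range(3)
    )
    return means, stds

def feature_vector(embedding, pixels):
    """Concatenate a node embedding with the color features of its region."""
    means, stds = channel_stats(pixels)
    return list(embedding) + list(means) + list(stds)
```

Each region contributes six color values (three means and three standard deviations), so a d-dimensional embedding yields a vector of dimension d + 6.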

As we shall discuss in Section 5, considering a more intricate similarity measure may provide better results.

3 EXPERIMENTAL SETUP

Dataset and Resources. We use the publicly available BSDS500 Berkeley dataset (Arbelaez et al., 2010), which consists of a set of 500 images. For every image, five to eight manually computed ground-truth segmentations are available. All algorithms are written using open-source Python libraries, such as skimage (Van der Walt et al., 2014), numpy (Oliphant, 2006) and sklearn (Pedregosa et al., 2011), for the main image, RAG and metrics processing. All experiments were conducted on a Dell Latitude 5490 with 16 GB of RAM and an 8×Intel Core i5-8350U at 1.70 GHz. The segmentation process takes a few seconds per image and uses very little memory (less than 2%). Notice moreover that we did not optimize the code; a more thorough analysis is left as an extension of this work. We now describe most parameters used in our experiments.

Parameters. We first give parameters for the initial superpixels segmentation, namely spatial radius $h_s$, range radius $h_r$ and minimum size of computed regions M. We followed the parameters described by Christoudias et al. (Christoudias et al., 2002) and obtained the best performance and results using ($h_s = 7$, $h_r = 4.5$, M = 50). To compute the RAG of such a segmentation, we empirically set σ ≈ 7.9, and d(·, ·) is chosen to be the Euclidean distance between the mean color vectors of the corresponding regions (Equation (2)). Regarding the node2vec framework, we focused on preserving community structure and hence set p = 2 and q = 0.5. Embeddings are in dimension 16, the number of walks per node is 20 and the walk length is set to 20. Finally, before applying Agglomerative Clustering (Ward Jr, 1963), we add as features to the embeddings the mean and standard deviation of every region (Equations (1) and (3)), resulting in a 22-dimensional feature vector. We present results obtained with an empirically determined number of clusters k = 21, which provides relevant segmentations for the Berkeley dataset. Note that the mean number of regions in the ground-truth segmentations is around 20. To allow for an automatic selection of k, we also present results obtained using the mean of three known quality scores for clustering: the silhouette criterion (Rousseeuw, 1987), the Caliński-Harabasz score (Caliński and Harabasz, 1974) and the Davies-Bouldin score (Davies and Bouldin, 1979), with k ranging from 2 to 24. High values of the first two scores and low values of the last one indicate good clusterings. As one can see in Table 1, the results obtained with both choices of k do not differ significantly. Hence, in order to provide a method as general as possible, we use the aforementioned criteria to automatically estimate the number of regions to compute. We now turn our attention to the merging procedure (Algorithm 2). To define similarity between regions, we use the same setting as Nguyen et al. (Nguyen et al., 2019) and compute the cosine similarity of the features described in Equations (1) and (3). Given two d-dimensional vectors u and v, the cosine similarity is defined as:

$$\mathrm{cosine}(u, v) = \frac{u \cdot v}{\|u\| \cdot \|v\|} \qquad (4)$$
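For reference, the cosine similarity of Equation (4) can be written in a few lines of plain Python. The function name `cosine_similarity` and the choice to reject zero vectors are assumptions of this sketch.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

The value depends only on the direction of the vectors, not on their magnitude, so two regions whose feature vectors are proportional are considered maximally similar.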

Evaluation Metrics. Since the considered dataset comes with several ground-truth segmentations (from 5 to 8), we use the Probabilistic Rand Index (PRI) to measure the quality of our algorithm. This measure was introduced by Unnikrishnan and Hebert (Unnikrishnan and Hebert, 2005) and is used when multiple ground-truth segmentations are provided. Roughly speaking, PRI corresponds to the mean of the Rand Index over all ground-truth segmentations. More formally, the aim is to compare a test segmentation with the ground-truth set through soft nonuniform weighting of pixel pairs as a function of the variability in the ground-truth set (Unnikrishnan and Hebert, 2005; Unnikrishnan et al., 2007). In other words, the PRI measures the probability that pairs of pixels have consistent labels in the set of ground-truth segmentations (Nguyen et al., 2019). Given such a set of ground-truth segmentations $\{S_k\}$ and a test segmentation S on n pixels, the PRI is formally defined as:

$$\mathrm{PRI}(S, \{S_k\}) = \frac{1}{\binom{n}{2}} \sum_{i<j} \left[ c_{ij} \, p_{ij} + (1 - c_{ij}) \cdot (1 - p_{ij}) \right] \qquad (5)$$

where $c_{ij}$ denotes the event of a pair of pixels i and j having the same label in S, and $p_{ij}$ is the expected value of a random variable defined using the corresponding Bernoulli distribution on the ground-truth segmentations.
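A direct, quadratic-time sketch of the PRI is given below for small flattened label lists. It assumes that the probability of two pixels sharing a label is estimated as the fraction of ground-truth segmentations in which they do; the function name `probabilistic_rand_index` is hypothetical, and the pairwise loop is only meant for toy inputs, not for full images.

```python
from itertools import combinations

def probabilistic_rand_index(test, ground_truths):
    """PRI of a test labelling against several ground-truth labellings.

    test          : list of labels, one per pixel
    ground_truths : list of label lists, each of the same length as test
    """
    n = len(test)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for i, j in pairs:
        c = 1.0 if test[i] == test[j] else 0.0
        # fraction of ground truths in which pixels i and j share a label
        p = sum(gt[i] == gt[j] for gt in ground_truths) / len(ground_truths)
        total += c * p + (1.0 - c) * (1.0 - p)
    return total / len(pairs)
```

With a single ground-truth segmentation the value reduces to the classical Rand Index, and it equals 1 when the test segmentation agrees with every ground truth on every pixel pair.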

We also provide the Variation of Information (VI) to allow a better comparison with state-of-the-art segmentation techniques. This measure was introduced by Meilă (Meilă, 2005) in order to compare two clusterings according to their information difference. Given two clusterings X and Y, VI is defined as:

$$\mathrm{VI}(X, Y) = H(X) + H(Y) - 2\,I(X, Y) \qquad (6)$$

where H(X) is the entropy of X and I(X, Y) the Mutual Information between X and Y.
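The definition of VI translates into a short computation on two label lists. The sketch below uses natural logarithms, which is an assumption since the base is not specified here, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(x, y):
    """Mutual information (in nats) between two clusterings of the same items."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def variation_of_information(x, y):
    """VI(X, Y) = H(X) + H(Y) - 2 I(X, Y)."""
    return entropy(x) + entropy(y) - 2.0 * mutual_information(x, y)
```

VI is zero for identical clusterings and grows as the two clusterings share less information, so lower values indicate a better match with the ground truth.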

4 QUALITATIVE AND

QUANTITATIVE EVALUATIONS

Qualitative Evaluation. We present several segmentations obtained on the BSDS500 dataset, and compare them to the provided ground-truth segmentations. Due to the very nature of our algorithm, we present images both with and without the merging procedure. Recall that due to the clustering and merging procedures, the number of regions in a segmentation varies depending on the image at hand. See Figures 3 to 5.

Figure 3: Illustration of the segmentation process (k = 22). Panels: (a) Original image, (b) Initial segmentation, (c) GeST segmentation, (d) Colored regions.

Quantitative Evaluation. We provide PRI and VI

results for several known segmentation techniques.

Since our approach is graph-based, we mainly com-

pare to such methods with a particular focus on tech-

niques based on community detection or modularity

optimization. We brieﬂy present such methods:

• EDISON (Christoudias et al., 2002) uses the Mean Shift algorithm as a building block.
• Weighted Modularity Segmentation (WMS, (Browet et al., 2011)) uses an approximation of the Louvain method that unfolds community structures in large graphs (Blondel et al., 2008).
• Li-Wu (Li and Wu, 2014), Fast Multi-Scale and Modularity optimization (FMS and MO (Mourchid et al., 2016)), Louvain (Nguyen et al., 2019) are based on modularity optimization, together with image features (including histogram of oriented gradients) and agglomerative algorithms for merging similar regions. The main difference lies in the fact that the first methods are region-based while the last one is pixel-based.
• Felzenszwalb and Huttenlocher (F&H (Felzenszwalb and Huttenlocher, 2004)) is a graph-based method using grid-like graphs and minimum spanning trees.

As one can see in Table 1, our method provides relevant results for both PRI and VI measures. Since our method is not deterministic, we present the average result taken over 10 runs. The first row corresponds to a clustering with k = 21 clusters. The second row is the method with an automatic selection of k.

The Impact of Embeddings. In order to illustrate that the combination of embeddings of the Region Adjacency Graph with image features is indeed accurate, we applied Algorithms 1 and 2 with two other sets of features, namely color features only (GeST-FV) and embeddings only (GeST-n2v). The results depicted in Table 2 show that while embeddings alone are meaningful, using additional image features improves the results.

Table 1: Quantitative evaluation of different algorithms on BSDS500 (see (Li and Wu, 2014; Nguyen et al., 2019)). Bottom rows correspond to complex network-based segmentation techniques.

Methods            PRI      VI
GeST (k = 21)      0…9      2…
GeST (automatic)   0…7      2…
EDISON             0.786    2.002
F&H                0.770    2.188
WMS                0.752    2.103
Li-Wu              0.777    1.879
MO                 0.803    −
FMS                0.811    −
Louvain            0…2      1…

Table 2: Relevance of embeddings and color features.

Method      PRI      VI
GeST-FV     0.700    2.672
GeST-n2v    0.798    2.286

Computing Different Clusterings. Since our method relies on a clustering algorithm, the number of clusters automatically determined by the algorithm may have a great impact on the resulting segmentation. Following an approach from Arbelaez et al. (Arbelaez et al., 2010), we propose to compute clusterings using a number of clusters in {2, 8, 15, 21, 30} and to keep the best result for each image (w.r.t. PRI). The results are presented in Table 3. One can see that for every image, our method encompasses a segmentation that agrees with all ground-truth segmentations, with high values of PRI and low values of VI. Figure 6 illustrates the impact of the number of clusters w.r.t. the evaluation metrics presented in Section 3. While PRI values are stable from k = 12 onward, with best values for k = 21, we observe that the VI has its lowest values for a lower number of clusters. Recall that the average number of regions for ground-truth segmentations is 20.

Table 3: Results obtained by keeping the best segmentation (w.r.t. PRI) for 2, 8, 15, 21 and 30 clusters.

Method   PRI    VI
GeST     0…0    1…

Figure 4: Impact of the merging procedure on two images from the BSDS500 dataset. The original image is displayed on top, followed by the Mean Shift initial segmentation, the GeST result and the merging result.

5 CONCLUSION

In this work we used a recent technique from complex network analysis, so-called graph embedding, as a cornerstone for image segmentation. We propose a general framework based on this technique and obtain state-of-the-art results with low computational complexity and memory. Our method also relies on a set of pre-computed superpixels with a merging process, both of which have been used in community detection approaches (Li and Wu, 2014; Mourchid et al., 2016; Nguyen et al., 2019). Since our aim was to illustrate the relevance of graph embedding for image segmentation, we relied on few image-related features. This may hence be a first step toward obtaining better results. Moreover, it would be interesting to use a supervised method to determine most parameters, which were mainly set empirically or without the use of prior knowledge provided by available test sets. This is for instance the case for the number of clusters, which can have a great impact on the result. While this can be improved by merging, it would be interesting to see whether classification can improve results. Finally, we hope that this approach will yield a new direction for image segmentation.

Figure 5: Left: different segmentations obtained with 8, 15, 21 and 30 clusters, respectively. Right: ground-truth segmentations with 13, 14, 19 and 44 regions, respectively.

Figure 6: Mean PRI and VI values for GeST with number of clusters k ranging from 2 to 24.

ACKNOWLEDGEMENTS

The author would like to thank Nicolas Dugué for fruitful discussions regarding clustering algorithms.

REFERENCES

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., and Süsstrunk, S. (2012). Slic superpixels compared to state-of-the-art superpixel methods. IEEE transactions on pattern analysis and machine intelligence, 34(11):2274–2282.

Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. (2010).

Contour detection and hierarchical image segmenta-

tion. IEEE transactions on pattern analysis and ma-

chine intelligence, 33(5):898–916.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefeb-

vre, E. (2008). Fast unfolding of communities in large

networks. Journal of statistical mechanics: theory

and experiment, 2008(10):P10008.

Browet, A., Absil, P.-A., and Van Dooren, P. (2011). Com-

munity detection for hierarchical image segmentation.

In International Workshop on Combinatorial Image

Analysis, pages 358–371. Springer.

Cai, H., Zheng, V. W., and Chang, K. C.-C. (2018). A com-

prehensive survey of graph embedding: Problems,

techniques, and applications. IEEE Transactions on

Knowledge and Data Engineering, 30(9):1616–1637.

Caliński, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 3(1):1–27.

Christoudias, C. M., Georgescu, B., and Meer, P. (2002).

Synergism in low level vision. In Object recognition

supported by user interaction for service robots, vol-

ume 4, pages 150–155. IEEE.

Davies, D. L. and Bouldin, D. W. (1979). A cluster separa-

tion measure. IEEE transactions on pattern analysis

and machine intelligence, (2):224–227.

Felzenszwalb, P. F. and Huttenlocher, D. P. (2004). Efﬁ-

cient graph-based image segmentation. International

journal of computer vision, 59(2):167–181.

Grover, A. and Leskovec, J. (2016). node2vec: Scalable

feature learning for networks. In Proceedings of the

22nd ACM SIGKDD, pages 855–864.

Li, S. and Wu, D. O. (2014). Modularity-based image seg-

mentation. IEEE Transactions on Circuits and Sys-

tems for Video Technology, 25(4):570–581.

Linares, O. A., Botelho, G. M., Rodrigues, F. A., and Neto,

J. B. (2017). Segmentation of large images based on

super-pixels and community detection in graphs. IET

Image Processing, 11(12):1219–1228.

Liu, D., Xiong, Y., Pulli, K., and Shapiro, L. (2011). Esti-

mating image segmentation difﬁculty. In International

Workshop on Machine Learning and Data Mining in

Pattern Recognition, pages 484–495. Springer.

Meilă, M. (2005). Comparing clusterings: an axiomatic view. In Proceedings of the 22nd international conference on Machine learning, pages 577–584.

Mourchid, Y., Hassouni, M. E., and Cheriﬁ, H. (2016). An

image segmentation algorithm based on community

detection. In COMPLEX NETWORKS 2016, Milan,

Italy, pages 821–830.

Nguyen, T., Coustaty, M., and Guillaume, J. (2019). A

combination of histogram of oriented gradients and

color features to cooperate with louvain method based

image segmentation. In VISIGRAPP 2019, Volume

4: VISAPP, Prague, Czech Republic, February 25-27,

2019, pages 280–291. SciTePress.

Oliphant, T. E. (2006). A guide to NumPy, volume 1. Trel-

gol Publishing USA.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,

Thirion, B., Grisel, O., Blondel, M., Prettenhofer,

P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,

A., Cournapeau, D., Brucher, M., Perrot, M., and

Duchesnay, E. (2011). Scikit-learn: Machine learning

in Python. Journal of Machine Learning Research,

12:2825–2830.

Recht, B., Re, C., Wright, S., and Niu, F. (2011). Hog-

wild: A lock-free approach to parallelizing stochastic

gradient descent. In Advances in neural information

processing systems, pages 693–701.

Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to

the interpretation and validation of cluster analysis.

Journal of computational and applied mathematics,

20:53–65.

Trémeau, A. and Colantoni, P. (2000). Regions adjacency graph applied to color image segmentation. IEEE Transactions on image processing, 9(4):735–744.

Unnikrishnan, R. and Hebert, M. (2005). Measures of

similarity. In 2005 Seventh IEEE Workshops on Ap-

plications of Computer Vision (WACV/MOTION’05)-

Volume 1, volume 1, pages 394–394. IEEE.

Unnikrishnan, R., Pantofaru, C., and Hebert, M. (2007). To-

ward objective evaluation of image segmentation al-

gorithms. IEEE transactions on pattern analysis and

machine intelligence, 29(6):929–944.

Van der Walt, S., Schönberger, J. L., Nunez-Iglesias, J., Boulogne, F., Warner, J. D., Yager, N., Gouillart, E., and Yu, T. (2014). scikit-image: image processing in python. PeerJ, 2:e453.

Ward Jr, J. H. (1963). Hierarchical grouping to optimize an

objective function. Journal of the American statistical

association, 58(301):236–244.

Zhang, H., Fritts, J. E., and Goldman, S. A. (2008). Image

segmentation evaluation: A survey of unsupervised

methods. Computer vision and image understanding,

110(2):260–280.
