A Statement Report on the Use of Multiple Embeddings

for Visual Analytics of Multivariate Networks

Daniel Witschard

, Ilir Jusuﬁ

, Rafael M. Martins

and Andreas Kerren

Department of Computer Science and Media Technology, Linnaeus University, V

axj

o, Sweden

Keywords:

Multivariate Network, Visualization, Visual Analytics, Embedding, Methodology.

Abstract:

The visualization of large multivariate networks (MVN) continues to be a great challenge and will probably

remain so for a foreseeable future. The ﬁeld of Multivariate Network Embedding seeks to meet this challenge

by providing MVN-speciﬁc embedding technologies that targets different properties such as network topology

or attribute values for nodes or links. Although many steps forward have been taken, the goal of efﬁciently

embedding all aspects of a MVN remains distant. This position paper contrasts the current trend of ﬁnding

new ways of jointly embedding several properties with the alternative strategy of instead using, and combining,

already existing state-of-the-art single scope embedding technologies. From this comparison, we argue that

the latter strategy provides a more generic and ﬂexible approach with several advantages. Hence, we hope to

convince the visual analytics community to invest more work in resolving some of the key issues that would

make this methodology possible.

1 INTRODUCTION

A large amount of research effort has been put into

the problem of visualization and visual analytics (VA)

regarding large multivariate networks (MVNs). This

is a complex and challenging problem both from a

pure visualization perspective as well as from a com-

putational perspective. While the ﬁeld of MVN visu-

alization is well-studied—and there exist several ap-

proaches for displaying and interacting with MVNs

(Kerren et al., 2014; Martins et al., 2017; Nobre et al.,

2019)—the size of many MVNs (for example so-

cial networks such as Twitter and Facebook) exceed

the limits of what can be efﬁciently handled by such

methods directly. This means that in many cases, ef-

fective pre-processing and extensive computations are

vital steps before visualization even becomes feasible.

Furthermore, there is a constant need for more efﬁ-

cient analysis algorithms from a computational per-

spective. Hence, methods for effective representa-

tions and efﬁcient calculations for MVNs would be

beneﬁcial from both a visualization perspective and

a VA perspective. It is our conviction that combined

https://orcid.org/0000-0001-6150-0787

https://orcid.org/0000-0001-6745-4398

https://orcid.org/0000-0002-2901-935X

https://orcid.org/0000-0002-0519-2537

forces from these two areas are needed to achieve the

progress needed.

Embeddings are relatively low-dimensional vec-

tor representations of the embedded items and embed-

ding algorithms exist for different types of data. (E.g.,

the nodes of a network or the words of a text corpus).

The main goal of the emdedding algorithm is usually

to produce embeddings in a way so that items that

are similar (from some chosen aspect) in the original

data set are embedded in vectors that lie close to each

other in the embedding space (with regard to some

chosen distance metric). Therefore embeddings can,

for instance, be used for efﬁcient similarity compu-

tations over a data set. From a technical viewpoint

there are several arguments to suggest that embedding

technology is probably a suitable candidate for meet-

ing the challenge of MVN visualization and MVN vi-

sual analytics. The main argument for this is that it

has already proven to be an efﬁcient and successful

strategy within other problem areas (Goyal and Fer-

rara, 2017; Toshevska et al., 2020) In recent years,

several embedding techniques speciﬁc to MVNs have

been introduced (Cui et al., 2019) allowing for new

and ingenious progress (see Figure 1). Therefore, it

is not surprising that a lot of effort is currently be-

ing put into exploring ways to create new and better

MVN embeddings. In this scenario, a common ap-

proach is to try to simultaneously embed several prop-

Witschard, D., Jusuﬁ, I., Martins, R. and Kerren, A.

A Statement Report on the Use of Multiple Embeddings for Visual Analytics of Multivariate Networks.

DOI: 10.5220/0010314602190223

In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021) - Volume 3: IVAPP, pages

219-223

ISBN: 978-989-758-488-6

219

Figure 1: Embeddings have become important tools for MVN analysis and made the way for substantial progress. Technolo-

gies for embedding properties such as network topology and attributes on nodes or edges have enabled more computationally-

efﬁcient ways to support common objectives and tasks.

erties (by this we mean different characteristics of the

MVN such as network topology, attributes on nodes

or edges etc.) using so-called content-enhanced rep-

resentation learning or similar techniques (Hamilton

et al., 2018; Sun et al., 2016; Lerique et al., 2020).

While this has proven successful for some applica-

tions, we would like to point out that the typical multi-

property MVN embedding technology is quite narrow

in its scope regarding the amount of properties it can

capture (typically only two). Hence, the quest for

the—highly desirable—algorithm capable of captur-

ing all properties of the MVN at once still remains an

open challenge, at least for the foreseeable future.

In this paper, we therefore argue for an alternative

approach, and take the position that there seems to be

a currently unexploited way that could hopefully lead

towards the goal of comprehensive and efﬁcient MVN

embeddings. The alternative strategy that we pro-

pose is to seek novel ways to combine and leverage

the plethora of already-existing state-of-the-art em-

bedding technologies, also from other ﬁelds that do

not necessarily target MVNs. Furthermore, we argue

that this should be done in a way that will allow for

exploitation of:

1. the beneﬁts of single focus,

2. the ﬂexibility of separation, and

3. the power of combination.

These three concepts are outlined in the following

sections and provide the overall structure of this pa-

per. To make our reasoning more concrete, and hope-

fully easier to follow, we will assume that the embed-

dings we discuss are used for similarity calculations

(which is a common case), but our arguments are not

dependent on this.

2 THE BENEFITS OF SINGLE

FOCUS

Embedding algorithms are typically quite complex in

their structure (Cui et al., 2019) (for example, it is not

uncommon that they are based on model training and

loss function minimization.) Therefore, calculating

good embeddings for even a single property is usu-

ally a non-trivial task in itself, with many challenges.

The case of simultaneously embedding two or more

properties (see Figure 2) can, in general, be viewed as

a joint optimization problem, so that more complex-

ity is added on top of the already existing one. Fur-

thermore, joint optimization problems usually contain

some amount of trade-off between objectives since it

may be hard, or impossible, to ﬁnd a common optima

(Ngatchou et al., 2005). It is therefore reasonable to

argue that an embedding algorithm that targets a sin-

gle property will have the potential for higher-quality

similarity calculations (for that speciﬁc property) than

an algorithm that tries to bundle it together with oth-

ers. There is, of course, no strict guarantee that that

will be the case for all situations and for all data sets.

Thus, we claim that it is likely that the highest

quality for similarity calculations, as measured on a

per-property basis, will be obtained by algorithms that

target only a single property. Furthermore, we note

that this line of reasoning deliberately opens the pos-

sibility to use algorithms that do not speciﬁcally tar-

IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications

220

Figure 2: In the CENE framework (Sun et al., 2016), the original MVN is extended by adding new nodes. For each text

attribute on an existing node, a new child node (of a different type) is added with links to its parent node(s). The resulting

augmented (multimodal) network, now containing two different types of nodes and two different types of links, is then used

in the embedding algorithm for joint optimization of network structure and textual content.

get MVNs. This is highly desirable and precisely

our goal, since a state-of-the-art algorithm from the

ﬁeld of word/text embedding, for example, might also

work perfectly well for embedding any text attributes

of an underlying MVN. In other words, the main idea

here would be to apply property-by-property embed-

ding using single-focus embedding methods (as they

would, in theory, produce the best results), and to ﬁnd

a clever way to combine them, as discussed in Sec-

tion 4.

3 THE FLEXIBILITY OF

SEPARATION

Even for algorithms that do not encounter the prob-

lem of simultaneous minimization of loss functions,

mentioned above, there is still a disadvantage in the

fact that the resulting similarities can only be evalu-

ated in combination, and not in separation. While this

may be a desirable in some cases, it is also a drawback

since, as with (Sun et al., 2016), a combined embed-

ding of network topology and text attributes provides

for detection of nodes that lie “near” each other and

have similar text content, but it does not provide for

comparison of nodes that have similar text content but

lie far apart in the network (cf. Figure 2). However, if

the embeddings had been separated into one for each

property, it would have provided for any combination

of the separate similarities, including inverted similar-

ity. This would allow users to pose interesting ques-

tions such as “Show me all the items that are similar

with regard to property A and dissimilar with regard

to property B”. Thus, we claim that a combined set

of single focus embeddings will always provide for

a wider, and more ﬂexible, range of similarity calcu-

lations than a joint embedding of the same set. In

fact, the range of possible similarity calculations of

a joint property embedding will always be a subset of

the range of possibilities of the set of the same proper-

ties embedded one by one. When considering the use

of VA techniques and user domain expertise, this ap-

proach becomes even more powerful compared to the

joint embedding, as it would allow for a more ﬂexible

and complex exploration of the underlying data.

4 THE POWER OF

COMBINATION

For many of the properties that could be of interest

for MVN embedding (e.g., network topology, textual

content, or categorical attributes of nodes) there actu-

ally already exist several state-of-the-art embedding

methods, each with their own strengths and weak-

nesses. Therefore, instead of trying to make a choice

for the best one (which can be subjective and may

vary from case to case), a feasible strategy would be

to try to leverage them all in an ensemble combina-

tion. This is a well-known strategy in many other ar-

eas (Dong et al., 2020), relying for example on the

fact that a well-chosen combination of different clas-

siﬁers often has the potential to outperform any of the

individual contributing classiﬁers.

Another argument to support this claim is the fact

that a combination may hold a larger exploitable po-

tential, since what has been missed by one classiﬁer

may be compensated for by another. This effect is es-

pecially strong for situations when the classiﬁers have

a low level of inter-dependency, and hence it is desir-

able to combine different technologies. Furthermore,

it is not unusual to obtain synergy effects for ensem-

A Statement Report on the Use of Multiple Embeddings for Visual Analytics of Multivariate Networks

221

Figure 3: Applying the general idea of ensemble analysis to embeddings. A selected property is embedded by several different

algorithms (or the same algorithm with different settings) and the results are combined to hopefully achieve a higher quality.

ble combinations such that the quality of the com-

bined result is signiﬁcantly higher than for any of the

contributing parts taken by itself (Opitz and Maclin,

1999). By applying the same line of reasoning, with

some modiﬁcations, to embeddings we believe that it

is possible to use the general idea of ensemble analy-

sis as a valid strategy also for this case (see Figure 3

for details).

5 CONCLUSIONS AND EARLY

RESULTS

In line with the reasoning above, we conclude that

trying to ﬁnd ways to simultaneously embed several

different properties of an MVN is a challenging task

with some inevitable disadvantages. Therefore, a fea-

sible strategy, that will likely provide for the highest

quality and the maximal ﬂexibility, is to instead em-

bed properties one-by-one using state-of-the-art algo-

rithms that target the data type of the speciﬁc property.

Furthermore, if several state-of-the-art algorithms ex-

ist for a speciﬁc property, their respective yields could

probably be combined for even higher quality by us-

ing ensemble calculations. From our perspective, the

main pros of this approach are listed in the following:

• It gives a straight-forward method to obtain and

combine embeddings of many important aspects

of MVNs.

• Using an “all-embedding” approach allows for

building homogeneous and effective pipelines for

back-end calculations and processing.

• Advances from other areas can be readily reused

within the ﬁeld of MVN visualization and VA.

In contrast, the major cons of our approach might be:

• Tailor-made multi-feature embeddings may still

be a better choice for speciﬁc applications.

• An “all-embedding” strategy is not necessarily the

best for MVN analysis.

• Ensemble analysis is not a trivial task, and it will

have to be adapted to the speciﬁc circumstances

of each situation.

Taking all aspects into account, our overall conclusion

is that the pros out-weight the cons, and with this pa-

per we hope to convince the VA community to invest

more work and time to develop

• novel approaches for combining existing embed-

ding technologies in a MVN context,

• generic approaches for analyzing the quality and

yield from embeddings of different data types, and

• generic approaches for assessing the performance

of different ensemble strategies when facing

several state-of-the-art algorithms.

To give further credibility to our claims we conclude

by giving some early results from our work show-

ing that the general ideas from ensemble analysis can

IVAPP 2021 - 12th International Conference on Information Visualization Theory and Applications

222

indeed be applied to embeddings. In our test setup

we have used some 3000 article abstracts from the

IEEE conferences and embedded them in ﬁve differ-

ent ways using paragraph-sized text embedding tech-

nologies. Performing all-to-all similarity calculations

and verifying the results with regards to a small, man-

ually labeled, ground-truth set indicates that an en-

semble approach yields better quality than any of the

single embeddings taken by themselves (Witschard

et al., 2020). However, much work remains to be done

to fully understand the general rules behind this pro-

cess. Nevertheless, these promising ﬁrst results, to-

gether with the generalizability of the approach to any

embedding type, encourage us to use this as a foun-

dation for attempting a general framework for MVN

embedding.

REFERENCES

Cui, P., Wang, X., Pei, J., and Zhu, W. (2019). A survey on

network embedding. IEEE Transactions on Knowl-

edge and Data Engineering, 31(5):833–852.

Dong, X., Yu, Z., Cao, W., Shi, Y., and Ma, Q. (2020). A

survey on ensemble learning. Frontiers of Computer

Science, 14(2):241–258.

Goyal, P. and Ferrara, E. (2017). Graph embedding tech-

niques, applications, and performance: A survey.

arXiv:1705.02801.

Hamilton, W. L., Ying, R., and Leskovec, J. (2018). Rep-

resentation learning on graphs: Methods and applica-

tions. arXiv:1709.05584.

Kerren, A., Purchase, H. C., and Ward, M. O. (2014). Multi-

variate Network Visualization. Springer International

Publisher.

Lerique, S., Abitbol, J. L., and Karsai, M. (2020). Joint

embedding of structure and features via graph convo-

lutional networks. Applied Network Science.

Martins, R. M., Kruiger, J. F., Minghim, R., Telea, A. C.,

and Kerren, A. (2017). MVN-Reduce: Dimensional-

ity Reduction for the Visual Analysis of Multivariate

Networks. In Kozlikova, B., Schreck, T., and Wis-

chgoll, T., editors, EuroVis 2017 - Short Papers. The

Eurographics Association.

Ngatchou, P., Zarei, A., and El-Sharkawi, A. (2005). Pareto

multi objective optimization. In Proceedings of the

13th International Conference on, Intelligent Systems

Application to Power Systems, pages 84–91. IEEE.

Nobre, C., Meyer, M., Streit, M., and Lex, A. (2019). The

state of the art in visualizing multivariate networks.

EUROVIS, Volume 38, Number 3.

Opitz, D. and Maclin, R. (1999). Popular ensemble meth-

ods: An empirical study. Journal of Artiﬁcial Intelli-

gence Research.

Sun, X., Guo, J., Ding, X., and Liu, T. (2016). A general

framework for content-enhanced network representa-

tion learning. arXiv:1610.02906.

Toshevska, M., Stojanovska, F., and Kalajdjieski, J. (2020).

Comparative analysis of word embeddings for captur-

ing word similarities. arXiv:2005.03812.

Witschard, D., Jusuﬁ, I., Martins, R. M., and Kerren, A.

(2020). Multiple embeddings for multivariate net-

work analysis. Poster abstract presented at 6th an-

nual Big Data Conference at Linnaeus University, in

axj

o, Sweden.

A Statement Report on the Use of Multiple Embeddings for Visual Analytics of Multivariate Networks

223