GRAPHICAL MODELS FOR RELATIONS

Modeling Relational Context

Volker Tresp

1,2

, Yi Huang

, Xueyan Jiang

and Achim Rettinger

Siemens AG, Corporate Technology, Munich, Germany

Ludwig-Maximilians-Universit¨at M¨unchen, Munich, Germany

Karlsruhe Institute of Technology, Karlsruhe, Germany

Keywords:

Relational learning, Probabilistic relational models, Relational context information, Recommendation

systems, Graphical models.

Abstract:

We derive a multinomial sampling model for analyzing the relationships between two or more entities. The

parameters in the multinomial model are derived from factorizing multi-way contingency tables. We show

how contextual information can be included and propose a graphical representation of model dependencies.

The graphical representation allows us to decompose a multivariate domain into interactions involving only a

small number of variables. The approach formulates a probabilistic generative model for a single relation. By

construction, the approach can easily deal with missing relations. We apply our approach to a social network

domain where we predict the event that a user watches a movie. Our approach permits the integration of both

information about the last movie watched by a user and a general temporal preference for a movie.

1 INTRODUCTION

In this paper we address the problem of predict-

ing the existence of a relation between two or more

entities. Examples would be relations describing

the interest of a user for items, e.g., watches(User,

Movie), friendship relations in a social network, e.g.,

isFriendsWith(PersonA, PersonB), or patient treat-

ment and patient diagnosis relations in a clinical

setting, e.g., getsTreatment(Patient, Treatment), has-

Disease(Patient, Disease). Although a number of

different approaches have been proposed for this

task (Koller and Pfeffer, 1998; Taskar et al., 2002;

Getoor et al., 2007; Domingos and Richardson, 2007;

Kemp et al., 2006; Xu et al., 2006), matrix factor-

ization approaches are clearly among the leading ap-

proaches since they can readily exploit structure in

relational patterns. For relations with an arity larger

than two, tensor factorization recently have become

popular, enabling the modeling of relations such as

rates(User, Movie, Rating) or watches(User, Movie,

May2011) (Rendle et al., 2010). In most cases, ma-

trix and tensor factorization have been implemented

in a deterministic interpretation, e.g., simply to com-

plete a matrix or a tensor based on a low-rank approx-

imation. Exception are the probabilistic approaches

in (Yu et al., 2006; Chu et al., 2006; Salakhutdi-

nov and Mnih, 2007) where Gaussian models and

Bernoulli models are employed to model preferences

of users for certain items. Here we show that by as-

suming a particular sampling scheme and by normal-

izing the factorized matrix and the factorized tensor,

respectively, we can obtain a probabilistic interpreta-

tion in terms of a multinomial model. In particular,

we assume that a statistical unit or a data point —

and thus also a row in the data matrix— is deﬁned

by a relational tuple, i.e. an instantiated relation. As

an example, let a data point be deﬁned by the ob-

servation that a particular user u watches a particular

movie m, let C be the contingency table of observed

user/movie pairs, and let

C be the factorized and nor-

malized contingency table. Then we would estimate

that

P(u, m) = ˆc

u,m

, where ˆc

u,m

= {

u,m

. An advan-

tage of this approach is that we only model what is

observed, which means that we do not need to em-

ploy a missing data mechanism for unobserved re-

lations. This is particulary useful in the typical sit-

uation where only positive examples for a relation

are available. In many other approaches one needs

to specify if a relational instance not present in the

data should be assumed missing or non-existent. If

modeled as missing, potentially complex missing data

mechanism need to be applied.

Another advantage is that we now can extend the

114

Tresp V., Huang Y., Jiang X. and Rettinger A..

GRAPHICAL MODELS FOR RELATIONS - Modeling Relational Context.

DOI: 10.5220/0003665201140120

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2011), pages 114-120

ISBN: 978-989-8425-79-9

 2011 SCITEPRESS (Science and Technology Publications, Lda.)

model with contextual information. Let’s consider the

relation watches(User, Movie, LastMovieWatched-

ByUser, Month) which says that a user watches a

movie in a given month and where we also have infor-

mation about the last movie that the user has watched.

Such a relation can be modeled by a four-way tensor

which would give us, after reconstruction and normal-

ization,

P(User, Movie, LastMovieWatchedByUser,

Month). Naturally, the contingency tables for ten-

sors are very sparse, in particular if one considers

that the involved variables often have many thousand

states; the goal of this paper is to exploit structure in

the data, visualized as graphical models, to generate

data-efﬁcient models. Graphical models are a com-

mon approach for exploiting independencies in high-

dimensional domains.

We believe that this new way of the application of

graphical model can lead to quite interesting and pow-

erful models. A particular beneﬁt is the modularity of

the approach which permits a separate optimization of

local models, which, of course, is the beneﬁt of graph-

ical models —in particular of Bayesian networks and

decomposable models— in general (Lauritzen, 1996).

The paper is organized as follows. In the next sec-

tion, we describe related work. In Section 3 we de-

scribe the basic idea and in Section 4 we develop the

approach using data from a social network. We show

that contextual information can improve the predic-

tion. Section 5 contains our conclusions.

2 RELATED WORK

Graphical models have a long history in expert

systems and statistical modeling (Lauritzen, 1996).

Graphical models have also been applied to relational

domains. Prominent examples are Probabilistic Re-

lational Models (Koller and Pfeffer, 1998; Getoor

et al., 2007), Markov Logic Networks (Domingos and

Richardson, 2007), and Inﬁnite Hidden Relational

Models (Kemp et al., 2006; Xu et al., 2006). Al-

though being very general, the application of these

models to a given relational domain might still be

tricky: Probabilistic Relational Models require in-

volved structural optimization, Markov Logic Net-

works depend on the available of rule sets and logical

expressions (approximately) valid in the domain and

Inﬁnite Hidden Relational Models require complex

inference processes. Here, we focus on the modeling

of a single relation which leads to simpler and scal-

able models. The sampling assumptions in this paper

are similar to the ones made in the pLSI model (Hof-

mann, 1999) and the underlying assumptions in some

matrix and tensor decomposition approaches (Ren-

dle et al., 2010; Wermser et al., 2011), although in

these papers, this sampling assumption is not stated

explicitly. The difference is that here we exploit inter-

dependencies in the domain using graphical models

whereas those approaches form a joint clustering and

factorization model, respectively. It might be inter-

esting to note that (Rendle et al., 2010) uses a simpli-

ﬁed factorized model which consists of sums of terms

deﬁned for individual interactions whereas we obtain

products of simple interaction components. The ar-

gument that higher-order tensor models permit the in-

tegration of contextual background information was

also made in (Wermser et al., 2011).

There is a large literature on matrix completion

methods, which we apply to model the interactions

in the graphical model (Cands and Recht, 2008). In

particular, the winning entry to the NETFLIX com-

petition used matrix completion approaches (Takacs

et al., 2007; Bell et al., 2010). Tensor factoriza-

tion has become an area of growing interest. A re-

cent overview has been provided in (Kolda and Bader,

2009).

In (Yu et al., 2006; Salakhutdinov and Mnih,

2007) contextual information was included in matrix

completion approaches. A Gaussian noise model is

employed which is more suitable for modeling contin-

uous and ordinal quantities, such as a user score for a

movie, than for the likelihood of the existence of a re-

lation, as we are doing here. Also, those approaches

often have difﬁculties in situations where only posi-

tive examples for a relation are available; they need to

distinguish between true negatives (e.g., it is known

that a user does not like a movie) and missing in-

formation (e.g., it is unknown if a user likes a par-

ticular movie). Bernoulli and Gaussian sampling ap-

proaches have been pursued in (Chu et al., 2006; Chu

and Ghahramani, 2009).

3 RELATIONAL POPULATIONS,

GRAPHICAL STRUCTURES,

AND THE MULTINOMIAL

MODEL

In this section we describe the standard object-

centered sampling model and contrast it with the

relation-oriented sampling model used in this paper.

3.1 Standard Object-oriented Sampling

Assumption

Traditionally, statistical units, i.e. data points, are

associated with objects and statistical models con-

GRAPHICAL MODELS FOR RELATIONS - Modeling Relational Context

115

……………

1001

1011

I4I3I2I1

………

I4U2

ID3

I2U1

ID2

I1U1

ID1

Figure 1: Left: In a more traditional view, each row is

deﬁned by a user and the columns represent the different

items. A one indicates that a user has purchased an item.

Right: Each row is deﬁned by an event user-buys-item,

which is the sampling assumption used in this paper.

cern the statistical dependencies between attributes of

those objects. A typical example is a medical do-

main where one analyzes the dependencies between

the attributes of a population of patients, for exam-

ple in form of a Bayesian network. In a data matrix

the patients would deﬁne the rows and would act as a

unique identiﬁers and the attributes would deﬁne the

columns. A fundamental task is then to predict if a

novel object belongs to the same population (density

estimation), or what values a variable has to assume

such that the likelihood that the object belongs to the

same population is maximized (predictive modeling).

This approach is also quite common in modeling

relational domains. For example, one might analyze

the preferences of a population of U users based on

user attributes and based on known preferences for I

items, e.g., buy(User, Item), where the preferences are

essentially also treated as attributes of the users (Fig-

ure 1, Left). In (Breese et al., 1998) a Bayesian net-

work is described where a binary node x

represents

an item and the state of the node indicates if a user has

bought an item (x

= 1) or not (x

= 0). The Bayesian

network models then

P(x

, . . . , x

). (1)

A problem one encounters in these models is that

one needs to distinguish between relationships known

not to exist and relations that are unknown. For ex-

ample, in the Bayesian networks in (Breese et al.,

1998) and in the Dependency Networks (Heckerman

et al., 2000) missing relations are treated as not-to-

exist whereas in (Koller and Pfeffer, 1998; Xu et al.,

2006; Domingos and Richardson, 2007; Getoor et al.,

2007) Gibbs sampling and loopy belief propagation

are used for dealing with unknown relationships.

3.2 Relation-oriented Sampling

Assumption

In our relation-oriented view, an instance is deﬁned

by an observed relation, i.e., a tuple, typically describ-

ing the relationship between two or more objects (Fig-

ure 1, Right). The population then consists of all true

tuples and a sample is a random subset of those true

tuples. Thus, whereas in the previous subsection we

assumed that either users or items deﬁne the rows in

the data matrix, here we assume that each observed

instantiated relation (tuple) deﬁnes a row.

Considering again the relation buy(User, Item), the

data matrix would contain two columns encoding the

user and the item, respectively, and a model would

estimate

P(User = u, Item = i). (2)

Note that whereas Equation 1 describes a probability

distribution over I binary variables, this equation de-

scribes a multinomial model with two variables where

the two variables have U and I states, respectively.

Considering now that we generalize from two to A

attributes that describe a relation, i.e., are informative

for determining the existence of a relation, the basic

problem is to evaluate P(x

, . . . , x

), i.e., the probabil-

ity that a novel relationship with attributes x

, . . . , x

is likely to exist. Alternatively, it might be interesting

to predict the most likely value of one of the attributes

given other attributes, such as P(x

, . . . , x

), e.g.,

the probability of an item x

given a user x

and given

contextual information x

, . . . , x

In object-to-object relationships, variables typi-

cally contain many states and a contingency table in-

volving all variables can be very sparse. In high-

dimensional domains graphical models have been

quite effective in the past (Lauritzen, 1996) and so

in this paper we will apply them as well. As dis-

cussed earlier, the novelty in this paper is that we ap-

ply graphical models in domains where the relations

form the instances and where we model just a sin-

gle relation instead of a whole network of entities and

their relationships.

For our purpose, Bayesian networks and decom-

posable models are most suitable. For a Bayesian net-

work model, the probability distribution factors as

P(x

, . . . , x

) =

∏

i=1

P(x

|par(x

))

∏

i=1

P(x

, par(x

))

P(par(x

))

Typically a Bayesian network is depicted as a di-

rected graphical model without directed loops. In this

model, par(x

) denotes the direct parents of x

Given a Bayesian network structure, the task

is then to model P(x

|par(x

)), or equivalently,

P(x

, par(x

)). If the involved variables have many

states, matrix and tensor completion methods have

been successful in the past and we also apply those

in our approach, as described in the next Section.

KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval

116

4 DEVELOPMENT OF A

CONCRETE MODEL USING

DATA FROM THE GETGLUE

SOCIAL NETWORK SITE

4.1 GetGlue: A Social Network Site

We based our experiments on GetGlue

(http://getglue.com), a social network that lets

users connect to each other and share Web navigation

experiences. In addition, GetGlue uses semantic

recognition techniques to identify books, movies,

and other similar topics and publishes them in the

form of data streams. Users can observe the streams

and receive recommendations on interesting ﬁndings

from their friends. Both the social network data

and the real-time streams are accessible via Web

APIs. Users have online names, and they know

and follow other users using well-known Semantic

Web vocabularies, such as the Friend of a Friend

(FOAF) vocabulary for user names and the knows

relationship, and the Semantically Interlinked Online

Communities (SIOC) for the follows relationship.

Objects represent real-world entities (such as movies

or books) with a name and a category. Resources

represent information sources that describe the actual

objects, such as webpages about a particular movie

or book.

In the following we use GetGlue data for recom-

mending items, in particular movies, to users. This is

essentially a probability density estimation problem

since we estimate the probability that a novel user-

movie pair belongs to the population.

4.2 Modeling User-movie Events

We model the event that a user watches a movie. The

graphical model consists of two attributes, i.e., the

user and the movie (Figure 2). The rows in the data

matrix are then deﬁned by known user-movie events

and the columns consists of two variables with as

many states as there are users and movies, respec-

tively. A contingency table C is formed. Entry c

u,m

counts how often user u has watched movie m. By

dividing the entries by the overall counts, we can in-

terpret the entries as estimates for the probabilities

of observing a user-movie pair under this sampling

assumption, i.e. as a maximum likelihood estimate

of P(u, m). This matrix will contain many zero en-

tries and the maximum likelihood estimates are noto-

riously unreliable. We follow common practice and

smooth the matrix using a matrix factorization ap-

proach. We perform a singular value decomposition

Figure 2: A graphical model for the dependencies between

users U and movies M.

= U DU

and obtain the low-rank approxima-

tion (Huang et al., 2010)

C = U

diag



+ λ



⊤

where diag



+λ



is a diagonal matrix containing

the s leading eigenvalues in D and where U

contains

the corresponding s columns of U. λ is a regulariza-

tion parameter. After proper normalization

C, the en-

tries can be interpreted as

P(u, m), i.e., an estimate of

the probability of observing the relation that user u

watches movie m.

It should be noted that matrix completion is an ac-

tive area of research and many other matrix comple-

tion methods are applicable as well. Recommenda-

tions for users can now be based on

P(u, m).

4.3 Adding Information on the Last

Movie watched

Certainly, there is a sequential nature of the user-

watches-movie process that the model so far can-

not capture. In particular we might consider the last

movie that a user has watched as additional informa-

tion (Rendle et al., 2010). Note that we now obtain

a truly ternary relation watches(u,m,l) consisting of

user, movie and last movie l watched by the user. The

approach followed in (Rendle et al., 2010) is to con-

sider a three-way contingency table and apply tensor

factorization as a tensor smoothing approach. There

it was argued that general tensor factorization, such

as PARAFAC or Tucker (Kolda and Bader, 2009), are

too difﬁcult to apply in this situation since the con-

tingency table is very sparse and a simpliﬁed addi-

tive model is applied. In our approach we suggest

that an appropriate graphical model is shown in Fig-

ure 3 (left).

The model indicates that the last movie

Normalization takes care that all entries are non-zero

and are smaller than one. Incidentally, this step turns out to

be unnecessary in the regularized reconstruction, since af-

ter matrix completion all entries already obeyed these con-

straints. A second step ensures that the sum over matrix

entries is equal to one.

A link from the last movie to movie might appear more

plausible. If one does that change, the link between user

and movie would need to point from movie to user, such

that no collider (more than one link pointing to the same

node) appears. With a collider one would need to use a

tensor model as a local model.

GRAPHICAL MODELS FOR RELATIONS - Modeling Relational Context

117

TimeOf

Watching

Last

Figure 3: Left: As additional information, the last movie,

which the user has watched, is added. Right: The month

when the user watches the movie is added.

watched by a user directly inﬂuences the next movie

that a user watches but that given that information,

last movie and user are independent. The great advan-

tage now is that we do not need to readapt the user-

movie model but can model independently the movie-

last-movie dependency. Again we calculate empirical

probabilities based on the contingency table, smooth

the table using matrix factorization and obtain

P(m, l).

We combine both models and form

P(u, m, l) =

P(u, m)

P(m, l)

P(m)

Note that in contrast to (Rendle et al., 2010), we do

not obtain a sum of local models but a product of local

models.

4.4 Adding Time of the Event

Next we consider the instance of time when a movie

is watched t. Certainly, the preference for movies

changes in time and at certain instances in time a

movie might be very popular and then decrease in

popularity. Also, a movie can only be watched af-

ter it is released. Time of watching in units of month

is added to the model. Again we formed an empirical

estimate based on the movie-time of watching contin-

gency table. The graphical model is shownin Figure 3

(right).

We now obtain

P(u, m, l,t) =

P(u, m)

P(m, l)

P(m,t)

(

P(m))

4.5 Experimental Results

In Figures 4 and 5 we see experimental results. The

results are based on 3076 users and 9707 movies and

we considered 44 months. Before smoothing, the

user-movie matrix had 1.8% nonzero entries and the

last movie-movie matrix had 1.21% nonzero entries.

In all experiments, we display the cross-validated

(5 folds) NDCG score (Jarvelin and Kekalainen,

2000) (described in the Appendix) as a function of

the rank s of the approximation. The top plot shows

0 50 100 200 400

0.15

0.2

0.25

0.3

Number of latent variables

NDCG all

Friendship

Movie−Time(MT)

LastMovie−Movie(LM)

User−Movie(UM)

UM + LM

UM + LM + MT

Figure 4: Experiments with social networks data without

model regularization. NDCG score as a function of s, the

rank in the matrix completion.

0 50 100 200 400

0.15

0.2

0.25

0.3

Number of latent variables

NDCG all

Friendship

Movie−Time(MT)

LastMovie−Movie(LM)

User−Movie(UM)

UM + LM

UM + LM + MT

Figure 5: The same model as in Figure 4 but with regular-

ization (λ > 0).

results without regularization (λ = 0) and the bottom

shows a regularized solution. The regularizes solu-

tion shows much better performance and will now

be discussed. MT is the baseline and shows the pre-

dictive performance if movies are simply rated based

in their overall popularity in a given month. LM al-

ready shows much better performance where the pre-

diction is based on information about the last movie

watched. This model purely models the Markov prop-

erty of the event of watching movies. UM shows the

performance based on the classical user-movie model

and is better than the LM model. Thus, personaliza-

tion is more informative than sequence information.

Most interesting, by combining both sources of in-

formation, the performance is greatly improved (UM

+ LM). UM + LM + MT combines the user-movie,

movie-last move and the movie-time model, thus in-

formation about when the movie was watched was in-

KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval

118

0 50 100 200 400

0.15

0.2

0.25

0.3

Number of latent variables

NDCG all

User−Movie(UM)

UM + LM(avg)

UM + LM + MT(avg)

Figure 6: The ﬁgure shows results obtained by simply

adding the estimated probabilities for the components.

cluded. The superior performance of the combined

model clearly conﬁrms the beneﬁts of the proposed

approach.

An interesting question is how a simple averaging

of the probabilities of the individual models would

perform. Figure 6 shows that adding the individual

models also improves the performance but that the

gain is better in our approach, based on a multiplica-

tive model.

5 CONCLUSIONS

In this paper we have described a novel approach for

applying graphical models to relational domains. We

deﬁne a statistical unit, i.e., instance, by object-to-

object relationships. We applied our approach to a

social network setting and to user-item modeling and

showed that contextual information can be included

to improve prediction accuracy. The great advantage

of the approach is its modularity which permits the

modeling of domains with many variables. Note that

information such as the last movie watched by the

user would be very difﬁcult or impossible to encode

by most other relational learning approaches.

Several extension are possible. First, we used reg-

ularized matrix factorization for approximating the

local probability distributions. Any other of the avail-

able matrix completion approaches could be used as

well (Cands and Recht, 2008), in particular if the

number of objects grows beyond a few thousand. Sec-

ondly, in this paper the local models described inter-

actions between two variables. In case that local inter-

actions between more than two many-state variables

need to be modeled, one can employ tensor factoriza-

tion (Kolda and Bader, 2009) for the local models.

In terms of scalability, the limiting factor is the

matrix completion step but very fast solutions have

recently been proposed (Cands and Recht, 2008).

ACKNOWLEDGEMENTS

We acknowledge funding by the German Federal

Ministry of Economy and Technology (BMWi) under

the THESEUS project and by the EU FP 7 under the

Integrated Project LarKC.

We would like to thank Hendrik Wermser for pro-

viding us with the data set used in the experiments.

REFERENCES

Bell, R. M., Koren, Y., and Volinsky, C. (2010). All together

now: A perspective on the netﬂix prize. Chance.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empir-

ical analysis of predictive algorithms for collaborative

ﬁltering. In Uncertainty in Artiﬁcial Intelligence.

Cands, E. J. and Recht, B. (2008). Exact matrix completion

via convex optimization. Computing Research Repos-

itory - CORR.

Chu, W. and Ghahramani, Z. (2009). Probabilistic models

for incomplete multi-dimensional arrays. In AISTATS.

Chu, W., Sindhwani, V., Ghahramani, Z., and Keerthi, S. S.

(2006). Relational learning with gaussian processes.

In NIPS.

Domingos, P. and Richardson, M. (2007). Markov logic: A

unifying framework for statistical relational learning.

In Getoor, L. and Taskar, B., editors, Introduction to

Statistical Relational Learning. MIT Press.

Getoor, L., Friedman, N., Koller, D., Pferrer, A., and Taskar,

B. (2007). Probabilistic relational models. In Getoor,

L. and Taskar, B., editors, Introduction to Statistical

Relational Learning. MIT Press.

Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite,

R., and Kadie, C. M. (2000). Dependency networks

for inference, collaborative ﬁltering, and data visual-

ization. Journal of Machine Learning Research.

Hofmann, T. (1999). Probabilistic latent semantic analysis.

In Uncertainty in Artiﬁcial Intelligence (UAI).

Huang, Y., Tresp, V., Bundschus, M., Rettinger, A., and

Kriegel, H.-P. (2010). Multivariate structured predic-

tion for learning on the semantic web. In Proceed-

ings of the 20th International Conference on Inductive

Logic Programming (ILP).

Jarvelin, K. and Kekalainen, J. (2000). IR evaluation meth-

ods for retrieving highly relevant documents. In SI-

GIR’00.

Kemp, C., Tenenbaum, J. B., Grifﬁths, T. L., Yamada, T.,

and Ueda, N. (2006). Learning systems of concepts

with an inﬁnite relational model. In Poceedings of the

National Conference on Artiﬁcial Intelligence (AAAI).

GRAPHICAL MODELS FOR RELATIONS - Modeling Relational Context

119

Kolda, T. G. and Bader, B. W. (2009). Tensor decomposi-

tions and applications. SIAM Review.

Koller, D. and Pfeffer, A. (1998). Probabilistic frame-based

systems. In Proceedings of the National Conference

on Artiﬁcial Intelligence (AAAI).

Lauritzen, S. L. (1996). Graphical Models. Oxford Statis-

tical Science Series.

Rendle, S., Freudenthaler, C., and Schmidt-Thieme, L.

(2010). Factorizing personalized markov chains for

next-basket recommendation. In World Wide Web

Conference.

Salakhutdinov, R. and Mnih, A. (2007). Probabilistic matrix

factorization. In NIPS.

Takacs, G., Pilaszy, I., Nemeth, B., and Tikk, D. (2007). On

the gravity recommendation system. In Proceedings

of KDD Cup and Workshop 2007.

Taskar, B., Abbeel, P., and Koller, D. (2002). Discrimina-

tive probabilistic models for relational data. In Uncer-

tainty in Artiﬁcial Intelligence (UAI).

Wermser, H., Rettinger, A., and Tresp, V. (2011). Modeling

and learning context-aware recommendation scenar-

ios using tensor decomposition. In Proc. of the Inter-

national Conference on Advances in Social Networks

Analysis and Mining.

Xu, Z., Tresp, V., Yu, K., and Kriegel, H.-P. (2006). Inﬁnite

hidden relational models. In Uncertainty in Artiﬁcial

Intelligence (UAI).

Yu, K., Chu, W., Yu, S., Tresp, V., and Xu, Z. (2006).

Stochastic relational models for discriminative link

prediction. In Advances in Neural Information Pro-

cessing Systems 19.

APPENDIX

Details on the NDCG Score

We use the normalized discounted cumulative gain

(NDCG) to evaluate a predicted ranking. NDCG is

calculated by summing over all the gains in the rank

list R with a log discount factor as

NDCG(R) =

∑

r(k)

− 1

log(1+ k)

where r(k) denotes the target label for the k-th ranked

item in R, and r is chosen such that a perfect rank-

ing obtains value 1. To focus more on the top-ranked

items, we also consider the NDCG@n which only

counts the top n items in the rank list. These scores

are averaged over all ranking lists for comparison.

KDIR 2011 - International Conference on Knowledge Discovery and Information Retrieval

120