Graph-based Rating Prediction Using Eigenvector Centrality

Dmitry Dolgikh and Ivan Jelínek

Department of Computer Science and Engineering, Faculty of Electrical Engineering,

Czech Technical University in Prague, Prague, Czech Republic

Keywords:

Recommendation Systems, Graph-based Recommendations, User Preference, Social Network Analysis,

Eigenvector Centrality.

Abstract:

Most recommendation systems rely on statistical correlations among ratings that users have explicitly given to items in the past (e.g. collaborative filtering). When data on past rating activity are insufficient, however, these systems have difficulty predicting ratings; this situation is commonly known as the cold-start problem. This paper describes how a graph-based representation and Social Network Analysis can help to deal with the cold-start problem. We propose a method that predicts user ratings based on the hypothesis that the rating of a node in the network corresponds to the rating of the most important nodes connected to it. The proposed method was applied to three MovieLens datasets to evaluate its rating prediction performance. The results obtained show that our method is competitive.

1 INTRODUCTION AND RELATED WORK

Rating prediction is an important and popular research topic in the area of recommendation systems. In a typical collaborative filtering system, a user assigns ratings to items he already knows, so that the system can predict the ratings of unknown items from the ratings that other, similar users assigned to these items (Celma, 2010). Similarity among users can be represented by the similarity of rows in the user-item matrix, where each row holds the ratings given by one user to the individual items (i.e. similar users are those who rated the same items with similar scores). The unknown items with the highest predicted ratings are commonly presented to the user as recommendations.

Although collaborative filtering is one of the most widely used recommendation methods, it has several issues, such as the difficulty of finding similar users when the user-item matrix is sparse (i.e. the cold-start problem) and the subjectivity and ambiguity of explicitly given ratings (Ricci et al., 2011). Moreover, according to Cena et al. (2010) and Gena et al. (2011), users have different preferences with respect to the rating scales used for the objects they are evaluating, i.e. rating scales are perceived differently by different users.

Recommendation systems collect data over a significant period of time, and the emotional state of the user can vary greatly during this period, causing variation in the ratings. As a consequence, the search for similar users may give false positive results.

The content-based filtering approach solves some of the issues of collaborative filtering by collecting information that describes the items. This approach relies on the similarity of items, which is defined as an objective distance between items (e.g. the similarity of their properties), without any influence of subjective factors. Items that are similar to the preferences given by the user are typically presented as recommendations. An obvious problem of this approach is the need to extract, maintain and store a lot of relevant and reliable information about the items, which can be a challenging task depending on the complexity of the domain. For instance, due to the vastness of the multimedia field, the user is often unable to describe his preferences correctly and fully. In addition, in the real world his preferences at a specific moment may be influenced by other unknown factors, such as various psychological effects or his friends’ opinions. Besides, the content-based filtering approach is limited by the type of content that can be recommended in this way.

To improve the results of recommendations, many researchers use so-called hybrid methods, which seek the best combination of the abovementioned methods (Burke, 2002).

Nowadays, the use of graph-based structures and the application of graph theory to extract additional hidden features is gaining popularity (Nuñez-Gonzalez and Graña, 2015; Tiroshi et al., 2014; Zeng et al., 2013). A graph-based representation makes it possible to visualise and deeply understand the semantic relations among items and users. Introduced in the beginning of the 20th century by sociologists, Social Network Analysis (SNA) (Scott, 2012) has become widespread in recent years due to the advent of social network websites such as Facebook and Twitter.

Dolgikh, D. and Jelínek, I.

Graph-based Rating Prediction using Eigenvector Centrality.

DOI: 10.5220/0006044902280233

In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2016) - Volume 1: KDIR, pages 228-233

ISBN: 978-989-758-203-5

© 2016 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

For example, the effective identification of important nodes and, more generally, the discovery of hidden network organization structure is a challenging task for current large-scale social networks (Sun et al., 2014). Like the most powerful people in a community, important nodes influence not only the nodes directly connected to them, but also the whole network. A user’s interests also have a heterogeneous structure and can be represented by a network of connected nodes (interests) with varying degrees of importance. The user-item relationships typical of collaborative filtering methods can easily be represented as a bipartite graph, and applying SNA to this network structure may uncover hidden features. The additional information obtained can help to better understand the data, which in turn can be useful for improving the results of the recommendation process.

The advantages of SNA motivated us to examine the rating prediction process from a different angle. We use the network structure to predict the rating a user would give based only on that user’s personal preferences and previous feedback. The proposed method has no limitations concerning the type of recommended items, which are typical of content-based systems, and it does not rely on other users’ feedback, as collaborative filtering algorithms do. This approach adapts to the user’s behaviour and allows us to calculate recommendations independently for each user.

Summary of Contributions

The main contributions of the paper are the following:

• A novel graph-based rating prediction method based on eigenvector centrality.

• A rating impact measure, which helps to deal with the variation of ratings.

• Experiments on three well-known MovieLens datasets that compare the results under different levels of data sparseness and measure performance with the well-known MAE and RMSE metrics.

Figure 1: An example of the created undirected bipartite item-factor graph.

2 METHODOLOGY

Our research is based on the hypothesis that the rating of a node in the network corresponds to the rating of the most important nodes connected to it.

We focused on the item-item relationships within a user’s interests. This allowed us to look deeper into the preferences of each individual user by identifying the underlying factors that influenced his choices. On their own, item-item relationships are of little use for finding hidden patterns, because they describe only the existence of a link (similarity) between objects and do not explain the reason for this connection. To compensate for this, we added intermediate objects to the relationships between items: the factors.

A factor is any object that semantically defines the existence of a connection between items. In other words, if two or more items share a common factor, these items are semantically connected. Genres (categories) of music, films or literature may be considered representative examples of factors.

As a result, we created an undirected bipartite graph G = (I, F, E), where I is the set of items, F is the set of factors and E is the set of edges, each of which connects an item to one of its factors (Figure 1).
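As an illustration of this construction, the following minimal Python sketch builds the item-factor graph as a plain adjacency dictionary. The function name `build_item_factor_graph`, the node tagging scheme and the example profile are assumptions made for this illustration; they are not taken from the authors' script, which relied on the NetworkX library.

```python
from collections import defaultdict


def build_item_factor_graph(item_factors):
    """Return an undirected bipartite graph G = (I, F, E) as an adjacency dict.

    Nodes are tagged tuples, ("item", name) or ("factor", name), so that an
    item and a factor that happen to share a name stay distinct.
    """
    adjacency = defaultdict(set)
    for item, factors in item_factors.items():
        item_node = ("item", item)
        adjacency[item_node]  # touch the key so items without factors still appear
        for factor in factors:
            factor_node = ("factor", factor)
            # Undirected edge: record it in both directions.
            adjacency[item_node].add(factor_node)
            adjacency[factor_node].add(item_node)
    return dict(adjacency)


# Hypothetical user profile: items with their factors (genres, year of release).
profile = {
    "Movie A": ["Comedy", "1995"],
    "Movie B": ["Comedy", "Drama", "1995"],
    "Movie C": ["Drama", "2001"],
}
graph = build_item_factor_graph(profile)
```

Because every edge joins an item to a factor, two items are related only through the factors they share, which is the property the method relies on.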

Following our hypothesis, we need to determine the importance of each node in the network before we can predict a rating. For this purpose, we used the eigenvector centrality measure from SNA. Based on the idea that the importance of a node in a network is increased by connections to other nodes that are themselves important, eigenvector centrality computes the centrality of a node from the centrality of its neighbours. Hence, the eigenvector centrality of a node can be large not only because the node has many neighbours, but also because it has important neighbours (or both).

Figure 2: Nodes are ranked based on the normalized value of eigenvector centrality.

Definition 1 (Eigenvector Centrality). For a given graph G, the eigenvector centrality c(i) of node i is proportional to the sum of the centralities of i’s neighbours:

$$c(i) = \frac{1}{\lambda} \sum_{t \in G} A_{i,t}\, c(t) \qquad (1)$$

where $A_{i,t}$ is an element of the adjacency matrix of G, equal to 1 if nodes i and t are connected and 0 otherwise, and $\lambda$ is a constant (Newman, 2010). This measure was first proposed by Bonacich in 1987 (Bonacich, 1987). Figure 2 shows an example of a graph in which the nodes are ranked by their centrality value.
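The centrality in Definition 1 can be approximated by power iteration. The sketch below is a dependency-free illustration, not the authors' implementation (they used NetworkX). It assumes a connected undirected graph given as an adjacency dictionary, and it iterates with A + I instead of A, a choice made here because plain power iteration oscillates on bipartite graphs; the shift leaves the leading eigenvector unchanged. The function name, parameters and example graph are hypothetical.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality, scaled so the largest value is 1.0."""
    nodes = list(adjacency)
    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # One step of x <- (A + I) x: own value plus the neighbours' values.
        new_x = {n: x[n] + sum(x[m] for m in adjacency[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new_x.values()))
        new_x = {n: v / norm for n, v in new_x.items()}
        converged = sum(abs(new_x[n] - x[n]) for n in nodes) < tol
        x = new_x
        if converged:
            break
    top = max(x.values())
    return {n: v / top for n, v in x.items()}


# Hypothetical item-factor graph: three movies share "Comedy", one is also "Drama".
adjacency = {
    "Comedy": {"m1", "m2", "m3"},
    "Drama": {"m3"},
    "m1": {"Comedy"},
    "m2": {"Comedy"},
    "m3": {"Comedy", "Drama"},
}
centrality = eigenvector_centrality(adjacency)
```

In this example the factor "Comedy" receives the normalized value 1.0, and the movie m3 ranks above m1 and m2 because it is linked to two factors rather than one.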

In most practical cases the rating scale is discrete, not continuous. As mentioned before, a user’s choice can be influenced by unknown factors, so the rating scale can be interpreted differently depending on the user’s current emotional state. As a result, some of the ratings are atypical for this user. To deal with such outliers, we developed a rating impact measure based on eigenvector centrality. This measure allows us to approximate the rating of a node from the ratings of its neighbours.

Definition 2 (Rating Impact). Let R be a predetermined numerical set of possible ratings (e.g. from 1 to 5); the predicted rating $r_i$ of node i must be an element of R. Based on the eigenvector centrality, we can determine the impact of each possible rating within the set of neighbours of the node.

The rating impact $I_r$ of a rating $r \in R$ is measured by

$$I_r = \sum_{i \in N_r} c(i) \qquad (2)$$

where $N_r$ is the subset of the node’s neighbours that have the rating r, and c(i) is the eigenvector centrality of node $i \in N_r$.

The rating with the highest impact among the neighbours of a node can be used as the predicted rating of that node. In other words, the neighbouring nodes “vote” for the best rating according to their importance. As with eigenvector centrality, a rating can be chosen not only because many neighbours of the node carry that rating, but also because important neighbours do.
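The voting described by the rating impact measure can be sketched in a few lines of Python. The function names, the centrality values and the tie-breaking rule (prefer the higher rating when two impacts are equal) are assumptions of this sketch; the source does not state how ties are resolved.

```python
from collections import defaultdict


def rating_impacts(neighbour_ratings, centrality):
    """Sum the centralities of the neighbours that carry each rating."""
    impacts = defaultdict(float)
    for node, rating in neighbour_ratings.items():
        impacts[rating] += centrality[node]
    return dict(impacts)


def predict_rating(neighbour_ratings, centrality):
    """Return the rating with the highest impact among the neighbours."""
    impacts = rating_impacts(neighbour_ratings, centrality)
    # Assumed tie-break: on equal impact, the higher rating wins.
    return max(impacts, key=lambda r: (impacts[r], r))


# Hypothetical neighbourhood: two weak neighbours rated 3, one strong one rated 5.
neighbour_ratings = {"m1": 5, "m2": 3, "m3": 3}
centrality = {"m1": 1.0, "m2": 0.4, "m3": 0.5}

impacts = rating_impacts(neighbour_ratings, centrality)
predicted = predict_rating(neighbour_ratings, centrality)
```

Here rating 3 has more supporters, but their combined centrality (0.9) is lower than that of the single important neighbour rated 5 (1.0), so the predicted rating is 5.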

3 EXPERIMENTAL EVALUATION

3.1 Datasets

To implement and evaluate the proposed algorithm, we used three datasets provided by the GroupLens Research group at the University of Minnesota, which let us compare the results under different levels of data sparseness. The first is the MovieLens 1M dataset, which contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 users; ratings are made on a 1-to-5-star scale. The second is the MovieLens 10M dataset, which contains 10,000,054 ratings applied to 10,681 movies by 71,567 users; ratings are made on a 5-star scale with half-star increments (from 0.5 to 5). The last is the MovieLens 20M dataset, which contains 20,000,263 ratings of 27,278 movies made by 138,493 users; ratings are again made on a 5-star scale with half-star increments (from 0.5 to 5) (Harper and Konstan, 2015). All three datasets provide the genres and the year of release as side information for each movie.

3.2 Evaluation Procedure

From each of the abovementioned datasets we produced a smaller dataset consisting of the ratings given by 1,000 randomly selected users. Each user’s ratings were randomly split into a training subset (80%) and a test subset (20%).
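A per-user random split of this kind might look as follows. This is a sketch only: the function name, the seeding scheme and the rounding of the test-set size are assumptions, since the source does not describe how the split was implemented.

```python
import random


def split_user_ratings(ratings, test_fraction=0.2, seed=0):
    """Randomly split one user's ratings into (train, test) lists."""
    rng = random.Random(seed)  # a local generator keeps the split reproducible
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical ratings of a single user as (movie, rating) pairs.
user_ratings = [(f"movie{i}", i % 5 + 1) for i in range(10)]
train, test = split_user_ratings(user_ratings, seed=0)
```

Calling the function with a different seed for each repetition gives independent random splits of the same user's ratings.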

For our experiment, we developed a prediction script in the Python programming language. For graph construction, graph analysis and the calculation of eigenvector centrality we used the NetworkX library (Hagberg et al., 2008).

For each user, the prediction script then constructed an undirected bipartite (movie-factor) graph from the training set and calculated the eigenvector centrality of each node in this graph. Genres and year of release were used as the factors.

At the initial stage, none of the factor nodes is rated. The script used the centrality values obtained to predict a rating for each factor node in the graph, based on the rating impact measure proposed in Section 2. After this step, all nodes in the graph were rated.

Figure 3: Rating prediction.

GroupLens Research: http://grouplens.org/

KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval

In the next step, to predict the ratings of the movies in the test set, the script created a node for each of them and connected it to its factor nodes, provided these already existed in the graph (Figure 3). The script then performed a similar rating prediction process for the new nodes, based on the ratings of the factor nodes to which they were connected.
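The two prediction stages can be sketched together as follows. The sketch takes the centrality values as given (hypothetical numbers here) and makes several assumptions that the source does not confirm: movie and factor names are distinct, ties are broken in favour of the higher rating, and a test movie with no known factor gets no prediction (`None`).

```python
from collections import defaultdict


def vote(rated_neighbours, centrality):
    """Return the rating with the largest summed neighbour centrality, or None."""
    impacts = defaultdict(float)
    for node, rating in rated_neighbours.items():
        impacts[rating] += centrality[node]
    if not impacts:
        return None  # assumption: no rated neighbour means no prediction
    return max(impacts, key=lambda r: (impacts[r], r))


def predict_test_movie(movie_factors, train_ratings, train_factors, centrality):
    """Predict the rating of a new movie from the factors it shares with the graph."""
    # Stage 1: rate every factor node from the training movies linked to it.
    factor_movies = defaultdict(dict)
    for movie, factors in train_factors.items():
        for factor in factors:
            factor_movies[factor][movie] = train_ratings[movie]
    factor_ratings = {f: vote(m, centrality) for f, m in factor_movies.items()}
    # Stage 2: the new movie connects only to factors already in the graph.
    known = {f: factor_ratings[f] for f in movie_factors if f in factor_ratings}
    return vote(known, centrality)


# Hypothetical training data and centrality values for one user.
train_ratings = {"m1": 4, "m2": 4, "m3": 2}
train_factors = {"m1": ["Comedy"], "m2": ["Comedy"], "m3": ["Comedy", "Drama"]}
centrality = {"Comedy": 1.0, "Drama": 0.41, "m1": 0.54, "m2": 0.54, "m3": 0.77}

mixed = predict_test_movie(["Comedy", "Drama", "Sci-Fi"], train_ratings, train_factors, centrality)
drama_only = predict_test_movie(["Drama"], train_ratings, train_factors, centrality)
unknown = predict_test_movie(["Sci-Fi"], train_ratings, train_factors, centrality)
```

In this example the factor "Comedy" is rated 4 and "Drama" is rated 2 in the first stage; a new movie linked to both inherits the rating of the more central factor, "Comedy".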

We repeated this process five times to reduce the variance caused by the random split selection.

3.3 Evaluation Metrics

To measure the accuracy of the predicted ratings, the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) were used. Both are well-known metrics that are widely used to evaluate the accuracy of predicted ratings (Sotelo and Juayek, 2015; Barjasteh et al., 2015; Gupta and Nagpal, 2015; Shi et al., 2015). MAE is defined as

$$\mathrm{MAE} = \frac{\sum_{i \in U_{test}} |\hat{r}_i - r_i|}{|U_{test}|} \qquad (3)$$

where $U_{test}$ is the test set of the user’s ratings, and $\hat{r}_i$ and $r_i$ are the predicted and actual rating values, respectively.

RMSE gives more weight to larger prediction errors and is defined as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i \in U_{test}} (\hat{r}_i - r_i)^2}{|U_{test}|}} \qquad (4)$$

For each metric, the results were also averaged

over all users.
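Both metrics are simple to compute for the test set of a single user. The following sketch uses hypothetical function names and made-up ratings purely to illustrate the two definitions.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error over paired predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error over paired predicted and actual ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical test set of one user: absolute errors are 1, 0, 1 and 2.
predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]

user_mae = mae(predicted, actual)    # (1 + 0 + 1 + 2) / 4 = 1.0
user_rmse = rmse(predicted, actual)  # sqrt((1 + 0 + 1 + 4) / 4), about 1.2247
```

The single error of 2 contributes half of the absolute error but two thirds of the squared error, which is why RMSE exceeds MAE here.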

3.4 Results and Discussion

To reduce the variance, the results were averaged over

5 validation folds. Table 1 summarizes our results. A

smaller value of metrics indicates better accuracy.

As can be seen from the results, our method performed best on the MovieLens 20M dataset. We attribute this to the fact that this dataset contains not only more rating data, but also a greater amount of side information about the movies. For example, the average number of genres per movie increased from 1.65 to 2 in comparison with the 1M dataset. The total number of movies also increased, enabling users to express their preferences in more detail. The results for the 10M and 20M datasets do not differ much, because they contain approximately the same amount of side information.

However, given that the number of users and ratings nearly doubled between these two datasets, it can be assumed that the proposed method gives stable results despite the variability of the users in the dataset.

The results indicate that the performance of this method is on the level of some recently introduced semantic-based and social-based methods (Qian et al., 2014; Sotelo and Juayek, 2015), but without requiring a large volume of metadata. Another benefit of our method is the possibility of calculating recommendations for each user independently, which can be useful for parallel computation or when data about other users are absent (e.g. on a new e-commerce website).


Table 1: Results of the experiments on the MovieLens datasets. Smaller values indicate better accuracy.

        MovieLens 1M        MovieLens 10M       MovieLens 20M
Run     MAE      RMSE       MAE      RMSE       MAE      RMSE
1       0.8598   1.1719     0.8334   1.1207     0.8189   1.1026
2       0.8558   1.1719     0.8315   1.1198     0.8139   1.0916
3       0.8543   1.1568     0.8184   1.1068     0.8392   1.1169
4       0.8471   1.1601     0.8184   1.1064     0.8082   1.0891
5       0.8762   1.1504     0.8218   1.1114     0.8096   1.1002
Avg.    0.8586   1.1845     0.8250   1.1130     0.8180   1.1001

4 CONCLUSION AND FUTURE WORK

We have introduced a graph-based rating prediction method based on eigenvector centrality. Eigenvector centrality allowed us to build a rating prediction method that uses item metadata as well as explicitly given ratings. We also proposed a rating impact measure that compensates for the variation of ratings by taking the individual rating behaviour of the particular user into account.

The proposed method can be used to predict ratings when data on the past rating activity of other users are insufficient, which helps to deal with the cold-start problem. The method can also be used in combination with collaborative filtering approaches to predict ratings for atypical users.

We conducted our experiments on three well-known MovieLens datasets from the movie domain to evaluate the proposed method and to compare the results with existing methods. The results indicate that the performance of this method is on the level of recently introduced content-based methods, but without requiring a large volume of metadata.

Furthermore, the presented method provides the ability to calculate the recommendations for each user separately.

An important direction for future work is to investigate alternative ways of adapting the recommendation process to the user’s behaviour. We are also interested in exploiting additional data sources for online recommendations, for example Linked Open Data.

ACKNOWLEDGEMENTS

This work was supported by the Grant Agency of

the Czech Technical University in Prague, grant No.

SGS16/163/OHK3/2T/13.

REFERENCES

Barjasteh, I., Forsati, R., Masrour, F., Esfahanian, A.-H., and Radha, H. (2015). Cold-start item and user recommendation with decoupled completion and transduction. In Proceedings of the 9th ACM Conference on Recommender Systems, RecSys ’15, pages 91–98, New York, NY, USA. ACM.

Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92(5):1170–1182.

Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370.

Celma, O. (2010). Music recommendation and discovery: The long tail, long fail, and long play in the digital music space. Springer Science & Business Media.

Cena, F., Vernero, F., and Gena, C. (2010). User Modeling, Adaptation, and Personalization: 18th International Conference, UMAP 2010, Big Island, HI, USA, June 20-24, 2010. Proceedings, chapter Towards a Customization of Rating Scales in Adaptive Systems, pages 369–374. Springer Berlin Heidelberg, Berlin, Heidelberg.

Gena, C., Brogi, R., Cena, F., and Vernero, F. (2011). User Modeling, Adaption and Personalization: 19th International Conference, UMAP 2011, Girona, Spain, July 11-15, 2011. Proceedings, chapter The Impact of Rating Scales on User’s Rating Behavior, pages 123–134. Springer Berlin Heidelberg, Berlin, Heidelberg.

Gupta, S. and Nagpal, S. (2015). An empirical analysis of implicit trust metrics in recommender systems. In Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on, pages 636–639.

Hagberg, A. A., Schult, D. A., and Swart, P. J. (2008). Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), pages 11–15, Pasadena, CA USA.

Harper, F. M. and Konstan, J. A. (2015). The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19.

Newman, M. (2010). Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA.

Nuñez-Gonzalez, J. D. and Graña, M. (2015). Bioinspired Computation in Artificial Systems: International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2015, Elche, Spain, June 1-5, 2015, Proceedings, Part II, chapter Graph-Based Learning on Sparse Data for Recommendation Systems in Social Networks, pages 61–68. Springer International Publishing, Cham.

Qian, X., Feng, H., Zhao, G., and Mei, T. (2014). Personalized recommendation combining user interest and social circle. IEEE Transactions on Knowledge and Data Engineering, 26(7):1763–1777.

Ricci, F., Rokach, L., Shapira, B., and Kantor, P. B. (2011). Recommender systems handbook, volume 1. Springer.

Scott, J. (2012). Social network analysis. Sage.

Shi, C., Zhang, Z., Luo, P., Yu, P. S., Yue, Y., and Wu, B. (2015). Semantic path based personalized recommendation on weighted heterogeneous information networks. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, pages 453–462, New York, NY, USA. ACM.

Sotelo, R. and Juayek, M. (2015). Incidence of specific semantic characteristics on the performance of recommender systems of audiovisual content. In Broadband Multimedia Systems and Broadcasting (BMSB), 2015 IEEE International Symposium on, pages 1–4.

Sun, L., Ge, H., and Guo, X. (2014). An algorithm with user ranking for measuring and discovering important nodes in social networks. In Biomedical Engineering and Informatics (BMEI), 2014 7th International Conference on, pages 945–949.

Tiroshi, A., Berkovsky, S., Kaafar, M. A., Vallet, D., Chen, T., and Kuflik, T. (2014). Improving business rating predictions using graph based features. In Proceedings of the 19th International Conference on Intelligent User Interfaces, IUI ’14, pages 17–26, New York, NY, USA. ACM.

Zeng, A., Gualdi, S., Medo, M., and Zhang, Y.-C. (2013). Trend prediction in temporal bipartite networks: the case of Movielens, Netflix, and Digg. Advances in Complex Systems, 16(04n05):1350024.