Real-time Recommendation System for Stock Investment Decisions
Artur Bugaj and Weronika T. Adrian
a
AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Krakow, Poland
Keywords:
Recommendation Systems, Decision Making, Collaborative Filtering, Prediction of User Behavior Patterns,
Recommendation Engine, Real-time, Knowledge Graphs, Semantic Processing.
Abstract:
Recommendation systems have become omnipresent, helping people make decisions in various areas. While
most of the systems can give accurate recommendations, their learning procedures can be time-consuming. In
some cases, this is not permissible; for example when the information about the items and users changes
very fast in time. In this paper, we discuss a new recommendation engine, based on labelled property graph
knowledge representation and attributed network embeddings, which calculates real-time recommendations
for stock investment decisions. In particular, we demonstrate an application of the DANE (dynamic attributed
network embedding) framework proposed by Li et al. and show the promising results of the system.
1 INTRODUCTION
Making choices among available options, regardless
of whether we look for an interesting book or want to
make an investment decision, is significantly determined
by the context in which we operate: the possibilities
we have, our interests, and the previous decisions we
made. Sometimes, our own history of choices may be
limiting, so recommendations based on other people’s
behaviour and outcomes may be a valuable option that
widens the horizons of our analysis. A particular de-
cision setting is a situation in which we have to make
decisions quicker than usual, and at the same time,
we do not have enough resources to determine what
will be the optimal choice, or whether the choice we
consider best will remain optimal in the near future. To
obtain fast and relatively accurate recommendations, we
need a system that adapts quickly to changing data.
Recommendation systems (Bouraga et al., 2014)
have been around for a few decades, but the rapid
growth of the information stored on the Web has sparked
a renewed interest in their improvement. Ever-changing
data to be analyzed poses a challenge for
real-time systems that must balance the accuracy of
recommendations with the speed of response. In the
stock market, prices and forecasts can change daily,
and the investors’ activity is dynamic by nature: they
buy or sell stocks or even change their fields of
interest. It is thus important to recognize which of
these factors should be taken into consideration, and
how, to deliver fast and accurate recommendations to a
user who wants to make a good investment decision
(Hernández-Nieves et al., 2020).
a https://orcid.org/0000-0002-1860-6989
In this paper, we propose a recommendation engine
for the stock market that integrates semantic similarity
assessment among investors, to incorporate their
behaviour and areas of interest, with a technical anal-
ysis of the companies considered for investment. By
bringing in the benefits of collaborative filtering and
content-based recommendation, we improve the ac-
curacy of the system’s suggestions. This paper is or-
ganized as follows: in Section 2, we present the ba-
sic concepts and background, Section 3 describes the
proposal and recent results, and we conclude the pa-
per in Section 4.
2 PRELIMINARIES
Recommendation systems can be generally categorized
into collaborative filtering (CF)-based ones (which
take into consideration similarities among the
system’s users), content-based ones (which rely on the
attributes of the considered items), and hybrid ones,
which aim to bring together the advantages of both
approaches and minimize their limitations. In recent
years, knowledge graphs have been widely adopted as a
side information resource for recommendation engines:
on the one hand, they minimize the threat of a “cold
start” in the systems, and on the other, they provide a
useful means of explaining recommendations (for a
detailed survey see (Guo et al., 2020; Liu and Duan,
2021)).
Bugaj, A. and Adrian, W.
Real-time Recommendation System for Stock Investment Decisions.
DOI: 10.5220/0010714900003058
In Proceedings of the 17th International Conference on Web Information Systems and Technologies (WEBIST 2021), pages 490-493
ISBN: 978-989-758-536-4; ISSN: 2184-3252
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Graph representations, although intuitive, expres-
sive and flexible, do not constitute a good input to
various data mining tasks, such as node classifica-
tion, link prediction or community detection. To this
end, various methods for learning numerical represen-
tations of nodes, edges and subgraphs have been put
forward (Perozzi et al., 2014; Tang et al., 2015). The
assumptions about the underlying graphs differ: some of
these embedding methods are only suitable for
homogeneous networks (i.e., graphs with nodes “of the
same kind”), others allow for various node types but
only one sort of edges, etc. Most of the methods assume
a static structure to be learned, and only some of the
recent proposals have considered dynamically changing
networks. In particular, Li et al. (2017) have proposed
a framework for dynamic attributed network embedding
that addresses both the rich set of attributes
associated with the nodes of a graph and its dynamic
nature (addition and deletion of nodes and edges).
3 SYSTEM PROPOSAL AND
IMPLEMENTATION
Our proposal utilizes the attributed network embedding
concept described in (Li et al., 2017). We model the
domain of investors and their stock decisions with a
labelled property graph that is later queried for a
given user, and the obtained subgraph is embedded into
a low-dimensional vector space with the so-called
consensus embedding. Once we obtain the suggestions for
potential companies to invest in, we incorporate the
technical analysis of the candidate stocks to produce
the final recommendations in near real-time. The system
consists of two modules:
• recommendation-engine: a module that calculates
recommendations based on users’ similarity (common
relationships and attributes), and
• data-provider: a module that calculates the current
stock movement score (i.e., whether a price is
predicted to grow, to fall, or to stay the same).
Investors and their data, gathered from the Investor
Hunt (see https://investorhunt.co/) and Nasdaq (see
https://www.nasdaq.com/) services, are stored in a graph
database (see Fig. 1). The model contains four types
of nodes:
1. Investors,
2. Categories,
3. Industries,
4. Stocks,
linked with the following relationships:
1. (Investor)-[:POSSESS]-(Stock),
2. (Investor)-[:INTERESTED]-(Category),
3. (Category)-[:INCLUDES]-(Industry),
4. (Industry)-[:COMPANY]-(Companies).
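The node and relationship types listed above can be illustrated with a small in-memory structure. This is only a sketch, not the graph database used by the system: the class `PropertyGraph`, the `SCHEMA` table, the node identifiers and the `average_transaction` attribute are hypothetical, and we assume that the target of the COMPANY relationship is a Stock node.

```python
from dataclasses import dataclass, field

# Relationship type -> (source label, target label), following the list above.
# Assumption: the target of COMPANY is treated as a Stock node.
SCHEMA = {
    "POSSESS": ("Investor", "Stock"),
    "INTERESTED": ("Investor", "Category"),
    "INCLUDES": ("Category", "Industry"),
    "COMPANY": ("Industry", "Stock"),
}


@dataclass
class PropertyGraph:
    labels: dict = field(default_factory=dict)      # node id -> label
    properties: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: set = field(default_factory=set)         # (source, type, target)

    def add_node(self, node_id, label, **attributes):
        self.labels[node_id] = label
        self.properties[node_id] = attributes

    def add_edge(self, source, rel_type, target):
        # Reject edges whose endpoint labels do not match the schema.
        if (self.labels[source], self.labels[target]) != SCHEMA[rel_type]:
            raise ValueError(f"{rel_type} cannot link {source} to {target}")
        self.edges.add((source, rel_type, target))

    def targets(self, source, rel_type):
        # All nodes reachable from `source` over one edge of the given type.
        return {t for s, r, t in self.edges if s == source and r == rel_type}


# Hypothetical toy data.
graph = PropertyGraph()
graph.add_node("investor-1", "Investor", average_transaction=1000.0)
graph.add_node("stock-1", "Stock")
graph.add_node("category-1", "Category")
graph.add_edge("investor-1", "POSSESS", "stock-1")
graph.add_edge("investor-1", "INTERESTED", "category-1")
```

Keeping attributes on the nodes is what makes the graph "attributed" in the sense used by the embedding method.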
The recommendation system works in three consecutive
phases:
1. Candidate Selection, in which the system fetches
a subgraph related to the target user from the
database. This ensures that further processing is
done only on the most similar investors.
2. Scoring, in which similarity matrices for the nodes
of the fetched subgraph are created in order to
compute their embeddings and to find correlations
between them (this is where the consensus embedding
is actually implemented).
3. Re-ranking, in which, after computing the
recommendations, we add scores dependent on other
factors that are important but could not be
retrieved in the previous steps.
In the following subsections, we discuss the details of
the process.
3.1 Candidate Selection
The first phase of the process is finding the most
similar investors according to the structure of the
graph database. We reduce the number of considered
investors and stocks significantly with a single query
that restricts the search to a subgraph of related
nodes. Such a subgraph is shown in Figure 1. It is the
result of a parametrized query, where the parameter is
the target investor ID. We consider the most similar
investors to be those who have some companies as well
as some interests in common with the target investor.
We fetch only the necessary data, namely the categories
shared with the target investor and the stocks the
investors hold, which optimizes the performance of the
database. To simplify the whole process, graph data is
“mapped” into a set of records, which are sorted by the
number of common companies. We can fetch all investors
or limit the number of retrieved records to some
constant. This can potentially discard some valuable
data, but, taking further calculations into account, we
gain much more speed by sacrificing only a little data
(in Subsec. 3.2, we describe a part of the
recommendation engine algorithm in which we create
similarity matrices, the calculation of which is
$O(n^2)$).
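The selection logic of this phase can be sketched in a few lines. The sketch works on plain dictionaries instead of a graph query. The function `select_candidates`, the record fields and the toy data are hypothetical, and we assume that a candidate must share at least one stock and at least one category with the target investor.

```python
def select_candidates(target, holdings, interests, limit=None):
    """Return one record per similar investor, sorted by common stocks."""
    target_stocks = holdings[target]
    target_categories = interests[target]
    records = []
    for investor, stocks in holdings.items():
        if investor == target:
            continue
        common_stocks = stocks & target_stocks
        common_categories = interests.get(investor, set()) & target_categories
        # Keep only investors sharing both companies and interests.
        if common_stocks and common_categories:
            records.append({
                "investor": investor,
                "stocks": stocks,
                "common_categories": common_categories,
                "n_common": len(common_stocks),
            })
    # Most common companies first; ties broken by investor id.
    records.sort(key=lambda r: (-r["n_common"], r["investor"]))
    return records if limit is None else records[:limit]


# Hypothetical toy data.
holdings = {"t": {"A", "B", "C"}, "u1": {"A", "B"}, "u2": {"C"}, "u3": {"D"}}
interests = {"t": {"tech", "energy"}, "u1": {"tech"},
             "u2": {"energy", "retail"}, "u3": {"tech"}}
candidates = select_candidates("t", holdings, interests)
```

Passing `limit` bounds the size $n$ of the similarity matrices built in the next phase, which is where the trade-off between speed and discarded data arises.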
Figure 1: Subgraph selected in Candidate selection phase.
3.2 Scoring
After selecting a subset of the most similar investors,
we prepare the data to be used in further recommen-
dations, which are done according to the method pro-
posed in (Li et al., 2017). We take into account the
data that we have fetched in the previous step, that
is, common categories of the stocks that investors are
interested in and the companies they currently have.
We create similarity matrices for them. Since the data
we have are actually collections, we can calculate the
similarity of collections (sets) with the Jaccard
similarity metric. In our case, where stocks and
categories are represented as nodes, this is a natural
way to map them into matrices. Let us denote the
similarity matrices for categories and stocks as
$C_{n \times n}$ and $S_{n \times n}$. Let us also
denote $D_C$, $D_S$ and $L_C$, $L_S$ as the diagonal
and Laplacian matrices created from $C$ and $S$,
respectively. With $M_C$ denoting the similarity matrix
for categories, we calculate $D_C$ (and analogously
$D_S$) as
\[
D_C = \begin{bmatrix}
\sum_{i=1}^{n} M_C(1,i) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \sum_{i=1}^{n} M_C(n,i)
\end{bmatrix}
\]
and $L_C$ (and $L_S$) as
\[
L_C = D_C - M_C
\]
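A minimal sketch of these three matrices, assuming the standard definition of the Jaccard similarity of two sets, $|a \cap b| / |a \cup b|$. The function names and the toy portfolios are hypothetical, and the similarity of two empty sets is defined as 0 here, which is a choice of this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(collections):
    """Pairwise Jaccard similarities; this is the O(n^2) step."""
    n = len(collections)
    return [[jaccard(collections[i], collections[j]) for j in range(n)]
            for i in range(n)]


def degree_matrix(m):
    """Diagonal matrix whose i-th entry is the sum of row i of m."""
    n = len(m)
    return [[sum(m[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]


def laplacian(m):
    """Laplacian D - M of a similarity matrix M."""
    d = degree_matrix(m)
    n = len(m)
    return [[d[i][j] - m[i][j] for j in range(n)] for i in range(n)]


# Hypothetical stock sets of three investors.
stocks = [{"A", "B"}, {"A"}, {"C"}]
m_s = similarity_matrix(stocks)
l_s = laplacian(m_s)
```

Each row of the resulting Laplacian sums to zero, which is a quick sanity check for the construction.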
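Then, we normalize the Laplacian matrix, for which we
will solve the eigendecomposition problem:
\[
L_{Anorm} = \begin{bmatrix}
\frac{\sum_{i=1}^{n} A(1,i) - A(1,1)}{\sum_{i=1}^{n} A(1,i)} & \cdots & \frac{-A(1,n)}{\sum_{i=1}^{n} A(1,i)} \\
\vdots & \ddots & \vdots \\
\frac{-A(n,1)}{\sum_{i=1}^{n} A(n,i)} & \cdots & \frac{\sum_{i=1}^{n} A(n,i) - A(n,n)}{\sum_{i=1}^{n} A(n,i)}
\end{bmatrix}
\tag{1}
\]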
Then, we carry out the remaining calculations of the
Offline Model of DANE (Li et al., 2017), namely the
eigendecomposition problem mentioned earlier and the
intermediate embeddings, which we denote as $Y_C$ (for
categories) and $Y_S$ (for stocks):
\[
\begin{aligned}
L_{Anorm} v &= \lambda v \\
Y_C, Y_S &= [c_2, \ldots, c_k, c_{k+1}], [s_2, \ldots, s_k, s_{k+1}]
\end{aligned}
\tag{2}
\]
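The normalization and the eigendecomposition can be sketched with NumPy as follows. We assume that the similarity matrix is symmetric with positive row sums and that the vectors 2 to k+1 are those belonging to the smallest eigenvalues after the first one, as is usual in spectral embedding. The function names and the example matrix are hypothetical.

```python
import numpy as np


def normalized_laplacian(a):
    """Row-normalized Laplacian D^{-1}(D - A), entry by entry as in Eq. 1."""
    a = np.asarray(a, dtype=float)
    degrees = a.sum(axis=1)
    return (np.diag(degrees) - a) / degrees[:, None]


def intermediate_embedding(a, k):
    """Eigenpairs 2..k+1 (ascending eigenvalues) of the normalized Laplacian."""
    a = np.asarray(a, dtype=float)
    degrees = a.sum(axis=1)
    scale = 1.0 / np.sqrt(degrees)
    # The symmetric form D^{-1/2}(D - A)D^{-1/2} has the same eigenvalues,
    # so the symmetric solver can be used.
    symmetric = (np.diag(degrees) - a) * scale[:, None] * scale[None, :]
    eigenvalues, u = np.linalg.eigh(symmetric)
    vectors = u * scale[:, None]  # eigenvectors of D^{-1}(D - A)
    return eigenvalues[1:k + 1], vectors[:, 1:k + 1]


# Hypothetical 3 x 3 similarity matrix.
similarity = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.2],
                       [0.0, 0.2, 1.0]])
values, embedding = intermediate_embedding(similarity, k=2)
```

The first eigenvector is skipped because it belongs to the eigenvalue 0 and is constant across nodes, so it carries no information about similarity.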
The resulting $Y_C$ and $Y_S$ consist of the vectors
$[c_2, \ldots, c_{k+1}]$ and $[s_2, \ldots, s_{k+1}]$,
where $c_k$ and $s_k$ are vectors representing the
similarity between the target investor and the $k$-th
investor according to categories and companies,
respectively. As those vectors have been obtained with
eigendecomposition, they are eigenvectors of the
normalized Laplacian matrix. After the
eigendecomposition and obtaining the top-$k$
eigenvectors, we maximize the correlation between the
intermediate embeddings $Y_C$ and $Y_S$, and therefore
we solve the eigendecomposition problem for:
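\[
\begin{bmatrix} Y_C Y'_C & Y_C Y'_S \\ Y_S Y'_C & Y_S Y'_S \end{bmatrix}
\begin{bmatrix} p_C \\ p_S \end{bmatrix}
= \gamma
\begin{bmatrix} Y_C Y'_C & Y_C Y'_S \\ Y_S Y'_C & Y_S Y'_S \end{bmatrix}
\begin{bmatrix} p_C \\ p_S \end{bmatrix}
\tag{3}
\]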
We obtain the final result with
\[
Y = [Y_C, Y_S] \times P
\tag{4}
\]
where $P = [p_C; p_S]$ consists of the top $l$
eigenvectors from Eq. 3.
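A possible numerical treatment of the consensus step is sketched below with NumPy. It rests on two assumptions that go beyond the text. First, the blocks are taken as the $k \times k$ products $Y'_C Y_C$, $Y'_C Y_S$, $Y'_S Y_C$ and $Y'_S Y_S$, so that the dimensions of $P$ match Eq. 4. Second, the right-hand matrix keeps only the two diagonal blocks, as is usual in correlation-maximization problems, since identical matrices on both sides would make the problem trivial. The function name, the `ridge` parameter and the random inputs are hypothetical.

```python
import numpy as np


def consensus_embedding(y_c, y_s, l, ridge=1e-9):
    """Solve full p = gamma * blocks p and project [Y_C, Y_S] onto the top l p."""
    k_c = y_c.shape[1]
    y = np.hstack([y_c, y_s])            # n x (k_c + k_s)
    full = y.T @ y                        # all four blocks
    blocks = np.zeros_like(full)          # diagonal blocks only (assumption)
    blocks[:k_c, :k_c] = full[:k_c, :k_c]
    blocks[k_c:, k_c:] = full[k_c:, k_c:]
    blocks += ridge * np.eye(full.shape[0])  # keeps the Cholesky factor defined
    # With blocks = R R', the problem becomes symmetric in w = R' p.
    r_inv = np.linalg.inv(np.linalg.cholesky(blocks))
    gammas, w = np.linalg.eigh(r_inv @ full @ r_inv.T)
    top = np.argsort(gammas)[::-1][:l]    # largest gamma first
    p = r_inv.T @ w[:, top]
    return y @ p, p, gammas[top]


# Hypothetical intermediate embeddings of six investors.
rng = np.random.default_rng(0)
y_c = rng.normal(size=(6, 2))
y_s = rng.normal(size=(6, 2))
y, p, gammas = consensus_embedding(y_c, y_s, l=2)
```

The Cholesky reduction avoids a dedicated generalized eigensolver, so the sketch needs nothing beyond NumPy.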
Once we have calculated the most similar investors,
we can check which companies could potentially be
most interesting for the target investor. Let us denote
the similarity of investor $i$ as $s_i$ and the set of
stocks of investor $i$ as
$I_i = \{s_1, s_2, \ldots, s_l\}$. Then, for a stock $s$:
\[
r_{investor} = \sum_{i=1}^{n}
\begin{cases}
0 & s \notin I_i \\
s_i & s \in I_i
\end{cases}
\tag{5}
\]
where $r_{investor}$ is the similarity result for a
stock based on the investors’ similarity.
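Eq. 5 amounts to summing, for every stock, the similarities of the investors who hold it. A minimal sketch follows. The function name and the toy values are hypothetical.

```python
def investor_scores(similarities, portfolios):
    """For each stock, sum the similarity of every investor holding it (Eq. 5)."""
    scores = {}
    for investor, similarity in similarities.items():
        for stock in portfolios.get(investor, set()):
            scores[stock] = scores.get(stock, 0.0) + similarity
    return scores


# Hypothetical similarities and portfolios of two similar investors.
similarities = {"u1": 0.5, "u2": 0.25}
portfolios = {"u1": {"A", "B"}, "u2": {"B", "C"}}
scores = investor_scores(similarities, portfolios)
```

A stock that is held by no similar investor never enters the dictionary, which corresponds to the zero branch of Eq. 5.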
3.3 Re-ranking
After calculating the most similar stocks according
to the similarity of investors ($r_{investors}$), we
re-rank those stocks according to the price forecasts.
The price forecasts are calculated in another module,
in which we predict the price movements with technical
analysis. Let us denote this as $r_{forecast}$. We
calculate $r_{forecast}$ based on several technical
analysis algorithms applied to the current stock
movement, such as detection of price
support/resistance, RSI, and OBV. Such a calculation
allows us to estimate (with some probability) the
future movement of a specified stock price. We want to
recommend stocks that can grow in the near future, and
advise against those whose price can fall. Therefore,
the final recommendation is calculated as:
\[
r_{stock} = r_{investors} + r_{forecast}
\tag{6}
\]
In this way, the target user can get recommendations
that are not necessarily limited to their area of
interest, and can see more stocks that can lead to
profits.
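The re-ranking step can be illustrated as follows. The RSI and OBV functions follow the common textbook definitions (RSI with simple, unsmoothed averages). The rule in `forecast_score` that turns both indicators into a single score is purely hypothetical, as are the thresholds 30 and 70 and all example values. The text does not specify how $r_{forecast}$ is derived from the indicators.

```python
def rsi(closes, period=14):
    """Relative Strength Index from simple averages over the last `period` changes."""
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:  # no losing day in the window
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)


def obv(closes, volumes):
    """On-Balance Volume: add volume on up days, subtract it on down days."""
    total = 0
    for previous, current, volume in zip(closes, closes[1:], volumes[1:]):
        if current > previous:
            total += volume
        elif current < previous:
            total -= volume
    return total


def forecast_score(closes, volumes, period=14):
    """Hypothetical rule: one vote from RSI, one from the sign of OBV."""
    value = rsi(closes, period)
    rsi_vote = 1 if value < 30 else -1 if value > 70 else 0
    flow = obv(closes, volumes)
    obv_vote = (flow > 0) - (flow < 0)
    return (rsi_vote + obv_vote) / 2


def rerank(r_investors, r_forecast):
    """Eq. 6: add both scores and sort stocks by the sum, best first."""
    combined = {s: r + r_forecast.get(s, 0.0) for s, r in r_investors.items()}
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical price and volume series.
closes = [10, 11, 12, 11, 12]
volumes = [100, 200, 150, 50, 300]
ranking = rerank({"A": 0.5, "B": 0.75}, {"A": 1.0, "B": -1.0})
```

Because Eq. 6 is a plain sum, both scores should be on comparable scales. How they are scaled is left open here.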
3.4 Update
In (Li et al., 2017), only the eigenvalues and the
corresponding eigenvectors are updated, which increases
the efficiency of the solution. This is possible when
we take the whole knowledge graph into account: we are
then guaranteed that the indexes of the columns of the
intermediate and consensus embeddings will not change.
However, with such an approach, the first step would be
very slow, as the creation of the similarity matrices
and the eigendecomposition are computationally
expensive.
As we mentioned in Sect. 3.1, we fetch a subgraph. If
we assume that the relations between nodes change in
time, we can get a different set of investors, in a
different order (a record mentioned in Sect. 3.1 refers
to one investor’s data), which implies that the
similarity matrices can have different sizes and that
the calculated properties can be related to different
nodes in consecutive iterations. Therefore, we cannot
make any updates as proposed in (Li et al., 2017). In
this setting, we have to repeat the whole
eigendecomposition procedure in each iteration;
however, it is still a fast computation.
3.5 Results and Discussion
We have evaluated our proposal on the data scraped
from the Investor Hunt and Nasdaq services: in total,
1368 investors with their business category interests,
amount of investments and average transaction value,
and 545 companies. As an evaluation criterion, we
state that a recommended stock has to occur at least
6 times among the top 40% most similar investors.
Lowering the relevancy criterion would decrease this
accordingly. We compared our results to the ones
obtained with alternative embedding algorithms, such as
DeepWalk (Perozzi et al., 2014) and LINE (Tang et al.,
2015). The results are shown in Tables 1 and 2. They
could be better if we considered the whole graph in the
calculations instead of a subgraph, but this would also
affect the response speed.
Table 1: Comparison of precision@k.
Method      Top@10    Top@25    Top@50
DANE        63.33%    52.67%    31.67%
DeepWalk    25%       24%       21%
LINE        30%       32%       19%
Table 2: Comparison of recall@k.
Method      Top@10    Top@25    Top@50
DANE        63.33%    85.47%    97.9%
DeepWalk    25%       27.67%    49.17%
LINE        30%       58.88%    71.67%
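For reference, the two metrics and the relevancy criterion can be sketched as below. We assume the standard definitions, precision@k as hits divided by k and recall@k as hits divided by the number of relevant stocks. The function names, the truncating rounding of the 40% cutoff and the toy data are hypothetical and do not reproduce the evaluation code behind the tables.

```python
from collections import Counter


def relevant_stocks(ranked_portfolios, top_fraction=0.4, min_count=6):
    """Stocks held at least `min_count` times by the top fraction of investors."""
    cutoff = max(1, int(len(ranked_portfolios) * top_fraction))
    counts = Counter(stock
                     for portfolio in ranked_portfolios[:cutoff]
                     for stock in portfolio)
    return {stock for stock, count in counts.items() if count >= min_count}


def precision_recall_at_k(recommended, relevant, k):
    """precision@k = hits / k, recall@k = hits / number of relevant stocks."""
    hits = sum(1 for stock in recommended[:k] if stock in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical portfolios, ordered from the most to the least similar investor.
ranked = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]
relevant = relevant_stocks(ranked, top_fraction=0.5, min_count=2)
precision, recall = precision_recall_at_k(["A", "B", "C", "D"], {"A", "C", "E"}, k=2)
```

With a fixed set of relevant stocks, recall can only grow with k, while precision typically falls, which matches the trend of the DANE rows in Tables 1 and 2.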
4 CONCLUSION
In this paper, we addressed the challenge of real-time
recommendation for the stock market. We discussed a
new system for investment recommendation based on
attributed network embeddings and technical analy-
sis of stocks. Our recommendation engine provides
fast and robust computation of recommendations for
the investment decisions, based on joint analysis of
similarity of investors and stock predictions. The re-
sults obtained so far are promising and motivating for
further exploration and a broader evaluation of the ap-
proach, covering also the technical analysis.
REFERENCES
Bouraga, S., Jureta, I., Faulkner, S., and Herssens, C.
(2014). Knowledge-based recommendation systems:
A survey. International Journal of Intelligent Infor-
mation Technologies (IJIIT), 10(2):1–19.
Guo, Q., Zhuang, F., Qin, C., Zhu, H., Xie, X., Xiong, H.,
and He, Q. (2020). A survey on knowledge graph-
based recommender systems. IEEE Transactions on
Knowledge and Data Engineering.
Hernández-Nieves, E., del Canto, Á. B., Chamoso-Santos,
P., de la Prieta-Pintado, F., and Corchado-Rodríguez,
J. M. (2020). A machine learning platform for stock
investment recommendation systems. In International
Symposium on Distributed Computing and Artificial
Intelligence, pages 303–313. Springer.
Li, J., Dani, H., Hu, X., Tang, J., Chang, Y., and Liu, H.
(2017). Attributed network embedding for learning in
a dynamic environment. In Proceedings of the 2017
ACM on Conference on Information and Knowledge
Management, pages 387–396.
Liu, J. and Duan, L. (2021). A survey on knowledge
graph-based recommender systems. In 2021 IEEE
5th Advanced Information Technology, Electronic and
Automation Control Conference (IAEAC), volume 5,
pages 2450–2453.
Perozzi, B., Al-Rfou, R., and Skiena, S. (2014). Deepwalk:
Online learning of social representations. In Proceed-
ings of the 20th ACM SIGKDD international confer-
ence on Knowledge discovery and data mining, pages
701–710.
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., and Mei,
Q. (2015). Line: Large-scale information network em-
bedding. In Proceedings of the 24th international con-
ference on world wide web, pages 1067–1077.