Automatic Construction of Benchmarks for RDF Keyword Search Systems Evaluation

Angelo Batista Neves¹ (https://orcid.org/0000-0001-8043-1510), Luiz André P. Paes Leme² (https://orcid.org/0000-0001-6014-7256), Yenier Torres Izquierdo¹ (https://orcid.org/0000-0003-0971-8572) and Marco Antonio Casanova¹ (https://orcid.org/0000-0003-0765-9636)

¹ Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
² Universidade Federal Fluminense, Niterói, RJ, Brazil
Keywords: Benchmark, Keyword Search, Resource Description Framework (RDF), Offline Computation.
Abstract:
Keyword search systems provide users with a friendly alternative to access Resource Description Framework
(RDF) datasets. The evaluation of such systems requires adequate benchmarks, consisting of RDF datasets and
keyword queries, with their correct answers. However, the sets of correct answers such benchmarks provide
for each query are often incomplete, mostly because they are manually built with experts’ help. The central
contribution of this paper is an offline method that helps build RDF keyword search benchmarks automatically,
leading to more complete sets of correct answers, called solution generators. The paper focuses on computing
sets of generators and describes heuristics that circumvent the combinatorial nature of the problem. The
paper then describes five benchmarks, constructed with the proposed method and based on three real datasets,
DBpedia, IMDb, and Mondial, and two synthetic datasets, LUBM and BSBM. Finally, the paper compares the
constructed benchmarks with keyword search benchmarks published in the literature.
1 INTRODUCTION
Keyword search is a very popular information discov-
ery method because it allows naive users to retrieve
information without any knowledge about schema de-
tails or query languages. The user specifies a few
terms, called keywords, and it is up to the system to
retrieve the documents, such as Web pages, that best
match the keywords. Recently, keyword search appli-
cations designed for Resource Description Framework
(RDF) datasets (or RDF graphs) have emerged.
Tools for keyword search over RDF datasets, or
RDF-KwS tools, have three main tasks: (1) retrieve
nodes in the RDF graph that the keywords specify; (2)
discover how they are interrelated to compose com-
plete answers; and (3) rank these answers (Menendez
et al., 2019). Hence, answers for keyword searches
over RDF datasets are not just sets of nodes but sets of
nodes and paths between them, i.e., subgraphs of the
dataset.
RDF-KwS tools are evaluated using benchmarks
with sets of information needs and their respective
lists of correct answers, possibly ordered, for a given
dataset. Despite the many existing benchmarks for structured data (Coffman and Weaver, 2010; Dosso and Silvello, 2020; Dubey et al., 2019; Trivedi et al., 2017), these benchmarks have at least three limitations
when it comes to RDF-KwS: (1) they are frequently
built for relational data; (2) they are incomplete in the
sense that they do not cover many reasonable answers;
and (3) they are not always publicly available.
To remedy the first limitation, some authors (García et al., 2017) adapted benchmarks originally
developed for relational databases. However, the adap-
tation requires the triplification of relational databases
and benchmark data, leading to problems while com-
paring different systems using different triplifications.
As an example of the incompleteness of existing
benchmarks, consider the keyword query “Mauritius
India”, which is Query 43 for the Mondial dataset in
Coffman’s benchmark. The list of correct answers in
the benchmark covers just the organizations that both
countries belong to. However, there are other answers,
which express the geographical relationship between
Mauritius Islands and India, that should have been
included in the list of correct answers. Incompleteness
in this sense is a serious problem that is difficult to
overcome in manually constructed benchmarks.
This paper overcomes these limitations by addressing the general problem of constructing RDF keyword search benchmarks in a holistic approach, i.e., from the definition of keyword queries to the computation of correct answers. This is a challenging problem for two fundamental and interrelated reasons: (1) RDF-KwS tools vary widely and compute different, albeit correct, answers for the same keyword query; (2) computing the set of all correct answers for a given keyword query over an RDF dataset leads to an explosive combinatorial problem.
In more detail, this paper proposes a method that,
given an RDF dataset, automatically defines keyword
queries and their respective correct answers. The
method is designed as an offline process to benefit
from less stringent response time constraints. It deals
with the combinatorial nature of the problem by adopt-
ing heuristics based on the concepts of seeds and solu-
tion generators, which are additional contributions of
the paper.
The rest of this paper is organized as follows. Sec-
tion 2 covers related work. Section 3 contains the
required definitions. Section 4 overviews the proposed
method. Section 5 describes the automatic generation
of keyword queries. Section 6 details the generation of
answers. Section 7 evaluates the proposed benchmark
generation method. Finally, Section 8 contains the
conclusions and suggestions for future work.
2 RELATED WORK
A crucial aspect of keyword search systems is their
evaluation. In recent years, the research community
concentrated on evaluating keyword search systems
over relational databases (Bast et al., 2016) and entity
retrieval (Balog and Neumayer, 2013). Examples of relational benchmarks are Coffman's benchmark (Coffman and Weaver, 2010), which uses samples of Mondial, IMDb, and Wikipedia, and Oliveira's benchmark (Oliveira Filho, 2018), which uses Mondial, IMDb, DBLP, and Northwind.
Unfortunately, benchmarks to assess RDF keyword
search systems are scarce (Dosso and Silvello, 2020).
To remedy this situation, some authors (García et al., 2017) adapted relational benchmarks to RDF. However, this approach depends on the triplification of
relational databases and does not easily induce com-
plete sets of correct query answers (Izquierdo et al.,
2018).
In fact, state-of-the-art RDF keyword search sys-
tems use different benchmarks, which are not always
available, as shown in Table 1. For example, Dosso
and Silvello (2020) described openly available bench-
marks over three real datasets, LinkedMDB, IMDb,
and a subset of DBpedia (Balog and Neumayer, 2013),
and two synthetic databases, the Lehigh University
Benchmark (LUBM) (Guo et al., 2005) and the Berlin
SPARQL Benchmark (BSBM) (Bizer and Schultz,
2009). For IMDb, they defined 50 keyword queries and their correct translations to SPARQL queries. For DBpedia, the authors considered 50 topics from the classes QALD2_te and QALD2_tr of the Question Answering over Linked Data (QALD) campaigns (http://qald.aksw.org). For the synthetic databases, they used 14 SPARQL queries for LUBM and 13 SPARQL queries for BSBM. Dosso and Silvello mapped all original SELECT queries from these datasets to SPARQL CONSTRUCT queries and produced their equivalent keyword queries.
3 DEFINITIONS
Recall that an RDF dataset is a set $T$ of RDF triples $(s, p, o)$. Furthermore, recall that $T$ is equivalent to an edge-labeled directed graph $G_T$ such that the set of nodes of $G_T$ is the set of RDF terms that occur as subject or object of the triples in $T$, and there is an edge $(s, o)$ in $G_T$, labeled with $p$, iff the triple $(s, p, o)$ occurs in $T$. Figure 1.a shows an RDF graph.
The resource graph induced by a subset $T' \subseteq T$ is the subgraph $RG_{T'}$ of $G_{T'}$ obtained by dropping all literal nodes from $G_{T'}$. The nodes in $RG_{T'}$ are the resources of $T'$. The entity graph induced by a subset $T' \subseteq T$ is the subgraph $EG_{T'}$ of $RG_{T'}$ obtained by dropping all class and property nodes from $RG_{T'}$. The nodes in $EG_{T'}$ are the entities of $T'$.
A set of triples $T' \subseteq T$ induces a path $\pi = (s_0, p_0, s_1, p_1, s_2, \ldots, s_n, p_n, s_{n+1})$ in $G_T$ iff, for each $i \in [0, n]$, either $(s_i, p_i, s_{i+1}) \in T'$ or $(s_{i+1}, p_i, s_i) \in T'$. We also say that $(s_i, p_i, s_{i+1})$ (or $(s_{i+1}, p_i, s_i)$) occurs in $\pi$. Note that we assume that paths in $G_T$ may traverse arcs in both directions. A path $\pi = (s_0, p_0, s_1, p_1, s_2, \ldots, s_n, p_n, s_{n+1})$ begins (resp. ends) on a resource $r$ iff $r = s_0$ or $r = p_0$ (resp. $r = s_{n+1}$ or $r = p_n$).
A keyword query, which represents an information need, is a set $K$ of literals.

A triple $(s, p, o) \in T$ is a matching triple for $K$ iff $o$ is a string literal that matches at least one keyword in $K$. We also say that $s$ is a matching resource for $K$, the set $M_s$ of all matching triples in $T$ for $K$ whose subject is $s$ is the set of matching triples of $s$ in $T$, and the graph induced by $M_s$ is the matching subgraph of $s$ in $G_T$. Note that $s$ may occur as the subject, property, or object of a triple in $T$, but $s$ is always the subject of a matching triple.
Table 1: Summary of the benchmarks used in some state-of-the-art keyword search systems.

Tool | Ref. | Year | Description of Benchmark Used
SPARK | (Zhou et al., 2007)* | 2007 | Database and keyword queries from the Mooney Natural Language Learning Data.
QUICK | (Zenz et al., 2009)* | 2009 | An initial set of queries was extracted from a query log of the AOL search engine and pruned based on the visited URLs, obtaining 3,000 sample keyword queries for IMDb and Lyrics Web pages. This process yielded 100 queries for IMDb and 75 queries for Lyrics, consisting of 2–5 keywords.
- | (Tran et al., 2009)* | 2009 | DBLP, TAP (http://tap.stanford.edu) and LUBM; 30 queries for DBLP and 9 for TAP.
- | (Coffman and Weaver, 2010)† | 2010 | Samples of the Mondial, IMDb, and Wikipedia datasets; 50 queries for each dataset (not real user queries extracted from a search engine log).
- | (Elbassuoni and Blanco, 2011)* | 2011 | Datasets derived from the LibraryThing community and IMDb; 15 queries for each dataset.
- | (Le et al., 2014)* | 2014 | LUBM, Wordnet, BSBM, Barton, and DBpedia Infobox; 12 queries: 4 for LUBM, 2 for Wordnet, 2 for BSBM, 2 for Barton, and 2 for DBpedia Infobox.
- | (Zheng et al., 2016) | 2016 | DBpedia and Yago; queries derived from QALD-4.
- | (Han et al., 2017) | 2017 | DBpedia + QALD-6 and Freebase* + Free917, an open QA benchmark consisting of natural language question-answer pairs over Freebase.
QUIOW | (Izquierdo et al., 2018) | 2018 | Full versions of the Mondial and IMDb datasets, and queries from Coffman's benchmark.
- | (Lin et al., 2018) | 2018 | LUBM, Wordnet, BSBM, Barton, and DBpedia Infobox; 4 queries for LUBM and 10 queries for the other datasets.
KAT | (Wen et al., 2018) | 2018 | YAGO, DBLP, and LUBM; 9 queries for YAGO, 3 for DBLP, and 6 for LUBM.
- | (Rihany et al., 2018) | 2018 | AIFB and DBpedia; 10 queries for each dataset (query sizes between 2 and 8 keywords).
QUIRA | (Menendez et al., 2019) | 2019 | Full versions of IMDb and MusicBrainz; 50 queries from Coffman's benchmark for IMDb and 25 queries from QALD-2 for MusicBrainz. Details available at https://sites.google.com/view/quira/
TSA+BM25 and TSA+VDP | (Dosso and Silvello, 2020) | 2020 | Real datasets: LinkedMDB, IMDb, and a subset of DBpedia; 50 queries of Coffman's benchmark for each dataset. Synthetic datasets: LUBM and BSBM, with 14 and 13 queries, respectively.

* Datasets have no public link or are not available for download.
† Benchmark for evaluating keyword search systems over relational databases.
A set of triples $A \subseteq T$ is an answer for $K$ over $T$ iff $A$ can be partitioned into two subsets $A'$ and $A''$ such that: (i) $A'$ is the set of all matching triples for $K$ in $A$; let $R_A$ be the set of subjects of such triples; (ii) $RG_{A''}$, the resource graph induced by $A''$, is connected and contains all resources in $R_A$. Condition (i) captures keyword matches and Condition (ii) indicates that an answer must connect the matching resources by paths in $RG_{A''}$.

We also say that $R_A$ is the set of matching resources of $A$ and that $RG_{A''}$ is the connectivity graph of $A$.
Figure 1 shows an RDF graph and three answers for the keyword query $MS$ = {“character”, “meryl”, “streep”, “movie”, “out”, “africa”}.
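To make the two conditions concrete, the sketch below tests them directly. It is an illustration only, assuming networkx is available, that matches(literal, keyword) is a caller-supplied string-matching predicate, and that literals are represented as plain strings while resources are any other hashable identifiers.

import networkx as nx

def is_answer(A, K, matches):
    """Check whether a set of triples A is an answer for keyword query K.
    Sketch assumptions: literals are plain str, resources are any other
    hashable ids, and matches(literal, keyword) is caller-supplied."""
    # Condition (i): A' = all matching triples in A; R_A = their subjects.
    A1 = {(s, p, o) for (s, p, o) in A
          if isinstance(o, str) and any(matches(o, k) for k in K)}
    R_A = {s for (s, _, _) in A1}
    if not R_A:
        return False
    # Condition (ii): the resource graph induced by A'' = A \ A'
    # (literal nodes dropped) must be connected and contain all of R_A.
    G = nx.Graph()
    G.add_nodes_from(R_A)
    for (s, p, o) in A - A1:
        if isinstance(o, str):
            G.add_node(s)            # literal node dropped, subject kept
        else:
            G.add_edge(s, o, label=p)
    return nx.is_connected(G)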
This definition of answer is less stringent than those introduced in (Bhalotia et al., 2002; Hristidis and Papakonstantinou, 2002; Kimelfeld and Sagiv, 2008) since it neither requires all keywords to be matched nor includes a minimality criterion. For later reference, we define that an answer $A$ for $K$ over $T$ is minimal iff there is no proper subset $B$ of $A$ such that $B$ is an answer for $K$ and $B$ and $A$ have the same set of matching resources.
Finally, we can informally state the RDF KwS-Problem as: “Given an RDF dataset $T$ and a keyword query $K$, find an answer $A$ for $K$ over $T$, preferably with as many keyword matches and as few triples as possible”.
4 OVERVIEW OF THE METHOD
This section outlines the proposed method for the au-
tomatic construction of benchmarks for RDF datasets.
Given an RDF dataset $T$, the key problems are how to automatically generate sets of keyword queries over $T$ and how to compute ranked lists of correct answers for these queries.
(a) The induced graph of an RDF dataset $T$. (b) Answer $A_1$. (c) Answer $A_2$. (d) Answer $A_3$.
Figure 1: Example of an RDF graph and of three answers for the information need “character of Meryl Streep in the movie Out of Africa”.
Let $T$ be a dataset and $EG_T$ be the entity graph induced by $T$.
The method starts by computing keyword queries, that is, sets of keywords, based on a set $I$ of inducing entities or inducers. First, it selects a set of inducers $I$ (see Section 5 for examples) and computes a neighborhood graph $G_N$ from $EG_T$, for each $i \in I$. The neighborhood graph is composed of the nodes and edges visited by a breadth-first walk of distance $d$ through the paths in $EG_T$, starting from $i$, where $d$ is a user-defined parameter. After that, the method enriches $G_N$ with the nodes and edges of $G_T$ that denote the classes of the entities in $N$. The enriched graph is denoted $G_{N'}$.
Then, it extracts from $T$ keyword queries (again, sets of keywords) from the string literal values of datatype properties of resources and edges in $N'$. Note that edges in $N'$ can be resources in $T$ with datatype properties. Since, by definition, answers are connected graphs, the neighborhood graph is an appropriate strategy to derive keywords.
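A minimal sketch of this step, assuming the entity graph is held in a networkx DiGraph and that a hypothetical class_of(n) accessor returns the rdf:type classes of an entity:

import networkx as nx

def neighborhood_graph(EG_T, inducer, d, class_of):
    """Breadth-first walk of distance d from the inducer over the entity
    graph EG_T (a networkx DiGraph), enriched with class nodes (G_N')."""
    # Nodes within d hops of the inducer; edges are traversed in both
    # directions, hence the undirected view.
    reach = nx.single_source_shortest_path_length(
        EG_T.to_undirected(as_view=True), inducer, cutoff=d)
    G_N = EG_T.subgraph(reach.keys()).copy()
    # Enrichment: attach the class of each visited entity.
    for n in list(G_N.nodes):
        for c in class_of(n):        # assumed rdf:type lookup
            G_N.add_edge(n, c, label="rdf:type")
    return G_N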
Let $K$ be a keyword query thus generated. Note that, although $K$ has been extracted for a particular inducer $i \in I$, $K$ may also be present in other subgraphs of $T$ that have no relationship with $i$ but that, according to the definition, can also be considered correct answers for $K$. The only purpose of the inducers is thus to derive keyword queries that are connected in at least $G_{N'}$. This approach allows one to compute as many keyword queries as needed to create a benchmark.
The next step is to compute possible answers for $K$. The method computes the complete set of matching resources for $K$, called the set of seeds for $K$ and denoted $S_K$. By the previous definitions, the set of matching resources of an answer $A$ for $K$ must be a subset of $S_K$. Preferably, answers should contain subsets of $S_K$ that match all keywords in $K$.
Consider the following example. Let “character of Meryl Streep in the movie Out of Africa” be an information need over the dataset $T$ of Figure 1.a. This information need can be translated to a keyword query $MS$ = {“character”, “meryl”, “streep”, “movie”, “out”, “africa”}. The set of seeds for $MS$ is $S_{MS} = \{A, B, D, F, H\}$. Figures 1.b–d show the graphs induced by three possible answers for $MS$ containing subsets of $S_{MS}$.
(a) The solution generator for $S_K = \{A, B, D, F, H\}$ from the dataset in Fig. 1a. (b) The solution generator for $S_K = \{A, B, F, H\}$ from the dataset in Fig. 1a.
Figure 2: Examples of solution generators.
Intuitively, answer $A_1$ (Figure 1.b) is better than $A_2$ (Figure 1.c) and $A_3$ (Figure 1.d) because $A_1$ addresses what seems to be the query intention, which is to find the character played by the actress in the movie. Also, $A_1$ matches 6 of the 6 keywords in $MS$, while $A_2$ and $A_3$ match only 5 and 3 keywords, respectively. Note that, in $A_1$ and $A_2$, but not in $A_3$, no keyword is matched by more than one literal. Also note that the keywords “movie” and “character” are associated with elements of the schema, nodes B and H.

This example illustrates two important characteristics of keyword queries and correct answers: keyword queries name existing resources and answers correlate these resources. In this example, “Meryl Streep”, “Out of Africa”, “character”, and “movie” are informal resource identifiers that represent an actress, a movie, the Character class, and the Movie class, respectively.
Let $R \subseteq S_K$ be a subset of seeds. Assume that all seeds in $R$ belong to the same connected component of $RG_T$, the resource graph induced by $T$. One can compute all possible answers whose matching resources are subsets of $R$ by computing all acyclic paths in $RG_T$ between pairs of distinct nodes in $R$, combining them in all distinct ways to construct connected graphs containing all nodes in $R$, and adding the matching subgraphs for the seeds $r \in R$.
Consider the set of seeds $R'_{MS} = \{A, B, F, H\}$, for example. If one selects the paths (B–A–G–F) and (G–H) and adds the matching subgraphs (A–Out of Africa), (B–Movie), (F–Meryl Streep), and (H–Character), one will obtain the answer in Figure 1.b. If one selects the path (B–A–F–G–H) and adds the same matching subgraphs, one will obtain another valid answer (not shown in the figure).
Consider now the set of seeds $R''_{MS} = \{A, B, F\}$. If one selects the path (B–A–F) and adds the matching subgraphs (A–Out of Africa), (B–Movie), and (F–Meryl Streep), one will obtain the answer in Figure 1.c. If one selects a different path, (B–A–G–F), one will obtain yet another answer. In general, distinct path combinations may lead to distinct valid answers for the same set of seeds.
More precisely, let $K$ again be a keyword query and $R \subseteq S_K$. Assume that all seeds in $R$ belong to the same connected component of $RG_T$. A set of triples $SG \subseteq T$ is a solution generator for $R$ iff $SG$ can be partitioned into two sets, $SG'$ and $SG''$, such that: (i) $SG'$ is the set of all matching triples of the resources in $R$; (ii) $SG''$ is the set of all triples that occur in paths in $RG_T$ that begin and end on the seeds in $R$. We say that $SG$ expresses an answer $A$ iff $A \subseteq SG$ and the set of matching resources of $A$ is $R$.
Figure 2 shows the solution generators for $R_{MS} = \{A, B, D, F, H\}$ and $R'_{MS} = \{A, B, F, H\}$. Note that, even though both $R_{MS}$ and $R'_{MS}$ cover all keywords, they are distinct and express distinct sets of answers.
A synthetic benchmark is then a triple $s = (T, Q_s, A_s)$ such that $T$ is an RDF dataset, $Q_s$ is a list of keyword queries, and $A_s$ contains, for each query in $Q_s$, a list of solution generators. Section 7 illustrates this concept.
However, computing all solution generators can be expensive, depending on the cardinality of $S_K$ and the number of paths between the seeds. We then propose to compute solution generators that capture only the most relevant answers. Fortunately, and unlike traditional Information Retrieval (IR) systems, the RDF KwS-Problem has some peculiarities that can be exploited to define optimization heuristics that reduce the computational cost and also help rank solution generators.
The differences between traditional IR systems and
RDF-KwS systems stem from the fact that keywords
which co-occur in a document may not convey the idea
of keyword correlation. By contrast, an RDF subgraph
connecting resources linked to these keywords is much
more likely to be related to the intended meaning of the
keyword query because resources are not connected
by chance in an RDF graph, as it may be the case in a
text document.
Given these differences, one can define some characteristics of good answers: the keywords must be connected as closely as possible, and answers must contain as many keywords as possible. Other features can be defined based on the number of resources in an answer and the co-occurrence of keywords among the literals. These features may guide the automatic computation of answers by helping prune the search space.
Section 6 discusses four heuristics to work around
the complexity of the problem of computing solution
generators, guided by five questions: (1) Are all seeds
relevant?; (2) Are all paths between seeds relevant?;
(3) Should we prefer answers that match more literals
or answers that match fewer literals?; (4) Should we
prefer answers in which literals in the keyword query
occur in many seeds, or answers in which literals occur
in only one seed?; (5) Should we prefer answers with
many seeds or answers with fewer seeds?
5 GENERATING KEYWORD QUERIES
This section describes, with the help of examples, how
to automatically generate sets of keyword queries for
a given RDF dataset T .
An inducer function for $T$ is a function that maps $T$ into a set of resources of $T$, called inducers. Such a function should follow requirements that are consistent with the benchmark's purpose, i.e., it should select entities from the information domain in question and with appropriate relevance scores.
The relevance scores typically express users' preferences and can be used to select entities and, consequently, induce relevant sets of keywords and their respective answers. They can also be used to challenge RDF-KwS systems by selecting less relevant resources and causing the opposite effect. For example, let $T$ be the Mondial RDF dataset (http://www.dbis.informatik.uni-goettingen.de/Mondial/). An inducer function for $T$ could select the top-k and bottom-k countries according to their infoRank score, a metric defined in (Menendez et al., 2019) that reflects the relevance of the resources.
The next step is to compute the neighborhood graph for each inducer. Figure 3.a shows a fragment of the neighborhood graph $G'_{India}$ for the country India, which has the 8th largest infoRank score among countries in the Mondial dataset. The grayed nodes are the classes of the entities, and the symbol “...” close to the edges indicates that the corresponding property is multivalued. For the sake of conciseness, some paths of length 2 starting from India were omitted.
The 43rd query of Coffman's benchmark for Mondial, $K_{Coff-43}$ = {mauritius, india}, can be created by extracting the keywords from the datatype property name of the entities India and Mauritius in $G'_{India}$. Figure 3.b shows the ground truth for $K_{Coff-43}$ in the aforementioned benchmark.
The query generation process then consists in creating sets of keywords from $G'_{India}$ and $T$. Let $G''_{India}$ be the maximal subgraph of $T$ containing all resources and properties in $G'_{India}$. Note that $G''_{India}$ is equal to $G'_{India}$ with additional property and literal nodes.
Let TF-IDF(k) be the term frequency-inverse document frequency score for each non-stop-word in the labels of literal nodes in $G''_{India}$. One can then define sets of rules for generating keywords, such as: (1) extract the top-k words with the largest TF-IDF from literals of class resources; (2) extract the top-k words with the largest TF-IDF from literals of property resources; (3) extract the top-k words with the largest TF-IDF from literals of entity resources; (4) extract the top-k words with the largest TF-IDF from literals of mixed resources. Note that TF-IDF can be replaced by any other convenient score, such as BM25, that the set of rules must be defined ad hoc, and that the TF-IDF relevance score can also be used, like infoRank, to challenge RDF-KwS systems by allowing one to select less relevant words.
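As an illustration of these rules, the sketch below computes the top-k TF-IDF words from the literals of one resource; the tokenizer, the toy stop list, and the particular TF-IDF variant are simplifying assumptions, not the paper's exact configuration.

import math
import re
from collections import Counter

STOP = {"the", "of", "and", "in", "a", "an", "is"}   # toy stop list

def top_k_tfidf(literals, corpus, k=3):
    """Top-k non-stop-words by TF-IDF among one resource's literals.
    `corpus` is a list of literal lists, one per resource in G''."""
    tokenize = lambda s: [w for w in re.findall(r"[a-z]+", s.lower())
                          if w not in STOP]
    # Term frequency within this resource's literals.
    tf = Counter(w for lit in literals for w in tokenize(lit))
    # Document frequency across all resources.
    df = Counter()
    for doc in corpus:
        df.update({w for lit in doc for w in tokenize(lit)})
    score = {w: tf[w] * math.log(len(corpus) / (1 + df[w])) for w in tf}
    return sorted(score, key=score.get, reverse=True)[:k]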
6 COMPUTING SOLUTION GENERATORS
This section addresses the computation of solution
generators for keyword queries, through four heuristics
to circumvent the complexity of the problem.
The first heuristic refers to the selection of the most relevant seeds. Assume that Lucene for Apache Jena Fuseki is the adopted text search engine over RDF. The Lucene score is a TF-IDF-based score that captures the relevance of property values with respect to the keywords. The full set of seeds of a keyword query can be obtained from the Lucene inverted index, which can be a large set. One could then limit this set to the top-k resources according to the Lucene score, but the resource labeled “Meryl Streep” would be the 7th entry in the ranked list of seeds and the resource labeled “Out of Africa” would be the 30th entry. If one took the top 30 resources in the ranking, the two seeds would be selected, but many other less relevant seeds would also be selected.
(a) Neighborhood graph of the entity denoting the country India. (b) A correct answer for the 43rd query $K_{Coff-43}$ = {mauritius, india} in Coffman's benchmark.
Figure 3: Example neighborhood, keyword query, and a possible answer.
Let $K$ be a keyword query and $S_K$ be the set of seeds of $K$. We define the entity score of a seed $s \in S_K$ with respect to $K$ as:

$es(s, K) = \frac{1}{2}\big(\max_{v_j} lucene(s, v_j, K) + infoRank(s)\big)$   (1)

where $v_j$ is a string value such that there is a triple $(s, p, v_j) \in T$. The infoRank score (Menendez et al., 2019) reflects the relevance of $s$ to users. The entity score ranges in $[0, 1]$, since we use normalized versions of $lucene(s, v_j, K)$ and $infoRank(s)$. If a text search engine other than Lucene or a resource relevance measure other than infoRank is adopted, Eq. 1 should be adjusted accordingly.
By ranking the set of seeds of $K$ according to the entity score, the resource labeled “Meryl Streep” would appear in the 1st position, and the one labeled “Out of Africa” would appear in the 18th position.
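A direct transcription of Eq. 1, assuming lucene, info_rank, and literals_of are accessors that already return normalized values (all hypothetical names introduced for this sketch):

def entity_score(s, K, lucene, info_rank, literals_of):
    """Eq. 1: average of the best normalized Lucene score over the
    string values of s and the normalized infoRank of s."""
    best = max((lucene(s, v, K) for v in literals_of(s)), default=0.0)
    return 0.5 * (best + info_rank(s))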
The first heuristic is then to refine the set of seeds $S_K$ to be the set $\Sigma_K$ of the top-$\sigma_2$ elements of $\{s \in S_K \mid es(s, K) \geq \sigma_1\}$, ordered in decreasing order of entity score, where $\sigma_1$ and $\sigma_2$ are thresholds empirically defined to optimize computing resources and match users' preferences.

The threshold $\sigma_1$ defines a minimum seed entity score, to speed up the Lucene engine for practical reasons, and $\sigma_2$ effectively limits the number of selected seeds. By defining $\sigma_1 = 0$ and $\sigma_2 = \infty$, one would select the full set of seeds. By defining $\sigma_1 > 0$ and $\sigma_2 < \infty$, one can restrict the cardinality of $2^{\Sigma_K}$ and, consequently, the total number of paths to compute.
However, $\Sigma_K$ may not cover all keywords in $K$. We then compute another set $S_{K_i}$ of matching resources, where $K_i$ is the subset of $K$ not matched by resources in $\Sigma_K$, and repeat this procedure to extend $\Sigma_K$ until no further keyword can be matched or $K_i = \{\}$. More precisely, for any $K_i \subseteq K$, let $S_{K_i}$ be the set of seeds in $S_K$ that match keywords in $K_i$. Let $\mu[\sigma_1](K_i, S_{K_i}) = \{s \in S_{K_i} \mid es(s, K_i) \geq \sigma_1\}$, and let $\tau[\sigma_1, \sigma_2](K_i, S_{K_i})$ be the top-$\sigma_2$ elements of $\mu[\sigma_1](K_i, S_{K_i})$, ordered by the entity score of the seeds.

Let $(K_0, \ldots, K_N)$ be the longest sequence such that $K_0 = K$ and, for each $i \in [1, N]$, $K_i$ is the set of keywords in $K$ not matched by seeds in $\tau[\sigma_1, \sigma_2](K_{i-1}, S_{K_{i-1}})$, $K_i \neq \emptyset$, and $S_{K_i} \neq \emptyset$. Then, $\Sigma_K$ is defined as:

$\Sigma_K = \bigcup_{i=0}^{N} \tau[\sigma_1, \sigma_2](K_i, S_{K_i})$   (2)
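The extension procedure amounts to a simple loop. In the sketch below, seeds_matching, keywords_matched, and entity_score are assumed accessors introduced for illustration, and sigma1, sigma2 are the thresholds above.

def extend_seeds(K, seeds_matching, keywords_matched, entity_score,
                 sigma1, sigma2):
    """Iteratively build Sigma_K (Eq. 2): take the top-sigma2 seeds whose
    entity score is at least sigma1, then repeat on the still-uncovered
    keywords until nothing further can be matched."""
    Sigma, K_i = [], set(K)
    while K_i:
        S_Ki = seeds_matching(K_i)                       # S_{K_i}
        mu = [s for s in S_Ki if entity_score(s, K_i) >= sigma1]
        tau = sorted(mu, key=lambda s: -entity_score(s, K_i))[:sigma2]
        if not tau:
            break                                        # S_{K_i} exhausted
        Sigma.extend(s for s in tau if s not in Sigma)
        covered = {k for s in tau for k in keywords_matched(s, K_i)}
        if not covered:
            break                                        # no new keyword matched
        K_i -= covered
    return Sigma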
The second heuristic refers to the selection of the sets in $2^{\Sigma_K}$ for which one would compute the solution generators. This is done by scoring each $R \in 2^{\Sigma_K}$ and selecting the top-ranked ones, based on four principles. First, the set of matching keywords of $R$ is the union of the sets of keywords matched by each resource in $R$; the sets $R$ with the largest number of keyword matches are preferable and potentially generate better answers. Second, the keywords should preferably match just a few seeds of an answer, because keywords identify entities. We assume that answers where no keyword matches more than one resource, such as those in Figures 1.b and 1.c, are more relevant. Nevertheless, as detailed later in this section, this constraint can be relaxed to allow answers such as that in Figure 1.d. Third, smaller answers, in terms of the number of seeds, are preferable over larger ones, since they are easier to understand. Lastly, not only are small sets preferable, but it is also necessary that their resources be the most relevant to users. Based on these principles, we define the following scores.
Let $E$ be a set of resources. The coverage score of $E$, denoted $cs(E, K)$, measures the fraction of keywords in $K$ matched by resources in $E$; $cs(E, K) = 1$ if all keywords are matched, and $cs(E, K) = 0$ if no keyword is matched:

$cs(E, K) = \frac{\sum_{k_i \in K} occur(E, k_i)}{|K|}$   (3)

where $occur(E, k_i) = 1$ if some resource in $E$ matches $k_i$, and $occur(E, k_i) = 0$ otherwise.
The co-occurrence score of $E$, denoted $os(E, K)$, measures the degree of repetition of keywords in $K$ among resources in $E$; $os(E, K) = 1$ if each keyword is matched by only one resource, and $os(E, K) = 0$ if all keywords are matched by all resources:

$os(E, K) = \frac{1 - f(E, K)/|E|}{1 - 1/|E|}$   (4)

where $f(E, K)$ is the average number of resources in $E$ that match keywords in $K$. Keywords not covered by $E$ are not taken into account; $os(E, K)$ is assumed to be 1 if the denominator is 0.
Let $C$ be the collection of sets of resources considered. The size score of $E$ w.r.t. $C$, denoted $ss(E)$, measures the relative size of $E$; $ss(E) = 1$ if $E$ is one of the smallest sets, and $ss(E) = 0$ if $E$ is one of the largest sets:

$ss(E) = \frac{N - |E|}{N - 1}$   (5)

where $N$ is the cardinality of the largest set in $C$; $ss(E)$ is assumed to be 1 if the denominator is 0.
The infoRank score of $E$, denoted $is(E)$, is the average infoRank value (Menendez et al., 2019) of the resources in $E$:

$is(E) = \mathrm{average}(\{infoRank(s_i) \mid s_i \in E\})$   (6)
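The four scores translate almost literally into code. In the sketch below, occur, f, and info_rank are assumed callables mirroring the definitions above; os_ and is_ carry trailing underscores only to avoid shadowing Python builtins.

def cs(E, K, occur):
    """Coverage (Eq. 3): fraction of keywords in K matched by E."""
    return sum(occur(E, k) for k in K) / len(K)

def os_(E, K, f):
    """Co-occurrence (Eq. 4): 1 when each covered keyword is matched by
    a single resource, 0 when every keyword is matched by all of E."""
    denom = 1 - 1 / len(E)
    return 1.0 if denom == 0 else (1 - f(E, K) / len(E)) / denom

def ss(E, N):
    """Size (Eq. 5): 1 for the smallest sets, 0 for the largest, where
    N is the cardinality of the largest set considered."""
    return 1.0 if N == 1 else (N - len(E)) / (N - 1)

def is_(E, info_rank):
    """infoRank (Eq. 6): average infoRank of the resources in E."""
    return sum(info_rank(s) for s in E) / len(E)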
The second heuristic is then the refinement of the set $2^{\Sigma_K}$ by choosing only those $R \in 2^{\Sigma_K}$ with better coverage, lower co-occurrence, fewer resources, and more relevant nodes. Recall that a lower co-occurrence favors answers such as those in Figures 1.b and 1.c, while still allowing answers such as that in Figure 1.d. On the other hand, no co-occurrence ($os(R, K) = 1$) would allow only answers such as those in Figures 1.b and 1.c. The refinement is expressed by defining the set $\Pi_K$ as follows:

$\Pi_K = \{R \in 2^{\Sigma_K} \mid cs(R, K) \geq \sigma_3 \wedge os(R, K) \geq \sigma_4 \wedge ss(R) \geq \sigma_5 \wedge is(R) \geq \sigma_6\}$   (7)

where $\sigma_3$, $\sigma_4$, $\sigma_5$, and $\sigma_6$ are empirically defined according to the available computing resources and user preferences. If $\sigma_3 = \sigma_4 = \sigma_5 = \sigma_6 = 0$, then $\Pi_K = 2^{\Sigma_K}$ and all possible solution generators with $\Sigma_K$ would be computed.
The above scores can be redefined for subsets of triples $U \subseteq T$ by taking $E = E_U$, where $E_U$ is the set of all resources in $U$.
The third heuristic is to consider only paths between seeds with length less than or equal to a given limit $L$, say $L = 4$, to compute solution generators. In fact, as argued in Nunes et al. (2014), paths longer than 4 would express unusual relationships, which might be misinterpreted by users.
The fourth heuristic is to rank the solution generators in $\Pi_K$ according to their scores, following user needs. For example, if one ranks based only on coverage and size, then one may define a Boolean function “order” between solution generators as follows:

$order(SG_1, SG_2) = \begin{cases} cs(SG_1, K) \geq cs(SG_2, K), & \text{if } C \text{ holds} \\ ss(SG_1) \geq ss(SG_2), & \text{otherwise} \end{cases}$   (8)

where $C$ is the condition $cs(SG_1, K) \neq cs(SG_2, K)$.
Alternatively, one could rank solution generators by their average scores, as in Eq. 9, to balance the losses and gains of each individual score:

$order(SG_1, SG_2) = \frac{1}{4}\big(cs(SG_1, K) + os(SG_1, K) + ss(SG_1) + is(SG_1)\big) \geq \frac{1}{4}\big(cs(SG_2, K) + os(SG_2, K) + ss(SG_2) + is(SG_2)\big)$   (9)
Eqs. 2, 7, 8, and 9 are in fact flexibilization points of the method. For example, instead of using the Lucene score in Eq. 1, one could use a keyword count and, instead of a selection in Eq. 7, one could use the top of the ranking of the sets $R \in 2^{\Sigma_{MS}}$ with the order function defined in Eq. 9. The choices made determine the set of solution generators that will be computed. The central contribution of this work lies, then, in using the peculiarities of RDF keyword search to define a process for optimizing the computation of solution generators.
Algorithm 1 embeds the proposed heuristics to reduce the cost of computing solution generators by disregarding the less relevant ones. It takes as input a keyword query $K$ and an RDF dataset $T$, and outputs a ranked list of solution generators according to the score function in Eq. 8. Lines 1 and 2 prepare the sets of nodes that guide the computation of the solution generators, according to Eqs. 2 and 7. Lines 4–12 compute solution generators for each set of seeds in $\Pi_K$. In lines 9–11, if the set of triples $SG$ computed for a set of seeds $R \in \Pi_K$ does not induce a connected graph $G'_{SG}$ (the connectivity graph of $SG$), then $SG$ is discarded because, for each connected component $C_i$ of $G'_{SG}$, there is a strict subset $R_j$ of $R$ such that the set of triples computed for $R_j$ induces $C_i$.
Algorithm 1: Computes a ranked list of solution generators for a keyword query $K$ over an RDF dataset $T$.

Require: a keyword query $K$ and an RDF dataset $T$
1: $\Sigma_K = \bigcup_{i=0}^{N} \tau[\sigma_1, \sigma_2](K_i, S_{K_i})$
2: $\Pi_K = \{R \in 2^{\Sigma_K} \mid cs(R, K) \geq \sigma_3 \wedge os(R, K) \geq \sigma_4 \wedge ss(R) \geq \sigma_5 \wedge is(R) \geq \sigma_6\}$
3: $U = \{\}$
4: for all $R \in \Pi_K$ do
5:   $SG = \{\}$
6:   for all distinct unordered pairs of nodes $\{n_1, n_2\}$ such that $n_1, n_2 \in R$ do
7:     $SG = SG \cup \{(s, p, o) \in T \mid (s, p, o)$ is in a path of length $\leq 4$ between $n_1$ and $n_2\}$
8:   end for
9:   if $G'_{SG}$ is connected then
10:    $U = U \cup \{SG\}$
11:  end if
12: end for
13: Create $U'$ by ordering $U$ using the score function in Eq. 8
14: return $U'$
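For concreteness, here is a compact Python rendering of Algorithm 1, under the same assumptions as the earlier sketches: paths_up_to(n1, n2, 4) is assumed to return the triples on acyclic paths of length at most 4 between two seeds, connectivity_graph builds $G'_{SG}$, and the ranking of Eq. 8 is abstracted as a sort key.

import itertools

def solution_generators(Pi_K, paths_up_to, connectivity_graph,
                        is_connected, rank_key):
    """Algorithm 1 in Python: one candidate triple set per seed set in
    Pi_K; candidates whose connectivity graph is disconnected (or empty,
    in this sketch) are discarded, and the survivors are ranked."""
    U = []
    for R in Pi_K:                                       # lines 4-12
        SG = set()
        for n1, n2 in itertools.combinations(R, 2):      # line 6
            SG |= paths_up_to(n1, n2, 4)                 # line 7
        if SG and is_connected(connectivity_graph(SG)):  # lines 9-11
            U.append(SG)
    # line 13: rank_key stands in for Eq. 8, e.g. coverage then size,
    # encoded so that better-ranked generators produce smaller keys.
    return sorted(U, key=rank_key)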
7 EVALUATION OF THE BENCHMARK GENERATION METHOD
This section addresses the key question of evaluat-
ing the proposed benchmark generation method. The
evaluation concentrates on assessing the quality of
the solution generators. An evaluation of the strategy
to generate keyword queries is omitted due to space
limitations.
The evaluation strategy goes as follows. Let $b = (D, Q_b, A_b)$ be a baseline benchmark, where $D$ is an RDF dataset, $Q_b$ is a set of keyword queries, and $A_b$ defines the correct answers for the queries in $Q_b$. The strategy is to construct a synthetic benchmark $s = (D, Q_s, A_s)$ for the same dataset $D$, using the proposed method, where $Q_b \cap Q_s \neq \emptyset$ and $A_s$ contains, for each keyword query in $Q_s$, a list of solution generators over $D$, synthesized using an implementation of Algorithm 1. Then, for each keyword query in $Q_b \cap Q_s$, we compare the answers in $A_b$ with the solution generators in $A_s$, as explained in what follows; this is the key point of the evaluation. Note that we have to guarantee that $Q_b \cap Q_s \neq \emptyset$, as otherwise the benchmark comparison would be vacuous. In fact, rather than using the keyword query generation method discussed in Section 5, we manually selected keyword queries from the baseline benchmarks to include in the synthetic benchmarks, as discussed in what follows.
As baselines, we adopted a benchmark for RDF-KwS based on Coffman's benchmark (Coffman and Weaver, 2010) and Dosso's benchmarks described in (Dosso and Silvello, 2020). Coffman's benchmark was created to evaluate keyword search systems over relational databases and is based on data and schemas of relational samples of IMDb, Mondial, and Wikipedia. For each database, it has 50 keyword queries and their correct answers. Dosso and Silvello (2020) used three real RDF datasets, LinkedMDB, IMDb, and DBpedia, and two synthetic RDF datasets, the Lehigh University Benchmark (LUBM) and the Berlin SPARQL Benchmark (BSBM).
As for the datasets, we chose triplifications of rela-
tional versions of the full Mondial and IMDb datasets,
and not just samples as in Coffman’s benchmark, and
the RDF datasets BSBM, LUBM, and DBpedia from
Dosso’s benchmark. For each such RDF dataset, we
computed the infoRank scores.
We selected the keyword queries of the synthetic
benchmarks as follows. Quite a few keyword queries
in Coffman’s benchmark are simple queries in that
their expected answers are single entities. Since these
queries do not explore the complexity of the graph
structure of the RDF datasets, we disregarded them for
IMDb and Mondial. We used the keyword queries for
BSBM, LUBM, and DBpedia as defined in Dosso’s
benchmark. In total, we used 35 keyword queries for
IMDb and 24 for Mondial, from Coffman’s benchmark,
and 50 keyword queries for DBpedia, 14 for LUBM,
and 13 for BSBM, from Dosso’s benchmark.
Finally, we ran an implementation of Algorithm 1 to obtain a list of solution generators for each of the selected keyword queries. The parameters described in Section 6 were set as $\sigma_1 = 0$, $\sigma_2 = 5$, $\sigma_3 = 1.0$, $\sigma_4 = 0.5$, $\sigma_5 = 0.2$, $\sigma_6 = 0.2$, for all datasets.
The above process resulted in 5 synthetic benchmarks, for IMDb, Mondial, BSBM, LUBM, and DBpedia. The RDF datasets are available at https://doi.org/10.6084/m9.figshare.11347676.v3, and the implementation of Algorithm 1, the keyword queries, the solution generators, and statistics are available at https://doi.org/10.6084/m9.figshare.9943655.v12.
For each of the five RDF datasets, we now compare the baseline benchmark with the corresponding synthetic benchmark. Consider the following questions (where $K$ is a keyword query of both the baseline benchmark and the corresponding synthetic benchmark, as explained above):

Q1. What is the total number $s_K$ of answers expressed by the solution generators for $K$ in the synthetic benchmark?

Q2. What is the total number $bs_K$ of answers of $K$, defined in the baseline benchmark, that are expressed by the solution generators for $K$ in the synthetic benchmark?

Q3. What is the total number $sn_K$ of answers expressed by the solution generators for $K$ in the synthetic benchmark that are not defined in the baseline benchmark?

Let $B_K$ be the set of answers for $K$ defined in the baseline benchmark and $b_K = |B_K|$.
To address these questions, one has to compute the set $S_K$ of answers for $K$ that the solution generators for $K$ express. This depends on the exact notion of answer one adopts. For example, the set of minimal answers can be estimated by counting the minimal Steiner trees (Oliveira et al., 2020; Dourado and de Oliveira, 2009) of a solution generator $SG$ whose terminal nodes are the set of seeds of $SG$. To compute $|B_K \cap S_K|$, one has to test, for each answer $A \in B_K$, whether there is some solution generator for $K$ that expresses $A$. Hence, we have that:

$s_K = |S_K|$
$bs_K = |B_K \cap S_K|$
$sn_K = |S_K - B_K| = |S_K| - |B_K \cap S_K|$

Column #MAs of Table 2 shows the total number of minimal answers expressed by the solution generators for $K$ that are not defined in the original benchmarks.
Q1 and Q2 lead to an interesting discussion. Consider the baseline benchmarks as if they were keyword search systems to be evaluated against the synthetic benchmarks. Then, one can compute the precision of the baseline benchmark for $K$ against the equivalent synthetic benchmark as $p_K = bs_K / b_K$. The larger $p_K$ is, the larger the number of correct answers for $K$, in the baseline benchmark, that the solution generators express. Likewise, one can compute the recall of the baseline benchmark for $K$ against the equivalent synthetic benchmark as $r_K = bs_K / s_K$.
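Given the answer sets, the counts and both metrics reduce to set operations; a sketch, assuming answers are represented by hashable identifiers:

def benchmark_stats(B_K, S_K):
    """Counts for Q1-Q3 plus precision/recall of the baseline benchmark
    against the synthetic one, for a single keyword query K."""
    s_K = len(S_K)                            # Q1
    bs_K = len(B_K & S_K)                     # Q2
    sn_K = s_K - bs_K                         # Q3
    p_K = bs_K / len(B_K) if B_K else 0.0     # precision p_K = bs_K / b_K
    r_K = bs_K / s_K if s_K else 0.0          # recall    r_K = bs_K / s_K
    return s_K, bs_K, sn_K, p_K, r_K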
Table 2 summarizes statistics for Mondial, IMDb, and DBpedia. For sample keyword queries (to save space), it shows the number of retrieved seeds, the precision values that the baseline benchmarks achieved, the number of solution generators obtained from the seeds, and the number of minimal answers expressed by the solution generators that are not defined in the baseline benchmarks. For example, for the keyword query $K$ = {niger, country} from Mondial, the algorithm selected four seeds: the country Niger, the province Niger, the river Niger, and the class Country. Then, it computed four solution generators: (1) with all seeds; (2) with all seeds except the class Country; (3) with two seeds, the class Country and the node Niger, which is an instance of class Country; and (4) with only the class Country.
Table 2: Benchmark statistics obtained for Mondial, IMDb, and DBpedia.

Dataset | Sample Keyword Queries | #Seeds | Precision | #Sol. Generators | #MAs
Mondial | niger country | 4 | 1.00 | 4 | 23
Mondial | haiti religion | 2 | 1.00 | 1 | 2
Mondial | mongolia china | 4 | 1.00 | 2 | 3
Mondial | lebanon syria | 5 | 1.00 | 3 | 10
Mondial | poland cape verde organization | 5 | 0.82 | 4 | 132
Mondial | rhein germany province | 5 | 0.50 | 2 | 82
Mondial | OVERALL AVERAGE | | 0.91 | 5.00 | 184.83
IMDb | Johnny Depp Actor | 5 | 1.00 | 12 | 46
IMDb | Will Smith Male | 5 | 1.00 | 6 | 21
IMDb | Atticus Finch Movie | 5 | 1.00 | 10 | 35
IMDb | russell crowe gladiator character | 5 | 0.50 | 11 | 21
IMDb | sean connery ian fleming work | 5 | 0.11 | 10 | 27
IMDb | OVERALL AVERAGE | | 0.52 | 5.51 | 38.60
DBpedia | Captain America creator notable works | 5 | 1.00 | 5 | 47
DBpedia | Canada Capital | 5 | 1.00 | 3 | 57
DBpedia | governor of Texas | 5 | 1.00 | 5 | 58
DBpedia | Francis Ford Coppola film director | 5 | 1.00 | 11 | 61
DBpedia | mayor of new york city | 5 | 0.00 | 2 | 9
DBpedia | NASA launchpad | 5 | 0.00 | 3 | 5
DBpedia | OVERALL AVERAGE | | 0.58 | 8.22 | 91.86
The Overall Averages can be interpreted as the percentage of the correct answers, defined in the baseline benchmarks, that the solution generators express (91% for Mondial, 52% for IMDb, and 58% for DBpedia), which is quite reasonable for synthetic benchmarks. Finally, we remark that, for the synthetic datasets (BSBM and LUBM), Algorithm 1 found exactly the correct answers listed in Dosso's benchmark.
8 CONCLUSIONS
One of the main issues in the development of RDF
keyword search algorithms is the lack of appropri-
ate benchmarks. This paper then proposed an offline
method to construct benchmarks for RDF keyword
search algorithms. The method automatically specifies
sets of keyword queries and their correct answers over
a given RDF dataset. It circumvents the combinato-
rial nature of generating correct answers by pruning
the search space, following four heuristics based on
the concepts of seeds and solution generators. The proposed heuristics introduce flexibilization points in the method that enable the construction of different benchmarks, according to their intended purpose.
The paper proceeded to describe synthetic benchmarks for IMDb, Mondial, BSBM, LUBM, and DBpedia. The experiments compared the synthetic benchmarks with baseline benchmarks and showed that, for the datasets with real data (IMDb, Mondial, and DBpedia), the solution generators obtained express the majority of the correct answers defined in the baseline benchmarks, plus many more answers, while for the synthetic datasets (BSBM and LUBM) they express exactly the answers defined.
As future work, we plan to fine-tune Algorithm
1 to improve the results summarized in Table 2. We
also plan to modify the proposed method to construct
training datasets with Natural Language queries over
complex RDF datasets.
ACKNOWLEDGEMENTS
This work was partly funded by FAPERJ under grants E-26/010.000794/2016 and E-26/202.818/2017; by CAPES under grants 88881.134081/2016-01 and 88882.164913/2010-01; and by CNPq under grant 302303/2017-0. We are grateful to João Guilherme Alves Martinez for helping with the experiments.
REFERENCES

Balog, K. and Neumayer, R. (2013). A Test Collection for Entity Search in DBpedia. In Proceedings of the 36th International ACM SIGIR Conference, pages 737–740.
Bast, H., Buchhold, B., and Haussmann, E. (2016). Semantic search on text and knowledge bases. Foundations and Trends® in Information Retrieval, 10(2-3):119–271.
Bhalotia, G., Hulgeri, A., Nakhe, C., Chakrabarti, S., and Sudarshan, S. (2002). Keyword searching and browsing in databases using BANKS. In Proceedings of the 18th International Conference on Data Engineering (ICDE'02), pages 431–440.
Bizer, C. and Schultz, A. (2009). The Berlin SPARQL Benchmark. International Journal on Semantic Web and Information Systems (IJSWIS), 5(2):1–24.
Coffman, J. and Weaver, A. C. (2010). A framework for evaluating database keyword search strategies. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 729–738.
Dosso, D. and Silvello, G. (2020). Search Text to Retrieve Graphs: a Scalable RDF Keyword-Based Search System. IEEE Access, 8:14089–14111.
Dourado, M. C. and de Oliveira, R. A. (2009). Generating all the Steiner trees and computing Steiner intervals for a fixed number of terminals. Electronic Notes in Discrete Mathematics, 35:323–328.
Dubey, M., Banerjee, D., Abdelkawi, A., and Lehmann, J. (2019). LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia. In Proceedings of the 18th International Semantic Web Conference (ISWC'19), pages 69–78.
Elbassuoni, S. and Blanco, R. (2011). Keyword search over RDF graphs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM'11), pages 237–242.
García, G. M., Izquierdo, Y. T., Menendez, E. S., Dartayre, F., and Casanova, M. A. (2017). RDF Keyword-based Query Technology Meets a Real-World Dataset. In Proceedings of the 20th International Conference on Database Theory (ICDT'17), pages 656–667.
Guo, Y., Pan, Z., and Heflin, J. (2005). LUBM: A Benchmark for OWL Knowledge Base Systems. Journal of Web Semantics, 3(2-3):158–182.
Han, S., Zou, L., Yu, J. X., and Zhao, D. (2017). Keyword search on RDF graphs - A query graph assembly approach. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management (CIKM'17), pages 227–236.
Hristidis, V. and Papakonstantinou, Y. (2002). DISCOVER: Keyword Search in Relational Databases. In Proceedings of the 28th VLDB (VLDB'02), pages 670–681.
Izquierdo, Y. T., García, G. M., Menendez, E. S., Casanova, M. A., Dartayre, F., and Levy, C. H. (2018). QUIOW: A keyword-based query processing tool for RDF datasets and relational databases. In Proceedings of the 30th International Conference on Database and Expert Systems Applications (DEXA'18), volume 11030 LNCS, pages 259–269.
Kimelfeld, B. and Sagiv, Y. (2008). Efficiently enumerating results of keyword search over data graphs. Information Systems, 33(4-5):335–359.
Le, W., Li, F., Kementsietsidis, A., and Duan, S. (2014). Scalable keyword search on large RDF data. IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(11):2774–2788.
Lin, X. Q., Ma, Z. M., and Yan, L. (2018). RDF keyword search using a type-based summary. Journal of Information Science and Engineering, 34(2):489–504.
Menendez, E. S., Casanova, M. A., Paes Leme, L. A. P., and Boughanem, M. (2019). Novel Node Importance Measures to Improve Keyword Search over RDF Graphs. In Proceedings of the 31st International Conference on Database and Expert Systems Applications (DEXA'19), volume 11707, pages 143–158.
Nunes, B. P., Herrera, J., Taibi, D., Lopes, G. R., Casanova, M. A., and Dietze, S. (2014). SCS Connector - Quantifying and Visualising Semantic Paths Between Entity Pairs. In Proceedings of the Satellite Events of the 11th European Semantic Web Conference (ESWC'14), pages 461–466.
Oliveira, P. S. d., Da Silva, A., Moura, E., and De Freitas, R. (2020). Efficient Match-Based Candidate Network Generation for Keyword Queries over Relational Databases. IEEE Transactions on Knowledge and Data Engineering, pages 1–1.
Oliveira Filho, A. d. C. (2018). Benchmark para métodos de consultas por palavras-chave a bancos de dados relacionais. Technical report.
Rihany, M., Kedad, Z., and Lopes, S. (2018). Keyword search over RDF graphs using wordnet. In Proceedings of the 1st International Conference on Big Data and Cyber-Security Intelligence (BDCSIntell'18), volume 2343, pages 75–82.
Tran, T., Wang, H., Rudolph, S., and Cimiano, P. (2009). Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. In Proceedings of the 25th International Conference on Data Engineering (ICDE'09), pages 405–416.
Trivedi, P., Maheshwari, G., Dubey, M., and Lehmann, J. (2017). LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs. In Proceedings of the 16th International Semantic Web Conference (ISWC'17), pages 210–218.
Wen, Y., Jin, Y., and Yuan, X. (2018). KAT: Keywords-to-SPARQL translation over RDF graphs. In Proceedings of the 23rd International Conference on Database Systems for Advanced Applications (DASFAA'18), volume 10827 LNCS, pages 802–810.
Zenz, G., Zhou, X., Minack, E., Siberski, W., and Nejdl, W. (2009). From keywords to semantic queries—Incremental query construction on the Semantic Web. Web Semantics: Science, Services and Agents on the World Wide Web, 7(3):166–176.
Zheng, W., Zou, L., Peng, W., Yan, X., Song, S., and Zhao, D. (2016). Semantic SPARQL similarity search over RDF knowledge graphs. In Proceedings of the 42nd VLDB (VLDB'16), volume 9, pages 840–851.
Zhou, Q., Wang, C., Xiong, M., Wang, H., and Yu, Y. (2007). SPARK: Adapting keyword query to semantic search. In Proceedings of the 6th International Semantic Web Conference (ISWC'07), volume 4825 LNCS, pages 694–707, Busan, Korea.