
by examples can be found in (Kozmina and 
Solodovnikova, 2011). 
In practice, two types of similarity coefficients are calculated: fact-based (i.e., the value of hierarchical similarity is calculated for each report for measures, fact tables, and schemas) and dimension-based (i.e., for attributes, hierarchies, dimensions, and schemas). It has been decided to distinguish two types of similarity coefficients due to the well-known characteristics of the data stored in data warehouses, i.e., quantifying (measures) and qualifying (attributes). However, the essence of any data warehouse lies in its facts, while the describing attributes provide auxiliary information. Therefore, the recommendations are filtered (i) first by the value of the fact-based similarity coefficient, (ii) then by that of the dimension-based similarity coefficient, and (iii) finally by the aggregate function DOI.
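As an illustration, this ordering can be sketched as a three-level descending sort, here in Python (the names Recommendation, fact_sim, dim_sim, and doi are illustrative and not taken from the tool itself):

    from typing import NamedTuple

    class Recommendation(NamedTuple):
        report: str
        fact_sim: float  # fact-based similarity coefficient
        dim_sim: float   # dimension-based similarity coefficient
        doi: float       # aggregate function DOI (tie-breaker)

    def rank(recommendations: list) -> list:
        # Descending order: fact-based similarity first, then
        # dimension-based similarity, and finally the DOI value.
        return sorted(recommendations,
                      key=lambda r: (r.fact_sim, r.dim_sim, r.doi),
                      reverse=True)

With this ordering, reports that tie on both similarity coefficients are ranked by the DOI value alone, which is exactly the case of reports #4–#6 in the scenario below.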
Recommendations generated in Activity mode for one of the reports – Total monthly students’ grade count by course (i.e., Kopējais vērtējumu skaits mēnesī pa kursiem) – are presented in Figure 2. The usage scenario includes 10 recommendations sorted in descending order, first by the fact-based similarity coefficient value, then by the dimension-based similarity coefficient, and finally by the aggregate function DOI. As fact-based and dimension-based similarity coefficient values may differ considerably, both are shown to the user, for instance, to make him or her aware that dimension-based similarity is high even when fact-based similarity is average (e.g., reports #1: Monthly distribution of students’ grade types by course, #3: Total monthly grade count by course and by professor, #4: Total monthly students’ final grade count by course, #5: Total monthly students’ interim grade count by course, and #6: Total monthly students’ grade count by course) or low (e.g., reports #7: Gradebook usage by course, #9: Students’ tasks by course, and #10: Total monthly students’ task count by course and by professor). The remaining examples exhibit average fact-based similarity with low dimension-based similarity (report #2: Monthly distribution of students’ grade types by study program) and low values of both fact-based and dimension-based similarity (report #8: Gradebook usage by course category).
In turn, the aggregate function DOI coefficient is hidden from the user, as it is considered less informative, yet it is helpful for sorting when two or more reports have the same fact-based and dimension-based similarity coefficient values; e.g., reports #4–#6 have equal fact-based and dimension-based similarity values (0.512 and 0.679, respectively). Such coefficient values illustrate that all three reports consist of logical metadata with a similar total DOI value, whereas restrictions on the data in these reports may vary.
The cold-start method is composed of two steps: (i) performing a structural analysis of the existing reports, and (ii) revealing the similarity between pairs of reports. To be more precise, a pair of reports consists of the report currently executed by the user and any other report that the user has the right to access.
Here, the report structure means all elements of the data warehouse schema (e.g., attribute, measure, fact table, dimension, hierarchy), the schema itself, and the acceptable aggregate functions that are related to items of a given report. In terms of structural analysis, each report is represented as a Report Structure Vector (RSV). Each coordinate of the RSV is a binary value that indicates the presence (1) or absence (0) of an instance of a report structure element. For example, in the RSV of the report Total monthly grade count by course and by professor, the only element instances marked with 1 are: attributes Month, Course, and Professor, measure Grade count, dimensions Time, Course, and Person, fact table Students’ grades, schema Gradebook, and aggregate function SUM. All other element instances are marked with 0. Note that the report structure elements are ordered the same way in all reports. If any change occurs, for instance, a report is altered or a new report is created, the RSV of each report has to be built anew.
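Under these assumptions, building an RSV can be sketched as follows (the element catalogue is a simplified, hypothetical fragment reconstructed from the example above; a real catalogue would enumerate every element instance of the schema):

    # Fixed catalogue of report structure element instances; the same
    # ordering is used for every report (hypothetical fragment).
    ELEMENTS = [
        ("attribute", "Month"), ("attribute", "Course"), ("attribute", "Professor"),
        ("measure", "Grade count"),
        ("dimension", "Time"), ("dimension", "Course"), ("dimension", "Person"),
        ("fact_table", "Students' grades"),
        ("schema", "Gradebook"),
        ("aggregate", "SUM"), ("aggregate", "AVG"),
    ]

    def build_rsv(report_elements: set) -> list:
        # 1 if the element instance occurs in the report, 0 otherwise.
        return [1 if e in report_elements else 0 for e in ELEMENTS]

    # RSV of 'Total monthly grade count by course and by professor':
    rsv = build_rsv({
        ("attribute", "Month"), ("attribute", "Course"), ("attribute", "Professor"),
        ("measure", "Grade count"),
        ("dimension", "Time"), ("dimension", "Course"), ("dimension", "Person"),
        ("fact_table", "Students' grades"), ("schema", "Gradebook"),
        ("aggregate", "SUM"),
    })
    # -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]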
To reveal the similarity between pairs of reports by calculating the similarity coefficient, it is proposed to make use of Cosine/Vector similarity. It was introduced by Salton and McGill (1983) in the field of information retrieval to calculate the similarity between a pair of documents by interpreting each document as a vector of term frequency values. Later it was adopted by Breese et al. (1998) in collaborative filtering, with users instead of documents and items’ user rating values instead of term frequency values.
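For two vectors x and y, Cosine/Vector similarity is cos(x, y) = (x · y) / (|x| |y|). A minimal self-contained sketch applied to binary RSVs:

    import math

    def cosine_similarity(x: list, y: list) -> float:
        # cos(x, y) = (x . y) / (|x| * |y|)
        dot = sum(a * b for a, b in zip(x, y))
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_y = math.sqrt(sum(b * b for b in y))
        if norm_x == 0.0 or norm_y == 0.0:
            return 0.0  # a report with an all-zero RSV matches nothing
        return dot / (norm_x * norm_y)

Note that for binary RSVs the dot product simply counts the element instances that two reports share, so the coefficient grows with the structural overlap of the reports.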
In the recommender systems literature, Cosine/Vector similarity is extensively used (Vozalis and Margaritis, 2004; Rashid et al., 2005; Adomavicius et al., 2011) to compute a similarity coefficient for a pair of users in collaborative filtering, or for a pair of items in content-based filtering. So, Cosine/Vector similarity of a pair of reports is computed in the same manner, treating their RSVs as the vectors to be compared.