Unsupervised machine learning methods remove the
dependence on training data. They still need a corpus
of documents, but no preliminary markup is required.
This approach finds the probabilistic-statistical
regularities of the text and uses them to solve the key
subtasks of aspect-based sentiment analysis:
identification of aspect terms and determination of
their tonality. However, such methods require complex
tuning to a given domain. For example, the method based
on Latent Dirichlet Allocation (LDA) in its original
form cannot detect topics effectively; it has to be
adapted further, and the identified topics have to be
matched to the target set of contexts (Titov, 2008).
The text classification methods considered above
require a sentiment dictionary for evaluating text
tonality. There are three basic approaches to compiling
such a dictionary (Liu, 2012): the expert approach, the
approach based on dictionaries / thesauri, and the
approach based on text collections.
In the expert approach, the dictionary is compiled by
experts. On the one hand, this approach is complex and
is likely to leave domain-specific words out of the
dictionary; on the other hand, it yields a high-quality
dictionary in the sense that the assigned tonality is
adequate.
In the dictionaries / thesaurus approach, a small
initial list of evaluative words is expanded with the
help of various dictionaries, for example explanatory
dictionaries or dictionaries of synonyms / antonyms.
This approach also does not take the subject area into
account.
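The expansion step can be illustrated with a minimal Python sketch. It assumes a breadth-first walk over a toy thesaurus in which synonyms inherit the tonality of the source word and antonyms receive the opposite one. The names `SYNONYMS`, `ANTONYMS` and `expand_lexicon`, the depth limit, and the sample words are hypothetical and are not taken from the cited works.

```python
from collections import deque

# Hypothetical toy thesaurus; a real system would load an explanatory
# or synonym / antonym dictionary instead.
SYNONYMS = {
    "good": ["fine", "great"],
    "great": ["superb"],
    "bad": ["poor"],
}
ANTONYMS = {
    "good": ["bad"],
    "great": ["awful"],
}


def expand_lexicon(seeds, synonyms, antonyms, max_depth=2):
    """Expand a seed lexicon {word: +1 or -1} breadth-first.

    Synonyms inherit the polarity of the source word, antonyms get
    the opposite polarity. The first polarity assigned to a word is
    kept, and words reached at max_depth are not expanded further.
    """
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        polarity = lexicon[word]
        neighbours = [(s, polarity) for s in synonyms.get(word, [])]
        neighbours += [(a, -polarity) for a in antonyms.get(word, [])]
        for other, other_polarity in neighbours:
            if other not in lexicon:
                lexicon[other] = other_polarity
                queue.append((other, depth + 1))
    return lexicon


lexicon = expand_lexicon({"good": 1}, SYNONYMS, ANTONYMS)
```

Because the walk relies only on general-purpose dictionaries, the resulting list carries no information about the subject area, which is the limitation noted above.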
In the approach based on text collections, the
dictionary is compiled by statistical analysis of
marked-up texts that, as a rule, belong to the subject
domain in question.
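A minimal Python sketch of such a statistical analysis follows. The text does not name a particular statistic, so the sketch assumes a smoothed log-ratio of a word's relative frequency in positive and negative texts. The function `build_dictionary`, its parameters `min_count` and `smoothing`, and the sample corpus are illustrative assumptions only.

```python
import math
from collections import Counter


def build_dictionary(labeled_texts, min_count=2, smoothing=1.0):
    """Return {word: weight} from (text, label) pairs.

    label is "pos" or "neg". The weight is the log of the ratio of
    the add-one smoothed relative frequencies of the word in positive
    and negative texts (an assumed statistic, not the cited method).
    Words seen fewer than min_count times are skipped.
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_texts:
        counts[label].update(text.lower().split())

    total_pos = sum(counts["pos"].values())
    total_neg = sum(counts["neg"].values())
    vocabulary = set(counts["pos"]) | set(counts["neg"])
    vocab_size = len(vocabulary)

    weights = {}
    for word in vocabulary:
        n_pos, n_neg = counts["pos"][word], counts["neg"][word]
        if n_pos + n_neg < min_count:
            continue
        p_pos = (n_pos + smoothing) / (total_pos + smoothing * vocab_size)
        p_neg = (n_neg + smoothing) / (total_neg + smoothing * vocab_size)
        weights[word] = math.log(p_pos / p_neg)
    return weights


# Hypothetical marked-up collection from a film-review domain.
corpus = [
    ("a great film with great acting", "pos"),
    ("great story", "pos"),
    ("a boring film with awful acting", "neg"),
    ("awful plot and boring story", "neg"),
]
weights = build_dictionary(corpus)
```

Words typical of positive texts receive weights above zero, words typical of negative texts receive weights below zero, and words that are equally frequent in both stay close to zero.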
In (Klekovkina and Kotelnikov, 2012), a dictionary of
emotional vocabulary, compiled manually by experts, was
used to determine the tonality of individual words. In
this dictionary, each word and phrase is associated
with a tonality orientation (positive / negative) and
a strength (in points).
The methods proposed in (Taboada et al., 2011; Boiy,
2007) are based on a dictionary approach: to determine
the tonality of texts, they use a dictionary of
evaluative words in which each word has a numerical
weight that determines its degree of significance. The
method of working with the dictionary that is closest
to this paper is that of (Boucher and Osgood, 1969);
there, however, the dictionary is, first, created on
the basis of a statistical analysis of a training
collection and, second, the word weights are determined
with the help of a genetic algorithm.
In most studies, the tonality of a text is determined
by summing the weights of the evaluative words it
contains:
$$W_T^C = \sum_{i=1}^{N^C} w_i \tag{1}$$
where $W_T^C$ is the weight of text $T$ for tonality
$C$; $w_i$ is the weight of the evaluated word $i$;
$N^C$ is the number of estimated bigrams of tonality
$C$ in the text $T$.
Texts are classified according to the linear function:
$$f(W_T^{pos}, W_T^{neg}) = W_T^{pos} + k^{neg} \cdot W_T^{neg} \tag{2}$$
where $W_T^{pos}$ is the positive weight of the text
$T$; $W_T^{neg}$ is the negative weight of the text
$T$; $k^{neg}$ is a coefficient compensating for the
preponderance of positive vocabulary in text (Pang,
2008). If the value of the function $f$ is greater than
zero, the text is positive; otherwise it is negative.
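Equations (1) and (2) can be sketched in Python as follows. The sketch assumes that negative dictionary entries are stored with negative weights, so that the plus sign in Equation (2) lowers the score, and that dictionary entries are single whitespace-separated tokens rather than bigrams. The names `text_weight`, `classify`, `POS` and `NEG` and all weights are hypothetical.

```python
def text_weight(tokens, dictionary):
    """Equation (1): sum the weights of dictionary entries in the text."""
    return sum(dictionary[token] for token in tokens if token in dictionary)


def classify(text, pos_dict, neg_dict, k_neg=1.0):
    """Equation (2): f = W_pos + k_neg * W_neg; positive if f > 0."""
    tokens = text.lower().split()
    w_pos = text_weight(tokens, pos_dict)
    w_neg = text_weight(tokens, neg_dict)  # assumed to be <= 0
    f = w_pos + k_neg * w_neg
    label = "positive" if f > 0 else "negative"
    return label, f


# Hypothetical dictionaries; negative entries carry negative weights.
POS = {"great": 2.0, "good": 1.0}
NEG = {"boring": -1.5, "awful": -2.0}

print(classify("great film", POS, NEG))
print(classify("good acting but boring plot", POS, NEG, k_neg=2.0))
```

Under this sign convention, a value of `k_neg` above 1 gives negative vocabulary more influence, which is one way to offset the preponderance of positive words; a text whose score is exactly zero is labelled negative, as in the rule above.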
3 METHODOLOGY OF WEBSITES CONTENT SENTIMENT CLASSIFICATION
The objective of this research is to test and evaluate
a Text Classification Methodology grounded on the
Manually Created Corpora-based Sentiment Dictionary
(SC-methodology).
The developed Methodology comprises three main
practical stages:
1) Manual creation of the Corpora-based Sentiment
Dictionary (CBSD).
2) Text classification based on the created CBSD.
3) Evaluation of the adequacy of the text
classification results.
A corpus of Polish-language film reviews will be used
as a case study to test the basic workability and the
quality of the proposed Methodology.
3.1  Novelty and Motivation  
This paper raises the following scientific research
questions (RQ):
RQ_1: Does the structure of the Sentiment 
Dictionary influence the quality of classification? 
RQ_2: Does the writing style of the analyzed text 
influence the quality of Classification?