Data Science in Economics Education: Examples and Opportunities

Nina O. Rizun (1, a), Maryna V. Nehrey (2, b) and Nataliia P. Volkova (3, c)

(1) Gdańsk University of Technology, 11/12 Gabriela Narutowicza, 80-233 Gdańsk, Poland

(2) National University of Life and Environmental Sciences of Ukraine, 15 Heroyiv Oborony Str., Kyiv, 03041, Ukraine

(3) Alfred Nobel University, 18 Naberezhna Sicheslavska Str., Dnipro, 49000, Ukraine

Keywords:

Data Science, Higher Education, Economics, Machine Learning, Topic Modelling, Natural Language Processing.

Abstract:

Data science is the field of study that combines tools, algorithms, and knowledge of mathematics and statistics to extract knowledge from raw data. Data science is developing fast and penetrating all spheres of life. More and more people understand the importance of data science and the need to implement it in everyday life. Data science is used in business for business analytics and production; in sales for offerings and sales forecasting; in marketing for customer personalization, purchase recommendations, and digital marketing; in banking and insurance for risk assessment, fraud detection, and scoring; in medicine for disease forecasting, process automation, and patient health monitoring; and in tourism for price analysis, flight safety, opinion mining, etc. This article concerns the implementation of data science tools, including Text Mining and Natural Language Processing algorithms, to increase the value of Economics Education for the development of a modern and technologically flexible society. The article discusses in depth the opportunities of using Text Analytics and Topic Modelling for conducting scientific studies and applying them in the educational process. The presented examples demonstrate the nature of the tasks and approaches that could develop students' research skills in public perception analysis. Such approaches also allow students to gain practical experience in studying and interpreting how additional metadata characterizing the comment authors influence differences in their opinions about events, companies, goods, and services. Finally, the Data Science study programs for economics at top-20 universities are selected and examined.

1 INTRODUCTION

2020 was a turning point for the whole world. COVID-19 and the resulting pandemic exposed weaknesses in society and opened up opportunities for development in many areas. The education sector has also felt the significant impact of the pandemic: the digitalization of the educational process, the transition to online learning and the cancellation of educational events all force the sector to seek effective solutions and adapt to new conditions. The field of economics has also undergone significant changes, accompanied by the digitalization of processes, the transition to remote work and changes in service and communication with customers (Soloviev et al., 2020b). The fast-growing world has become even more digital. Therefore, the ability to use data correctly, model processes and make decisions using modern methods and technologies is becoming increasingly sought after.

a https://orcid.org/0000-0002-4343-9713

b https://orcid.org/0000-0001-9243-1534

c https://orcid.org/0000-0003-1258-7251

Data science is a field of study that includes tools, algorithms, and knowledge of mathematics and statistics used to extract knowledge from raw data. Data science is evolving rapidly and is penetrating all walks of life. More and more people understand the importance of data science and the need to implement it in everyday life. Data science is used in business for business intelligence and manufacturing; in sales for offerings and sales forecasting; in marketing for customer personalization, purchase recommendations and digital marketing; in banking and insurance for risk assessment, fraud detection and scoring; in medicine for forecasting diseases, process automation and monitoring of patients' health; and in tourism for price analysis, flight safety, etc. However, the application of data science in education has been relatively limited, and many opportunities for advancing industries have not yet been explored.


Rizun, N., Nehrey, M. and Volkova, N.

Data Science in Economics Education: Examples and Opportunities.

DOI: 10.5220/0010926100003364

In Proceedings of the 1st Symposium on Advances in Educational Technology (AET 2020) - Volume 1, pages 550-564

ISBN: 978-989-758-558-6

Data science should be used in education to solve

scientiﬁc problems, for example, in the study of

behaviour in economics, in macro- and microeco-

nomics, marketing, ﬁnance, agriculture, environmen-

tal and ecological economics and so on.

2 LITERATURE REVIEW

Data Science has a long list of tools: Linear Regression, Logistic Regression, Density Estimation, Confidence Interval, Test of Hypotheses, Pattern Recognition, Clustering, Supervised Learning, Time Series, Decision Trees, Monte-Carlo Simulation, Naive Bayes, Principal Component Analysis, Neural Networks, k-means, Recommendation Engine, Collaborative Filtering, Association Rules, Scoring Engine, Segmentation, Predictive Modeling, Graphs, Deep Learning, Game Theory, Arbitrage, Cross-Validation, Model Fitting, etc. Some of these tools were used in the studies reviewed below.

Teaching data science, for example, was introduced in (Brunner and Kim, 2016); Big Data and Data Science methods are presented in (Chen et al., 2012; George et al., 2016; Shoro and Soomro, 2015; Xiong et al., 2017; Cao, 2017; Ignatyuk et al., 2020); machine learning is used in (Parish and Duraisamy, 2016; Derbentsev et al., 2020; Guryanova et al., 2020b; Babenko et al., 2021; Nosratabadi et al., 2020; Zelinska, 2020); the Monte Carlo method is presented in (Balabay and Chernonog, 2007; Patriarca et al., 2017); and Artificial Intelligence is presented in (Rizun and Shmelova, 2017). Data Science is developing fast. The volume of information, which grows with each passing year, makes it possible to build high-precision models that simplify and partially automate the decision-making process. Models are being developed that implement the key data science algorithms for different areas of economics: for financial Data Science (Bielinskyi et al., 2021; Brooks et al., 2019; De Prado, 2018; Danylchuk et al., 2019; Soloviev and Belinskiy, 2019; Soloviev et al., 2020a; Guryanova et al., 2020a; Kuzmenko et al., 2020; Klymenko et al., 2019), for institutional economics (Prüfer and Prüfer, 2018; Hrabovskyi et al., 2020; Ilchuk et al., 2019; Oliskevych et al., 2018; Shi et al., 2020; Matviychuk et al., 2019), for agriculture (Kaminskyi et al., 2020; Nehrey et al., 2019; Voronenko et al., 2020), for taxation (Ausloos et al., 2017), and for the labor market (Oliskevych and Lukianenko, 2019).

The development of Data Science for education is discussed in (National Academies of Sciences, Engineering, and Medicine et al., 2018; Volkova et al., 2019; Perevozova et al., 2020; Dimitrov et al., 2019).

3 DATA SCIENCE: PRINCIPLES AND TOOLS

Data Science in education is a multidisciplinary approach to technologies, processes, and systems for extracting knowledge, understanding data, and supporting decision-making under uncertainty. Data science deals with mathematics, statistics, statistical modeling, signal processing, computer science and programming, database technologies, data modeling, machine learning, natural language processing, predictive analytics, visualization, etc. Data Science in education has two aspects of application, (i) the management and processing of data and (ii) analytical methods for analysis and modeling, and includes nine main steps (figure 1). The first aspect covers data systems and their preparation, including database facilities, data cleansing, engineering, visualization, monitoring, and reporting. The second aspect covers data analytics: data mining, machine learning, text analytics, probability theory, optimization, and visualization.

The basis of the learning process is the availability of relevant data of sufficient quality, appropriately organized for the task. Primary data often requires pre-processing. First of all, it is necessary to investigate whether the necessary data are available and how they can be obtained. The data search ends with the creation of a data set in which the collected data are brought together consistently. Data science has a wide range of tools for data evaluation and preparation, in particular for data mining, data manipulation (value conversion, data aggregation and reordering, table aggregation, splitting or merging of values, etc.) and data validation (checking formats, testing value ranges, and looking values up in tables of permitted values). The problem of missing values is solved with different analytical methods: simulation, insertion of default values, or statistical imputation. Data science provides broad opportunities for text analytics. In addition, the use of data science tools facilitates work with big data. The main approaches in Data Science are supervised learning models and unsupervised learning models.
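The validation and missing-value steps described above can be illustrated with a minimal sketch. It assumes a range check as the validation rule and mean imputation as the statistical method; the function name `validate_and_impute`, the field names and the records are hypothetical and do not come from the paper.

```python
from statistics import mean

def validate_and_impute(rows, field, low, high):
    """Treat missing or out-of-range values as invalid and fill them with the mean."""
    valid = [r[field] for r in rows
             if r[field] is not None and low <= r[field] <= high]
    fill = mean(valid)  # mean of the values that passed validation
    cleaned = []
    for r in rows:
        value = r[field]
        ok = value is not None and low <= value <= high
        cleaned.append({**r, field: value if ok else fill})
    return cleaned

# Hypothetical income records: None is a missing value, -5.0 fails the range check.
records = [
    {"id": 1, "income": 1200.0},
    {"id": 2, "income": None},
    {"id": 3, "income": -5.0},
    {"id": 4, "income": 1800.0},
]
cleaned = validate_and_impute(records, "income", low=0.0, high=1e6)
print([r["income"] for r in cleaned])  # [1200.0, 1500.0, 1500.0, 1800.0]
```

Mean imputation is only one of several possible choices; inserting a default value would replace `fill` with a constant.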

3.1 Supervised Learning Models

Supervised learning is a machine learning approach in which the model learns from labeled data. Supervised learning can address two types of tasks: regression and classification. The main difference between them is the type of variable predicted by the corresponding algorithm: in regression it is a continuous variable, and in classification it is a categorical variable. Many algorithms have been developed to solve these problems. Among the most common are linear regression, logistic regression, and decision trees.

Figure 1: Data Science process.

Linear regression. Regression analysis can be considered the basis of statistical research. This approach involves a wide range of algorithms for forecasting a dependent variable using one or more factors (independent variables). The advantages of this approach to modeling are the simplicity and clarity of the results and the speed of training and of producing the forecast. The disadvantage is that its precision is not always sufficiently high (since in economics and finance, linear relationships between variables are rare).
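As an illustration of forecasting a dependent variable from a single factor, the following sketch fits a line by ordinary least squares using the closed-form formulas. The data (advertising spend and sales) and the function name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one factor: y is approximated by b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (factor) and sales (dependent variable).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(ad_spend, sales)
forecast = b0 + b1 * 6.0  # forecast for a spend level outside the sample
print(round(b0, 2), round(b1, 2), round(forecast, 2))  # 1.09 1.97 12.91
```

The fitted slope can be read directly as the expected change in sales per unit of spend, which is the clarity of results mentioned above.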

Logistic regression is used when it is necessary to predict the outcome of a binary variable from a dataset of continuous or categorical variables. Situations where the dependent variable has more than two possible values can be modeled with a one-vs-all approach, which constructs a logistic classifier for each possible output, or with a one-vs-one approach, which constructs a logistic classifier for each possible pair of categories of the dependent variable. The dependence between the independent variables and the log-odds of the outcome in logistic regression is linear; the only difference from linear regression is the sigmoid function, which converts the linear result into a probability of belonging to a class within [0, 1]. The advantages and disadvantages of logistic regression follow from those of linear regression: the speed of the algorithm and the interpretability of the results on the one hand, and limited accuracy on the other. Logistic regression is often used to construct vote-counting models. An important factor in this is the interpretability of its results. The influence of each factor is clearly expressed by the magnitude of its coefficient b, which makes it possible to determine which factors influence the decision positively and to what extent.
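The role of the sigmoid function and of the coefficients b can be sketched as follows. The coefficients, the intercept and the two factors are hypothetical values chosen for illustration, not estimates from any data set.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, coefficients, intercept):
    """Probability of the positive class for one observation."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical fitted coefficients b for two scaled factors.
coefficients = [0.8, -1.5]
intercept = -0.2

p = predict_proba([1.0, 0.0], coefficients, intercept)

# exp(b) is the multiplicative change in the odds per unit increase of a factor:
# a value above 1 raises the odds, a value below 1 lowers them.
odds_ratios = [math.exp(b) for b in coefficients]
print(round(p, 3), [round(o, 3) for o in odds_ratios])
```

Reading the sign and size of each coefficient this way is what makes the model easy to interpret.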

A decision tree is an approach to both regression and classification. It is widely used in data mining. The decision tree consists of "nodes" and "branches". The tree nodes hold attributes that are used to make decisions. To make a decision, one descends from the root to the bottom of the tree. The sequence of attributes in a tree, as well as the values that split nodes into branches, depends on parameters such as the amount of information (entropy reduction) that the attribute adds for predicting the target variable. The advantages of decision trees are simplicity of interpretation, a closer resemblance to human decision-making than regression models, simplicity of visualization, and natural modeling of categorical variables (in regression models they must be coded with dummy variables). However, decision trees have one significant drawback: low predictive accuracy (James et al., 2013).
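How entropy determines the order of attributes in a tree can be shown with a small sketch. It assumes information gain as the splitting criterion; the loan applicant records and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in entropy of the target after splitting on an attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical loan applicants; "repaid" is the variable to predict.
applicants = [
    {"employed": "yes", "owns_home": "yes", "repaid": "yes"},
    {"employed": "yes", "owns_home": "no", "repaid": "yes"},
    {"employed": "no", "owns_home": "yes", "repaid": "no"},
    {"employed": "no", "owns_home": "no", "repaid": "no"},
]
gains = {a: information_gain(applicants, a, "repaid")
         for a in ("employed", "owns_home")}
root = max(gains, key=gains.get)  # attribute placed at the top of the tree
print(gains, root)
```

Here "employed" separates the classes perfectly and "owns_home" not at all, so the former would be chosen as the first split.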

3.2 Unsupervised Learning

Unsupervised learning describes a more complex situation in which, for each observation $i = 1, \ldots, n$, we observe a vector of measurements $x_i$ but no associated output variable $y_i$. With such data, the construction of linear or logistic regression models is impossible, since there is no response variable to predict. In such a situation, a so-called "blind" analysis is conducted. Such a task belongs to the class of unsupervised learning tasks because there is no output variable to guide the analysis. Unsupervised


learning algorithms can be divided into dimensionality reduction algorithms and clustering algorithms. The main task of clustering is to find patterns in the data that allow the data to be divided into groups, which can then be analyzed and interpreted.

K-means is one of the most popular clustering algorithms. Its main task is to divide n observations into k clusters so as to minimize the sum of squared distances from each observation to the center of its cluster. The algorithm is iterative: at each step the cluster centers are recomputed and the observations are reassigned among them until a stable result is achieved. The benefits of this clustering algorithm are its simplicity, speed, and ability to process large amounts of data. Its drawbacks are that the user must specify the number of clusters before computing, and that the result is unstable (it depends on the initial assignment of points to clusters).
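The iterative assign-and-recompute loop can be sketched in a few lines. This is a minimal illustration with hand-picked starting centers and hypothetical two-dimensional points; practical implementations usually add random restarts to reduce the dependence on initialization.

```python
def kmeans(points, centers, max_iter=100):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:  # stable result reached
            break
        centers = new_centers
    return centers, clusters

# Hypothetical observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[(1.0, 1.0), (8.0, 8.0)])
print(centers)
```

The number of clusters is fixed by the length of the initial `centers` list, which reflects the requirement that k be chosen in advance.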

Hierarchical clustering is an alternative approach to clustering that does not require the number of clusters to be determined in advance. Moreover, hierarchical clustering ensures the stability of the result and gives an attractive visualization of the output based on a tree-like structure of observations and clusters, the dendrogram. This clustering algorithm can use different distance metrics and cluster agglomeration (linkage) criteria, which makes it very flexible with respect to the data being clustered. However, the disadvantage of hierarchical clustering is the need to calculate the matrix of distances between observations before agglomeration, which complicates the application of this algorithm to large data sets and to data with many dimensions.

Time series analysis. A time series is built from observations collected at a fixed interval, such as daily demand, monthly profit growth rates, number of flights, etc. Time series analysis plays an important part in data analysis across many areas, from the analysis of exchange rates to sales forecasting (Nehrey and Hnot, 2017; Voronenko et al., 2021). One of the tasks of time series analysis is the extraction of trend and seasonal components and the construction of a forecast. Many algorithms have been developed for this; here we consider two models, ARIMA and Prophet.

The ARIMA algorithm is one of the most common algorithms for forecasting time series. The basic idea is to use the previous values of the time series to predict future ones. Any number of lags can be used, which makes this approach difficult to tune: the parameters must be selected so as to minimize the error without overfitting the model. ARIMA is often used for short-term forecasting. A disadvantage is the difficulty of fitting a model when there are multiple seasonal patterns.
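The idea of predicting future values from previous ones can be illustrated with the simplest autoregressive case. The sketch below fits an AR(1) model with one lag by least squares; it is not a full ARIMA implementation (no differencing and no moving-average terms), and the demand series is hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    phi = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
           / sum((a - mean_x) ** 2 for a in x))
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, c, phi, steps):
    """Iterate the fitted recursion forward from the last observed value."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Hypothetical daily demand that follows y[t] = 7 + 0.5 * y[t-1] exactly.
demand = [10.0, 12.0, 13.0, 13.5, 13.75]
c, phi = fit_ar1(demand)
print(c, phi, forecast(demand, c, phi, steps=2))
```

Adding more lags means estimating more coefficients from the same data, which is where the risk of overfitting mentioned above comes from.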

The Prophet algorithm was developed by Facebook at the beginning of 2017 for forecasting based on time series (Nehrey and Hnot, 2017). It is based on an additive model in which nonlinear trends are combined with annual and weekly seasonality. This approach also allows holidays and weekends to be modeled, which helps to explain deviations that would otherwise remain in the residuals of a time series. Prophet is also robust to missing values, shifts in the trend, and large outliers, which is an important advantage over ARIMA. Another advantage is the rather high speed of training, as well as the ability to handle large-scale time series.

4 TOPIC MODELING IN DATA SCIENCE

By text mining in natural language we mean the application of methods for the computer analysis and representation of texts in order to achieve a quality comparable to "manual" processing, for further use in various tasks and applications. One of the topical tasks of automatic text mining is topic modelling.

4.1 Latent Dirichlet Allocation

Topic modelling is a statistical approach to extracting the hidden semantics that occur in a collection of documents or reviews. The Latent Dirichlet Allocation (LDA) model proposed by (Blei et al., 2003) is one of the most notable approaches to unsupervised topic modelling; it assumes that documents and the words within them are derived from a "generative probabilistic model". Within the class of unsupervised statistical topic models, themes are defined as distributions over a vocabulary of words that represent semantically interpretable "topics" (Roberts et al., 2014). The 'meaning' of those topics (usually expressed as a topic Label and a topic Description) is an emergent quality of the relationships between words (Robinson, 2019; DiMaggio et al., 2013). Recognizing the meaning of a topic is often fraught with difficulty and requires a triangulation approach, namely: (i) a literature review of existing topics found in the analyzed problem domain; (ii) independent work of experts on assigning labels to topics; (iii) joint expert discussions in order to compare and revise the obtained labelling results.


Figure 2: Latent Dirichlet allocation model (Blei, 2012).

The main assumptions of the LDA method are the following (Roberts et al., 2016): (i) each document is represented as a mixture of topics; (ii) each topic is present in many documents; (iii) each word within a given document belongs to exactly one topic; (iv) each document can be represented as a vector of proportions that denote what fraction of its words belongs to each topic.

The basic LDA model is shown in ﬁgure 2.

Figure 2 serves as a visual explanation of the model and can be described as follows: (i) we have $D$ documents and $K$ topics; (ii) each topic $k$ is represented by $\beta_k$, a distribution over the words of the vocabulary within topic $k$; (iii) each document $d$ is represented by the topic proportions $\theta_d$, where $\theta_{d,k}$ is the topic proportion for topic $k$ in document $d$. Finally, we have (iv) for the $n$-th word in document $d$, the topic assignment $z_{d,n}$ (which depends on the per-document topic proportions $\theta_d$) and (v) for each document $d$, the observed words $w_{d,n}$, each of which is an element of the fixed vocabulary (and depends on the topic assignment $z_{d,n}$ and all of the topics $\beta_{1:K}$) (Blei, 2012).

It is evident that data scientists, in cooperation with other scientific domains, increasingly seek ways to apply NLP, and especially LDA topic modelling techniques, to extract, organize, recognize, label and classify customers' opinions and experiences (Kobayashi et al., 2018a). The following examples demonstrate the possibilities of applying LDA topic modelling to: (i) human resources management, (ii) service quality assessment, (iii) research & development policy coordination tasks and (iv) strategic planning in universities.

Kobayashi et al. (2018b) used topic modelling to summarize worker attributes, find worker attribute constructs, and use these to cluster jobs. In total, 140 main topics were identified, covering skills such as interpersonal communication (vocabulary of words: communication, written, oral, verbal, interpersonal, presentation, effective, listening); analytical and problem-solving skills (vocabulary of words: problem, solving, analytical, solver, troubleshooting, approach, abilities, capabilities); data analytical skills (vocabulary of words: data, analysis, quantitative, research, statistics, economics, statistical, modeling); willingness to travel and the ability to operate on a flexible work schedule (vocabulary of words: travel, willingness, willing, work, time, needed, internationally, international); and others. As the authors mention, topic modelling showed that it is possible not only to classify job information from vacancies but also to derive the behavioral characteristics that employers value or require from potential or existing job holders. Moreover, as a further analysis, the authors planned to analyse trends in the worker attributes required by organizations (i) over time, (ii) across occupations and companies, and (iii) across geographical regions, as well as (iv) the possibility of building a network of work activities to examine relationships among tasks.

Wallace et al. (2014) and Sharma et al. (2016) captured the main positive and negative words within latent aspects (topics) that characterise interpersonal manner, technical competence, and systems issues (López et al., 2012) from online physician reviews. Similarly to the previous work, James et al. (2017), based on the López et al. (2012) categorization, examined unstructured textual feedback on physicians in order to determine: (i) how the extracted sentiment and topics compared to traditionally identified dimensions of service quality in healthcare and (ii) what tone and topic elements were driving patients' service quality ratings. The main finding was the following list of topics and their tone: (1) Negative system quality: Staff and Timeliness (vocabulary of words: office, staff, time, doctor, wait, appointment); (2) Positive interpersonal quality: Physician Compassion (vocabulary of words: doctor, caring, great, knowledgeable, excellent, recommend); (3) Negative system quality: Experience (vocabulary of words: told, don't, doctor, ask, bad, money, call); (4) Positive technical quality: Family (vocabulary of words: doctor, questions, staff, practice, children, son, pregnancy); (5) Positive technical quality: Surgery (vocabulary of words: surgery, pain, procedure, staff, hospital, knee, cancer, age); (6) Negative technical quality: Diagnosis (vocabulary of words: years, treatment, medical, patient, conditions, test, diagnosis, time, treated). The obtained results allowed the authors to establish how strongly the identified aspects (topics) influence the general perception of the physician's quality, as well as the behavioural characteristics of patients when choosing a doctor online, depending on the content of comments and the overall rating.

4.2 Structural Topic Modelling

When conducting research on the basis of textual documents or customer comments, researchers often have more information "about the text" than "about the content of the text". From the perspective of topic modelling as a statistical approach, the existence of such information "about the text" (metadata) allows and motivates the inclusion in the model of additional covariates that could influence the following components of the topic model: (1) The proportion of the document devoted to the topic ("prevalence of the topic"). For example, we may find that "clients who buy products online are more likely to talk about delivery problems than clients who buy offline". (2) The word rates used in discussing the topic ("topical content"). For example, we may find that "when talking about delivery problems, clients who buy products online are more likely to discuss problems with returning products, whereas clients who buy offline are more likely to discuss staff rudeness issues" (Roberts et al., 2019). Such possibilities are offered by Structural Topic Modelling (STM) as an extension of the LDA framework (Robinson, 2019; Roberts et al., 2019, 2013).

Drawing analogies with LDA: (i) each document in STM arises as a mixture over $K$ topics; (ii) the topic proportions ($\theta_d$) can be correlated (which LDA does not allow); (iii) the topic prevalence $\theta_d$ can be influenced by a set of covariates $X$ through a standard regression model with covariates; (iv) for each word $w_{d,n}$ in document $d$, (v) a topic $z_{d,n}$ is drawn from the document-specific distribution, and (vi) conditional on that topic, a word is chosen from a multinomial distribution over words parameterized by $\beta_{d,k,v}$, where $k = z_{d,n}$. This distribution can include a second set of covariates $Y$ (Roberts et al., 2019). Thus, the main difference between the LDA and STM models (figure 3) is that the prevalence (content) parameters, which in LDA are determined by the general a priori Dirichlet parameters $\alpha$ ($\eta$), are replaced in the STM model with prior structures specified in the form of generalized linear models parameterized by document-specific covariates $X$ ($Y$) (Hu et al., 2019). These covariates inform either the topical prevalence (covariates $X$) or the topical content (covariates $Y$) latent variables with information "about the text" (metadata).
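How a prevalence covariate can shift topic proportions may be sketched as follows. This is a simplified illustration, not the estimation procedure of the STM package: it assumes a linear predictor per topic with independent Gaussian noise, mapped to proportions with a softmax, whereas STM uses a logistic-normal prior with a full covariance matrix. The coefficient matrix `gamma`, the covariate "bought_online" and the topic names are hypothetical.

```python
import math
import random

def softmax(scores):
    """Convert real-valued scores into proportions that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_prevalence(x, gamma, sigma, rng):
    """Topic proportions for one document given its covariate vector x."""
    eta = [sum(g * xi for g, xi in zip(gamma_k, x)) + rng.gauss(0.0, sigma)
           for gamma_k in gamma]
    return softmax(eta)

# Covariate vector: [intercept, bought_online]; one row of gamma per topic.
gamma = [
    [0.0, 1.2],  # topic 0, "delivery problems": more prevalent for online buyers
    [0.0, 0.0],  # topic 1, reference topic
]
rng = random.Random(1)
online = topic_prevalence([1.0, 1.0], gamma, 0.0, rng)   # noise switched off
offline = topic_prevalence([1.0, 0.0], gamma, 0.0, rng)
print(online, offline)
```

With the noise switched off, the only difference between the two documents is the covariate, so the larger share of topic 0 for the online buyer is entirely the covariate effect.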

5 EXAMPLE OF STRUCTURAL MODELLING ALGORITHMS APPLICATION IN EDUCATION

In order to study customer perception of the quality of services, assess customer satisfaction with goods or services received, and identify factors that influence customer acceptance of new offers on the market, students were asked to use STM tools. The data source was 610 textual comments about hospitals from the site http://www.ratemyhospital.ie/ covering two years (2018–2019). The STM package allows additional variables to be used, which demonstrates the power of metadata for topic modelling. With this aim, the textual comment data were extended with information about (1) hospital ownership (private or public) and (2) sentiment (positive or negative) (table 1) (Ojo and Rizun, 2020). After that, all steps of text pre-processing were performed.

First, the STM model setup was performed. To determine the optimal number of topics, STM models with 10 to 30 topics were built and analyzed. Semantic coherence is maximized when the most probable words in a given topic frequently co-occur, and it is a metric that correlates well with human judgment of topic quality. High semantic coherence is relatively easy to achieve, though, with only a few topics dominated by very common words, so we looked at both the semantic coherence and the exclusivity of words to topics. The preferred number of topics should therefore yield topics that are both very coherent and very exclusive. Looking at figure 4, we conclude that 15 topics suit these criteria best. Most of the topics in this case are above the average exclusivity and have high coherence, especially compared to models with other numbers of topics, whose topics are often spread out on both axes. The 15-topic STM model was selected based on a subjectively optimal combination of the average semantic coherence and exclusivity outcomes.
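The notion of semantic coherence can be made concrete with a small sketch. It assumes the document co-occurrence score of Mimno et al. (2011), which sums log((D(v_i, v_j) + 1) / D(v_j)) over pairs of top words, where D counts the documents containing the given words; the tokenized comments below are hypothetical.

```python
import math

def semantic_coherence(top_words, documents):
    """Co-occurrence coherence of a topic's top words over a corpus."""
    doc_sets = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            # Assumes every top word occurs in at least one document.
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j])
            )
    return score

# Hypothetical tokenized comments.
docs = [
    ["staff", "kind", "nurse"],
    ["staff", "kind", "doctor"],
    ["wait", "hour", "staff"],
    ["wait", "hour", "trolley"],
]
coherent = semantic_coherence(["staff", "kind"], docs)   # words that co-occur
mixed = semantic_coherence(["staff", "trolley"], docs)   # words that never co-occur
print(coherent, mixed)
```

Scores closer to zero indicate more coherent topics. Because very common words co-occur with almost everything, this score alone favors generic topics, which is why it is read together with exclusivity.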

Figure 3: A graphical illustration of the structural topic model (Roberts et al., 2016).

Table 1: Comments before pre-processing.

| Comments | Hospital Ownership | Sentiment |
|---|---|---|
| A lovely friendly patient-focussed hospital | Public | Positive |
| Consultant I found seriously lacking compassion for my mother the patient. Sniggered while informing us that while my mother's condition is uncomfortable, it is not life threatening. To be frank, consultant spoke down to us. | Public | Negative |
| Tullamore is a very clean hospital and looks very well. All staff I had the pleasure of meeting were lovely and very professional at all times. The staff in all capacities do not receive enough thanks for the jobs they do | Private | Positive |

As a result, for the 15-topic model, we obtained (i) the topic-word distributions β; (ii) the document-topic proportions θ; (iii) lists of Highest probability, FREX, Lift and Score keywords (Highest Prob: the words within each topic with the highest probability; FREX: words that are both frequent and exclusive, identifying words that distinguish topics; Lift: gives more weight to words that appear less frequently in other topics, by dividing their frequency by their frequency in other topics; Score: words weighted by dividing the log frequency of the word in the topic by the log frequency of the word in other topics (Roberts et al., 2013; Chang, 2015; Griffiths and Steyvers, 2004)); (iv) the set of documents most associated with each topic. Figure 5 shows the share of the different topics in the overall corpus.

Second, students needed to carry out the topic labelling step. For that: (1) two students independently labelled the topics to produce the first version of labels based on the top weighted keywords; (2) the two students discussed the labels and resolved discrepancies in labelling; (3) the two students independently refined the topic labels based on computationally guided deep reading of the 20 most representative comments for each topic; (4) the two students agreed on the final 15 topic labels and jointly developed the topic descriptions (short summaries of the topic content) (Ojo and Rizun, 2020). The result of topic labelling is presented in table 2.

Third, the STM covariate analysis could be performed. At this stage, we aimed to evaluate the effect of Sentiment on the formation of more positively and more negatively oriented aspects of hospital service quality (HSQ). Thus, we use the Sentiment metadata as a covariate in the STM model. Formally, we can identify an aspect as negative if, according to the results of effect estimation, the proportion of this aspect in negative comments (Sentiment = Negative) is significantly higher than in positive comments (Sentiment = Positive). According to the results of our experiment, 5 topics (33.33%) are positive (right side of figure 6), and 10 topics (66.67%) are negative (left side of figure 6).

Figure 4: Semantic coherence and exclusivity of STM models.

Figure 5: Expected topic proportions over corpus.

The dots in figure 6 indicate the mean values of the estimated proportion differences (power of influence, PI) with 95% confidence intervals, which allows us to evaluate the relative degree of influence of sentiment on hospital service quality aspects. For example, the five most negative topics are (1) Information Exchange with Patient/Family (Topic 13), with the highest power of negative influence; (2) Communication Skills (Topic 2); (3) A&E/Admission (Topic 12); (4) Waiting Time (Topic 4) and (5) Patient-Focusing Service (Topic 6). In turn, the two most positive topics are (1) Service Rapidness (Topic 14) and (2) Personnel Reliability/Treatment (Topic 8). Knowledge about the topics on which comment Sentiment has a positive or negative impact makes it possible to indicate the strength of patient satisfaction or dissatisfaction with hospital service quality.

Fourth, the power of Time influence on the dynamics of positive and negative Topics (from 2018 to 2019) should be evaluated using the STM model (with Year and Sentiment as Covariates). In terms of the


Table 2: Topic labels.

| No | Topic label | Topic keywords | Topic proportion, % |
|---|---|---|---|
| 1 | Appointment Time Reliability | time, service, wait, appoint, nurses, clinic, profession | 4.47 |
| 2 | Communication Skills | nurses, rude, hospital, patient, found, staff, ward | 6.34 |
| 3 | Service Standards | hospital, consult, year, many, staff, standard, old | 9.45 |
| 4 | Waiting Time | staff, hospital, member, given, sever, hour, time | 3.03 |
| 5 | Staff Feedback/Explanation | staff, kind, time, patient, depart, great, explain | 8.09 |
| 6 | Patient-Focusing Service | ask, hospital, doctor, day, told, week, care | 2.56 |
| 7 | Maternity Unit/Care | baby, doctor, midwife, time, inform, midwife, week | 2.89 |
| 8 | Personnel Reliability/Treatment | scare, staff, receive, excel, thank, ward, treatment | 11.81 |
| 9 | Food Service | hospital, staff, need, food, poor, good, doctor | 8.10 |
| 10 | Hospital Environment | hospital, mother, conditions, room, week, inform, doctor | 4.48 |
| 11 | Care and Recovery | nursed, care, good, great, love, doctor, patient | 9.29 |
| 12 | A&E/Admission | pain, hospital, appoint, staff, still, patient, never | 5.37 |
| 13 | Information Exchange with Patient/Family | hour, doctor, wait, told, seen, blood, home | 9.99 |
| 14 | Service Rapidness | hospital, staff, well, profession, attend, efficiency, visit | 8.31 |
| 15 | Ward/Hospital's Facilities | patient, staff, trolley, corridor, time, ward, hospital | 5.82 |

Figure 6: Difference in the power of Sentiment inﬂuence on topic proportion.

Influence of the Time Factor on the Service Quality, the following four groups of HSQ Topics can be distinguished: (1) Topics causing growth in patient satisfaction with the HSQ over time: positive topics with a positive dynamic over time; (2) Topics causing a recession in patient satisfaction with the HSQ over time: positive topics with a negative dynamic over time; (3) Topics causing growth in patient dissatisfaction with the HSQ over time: negative topics with a positive dynamic over time; (4) Topics causing a recession in patient dissatisfaction with the HSQ over time: negative topics with a negative dynamic over time.

As an indicator that allows us to identify the direction and growth rate (GR) of change in the level of positive or negative comments describing a Topic, the slope of the regression (the dependence between the proportion of Positive/Negative Aspects and Time) will be used. The four charts presented in figure 7 (a, b, c, d) show examples of the four possible types of Influence of the Time Factor on the Service Quality:

1. Positive impact on Service Quality over time: Service Rapidness topic is characterized by growth (GR=1.100763) of patient satisfaction with the HSQ over time (figure 7, b);

2. Worsening of Service Quality over time: Personnel Reliability/Treatment topic is characterized by a recession (GR=0.821713) in patient satisfaction with the HSQ over time (figure 7, a);

3. Negative impact on Service Quality over time:


Figure 7: Examples of identifying the influence of the Year metadata.

Information Exchange with Patient/Family topic is characterized by growth (GR=1.758421) of patient dissatisfaction with the HSQ over time (figure 7, d);

4. Improvement of Service Quality over time: Food Service topic is characterized by a recession (GR=0.575861) in patient dissatisfaction with the HSQ over time (figure 7, c).
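The four types listed above can be expressed as a small classification rule. The sketch below is only an illustration: the function name is hypothetical, and it assumes, based on the reported values, that GR greater than 1 indicates growth and GR less than 1 indicates recession (a GR of exactly 1 is treated as no growth). The sentiment labels and GR values are those given for the four example topics.

```python
def time_influence(sentiment, gr):
    """Map a topic's sentiment ("positive"/"negative") and growth rate (GR)
    to one of the four groups. Assumes GR > 1 means growth over time."""
    growing = gr > 1.0
    if sentiment == "positive":
        return "Positive impact" if growing else "Worsening"
    if sentiment == "negative":
        return "Negative impact" if growing else "Improvement"
    raise ValueError(f"unknown sentiment: {sentiment!r}")


# Sentiment labels and GR values as reported for the four example topics.
examples = {
    "Service Rapidness": ("positive", 1.100763),
    "Personnel Reliability/Treatment": ("positive", 0.821713),
    "Information Exchange with Patient/Family": ("negative", 1.758421),
    "Food Service": ("negative", 0.575861),
}
groups = {topic: time_influence(s, gr) for topic, (s, gr) in examples.items()}
for topic, group in groups.items():
    print(f"{topic}: {group}")
```

Students could extend the dictionary with the remaining topics to reproduce the grouping of all aspects.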

As a result, students could see that the largest group of aspects (37.5%) has a negative impact on the HSQ. The highest growth in patient dissatisfaction is shown by the A&E/Waiting Time topic. Moreover, this growth rate is the largest not only in the Negative impact category but among all analyzed topics. The most rapid decrease in the number of positive comments (within the whole set of topics) is shown by the Maternity Unit/Care aspect. The group of topics for which an improvement in quality is noted accounts for 25.1%; within it, Hospital Environment shows the highest rate of improvement. Finally, 16.7% of topics have a positive effect on the HSQ, among which Service Rapidness and Maternity Unit/Treatment have the largest increase in the number of positive comments.

Fifth, students may identify the influence of Hospital Ownership on more positively and more


Figure 8: Difference in the power of Hospital Ownership inﬂuence on Topic Proportion.

negatively oriented HSQ aspects (using the Sentiment and Hospital Ownership factors as Covariates in the STM model). For this purpose, the following interpretation of the results could be proposed: (1) the Topics more related to Public Hospital Ownership are those whose proportion in comments about Public hospitals (Hospital Ownership = Public) is, according to the results of effect estimation, significantly higher than in comments about Private hospitals, and vice versa; (2) the direction (positive or negative) of the influence of Hospital Ownership on HSQ. To reach the first purpose, the Hospital Ownership effect estimation was performed to reveal the aspects whose proportion in comments about Public hospitals (Hospital Ownership = Public) is significantly higher than in comments about Private hospitals, and vice versa.

To formalize the rules for reaching the second purpose, in terms of discovering the Influence of Hospital Ownership on the Service Quality, the following groups of aspects are proposed: (1) Topics causing growth in the level of patient satisfaction with Service Quality in Public hospitals: positive topics with a positive dynamic from Private to Public; (2) Topics causing growth in the level of patient satisfaction with Service Quality in Private hospitals: positive topics with a positive dynamic from Public to Private; (3) Topics causing growth in the level of patient dissatisfaction with Service Quality in Public hospitals: negative topics with a positive dynamic from Private to Public; (4) Topics causing growth in the level of patient dissatisfaction with Service Quality in Private hospitals: negative topics with a positive dynamic from Public to Private.
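As a hedged illustration of these grouping rules, the sketch below combines a topic's sentiment with the confidence interval of an ownership effect. The function name, the sign convention (Public minus Private) and the interval bounds are assumptions made for this example and are not taken from the study.

```python
def ownership_group(sentiment, effect_low, effect_high):
    """Assign a topic to one of the ownership groups.

    effect_low/effect_high bound the estimated difference in topic
    proportion, Public minus Private (sign convention assumed here).
    """
    if sentiment not in ("positive", "negative"):
        raise ValueError(f"unknown sentiment: {sentiment!r}")
    if effect_low > 0:
        owner = "Public"
    elif effect_high < 0:
        owner = "Private"
    else:
        # Interval contains zero: no significant difference between owners.
        return "no clear ownership association"
    feeling = "satisfaction" if sentiment == "positive" else "dissatisfaction"
    return f"growth of patient {feeling} in {owner} hospitals"


# Hypothetical interval bounds, not estimates from the study.
print(ownership_group("positive", 0.01, 0.04))
print(ownership_group("negative", -0.05, -0.02))
print(ownership_group("negative", -0.01, 0.02))
```

A topic whose interval contains zero is left unassigned in this sketch, which corresponds to a topic that characterizes both types of hospitals.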

According to the results of our experiment, 8 Topics are more associated with Public Hospitals (right side of figure 8), 6 Topics are more associated with Private Hospitals (left side of figure 8), and one topic (Topic 13) is associated with both types of hospitals. Based on these results, we can conclude that the four topics (one positive and three negative) that most characterize Public Hospital Ownership are (1) Service Rapidness (positive); (2) Food Service (negative); (3) Maternity Unit/Care (negative); and (4) Patient-Focusing Service (negative). In turn, the four Aspects (two positive and two negative) that most characterize Private Hospital Ownership are (1) Appointment Time Reliability (negative); (2) Service Standards (positive); (3) Staff Feedback/Explanation (positive); and (4) Hospital Environment (negative).

Thus, this example of the use of STM modeling in teaching students shows how versatile and in-depth research can be carried out using data science. The presented examples demonstrate the nature of tasks and approaches which could develop students' technical and research skills in public perception analysis. Such approaches also allow students to gain practical experience in the study and interpretation of the influence of additional metadata, characterizing the comment authors, on differences in their opinions about events, companies, goods, and services.

6 DATA SCIENCE STUDY PROGRAMS IN ECONOMICS FIELD

Classical methods of statistical analysis, modeling

methods, and data mining are used in economics. The

analysis of data in these areas is aimed at the study of


Table 3: Data Science courses and programs for economics at top-20 universities.

| University | Location | Programs, courses |
|---|---|---|
| Massachusetts Institute of Technology (MIT) | United States | MicroMasters Program in Data, Economics, and Development Policy; Computer Science, Economics and Data Science – course |
| Stanford University | United States | M.S. in Statistics: Data Science; Tackling Big Questions Using Social Data Science – course |
| Harvard University | United States | Data Science for Business – course; Using Big Data to Solve Economic and Social Problems – course |
| California Institute of Technology | United States | Business Analytics – course |
| University of Oxford | United Kingdom | MSc in Social Data Science |
| ETH Zurich - Swiss Federal Institute of Technology | Switzerland | Data Science in Techno-Socio-Economic Systems – course |
| University of Cambridge | United Kingdom | Economics: Data Science and Policy – course |
| Imperial College London | United Kingdom | MSc Business Analytics |
| University of Chicago | United States | Economic Policy Analysis – course |
| UCL | United Kingdom | Economics and Statistics BSc; Social Sciences with Data Science BSc |
| National University of Singapore | Singapore | Master of Science in Business Analytics |
| Princeton University | United States | Statistics and Machine Learning – course |
| Nanyang Technological University | Singapore | Master of Science in Analytics |
| EPFL | Switzerland | Master's program in Data science |
| Tsinghua University | China (Mainland) | Master's Program in Data Science |
| University of Pennsylvania | United States | Master of Information Systems Management, Business Intelligence and Data Analytics; MS in Information Technology, Business Intelligence and Data Analytics; Online Master of Science in Business Analytics |
| Yale University | United States | Applied Econometrics: Politics, Sports, Microeconomics; Applied Econometrics: Macroeconomic and Finance Forecasting |
| Cornell University | United States | Introduction to Data Science – course |
| Columbia University | United States | Data Science for Social Good – summer program |
| The University of Edinburgh | United Kingdom | Statistics with Data Science MSc |

causation. In economics, current issues include policy development, determining the impact of a decision, long-term and short-term planning and forecasting, choosing the best solution from many possible ones, and many others. Drawing conclusions is also important in economics. In addition, the modern economy and finance are characterized by the use of big data, so it is not always possible to use classical methods. Therefore, data science methods are precisely the methods that should be used in economics, where they give positive results. Data Science methods were first used in economic research and gradually penetrated into practice. Today, economics needs specialists who have knowledge in these areas and are able to apply Data Science methods. In response to this market need, universities have begun to implement Data Science courses and programs for students of economics. Table 3 presents the courses and programs of the top 20 universities in the world.

A review of study programs in the economic field at Ukrainian universities has shown that Data Science courses and programs are still being introduced in Ukraine. Currently, there are separate programs for studying Data Science, mainly for computer science. Therefore, we believe that the prospects that Data Science opens for modern economists necessitate the introduction of


courses and programs in Data Science.

7 CONCLUSIONS

Every year, data science extends to more and more areas and is used in various fields of research, in society, and in business processes. Businesses and governments are making huge investments in this area. Education, accordingly, must keep pace with the times and teach students new modern technologies. However, as our research has shown, the teaching of Data science to economists is at an early stage. Leading universities are gradually introducing new courses and programs to study Data science in economics, but this phenomenon has not yet become widespread and needs to be developed.

As an example of the implementation of Data science methods, we have shown the use of STM modeling in teaching students. The application of such approaches promotes the development of students' technical and research skills, demonstrates work with big data, and allows students to gain experience in studying and interpreting the influence of additional metadata, characterizing the comment authors, on differences in their opinions about events, companies, goods, and services.

The described methods and algorithms are just some of the basics of modeling and analysis of economic processes. There are many examples of how all these methods can be used in education. For example, using time series analysis, we could predict the future value of a cryptocurrency; using regression models, we could determine customer loyalty or the likelihood of customer insolvency; and so on. Today, there are many more algorithms that can be applied in economics.

Education must meet the modern development of the digital economy, digital society, innovation, and creative entrepreneurship. The use of Data science in education should be multi-platform, i.e., used not only in the study of a single subject but also in the teaching of all subjects, in the interaction of students with each other and with teachers and real experts, in research, and in individual learning.

REFERENCES

Ausloos, M., Cerqueti, R., and Mir, T. A. (2017). Data

science for assessing possible tax income manipula-

tion: The case of Italy. Chaos, Solitons & Fractals,

104:238–256.

Babenko, V., Panchyshyn, A., Zomchak, L., Nehrey, M.,

Artym-Drohomyretska, Z., and Lahotskyi, T. (2021).

Classical machine learning methods in economics re-

search: Macro and micro level examples. WSEAS

Transactions on Business and Economics, (18):209–

217.

Balabay, R. and Chernonog, E. (2007). Alteration on the

surface of the pore walls of the porous silicon under

high temperature ageing: Computer simulation. Radi-

ation Measurements, 42(4-5):739–741.

Bielinskyi, A., Khvostina, I., Mamanazarov, A., Matviy-

chuk, A., Semerikov, S., Serdyuk, O., Solovieva, V.,

and Soloviev, V. (2021). Predictors of oil shocks.

Econophysical approach in environmental science.

IOP Conference Series: Earth and Environmental Sci-

ence, 628(1):012019.

Blei, D. M. (2012). Probabilistic topic models. Communi-

cations of the ACM, 55(4):77–84.

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). La-

tent dirichlet allocation. Journal of Machine Learning

Research, 3:993–1022. https://www.jmlr.org/papers/

volume3/blei03a/blei03a.pdf.

Brooks, C., Hoepner, A. G. F., McMillan, D., Vivian, A.,

and Simen, C. W. (2019). Financial data science:

the birth of a new ﬁnancial research paradigm com-

plementing econometrics? The European Journal of

Finance, 25(17):1627–1636.

Brunner, R. J. and Kim, E. J. (2016). Teaching data science.

Procedia Computer Science, 80:1947–1956.

Cao, L. (2017). Data science: a comprehensive overview.

ACM Computing Surveys, 50(3):1–42.

Chang, J. (2015). lda: Collapsed Gibbs sampling methods

for topic models. https://rdrr.io/cran/lda/.

Chen, H., Chiang, R. H. L., and Storey, V. C. (2012).

Business intelligence and analytics: From big data

to big impact. MIS Quarterly, 36(4):1165–1188.

http://www.jstor.org/stable/41703503.

Danylchuk, H., Chebanova, N., Reznik, N., and Vitkovskyi,

Y. (2019). Modeling of investment attractiveness

of countries using entropy analysis of regional stock

markets. Global Journal of Environmental Science

and Management, 5(Special Issue):227–235. https:

//www.gjesm.net/article 35558.html.

De Prado, M. L. (2018). Advances in ﬁnancial machine

learning. John Wiley & Sons.

Derbentsev, V., Matviychuk, A., Datsenko, N., Bezko-

rovainyi, V., and Azaryan, A. (2020). Machine learn-

ing approaches for ﬁnancial time series forecasting.

CEUR Workshop Proceedings, 2713:434–450.

DiMaggio, P., Nag, M., and Blei, D. (2013). Exploiting

afﬁnities between topic modeling and the sociolog-

ical perspective on culture: Application to newspa-

per coverage of US government arts funding. Poetics,

41(6):570–606.

Dimitrov, I., Davydenko, N., Lotko, A., and Dimitrova, A.

(2019). Comparative study of main determinants of

entrepreneurship intentions of business students. In

2019 International Conference on Creative Business

for Smart and Sustainable Growth (CREBUS), pages

1–4. IEEE.


George, G., Osinga, E. C., Lavie, D., and Scott, B. A.

(2016). Big data and data science methods for man-

agement research. The Academy of Management Jour-

nal, 59(5):1493–1507.

Grifﬁths, T. L. and Steyvers, M. (2004). Finding scientiﬁc

topics. Proceedings of the National Academy of Sci-

ences, 101(suppl 1):5228–5235.

Guryanova, L., Bogachkova, L., Zyma, O., Novosel, M.,

Poluektova, N., and Gvozdytskyi, V. (2020a). Mod-

els of estimation and analysis of a systemic risk in

the banking sector. In 2020 IEEE 2nd International

Conference on System Analysis Intelligent Computing

(SAIC), pages 1–6.

Guryanova, L., Yatsenko, R., Dubrovina, N., and Babenko,

V. (2020b). Machine learning methods and models,

predictive analytics and applications. CEUR Work-

shop Proceedings, 2649:1–5.

Hrabovskyi, Y., Babenko, V., Al’Boschiy, O., and Gerasi-

menko, V. (2020). Development of a Technology for

Automation of Work with Sources of Information on

the Internet. WSEAS Transactions on Business and

Economics, 17(25):231–240.

Hu, N., Zhang, T., Gao, B., and Bose, I. (2019). What

do hotel customers complain about? text analysis

using structural topic model. Tourism Management,

72:417–426.

Ignatyuk, A., Liubkina, O., Murovana, T., and Magome-

dova, A. (2020). FinTech as an innovation challenge:

From big data to sustainable development. E3S Web

of Conferences, 166:13027.

Ilchuk, M., Davydenko, N., and Nehoda, Y. (2019). Sce-

nario modeling of ﬁnancial resources at the enterprise.

Intellectual Economics, 13(2).

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013).

An introduction to statistical learning: with Applica-

tions in R, volume 112 of Springer Texts in Statistics.

Springer, New York.

James, T. L., Calderon, E. D. V., and Cook, D. F. (2017).

Exploring patient perceptions of healthcare service

quality through analysis of unstructured feedback. Ex-

pert Systems with Applications, 71:479–492.

Kaminskyi, A., Nehrey, M., and Rizun, N. (2020). The im-

pact of COVID-induced shock on the risk-return cor-

respondence of agricultural ETFs. CEUR Workshop

Proceedings, 2713:204–218.

Klymenko, N., Nosovets, O., Sokolenko, L., Hryshchenko,

O., and Pisochenko, T. (2019). Off-balance

accounting in the modern information sys-

tem of an enterprise. Academy of Account-

ing and Financial Studies Journal, 23(2).

https://www.abacademies.org/articles/offbalance-

accounting-in-the-modern-information-system-of-an-

enterprise-8403.html.

Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihok, G.,

and Den Hartog, D. N. (2018a). Text classiﬁcation for

organizational researchers: A tutorial. Organizational

research methods, 21(3):766–799.

Kobayashi, V. B., Mol, S. T., Berkers, H. A., Kismihók, G.,

and Den Hartog, D. N. (2018b). Text mining in orga-

nizational research. Organizational research methods,

21(3):733–765.

Kuzmenko, O. V., Lieonov, S. V., and Boiko, A. O.

(2020). Data mining and bifurcation analysis of the

risk of money laundering with the involvement of ﬁ-

nancial institutions. Journal of International Studies,

13(3). https://www.jois.eu/ﬁles/22 871 Kuzmenko%

20et%20al.pdf.

López, A., Detz, A., Ratanawongsa, N., and Sarkar, U.

(2012). What patients say about their doctors online:

a qualitative content analysis. Journal of general in-

ternal medicine, 27(6):685–692.

Matviychuk, A., Strelchenko, I., Vashchaiev, S., and Ve-

lykoivanenko, H. (2019). Simulation of the crisis con-

tagion process between countries with different lev-

els of socio-economic development. CEUR Workshop

Proceedings, 2393:485–496.

National Academies of Sciences, Engineering, and

Medicine, Division on Engineering and Physical Sci-

ences, Division of Behavioral and Social Sciences and

Education, Computer Science and Telecommunica-

tions Board, Board on Mathematical Sciences and An-

alytics, Committee on Applied and Theoretical Statis-

tics, Board on Science Education, and Committee on

Envisioning the Data Science Discipline: The Under-

graduate Perspective (2018). Data science for under-

graduates: Opportunities and options. The National

Academies Press, Washington, DC.

Nehrey, M. and Hnot, T. (2017). Using recommendation

approaches for ratings matrixes in online marketing.

Studia Ekonomiczne, (342):115–130.

Nehrey, M., Kaminskyi, A., and Komar, M. (2019). Agro-

economic models: a review and directions for re-

search. Periodicals of Engineering and Natural Sci-

ences, 7(2):702–711. http://pen.ius.edu.ba/index.php/

pen/article/view/579.

Nosratabadi, S., Mosavi, A., Duan, P., Ghamisi, P., Filip,

F., Band, S. S., Reuter, U., Gama, J., and Gandomi,

A. H. (2020). Data science in economics: comprehen-

sive review of advanced machine learning and deep

learning methods. Mathematics, 8(10):1799.

Ojo, A. and Rizun, N. (2020). Structural and temporal topic

models of feedbacks on service quality–a path to the-

ory development? In Americas Conference on In-

formation Systems (AMCIS 2020). Healthcare Infor-

matics & Health Information Tech (SIGHealth), vol-

ume 15.

Oliskevych, M., Beregova, G., and Tokarchuk, V.

(2018). Fuel consumption in Ukraine: Evi-

dence from vector error correction model. Inter-

national Journal of Energy Economics and Policy,

8(5). https://www.econjournals.com/index.php/ijeep/

article/view/6825/3925.

Oliskevych, M. and Lukianenko, I. (2019). Labor force

participation in Eastern European countries: nonlinear

modeling. Journal of Economic Studies, 46(6):1258–

1279.

Parish, E. J. and Duraisamy, K. (2016). A paradigm for

data-driven predictive modeling using ﬁeld inversion

and machine learning. Journal of Computational

Physics, 305:758–774.


Patriarca, R., Di Gravio, G., and Costantino, F. (2017). A

Monte Carlo evolution of the Functional Resonance

Analysis Method (FRAM) to assess performance vari-

ability in complex systems. Safety science, 91:49–60.

Perevozova, I., Babenko, V., Krykhovetska, Z., and Popa-

dynets, I. (2020). Holistic approach based assessment

of social efﬁciency of research conducted by higher

educational establishments. E3S Web of Conferences,

166:13022.

Prüfer, J. and Prüfer, P. (2018). Data science for institutional

and organizational economics. Technical report.

Rizun, N. and Shmelova, T. (2017). Decision-making mod-

els of the human-operator as an element of the socio-

technical systems. In Strategic Imperatives and Core

Competencies in the Era of Robotics and Artiﬁcial In-

telligence, pages 167–204. IGI Global.

Roberts, M. E., Stewart, B. M., and Airoldi, E. M. (2016).

A model of text for experimentation in the social sci-

ences. Journal of the American Statistical Associa-

tion, 111(515):988–1003.

Roberts, M. E., Stewart, B. M., and Tingley, D. (2019). Stm:

An R package for structural topic models. Journal of

Statistical Software, 91(1):1–40.

Roberts, M. E., Stewart, B. M., Tingley, D., Airoldi, E. M.,

et al. (2013). The structural topic model and applied

social science. In Advances in neural information pro-

cessing systems workshop on topic models: compu-

tation, application, and evaluation, volume 4, pages

1–20. Harrahs and Harveys, Lake Tahoe.

Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C.,

Leder-Luis, J., Albertson, B., Gadarian, S., and Rand,

D. (2014). Topic models for open ended survey re-

sponses with applications to experiments. American

Journal of Political Science, 58:1064–82.

Robinson, S. D. (2019). Temporal topic modeling applied

to aviation safety reports: A subject matter expert re-

view. Safety science, 116:275–286.

Sharma, R. D., Tripathi, S., Sahu, S. K., Mittal, S., and

Anand, A. (2016). Predicting online doctor ratings

from user reviews using convolutional neural net-

works. International Journal of Machine Learning

and Computing, 6(2):149.

Shi, Y., Zhu, J., and Charles, V. (2020). Data science and

productivity: A bibliometric review of data science

applications and approaches in productivity evalua-

tions. Journal of the Operational Research Society,

72(5):975–988.

Shoro, A. G. and Soomro, T. R. (2015). Big data analysis:

Apache spark perspective. Global Journal of Com-

puter Science and Technology: C Software & Data

Engineering, 15(1):7–14.

Soloviev, V. and Belinskiy, A. (2019). Complex systems

theory and crashes of cryptocurrency market. Com-

munications in Computer and Information Science,

1007:276–297.

Soloviev, V., Serdiuk, O., Semerikov, S., and Kiv, A.

(2020a). Recurrence plot-based analysis of ﬁnancial-

economic crashes. CEUR Workshop Proceedings,

2713:21–40.

Soloviev, V. N., Bielinskyi, A. O., and Kharadzjan, N. A.

(2020b). Coverage of the coronavirus pandemic

through entropy measures. CEUR Workshop Proceed-

ings, 2832:24–42.

Volkova, N., Rizun, N., and Nehrey, M. (2019). Data sci-

ence: Opportunities to transform education. CEUR

Workshop Proceedings, 2433:48–73.

Voronenko, I., Nehrey, M., Kostenko, S., Lashchyk, I., and

Niziaieva, V. (2021). Advertising strategy manage-

ment in Internet marketing. Journal of Information

Technology Management, 13(Special Issue):35–47.

Voronenko, I., Skrypnyk, A., Klymenko, N., Zherlitsyn,

D., and Starychenko, Y. (2020). Food security risk

in Ukraine: assessment and forecast. Agricultural

and Resource Economics: International Scientiﬁc E-

Journal, 6(4):63–75.

Wallace, B. C., Paul, M. J., Sarkar, U., Trikalinos, T. A., and

Dredze, M. (2014). A large-scale quantitative analy-

sis of latent factors and sentiment in online doctor re-

views. Journal of the American Medical Informatics

Association, 21(6):1098–1103.

Xiong, J., Yu, G., and Zhang, X. (2017). Research on gov-

ernance structure of big data of civil aviation. Journal

of Computer and Communications, 5(5):112–118.

Zelinska, S. (2020). Machine learning: Technologies and

potential application at mining companies. E3S Web

of Conferences, 166:03007.
