Tatyana Yudina
Moscow State University Research Computing Center, Leninskie Gory, Moscow, 119992, Russian Federation
Anna Bogomolova
Moscow State University Economic Faculty, Moscow, 119992, Russian Federation
Keywords: National information infrastructure, information system, content-based search engine, informers, value-
added services, relational database, system analysis, social research, knowledge products, bilingual search.
Abstract: This paper describes the Moscow State University-based Information System RUSSIA,
a multifunctional resource started in 2000 as an electronic library for
economic and social investigations and innovative education programs. NLP technologies implemented
under the project provide content-based search across collections and query refinement. Research-
assisting services complement the search tools. The project has also produced a statistical information system that
integrates the data collections of the main RF state agencies and makes data analysis available at federal, regional and
local levels. Work is underway on an ontology to integrate data and knowledge products. The
system is currently in growing demand in public administration.
It is a worldwide phenomenon that universities have
been among the most active promoters of new
information technologies and have played a leading role at
each stage of IT development. Since the early days of the
Internet, university-based resources have been
among the most reliable in legal status, content,
relevance, currency, functionality and innovation.
Now that leading countries are building new
information infrastructures to meet the challenges of a digital
society and electronic governance,
university-based resources are in growing demand
and are becoming an important part of national
information infrastructures (Berman F. and Brady H.,
2005). Initially designed to serve full-scale
economic and social investigations, university
resources were the first to maintain state
statistics holdings and to offer value-added
research-assisting services to process, arrange and
analyze social, economic and demographic data as
an important base for relevant research (Yudina T., 2006).
Academic communities have developed
computer-based methods of data processing and
investigation and have introduced teaching courses that
provide education and training in data analysis.
Academic community resources now
support full-scale economic and social
investigations as well as system and comparative analysis.
Universities' experience in IT research, applications and
education programs, and their approach to rationally
arranging and maintaining huge resources, are of great
value to society. In a sense, universities have
prepared the next step in IT development in the USA
and European countries, and they continue to act as
promoters and advancers of knowledge-based
technologies to enhance decision-making.
In Russia the history of IT development is similar to that of
leading countries. Universities were the first to
obtain computers, adopt ICT and promote it in the
regions. Universities and think tanks implemented
Yudina T. and Bogomolova A. (2007).
In Proceedings of the Third International Conference on Web Information Systems and Technologies - Society, e-Business and e-Government /
e-Learning, pages 233-237
DOI: 10.5220/0001279402330237
the first Internet-based resources and keep them
among the most advanced and reliable. The Moscow
State University-based Information System was
designed and is maintained as a digital library for
research and education in economics and social
sciences. It has been in operation since 2000.
Figure 1: Main Page of University Information System
Figure 2: University Information System RUSSIA
The system maintains collections of social
domain data and documents. Currently 2+ million
documents from 60+ collections are integrated.
Contents include official data and documents (laws,
presidential decrees and directives, governmental
enactments, acts and regulations); international
agreements signed by the Russian Federation (RF);
stenograms (daily records) of State Duma of the
Federal Assembly of RF; statistics and analytical
reports of government agencies; reports and
databases maintained by "think tanks"; academic
publications – Moscow State University Bulletin
(economic and social sciences series) and other
journals; public opinion polls data; mass media.
Holdings in English are also maintained: an archive
of academic publications in economics and social
sciences (full-text documents available in the RePEc
database); documents and databases of international
organizations; and collections of foreign universities.
In its current version the system maintains 2+ million
documents from 60+ collections. To accomplish
content-based integration of all the holdings, a
technology for automatic linguistic text processing
(ALTP, a special software-lingware-knowledgeware
complex) was designed, developed and
implemented at the first stage of the project.
The NLP technology is currently
customized to process all main types of business
prose text corpora: laws, presidential decrees and
directives, governmental enactments, acts and
regulations; international agreements signed by the
Russian Federation (RF); stenograms (daily records)
of the State Duma of the Federal Assembly of RF;
statistics and analytical reports of government
agencies; reports and databases maintained by "think
tanks"; academic publications (Moscow State
University Bulletin, economic and social sciences
series, and other journals); public opinion poll data; and
mass media. The procedures include processing of
electronic text in several main formats (ASCII,
HTML, MS Word) under Windows, with the processor operating as
a DLL; morphological analysis of Russian and English
texts; term recognition and disambiguation; Thesaurus-
based thematic analysis (event categorization,
indexing, annotation/summarization); and loading of
results into an Oracle database server.
The main ALTP instrument is the Socio-Political
Thesaurus (Thesaurus). Its current version
incorporates 70,000 concepts/descriptors with
synonyms, including 6,500 geographic names.
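The Thesaurus-based indexing step can be illustrated with a minimal sketch. It assumes a toy thesaurus of three descriptors and plain whole-word matching; the entries, the function names `build_term_index` and `index_text`, and the sample sentence are all hypothetical, and the morphological analysis and disambiguation that the real ALTP complex performs are not attempted here.

```python
import re
from collections import Counter

# Hypothetical miniature thesaurus: descriptor -> synonyms (lowercase).
# The real Thesaurus entries are not given in the paper.
THESAURUS = {
    "STATE BUDGET": ["state budget", "federal budget"],
    "INFANT MORTALITY": ["infant mortality", "infant death rate"],
    "MOSCOW": ["moscow"],
}

def build_term_index(thesaurus):
    """Map every synonym to its descriptor."""
    return {syn: desc for desc, syns in thesaurus.items() for syn in syns}

def index_text(text, thesaurus=THESAURUS):
    """Count how often each descriptor is mentioned in the text."""
    term_index = build_term_index(thesaurus)
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for term, descriptor in term_index.items():
        # Whole-word match; real morphological analysis is out of scope here.
        hits = re.findall(r"\b" + re.escape(term) + r"\b", normalized)
        counts[descriptor] += len(hits)
    return +counts  # unary plus drops descriptors with zero hits

doc = ("The federal budget for Moscow funds programs that reduce "
       "infant mortality. The state budget grew.")
profile = index_text(doc)
```

Because both synonyms map to one descriptor, the two budget mentions are counted together, which is the property that lets a descriptor-level index serve as a thematic profile of a document.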
WEBIST 2007 - International Conference on Web Information Systems and Technologies
Figure 3: Using of bilingual Thesaurus for documents in
Thesaurus is bilingual. The tool assists in
identifying main and subordinate topics in a
Together, the NLP technology and the bilingual Thesaurus
make it possible to process documents in English and
analyze their content in Russian.
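One way to picture bilingual search is to index documents by language-independent concept identifiers, so that a query in Russian can match documents in English. The sketch below is an illustration under that assumption only; the concept ids, labels, sample documents and the functions `concepts_in` and `search` are hypothetical and use naive substring matching.

```python
# Hypothetical bilingual entries: concept id -> labels per language.
CONCEPTS = {
    "C1": {"ru": ["бюджет"], "en": ["budget"]},
    "C2": {"ru": ["права человека"], "en": ["human rights"]},
}

def concepts_in(text, lang):
    """Return ids of concepts whose label in `lang` occurs in the text."""
    text = text.lower()
    return {cid for cid, labels in CONCEPTS.items()
            if any(label in text for label in labels[lang])}

def search(query, query_lang, docs, doc_lang):
    """Return ids of documents that cover every concept found in the query."""
    wanted = concepts_in(query, query_lang)
    return [doc_id for doc_id, text in docs.items()
            if wanted and wanted <= concepts_in(text, doc_lang)]

english_docs = {
    "d1": "Annual budget report",
    "d2": "Human rights and the budget",
    "d3": "Weather bulletin",
}
budget_hits = search("бюджет", "ru", english_docs, "en")
both_hits = search("права человека и бюджет", "ru", english_docs, "en")
```

Matching on concept ids rather than on words is the design choice that removes the need to translate either the query or the documents.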
The technology allows up to 100 Mb of
electronic texts to be processed and integrated into
the University Information System RUSSIA daily.
ALTP results are used to support an advanced search
engine: in addition to traditional tools, content-
based search and query refinement are available,
exploiting the Thesaurus and its hierarchy for
query refinement. Also available for search
are several systems of subject headings, including
that of the UIS RUSSIA; the Legislative Indexing
Vocabulary Top Terms of the Congressional
Research Service, Library of Congress, USA; and a JEL
(Journal of Economic Literature)-based
classification system.
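Hierarchy-based query refinement can be sketched as a walk over broader/narrower links. The example assumes a tiny hand-made hierarchy; the concept names and the functions `expand` and `refinements` are hypothetical and are not taken from the Thesaurus itself.

```python
# Hypothetical fragment of a concept hierarchy: broader -> narrower.
NARROWER = {
    "SOCIAL POLICY": ["HEALTH CARE", "EDUCATION"],
    "HEALTH CARE": ["MATERNITY CARE"],
}

def expand(concept, narrower=NARROWER):
    """Return the concept plus all its descendants, depth-first."""
    seen, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return seen

def refinements(concept, narrower=NARROWER):
    """Suggest the direct narrower concepts as refinement options."""
    return narrower.get(concept, [])

expanded = expand("SOCIAL POLICY")
options = refinements("SOCIAL POLICY")
```

Expansion widens a query to everything under a concept, while the list of direct narrower concepts gives the user options for narrowing it.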
Figure 4: Informers customized to content, RF regions,
A query result may be refined using
informers customized to document content, RF
regions/geography and dates.
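Informers of this kind behave like facet counts over a result set. The following sketch assumes each result carries a topic, a region and a year; the field names, the sample records and the functions `informer` and `refine` are hypothetical.

```python
from collections import Counter

# Hypothetical result set; field names and values are illustrative only.
results = [
    {"id": 1, "topic": "BUDGET", "region": "Moscow", "year": 2005},
    {"id": 2, "topic": "BUDGET", "region": "Tatarstan", "year": 2006},
    {"id": 3, "topic": "HEALTH CARE", "region": "Moscow", "year": 2006},
]

def informer(results, field):
    """List the values of one field with their document counts."""
    return Counter(r[field] for r in results).most_common()

def refine(results, **selected):
    """Keep only results that match every selected informer value."""
    return [r for r in results
            if all(r[key] == value for key, value in selected.items())]

region_informer = informer(results, "region")
narrowed = refine(results, region="Moscow", year=2006)
```

Selecting a value from one informer and recomputing the others on the narrowed set gives the step-by-step refinement described above.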
The UIS RUSSIA provides value-added services
to assist research. The most developed are the
services for investigations based on state statistics.
Because the Russian state statistics agency publishes
data in aggregated tables, mostly in .doc format,
UIS RUSSIA offers the following services:
- State statistics converted into relational data
base format (power tables) available at federal,
regional and local levels;
- MS Excel 97 format available for all statistical
tables, including the tables presented in
analytical reports and scientific journals;
- Links to the Methodological Notes and Glossary
for statistics;
- System of Subject Headings to integrate data
from different publications;
- Content-based search exploiting Thesaurus and
subject domain ontology.
Subject-oriented modules have been built for the
domains most in demand for research and university
education. The modules on Russia include “Social
and Economic Statistics”, “Budget System”,
“Agriculture”, “Population and Living Standards” and
“Regions, municipalities, households” (Bogomolova
A. et al., 2006). The databases integrate statistics
and analytical reports from the Russian State Statistical
Service (Rosstat), the Ministry of Finance, and Russian and
international think tanks. Tables are available in
HTML and MS Excel formats. A relational database
format for socio-economic and budget data at federal,
regional and local levels was implemented in 2006.
Each indicator may be monitored over 10
years and analyzed using applied mathematical
methods and developed models. System and
comparative investigations are supported. The table
below presents data on budget funding for
birthing centers in 2003 (published by the Ministry of
Finance) and on population and infant mortality in 2
regions of RF (data published by the Russian State
Statistics Service).
Figure 5: An example of comparative analysis of several
Russian regions.
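A comparison of this kind maps naturally onto a relational "long" table with one row per region, indicator and year. The sketch below uses an in-memory SQLite database purely for illustration (the paper names Oracle as the actual server); the table name, column names, region labels and all numbers are made-up placeholders, not published statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE indicator_values (
    region TEXT, indicator TEXT, year INTEGER, value REAL)""")

# Placeholder values for illustration only; not real statistics.
rows = [
    ("Region A", "population", 2003, 1000000.0),
    ("Region A", "birthing_center_funding", 2003, 5000000.0),
    ("Region B", "population", 2003, 2000000.0),
    ("Region B", "birthing_center_funding", 2003, 4000000.0),
]
conn.executemany("INSERT INTO indicator_values VALUES (?, ?, ?, ?)", rows)

# Self-join: pair each funding row with the population row
# of the same region and year, then divide.
query = """
SELECT f.region, f.value / p.value AS funding_per_capita
FROM indicator_values AS f
JOIN indicator_values AS p
  ON p.region = f.region AND p.year = f.year
WHERE f.indicator = 'birthing_center_funding'
  AND p.indicator = 'population'
  AND f.year = ?
ORDER BY f.region
"""
per_capita = conn.execute(query, (2003,)).fetchall()
```

Keeping indicators from different agencies in one table with shared region and year keys is what makes such cross-source ratios a single query.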
Maps and TeeChart are available for data
presentation and analysis.
Figure 6: Presentation on the map of number of the theater
visitors in Russian regions.
Figure 7: TeeChart presentation of number of the library
visitors in Russian cities.
Data from other state agencies will be added in
The database is widely used for investigations
and innovative education programs at Moscow State
University and other RF universities. In recent
months interest in the database has been growing among
government agencies, as it supports system
analysis for decision-making at federal,
regional and local levels.
A recent UIS RUSSIA accomplishment is a database
of the 2003 RF household survey. There is no practice
of regular household surveys at government level in
Russia. The first household-based survey, the National
Survey of Households Well-being and Participation
in Social Programs, took place in 2003 and covered
45 thousand households in 46 Russian regions. It
includes 227 variables aggregated into 13 parts.
These survey results are the initial data holding
loaded into the database. While working on the
database the UIS RUSSIA team took the most
advanced accomplishments as an example, namely those
implemented under the Harvard-MIT Data Center's
Virtual Data Center and Social Data Archive. A user
can choose variables. For each variable a frequency
distribution and basic statistics are presented. One can
also save the original database in SPSS format or
a customized data set in .csv format; that is, a
user can select either particular variables or particular
values of variables. As a next stage we plan to
implement cross-tabulations and graphs. In 2007 other
household-level resources developed in Russia and
in international research centers will be included.
The most developed project is the Russian
Longitudinal Monitoring Survey, which covers
thirteen nationally representative surveys beginning
in 1992 and is conducted by the Carolina Population
Center at the University of North Carolina at Chapel
Hill in collaboration with the Russian Federal Statistics
Agency and several Russian institutes.
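The per-variable summaries and the customized .csv export described for the household survey database can be sketched with the Python standard library. The records, the variable names and the functions `describe` and `export_csv` are hypothetical; the actual survey has 227 variables and its file layout is not given in the paper.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical survey records; variable names are illustrative only.
records = [
    {"region": "A", "household_size": 2, "income": 100},
    {"region": "A", "household_size": 4, "income": 300},
    {"region": "B", "household_size": 2, "income": 200},
]

def describe(records, variable):
    """Frequency distribution and basic statistics for one variable."""
    values = [r[variable] for r in records]
    return {
        "frequencies": dict(Counter(values)),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

def export_csv(records, variables, where=None):
    """Export chosen variables, optionally restricted to chosen values."""
    where = where or {}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=variables, lineterminator="\n")
    writer.writeheader()
    for r in records:
        if all(r[key] in allowed for key, allowed in where.items()):
            writer.writerow({v: r[v] for v in variables})
    return out.getvalue()

size_summary = describe(records, "household_size")
subset = export_csv(records, ["region", "income"], where={"region": {"A"}})
```

The `where` argument corresponds to selecting particular values of a variable, and the `variables` list to selecting particular variables.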
The current version of UIS RUSSIA is most
developed in content and services for investigations in
economics and public administration.
To meet the information needs of other social sciences,
several new databases have been built.
Available since 2005 is an information system that
maintains documents of international organizations on
human rights: a) the full archive of European Court of
Human Rights (ECHR) decisions and judgments; b) the main
documents of the Council of Europe, the United Nations
Organization and the Commonwealth of Independent States;
and c) publications on human rights protection
produced by partners in Russia, specialists from other
countries and the ECHR Secretariat. The system provides
a friendly interface and value-added user services:
- flexible search instruments in Russian;
- profile in Russian to complement each of the European
Court’s documents;
- access to the European Court’s/Council of Europe
documents available in Russian;
- special module to monitor cases against Russia
with links to national law at issue and publications
on the topic;
- hyperlinks to case law (Court’s precedents).
The ECHR archive is legally obtained from the
European Court Secretariat and is updated 3
times a year. The August 2006 version covers 45000+
documents in English and in French.
The project cooperates with other projects in the
domain of human rights protection. All respected
partners have agreed to provide the archives to
integrate into the information system.
The product is designed and maintained as a
multifunctional resource to serve university
teaching, comparative research, and public
informing and education.
Given the large volume of texts to analyze
when comparing national legislative acts with
international legal norms, only modern approaches and
computer technologies can support full-scale
comparative investigations. To meet this challenge
the project team has implemented a procedure to
hyperlink the documents that accompany a given
case against Russia: European Convention on
Human Rights articles, protocols to the Convention,
other Council of Europe documents, the RF national law
at issue, and the Strasbourg case law of the European Court.
A specialist has the whole package of texts at hand and
may navigate across the materials. This approach saves
time and makes the work more effective.
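The hyperlinking procedure can be thought of as a table of typed links from each case to its accompanying documents. The sketch below is one possible representation, not the project's implementation; the identifiers, the link types and the functions `build_packages` and `cases_citing` are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical identifiers; real document ids are not given in the paper.
LINKS = [
    ("case-001", "convention_article", "convention-article-X"),
    ("case-001", "national_law", "law-A"),
    ("case-001", "case_law", "case-000"),
    ("case-002", "convention_article", "convention-article-X"),
]

def build_packages(links):
    """Group linked documents by case and then by document type."""
    packages = defaultdict(lambda: defaultdict(list))
    for case_id, doc_type, doc_id in links:
        packages[case_id][doc_type].append(doc_id)
    return packages

def cases_citing(links, doc_id):
    """Reverse lookup: cases whose package contains the given document."""
    return sorted({case for case, _, target in links if target == doc_id})

packages = build_packages(LINKS)
related_cases = cases_citing(LINKS, "convention-article-X")
```

The forward grouping gives the specialist the full package for one case, and the reverse lookup supports comparison across cases that touch the same provision.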
In 2006 work started on a training module
and a university teaching course. Specialists admit
that human rights is among the most poorly
developed spheres of public law in Russia. Still, most
university programs are composed in the
traditional manner and do not educate students in
human rights protection legislation and practice.
Students are not trained in comparative analysis and
lack the knowledge and skills needed to compare and
harmonize RF laws with international norms.
By working on this new resource the UIS RUSSIA
team hopes to assist in the education and training of
a new generation of lawyers.
Contents, technology and value-added user
services make the UIS RUSSIA a valuable resource
for full-scale interdisciplinary and socially relevant
investigations and innovative educational courses at
universities and higher education institutions. The
system is free for researchers and educators;
registration is required. 400+ universities, higher
education institutions, colleges, academic institutes
and think tanks, and 4000+ individuals, are subscribed
and work with the system. The system is also
accessible via public libraries and assists in citizens
RF state agencies at federal, regional and local
levels are becoming active users of the UIS
RUSSIA services. Workshops and training sessions are
regularly arranged to educate government agency
staff in data analysis, in order to promote
e-government technologies and principles in Russia.
The UIS RUSSIA team's future plans include
further development of all the resources, in particular
a new version of the statistical
information system that integrates all RF state
agencies' data collections and provides for data
analysis at federal, regional and local levels. Work
is underway on an ontology to classify the indicators
and integrate data and knowledge products. This
information system will operate as a multifunctional
one and serve university teaching and training,
research, citizens education and public
Since 1993 the project has been supported by grants
from the Russian Fund for Basic Research; the
Russian Fund for Humanities; the MacArthur
Foundation, USA; the Ford Foundation, USA; and
the Eurasia Foundation, USA.
Berman F., Brady H., 2005. Workshop on
Cyberinfrastructure for Social and Behavioral Sciences.
Final Report. National Science Foundation
Yudina T., 2006. University Information System RUSSIA:
data, knowledge products and services for social
research. Proceedings of the 7th Annual International
Conference on Digital Government Research. Digital
Government Research Center. San Diego, California.
Bogomolova A., Karassev O., Sennov R., Yudina T.,
2006. University Information System RUSSIA:
Relational Database to Analyze Social, Economic and
Budget Indicators. 8th All-Russian Conference on
Digital Libraries. Vladimir-Suzdal.