BUILDING A LARGE-SCALE INFORMATION SYSTEM FOR THE EDUCATION SECTOR: A PROJECT EXPERIENCE

Pawel Gruszczynski, Bernard Lange, Michal Maciejewski, Cezary Mazurek, Krystian Nowak, Stanislaw Osinski, Maciej Stroinski, Andrzej Swedrzynski

Poznan Supercomputing and Networking Center

ul. Noskowskiego 10, 61-704 Poznan, Poland

Keywords:

Large-scale, distributed system, education, public sector.

Abstract:

Implementing a large-scale information system for the education sector involves a number of engineering

challenges, such as high security and correctness standards imposed by the law, a large and varied group of end

users, or fault-tolerance and a distributed character of processing. In this paper we report on our experiences

with building and deploying a senior high school recruitment system for ﬁve major cities in Poland. We discuss

system architecture and design decisions, such as thin vs. rich client, on-line vs. off-line processing, dedicated

network vs. Internet environment. We also analyse potential problems our present approach may cause in the

future.

1 INTRODUCTION

With more and more universal access to the broad-

band network infrastructure, advanced services and

applications can now be available not only to scien-

tiﬁc and business communities, but also to local ad-

ministration and the society in general. The trend to-

wards public sector applications of new technologies

is clearly visible in the objectives of such programmes

as eEurope (European Commission, 2002), Large Scale

Networking in the U.S. (NITRD, 2004) and PIONIER

in Poland (Weglarz et al., 2000).

One of the recently introduced information systems

for the city of Poznan (Poland) was the High School

On-line Admission System called Nabor. Its introduction

in 2003 not only significantly reduced the

time and effort needed for the whole procedure, but

also made the process less stressful for the candidates

and their parents. Although Nabor is based on rules

and procedures speciﬁc to Poland, we feel that both

the idea of on-line admission (Capita, 2004) and the

technical solutions we used are relevant in a more

general context.

In this paper we report on our experiences with designing and implementing Nabor. Section 2 provides background information on Poznan's high school admission procedures and highlights the problems our software solved. In Section 3 we describe the architecture of Nabor, placing special emphasis on the design trade-offs we had to evaluate. Section 4 provides some implementation details and discusses the implications our design decisions may have in the future. Finally, Section 5 concludes the paper.

(Footnote: Nabor is the Polish word for admission.)

2 MOTIVATION

The key characteristic of senior high school admis-

sion in the city of Poznan is that the process is highly

competitive. Some of Poznan's more than 120 high schools

are always in demand and every year must reject a

great majority of their applicants, while the less popu-

lar schools can barely ﬁll the places they offer. There-

fore, each candidate is allowed to apply to more than

one school, which increases the chances of getting

to a school matching his or her standards. For the

purposes of the admission process, all selection cri-

teria, such as candidates’ marks, sporting or artistic

achievements, are aggregated to a single numerical

value ranging from 0 to 200. In this way, each school

orders all its applicants by the aggregated mark and

ﬁlls all its places with the highest-scoring candidates.

However fair and simple the above procedure may seem at first sight, the lack of coordination between schools can cause many problems, the most severe of them being the 'blocking problem'. This is where the highest-scoring candidates apply and get admitted to a large number of schools, but still defer the decision as to which of them they finally want to go to. Such candidates 'block' places that could have been taken by the medium-scoring applicants, who, in turn, 'block' places for the candidates with the lowest marks. When a highest-scoring candidate makes the final decision and thus frees the 'blocked' places, the rest of the candidates may want to change their previous decisions and move to more prestigious schools. In practice, because of the 'blocking problem', the 'manual' admission would every year boil down to a series of cascading decision changes and endless updating of admission lists, which was both time-consuming and stressful.

Gruszczynski P., Lange B., Maciejewski M., Mazurek C., Nowak K., Osinski S., Stroinski M. and Swedrzynski A. (2005).
BUILDING A LARGE-SCALE INFORMATION SYSTEM FOR THE EDUCATION SECTOR: A PROJECT EXPERIENCE.
In Proceedings of the Seventh International Conference on Enterprise Information Systems, pages 145-150
DOI: 10.5220/0002549301450150
SciTePress

The main objective of Nabor was therefore to provide

a level of co-ordination between all high schools in the

city high enough to reduce the time and effort

involved in the admission process. The key idea was

to gather all necessary data (e.g. candidates’ marks,

admission limits set by schools) in a central database

and design an algorithm that would generate admis-

sion lists for all schools in Poznan in one step. This,

however, was to be achieved at the cost of uniﬁcation

and minor modiﬁcations to the admission procedures.
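The idea of generating all admission lists in one step can be illustrated with a small sketch. The paper does not describe the algorithm Nabor actually uses, so the code below is only an assumption: it applies candidate-proposing deferred acceptance and assumes that each candidate ranks the chosen schools in order of preference. The class and method names (AdmissionSketch, Candidate, assign) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Illustrative only: not Nabor's actual algorithm. */
class AdmissionSketch {
    static class Candidate {
        final String id;
        final int score;                 // aggregated mark, 0..200
        final List<String> preferences;  // assumption: schools ranked by the candidate
        int next = 0;                    // index of the next school to try

        Candidate(String id, int score, String... preferences) {
            this.id = id;
            this.score = score;
            this.preferences = Arrays.asList(preferences);
        }
    }

    /** Returns school -> admitted candidate ids, highest score first. */
    static Map<String, List<String>> assign(List<Candidate> candidates,
                                            Map<String, Integer> limits) {
        // Each school holds a tentative list as a min-heap on score,
        // so the weakest tentatively admitted candidate is on top.
        Map<String, PriorityQueue<Candidate>> held = new TreeMap<>();
        for (String school : limits.keySet()) {
            held.put(school, new PriorityQueue<Candidate>(
                    Comparator.comparingInt((Candidate c) -> c.score)));
        }
        Deque<Candidate> unplaced = new ArrayDeque<>(candidates);
        while (!unplaced.isEmpty()) {
            Candidate c = unplaced.poll();
            if (c.next >= c.preferences.size()) continue;   // no schools left to try
            PriorityQueue<Candidate> list = held.get(c.preferences.get(c.next));
            String school = c.preferences.get(c.next++);
            if (list == null) { unplaced.add(c); continue; } // unknown school, try the next one
            list.add(c);
            if (list.size() > limits.get(school)) {
                unplaced.add(list.poll());                   // lowest score loses the place
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, PriorityQueue<Candidate>> e : held.entrySet()) {
            List<Candidate> admitted = new ArrayList<>(e.getValue());
            admitted.sort(Comparator.comparingInt((Candidate c) -> c.score).reversed());
            List<String> ids = new ArrayList<>();
            for (Candidate c : admitted) ids.add(c.id);
            result.put(e.getKey(), ids);
        }
        return result;
    }
}
```

Because every candidate holds at most one place at any time, no place stays 'blocked' by a candidate who has also been admitted elsewhere.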

3 SYSTEM DESIGN AND ARCHITECTURE

3.1 Requirements

Being largely inﬂuenced by legal regulations, the de-

sign and architecture of Nabor had to meet a number

of requirements. Below we summarise the most im-

portant of them:

• High security standards. A great majority of data

processed by Nabor (e.g. candidate’s marks) was

of a conﬁdential character and had to be carefully

protected from unauthorised access and modiﬁca-

tion. Also, the system had to be designed in such a

way as to minimise the possibility of manipulating

the admission results and to enable tracing possible

manipulation attempts.

• Fault-tolerance. The high school admission proce-

dures divide the whole process into several phases

and impose strict deadlines on each of them. For

example, one of the ﬁnal phases is entering the can-

didates’ marks to the system, which must be com-

pleted by all schools within a time span of two or

three days. Therefore, our software had to be able

to operate even in case of a temporary failure of the

central database or the network infrastructure.

• Correctness of results. The future of thousands of

Poznan’s pupils depended on the admission results

generated by Nabor. For this reason, every effort

had to be made to ensure that the results were 100%

correct.

• Varied user group. An important characteristic of

Nabor was also the fact that it required active in-

volvement of all parties of the admission process.

High school candidates used the system to ﬁnd out

about the offered schools, ﬁll in and print out the

application form. School administration used Na-

bor to enter the candidates’ data and download the

admission lists once the admission had been closed.

Finally, for the local education authorities the sys-

tem provided a range of analytical reports, which

aided global planning.

3.2 Design decisions

Deciding on the architecture of Nabor required an-

swering a number of fundamental design questions,

such as whether to create a rich- or web-client appli-

cation or whether to adopt the on-line or the off-line

processing model. Below we evaluate the trade-offs

resulting from the three design questions we found

most important.

3.2.1 Rich client vs. web client

In the rich client application model (also referred

to as thick or fat client model) the major part of

data processing is performed by software deployed

on the client’s workstation. The software is usually

a standalone GUI (Graphical User Interface) appli-

cation, which contacts the central server only when

retrieving or updating business data. Below we sum-

marise the advantages and disadvantages of the rich

client model in the context of Nabor.

Rich client advantages

• Responsiveness and usability. Compared to web

applications, GUIs offer much shorter response

time and are easier to usability-tune (keyboard

shortcuts and navigation, accessibility options,

etc.). These are very important factors for systems

which, like Nabor, involve entering massive

amounts of data over a short period of time.

• Familiarity. If the GUI application is consistent

with the host Operating System’s look and feel, it

can be a great learnability advantage. With a varied

group of end users, good learnability of the software

may shorten the training period.

• Data caching. A standalone client can more eas-

ily cache data that a web application would have

to repeatedly fetch from the server. An example

of such data, a so called system dictionary, can be

ICEIS 2005 - SOFTWARE AGENTS AND INTERNET COMPUTING


a list of the city’s districts or schools. More im-

portantly, having a local cache, the rich client can

operate even when a temporary failure of the server

or network occurs.

• Customisable security options. Having more con-

trol over the communication between the server and

the rich client application makes it possible to im-

plement more advanced or non-standard data secu-

rity schemes.

• Server load. Compared to web clients, the GUI ap-

plications create much lower server load.
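The data caching advantage listed above can be sketched as follows. This is a minimal illustration, not the Nabor implementation: the class DictionaryCache, its methods and the Supplier standing in for a server call are all hypothetical. A dictionary is fetched once and then served locally, and a failed refresh keeps the cached copy so the client can continue to work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical client-side cache for system dictionaries (e.g. districts, schools). */
class DictionaryCache {
    private final Map<String, List<String>> cache = new HashMap<>();

    /** Returns the cached dictionary, contacting the server only on first use. */
    List<String> get(String name, Supplier<List<String>> server) {
        List<String> cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        List<String> fetched = Collections.unmodifiableList(new ArrayList<>(server.get()));
        cache.put(name, fetched);
        return fetched;
    }

    /** Tries to reload a dictionary; on failure the old copy stays in place. */
    boolean refresh(String name, Supplier<List<String>> server) {
        try {
            cache.put(name, Collections.unmodifiableList(new ArrayList<>(server.get())));
            return true;
        } catch (RuntimeException serverOrNetworkFailure) {
            return false; // keep whatever is cached
        }
    }
}
```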

Rich client disadvantages

• Data serialisation. We feel that the most serious

problem with rich client applications is the need

for business data serialisation and deserialisation.

With data models containing lots of master-detail

relationships and high security requirements the

implementation of the business data communica-

tion layer becomes a nontrivial task.

• The need for installation. Usually, before the rich

client application can be used, it needs to be in-

stalled and conﬁgured on the user’s workstation,

which may be a source of external software depen-

dencies and incompatibilities.

• Possible platform-dependence. With certain im-

plementation technologies, a GUI application may

become platform-dependent. This creates an additional

burden of testing on all configurations the

software is meant to support.

• Software updates. A standalone application is

more difﬁcult to update and maintain. Although

there exist frameworks, such as Java Web Start

(Sun Microsystems, 2004d), Rich Client Platform

(Eclipse Foundation, 2004) or Smart Client (Mi-

crosoft, 2004) that automate the process of update

checking and downloading, for the majority of end

users the process will not be perfectly transparent

and unobtrusive.

• Secure storage. To meet the required security stan-

dards, a standalone GUI application must imple-

ment its business data cache in such a way as to

eliminate the possibility of unauthorised access or

modiﬁcation.

The central idea of the web application model is

that all processing of business data takes place on the

server side, the interaction with the client relying only

on a standard web browser and the HTTP protocol.

Below we highlight the advantages and disadvantages

of the web model from the perspective of the Nabor

system.

Web client advantages

• No need for installation. The only required soft-

ware on the client side is a standard-compliant web

browser. This eliminates the need for software dis-

tribution, installation and the possible dependen-

cies.

• On-the-ﬂy updates. A web application is updated

only on the server side, which makes the process to-

tally transparent and unobtrusive for the end users.

• No need for data serialisation. In a web applica-

tion all processing takes place on the server side

and the only data exchanged with the client are

HTML/XML/HTTP streams.

• Platform-independence. Because the only required

software on the client side is a web browser, it is

much easier to achieve platform-independence of a

web application.

Web client disadvantages

• Poor responsiveness. As all business data process-

ing occurs on the server side, and because the user

interface is based on HTML pages and forms, com-

pared to the rich client, responsiveness of a web ap-

plication is rather poor.

• Usability issues. Compared to GUI, an HTML-

based user interface is more difﬁcult to work with

for the end users, because of e.g. only a limited

support for keyboard shortcuts.

• On-line operation. A web application requires that

the server connection be present at all times. In

case of a failure of the server or the network in-

frastructure, the web application cannot function at

all.

• Server load. Compared to the rich client model, a

web application generates a much higher load on

the server side.

3.2.2 Off-line processing vs. on-line processing

In the off-line processing model, the rich client application normally stays disconnected from the server and performs all operations on the locally cached data. A connection is established, usually for a limited period of time, only to synchronise the stored data with the central database. This architecture had the following advantages and disadvantages from the perspective of the Nabor system.

(Footnote: Implementing a web application in the off-line processing model seems impractical.)


Off-line processing advantages

• Permanent server link not required. For the end

users still using dial-up connections, the off-line

processing model will be a less costly option.

• Fault-tolerance. Because it is very natural for an

off-line application to operate without server con-

nection, the tolerance to server and network failures

is in a sense built into its architecture.
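The built-in fault-tolerance of the off-line model can be illustrated with a local queue of pending operations. The sketch below is an assumption about one possible design, not a description of Nabor: the class Outbox and the Predicate that stands in for the server call are hypothetical. Operations are recorded without any connection and delivered in order once the link is available.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical client-side queue of operations waiting to be synchronised. */
class Outbox {
    private final Deque<String> pending = new ArrayDeque<>();

    /** Records an operation locally; no server connection is needed. */
    void record(String operation) {
        pending.addLast(operation);
    }

    /**
     * Tries to deliver pending operations in their original order.
     * The send function returns true when the server has accepted an operation.
     * Delivery stops at the first failure, so nothing is lost or reordered.
     */
    int flush(Predicate<String> send) {
        int delivered = 0;
        while (!pending.isEmpty()) {
            boolean accepted;
            try {
                accepted = send.test(pending.peekFirst());
            } catch (RuntimeException serverOrNetworkFailure) {
                accepted = false;
            }
            if (!accepted) {
                break;
            }
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    int pendingCount() {
        return pending.size();
    }
}
```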

Off-line processing disadvantages

• Data synchronisation. Part of the off-line process-

ing model must be an algorithm for synchronis-

ing business data and resolving possible conﬂicts,

e.g. concurrent modiﬁcations of the same data. On

many occasions, conﬂict resolution will require the

end user’s active involvement and decisions.

• Security infrastructure. Secure storage of cached

data will require additional development and test-

ing effort.
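The data synchronisation problem named above, concurrent modifications of the same data, is commonly detected with version numbers. The following sketch shows that general technique under our own assumptions; the paper does not say how, or whether, Nabor detects such conflicts, and the names VersionedStore, Entry and update are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server-side store that detects concurrent modifications. */
class VersionedStore {
    static class Entry {
        final String value;
        final int version;

        Entry(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Entry> rows = new HashMap<>();

    Entry read(String key) {
        return rows.get(key);
    }

    /**
     * Applies an update that a client prepared against baseVersion.
     * Returns false on a conflict, i.e. when another client has changed
     * the row since this client last synchronised it.
     */
    boolean update(String key, int baseVersion, String newValue) {
        Entry current = rows.get(key);
        int currentVersion = current == null ? 0 : current.version;
        if (currentVersion != baseVersion) {
            return false; // conflict: the end user has to decide what to keep
        }
        rows.put(key, new Entry(newValue, currentVersion + 1));
        return true;
    }
}
```

A rejected update is the point at which the end user's active involvement, mentioned in the list above, would be required.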

The primary assumption behind the on-line

processing model is that the client application (im-

plemented using either the rich or the web model)

for normal operation requires a permanent connection

with the server. Below we analyse the implications of

the on-line architecture for the Nabor software.

On-line processing advantages

• No need for data synchronisation. Changes made

to the data are immediately reﬂected on the server

and therefore the problems of conﬂict resolution

are avoided.

• Up-to-date data. In the on-line model the global

and up-to-date state of the data is always available

to all clients. This greatly simpliﬁes the implemen-

tation of operations that depend on the global state

of the database.

On-line processing disadvantages

• Permanent server link required. The on-line

processing model requires that a permanent and

possibly broadband connection to the server be

available to all clients, which may turn out costly

for some end users.

• Possible security threats. An on-line rich client ap-

plication would usually require direct access to the

system’s database, e.g. through some Object to Re-

lational Model mapping layer. This would greatly

simplify the development, but also increase the se-

curity risks.

3.2.3 Dedicated network vs. Internet

In order to use the system in the dedicated network

setting, all clients would need to connect to a sepa-

rate network using e.g. a dial-up service. Clearly, this

solution incurs additional costs related to setting up

dial-up access points on the server side and provid-

ing modems for the end users. On the other hand, in

a dedicated network it is much easier to protect the

server and the client software from external attacks.

An alternative to the dedicated network is the Inter-

net, where the additional infrastructure expenses are

avoided at the cost of higher security risks.

3.3 System architecture

Having analysed all available options we have de-

cided that the architecture of Nabor should be based

around the off-line rich client model and Internet

communication. We have therefore traded off the in-

creased complexity and possibly higher development

cost for better usability and fault-tolerance of the sys-

tem as a whole.

The reasons for choosing the off-line rich client

model were twofold. First of all, this model would

allow us to meet the most important design require-

ments — the high standard of data security and fault-

tolerance. To this end, we have implemented a so-

phisticated off-line business data transfer layer with

encryption and digital signature support (see section

4). Another incentive to use the off-line processing

model was the fact that due to the speciﬁc character

of Poznan’s high school admission procedures, many

of the general data synchronisation problems would

not occur at all in Nabor.

The only exception to the off-line rich client model

was the web site for high school candidates, where

they could ﬁll in the application form and check the

admission results after the admission lists had been

released. For this service we wanted to avoid the

overhead of installation and conﬁguration of the rich

client application. For security reasons, the web

server operated on a separate read-only data source

that was periodically synchronised with the system’s

main database and contained results only for those

candidates who decided to open the electronic data

access channels (about 87% of the candidates decided

to do so).
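The separate read-only data source for the candidates' web site can be pictured as a filtered, immutable snapshot of the main database. The code below is only a sketch of that idea; the class WebSnapshot, the Result type and its webAccess flag are hypothetical names, and the real system presumably used a separate database rather than an in-memory map.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical builder of the read-only data exposed to the web server. */
class WebSnapshot {
    static class Result {
        final String candidateId;
        final String school;
        final boolean webAccess; // candidate opened the electronic access channels

        Result(String candidateId, String school, boolean webAccess) {
            this.candidateId = candidateId;
            this.school = school;
            this.webAccess = webAccess;
        }
    }

    /** Copies only the opted-in results and returns them as an unmodifiable map. */
    static Map<String, String> build(Collection<Result> mainDatabase) {
        Map<String, String> snapshot = new HashMap<>();
        for (Result r : mainDatabase) {
            if (r.webAccess) {
                snapshot.put(r.candidateId, r.school);
            }
        }
        return Collections.unmodifiableMap(snapshot);
    }
}
```

Rebuilding the snapshot periodically corresponds to the periodic synchronisation with the main database described above.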

Figure 1 summarises our discussion on the overall architecture of Nabor. High school administration used rich client GUI applications to enter candidates' details, marks etc. to the system and to retrieve admission lists, while local education authorities used rich client GUI applications to retrieve analytical reports. Finally, high school candidates could find out about the admission results using a web application or a text message sent to a mobile phone.

(Footnote on data synchronisation in Nabor: For example, only one of the schools a candidate is applying to has the right to enter and further modify the candidate's data. Having noticed and exploited such an assumption, we can avoid the majority of conflict resolution problems.)
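The rule that only one school may enter and modify a given candidate's data can be sketched as a simple ownership check. This is an illustration under our own assumptions: the paper does not say how the owning school is chosen, so the sketch lets the first school that registers a candidate become the owner. The names OwnershipRegistry, claim and mayModify are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry enforcing a single writer per candidate record. */
class OwnershipRegistry {
    private final Map<String, String> ownerOf = new HashMap<>(); // candidate id -> school id

    /**
     * Assumption: the first school to register a candidate becomes its owner.
     * Returns true if the given school is the owner after the call.
     */
    boolean claim(String candidateId, String schoolId) {
        String existing = ownerOf.putIfAbsent(candidateId, schoolId);
        return existing == null || existing.equals(schoolId);
    }

    /** Updates are accepted only from the owning school, so edits cannot conflict. */
    boolean mayModify(String candidateId, String schoolId) {
        return schoolId.equals(ownerOf.get(candidateId));
    }
}
```

With a single writer per record, synchronisation reduces to uploading that writer's changes, which is why most conflict resolution problems do not arise.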

Figure 1: The architecture of Nabor

4 IMPLEMENTATION AND EVALUATION

4.1 Implementation

We have decided to implement the Nabor system us-

ing Java technology. Employing Java Swing (Sun Mi-

crosystems, 2004b) on the client side would make the

GUI application platform-independent, while using

non-proprietary Java technologies on the server side

would reduce the cost of the whole undertaking.

Noteworthy is how we implemented the secure off-

line storage and transport layer. To this end we devel-

oped the Ofﬂine Business Objects (OBO) framework,

whose primary motivation was to avoid duplication

of code and logic among various layers of the system.

Because of the high security and fault-tolerance re-

quirements, every operation performed by the off-line

client application had to be:

• packaged into encrypted and digitally signed units called packets,

• logged both on the client and server side to enable automated recovery after a failure,

• authenticated and authorised,

• cached on the client side in an encrypted form,

• incrementally synchronised with the server.
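The first requirement in the list, packaging an operation into an encrypted and digitally signed packet, can be sketched with the standard Java cryptography APIs. The paper does not state which algorithms or packet layout OBO uses, so everything specific here is an assumption: AES in GCM mode for encryption, SHA256withRSA for the signature, and the hypothetical names PacketSketch, Packet, seal and open.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.security.SignatureException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Illustrative only: algorithm choices and packet layout are assumptions. */
class PacketSketch {
    static class Packet {
        final byte[] iv;
        final byte[] ciphertext;
        final byte[] signature;

        Packet(byte[] iv, byte[] ciphertext, byte[] signature) {
            this.iv = iv;
            this.ciphertext = ciphertext;
            this.signature = signature;
        }
    }

    /** Encrypts the serialised operation, then signs the IV and the ciphertext. */
    static Packet seal(byte[] operation, SecretKey key, PrivateKey signingKey)
            throws GeneralSecurityException {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(operation);

        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(signingKey);
        signer.update(iv);
        signer.update(ciphertext);
        return new Packet(iv, ciphertext, signer.sign());
    }

    /** Verifies the sender's signature first and decrypts only if it is valid. */
    static byte[] open(Packet packet, SecretKey key, PublicKey senderKey)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(senderKey);
        verifier.update(packet.iv);
        verifier.update(packet.ciphertext);
        if (!verifier.verify(packet.signature)) {
            throw new SignatureException("Packet signature is invalid");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, packet.iv));
        return cipher.doFinal(packet.ciphertext);
    }
}
```

Key distribution, logging and authorisation are left out of the sketch; they are separate requirements in the list above.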

The key observation here is that the above requirements specify some mechanisms underlying the communication between client and server rather than the application business logic. Manual implementation of all these mechanisms in each of 25 different types of business objects would be tedious, error-prone and would severely duplicate code. On the other hand, none of the existing Object to Relational Model mappings, such as Hibernate (Bauer and King, 2004), provided off-line operation mode at that time. We have therefore decided to create the Offline Business Objects Framework, which would streamline and automate the process of generating Nabor's business object transport layer.

Table 1: Nabor production source code metrics

           Hand-coded   Generated   TOTAL
KLOC       92           89          181
Classes    1956         650         2606

An essential part of OBO is a code generator. It takes an XML-based declarative specification of business classes and generates code which implements all the required security mechanisms and additionally serves as a persistence layer for business objects on both client and server side. The generated code has many extension points which can be used by a developer to implement the specific behaviour of different business objects. One advantage of the extension points approach is that the automatically generated code need not be modified and can be regenerated without losing or overwriting the customisations (Herrington, 2003).
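The extension points approach can be illustrated with a toy generator. It is a sketch under stated assumptions, not the OBO generator: the declarative specification is a simple map of field names to types instead of XML, the source text is built with a StringBuilder instead of Velocity templates, and the names BusinessClassGenerator, generate and onChange are hypothetical. The generator emits an abstract base class, and hand-written behaviour lives in a subclass that overrides the hook.

```java
import java.util.Map;

/** Illustrative generator: emits a base class with an overridable hook. */
class BusinessClassGenerator {
    /** Returns Java source for an abstract base class named className + "Base". */
    static String generate(String className, Map<String, String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("// GENERATED - do not edit; customise in ")
           .append(className).append(" instead\n");
        src.append("abstract class ").append(className).append("Base {\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = field.getKey();
            String type = field.getValue();
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    private ").append(type).append(' ').append(name).append(";\n");
            src.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
            src.append("    public void set").append(cap).append('(').append(type)
               .append(" v) { ").append(name).append(" = v; onChange(\"")
               .append(name).append("\"); }\n");
        }
        src.append("    /** Extension point: overridden in the hand-written subclass. */\n");
        src.append("    protected void onChange(String field) { }\n");
        src.append("}\n");
        return src.toString();
    }
}
```

Since the hand-written subclass is a separate file, the base class can be regenerated at any time without overwriting the customisations.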

The Java implementation of the Ofﬂine Business

Objects framework was based on a number of non-

proprietary technologies, such as Torque (Apache

Foundation, 2004a), Velocity (Apache Foundation,

2004b), Java Cryptography Extension (Sun Microsys-

tems, 2004a) and Java Secure Socket Extension (Sun

Microsystems, 2004c).

In Table 1 we provide a set of basic software metrics related to the Nabor project source code, and in Table 2 we show Nabor software statistics from the user's point of view.

4.2 Evaluation

In 2004 Nabor was deployed in five cities in Poland on over 200 independent client workstations and handled almost 30,000 high school candidates. Throughout the operation period, the system performed smoothly and reliably, and to the users' great satisfaction all results were correct and delivered before the required deadlines. No attempts to manipulate the admission results were reported, and no major usability issues were discovered. We therefore feel that the requirements of security, fault-tolerance, correctness and usability were fully met, which proves the viability of the off-line rich client model we decided to adopt.

Table 2: Nabor user interface statistics

UI element      Count
GUI windows     >100
WWW pages       >20
PDF reports     22
Excel reports   60

(Footnote on off-line operation in Object to Relational Model mappings: Only recently has the Service Data Objects standard (IBM and BEA, 2004) been proposed, which includes support for off-line operation mode, defined there as the 'disconnected programming model'.)

(Footnote on the client-side persistence layer: The design of the client-side persistence layer was to a large extent inspired by the Prevayler (Prevayler, 2004) framework.)

Security, usability and fault-tolerance of Nabor,

however, came at a cost. The cost was not only the in-

creased development and testing effort, but also lim-

ited extensibility of the whole system. Adding new

capabilities, such as a continuously operating edu-

cation management application, would break the as-

sumption on which we based our decision to use the

off-line processing model. Without the postulate that

one piece of data (e.g. pupil’s personal data) can be

accessed and modiﬁed only by one client (e.g. one

school), data synchronisation and conﬂict resolution

become unmanageably complex. Major functional-

ity enhancements would therefore require abandoning

the off-line rich client paradigm in favour of the on-

line web model.

5 CONCLUSION

In this paper we have reported on our experiences

with building a large-scale distributed information

system for the education sector. We described the

requirements the system was to meet and how these

requirements inﬂuenced the design and architecture

of the software. We also highlighted selected imple-

mentation issues and evaluated Nabor from the per-

formance and extensibility standpoint.

Our future plans include extending Nabor with nursery, primary and secondary school admission tools, as well as a continuously operating on-line education management application. Although in two consecutive editions of Nabor the off-line rich client paradigm proved a perfectly viable approach, with the shift from occasional towards continuous operation of the system, we will most likely abandon the present model in favour of the more scalable on-line web client approach.

(Footnote on user satisfaction: According to the survey we carried out after the admission had finished, 53% of all users were generally satisfied and 45% were fully satisfied with Nabor.)

REFERENCES

Apache Foundation (2004a). Torque: Persistence Layer.

http://db.apache.org/torque/.

Apache Foundation (2004b). Velocity Template Engine.

http://jakarta.apache.org/velocity/.

Bauer, C. and King, G. (2004). Hibernate in Action. Manning Publications.

Capita (2004). Education Management System: On-line Admissions and Transfers.

http://home.capitaes.co.uk/EMS/Modules/Admissions_and_Transfers.asp.

Eclipse Foundation (2004). Rich Client Platform.

http://dev.eclipse.org/viewcvs/index.cgi/~checkout~/platform-ui-home/rcp/index.html.

European Commission (2002). eEurope 2005: An information society for all.

http://europa.eu.int/information_society/eeurope/2005/index_en.htm.

Herrington, J. (2003). Code Generation in Action. Manning

Publications.

IBM and BEA (2004). Service Data Objects specification.

ftp://www6.software.ibm.com/software/developer/library/j-commonj-sdowmt/Commonj-SDO-Specification-v1.0.doc.

Microsoft (2004). Smart Clients: Combining the

Power of the PC with the Reach of the Web.

http://www.microsoft.com/net/products/client.asp.

NITRD (2004). Large Scale Networking.

http://www.itrd.gov/iwg/lsn.html.

Prevayler (2004). Prevayler: Free-software prevalence

layer for Java. http://www.prevayler.org/.

Sun Microsystems (2004a). Java Cryptography Extensions.

http://java.sun.com/products/jce/.

Sun Microsystems (2004b). Java Foundation Classes.

http://java.sun.com/products/jfc/.

Sun Microsystems (2004c). Java Secure Socket Extension.

http://java.sun.com/products/jsse/.

Sun Microsystems (2004d). Java Web Start Technology.

http://java.sun.com/products/javawebstart/.

Weglarz, J., Rychlewski, J., Starzak, S., Stroinski, M.,

and Nakonieczny, M. (2000). PIONIER — Optical

Internet in Poland. ISThmus, Poznan.
