DOCXS

A Distributed Computing Environment for Multimedia Data Processing

Tobias Lohe, Michael Fieseler, Steffen Wachenfeld and Xiaoyi Jiang

Department of Computer Science, University of Münster, Einsteinstraße 62, D-48149 Münster, Germany

Keywords:

Distributed multimedia systems, workﬂow systems, visual programming.

Abstract:

This paper presents DocXS, a distributed computing environment for multimedia data processing, which was developed at the University of Münster, Germany. DocXS is platform independent due to its implementation in Java, is freely available for non-commercial research, and can be installed on standard office computers. The main advantage of DocXS is that it does not require its users to care about code distribution or parallelization. Algorithms can be programmed using an Eclipse-based user interface and the resulting Matlab and Java operators can be visually connected to graphs representing complex data processing workflows. Experiments with DocXS show that it scales very well with only a small overhead.

1 INTRODUCTION

In this paper we present DocXS (Distributed Operator Construction and eXecution System), a computing environment for multimedia data processing. DocXS harnesses the power of distributed computing, allows the easy combination and integration of existing algorithms or software packages, and facilitates scientific exchange among researchers. Additionally, DocXS provides a visual programming environment for the definition of workflows based on smaller units called operators.

In the literature, several reports on distributed sys-

tems for multimedia data processing exist. One of the

ﬁrst reported systems is DIPE (Zikos et al., 1997),

which uses binary executables as operators. DIPE

provides no control structures like branches or loops

and is the only system without a visual programming

interface.

The LONI pipeline processing environment (Rex

et al., 2003) also uses binary executables as operators

and is able to distribute operators automatically, but

does not provide any control structures.

Khoros/Cantata (Konstantinides and Rasure,

1994; Young et al., 1995) provides the control

structures IF/ELSE, SWITCH, WHILE and COUNT,

but the operators (also binary executables) have to be

manually distributed by the user.

The IRMA (Image Retrieval in Medical Applications) platform (Güld et al., 2003) is able to automatically distribute operators, which have to be written in C++, but only provides an IF/ELSE control structure.

Finally, SCIRun (Parker et al., 1997) supports only C++ operators, provides no control structures, and supports only manual distribution.

In contrast to DocXS, all these systems lack the

possibility to include operators written in Matlab or

Java and to combine operators from different lan-

guages in the same workﬂow. Also none of the

systems supports a combination of loops and auto-

matic distributed processing. DocXS in contrast al-

lows branches as well as loops and automatically dis-

tributes operators. Further, to facilitate identical oper-

ations on multiple data, DocXS allows use of a con-

struct called FOREACH. This loop-like construct is

very useful as the identical operations are independent

and can be automatically distributed and processed in

parallel.

This paper is structured as follows. Section 2 gives a detailed overview of the architecture and implementation of DocXS. In Section 3 we present experimental results, which include a performance analysis. The paper concludes with a discussion of our achievements in Section 4.


Lohe T., Fieseler M., Wachenfeld S. and Jiang X. (2007).

DOCXS - A Distributed Computing Environment for Multimedia Data Processing.

In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications, pages 379-382

DOI: 10.5220/0002140003790382

 SciTePress

2 ARCHITECTURE AND

IMPLEMENTATION

We will use several technical terms to describe

DocXS. An operator is some piece of code that ex-

ecutes arbitrary computations. A chain is a higher-

order deﬁnition of a workﬂow consisting of sev-

eral connected operators and control structures (like

IF/ELSE, WHILE or FOREACH) which form a di-

rected graph. An example of a simple chain which

represents a process to detect edges in images is

shown in Figure 1. It can be seen that operators can

have multiple typed and labeled inputs and outputs,

which are called ports in DocXS.
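The chain concept described above can be sketched as a small directed-graph data structure. This is only an illustration under assumptions: the class names `ChainSketch`, `Port`, and `Edge` and the operator names `LoadImage`, `Smooth`, and `DetectEdges` are hypothetical and are not taken from DocXS or from Figure 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Minimal chain model: operators are nodes, edges connect an output port to an input port. */
class ChainSketch {
    /** A labeled, typed port of one operator (hypothetical representation). */
    static final class Port {
        final String operator;
        final String label;
        final Class<?> type;

        Port(String operator, String label, Class<?> type) {
            this.operator = operator;
            this.label = label;
            this.type = type;
        }
    }

    /** A directed data-flow edge from an output port to an input port. */
    static final class Edge {
        final Port from;
        final Port to;

        Edge(Port from, Port to) {
            this.from = from;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    /** Records that data flows from the given output port to the given input port. */
    void connect(Port from, Port to) {
        edges.add(new Edge(from, to));
    }

    /** Returns the operators that directly consume output of the given operator. */
    Set<String> successors(String operator) {
        Set<String> result = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.from.operator.equals(operator)) {
                result.add(e.to.operator);
            }
        }
        return result;
    }

    /** Builds a three-operator edge detection chain with made-up operator names. */
    static ChainSketch edgeDetectionExample() {
        ChainSketch chain = new ChainSketch();
        chain.connect(new Port("LoadImage", "image", int[].class),
                      new Port("Smooth", "input", int[].class));
        chain.connect(new Port("Smooth", "output", int[].class),
                      new Port("DetectEdges", "input", int[].class));
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(edgeDetectionExample().successors("LoadImage"));
    }
}
```

Keeping the type on each port is what would let a system decide, when an edge is followed, whether a value can be passed as is or needs conversion.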

A chain which represents a speciﬁc algorithm can

be applied by different users onto different data at the

same time. Each application leads to an active in-

stance of the chain within the system, which is called

a task.

DocXS is designed to support Matlab and Java operators and allows them to be combined in the same chain. It uses a lightweight API for the addition of new operators, which makes the integration of existing code into DocXS very easy. The chains, which are constructed by combining operators and control structures, are designed to model arbitrary workflows, which are automatically analyzed, distributed, and computed in parallel by the system. Furthermore, DocXS emphasizes scientific collaboration inside a group or company, as it allows operators, chains, and data to be shared.

DocXS is implemented in Java and requires only a Java virtual machine to run. DocXS itself is therefore platform independent. Executing Matlab operators additionally requires a valid Matlab installation and license.

2.1 Distributed System Architecture

The architectural overview of the distributed DocXS

system can be seen in Figure 2. The system consists

of various components that can be distributed among

different computers. The central server hosts the Ker-

nel, which serves as the main coordinator and con-

troller of the system. Tightly integrated with the Ker-

nel is the server running the central database. The

distributed execution of tasks is performed on multi-

ple computers each running an Executor. The number

of Executors is not limited.

DocXS provides two separate user interfaces: the so-called SystemGUI to create operators and chains, and the WebGUI to execute chains without requiring programming knowledge. The WebGUI is implemented using the JavaServer Faces technology, runs in an Apache Tomcat servlet container, and can be used with any modern Web browser. The SystemGUI of DocXS is based on the Eclipse Rich Client Platform (McAffer and Lemieux, 2005) and does not run on a server, but on the developers' computers.

Figure 1: A chain representing an edge detection algorithm.

2.2 Operators and Chains

For the creation of Java operators using the Sys-

temGUI, the full functionality of the Eclipse Java

IDE (syntax highlighting, code completion, refactor-

ing support) can be employed, while for Matlab oper-

ators only syntax highlighting is provided. All built-in

data types of Java and Matlab can be used as input and

output parameters for operators.

Integrating existing Java code or creating new Java

operators is done by simply implementing an inter-

face and deﬁning getter and setter methods. Mat-

lab operators just need a main function which can be

called by DocXS. A single DocXS operator may con-

sist of several Java classes or Matlab ﬁles.
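A Java operator of the kind described above might look as follows. This is a minimal sketch: the paper does not show the actual DocXS interface, so the interface name `Operator`, its `execute()` method, and the example `ThresholdOperator` are assumptions made for illustration. Input ports are modeled as setters and output ports as getters.

```java
import java.util.Arrays;

/** Hypothetical operator contract; the real DocXS interface is not shown in the paper. */
interface Operator {
    /** Runs the computation after all input setters have been called. */
    void execute();
}

/** Example operator: thresholds a grayscale image given as a flat array of intensities. */
class ThresholdOperator implements Operator {
    private int[] pixels = new int[0]; // input port "pixels"
    private int threshold = 128;       // input port "threshold"
    private int[] binary = new int[0]; // output port "binary"

    public void setPixels(int[] pixels) {
        this.pixels = pixels.clone();
    }

    public void setThreshold(int threshold) {
        this.threshold = threshold;
    }

    public int[] getBinary() {
        return binary.clone();
    }

    @Override
    public void execute() {
        binary = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            // Pixels at or above the threshold become white, all others black.
            binary[i] = pixels[i] >= threshold ? 255 : 0;
        }
    }
}

class OperatorDemo {
    public static void main(String[] args) {
        ThresholdOperator op = new ThresholdOperator();
        op.setPixels(new int[] {10, 200, 128, 127});
        op.setThreshold(128);
        op.execute();
        System.out.println(Arrays.toString(op.getBinary())); // prints [0, 255, 255, 0]
    }
}
```

Because the operator is a plain class with setters and getters, existing code can be wrapped without depending on framework base classes.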

Available operators can be inserted into a chain using drag-and-drop. Java and Matlab operators can be mixed in an arbitrary manner inside a chain. The data flow is represented by edges between ports. Necessary type conversions are done automatically when the chain is executed.

Figure 2: Distributed system architecture of DocXS. [Diagram components: Kernel, Database, Tomcat, Executors 1 to k, SystemGUIs 1 to m, WebGUIs 1 to n.]

SIGMAP 2007 - International Conference on Signal Processing and Multimedia Applications

For the deﬁnition of complex workﬂows several

control structures are provided. Conditional execu-

tion can be expressed using the IF/ELSE or SWITCH

control, and loops using the WHILE control. Espe-

cially important is the FOREACH control structure that

allows a user to execute a part of the chain for every

element of a list or array. As the identical operations

applied to each element are independent of each other,

the FOREACH can be automatically distributed among

the Executors.
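The reason FOREACH parallelizes so easily can be illustrated with a short sketch. It assumes a local thread pool as a stand-in for the DocXS Executors, which in the real system run on separate computers; the class `ForEachSketch` and its `forEach` method are hypothetical names, not DocXS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ForEachSketch {
    /**
     * Applies the same body to every item. The applications are independent,
     * so they are submitted as separate jobs; results keep the input order.
     */
    static <I, O> List<O> forEach(List<I> items, Function<I, O> body, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<O>> pending = new ArrayList<>();
            for (I item : items) {
                Callable<O> job = () -> body.apply(item);
                pending.add(pool.submit(job));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : pending) {
                results.add(future.get()); // blocks until this job has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            inputs.add(i);
        }
        System.out.println(forEach(inputs, x -> x * x, 2)); // prints [1, 4, 9, 16]
    }
}
```

No job reads another job's result, so no synchronization between the jobs is needed; only the final collection step waits.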

2.3 Task Execution

Available chains can be executed using the WebGUI. After the user has selected the required input parameters, the execution of the task can be started. The Kernel analyzes the task and splits it into several parallel jobs for distribution. An internal scheduler assigns the resulting jobs to the available Executors, where they are executed in parallel. The Kernel also handles the dependencies between jobs of the same task and coordinates the Executors running the jobs.
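The paper does not describe the scheduling algorithm itself. As one possible reading, the following sketch groups jobs into "waves" by dependency (a simple topological layering) and assigns each wave to k Executors in round-robin order. All names (`JobScheduleSketch`, `waves`, `assign`, and the job names) are hypothetical, and the round-robin policy is an assumption, not the DocXS scheduler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class JobScheduleSketch {
    /** Returns waves of job names; all jobs within one wave can run in parallel. */
    static List<List<String>> waves(Map<String, Set<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        List<List<String>> waves = new ArrayList<>();
        while (done.size() < dependsOn.size()) {
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Set<String>> job : dependsOn.entrySet()) {
                // A job is ready once every job it depends on has finished.
                if (!done.contains(job.getKey()) && done.containsAll(job.getValue())) {
                    ready.add(job.getKey());
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalArgumentException("cyclic or unknown dependency");
            }
            done.addAll(ready);
            waves.add(ready);
        }
        return waves;
    }

    /** Assigns the jobs of one wave to k executors (numbered from 0) in round-robin order. */
    static Map<String, Integer> assign(List<String> wave, int k) {
        Map<String, Integer> plan = new LinkedHashMap<>();
        for (int i = 0; i < wave.size(); i++) {
            plan.put(wave.get(i), i % k);
        }
        return plan;
    }

    /** Example task: one load job, two independent jobs, one job that merges them. */
    static Map<String, Set<String>> exampleTask() {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("load", Collections.<String>emptySet());
        deps.put("edgesA", Collections.singleton("load"));
        deps.put("edgesB", Collections.singleton("load"));
        deps.put("merge", new LinkedHashSet<>(Arrays.asList("edgesA", "edgesB")));
        return deps;
    }

    public static void main(String[] args) {
        for (List<String> wave : waves(exampleTask())) {
            System.out.println(wave + " -> " + assign(wave, 2));
        }
    }
}
```

A production scheduler would more likely dispatch each job as soon as its own dependencies finish rather than waiting for a whole wave, but the layering makes the dependency handling easy to see.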

The Executor analyzes the job, provides and converts the input data, executes the contained operators using the Java Reflection API or the JMatLink Java-Matlab connector, ensures the proper execution of control structures, and writes the output data.
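How an operator could be invoked through the Java Reflection API is sketched below. The naming convention (`setX` for inputs, `execute()` for the computation, `getX` for outputs) and the classes `ScaleOperator` and `ReflectiveRunner` are assumptions for illustration; the paper only states that reflection is used, not how.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical operator used only for this demonstration. */
class ScaleOperator {
    private double value;
    private double factor;
    private double result;

    public void setValue(double value) { this.value = value; }
    public void setFactor(double factor) { this.factor = factor; }
    public double getResult() { return result; }
    public void execute() { result = value * factor; }
}

class ReflectiveRunner {
    /** Instantiates the operator by class name, sets inputs, runs it, and reads one output. */
    static Object run(String className, Map<String, Object> inputs, String outputPort) {
        try {
            Object operator = Class.forName(className).getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> in : inputs.entrySet()) {
                find(operator, "set" + capitalize(in.getKey()), 1).invoke(operator, in.getValue());
            }
            find(operator, "execute", 0).invoke(operator);
            return find(operator, "get" + capitalize(outputPort), 0).invoke(operator);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("operator call failed: " + className, e);
        }
    }

    /** Finds a public method by name and parameter count. */
    private static Method find(Object target, String name, int paramCount) throws NoSuchMethodException {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == paramCount) {
                return m;
            }
        }
        throw new NoSuchMethodException(name);
    }

    private static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        Map<String, Object> inputs = new LinkedHashMap<>();
        inputs.put("value", 4.0);
        inputs.put("factor", 2.5);
        System.out.println(run(ScaleOperator.class.getName(), inputs, "result")); // prints 10.0
    }
}
```

Looking methods up by name at run time is what allows a generic executor to run operators it was never compiled against.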

2.4 Data Storage

The system data (operators, chains, tasks, task parameters, and task results) is stored in a central database. Images and other media files are stored in the file system, and only links to their locations are stored in the database. We use Hibernate (Bauer and King, 2006) as object-relational mapper, which provides a convenient object-oriented abstraction layer over the underlying relational SQL database. Almost any relational database system can therefore be used with DocXS: switching from one database system to another requires no code changes, only an adjustment of the corresponding system properties. We currently use the IBM DB2 Express-C database system.
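The kind of property change involved in such a switch might look like the sketch below. The three keys are standard Hibernate configuration properties; everything else is assumed for illustration: the class `DatabaseConfigSketch`, the host, port, and database names are placeholders, and PostgreSQL is only an example target that the paper does not mention. How DocXS actually loads its system properties is not described.

```java
import java.util.Properties;

class DatabaseConfigSketch {
    /** Connection settings for IBM DB2 (port 50000 is a placeholder default). */
    static Properties forDb2(String host, String database) {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.driver_class", "com.ibm.db2.jcc.DB2Driver");
        p.setProperty("hibernate.connection.url", "jdbc:db2://" + host + ":50000/" + database);
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.DB2Dialect");
        return p;
    }

    /** The same three keys with values for PostgreSQL (example alternative only). */
    static Properties forPostgres(String host, String database) {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
        p.setProperty("hibernate.connection.url", "jdbc:postgresql://" + host + ":5432/" + database);
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect");
        return p;
    }

    public static void main(String[] args) {
        // Only the property values differ; application code using the mapper stays the same.
        System.out.println(forDb2("dbhost", "docxs"));
        System.out.println(forPostgres("dbhost", "docxs"));
    }
}
```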

3 EXPERIMENTAL RESULTS

In this section we present experimental results concerning the performance of DocXS. We use a cluster of k standard office computers as Executors, each having a 1.7 GHz Intel Pentium 4 CPU and 512 MB RAM, and a non-dedicated server with two 2.8 GHz Intel Xeon Dual-Core CPUs and 6 GB RAM for the Kernel. The database runs on a non-dedicated server with an AMD Athlon XP 2000 CPU and 512 MB RAM. All computers are connected by a 100 Mbit/s Ethernet network. Each test was run repeatedly, and we report the median of all runs to reduce the impact of outliers.
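The median-of-runs measurement can be sketched as a small timing harness. The number of repetitions is not stated in the paper, so `runs` is left as a parameter; the class and method names are hypothetical.

```java
import java.util.Arrays;

class MedianTiming {
    /** Returns the median of the samples; for an even count, the mean of the two middle values. */
    static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Runs the task the given number of times and returns the median wall-clock time in ms. */
    static double medianRuntimeMillis(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return median(millis);
    }

    public static void main(String[] args) {
        // One slow outlier (5000 ms) does not move the median.
        System.out.println(median(new long[] {875, 900, 5000})); // prints 900.0
    }
}
```

Unlike the mean, the median is unaffected by a single slow run, which matters on non-dedicated machines where other processes may interfere.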

3.1 Estimation of System Overhead

To estimate the computational overhead of DocXS for

system management and task distribution, we use a

task that consists of a NOP operator implemented in

Java, which does nothing and simply returns the in-

puts without modiﬁcation. The operator is placed in

a FOREACH control so that the operator has to be

executed for each input item. We measure the time

DocXS needs to run such a task.

We show two different cases. In the first case (NOP-few) the input data consists of 64 integer values to keep the time for data distribution to a minimum. The second case (NOP-large) involves a larger amount of data, a set of 64 files (each 1.3 MB), that has to be distributed. This case reflects not only the network speed but also the internal handling of the data by the system.

Table 1 shows the total time needed for both cases

depending on the number k of participating Execu-

tors and the execution time of the same tasks without

DocXS. It can be seen that DocXS itself causes only a

small overhead. The overhead in the NOP-large case

decreases with higher numbers of Executors due to

distributed I/O. Both cases show that using DocXS

already pays off if a task takes about a minute without

DocXS, in the case of low I/O demands even less.


Table 1: Execution times for different numbers k of Executors in comparison to the execution time of the task without DocXS.

k          NOP-few      NOP-large       Comp-few (speedup)     Comp-large (speedup)
No DocXS   < 1ms        1m 02s 407ms    59m 52s (= 1.00)       1h 03m 59s (= 1.00)
1          875ms        1m 57s 801ms    1h 05m 06s (× 0.92)    1h 06m 01s (× 0.97)
4          6s 546ms     1m 08s 675ms    16m 34s (× 3.61)       17m 12s (× 3.72)
8          8s 140ms     1m 17s 640ms    8m 22s (× 7.16)        9m 15s (× 6.91)
16         13s 191ms    1m 01s 935ms    4m 14s (× 14.17)       5m 02s (× 12.72)

3.2 Performance Comparison

To measure the performance of our system we use two

cases very similar to the cases for the overhead esti-

mation. Both cases use a computationally intensive

Java operator. While the ﬁrst case (Comp-few) uses

only primitive data types, in the second case (Comp-

large) the amount of data which has to be trans-

ferred over the network and into memory is higher.

For both cases the speedup of DocXS in comparison to a single computer without DocXS, calculated as $\text{speedup} = T_{\text{no DocXS}} / T_{\text{DocXS}}$, is shown.

It can be seen in Table 1 that DocXS scales very well in the Comp-few case. For one Executor (k = 1), DocXS needs slightly longer than the single computer because of the overhead discussed above. The speedup then grows almost linearly with the number of Executors, and for k = 16 the task finishes more than 14 times faster than on a single computer. In the Comp-large case, which involves sending larger amounts of data over the network, DocXS scales very well, too: tasks finish almost 13 times faster using DocXS than on a single computer.
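The speedup values can be recomputed from the Comp-few times in Table 1, as in the sketch below. The parallel efficiency (speedup divided by k) is not reported in the paper and is added here only as an illustration; the class and method names are hypothetical. Using the whole-second times from the table gives about 14.14 for k = 16, slightly below the reported 14.17, presumably because the published times are rounded.

```java
import java.util.Locale;

class SpeedupSketch {
    /** Converts a duration given as hours, minutes, and seconds to seconds. */
    static int seconds(int hours, int minutes, int secs) {
        return hours * 3600 + minutes * 60 + secs;
    }

    /** speedup = T_noDocXS / T_DocXS */
    static double speedup(int baselineSeconds, int docxsSeconds) {
        return (double) baselineSeconds / docxsSeconds;
    }

    /** Parallel efficiency = speedup / k (illustrative; not reported in the paper). */
    static double efficiency(double speedup, int executors) {
        return speedup / executors;
    }

    public static void main(String[] args) {
        int baseline = seconds(0, 59, 52); // Comp-few without DocXS
        int[] k = {1, 4, 8, 16};
        int[] docxs = {
            seconds(1, 5, 6), seconds(0, 16, 34), seconds(0, 8, 22), seconds(0, 4, 14)
        };
        for (int i = 0; i < k.length; i++) {
            double s = speedup(baseline, docxs[i]);
            System.out.println(String.format(Locale.ROOT,
                "k=%d speedup=%.2f efficiency=%.2f", k[i], s, efficiency(s, k[i])));
        }
    }
}
```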

DocXS can also make efficient use of a multiprocessor computer by running an Executor instance on each processor available in the system. Tests using a single multiprocessor computer with eight CPUs resulted in speedups of 7.73 (Comp-few) and 6.53 (Comp-large).

4 CONCLUSION

We presented DocXS, a distributed computing envi-

ronment for multimedia data processing. The main

advantage of DocXS is that it does not require its

users to care about code distribution or parallelization,

but handles these issues automatically. Algorithms

can be programmed using an Eclipse-based user inter-

face and the resulting Matlab and Java operators can

be visually connected to a complex workﬂow using

various branch and loop control structures. Addition-

ally the scientiﬁc exchange of operators, algorithms,

and data is facilitated using a central database and two

user interfaces, one for developers and one for system

users.

We showed that DocXS produces only a small

overhead and that it scales very well for computation-

ally expensive tasks. As DocXS is going to be freely

available for non-commercial research and may run

on cheap PC hardware, it is a useful tool which can

simplify and facilitate every researcher’s work.

REFERENCES

Bauer, C. and King, G. (2006). Java Persistence with Hi-

bernate. Manning.

Güld, M. O., Thies, C., Fischer, B., Keysers, D., Wein,

B. B., and Lehmann, T. M. (2003). A platform for

distributed image processing and image retrieval. In

Visual Communications and Image Processing 2003,

volume 5150 of Proceedings of SPIE, pages 1109–

1120.

Konstantinides, K. and Rasure, J. R. (1994). The Khoros

software development environment for image and sig-

nal processing. IEEE Transactions on Image Process-

ing, 3(3):243–252.

McAffer, J. and Lemieux, J.-M. (2005). Eclipse Rich Client

Platform: Designing, Coding, and Packaging Java

Applications. Addison-Wesley Professional.

Parker, S., Beazley, D., and Johnson, C. (1997). Computa-

tional steering software systems and strategies. IEEE

Computational Science and Engineering, 4(4):50–59.

Rex, D. E., Ma, J. Q., and Toga, A. W. (2003). The

LONI pipeline processing environment. NeuroImage,

19(3):1033–1048.

Young, M., Argiro, D., and Kubica, S. (1995). Cantata:

Visual programming environment for the Khoros sys-

tem. Computer Graphics, 29(2):22–24.

Zikos, M., Kaldoudi, E., and Orphanoudakis, S. C. (1997).

DIPE: A distributed environment for medical image

processing. In Proceedings of MIE’97 (Medical In-

formatics Europe), pages 465–469.
