RAPID VISION APPLICATION DEVELOPMENT USING HIVE
A Modular and Scaleable Approach to Vision System Engineering
Gregor Miller, Amir Afrah and Sidney Fels
Human Communication Technology Laboratory, University of British Columbia, Vancouver, Canada
Keywords:
Computer vision, Middleware, Sensor fusion, Distributed processing.
Abstract:
In this paper we demonstrate the use of Hive as a novel basis for creating multi-sensor vision systems. Hive
is a framework in which reusable modules called drones are defined and connected together to create larger
systems. Drones are simple to implement, each performing a specific task, and can be combined using the powerful interface of Hive to create sophisticated vision pipelines. We present a set of drones defined within Hive and a suite
of applications built using these drones which utilize the input from multiple cameras and a variety of sensors.
Results demonstrate the flexibility of approaches possible with Hive as well as the real-time performance of
the Hive applications.
1 INTRODUCTION
This paper presents Hive as a novel framework for
rapidly developing vision applications. The Hive ar-
chitecture and communication system was presented
in (Afrah et al., 2008), which demonstrated the ad-
vantages of a layered architecture and a flexible me-
diation mechanism. This paper first discusses the de-
velopment of drones (dumb modules which perform
a specific task) and then describes a suite of appli-
cations constructed by forming pipelines using the
drones. The aim of this paper is to demonstrate that
once a set of well-defined drones is implemented,
vision application development is both simpler and
more efficient.
Multi-sensor systems are of interest since they can
exploit redundancy of data from a variety of sensing
technologies to compensate for the shortcomings of
a single sensor. In addition to potentially increas-
ing the functionality and accuracy of a system, em-
ploying multiple sensors can also decrease the anal-
ysis complexities of single-sensor applications (Luo
et al., 2002). The advantages offered by multi-sensor
use are especially evident in vision systems where
analysis complexity and accuracy of results often
pose issues. Employing additional cameras and other
complementary technologies leads to interesting and
promising vision-based applications.
The overhead of dealing with a diverse set of tech-
nologies, interfaces and protocols provided for the
sensors in combination with the non-trivial and com-
plex task of analysing images makes vision-based multi-sensor application development quite challenging. There is often a large effort required in setting up and using such networks; system developers are faced with issues such as compatibility, communication and performance when building and prototyping multi-sensor systems.
Hive is a component-based platform for creating distributed vision systems. In addition, Hive provides the necessary framework, standardization and abstraction required for developing sensor fusion networks in a simple, plug-in fashion using reusable modules (drones). We present an overview of Hive in
Section 2, describe a set of drones (Section 3) and the
applications built from them (Section 4). Results, pre-
sented in Section 5, demonstrate the example vision
systems and the performance of their components.
Few systems similar to Hive exist in the computer vision literature. OpenCV is a library which provides an extensive set of image processing routines while offering moderate support for image and data I/O (Bradski and Kaehler, 2008). OpenCV attempts to provide methods for data access and processing through a function-based interface; however, it is mostly focused on implementing complex algorithms and does not address issues essential in multi-sensor networks, such as control and data management.
YARP and Player are two widely used frame-
works for sensor fusion in robotics (Metta et al.,
2006; Gerkey et al., 2001). These platforms pro-
vide mechanisms for communication between differ-
ent modules in a robot. YARP focuses on providing high-level access to OS functionality, and while it addresses communication and uniform access for individual sensors, it does not provide external control and requires a custom YARP layer for each application. Hive is a generic system with drones that are connected by applications, giving greater flexibility and control over pipeline creation. Player provides a multi-threaded framework that facilitates communication between multiple devices. Player's architecture supports robot-based networks; however, it does not provide the flexibility and control required to create diverse vision systems.
Various interfaces have been developed to ac-
cess data from input devices, from OS-specific examples such as DirectShow on Windows, QuickTime and Video4Linux (Schimek et al., 2008), to
cross-platform libraries such as OpenCV (Bradski
and Kaehler, 2008). These systems are generally lim-
ited to capturing data from devices physically con-
nected to the machine the software runs on. One of
our goals with Hive is to develop a generic camera
framework so that data can be captured from a vari-
ety of devices and delivered to any machine on the
network. This involves writing a Hive drone for each
capture API, which could be for a specific camera, a
range of cameras (from the same manufacturer), or a
particular platform (e.g. QuickTime). Once all these
drones are created, it is possible to capture data with
very little development effort on any platform.
2 HIVE
Hive is a middleware platform which abstracts sys-
tem specifics and allows communication between ar-
bitrary reusable modules called drones. The plat-
form is based on a layered architecture that provides a
plug-in interface over an event system and a transport
layer that allow for distributed processing. The Hive
architecture and framework are explained in detail in
(Afrah et al., 2008), while the following presents a
brief overview of the important components.
Hive systems are built from drones: reusable mod-
ules which perform a specific task (e.g. capture im-
ages from a camera), and have well-defined inputs
and outputs. Applications are the command-and-
control centres for drones, configuring and connect-
ing sets of drones into processing structures called
swarms to perform a sequence of tasks. Applica-
tions can collate the output from drones to construct a
larger, more sophisticated system.
Applications may construct multiple swarms, ei-
ther to distribute processing of a computationally in-
tensive task or to perform various tasks simultane-
ously, and later combine the results. There may also
be multiple applications using the same drones to re-
ceive and process information.
3 DRONES
Hive drones form the basis of vision systems, per-
forming data capture and processing. Drones are cre-
ated by registering functions as handlers for incom-
ing data, and registering a main method which runs
once per drone cycle. The expected type of input and
output are also specified in the drone definition. System-specific issues and communication methods are handled by Hive, making the construction of new drones a trivial task.
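To make this concrete, the following is a minimal sketch of how a drone definition might look in C++. The Hive API names used here (hive::Drone, declareInput, declareOutput, registerHandler, registerMain, emit and the hive::Image type) are illustrative assumptions rather than the published interface; only the registration pattern described above is taken from the text.

    #include <hive/drone.h>   // assumed header

    // Hypothetical drone that inverts greyscale images; all hive:: names are assumed.
    class InvertDrone : public hive::Drone {
    public:
        InvertDrone() {
            declareInput<hive::Image>("image");      // expected input type
            declareOutput<hive::Image>("inverted");  // expected output type
            // Handler called whenever new data arrives on the "image" input.
            registerHandler("image", &InvertDrone::onImage, this);
            // Main method, run once per drone cycle.
            registerMain(&InvertDrone::cycle, this);
        }
    private:
        void onImage(const hive::Image& img) { latest = img; }
        void cycle() {
            if (latest.empty()) return;
            hive::Image out = latest;
            for (unsigned char& p : out.pixels()) p = 255 - p;  // trivial processing step
            emit("inverted", out);                              // send to connected drones
        }
        hive::Image latest;
    };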
The following sections present an overview of the
drones currently operational in our systems.
3.1 Data Capture
All vision algorithms work on input from sensors,
from digital cameras to laser range scanners. Sensors
are abstracted as drones in Hive, exposing their ser-
vices across a network to other processing drones and
applications. The data capture drones we currently
have are as follows:
AXIS Camera. Data is captured from an AXIS
network camera over TCP/IP using the AXIS API
(VAPIX, 2008). The drone is executed from any
computer with access to the network and retrieves
the JPEG-compressed image from the camera. The
image is then sent to other Hive modules in either
JPEG or decompressed format.
Inputs: None
Outputs: Colour image
Configuration: Camera settings;
Output format (JPEG or raw)
Image Sequence. When testing algorithms, the same sequence often needs to be used for evaluation purposes, or the data for a particular application may have been pre-recorded. This drone loads an image sequence
from disk and allows seamless switching of data
sources, e.g. from a live camera to an image sequence
stored on any computer on the network. The drone
itself can load any image sequence stored on its local
machine. The root name of the sequence and the
frame rate at which to supply data are given as configuration
parameters. The Image Sequence drone also supports
video files. Counterparts for storing image sequences
and video to disk are also available.
Inputs: None
Outputs: Colour image
Configuration: Root filename;
Frame rate
Fastrak. Having a notion of spatial position is an
important aspect of many vision applications. Vision-based algorithms for estimating 3D position require intensive processing and are often inaccurate. This task can be performed easily and accurately using tracking hardware. The Polhemus Fastrak (Polhemus, 2008) is a magnetic device that performs real-time 6DOF tracking of sensors. Fastrak provides the 3D
position and orientation of each sensor (pitch, roll and
yaw) relative to a base station. The Fastrak device
allows up to four sensors to be tracked simultaneously
and the drone allows configuration of which sensors
to use.
Inputs: None
Outputs: 3D position and orientation
for each sensor
Configuration: Number of active sensors
3.2 Vision Algorithms
This section presents the algorithms wrapped as
drones which are used by applications in Section 4.
Background Subtractor. Many algorithms in
Computer Vision make use of background subtrac-
tion (or foreground extraction) as a precursor to
the main computation. This drone provides eight
different methods of background subtraction, ranging
from simple frame differencing to more sophisticated
techniques. Algorithm selection and parameter setting
can be altered via drone configuration.
Inputs: Image
Outputs: Foreground image;
Alpha matte
Configuration: Algorithm selection;
Algorithm parameters
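As an illustration of the simplest of these methods, a frame-differencing step could be implemented with OpenCV roughly as follows. This is a sketch of one of the eight methods rather than the drone's actual code, and the threshold value stands in for one of the configurable algorithm parameters.

    #include <opencv2/opencv.hpp>

    // Frame differencing: label as foreground any pixel that differs from the
    // background image by more than 'thresh' (a drone configuration parameter).
    cv::Mat frameDifference(const cv::Mat& background, const cv::Mat& frame,
                            double thresh = 25.0)
    {
        cv::Mat greyBg, greyFrame, diff, mask;
        cv::cvtColor(background, greyBg, cv::COLOR_BGR2GRAY);
        cv::cvtColor(frame, greyFrame, cv::COLOR_BGR2GRAY);
        cv::absdiff(greyFrame, greyBg, diff);                       // per-pixel difference
        cv::threshold(diff, mask, thresh, 255, cv::THRESH_BINARY);  // binary foreground mask
        return mask;
    }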
Face Detector. This drone makes use of the face
detection supplied with OpenCV, which utilizes
a cascade of boosted classifiers using Haar-like
features (Lienhart and Maydt, 2002; Bradski and Kaehler, 2008). For each input image, the drone
produces an array of rectangles corresponding to
regions possibly containing a face.
Inputs: Image
Outputs: Array of rectangles
Configuration: Algorithm parameters
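The detection step wrapped by this drone corresponds roughly to the following OpenCV call; the parameter values shown are illustrative and map onto the drone's "algorithm parameters" configuration.

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Run the Haar cascade on a frame and return candidate face rectangles.
    std::vector<cv::Rect> detectFaces(const cv::Mat& image,
                                      cv::CascadeClassifier& cascade,
                                      double scaleFactor = 1.2, int minNeighbours = 3)
    {
        cv::Mat grey;
        cv::cvtColor(image, grey, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(grey, grey);      // normalise lighting before detection
        std::vector<cv::Rect> faces;
        cascade.detectMultiScale(grey, faces, scaleFactor, minNeighbours);
        return faces;
    }

A larger scale factor and fewer required neighbours give the faster, less accurate behaviour used for the real-time configuration described in Section 4.1.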
Colour Point Detector. Locating colour points in
images is a useful method for tracking objects of
interest. This drone finds the centre of a region in
an image that corresponds to a certain colour. The
image is first thresholded against the required colour,
and then the pixels left in the image are grouped into
blobs. The centres of the blobs are then calculated
for those that meet the preferred size criteria.
Inputs: Image
Outputs: 2D position of the coloured
areas
Configuration: RGB value of point;
Min and max size of regions
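A minimal sketch of this threshold-and-blob approach, using OpenCV for the image operations, might look as follows; the colour tolerance and the exact grouping method (contours here) are assumptions, since the drone's internal implementation is not specified.

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Threshold the image against the target colour, group remaining pixels into
    // blobs, and return the centres of blobs within the configured size range.
    std::vector<cv::Point2f> findColourPoints(const cv::Mat& bgr, const cv::Scalar& targetBGR,
                                              double tol, double minArea, double maxArea)
    {
        cv::Mat mask;
        cv::inRange(bgr, targetBGR - cv::Scalar(tol, tol, tol),
                         targetBGR + cv::Scalar(tol, tol, tol), mask);
        std::vector<std::vector<cv::Point>> blobs;
        cv::findContours(mask, blobs, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        std::vector<cv::Point2f> centres;
        for (const auto& blob : blobs) {
            cv::Moments m = cv::moments(blob);
            if (m.m00 >= minArea && m.m00 <= maxArea)                 // size criteria
                centres.emplace_back(m.m10 / m.m00, m.m01 / m.m00);   // blob centroid
        }
        return centres;
    }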
3.3 Visualization
Displaying images is a vital part of a vision system.
The following describes a drone set up to display in-
coming images:
Live Video Viewer. This drone provides a display
for incoming images and annotation tools to draw
shapes (from other drones such as the Face Detector).
Multiple instances of this drone can be tied to
different drones, providing real-time feedback at each
stage of a swarm’s computation, which is useful for
debugging during development. For example, in the
Face Detection application described in Section 4.1
separate viewers can be connected to the camera,
the background subtractor and the face detector to
monitor algorithm results.
Inputs: Image;
Rectangles;
Points
Outputs: Video to screen
Configuration: None
Image Sequence Capture. As discussed above for
the Image Sequence drone, capturing data from cam-
eras is important for offline processing or algorithm
development and testing. This drone accepts im-
ages and stores them directly to disk, saving them
with a filename given via configuration. The Im-
age Sequence Capture drone also supports saving
images to video files instead of image sequences.
Inputs: Image
Outputs: Images to disk
Configuration: Root filename
4 APPLICATIONS
In this section we introduce four applications demon-
strating the power and flexibility of Hive. Given a
base set of drones, system prototypes can be constructed quickly, and swarms can be dynamically connected to test different configurations.

Figure 1: Face Detection. Flow charts showing the connections in (a) direct detection and (b) the addition of a Background Subtractor drone.

Figure 2: Quality of View Analysis. The flow chart for the Hive sensor network for analysing the quality of views via face detection.
4.1 Face Detection
Our first application is a real-time face detection sys-
tem implemented using four of the Hive drones de-
scribed above: the AXIS Camera; the Background
Subtractor; the Face Detector; and the Live Video
Viewer. The system operates in two ways: the first
detects faces on the original camera images; the sec-
ond performs face detection on a foreground-extracted
image. Results demonstrate the improvement in accu-
racy and performance by incorporating a background
subtraction system, and the system itself shows the
simplicity and flexibility of constructing applications
in Hive.
Both methods of face detection use a ‘dumb’ ap-
plication to connect the various drones together. The
application is termed ‘dumb’ because it does not need
to do any computation or result collation itself, as the
drones perform all the processing.
The first method uses the application to connect an
AXIS Camera to both the Face Detector and the Live
Video Viewer, and then connects the Face Detector
to the Live Video Viewer (as shown in Figure 1(a)).
Using Hive and the pre-defined drones, this amounts
to under thirty lines of code (including configuration
parameters). To obtain real-time performance, the face detector is configured to be less accurate and faster; however, this results in more false positives.
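As an indication of what such a 'dumb' application looks like, the following sketch assembles the swarm of Figure 1(a). The hive::Application interface shown here (spawn, connect, run) and the port and parameter names are illustrative assumptions; they stand in for the roughly thirty lines of setup code mentioned above rather than reproducing them.

    #include <hive/application.h>   // assumed header

    int main()
    {
        hive::Application app;

        // Instantiate (or attach to) the drones; names and parameters are illustrative.
        auto camera   = app.spawn("AxisCamera",   {{"address", "camera.local"}, {"format", "raw"}});
        auto detector = app.spawn("FaceDetector", {{"accuracy", "fast"}});
        auto viewer   = app.spawn("LiveVideoViewer");

        // Wire the swarm of Figure 1(a).
        app.connect(camera,   "image",      detector, "image");       // camera -> face detector
        app.connect(camera,   "image",      viewer,   "image");       // camera -> viewer
        app.connect(detector, "rectangles", viewer,   "rectangles");  // detections overlaid

        app.run();   // the application performs no processing itself
        return 0;
    }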
For the second system, a background subtractor
is inserted between the AXIS Camera and the Face
Detector in order to reduce the number of false pos-
itives while maintaining real-time performance. This
new system, shown in Figure 1(b), removes faces identified in the background (such as photographs) as
well as reducing the number of false positives. An
evaluation of the face detection systems is presented
in Section 5.1. The addition of the Background Sub-
tractor drone to the system is simple and shows how
systems can be enhanced or tested by inserting addi-
tional processes using Hive.
4.2 Quality of View Analysis
The next application extends the previous real-time
Face Detection algorithm to create a system which
analyses the quality of the given views in a multi-
ple camera network. The quality evaluation is set
to be the number of faces in each view, and the ap-
plication automatically switches to the view with the
most faces. This system could, for example, be used for home video podcasting of one-person shows: using multiple webcams, the system automatically changes to the view the presenter is looking at.
The system connections are shown in Figure 2:
three AXIS Camera drones are each connected to
a Background Subtractor drone which is in turn at-
tached to a Face Detector drone. The cameras are also
connected to the application to provide the images for
the chosen view. As the feeds come in from the Face
Detectors, the number of faces in each view is compared, the view with the most faces is chosen, and its images are routed to the application's display.
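The view-selection logic itself reduces to choosing the view with the largest face count; a minimal sketch (independent of the actual application code) is:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // faceCounts[i] holds the latest number of faces reported by the Face
    // Detector attached to camera i; the view with the most faces wins.
    // Assumes at least one camera is connected.
    std::size_t selectBestView(const std::vector<int>& faceCounts)
    {
        auto best = std::max_element(faceCounts.begin(), faceCounts.end());
        return static_cast<std::size_t>(std::distance(faceCounts.begin(), best));
    }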
This example application shows how a sophisticated system can be built quickly using Hive from a set of base drones.

Figure 3: Multiple Camera Calibration. Feed from cameras 1, 2 and 3 during data point collection for calibration.
Results and an example of the system running are
shown in Section 5.2.
4.3 Multiple Camera Calibration
Our next application is a multiple camera calibration
tool using magnetic sensors to obtain the intrinsic and
extrinsic camera parameters. There are various meth-
ods for computing camera calibration; we have devel-
oped a multiple camera calibration system based on
the Tsai calibration method (Tsai, 1987).
Using Hive we utilize the Colour Point Detector
and the Fastrak drones. For this application we use a
green marker on the Fastrak sensor to locate it in the
image, giving an image-point to 3D-point correspondence. To perform calibration, the marked sensor is
moved around in the field of view of each camera to
produce a data set which is then processed using the
Tsai method to calculate the intrinsic and extrinsic pa-
rameters.
Figure 4(a) shows the interconnection of drones
in the multi-camera calibration system. The appli-
cation is connected to one Fastrak and three sets of
the Colour Point Detector and AXIS Camera swarms.
The application couples the 3D sensor position from
the Fastrak drone with the 2D location of the colour
point from the Colour Point Detector drone and runs
the calibration routine. The resulting calibration pa-
rameters are written to disk for each camera. Figure 3
shows the annotated images for each camera. Note
that extension to more cameras is trivial, requiring an
additional swarm for each camera.
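The data-collection and calibration step for a single camera can be sketched as below. Each sample pairs the Fastrak 3D position with the detected 2D colour point; here OpenCV's cv::calibrateCamera is used as a stand-in for the Tsai routine the application actually runs, and an initial intrinsic guess is supplied because the collected 3D points are not coplanar.

    #include <opencv2/opencv.hpp>
    #include <vector>

    struct Correspondence { cv::Point3f world; cv::Point2f image; };  // Fastrak point + image point

    void calibrateFromCorrespondences(const std::vector<Correspondence>& samples,
                                      cv::Size imageSize,
                                      cv::Mat& cameraMatrix, cv::Mat& distCoeffs,
                                      cv::Mat& rvec, cv::Mat& tvec)
    {
        std::vector<cv::Point3f> world;
        std::vector<cv::Point2f> image;
        for (const auto& s : samples) { world.push_back(s.world); image.push_back(s.image); }

        // Rough initial intrinsics (focal length and principal point are guesses).
        cameraMatrix = (cv::Mat_<double>(3, 3) << 800, 0, imageSize.width  / 2.0,
                                                  0, 800, imageSize.height / 2.0,
                                                  0,   0, 1);
        distCoeffs = cv::Mat::zeros(5, 1, CV_64F);

        std::vector<std::vector<cv::Point3f>> objectPoints(1, world);
        std::vector<std::vector<cv::Point2f>> imagePoints(1, image);
        std::vector<cv::Mat> rvecs, tvecs;
        cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                            cameraMatrix, distCoeffs, rvecs, tvecs,
                            cv::CALIB_USE_INTRINSIC_GUESS);
        rvec = rvecs[0];   // extrinsic rotation (Rodrigues vector)
        tvec = tvecs[0];   // extrinsic translation
    }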
4.4 Augmented Reality
Insertion of virtual objects into a real scene has many
applications in entertainment, virtual reality and hu-
man computer interaction. We have implemented a
real-time augmented reality system using the Fastrak
and multiple cameras, providing jitter-free virtual
object insertion that is accurately represented in the
different camera viewpoints. Figure 4(b) shows the
interconnection of drones for this application. We use
the multiple camera calibration application described
above to calibrate the cameras to the Fastrak’s coordi-
nate system. The calibration data is used to construct
a model of the cameras in OpenGL.
Given this model, a 3D object can be placed in
the scene using the correct position and orientation
supplied by the Fastrak sensor, and rendered in the
image plane of the modelled camera. This rendering
is superimposed on the actual camera feed to produce
the images that contain the virtual object. Figure 5
shows the frames from the three cameras before and
after the placement of the augmented reality object.
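As a simplified stand-in for the OpenGL rendering step, the same calibration data can be used with OpenCV to project the virtual object's anchor point (the Fastrak pose) into a camera's image and draw an overlay on the live frame; the real application renders a full 3D model instead of a marker.

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Project the virtual object's 3D anchor point into one camera's image plane
    // using that camera's calibration, then draw a placeholder marker on the frame.
    void overlayVirtualObject(cv::Mat& frame,
                              const cv::Point3f& objectPosition,              // from the Fastrak drone
                              const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                              const cv::Mat& rvec, const cv::Mat& tvec)       // from calibration
    {
        std::vector<cv::Point3f> object(1, objectPosition);
        std::vector<cv::Point2f> projected;
        cv::projectPoints(object, rvec, tvec, cameraMatrix, distCoeffs, projected);
        cv::Point centre(cvRound(projected[0].x), cvRound(projected[0].y));
        cv::circle(frame, centre, 10, cv::Scalar(0, 255, 0), 2);              // stand-in for the rendered model
    }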
5 RESULTS
This section presents the results of the Face Detector
and Quality of View Analysis applications described
above. Results were generated on a 2.33GHz Intel
Core2 Duo with 1GB RAM, with drones running on
a single machine.
5.1 Face Detection
The Face Detection application described in Sec-
tion 4.1 uses three drones for the first system (di-
rect face detection) and four for the second (including
background subtraction). The results of the two sys-
tems are shown in Figure 6, with and without back-
ground subtraction, and at two levels of accuracy. To
obtain real-time performance the Face Detector is set
to find faces with low accuracy, which increases the
rate of false positives (shown in Figure 6(a)). Attach-
ing a Background Subtractor to the system removes
large regions of the image (Figure 6(b)) where false
positives can appear, as well as removing static faces
(such as photographs) from the scene. Figure 6(c)
shows the final result using the second system. The
second row of images displays results for the system
with the Face Detector in high accuracy mode.
The table in Figure 7 shows the results of the two
systems running on a set of 200 frames in a clut-
tered scene, in both low and high accuracy modes. The fast mode (low accuracy) with background subtraction provides improvements in false positives and positive matches, with only a slight dip in processing speed. The background subtraction also removes the static faces in the scene.

Figure 4: Drone connections for (a) Multiple Camera Calibration, (b) Augmented Reality.

Figure 5: Augmented Reality. Original feed from the cameras vs. Augmented Reality.

Figure 6: Face Detection. (a)-(c) show the results of the fast method (fast detection, foreground image, result from (b) overlaid); (d)-(f) the accurate method (accurate detection, foreground image, result from (e) overlaid); both show improvement through the addition of background subtraction.
Mode             False Positives   Background Faces   Positive Matches   FPS
Fast             3%                91%                92.4%              15.3
Fast + BGS       1%                0%                 95.1%              13.1
Accurate         3.5%              85.5%              98.4%              3.9
Accurate + BGS   0%                0%                 98.9%              6.4

Figure 7: Face Detection. Results of the face detection on a 200-frame sequence.
Figure 8: Quality of View Analysis. Each row represents a snapshot in time from each of the three cameras. The red boxes
in the top-left, centre and bottom-right images show positive detections and the view chosen by the system.
The slow mode (high accuracy) actually had the
most false positives but a higher rate of positive
matches. When combined with background subtraction
it became the best overall system, with zero false pos-
itives, zero static faces and the highest rate of positive
match. The background subtraction also increased the
frame rate (since the search for faces did not need to
work at multiple scales on background regions).
These results demonstrate that systems built using Hive are flexible and capable of performing in real time.
5.2 Quality of View Analysis
Figure 8 shows the results of the Quality of View
Analysis application. The system was tested on three
cameras converging on a single region, in a varied
scene. Each row represents a snapshot in time from
each of the three cameras. View switching occurs
when a face is identified, shown as the red rectangle
in the top-left, centre, and bottom-right images. The
system operates in real-time, and is easily scalable us-
ing Hive to support many more cameras.
5.3 Performance
The drones used in our system all perform at real-time
rates, as shown in the tables in Figure 9. The Back-
ground Subtractor and Colour Point Detector have
not been optimized and their frame rate could be im-
proved. Although the application rates shown are lower than the camera rate, all applications display the camera images at close to 30 Hz and use the results of the other drones, which arrive less frequently, to update the system.
Drone                    Rate
AXIS Camera              29.5 fps
Background Subtractor    15 fps
Fastrak                  120 Hz
Face Detector            13 fps
Colour Point Detector    20 fps

Application              Rate
Face Detection           15 fps
Face Detection (BGS)     13 fps
QoV Analysis             13 fps
Augmented Reality        27 fps

Figure 9: Performance. These tables document the performance of the drones individually and of the applications.
6 CONCLUSIONS
We have presented a novel method for creating sensor
fusion networks using Hive. The systems consisted of multiple components, from input devices such as cameras and 3D trackers to processing modules such as background subtraction and face detection. As proof
of concept four applications were described, for face
detection, quality of view analysis, multiple camera
calibration from a 3D sensor, and augmented reality across multiple cameras. The set of drones used to create these applications has been described, as well as the
simple process of integrating new functionality as a
drone. Results have demonstrated the flexible con-
struction and the ability to prototype systems quickly
using Hive, and the real-time performance achieved.
We have many other drones either implemented
or under development, such as a Tri-Stereo Camera,
Symbian OS and iPhone drones (for visualization on
mobile phones, and the use of camera phones), Hand
Tracker and Eye Tracker, to name a few.
Hive, along with a number of drone implementations, has been released for use by the vision and robotics communities. Visit http://hct.ece.ubc.ca/research/Hive/ for details.
REFERENCES
Afrah, A., Miller, G., Parks, D., Finke, M., and Fels, S.
(2008). Hive: A distributed system for vision process-
ing. In Proc. 2nd International Conference on Dis-
tributed Smart Cameras.
Bradski, G. and Kaehler, A. (2008). Learning OpenCV:
Computer Vision with the OpenCV Library. O’Reilly
Media, Inc., 1st edition.
Gerkey, B., Vaughan, R., Stoy, K., Howard, A., Sukhatme,
G., and Mataric, M. (2001). Most valuable player: a
robot device server for distributed control. In Intel-
ligent Robots and Systems, 2001. Proceedings. 2001
IEEE/RSJ International Conference on, pages 1226–
1231.
Lienhart, R. and Maydt, J. (2002). An extended set of Haar-
like features for rapid object detection. In Proceed-
ings of International Conference on Image Process-
ing, volume 1, pages 900–903.
Luo, R. C., Chin-Chen, Y., and Kuo, L. S. (2002). Multisen-
sor fusion and integration: approaches, applications
and future research directions. IEEE Sensors Journal,
2:107–119.
Metta, G., Fitzpatrick, P., and Natale, L. (2006). YARP: yet
another robot platform. In International Journal of
Advanced Robotics Systems.
Polhemus (2008). Fastrak: http://www.polhemus.com/. Technical report.
Schimek, M. H., Dirks, B., Verkuil, H., and
Rubli, M. (2008). Video for Linux v4.12:
http://v4l2spec.bytesex.org/v4l2spec/v4l2.pdf.
Technical Report 0.24, Linux.
Tsai, R. (1987). A versatile camera calibration technique for
high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. Robotics and Automa-
tion, IEEE Journal of [legacy, pre - 1988], 3:323–344.
VAPIX (2008). Axis Communications API: http://www.axis.com. Technical report, AXIS.