PIGGYBACK META-DATA PROPAGATION IN

DISTRIBUTED HASH TABLES

Erik Buchmann, Sven Apel, Gunter Saake

University of Magdeburg, Germany

Keywords:

Peer-to-Peer data structures, meta-data propagation, Mixin Layers, Aspect-oriented Programming.

Abstract:

Distributed Hashtables (DHT) are intended to provide Internet-scale data management. By following the

peer-to-peer paradigm, DHT consist of independent peers and operate without central coordinators. Conse-

quentially, global knowledge is not available and any information have to be exchanged by local interactions

between the peers. Beneath data management operations, a lot of meta-data have to be exchanged between the

nodes, e.g., status updates, feedback for reputation management or application-speciﬁc information. Because

of the large scale of the DHT, it would be expensive to disseminate meta-data by peculiar messages. In this

article we investigate in a lazy dissemination protocol that piggybacks attachments to messages the peers send

out anyhow. We present a software engineering approach based on mixin layers and aspect-oriented program-

ming to cope with the extremely differing application-speciﬁc requirements. The applicability of our protocol

is conﬁrmed by means of experiments with a CAN implementation.

1 INTRODUCTION

Distributed hashtables (DHT) are able to manage

huge sets of (key,value)-pairs and cope with very high

numbers of parallel transactions. DHTs are a variant

of distributed data structures which follow the peer-

to-peer (P2P) paradigm. In particular, they consist

of many autonomous nodes, there is no central co-

ordinator and global knowledge is not available. Ex-

amples of DHTs are Content-Addressable Networks

(CAN) (Ratnasamy et al., 2001a), Chord (Stoica

et al., 2001), Pastry (Rowstron and Druschel, 2001)

or P-Grid (Aberer, 2001). Characteristic DHT ap-

plications are distributed search engines, repositories

for the semantic web or dictionaries for web services;

cf. (Gribble et al., 2001) for an exhaustive list. How-

ever, DHTs work well only if some meta-information

are present on proper peers, e.g., are adjacent nodes

alive and which tuples are managed by them. Fur-

thermore, some information belong to management

or optimization issues of the DHT. For example, load

balancing mechanisms may depend on data about the

average load of all peers. Finally, application-speciﬁc

knowledge have to be spread over the network, e.g.,

information about the synchronization of work in dis-

tributed web-crawlers. Most meta-data items have a

small footprint, are not needed in time, old ones be-

comes obsolete, and the number of data items is re-

ally large. In addition, the importance of data items

for a certain node depends from various factors, e.g.,

the workload of the node that issued it, or the dis-

tance (Kleinberg, 2000) to the addressee. Summing

up, propagation by peculiar messages would lead to

a very high network trafﬁc. Therefore, related tech-

niques known from distributed databases (epidemic

protocols etc.) can not be applied.

In this article we present a generic protocol that

piggybacks meta-data attachments to existing mes-

sages that the peers send out for data management

purposes, i.e., we use a variant of lazy dissemina-

tion (Rodrigues and Pereira, 2004; Boyd et al., 2005).

Based on the peer a message is forwarded to, each

peer selects a limited number of data items it deems to

be important for the receiving peer, and attaches them

to the message. On the other hand, each peer decides

to keep data items from incoming messages or not. If

the number of items exceeds the limit, a replacement

algorithm removes the most dispensable data items.

The broad spectrum of potential application scenarios

leads to a wide range of requirements for meta-data

dissemination protocols, e.g., small footprint, selec-

tive forwarding or time-constraints (cf. Sec. 2). Dif-

ferent strategies for item caching and dissemination

as well as their combinations lead to different results

regarding the behavior of DHTs (cf. Sec. 5). Fur-

thermore, we have observed that strategies sometimes

Buchmann E., Apel S. and Saake G. (2005).

PIGGYBACK META-DATA PROPAGATION IN DISTRIBUTED HASH TABLES.

In Proceedings of the First International Conference on Web Information Systems and Technologies, pages 72-79

DOI: 10.5220/0001230800720079

 SciTePress

differ only in some details, e.g., a threshold or some

additional constraints. Thus, our implementation

is based on modern software engineering methods,

namely Aspect-oriented Programming (AOP) (Kicza-

les et al., 1997) and mixin layers (Smaragdakis and

Batory, 2002). Furthermore, we discuss their advan-

tages and disadvantages in terms of reusability, con-

ﬁgurability and extensibility especially for the do-

main of DHTs. We evaluate our protocol on top of

a CAN implementation by experiments. The exper-

iments exemplarily compare different dissemination

strategies. Our results verify that the software engi-

neering approach allows to build variants of the pro-

tocol which meet the requirements of many different

applications.

The remainder of this paper has the following struc-

ture: After investigating in the characteristics of meta-

data in Section 2, we describe our piggyback meta-

data propagation protocol in Section 3. Section 4

presents our implementation based on mixin layers

and Aspect-oriented Programming, and Section 5 fea-

tures an experimental evaluation. This is followed by

a conclusion in Section 6.

2 CHARACTERISTICS OF

META-DATA IN DHTs

DHTs provide the data management operations of a

hashtable, in particular put and get. All operations

are addressed by a certain key. Each peer is responsi-

ble for a zone of the key space, and knows some other

peers. Variants of DHTs primarily differ in the topol-

ogy of the key space and the contact selection. Each

peer forwards any operation it cannot perform on its

own zone to a contact whose zone is closer to the key.

The way a node determines a closer peer depends on

the topology of the key space. DHTs similar to Chord

(Stoica et al., 2001) or CAN (Ratnasamy et al., 2001a)

organize as a circular key space in one or multiple di-

mensions, which is distributedamong all nodes. Here,

the peer forwards a message depending on the Euclid-

ean distance of the zones to the key of the operation.

In contrast, DHTs based on search trees like P-Grid

(Aberer, 2001) map each node to a certain subtree-

ID, and query forwarding follows subtrees with cor-

responding ID-preﬁxes. Usually, all DHT operations

are performed by invoking log(n) peers.

The absence of global knowledge makes it chal-

lenging to design DHT applications. There are two

straightforward ways to bypass the problem: (1) set-

ting up all necessary information at startup-time, and

(2) disseminating information by ﬂooding the DHT.

But while (1) limits the range of applications, (2)

limits the performance of the DHT. Before present-

ing our dissemination scheme, we want to investigate

common characteristics

of meta-data items used in

DHTs. Meta-data are used on three levels:

Meta-data of the DHT itself. DHTs forward mes-

sages between peers responsible for certain zones.

Therefore each peer needs to know a proper set of

contacts that changes whenever peers join or leave.

Thus, status information regarding zone boundaries

and contact information have to be updated. The

source of information is the new peer or the peer that

takes over the zone of a retiring peer. The addressee

can be neighbors of that peer (CAN, Chord), nodes

which follow a certain distribution (cf. (Kleinberg,

2000)) or other peers (Chord, P-Grid). Status updates

need to reach the addressee in time, but they are in-

frequent and irregularly generated.

Data related to enhancements of the DHT. There

are many different enhancements to the standard

DHTs which require different meta-data. These in-

formation are not needed in time, and have various

creators and receivers. For example, load balancing

(Rao et al., 2003) may take place in DHTs by forward-

ing messages along idle paths, reorganizing the key

space or changing the number of replicas of under-

or overloaded zones. Here, each peer requires infor-

mation about the average load (which may be rather

old) and more recent data about the load of all of its

contacts. Load information are disseminated at steady

rates.

Application-speciﬁc meta-data. The application

on top of the DHT may use the meta-data distribu-

tion scheme as well. For example, in a distributed

web crawler application the peers notify others about

the partitions of the WWW which they have already

crawled. A further example are keep-alive statements

in distributed groupware scenarios.

Summing up, nearly any application comes with

different requirements for a meta-data dissemination

protocol. The differences concern the origin of the

data item, the addressee and the frequency the data

items are generated. Some applications require meta-

data in time while others do not. This leads to two

important insights: (1) There is a strong demand for

a cheap protocol that satisﬁes the needs of many dif-

ferent applications, and works at smaller costs than

the traditional ﬂooding protocols. With cost we refer

to the amount of network trafﬁc, execution time and

memory consumption. (2) The number of required

protocol variants is very large. However, the variants

Each application comes with its very own set of re-

quirements. Thus we cannot come up with a comprehensive

list of features.

PIGGYBACK META-DATA PROPAGATION IN DISTRIBUTED HASH TABLES

only differ in some details. From an algorithmic per-

spective, shipping information for load balancing is

similar to the dissemination of contact information.

Thus we require a ﬂexible architecture which allows

the developers to assemble the protocol from a set

of well-deﬁned, disjunct and reusable software frag-

ments.

3 PIGGYBACK META-DATA

PROPAGATION IN DHTs

This section describes our dissemination protocol.

The protocol is intended to save resources by attach-

ing meta-data to the messages a peer sends out any-

how. This promises to be advantageous:

• The ratio of user data / protocol overhead will be

improved, in comparison to the use of dedicated

messages. The data items are attached to existing

messages, thus additional protocol headers, routing

addresses etc. are omitted. This is important for in-

formation with a small footprint.

• Routing and handling efforts can be reduced by

bundling multiple small data items to a single mes-

sage, that can be forwarded as a whole. By using

existing messages as container for meta-data, there

is no additional routing overhead at all.

Data structures. We now shortly introduce the re-

quired data structures. We use Java for our implemen-

tation. Each node is represented by a class Peer. Each

peer has the ability to send and receive messages by

invoking the instance methods send(Msg, Peer) and

recv(Msg). Each message implements the interface

Msg. Messages contain the addresser and addressee

of the message, and type-speciﬁc information like

query results. Each peer maintains a Cache contain-

ing the meta-data items it is aware of. The cache of-

fers methods for updating the content with entries at-

tached to incoming messages, replacing surplus meta-

data items and attaching items to messages. Meta-

data items are administered in a class MetaData. Each

MetaData-object contains the user data, and informa-

tion about the creator and the creation time. In addi-

tion, it can carry further information depending on the

protocol variant which is actually used.

Overview. The piggyback protocol works as fol-

lows: each peer stores all data items it has generated

in its own cache. As soon as a message containing

a set of attachments arrives, all attachments are ap-

pended to the cache. If the number of meta-data items

in the cache exceeds the limit, a certain algorithm re-

moves the most dispensable items. In the following,

we will refer to that group of algorithms as caching

strategies.

When a peer is about to forward a message, it de-

termines a subset of meta-data items from the cache

which contains the items that are approximately most

important for the receiver of the message, or should

be forwarded in the direction of the receiver, respec-

tively. This class of decision algorithms will be re-

ferred as dissemination strategies.

There are various possible strategies to decide

which item have to be removed from the cache or

which ones have piggybacked to messages. Tables 1

and 2 show some prominent examples. Note that not

all of them are meaningful in any DHT variant and

in any application scenario. We will come up with a

showcase evaluation of some of the strategies imple-

mented in a CAN in Section 5.

Table 1: Descriptions of the implemented caching strate-

gies.

Caching

Strategy

Description

Random The peer replaces its cache entries randomly.

FIFO The entries will be replaced which joined the cache at

ﬁrst. This is useful in settings where little numbers of

entries should be forwarded to many nodes.

Oldest First The cache entries with the oldest creation date will be

replaced ﬁrst. The strategy disseminates the most re-

cent items if its number exceeds the transport capacity

of the nodes.

Kleinberg-

replacement

(Kleinberg, 2000) tells us that a well-chosen set of

contacts improves the routing performance on a large

scale. Here, a node which is close in units of the key

space (e.g., a neighbor) is in the contact list of a cer-

tain other with a higher probability than a distant node.

The caching strategy reﬂects this: The entries are re-

placed so that the receivers of the remaining ones fol-

low a Gaussian distribution.

Most

Attached

Here, the peer counts how many times a certain cache

entry is transferred. Often-attached entries are replaced

ﬁrst.

Discussion. Our meta-data dissemination scheme is

intended to serve as a cheap lazy dissemination layer

on top of a DHT. It does not send out new messages;

thus the number of messages the data items can be

attached to is limited. Further limits are the maximum

number c of attachments carried by a single message,

and the size of the cache s. On the other hand, the

protocol is stressed by the rate r new data items are

generated.

Assume the query processing take place in rounds.

In each round each of the n peers in the DHT issues

and answers one query, and forwards f messages on

average. The query points are distributed randomly

WEBIST 2005 - INTERNET COMPUTING

Table 2: Descriptions of the implemented dissemination

strategies.

Dissemination

Strategy

Description

Simple Entries are attached randomly.

LIFO The entries which are appended last will be trans-

ferred ﬁrst. The strategy strongly prefers recent

cache entries, and send old ones only if there is free

transport capacity.

Remember

Receiver

The peer ships each entry exactly one time to each

of its neighbors, except for peers which have already

sent the entry to the peer. This strategy reduces the

number of duplicate transmissions.

Directed

Forwarding

The peer attaches items so that the distance to its

originator always increases. This is an adoption of

the greedy routing strategy used in the CAN.

Kleinberg

distribution

Entries are selected according to the distance be-

tween the peer which has created the item and the

receiver of the message (Kleinberg, 2000).

CAN-ﬂooding The strategy exploits the structure of the CAN to

transfer a cache entry to each node with a mini-

mum of entries sent repeatedly to the same node

(Ratnasamy et al., 2001b). A node that receives a

message from a neighbor in dimension i forwards

it in the opposite direction of that neighbor, and in

dimensions 1 · · · (i − 1).

P-Grid

ﬂooding

This strategy adapts the idea of CAN-ﬂooding to the

P-Grid. Here, entries are forwarded along the edges

of the routing tree.

over the key space. In this settings, the overall trans-

port capacity of our protocol is T =

n·f·c

. Suppose

each peer maintains a set of k contacts, and the dis-

semination strategy is intended to ship status updates

the peer has generated to each of them. Then the fol-

lowing holds: f · c = k· r. Now assume we use a CAN

(Ratnasamy et al., 2001a) implementation with a d-

dimensional key space. In CANs, each node main-

tains at least

k = 2 · d contacts, and each node for-

wards f =

· n

messages. Here, the maximum

number of status updates a peer can disseminate in

each round is r =

· n

. Thus, in CANs consisting of

at least 4096 nodes, a single attachment piggybacked

on each message allows for disseminating one status

update per round for free to all of the neighbors. Cal-

culations for other DHT variants yield similar results.

4 ARCHITECTURE, DESIGN AND

IMPLEMENTATION

The spectrum of potential applications and target plat-

forms as well as their requirements and characteris-

In a CAN where peers join and leave frequently, the

join-protocol results in a irregularly fragmented key space.

tics is extremely broad and diverse. Furthermore, we

have observed that strategies often differ only in de-

tails, e.g., a threshold or some additional constraints.

Out of these considerations we derived following re-

quirements on the implementation and integration of

strategies:

1. The implementation and integration of (new)

caching and dissemination strategies must be as

easy as possible.

2. It should be possible to integrate more than one

caching as well as dissemination strategy.

3. Different caching strategies must be combinable

with different dissemination strategies.

4. Strategy code implements a special-purpose con-

cern and should therefore be modularized and sep-

arated from the core DHT implementation.

5. A developer should be able to create new strategies

out of existing ones with minimal effort.

To meet these requirements we have investigated

in a novel architecture. It makes use of the software-

engineering concepts of AOP and mixin layers.

Aspect-oriented Programming. AOP was ﬁrst in-

troduced by (Kiczales et al., 1997). The aim of AOP

is to separate crosscutting concerns. It is well known

that the separation of concerns leads to maintain-

able, comprehensible, reusable and conﬁgurable soft-

ware (Kiczales et al., 1997). Common object-oriented

methods fail in this context (Kiczales et al., 1997;

Czarnecki and Eisenecker, 2000). The idea behind

AOP is to implement so called orthogonal features as

aspects separately. The core features are implemented

as components, as with common design and imple-

mentation methods. An aspect weaver brings aspects

and components together.

For our implementation we have used AspectJ

, a

language extension to Java. Mainly, we have used As-

pectJ to separate the code for the caching and dissem-

ination strategies from the core implementation of the

DHT. The left side in Figure 1 depicts a schematic de-

sign of several dissemination and caching strategies

implemented as aspects. It can be seen that the as-

pects are separated clearly. This allows us to design

a generalized protocol implementation, which does

not depend on the DHT which is actually used, and

increases the reusability and the ability of plugging

strategies.

Mixin Layers. During the design phase we have

observed that we need often only small modiﬁcations

of the caching and dissemination strategies to spec-

ify different protocol characteristics. For example,

from the algorithmic point of view the caching strat-

egy random differs from FIFO only in the order the

www.aspectj.org

PIGGYBACK META-DATA PROPAGATION IN DISTRIBUTED HASH TABLES

Aspect

Weaver

Aspect

use

Composition

use

Composition

Composed

Dissemination StrategyAspect

Dissemination

Generatorcompose

Mixin Layer

Dissemination Layers

Composed

Caching StrategyAspect

Caching

Mixin Layer

Caching Layers

weave

Core Implementation

P2P System

...

CAN Chord P−Grid

select

Figure 1: Generating and applying strategies using

mixin layers and AOP.

Experiment

Logging

FIFO

Caching

Random

Dissemination

Base

NoRedundant

Entries

TimeStamped

MetaData

Dissemination

Strategy

Dissemination

Strategy

Caching

Strategy

Meta

Data

Cache

MetaData MetaData

List

Figure 2: Stack of mixin layers for FIFO Caching and

Random Dissemination.

items are removed. Furthermore, some features are

common for many strategies, e.g., to keep track of the

peer a certain item was already sent to. Implementing

a new strategy in form of a separate aspect for each

variant would be inefﬁciently and errorprone, because

approved code is not reused. Therefore, we have

decided to combine the approach of Aspect-oriented

strategies with a mixin layer-based implementation.

A mixin layer is a static component encapsulat-

ing fragments of several different classes (mixins) so

that all fragments are composed consistently. Mixin

layers are an approved implementation technique for

component-based layered designs. Advantages are

the high degree of modularity and the easy compo-

sition (Smaragdakis and Batory, 2002).

We have used mixin layers to implement the vari-

ants of the different caching and dissemination strate-

gies and the AHEAD Tool Suite

to automatize the

conﬁguration and generation process. Doing, so we

had to decompose the mechanisms for caching and

dissemination into ﬁne-grained layers. Based on this

design we are able to compose these layers to gen-

erate different strategy variants. AOP is then used

to apply the conﬁgured strategies to the core of the

DHT. Figure 1 depicts the schematic overview of

conﬁguring and applying strategies to the DHT: At

ﬁrst, a generator (here the AHEAD Composer) com-

poses the strategies according to a (user) speciﬁcation.

Then, the composed strategies are applied to the DHT

core using aspects and the AspectJ Weaver. Using

mixin layers, we do not need multiple aspects. In-

stead, we need only two of them: one for applying

the caching strategies and the other for applying the

dissemination strategies. Figure 2 depicts the stack

of mixin layers implemented for random dissemina-

tion and FIFO caching. The enclosing grey boxes

are the mixin layers. The included rounded boxes

are the inner mixin classes. The ﬁgure depicts only

http://www.cs.utexas.edu/users/schwartz/Hello.html

the inheritance relations between the inner classes.

Other relations (e.g. associations) as well as refer-

ences to external classes are omitted. The Base layer

includes the needed data structures: MetaData, to

provide a base for application speciﬁc data, MetaDat-

aCache for storing the cached meta-data, MetaDataL-

ist to attach meta-data to messages, as well as the

strategy base classes (interfaces) CachingStrategy and

DisseminationStrategy. Tangible strategies (in our ex-

ample random dissemination and FIFO caching) im-

plement these interfaces to provide the desired algo-

rithm. The layer TimeStampedMetaData reﬁnes the

MetaData base by adding time stamps to each meta-

data object. The layer NoRedundantEntries prevents

the meta-data cache and the messages from storing

equal meta-data objects. ExperimentLogging creates

the logging dump which we used to analyze our ex-

periments.

The advantage of using mixin layers is that we can

easily derive and combine new variants of strategies.

To introduce a new caching strategy we have to im-

plement the interface CachingStrategy. Doing so, we

can remove the currently used strategy and apply the

new one. In this way we can easily combine caching

strategies with dissemination strategies. Furthermore,

we are able to reuse the base data structures as well

as existing strategy implementations. Imagine a prox-

imity FIFO caching strategy which combines FIFO

caching with the contact information of the peer. This

might be useful in order to limit the range of meta-

data to the contacts of a peer. If a cache entry was

generated by an immediate contact, the caching strat-

egy is FIFO. If not, the cache entries are dropped. The

proximity FIFO caching strategy reuses the overall

code of the simple FIFO strategy.

WEBIST 2005 - INTERNET COMPUTING

5 EVALUATION

Our evaluation is divided into two parts: The ﬁrst

one is intended to show that our protocol framework

and our software engineering approach is working

as envisioned. Therefore, we have used a CAN im-

plementation of our own, and have extended it by

several caching and dissemination strategies; and we

will come up with a detailed description of a simple

one. The second part proves that the protocol itself

meets the requirements for a meta-data dissemination

scheme. Here we assembled all meaningful proto-

col variants from the strategies we have realized for

the ﬁrst part, and evaluated it by the means of experi-

ments.

Evaluating the software design. The CAN imple-

mentation we have used for evaluation implements

all features of the originally proposed protocol (Rat-

nasamy et al., 2001a) as well as some proprietary

features (Buchmann and B

ohm, 2004a; Buchmann

and B

ohm, 2003). For our experiments we have im-

plemented the dissemination strategies Random, Di-

rected Forwarding and CAN-ﬂooding. In addition,

we realized the caching strategies Simple and FIFO.

Table 1 and 2 gives a short description. We expect

that AOP will be useful for the separation and the

easy plugging and combination of strategy code. A

feature-oriented view and mixin layers should help to

implement strategies highly conﬁgurable.

We start with the caching strategy. Here, the strat-

egy ”Random”, which is shown in Figure 3, serves as

a simple example. It collects the meta-data of incom-

ing messages to update the peers cache (MetaData-

Cache). Thereby it chooses the message entries as

well as replaces older cache entries randomly. The

random caching aspect (Line 1) introduces for each

Peer a cache for storing meta-data objects (Line 2,3).

The aspect intercepts calls to the method Peer.recv to

get access to incoming messages (Line 5-7). The at-

tachments of each incoming message are used to up-

date the peers meta-data cache (Line 11). The actual

implementation of the caching strategies is encapsu-

lated in the put method. We omit the detailed dis-

cussion of the random caching algorithm because its

implementation is straight forward.

Figure 4 depicts a dissemination aspect (Line 1)

for the dissemination strategy ”Simple”. It uses the

cached meta-data to disseminate them through the

DHT by attaching them to outgoing messages ran-

domly (cf. Tab. 2). At ﬁrst a list for attaching meta-

data objects is introduced to all messages (Line 2,3).

Moreover, the aspect intercepts all calls to the method

Peer.send which sends all outgoing messages (Line 5-

7). Doing so, the aspect has access to these mes-

sages and can add the peers cached meta-data ob-

jects (Line 8). The method piggyBack encapsulates

1 aspect RandomCachingAspect {

2 MetaDataCache Peer.m_meta_data =

3 new MetaDataCache();

5 before(Peer peer, Msg msg) :

6 target(peer) && args(msg) &&

7 c a l l (

Peer.recv(Msg)) {

8 put.(peer.m_meta_data,

9 msg.m_meta_data)); }

11 void update(Peer peer, MetaDataList mlist)

{

12 / / r e pla ces t he e n t r ies of a peers

cache

13 /∗ . . . by a random l i s t ∗ / }

14 }

Figure 3: The random caching aspect

the actual dissemination strategies (Line 10). In our

case the strategy chooses a random set of cached

meta-data (Line 14) and attach them to the outgoing

messages (Line 11,12).

1 aspect RandomDisseminationAspect {

2 MetaDataList Msg.m_meta_data =

3 new MetaDataList();

5 before(Peer peer, Msg msg) :

6 target(peer) && args(msg) &&

7 c a l l (

Peer.send(Msg)) {

8 piggyBack(peer, msg); }

10 void piggyBack(Peer peer, Msg msg) {

11 msg.attach(getRandomList(msg,

12 peer.m_meta_data)); }

14 LinkedList getRandomList(Msg msg,

15 MetaDataCache cache) {

16 /∗ r e turn s a random l i s t of ite ms ∗ / }

17 }

Figure 4: The random dissemination aspect.

The advantages of modularizing and separating

caching and dissemination strategies from each other

and from the core implementation of the underlying

CAN is that they can easily be applied to and re-

moved from the CAN. Furthermore, one is able to

combine different variations of caching and dissem-

ination strategies (cf. Sec. 5). Finally, the maintain-

ability and comprehensibility of the CAN core code

and the separated caching and dissemination strate-

gies is preserved.

We have implemented all mentioned caching and

dissemination mechanisms using the combined ap-

proach (AOP + mixin layers). Mainly, we have in-

PIGGYBACK META-DATA PROPAGATION IN DISTRIBUTED HASH TABLES

vestigated in this approach to get insight how mod-

ern software engineering methods can be exploited to

achieve modular, reusable, extensible and customiz-

able software component especially for distributed

(P2P) Systems. Our results are positive: We have

observed that the separation of special concerns like

the caching and dissemination of meta-data leads to

maintainable and comprehensible code. AOP and As-

pectJ are appropriate to separate code of special cross-

cutting concerns. Mixin layers allow to decompose

the strategies functionalities into modular combinable

building blocks (cf. Fig. 2). They support conﬁg-

urable, reusable and extensible implementations. The

combined approach reduces the complexity of imple-

menting new strategy variants and combinations and

therefor the expenditure of time.

The implemented mixin layers and aspects are

reusable in other DHTs because the rely only on a

simple interface containing the methods send and re-

ceive. These are available in most DHTs and other

unstructured P2P systems.

Evaluating the protocol We now want to show that

our meta-data dissemination protocol is working, and

is applicable to various settings. Due to the lack of

space, we limit our evaluation to three showcase ex-

periments investigating in the characteristics of the

dissemination strategies Random, Directed Forward-

ing and CAN-ﬂooding and the caching strategies Sim-

ple and FIFO.

All experiments are performed on a cluster of 32

loosely coupled Linux workstations equipped with 2

GHz CPU, 2 GB RAM and Fast Ethernet. We run ex-

periments with a four-dimensional CAN consisting of

10,000 peers issuing 1,000,000 randomly distributed

queries (see (Buchmann and B

ohm, 2004b) for an ex-

haustive description of our experimental setup.) All

evaluations are based on rounds. In one round each

peer issues and answers one query. The cache size is

set to 100 elements, and each message piggybacks up

to 10 meta-data items. At the beginning, each peer

owns one initial meta-data item.

Rounds

Number of nodes

0.2

0.4

0.6

0.8

Ratio of items

Figure 5: Item dispersion among the contacts.

At ﬁrst we investigate in the suitability of our proto-

col for status updating. Here it is important to publish

information about the state of a node to its contacts

(cf. Sec. 2). We have used the CAN-ﬂooding dis-

semination strategy and the cache strategy FIFO. See

(Ratnasamy et al., 2001b) for a detailed description

of CAN-ﬂooding. In CANs, the number of contacts

approximately follows a χ

-distribution; in our four-

dimensional setting the most nodes have 9 contacts.

Figure 5 shows the distribution of meta-data items.

The z-axis shows the ratio of meta-data items which

are known by a certain number of nodes at a given

time. For example, all items are known by one node

at round 0, and approximately the half of the items

are known by 9 nodes at round 5. The ﬁgure indicates

that the majority of the items have reached the con-

tact after at most 2 rounds. By considering that the

protocol ships such information for free, this is a very

positive result.

The next experiment examines the number of dupli-

cates sent by different dissemination strategies. Ob-

viously, due to the limited transport capacity ship-

ping many duplicates decreases the effectiveness of

the protocol. On the other hand, it would improve

the fault tolerance and the time to reach an adjacent

(in terms of hops between the nodes) node. Con-

sider the CAN-ﬂooding dissemination strategy for ex-

ample. Here, having a single failure near the cre-

ator of an item could prevent a large fraction of the

CAN from obtaining it. Figure 6 shows a comparison

of the dissemination strategies random, directed for-

warding and CAN-ﬂooding in respect to the number

of duplicates sent within 5 rounds. For example, af-

ter 5 rounds directed forwarding has transferred 1%

of all items more than times 500 to peers which al-

ready received them. It is obvious that the CAN-

ﬂooding strategy comes with the lowest number of

duplicates. In a CAN where all zones have the same

size, the number of duplicates would be zero. In re-

ality, the join-protocol leads to zones with different

sizes. Here, a few items are sent more than 500 times

to the same nodes twice. In contrast, it is surpris-

ing that the random dissemination strategy results in

less duplicates than directed forwarding. The expla-

nation is that directed forwarding prefers a few items

for each direction.

This leads to the question: how long does it take to

send items to a certain number of nodes? We address

this issue in the next experiment. Again, we use the

three different dissemination strategies. But now we

measure the time the items need to reach an arbitrar-

ily chosen number of 20 different nodes. We would

expect that the majority of items will be transferred

to the other nodes within one or two rounds, while

a few number of items will need a very long time.

The result of our experiment is shown in Figure 7.

Because it utilizes the underlying CAN forwarding

scheme to obtain a complete covering of all zones,

CAN-ﬂooding is the only strategy we have measured,

WEBIST 2005 - INTERNET COMPUTING

1e−04

0.001

0.01

0.1

0 500 1000 1500 2000 2500 3000

Ratio of items

Number of duplicates after 5 rounds

Random

Directed forwarding

CAN flooding

Figure 6: Duplicate items transferred within 5 rounds.

that disseminates all items to at least 20 nodes within

less than 5 rounds. However, by considering the be-

ginning of the curve, random and directed forwarding

are much faster. These strategies ship 1/3 of all items

to 20 nodes within the ﬁrst 1/4 round, while CAN-

ﬂooding requires a half round. Thus, in applications

where items would be useless if they do not reach its

destinations very quick, directed forwarding promises

to be an applicable strategy.

0.2

0.4

0.6

0.8

0 1 2 3 4 5 6 7 8 9 10

Rounds

Random

Directed forwarding

CAN flooding

Accumulated ratio of items

Figure 7: Time to disseminate an item to 20 nodes.

6 CONCLUSIONS

This article investigates in a novel lazy dissemination

protocol for various DHT variants. The protocol al-

lows to disseminate information free of charge, if they

are not needed in time and have a small footprint. Our

approach of combining mixin layers and AOP leads to

highly conﬁgurable, modularized and separated strat-

egy implementations. This combination makes it pos-

sible to cope with the wide spectrum of application

requirements. Because of the limited space, we have

shown only a few showcase experiments. However,

all of our experiments indicate that our protocol is

working as envisioned. In the future we want to ex-

amine our protocol in-depth on the basis of a speciﬁc

application.

REFERENCES

Aberer, K. (2001). P-Grid: A Self-Organizing Access

Structure for P2P Information Systems. LNCS, 2172.

Boyd, S. et al. (2005). GossipAlgorithms: Design, Analysis

and Applications. In Proceedings of the 24th Infocom.

Buchmann, E. and B

ohm, K. (2003). Efﬁzientes Routing

in verteilten skalierbaren Datenstrukturen. In Proc. of

the 10th BTW.

Buchmann, E. and B

ohm, K. (2004a). FairNet - How to

Counter Free Riding in Peer-to-Peer Data Structures.

In Proc. of the 12th CoopIS.

Buchmann, E. and B

ohm, K. (2004b). How to Run Ex-

periments with Large Peer-to-Peer Data Structures. In

Proc. of the 18th IPDPS.

Czarnecki, K. and Eisenecker, U. W. (2000). Generative

Programming: Methods, Tools, and Applications. Ad-

dison Wesley.

Gribble, S. D. et al. (2001). The Ninja Architecture for Ro-

bust Internet-Scale Systems and Services. Computer

Networks, 35(4).

Kiczales, G. et al. (1997). Aspect-Oriented Programming.

In Proc. of the ECOOP97.

Kleinberg, J. (2000). The Small-World Phenomenon: An

Algorithmic Perspective. In Proc. of the 32th STOC.

Rao, A. et al. (2003). Load Balancing in Structured P2P

Systems. In 2nd Int. Workshop on Peer-to-Peer Sys-

tems.

Ratnasamy, S. et al. (2001a). A Scalable Content-

Addressable Network. In Proc. of the ACM SIG-

COMM Conf.

Ratnasamy, S. et al. (2001b). Application-Level Multicast

Using Content-Addressable Networks. LNCS, 2233.

Rodrigues, L. and Pereira, J. (2004). Self-Adapting Epi-

demic Broadcast Algorithms. In FuDiCo II: S.O.S.

Survivability, Italy.

Rowstron, A. and Druschel, P. (2001). Pastry: Scalable, De-

centralized Object Location, and Routing for Large-

Scale Peer-to-Peer Systems. In Int. Conf. on Distrib-

uted Systems Platforms.

Smaragdakis, Y. and Batory, D. (2002). Mixin Layers: An

Object-Oriented Implementation Technique for Re-

ﬁnements and Collaboration-Based Designs. ACM

Transactions on Software Engineering Methodology,

11(2).

Stoica, I. et al. (2001). Chord: A Scalable Peer-To-Peer

Lookup Service for Internet Applications. In Proc. of

the SIGCOMM.

PIGGYBACK META-DATA PROPAGATION IN DISTRIBUTED HASH TABLES