AUXILIARY STORAGE AND DYNAMIC CONFIGURATION
FOR OPEN CLOUD STORAGE
Jincai Chen, Yangfeng Huang, Minghui Lai and Ping Lu
College of Computer Science and Technology, Wuhan National Laboratory for Optoelectronics
Huazhong University of Science and Technology, Wuhan, 430074, China
Keywords: Cloud Storage, Two-Tier Proxy, Auxiliary Storage, Dynamic Configuration.
Abstract: With the rapid development of cloud computing, cloud storage is attracting growing interest, and more and more users and corporations are planning to use cloud storage services. At present, however, cloud storage service technology still faces many problems. First, current cloud storage systems each belong to a specific cloud storage service provider and are closed to other providers. Second, network transmission speed is growing relatively slowly, which makes it difficult to transfer large amounts of data in a given time. Finally, the underlying storage architecture of current cloud storage cannot be dynamically configured as required. This paper therefore presents an open architecture model of cloud storage that allows users to choose suitable cloud storage providers through a two-tier proxy. The system can effectively reduce the response time of users’ requests by using geographically distributed auxiliary storage nodes to store hotspot data. The underlying storage architecture of the data storage centers can adopt the Master-Slave architecture and the P2P architecture simultaneously, and hence obtains the advantages of both.
1 INTRODUCTION
Nowadays, as cloud storage technology develops, its on-demand service and pay-as-you-go pricing lead more and more people to consider using cloud storage services. There is also much research related to cloud storage, on topics such as energy consumption (Harnik et al., 2009) and storage architecture (Abu-Libdeh et al., 2010; Bowers et al., 2009). However, many problems still need to be solved before cloud storage services can be fully adopted. The details are as follows.
Existing cloud storage systems still use a closed structure that supports only the data storage services offered by one particular cloud storage service provider (CSSP). Moreover, data resources cannot be reliably exchanged and shared among different CSSPs.
Traditional network storage usually adopts a communication mode in which users communicate directly with data centers. This method is feasible when the number of users is small and the volume of data is low. However, the number of users of cloud storage systems is increasing quickly while network transmission speed grows relatively slowly, so the response time of user requests becomes very long and the quality of the cloud storage service suffers.
To date, the typical architectures of cloud storage fall into two kinds. One is the Master-Slave storage architecture, such as the Google file system (Ghemawat et al., 2003). Its main advantages include convenient system maintenance and easy synchronization and updating of data. The other is the P2P storage architecture, e.g. Amazon’s Dynamo (Decandia et al., 2007). Its major advantages include much less hotspot data and no single point of failure. So far, no cloud storage system obtains the advantages of both architectures.
In this work, we present a cloud storage architecture that can effectively solve these problems. The contributions of this paper are:
(1) We present a cloud storage architecture model that makes the cloud storage architecture open through a two-tier proxy, so that users can use cloud storage services provided by multiple CSSPs.
(2) We deploy multiple auxiliary storage nodes (ASNs) around each data storage center (DSC) to store hotspot data, so that the system can effectively reduce the response time of users’ requests and perceive changes in the user environment.
(3) We present an underlying storage architecture that obtains the advantages of both the P2P storage architecture and the Master-Slave architecture. This new storage architecture can be dynamically configured.
Chen J., Huang Y., Lai M. and Lu P.
AUXILIARY STORAGE AND DYNAMIC CONFIGURATION FOR OPEN CLOUD STORAGE.
DOI: 10.5220/0003390705200524
In Proceedings of the 1st International Conference on Cloud Computing and Services Science (CLOSER-2011), pages 520-524
ISBN: 978-989-8425-52-2
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)
The remainder of the paper is organized as follows. In Section 2, we consider the scope of cloud storage architectures and propose a comprehensive framework. In Section 3, we analyze the consistency of the cloud storage. In Section 4, we discuss some migration issues. Finally, we summarize our findings and outline directions for future work.
2 THE CLOUD STORAGE
ARCHITECTURE
In this section, we describe the cloud storage architecture model. The model has three primary features: (a) it makes cloud storage an open architecture by using the two-tier proxy; (b) it reduces the response time of users’ requests and perceives changes in the user environment by using ASNs; (c) it adjusts the underlying storage infrastructure dynamically according to system demand.
2.1 Two-Tier Proxy
2.1.1 Selection of CSSP and DSC
This cloud storage system is designed to be open and is no longer confined to a particular CSSP or a specific DSC. As shown in Figure 1, users select a CSSP through Tier 1 proxy nodes and choose a suitable DSC belonging to that CSSP through Tier 2 proxy nodes.
Tier 1 proxy nodes do not belong to any specific CSSP and are managed by neutral institutions, such as government departments. They hold the information of every CSSP, such as capacity, storage cost, access speed and credit, and select the appropriate Tier 2 proxy according to criteria such as lower cost or faster speed. In addition, Tier 1 proxy nodes store users’ account information, such as username and password, together with records of the correspondence between the data and the CSSPs. Compared with the current practice of storing account information at a particular CSSP, storing it with a neutral body is more secure. To avoid heavy load on Tier 1 proxy nodes, they store only these two kinds of data.
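The selection step performed by a Tier 1 proxy node can be illustrated with a minimal sketch. The paper does not specify the selection algorithm, so the record fields, the `policy` values and the `min_credit` filter below are assumptions chosen only to mirror the criteria named above (capacity, storage cost, access speed and credit).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsspInfo:
    """Per-CSSP record kept by a Tier 1 proxy node (hypothetical fields)."""
    name: str
    cost_per_gb: float       # storage cost, arbitrary currency unit
    access_speed: float      # higher means faster
    free_capacity_gb: float
    credit: float            # reputation score, assumed to lie in [0, 1]


def select_cssp(cssps, size_gb, policy="cheapest", min_credit=0.0):
    """Return the CSSP that best matches the policy, or None if none qualifies."""
    candidates = [c for c in cssps
                  if c.free_capacity_gb >= size_gb and c.credit >= min_credit]
    if not candidates:
        return None
    if policy == "cheapest":
        return min(candidates, key=lambda c: c.cost_per_gb)
    if policy == "fastest":
        return max(candidates, key=lambda c: c.access_speed)
    raise ValueError(f"unknown policy: {policy}")
```

A real Tier 1 proxy would probably combine several criteria into one score; a single-criterion policy is used here only to keep the sketch short.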
Figure 1: Two-Tier Proxy.
The information stored in each Tier 1 proxy node is the same. Tier 1 proxy nodes receive the username, password and other information submitted by users, and then connect to the Tier 2 proxy nodes of each CSSP that stores the user’s data. For each CSSP that provides storage services to a user, Tier 1 proxy nodes generate a separate username and password.
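The per-CSSP account generation can be sketched as follows. This is only an illustration under assumptions: the class name, the alias format and the in-memory dictionary are hypothetical, and the paper does not say how the generated usernames and passwords are produced or stored. The sketch uses Python's standard `secrets` module for random values.

```python
import secrets


class Tier1AccountStore:
    """Maps (real user, CSSP) to a generated alias and password (hypothetical)."""

    def __init__(self):
        self._accounts = {}  # (user, cssp) -> (alias, password)

    def credentials_for(self, user, cssp):
        """Return the credentials for this pair, generating them on first use."""
        key = (user, cssp)
        if key not in self._accounts:
            alias = "u-" + secrets.token_hex(8)       # random, unrelated to `user`
            password = secrets.token_urlsafe(24)
            self._accounts[key] = (alias, password)
        return self._accounts[key]
```

Because the alias is random and differs for each CSSP, a CSSP that sees only the alias cannot link the stored data to the real account, which is the property the two-tier proxy relies on.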
Tier 2 proxy nodes belong to a particular CSSP and store various information about its DSCs, such as geographical position, bandwidth, cost, the DSC in which each piece of data is stored, and the account information (username and password) automatically generated by Tier 1 proxy nodes. They also select a suitable DSC according to criteria such as position or bandwidth.
With the two-tier proxy, although the CSSPs hold a username and password, they still do not know to whom the data belong, which greatly improves security. When users log in to Tier 1 proxy nodes through a browser, the Tier 1 proxy nodes display all the relevant information, such as which CSSPs the data are stored in, how much data is stored in each, and how much is spent on each. The user can then manipulate the data.
Figure 2: Requests Distribution.
2.1.2 Requests Distribution
Another effect of the two-tier proxy is the distribution of users’ requests, as shown in Figure 2. When the load of a Tier 1 proxy node exceeds a certain threshold, part of the users’ requests is forwarded to other Tier 1 proxy nodes so that the load of this node returns to a normal level. This is the first request distribution. The second request distribution, among Tier 2 proxy nodes, works in the same way.
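A minimal sketch of threshold-based forwarding is given below. The paper does not define how load is measured or which peer receives the forwarded requests; counting in-flight requests and picking the least-loaded peer are assumptions made for illustration.

```python
class ProxyNode:
    """A Tier 1 or Tier 2 proxy node with a load threshold (hypothetical model)."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold  # load above this value triggers forwarding
        self.load = 0               # number of requests currently assigned


def dispatch(entry, peers):
    """Assign one request arriving at `entry` and return the handling node."""
    target = entry
    if entry.load > entry.threshold:
        # Forward to the least-loaded peer that still has room, if any.
        spare = [p for p in peers if p.load < p.threshold]
        if spare:
            target = min(spare, key=lambda p: p.load)
    target.load += 1
    return target
```

If every peer is also saturated, the request stays on the entry node; a real deployment would need a policy for that case, which the paper leaves open.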
2.2 Auxiliary Storage Node (ASN)
2.2.1 Response Time
As shown in Figure 3, each CSSP has many DSCs distributed around the world, and each DSC has a number of ASNs. These ASNs, deployed around the DSC, form a small, geographically distributed storage network.
ASNs store the user data whose use frequency exceeds a certain threshold, namely the hotspot data. One replica of such data is stored in the ASNs and the other replicas are stored in the DSC. The threshold can be adjusted dynamically according to how much of the ASN capacity is in use, so that storage space is not wasted by storing too little data.
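One possible way to adjust the hotspot threshold is sketched below. The paper states only that the threshold follows the use of ASN capacity; the watermarks `low` and `high`, the proportional `step` and the `floor` are assumptions introduced for this example.

```python
def adjust_threshold(threshold, used_fraction, low=0.5, high=0.9,
                     step=0.2, floor=1.0):
    """Return a new hotspot threshold from the fraction of ASN capacity in use."""
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be in [0, 1]")
    if used_fraction > high:
        # ASN nearly full: raise the bar so that less data counts as hotspot.
        return threshold * (1 + step)
    if used_fraction < low:
        # ASN underused: lower the bar so that more data is admitted.
        return max(floor, threshold * (1 - step))
    return threshold


def is_hotspot(use_frequency, threshold):
    """Data is hotspot data when its use frequency exceeds the threshold."""
    return use_frequency > threshold
```

Calling `adjust_threshold` periodically keeps ASN occupancy between the two watermarks without a fixed, hand-tuned threshold.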
When choosing ASNs for a DSC, the main considerations are location, access speed, load and several other aspects. Hotspot data can be stored in the ASN nearest to the owner of the data or in more lightly loaded ASNs. Furthermore, according to the long tail theory, users spend most of their time using a small part of the data, so most of the data are rarely used. As a result, requests for hotspot data need not reach the DSC and can be processed directly in the ASN. This greatly improves the response time of users’ requests and reduces the load on the DSC. The difference from a CDN is that the users’ hotspot data stored in ASNs can change dynamically with the environment and requirements of the users.
Figure 3: The Relationship of DSC and ASN.
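The ASN choice described above can be sketched as a filter on load followed by a nearest-node search. This is an assumption-laden illustration: planar coordinates stand in for geographic location, `max_load` is a hypothetical cut-off, and the paper does not define how location, access speed and load are weighed against each other.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Asn:
    """An auxiliary storage node (hypothetical fields)."""
    name: str
    pos: Tuple[float, float]   # planar stand-in for geographic position
    load: float                # fraction of capacity in use, 0..1
    access_speed: float        # higher means faster


def choose_asn(asns, user_pos, max_load=0.8):
    """Pick the nearest ASN that is not overloaded; faster nodes win ties."""
    eligible = [a for a in asns if a.load <= max_load]
    if not eligible:
        return None
    return min(eligible,
               key=lambda a: (math.dist(a.pos, user_pos), -a.access_speed))
```

Re-running `choose_asn` with a new `user_pos` yields a different node when the user moves, which is how the same routine could track changes in the user environment.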
2.2.2 User-aware
ASNs are set up to reduce the response time of users’ requests with respect to the user’s specific environment, so the DSC should be able to adjust when the user environment changes. For example, user U located in P1 often uses hotspot data D1 stored in ASN A1, but when user U moves to P2, the hotspot data used is likely to be D2. In addition, even if the hotspot data D1 has not changed, ASN A1 may no longer be appropriate because of the new geographic environment. Therefore, the DSC should be able to sense changes in the user environment and select the most appropriate ASNs for storing hotspot data, in order to achieve faster response times and reduce the load of the DSC.
2.2.3 Security
In addition, to avoid data being lost through misoperation, deletion is handled as follows. When the user sends a deletion request for data that is stored in ASNs, the copy stored in the ASNs is deleted and all replicas stored in the DSC are deleted except for one. The user then sets a deadline for the remaining copy, and only when the deadline has expired can the user data be removed. This allows users to quickly recover frequently used data.
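The delayed deletion can be sketched as a small state machine. The class and method names are hypothetical, and timestamps are plain numbers supplied by the caller; the paper specifies only that one DSC replica survives until a user-set deadline expires.

```python
class ReplicaSet:
    """Replicas of one piece of hotspot data (hypothetical model)."""

    def __init__(self, asn_copies, dsc_copies):
        self.asn_copies = set(asn_copies)   # names of ASNs holding a copy
        self.dsc_copies = list(dsc_copies)  # names of DSC nodes holding a copy
        self.purge_at = None                # deadline of the retained copy

    def request_delete(self, now, retention):
        """Drop the ASN copy and all DSC replicas except one."""
        self.asn_copies.clear()
        del self.dsc_copies[1:]
        self.purge_at = now + retention

    def recover(self, now):
        """Cancel the deletion if the deadline has not expired yet."""
        if self.purge_at is not None and now < self.purge_at and self.dsc_copies:
            self.purge_at = None
            return True
        return False

    def purge_expired(self, now):
        """Remove the last copy once the deadline has passed."""
        if self.purge_at is not None and now >= self.purge_at:
            self.dsc_copies.clear()
            self.purge_at = None
            return True
        return False
```

After a successful `recover`, the system would still need to rebuild the dropped replicas; that step is outside this sketch.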
2.3 Dynamic Configuration of
Underlying Storage Infrastructure
2.3.1 Mechanism of Dynamic Configuration
To date, no publication has reported a cloud storage system that can use both Master-Slave and P2P structures. The underlying storage architecture designed in this paper can adopt both structures simultaneously and adjust dynamically according to the system configuration parameters.
As Figure 4 shows, each Master-Slave storage network is added as a whole to the P2P storage network, and all of the storage nodes in the DSCs are virtual nodes. These virtual nodes are abstract nodes formed by deploying virtualization software (such as Xen (Barham et al., 2003)) on the physical nodes, with the aim of obtaining better scalability, good isolation and easy migration. When the system boots, it (1) reads the configuration information from the configuration file and configures the virtual nodes that do not need a Master-Slave structure as nodes of the P2P storage network; (2) sets some of the remaining nodes up as one or more storage networks with a Master-Slave structure; and (3) joins the Master-Slave storage network(s) into the P2P storage network. The P2P storage network can thus consist of virtual nodes only, of Master-Slave storage networks only, or of both.
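The three boot steps can be sketched as a function that turns a configuration into a topology. The configuration layout (`nodes`, `p2p_nodes`, `master_slave_groups`) is a hypothetical format, since the paper does not describe its configuration file; joining each Master-Slave network through its management node is likewise an assumption made for this sketch.

```python
def build_topology(config):
    """Return the P2P members, the Master-Slave groups and any unassigned nodes."""
    nodes = list(config["nodes"])

    # Step (1): nodes that need no Master-Slave structure become P2P nodes.
    p2p_members = list(config["p2p_nodes"])
    remaining = [n for n in nodes if n not in p2p_members]

    # Step (2): form one or more Master-Slave networks from the remaining nodes.
    master_slave = []
    for group in config["master_slave_groups"]:
        members = [group["vmn"], *group["vsns"]]
        missing = [n for n in members if n not in remaining]
        if missing:
            raise ValueError(f"nodes not available: {missing}")
        for n in members:
            remaining.remove(n)
        master_slave.append({"vmn": group["vmn"], "vsns": list(group["vsns"])})

    # Step (3): each Master-Slave network joins the P2P network as a whole,
    # represented here by its management node.
    p2p_members += [g["vmn"] for g in master_slave]
    return {"p2p": p2p_members, "master_slave": master_slave,
            "unassigned": remaining}
```

Changing the configuration and calling `build_topology` again gives a new layout, which is the sense in which the underlying architecture is dynamically configurable; moving the data between layouts is a separate migration step.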
Figure 4: Master-Slave Storage Network and P2P storage
Network.
2.3.2 Routing Table and Metadata
As Figure 4 shows, a Master-Slave storage network is composed of a virtual management node (VMN) and virtual storage nodes (VSNs). The VMN stores metadata, while the VSNs store user data. In the P2P storage network, all the nodes are virtual P2P nodes (VPNs), which hold routing table information and user data. Figure 5 is a schematic of the structure of the metadata and the routing table.
In the P2P storage network, the routing table in an ordinary VPN holds location and routing information and, in addition, two further fields: use frequency and ASN location information. The routing table is used to route to and locate the VPN that stores the data. The use frequency field records the number of times a block of data is used in a certain period. The ASN location information field records the relationship between the hotspot data and the ASNs. The use frequency and ASN location fields of the routing table apply to the VPNs of the P2P storage network.
In a Master-Slave storage network, the VMN also acts as a VPN. A VMN holds not only a routing table but also metadata, which contains the use frequency, the ASN location information and the list of replica locations. Unlike the use frequency and ASN location information in the routing table, those in the metadata apply to the VSNs of the Master-Slave storage network. The list of replica locations records the position of each replica. The copies of the same data may all be stored in VSNs, or some may be stored in VPNs.
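The routing table and metadata records described above can be written down as simple data classes. The field names follow the text, but the exact layout is an assumption, since the figure showing the real structure is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RoutingEntry:
    """One routing table row for a data block held in the P2P network."""
    vpn: str                             # VPN that stores the block
    use_frequency: int = 0               # uses of the block in the current period
    asn_location: Optional[str] = None   # ASN holding the hotspot copy, if any


@dataclass
class MetadataEntry:
    """Metadata kept by a VMN for a block stored on its VSNs."""
    use_frequency: int = 0
    asn_location: Optional[str] = None
    replica_locations: List[str] = field(default_factory=list)


@dataclass
class Vpn:
    name: str
    routing_table: Dict[str, RoutingEntry] = field(default_factory=dict)


@dataclass
class Vmn(Vpn):
    """A VMN also acts as a VPN, so it has a routing table plus metadata."""
    metadata: Dict[str, MetadataEntry] = field(default_factory=dict)


def record_access(entry):
    """Count one use of a block; works for both entry types."""
    entry.use_frequency += 1
    return entry.use_frequency
```

Modelling `Vmn` as a subclass of `Vpn` mirrors the statement that a VMN takes part in the P2P network while also managing its own VSNs.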
Figure 5: The Structure of Routing Table and Metadata.
Moreover, according to the CAP theorem, consistency, availability and partition tolerance cannot all be satisfied simultaneously. To obtain better availability and tolerate network partitions, the copies in both the P2P and the Master-Slave storage networks are kept eventually consistent.
3 CONSISTENCY OF ASN
AND DSC
Addition: When the DSC receives a user’s addition request, the system contacts the ASNs of the DSC and comprehensively assesses their load, storage capacity and access speed to find the most appropriate ASN to store the new data. After choosing the most appropriate ASN, the system compares it with the DSC to determine where the new data should be stored. If the data are stored in the ASN, the ASN needs to make a temporary backup of the data. After the addition operation has finished, the ASN sends a message to the corresponding DSC. The DSC adds the metadata of this data and modifies the ASN location information and other related content. After a short time, the ASN synchronizes the data with the DSC and deletes the temporary backup. If, after a period of time, the use frequency of this data has not reached the threshold set by the system, the DSC notifies the ASN to delete the data and modifies the ASN location information.
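The addition workflow can be traced with a minimal in-memory sketch. It is a simplification under assumptions: load is the only criterion used to pick the ASN and to compare it with the DSC (the text also names storage capacity and access speed), and all class and function names are hypothetical.

```python
class Asn:
    def __init__(self, name, load):
        self.name, self.load = name, load
        self.data, self.backup = {}, {}


class Dsc:
    def __init__(self, load):
        self.load = load
        self.data = {}
        self.metadata = {}  # key -> {"asn": name or None, "use_frequency": int}


def add_data(dsc, asns, key, value):
    """Store new data in the best ASN or in the DSC; return the ASN used, if any."""
    best = min(asns, key=lambda a: a.load) if asns else None
    if best is None or dsc.load <= best.load:
        dsc.data[key] = value
        dsc.metadata[key] = {"asn": None, "use_frequency": 0}
        return None
    best.data[key] = value
    best.backup[key] = value  # temporary backup until synchronization
    # Models the ASN's message: the DSC records metadata and ASN location.
    dsc.metadata[key] = {"asn": best.name, "use_frequency": 0}
    return best


def synchronize(dsc, asn, key):
    """Copy the data to the DSC, then drop the ASN's temporary backup."""
    dsc.data[key] = asn.data[key]
    asn.backup.pop(key, None)


def evict_if_cold(dsc, asn, key, threshold):
    """Remove the ASN copy if the data never reached the hotspot threshold."""
    if dsc.metadata[key]["use_frequency"] < threshold:
        asn.data.pop(key, None)
        dsc.metadata[key]["asn"] = None
        return True
    return False
```

The temporary backup covers the window between the write on the ASN and the synchronization, during which the ASN holds the only copy of the data.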
Update: When users need to update their data, they send an update command to the DSC. If the data to be updated are stored in the ASN, the system compares the load of the ASN with that of the DSC. If the load of the DSC is lighter, the DSC handles the update request; otherwise, the ASN does. After the ASN modifies the data, it updates the use frequency of the data in the DSC. The data are then synchronized between the ASN and the DSC. Whether the modification takes place in the ASN or in the DSC, the system checks the metadata of the data according to the preset strategies.
4 VMS MIGRATION
When the DSC adjusts the underlying storage architecture, two kinds of data migration occur: from the P2P storage network to a Master-Slave storage network, and the reverse. In the first kind, when an ordinary VPN needs to migrate to a Master-Slave storage network, the node only needs to exit the P2P storage network and join the Master-Slave storage network as a new node. The VMN of that Master-Slave storage network then updates the metadata of each data block in the migrated node, and the use frequency and ASN location information are kept as metadata in the VMN. The system deletes the node’s original routing table, which was used in the P2P storage network, but does not delete the users’ data. These data therefore remain accessible from both the P2P storage network and the Master-Slave storage network. In the second kind, when a VSN of a Master-Slave storage network needs to migrate to the P2P storage network, the node only needs to exit the Master-Slave storage network and join the P2P storage network as a new node. The system then initializes the node’s routing table, and the use frequency and ASN location information in the metadata are copied from the VMN into this routing table.
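Both migration directions can be sketched with plain dictionaries. The record layout is hypothetical, and the sketch leaves the VMN's metadata in place after a node moves back to the P2P network, because the text says only that the fields are copied.

```python
def migrate_vpn_to_master_slave(node, vmn):
    """Move a VPN into the Master-Slave network managed by `vmn`."""
    for block_id, entry in node["routing_table"].items():
        meta = vmn["metadata"].setdefault(block_id, {"replica_locations": []})
        meta["use_frequency"] = entry["use_frequency"]   # kept in the VMN
        meta["asn_location"] = entry["asn_location"]
        meta["replica_locations"].append(node["name"])
    node["routing_table"] = {}     # P2P routing table is deleted
    vmn["vsns"].append(node["name"])   # user data in node["blocks"] is untouched


def migrate_vsn_to_p2p(node, vmn):
    """Move a VSN out of the Master-Slave network into the P2P network."""
    vmn["vsns"].remove(node["name"])
    node["routing_table"] = {}     # initialize a fresh routing table
    for block_id in node["blocks"]:
        meta = vmn["metadata"][block_id]
        node["routing_table"][block_id] = {
            "use_frequency": meta["use_frequency"],
            "asn_location": meta["asn_location"],
        }
```

In both directions only bookkeeping moves between the routing table and the metadata; the user data blocks stay on the node, which is why the migration is cheap.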
5 CONCLUSIONS
In this paper, we have presented an open cloud storage architecture model that can dynamically configure the underlying storage architecture and process hotspot data through ASNs. We have also discussed the consistency and migration problems of the cloud storage system.
In future work, we plan to study the proposed architecture in two ways: (1) building a model of the ASNs and simulating it with CloudSim (Buyya et al., 2009) and a neural network, and (2) building a model of the underlying storage architecture and simulating it with P2P simulation tools such as PeerSim (Montresor and Jelasity, 2009).
ACKNOWLEDGEMENTS
We thank Mrs. Ning Wang and Mr. Ming Chen for their helpful discussions. This work was supported by the National High-Tech Research and Development Plan of China under Grant No. 2009AA01A402, the Natural Science Foundation of Hubei Province of China under Grant No. 2010CDB01601, and the Fundamental Research Funds for the Central Universities of China under Grant No. HUST2010MS065.
REFERENCES
Ghemawat, S., Gobioff, H., Leung, S., 2003. "The Google file system," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, pp. 29-43.
Decandia, G., Hastorun, D., Jampani, M., et al., 2007. "Dynamo: Amazon's highly available key-value store," in Operating Systems Review (ACM), pp. 205-220.
Harnik, D., Naor, D., Segall, I., 2009. "Low power mode in cloud storage systems," in IEEE International Symposium on Parallel & Distributed Processing, pp. 1-8.
Abu-Libdeh, H., Princehouse, L., et al., 2010. "RACS: A case for cloud storage diversity," in the 1st ACM Symposium on Cloud Computing, pp. 229-239.
Bowers, K., Juels, A., Oprea, A., 2009. "HAIL: A high-availability and integrity layer for cloud storage," in the ACM Conference on Computer and Communications Security, pp. 187-198.
Pallis, G., Vakali, A., 2006. "Insight and perspectives for Content Delivery Networks," Communications of the ACM, vol. 49, no. 1, pp. 101-106.
Barham, P., Dragovic, B., Fraser, K., Hand, S., et al., 2003. "Xen and the art of virtualization," in Operating Systems Review (ACM), pp. 164-177.
Buyya, R., Ranjan, R., Calheiros, R., 2009. "Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: Challenges and opportunities," in International Conference on High Performance Computing and Simulation, pp. 1-11.
Montresor, A., Jelasity, M., 2009. "Peersim: A scalable p2p simulator," in IEEE P2P'09 - 9th International Conference on Peer-to-Peer Computing, pp. 99-100.