
 
persistent identifier are uniqueness, which can be addressed by defining namespaces or using special identifier generation strategies, and resolvability, which means that the identifier can be resolved persistently. Other important properties in the context of PID systems include the association of metadata with the identifier, the ability to incorporate legacy identifiers or identifiers of other types, and the handling of versioning, granularity and management of the PIDs (Ball and Duke, 2012). In general, systems that assign persistent identifiers fall into two categories: systems that store metadata associated with the PID and systems that do not. Most systems that store metadata use a basic metadata schema, which often consists of Dublin Core elements.
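To make the two core properties concrete, here is a minimal Python sketch of a toy in-memory registry. It assumes that uniqueness comes from a namespace prefix owned by one registrant combined with a suffix that is unique within that prefix (here a simple counter), and that resolvability means mapping the identifier to a target URL that may change while the identifier itself stays fixed. The class `PIDRegistry`, its methods and the prefix `10.99999` are hypothetical and do not describe any real PID service.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class PIDRegistry:
    """Toy, in-memory PID registry (hypothetical; not the API of any real service)."""

    prefix: str  # namespace owned by one registrant, e.g. a DOI prefix
    _next_suffix: Iterator[int] = field(default_factory=lambda: itertools.count(1), repr=False)
    _targets: Dict[str, str] = field(default_factory=dict, repr=False)
    _metadata: Dict[str, Dict[str, str]] = field(default_factory=dict, repr=False)

    def mint(self, target_url: str, metadata: Optional[Dict[str, str]] = None) -> str:
        # Uniqueness: the suffix is unique within this prefix, and the prefix is
        # unique across registrants, so "prefix/suffix" is globally unique.
        pid = f"{self.prefix}/{next(self._next_suffix)}"
        self._targets[pid] = target_url
        self._metadata[pid] = dict(metadata or {})
        return pid

    def relocate(self, pid: str, new_url: str) -> None:
        # Persistence: when the resource moves, only the target changes, not the PID.
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid: str) -> str:
        # Resolvability: map the identifier to the resource's current location.
        return self._targets[pid]

    def metadata(self, pid: str) -> Dict[str, str]:
        # Metadata associated with the identifier (returned as a copy).
        return dict(self._metadata[pid])
```

For example, `PIDRegistry("10.99999").mint("https://example.org/a")` returns `"10.99999/1"`; after `relocate`, `resolve` returns the new URL while the identifier is unchanged. A counter is the simplest generation strategy; real systems may instead use random or structured suffixes.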
The DOI Foundation provides a managed resolution system for identifiers. A DOI name can be represented as a URL by prefixing it with the string http://dx.doi.org/ (e.g., the DOI name 10.4232/1.11380 can be resolved via http://dx.doi.org/10.4232/1.11380).
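The URL representation amounts to string concatenation plus URL encoding. The following Python sketch assumes that a DOI name has the form `<prefix>/<suffix>` with a prefix starting with `10.`, and percent-encodes characters that are unsafe in a URL path; the function name `doi_to_url` is hypothetical. The default resolver is the one given above; the DOI Foundation now recommends the https://doi.org/ form, which can be passed as the second argument.

```python
from urllib.parse import quote

# Resolver base named in the text; https://doi.org/ is the currently recommended form.
DX_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str, resolver: str = DX_RESOLVER) -> str:
    """Represent a DOI name as a URL by prefacing it with a resolver address."""
    prefix, sep, suffix = doi.partition("/")
    # A DOI name is "<prefix>/<suffix>" and every DOI prefix starts with "10.".
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are not safe in a URL path; keep '/' as-is.
    return resolver + quote(doi, safe="/")
```

For instance, `doi_to_url("10.4232/1.11380")` returns `http://dx.doi.org/10.4232/1.11380`, while an identifier of another type, such as a URN, raises a `ValueError`.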
One of the largest PID systems is Crossref (Crossref, 2012), which mainly registers DOI names for various types of literature. DataCite also registers DOI names, but focuses on PIDs for datasets. DataCite also provides a very general metadata schema for datasets of all types. Furthermore, several institutions, e.g. national libraries, allow the registration of URNs (Daigle et al., 2002) for publications. We build our system on the services provided by DataCite, since DataCite's purpose of promoting science and research matches our use cases. Thus, we use DOI names as PIDs (Hausstein, 2012).
3 METADATA SCHEMA 
The main goal of the da|ra information system is to register social and economic science datasets and to enable searching the metadata of research datasets. Typical data in the social sciences are empirical primary data from survey research, historical social research, and texts for content analysis. Typical economic data are statistical data collected through surveys of individuals, companies or states, but also data representing experimental results.
The main requirements for developing the da|ra metadata schema to describe the data were the following: (1) interoperability with other standards such as the DDI metadata specification (DDI, 2012) and Dublin Core as specified by the Dublin Core Metadata Initiative (DCMI); (2) quality assurance of metadata; (3) sustainability, e.g. availability for semantic web applications.
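Interoperability with Dublin Core is often achieved through a crosswalk that maps schema fields onto Dublin Core elements. The Python sketch below assumes such a mapping for some of the citation fields described in this section; the mapping, the `to_dublin_core` function and the `metadata` wrapper element are hypothetical and not the official da|ra crosswalk. Only the Dublin Core element names and the namespace URI are standard.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical crosswalk from da|ra-style field labels to Dublin Core elements.
CROSSWALK = {
    "Title": "title",
    "Principal Investigator": "creator",
    "Publication Agent": "publisher",
    "Publication Date": "date",
    "DOI": "identifier",
}


def to_dublin_core(record: Dict[str, str]) -> str:
    """Serialize the mapped fields of a record as Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for field_name, dc_element in CROSSWALK.items():
        value = record.get(field_name)
        if value:  # skip fields that are absent or empty
            ET.SubElement(root, f"{{{DC_NS}}}{dc_element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Fields without a Dublin Core counterpart in the mapping (here, for example, 'Availability') are simply not exported, which is the usual trade-off of mapping a detailed schema onto a generic one.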
The metadata schema of da|ra is implemented as an XML Schema Definition (Hausstein et al., 2012) and is partially based on the metadata schema of the DataCite Metadata Store (Starr et al., 2011). As we interface with the DataCite services, we incorporated all required fields of the Metadata Store schema into our schema, but also adapted existing fields and introduced new ones. The following fields are considered the minimal set required to cite a dataset: Title; Principal Investigator; Publication Agent; DOI; URL; Publication Date. Since da|ra stores only the metadata and not the data itself, the additional mandatory field ‘Availability’ holds information about the access status of the dataset.
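In da|ra, mandatory fields are presumably enforced through the XML Schema Definition. As an illustration independent of XSD, the following Python sketch checks a record for the mandatory fields; it assumes the record is a plain dictionary keyed by the field labels used in the text rather than by the schema's actual element names, and `missing_mandatory_fields` is a hypothetical helper.

```python
from typing import List, Mapping, Optional

# The six citation fields named in the text plus the mandatory 'Availability' field.
MANDATORY_FIELDS = (
    "Title",
    "Principal Investigator",
    "Publication Agent",
    "DOI",
    "URL",
    "Publication Date",
    "Availability",
)


def missing_mandatory_fields(record: Mapping[str, Optional[str]]) -> List[str]:
    """Return the mandatory fields that are absent or blank, in schema order."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = record.get(name)
        # Treat a missing key, None, or whitespace-only text as "not filled in".
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

An empty result means the record carries everything needed for a citation plus its access status.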
The da|ra schema includes 28 optional fields that let users describe social and economic science data in detail, e.g. by fields such as Data Collector, Sampled Universe, Sampling, Temporal Coverage, Time Dimension, Collection Mode, Data, and Publication. These additional fields also increase the visibility of the datasets and make them easier for domain experts to find.
In the da|ra system, metadata editing is aided by controlled vocabularies in order to support quality assurance and standardization. Hence, some fields of the da|ra metadata schema accept only values from controlled vocabularies of the social and economic sciences, such as TheSoz (Thesaurus Social Sciences) (Zapilko et al., 2012) or STW (Thesaurus for Economics) (Gastmeyer, 1998). For each controlled field, there is also a free-text field to increase flexibility.
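The pairing of a controlled field with a free-text field can be modelled as follows. In this Python sketch, `ControlledField` is a hypothetical class: its controlled part rejects terms outside a given vocabulary, while the companion free-text part accepts any value. In practice the vocabulary would be drawn from a thesaurus such as TheSoz or STW; the terms used in any example are placeholders, not actual thesaurus descriptors.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class ControlledField:
    """A controlled field plus its companion free-text field (hypothetical model)."""

    vocabulary: FrozenSet[str]  # allowed terms, e.g. labels taken from a thesaurus
    terms: List[str] = field(default_factory=list)
    free_text: List[str] = field(default_factory=list)

    def add_term(self, term: str) -> None:
        # The controlled part rejects anything outside the vocabulary.
        if term not in self.vocabulary:
            raise ValueError(f"{term!r} is not in the controlled vocabulary")
        self.terms.append(term)

    def add_free_text(self, text: str) -> None:
        # The free-text part accepts any value, for flexibility.
        self.free_text.append(text)
```

Keeping the two parts separate means that searches and exports can rely on the controlled terms being standardized, while nothing the curator wants to say is lost.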
Versioning and granularity are important issues in the context of persistent identifiers. In da|ra, we offer a comprehensive versioning mechanism and let the publication agents decide how to use it. For example, publication agents can register a new DOI name for each version of the metadata or update the existing metadata, e.g. to correct typos. Publication agents are also free to decide on the granularity of the datasets, which means that it is also possible to assign a DOI name to a package, e.g. a CD containing several datasets.
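Both versioning options and package-level granularity can be sketched with a toy metadata store keyed by DOI name. In this Python sketch, the class and method names and the DOI names are hypothetical; the relation labels (`IsNewVersionOf`, `IsPreviousVersionOf`, `HasPart`, `IsPartOf`) are borrowed from the relation types of the DataCite Metadata Schema, but the text does not state how da|ra records such links.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Record:
    metadata: Dict[str, str]
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (relation type, DOI name)


class MetadataStore:
    """Toy store keyed by DOI name (hypothetical; not the da|ra implementation)."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def register(self, doi: str, metadata: Dict[str, str]) -> None:
        if doi in self.records:
            raise ValueError(f"DOI name already registered: {doi}")
        self.records[doi] = Record(dict(metadata))

    def correct(self, doi: str, changes: Dict[str, str]) -> None:
        # Option 1: keep the DOI name and update its metadata in place (e.g. a typo fix).
        self.records[doi].metadata.update(changes)

    def new_version(self, old_doi: str, new_doi: str, changes: Dict[str, str]) -> None:
        # Option 2: register a new DOI name for the new version and link both records.
        self.register(new_doi, {**self.records[old_doi].metadata, **changes})
        self.records[new_doi].relations.append(("IsNewVersionOf", old_doi))
        self.records[old_doi].relations.append(("IsPreviousVersionOf", new_doi))

    def register_package(
        self, package_doi: str, metadata: Dict[str, str], parts: Iterable[str]
    ) -> None:
        # Granularity: one DOI name for a package (e.g. a CD) holding several datasets.
        part_list = list(parts)
        unknown = [p for p in part_list if p not in self.records]
        if unknown:
            raise KeyError(f"unregistered parts: {unknown}")
        self.register(package_doi, metadata)
        for part in part_list:
            self.records[package_doi].relations.append(("HasPart", part))
            self.records[part].relations.append(("IsPartOf", package_doi))
```

Which option to use is left to the publication agent: `correct` suits typo fixes, whereas `new_version` keeps the earlier version citable under its own DOI name.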
4 SYSTEM ARCHITECTURE 
In this section, we give an overview of the architecture of the da|ra information system, which is visualized in Figure 1. On the left, we see the two user groups, Publication Agents and Researchers. The main difference be-
WEBIST 2013 - 9th International Conference on Web Information Systems and Technologies