
 
and communicate best practices of data curation to 
other stakeholders (research paper authors in the 
last example). A consistent and clearly formulated 
framework will make a collaborative data curation 
effort much better defined and communicated, and 
the best data curation practices more readily adopted 
by the research community. Supervision of various 
kinds of information through the research lifecycle 
will then help to create rich data aggregations and 
reproducible research workflows with contributions 
naturally made by different lifecycle stakeholders. 
The next challenge and opportunity is presented 
by the emergence of research services such as the 
aforementioned UK National Crystallography 
Service. This trend raises questions about user 
management, research proposal management and 
data management in facilities science. One example 
is the future role and content of the data 
management policies that some facilities impose on 
their users as a precondition for obtaining a 
facility resource for research. The policy may ask 
users to agree to the public release of their 
experimental data after a period of exclusive 
access (typically a few years), or require them to 
submit the list of resulting publications back to 
the facility user office. This works well in the 
traditional business model of facilities science but 
does not take into account the emergence of service 
intermediaries, who may also need to be subject to 
the data management policy, so that it becomes a 
multilateral agreement.  
The data management policy format, which is 
now just plain text, is also questionable, as it 
cannot be interpreted without a human; this will 
likely not be enough for automated research proposal 
management and data release management across 
different facilities. The development of licences for 
data re-use, or the adoption of suitable existing 
ones, could alleviate the problem, but licences might 
need proper machine-oriented modelling for policy 
enforcement; an indication of what is possible with 
respect to structured modelling and automation of 
data licences can be seen in the recent formation of 
the Linked Content Coalition 
(www.linkedcontentcoalition.org), endorsed by the 
European Commission and some national 
governments. Again, information departments of 
large research facilities might consider borrowing 
advanced practices and models of data licensing 
for re-use in facilities science. 
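To illustrate what a machine-interpretable policy could look like, the following is a minimal sketch in Python. It is not an existing facility format or a Linked Content Coalition model; the class `DataPolicy`, its fields and the functions `release_date` and `is_public` are hypothetical names introduced here. It encodes only the two policy terms mentioned above (an exclusive access period and a publication reporting requirement) together with the list of parties bound by the agreement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical machine-readable form of a data management policy."""
    embargo_years: int          # period of exclusive access before public release
    report_publications: bool   # must users report publications to the user office?
    parties: Tuple[str, ...]    # everyone bound by the (possibly multilateral) agreement


def release_date(policy: DataPolicy, experiment_end: date) -> date:
    """Date on which the experimental data may be released publicly."""
    target_year = experiment_end.year + policy.embargo_years
    try:
        return experiment_end.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return experiment_end.replace(year=target_year, day=28)


def is_public(policy: DataPolicy, experiment_end: date, today: date) -> bool:
    """True once the exclusive access period has expired."""
    return today >= release_date(policy, experiment_end)


# Example values are assumptions chosen for illustration only.
policy = DataPolicy(
    embargo_years=3,
    report_publications=True,
    parties=("facility", "service intermediary", "end user"),
)
```

With a representation of this kind, a data release service could evaluate `is_public(policy, experiment_end, today)` for every investigation without a human reading the policy text, and the `parties` field could list a service intermediary alongside the facility and the end user.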
Another important consideration is the 
interoperability of metadata models and their actual 
implementations for different research facilities. The 
idealized metadata model for facilities science that 
we call Core Scientific MetaData (CSMD) 
(Matthews et al., 2012) is derived from a generic 
research lifecycle in facilities science: 
 
Figure 1: Generic research lifecycle in facilities science. 
The different stages of the research lifecycle produce 
data artefacts (research proposals, user records, 
datasets, publications etc.) that are similar across 
research facilities, so having a common metadata 
model like CSMD seems sensible. However, it may 
be applied differently by different facilities; there are 
a few CSMD implementations in data catalogues 
across Europe by virtue of the ICAT platform 
(http://code.google.com/p/icatproject/), but the 
model and the actual use of its elements may vary 
among implementations. This may result in extra 
design and implementation overheads when we 
consider federated services for a few facilities (even 
when based on the same software platform). There is 
also no guarantee that, once a federated solution is 
agreed and implemented, it will not be affected 
sooner or later by the diverging business needs of 
different participants. A common data curation 
framework for facilities science might help to have 
these needs permanently monitored, properly 
communicated and effectively reconciled, thus 
serving as a well-structured business analysis 
wrapper for technology solutions. 
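The overhead of reconciling implementations can be illustrated by a minimal sketch in Python. The `Investigation` record below is a small hypothetical subset chosen for illustration, not the CSMD schema or the ICAT API, and the facility names and local field names in `FIELD_MAPS` are invented. The point is only that each facility needs an explicit, maintained mapping onto the common form, and that a federated service breaks as soon as one participant's local usage drifts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Investigation:
    """Minimal common record (hypothetical subset, not the CSMD schema)."""
    identifier: str
    title: str
    instrument: str


# Mapping from common field name to each facility's local field name.
# Both facilities and all local names are invented for illustration.
FIELD_MAPS = {
    "facility_a": {"identifier": "experiment_no", "title": "title", "instrument": "beamline"},
    "facility_b": {"identifier": "proposal_id", "title": "name", "instrument": "instrument"},
}


def to_common(facility: str, record: dict) -> Investigation:
    """Translate a facility-specific record into the common form."""
    mapping = FIELD_MAPS[facility]
    missing = [local for local in mapping.values() if local not in record]
    if missing:
        # A diverging implementation shows up here as a missing local field.
        raise KeyError(f"{facility} record lacks fields: {missing}")
    return Investigation(**{common: record[local] for common, local in mapping.items()})
```

Two records that describe the same investigation in different local vocabularies become equal after translation, whereas a record that no longer carries an expected field is rejected rather than silently federated.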
An interesting development that may be 
considered part of the emerging data curation 
framework, but that has also exposed certain 
challenges, is the recent effort to mint Digital Object 
Identifiers for investigations performed at the ISIS 
neutron facility (Wilson, 2012). Having permanent 
identifiers minted for particular investigations 
(experiments) should be enough for linking them to 
datasets and publications, but a structured and 
linkable representation of a facility research 
environment requires identifiers to be minted or 
borrowed for its other parts too, such as scientific 
instruments, experimental techniques, people, 
organizations, software and derived data sets. 
There is currently no sustainability model for 
this activity, nor for the steady production and 
support of the landing Web pages to which the 
permanent identifiers (of all kinds) should ideally 
resolve. The different aspects – modelling, 
technological, operational – of permanent 
identifier management should be an important part