Consistency and Interoperability on Dublin Core Element Values in Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting

Sarantos Kapidakis


When resource descriptions use exactly the same value for an entity, this value is more easily parsed, identified and utilized by automatic procedures. The use of controlled values, even when common and very useful, is usually not enforced during data entry. In this paper we study the use of controlled values in many harvested collections, examining all Dublin Core elements as well as the similarity of their values across collections. We focus mainly on the element language, as there is a lot of standardization on how to denote language values, followed by other elements that normally use controlled values. We discovered values that are repeated many times and in many collections, and many more values that are used only once! The lack of coordination among collections during their creation results in many variations for each value, even when the value is used consistently and many times inside a collection. The study uses dendrograms to reveal the current usage of the Dublin Core elements inside and among active collections by clustering the collections with similar values, and it helps in adopting better guidelines, designing better tools and improving the effectiveness of the collections.
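The two measurements described above (how many collections share each value, and which collections cluster together because they use similar values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not specify the distance measure or linkage, so this sketch assumes Jaccard distance between the sets of distinct values of one element (here `dc:language`) and average-linkage agglomerative clustering. The collection names and values are hypothetical, and the helpers `value_spread`, `jaccard_distance` and `average_linkage` are illustrative names only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical harvested values of one Dublin Core element (dc:language)
# per collection; names and values are illustrative, not real data.
collections = {
    "A": ["en", "en", "el", "fr"],
    "B": ["eng", "gre", "eng"],
    "C": ["en", "el"],
    "D": ["English", "Greek"],
}


def value_spread(cols: dict[str, list[str]]) -> Counter:
    """Count in how many collections each distinct value appears."""
    return Counter(v for values in cols.values() for v in set(values))


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; 0.0 means identical sets of values (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(sets: dict[str, set]) -> list[tuple[frozenset, frozenset, float]]:
    """Agglomerative clustering with average linkage (assumed linkage).

    Returns the merge steps (left cluster, right cluster, distance) in order;
    these are the joins a dendrogram draws from the leaves upward.
    """
    clusters = [frozenset([name]) for name in sets]
    merges = []

    def dist(c1: frozenset, c2: frozenset) -> float:
        # Average of all pairwise collection distances between two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(sets[x], sets[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        d = dist(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return merges


if __name__ == "__main__":
    spread = value_spread(collections)
    print("used in one collection only:", sorted(v for v, n in spread.items() if n == 1))
    for left, right, d in average_linkage({k: set(v) for k, v in collections.items()}):
        print(sorted(left), "+", sorted(right), f"at distance {d:.2f}")
```

In this toy input, collections A and C merge first because they share the codes `en` and `el`, while B (`eng`, `gre`) and D (`English`, `Greek`) stay apart even though they denote the same languages, which is the kind of uncoordinated variation the abstract describes. For real data, the same merge sequence can be obtained and plotted with SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`) and `scipy.cluster.hierarchy.dendrogram`.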

