Author:
Sarantos Kapidakis
Affiliation:
Department of Archival, Library and Information Studies, University of West Attica, 28 Agiou Spyridonos Str., 12243 Aegaleo, Athens, Greece
Keyword(s):
Dublin Core, Metadata, OAI-PMH, Harvesting, Language, Controlled Vocabularies, Controlled Terms, Repeated Values, Linked Open Data, Dendrogram.
Abstract:
When resource descriptions use exactly the same value for an entity, this value is more easily parsed, identified and utilized by automatic procedures. The use of controlled values, although common and very useful, is usually not enforced during data entry. In this paper we study the use of controlled values in many harvested collections, examining all Dublin Core elements and their similarity. We focus mainly on the element language, as there is considerable standardization on how to denote language values, followed by other elements that normally use controlled values. We discovered values that are repeated many times and in many collections, and many more values that are used only once! The lack of coordination among collections during their creation results in many variations of each value, even when the value is used consistently and many times inside a collection. The study uses dendrograms to reveal the current usage of the Dublin Core elements inside and among active collections by clustering collections with similar values, and supports the adoption of better guidelines, the design of better tools and the improvement of the collections' effectiveness.
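A minimal sketch of the clustering idea follows. It assumes that each collection is reduced to the set of distinct values it uses for one element (here dc:language), that collections are compared with Jaccard distance, and that they are merged by average linkage. The abstract does not specify the similarity measure or the linkage, so both are assumptions, and the collection names and values are hypothetical. The returned merge steps are the data a dendrogram draws.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(collections: dict[str, set[str]]) -> list[tuple[str, str, float]]:
    """Naive agglomerative clustering with average linkage (an assumed choice).

    Returns the merge steps (left, right, distance) in merge order,
    which is exactly what a dendrogram visualizes.
    """
    # Each cluster maps a display name to the collections it contains.
    clusters = {name: [name] for name in collections}
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(sorted(clusters), 2):
            # Average pairwise distance between members of the two clusters.
            pairs = [(p, q) for p in clusters[x] for q in clusters[y]]
            d = sum(jaccard_distance(collections[p], collections[q])
                    for p, q in pairs) / len(pairs)
            if best is None or d < best[2]:
                best = (x, y, d)
        x, y, d = best
        clusters[f"({x}+{y})"] = clusters.pop(x) + clusters.pop(y)
        merges.append((x, y, d))
    return merges


if __name__ == "__main__":
    # Hypothetical dc:language values harvested from three collections.
    harvested = {
        "A": {"en", "el"},
        "B": {"en", "el", "fr"},
        "C": {"eng", "English"},
    }
    for left, right, dist in average_linkage(harvested):
        print(f"merge {left} + {right} at distance {dist:.3f}")
```

Because "eng" and "English" do not match "en" as strings, collection C joins the others only at distance 1.0, which illustrates how uncoordinated variants of the same language value keep otherwise similar collections apart.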