DSP generation, such as Abele (2016) and Assaf et 
al. (2015), an approach that provides more
detailed information about datasets, including
descriptive, structural, and quality metadata, is not
found. In addition, some of them do not use
vocabulary terms associated with the metadata
provided by the profile. Using such terms assigns
more meaning to the metadata and yields a
representation that facilitates its consumption.
6 CONCLUSIONS 
In this work, we have presented an approach for the 
generation of semantically enriched Dataset Profiles. 
To this end, a DSP composed of descriptive,
structural, and quality metadata is proposed. During
the DSP generation process, some metadata are
extracted from the datasets; additionally, the
dataset domain is identified and domain
vocabularies are suggested. Furthermore, the process
includes the generation of structural metadata and
of quality metadata, for which two IQ criteria are
proposed and measured as relevant additional information.
The main idea of providing enriched DSPs is to 
facilitate the communication between dataset 
publishers and consumers (humans and machines).   
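As an illustration only, the three metadata groups summarized above
could be held in a simple container such as the following Python sketch.
The class name, field names, and the build_profile helper are
hypothetical and are not taken from the prototype; the domain, the
suggested vocabularies, and the IQ scores are assumed to be computed
elsewhere and passed in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetProfile:
    """Hypothetical DSP container; all field names are assumptions."""
    descriptive: Dict[str, str] = field(default_factory=dict)
    structural: Dict[str, object] = field(default_factory=dict)
    quality: Dict[str, float] = field(default_factory=dict)
    suggested_vocabularies: List[str] = field(default_factory=list)


def build_profile(title, rows, domain, vocabularies, quality_scores):
    """Assemble a profile from a dataset given as a list of dict rows.

    The domain, vocabularies, and quality scores are supplied by the
    caller; this sketch only derives simple structural metadata.
    """
    # Structural metadata: the union of attribute names and the row count.
    columns = sorted({key for row in rows for key in row})
    return DatasetProfile(
        descriptive={"title": title, "domain": domain},
        structural={"columns": columns, "row_count": len(rows)},
        quality=dict(quality_scores),
        suggested_vocabularies=list(vocabularies),
    )
```

Keeping the three groups as separate fields mirrors the separation of
descriptive, structural, and quality metadata described in this work,
and leaves the choice of IQ criteria open.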
In order to evaluate the proposed approach, a 
prototype has been implemented. It provides an 
automatic DSP generation process. The tool assists
data producers who wish to make DSPs available for
certain datasets. Dataset consumers can also
generate a DSP without needing prior knowledge
about the data.
The experiments used datasets from different 
knowledge domains. They demonstrated that the
proposed strategy produces good results by
allowing the generation of new metadata.
Improvements were also observed in the quality
of the datasets after DSP generation.
As future work, we consider incorporating user
feedback and other IQ criteria (e.g., completeness,
correctness), linking the approach to an existing
dataset catalog, and including in the DSP
vocabulary recommendations for each identified
structural metadata item. New experiments with expert
users and datasets belonging to a wider range of
domains will also be conducted.
REFERENCES 
Abele, A., 2016. Linked Data Profiling: Identifying the 
Domain of Datasets Based on Data Content and 
Metadata, In: 25th International Conference 
Companion on World Wide Web. Canada, p. 287-291. 
Assaf, A., Senart, A., Troncy, R., 2016. An Objective 
Assessment Framework & Tool for Linked Data: 
Enriching Dataset Profiles with Quality Indicators, In: 
IJSWIS, International Journal on Semantic Web and 
Information Systems, Special Issue on Dataset 
Profiling and Federated Search for Linked Data, Vol. 
12, N° 3, 2016, ISSN: 1552-6283.
Assaf, A., Troncy, R., Senart, A., 2015. Roomba: An 
extensible framework to validate and build dataset 
profiles, In: 24th International Conference on World 
Wide Web, Italy, p. 159-162. 
Baeza-Yates, R., Ribeiro-Neto, B., 1999. Modern 
Information Retrieval.  Addison-Wesley, First Edition. 
Clarke, M., Harley, P., 2014. How smart is your content? 
Using semantic enrichment to improve your user 
experience and your bottom line, Science Editor, Vol. 
37, N° 2, p. 40–44. 
Ellefi, M. B., Bellahsene, Z., Scharffe, F., Todorov, K., 
2014. Towards Semantic Dataset Profiling, In:
International Workshop on Dataset Profiling & 
Federated Search for Linked Data co-located with the 
11th Extended Semantic Web Conference. Greece. 
Ellefi, M. B., Bellahsene, Z., Todorov, K., 2015. 
Datavore: a vocabulary recommender tool assisting 
Linked Data modeling, In: 14th International 
Semantic Web Conference, Posters and 
Demonstrations Track, United States. 
Flemming, A., 2011. Quality Characteristics of Linked
Data Publishing Datasources. Master's Thesis, 
Humboldt-Universität zu Berlin, Institut für 
Informatik. 
Heath, T., Bizer, C., 2011. Linked Data: Evolving the Web
into a Global Data Space, 1st edition. Synthesis 
Lectures on the Semantic Web: Theory and 
Technology, 1:1, 1-136. Morgan & Claypool. 
Kaggle platform, 2018. Available at
https://www.kaggle.com. Last access on June 20th.
Lalithsena, S., Hitzler, P., Sheth, A. P., Jain, P., 2013. 
Automatic Domain Identification for Linked Open 
Data. In: IEEE/WIC/ACM International Joint 
Conferences on Web Intelligence and Intelligent Agent 
Technologies, United States, p. 205-212. 
Lóscio, B. F., Burle, C., Calegari, N., 2017. Data on the 
web best practices. W3C, Version:
https://www.w3.org/TR/2017/REC-dwbp-20170131/. Last
access: March 20, 2018.
LOV, 2018. Linked Open Vocabulary Repository. 
Available at https://lov.okfn.org/dataset/lov/. Last
access on June 20th.
Naumann, F., Rolker, C., 2000. Assessment methods for
information quality criteria, In: IQ, 5th Conference on
Information Quality. United States, p. 148-162.
Ouksili, H., Kedad, Z., Lopes, S., 2014. Theme 
Identification in RDF Graphs, In: MEDI, International 
Conference on Model and Data Engineering. Cyprus, 
p. 321-329. 
Pipino, L. L., Lee, Y. W., Wang, R. Y., 2002. Data
Quality Assessment, In: Communications of the ACM