supervised machine learning to map entity attributes 
via dictionaries to predict instance entity types. 
In particular, this method maps
attributes from different domains to a common set
covering person, location and organization types.
The authors show that using entity type information
as a pre-filter reduces the work required for
instance matching evaluation.
Bernardo et al. (2012) propose an approach to 
integrate data from hundreds of spreadsheets 
available on the Web. To do so, they semantically
map data instances of spreadsheets to
RDF/OWL datasets. They use a process that
identifies the spreadsheet domain and associates the 
data instances to their respective class according to a 
domain vocabulary.  
Taheriyan et al. (2014) use a supervised machine 
learning technique based on Conditional Random 
Fields with features extracted from the attribute 
names as part of a process to construct semantic 
models. The goal is to map the attributes to the 
concepts of a domain ontology and generate a set of 
candidate semantic types for each source attribute, 
each one with a confidence value. Next, an 
algorithm selects the top k semantic types for each 
attribute as an input to the next step of the process.  
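The selection step described above can be illustrated with a minimal Python sketch. The function name `top_k_semantic_types`, the candidate labels and the confidence values are hypothetical and chosen only for illustration; they are not taken from Taheriyan et al. (2014).

```python
from typing import Dict, List, Tuple


def top_k_semantic_types(
    candidates: Dict[str, List[Tuple[str, float]]], k: int
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep, for each attribute, the k candidates with the highest confidence."""
    selected = {}
    for attribute, scored_types in candidates.items():
        # Sort by confidence, highest first; ties keep their input order.
        ranked = sorted(scored_types, key=lambda pair: pair[1], reverse=True)
        selected[attribute] = ranked[:k]
    return selected


# Hypothetical candidates for two source attributes.
candidates = {
    "name": [("foaf:name", 0.81), ("dc:title", 0.12), ("rdfs:label", 0.45)],
    "city": [("dbo:City", 0.67), ("dbo:Place", 0.30)],
}
top_two = top_k_semantic_types(candidates, k=2)
```

Keeping k candidates rather than only the best one lets a later step recover from a wrong first guess, at the cost of a larger search space.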
Tonon et al. (2013) propose a method to find the 
most relevant entity type given an entity (instance) 
and its context. This method is based on collecting 
statistics and on the graph structure interconnecting 
instances and types. This approach is useful for
finding entity types in the context of search engines.  
In contrast to these works, we are
interested in identifying entity types in order to
add more semantics to the generated RDF datasets.
We also use a semantic matcher to identify the
vocabulary terms associated with the
structural metadata of the dataset being converted.
Unlike the related works presented above, we define
the entity type as the one with the maximum
number of property occurrences. This is determined
according to the semantics provided by domain
vocabularies chosen by a DE.
Despite this dependency, our work may
be used in any data domain.  
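The selection rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the tool: we assume that each candidate type in the chosen vocabularies is associated with a set of properties, and we count how many of an object's matched properties occur in that set. The function `most_likely_entity_type`, the example vocabulary fragment and the tie-breaking behaviour are hypothetical.

```python
from typing import Dict, Optional, Set


def most_likely_entity_type(
    object_properties: Set[str], type_properties: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the type whose property set contains most of the object's properties."""
    best_type, best_count = None, 0
    for entity_type, properties in type_properties.items():
        # Number of the object's properties that occur for this type.
        count = len(object_properties & properties)
        # Strict comparison: on ties, the first type encountered is kept (an assumption).
        if count > best_count:
            best_type, best_count = entity_type, count
    return best_type


# Hypothetical vocabulary fragment chosen by the DE.
type_properties = {
    "bibo:Book": {"dc:title", "dc:creator", "bibo:isbn", "dc:publisher"},
    "bibo:Article": {"dc:title", "dc:creator", "bibo:doi"},
}
matched = {"dc:title", "dc:creator", "bibo:isbn"}
entity_type = most_likely_entity_type(matched, type_properties)
```

In this example three of the object's properties occur for `bibo:Book` and only two for `bibo:Article`, so the former is selected. When no property occurs for any type, the sketch returns `None` rather than guessing.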
7 CONCLUSIONS 
We presented a data domain-driven approach to
converting semi-structured datasets, particularly in
JSON format, to RDF. By using the semantics
underlying the domain of the data, the approach makes
the conversion process less demanding. It attempts to
automate as much of the conversion process as possible
by maintaining a domain alignment composed of
correspondences between the metadata being converted
(properties) and the domain terms, and by reusing it in
each new conversion process. In addition, in order to
enrich the generated RDF dataset, the entity types of
the objects are identified and included in the code.    
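The reuse of a domain alignment across conversions can be sketched in Python as follows. This is a minimal illustration under assumptions: the alignment is modelled as a plain dictionary from source properties to domain terms, and `match_term` stands in for the semantic matcher. Both names, and the toy matcher, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


def align_properties(
    properties: Iterable[str],
    alignment: Dict[str, str],
    match_term: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map properties to domain terms, calling the matcher only for unseen properties."""
    result = {}
    for prop in properties:
        if prop not in alignment:
            term = match_term(prop)
            if term is None:
                continue  # No correspondence found; left for manual handling.
            alignment[prop] = term  # Stored so that later conversions can reuse it.
        result[prop] = alignment[prop]
    return result


calls = []


def toy_matcher(prop: str) -> Optional[str]:
    """Stand-in matcher: a fixed lookup table that records its calls."""
    calls.append(prop)
    return {"title": "dc:title", "author": "dc:creator"}.get(prop)


alignment: Dict[str, str] = {}
first = align_properties(["title", "author"], alignment, toy_matcher)
second = align_properties(["title", "pages"], alignment, toy_matcher)
```

In the second call the matcher is invoked only for `pages`, since the correspondence for `title` is already stored in the alignment.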
The experiments conducted show that our
approach is promising. By using the domain
vocabularies, it is able to produce complete RDF
datasets w.r.t. the original source data. Furthermore,
it identifies the most appropriate entity type for a
given object in almost 77% of the cases.  
As future work, we intend to extend the
approach and tool to deal with CSV files. 
Furthermore, we intend to use the MET recognition 
process to assist a coreference resolution task when 
integrating some datasets at conversion time.   
REFERENCES 
Alexe, B., Burdick, D., Hernandez, M., Koutrika, G., 
Krishnamurthy, R., Popa, L., Stanoi, I., and Wisnesky, 
R., 2013. High-Level Rules for Integration and 
Analysis of Data: New Challenges. In Search of 
Elegance in the Theory and Practice of Computation: 
Essays Dedicated to Peter Buneman. Springer Berlin 
Heidelberg, pp. 36-55.  
Bernardo, I. R., Mota, M. S., Santanchè, A., 2012. 
Extraindo e Integrando Semanticamente Dados de 
Múltiplas Planilhas Eletrônicas a Partir do 
Reconhecimento de Sua Natureza. In Proceedings of 
Brazilian Symposium on Databases (SBBD 2012), 
pp. 256-263. 
BIBO, 2016. Available at 
http://lov.okfn.org/dataset/lov/vocabs/bibo. Last 
access on December, 2016.  
CBO, 2016. Available at http://comicmeta.org/cbo/. Last 
access on December, 2016. 
David, J., Euzenat, J., Scharffe, F., and Trojahn dos 
Santos, C., 2011. The Alignment API 4.0. In Semantic 
Web Journal 2 (1): 3–10. 
DBPEDIA, 2016. Available at http://wiki.dbpedia.org/. 
Last access on December, 2016.  
DC, 2016. Available at 
http://dublincore.org/documents/2008/01/14/dcmi-type-vocabulary/. 
Last access on December, 2016.  
DOAP, 2016. Available at 
http://lov.okfn.org/dataset/lov/vocabs/doap. Last 
access on December, 2016.  
Fanizzi, N., d'Amato, C., and Esposito, F., 2012. Mining 
linked open data through semi-supervised learning 
methods based on self-training. In Proceedings of the 
IEEE Sixth International Conference on Semantic 
Computing (ICSC). IEEE, Palermo, Italy, pp. 
277–284.