Building a Query Engine for a Corpus of Open Data

Mauro Pelucchi, Giuseppe Psaila, Maurizio Toccu

Abstract

Public Administrations openly publish many data sets concerning citizens and territories in order to increase the amount of information made available to people, firms and public administrators. As an effect, Open Data corpora have become so huge that it is impossible to deal with them by hand; as a consequence, it is necessary to use tools that include innovative techniques able to query them. In this paper, we present a technique to select open data sets containing specific pieces of information, and retrieve them from a corpus published by a portal of open data. In particular, users can formulate structured queries and submit them blindly to our search engine prototype (i.e., being unaware of the actual structure of the data sets). Our approach reinterprets and mixes several known information retrieval approaches, giving at the same time a database view of the problem. We implemented this technique within a prototype that we tested on a corpus containing more than 2000 data sets. We noted that our technique provides more focused results than the baseline experiments performed with Apache Solr.

Paper Citation


in Harvard Style

Pelucchi M., Psaila G. and Toccu M. (2017). Building a Query Engine for a Corpus of Open Data. In Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-246-2, pages 126-136. DOI: 10.5220/0006308801260136


in Bibtex Style

@conference{webist17,
author={Mauro Pelucchi and Giuseppe Psaila and Maurizio Toccu},
title={Building a Query Engine for a Corpus of Open Data},
booktitle={Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2017},
pages={126-136},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006308801260136},
isbn={978-989-758-246-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Building a Query Engine for a Corpus of Open Data
SN - 978-989-758-246-2
AU - Pelucchi M.
AU - Psaila G.
AU - Toccu M.
PY - 2017
SP - 126
EP - 136
DO - 10.5220/0006308801260136