Authors: Rui Lima 1 and Estrela Ferreira Cruz 2

Affiliations: 1 ARC4DigiT - Applied Research Centre for Digital Transformation, Instituto Politécnico de Viana do Castelo, Portugal ; 2 ARC4DigiT - Applied Research Centre for Digital Transformation, Instituto Politécnico de Viana do Castelo, Portugal, and ALGORITMI Research Centre, Escola de Engenharia, Universidade do Minho, Guimarães, Portugal

Keyword(s): Data Mining, Data Wrangling, Web Scraping, Data Warehouse, Parsing, Business Intelligence.

Related Ontology Subjects/Areas/Topics: Coupling and Integrating Heterogeneous Data Sources ; Data Warehouses and OLAP ; Databases and Information Systems Integration ; Enterprise Information Systems

Abstract: This paper proposes an approach to detect and extract data from unstructured data sources (about the subject to be studied) that are available online and spread across several Web pages, and to aggregate and store the data in a Data Warehouse designed for that purpose. The Data Warehouse repository serves as the basis for Business Intelligence and Data Mining analysis. The extracted data may be complemented with information from other sources to enrich it, enhance the analysis, and draw new and more interesting conclusions. The proposed process is then applied to a case study comprising the results of athletics events held in Portugal over the last 12 years. The files containing competition results are available online, spread across the websites of the several athletics associations. Almost all files are published in Portable Document Format (PDF), and each association provides files with its own internal format. The case study also proposes a mechanism for integrating the results of athletics events with the geographic location and atmospheric conditions of those events, making it possible to assess and analyze how atmospheric and geographical conditions affect the results achieved by the athletes.

CC BY-NC-ND 4.0

Paper citation in several formats:
Lima, R. and Cruz, E. (2019). Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study. In Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-372-8; ISSN 2184-4984, SciTePress, pages 190-199. DOI: 10.5220/0007720301900199

@conference{iceis19,
author={Rui Lima and Estrela Ferreira Cruz},
title={Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study},
booktitle={Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2019},
pages={190-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007720301900199},
isbn={978-989-758-372-8},
issn={2184-4984},
}

TY - CONF

JO - Proceedings of the 21st International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Extraction and Multidimensional Analysis of Data from Unstructured Data Sources: A Case Study
SN - 978-989-758-372-8
IS - 2184-4984
AU - Lima, R.
AU - Cruz, E.
PY - 2019
SP - 190
EP - 199
DO - 10.5220/0007720301900199
PB - SciTePress