Authors:
Mila Kwiatkowska (1) and Frank Pouw (2)
Affiliations:
(1) Department of Computing Science, Thompson Rivers University, 805 TRU Way, Kamloops, Canada
(2) Department of Environmental Sciences, Thompson Rivers University, 805 TRU Way, Kamloops, Canada
Keyword(s):
Data Quality, Secondary Data Analysis, Ecological Data, Semiotics.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Collaboration and e-Services; Data Engineering; Data Management and Quality; e-Business; Enterprise Information Systems; Information Integration; Information Quality; Integration/Interoperability; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Ontologies and the Semantic Web; Symbolic Systems; Transparency in Research Data
Abstract:
Data quality problems are widespread in secondary data when they are used for data warehousing and data mining. This paper advocates a broad semiotic approach to data quality. The main premises of this expanded semiotic framework are (1) data represent some reality, (2) data are created and interpreted by humans in a communication process, (3) data are used for specific purposes by humans, and (4) data cannot be created, interpreted, and used without knowledge. Thus, the semiotic-based approach to data quality in secondary data analysis has four aspects: (1) representational, (2) communicational, (3) pragmatic, and (4) knowledge-based. To illustrate these four aspects, we present a case study of ecological data analysis used in the creation of an ornithological data warehouse. We discuss temporal data (ecological notion of time), spatial ecological data (communication processes and protocols used for data collection), and bioacoustic data processing (domain knowledge needed for the specification of data provenance).