Authors: João Gama 1 and Pedro Medas 2

Affiliations: 1 LIACC - University of Porto; Fac. Economics, University of Porto, Portugal; 2 LIACC - University of Porto, Portugal

Keyword(s): Concept Drift, Forest of Trees, Data Streams.

Abstract: This paper presents an adaptive learning system for inducing a forest of trees from data streams that is able to detect concept drift. We have extended our previous work on Ultra Fast Forest of Trees (UFFT) with the ability to detect concept drift in the distribution of the examples. UFFT is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. The system is designed for continuous data. It uses analytical techniques to choose the splitting criteria and information gain to estimate the merit of each possible splitting test. The number of examples required to evaluate the splitting criteria is sound, based on the Hoeffding bound. For multi-class problems, the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. During the training phase the algorithm maintains a short-term memory: given a data stream, a fixed number of the most recent examples are kept in a data structure that supports constant-time insertion and deletion. When a test is installed, a leaf is transformed into a decision node with two descendant leaves. The sufficient statistics of these leaves are initialized with the examples in the short-term memory that fall at these leaves. To detect concept drift, we maintain, at each inner node, a naive-Bayes classifier trained with the examples that traverse the node. While the distribution of the examples is stationary, the online error of naive-Bayes will decrease. When the distribution changes, the naive-Bayes online error will increase. In that case the test installed at this node is no longer appropriate for the current distribution of the examples, and the entire subtree rooted at this node is pruned. The methodology was tested with two artificial data sets and one real-world data set. The experimental results show good performance both in detecting the change of concept and in learning the new concept.
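As a rough illustration of three ingredients named in the abstract (the Hoeffding bound, the one-tree-per-class-pair forest, and the fixed-size short-term memory), here is a minimal Python sketch. It is not the authors' implementation. The function names, the default `delta`, and the choice of `value_range=1.0` (the range of information gain, in bits, for a two-class problem) are assumptions made for this example.

```python
import math
from collections import deque
from itertools import combinations


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_install_split(best_gain: float, second_gain: float, n: int,
                      delta: float = 1e-6, value_range: float = 1.0) -> bool:
    """Hypothetical helper: accept the best splitting test once its lead
    over the runner-up exceeds the Hoeffding bound for n examples."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def class_pairs(classes):
    """One binary tree per unordered pair of classes: k * (k - 1) / 2 trees."""
    return list(combinations(classes, 2))


def make_short_term_memory(capacity: int) -> deque:
    """Fixed-size buffer of the most recent examples. A bounded deque gives
    constant-time insertion and drops the oldest example automatically."""
    return deque(maxlen=capacity)


# With few examples the bound is wide, so a small gain difference is not
# yet enough to split; with more examples the same difference suffices.
few = can_install_split(0.30, 0.25, n=1_000)
many = can_install_split(0.30, 0.25, n=10_000)

# A three-class problem yields three pairwise binary trees.
pairs = class_pairs(["a", "b", "c"])

# The memory keeps only the most recent `capacity` examples.
memory = make_short_term_memory(3)
for example in range(5):
    memory.append(example)
```

The bound shrinks as `n` grows, which is why a leaf can wait for more examples instead of committing to a test too early. The bounded deque is only one possible structure with constant-time insertion and deletion; the abstract does not say which one the authors use.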

CC BY-NC-ND 4.0

Paper citation in several formats:
Gama, J. and Medas, P. (2004). Learning in Dynamic Environments: Decision Trees for Data Streams. In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS; ISBN 972-8865-01-5, SciTePress, pages 149-158. DOI: 10.5220/0002673001490158

@conference{pris04,
author={João Gama and Pedro Medas},
title={Learning in Dynamic Environments: Decision Trees for Data Streams},
booktitle={Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS},
year={2004},
pages={149-158},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002673001490158},
isbn={972-8865-01-5},
}

TY - CONF

JO - Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems (ICEIS 2004) - PRIS
TI - Learning in Dynamic Environments: Decision Trees for Data Streams
SN - 972-8865-01-5
AU - Gama, J.
AU - Medas, P.
PY - 2004
SP - 149
EP - 158
DO - 10.5220/0002673001490158
PB - SciTePress