Miriam Esteve, Nuria Mollá, Jesús Javier Rodríguez and Alejandro Rabasa (Operations Research Center, University Miguel Hernández of Elche)
Abstract: When data flows continuously into an information system and the stream changes unpredictably, it becomes difficult to build a reliable model. This continuous, unexpected change in the data, known as concept drift, is addressed with different strategies depending on its type. Several contributions focus on adapting traditional Machine Learning techniques to these data stream problems. The decision tree is one of the most widely used Machine Learning techniques because of its high interpretability. This article studies the impact of an abrupt concept change on the accuracy of the original CART algorithm proposed by Breiman, and justifies the need for detection and/or adaptation methodologies that update or rebuild the model when concept drift occurs. To that end, simulated experiments were carried out under several training and testing conditions in a changing data environment. According to the results, models that are rebuilt at the right moment after a concept drift occurs achieve high accuracy, while those that are not rebuilt, or are rebuilt at the wrong moment after a change occurs, achieve considerably lower accuracy.
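The effect described in the abstract can be illustrated with a minimal sketch. It is not the experimental setup of this article: it assumes a single feature drawn uniformly from [0, 1], a noise-free label rule that flips abruptly at the drift point, and a depth-1 tree chosen by Gini impurity as a stand-in for a full CART model. All names (`fit_stump`, `sample`, `accuracy`) and all sample sizes are hypothetical and chosen only for illustration.

```python
import random


def fit_stump(xs, ys):
    """Fit a depth-1 tree on one feature by minimizing weighted Gini impurity.

    Simplified stand-in for CART (assumption): one split, binary labels 0/1.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best = (float("inf"), 0.0, 0, 1)  # (impurity, threshold, left label, right label)
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        n_left, n_right = i, n - i
        p_left = left_pos / n_left
        p_right = (total_pos - left_pos) / n_right
        impurity = (n_left * 2 * p_left * (1 - p_left)
                    + n_right * 2 * p_right * (1 - p_right)) / n
        if impurity < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (impurity, threshold, int(p_left >= 0.5), int(p_right >= 0.5))
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def accuracy(model, xs, ys):
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


def sample(rng, n, flipped):
    """Draw n points; the label rule is inverted after the simulated drift."""
    xs = [rng.random() for _ in range(n)]
    ys = [int((x > 0.5) != flipped) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable

# Training windows taken before and after the abrupt change.
x_old, y_old = sample(rng, 200, flipped=False)
x_new, y_new = sample(rng, 200, flipped=True)

stale = fit_stump(x_old, y_old)    # never rebuilt after the drift
rebuilt = fit_stump(x_new, y_new)  # rebuilt on post-drift data only

# Test sets drawn from the old and the new concept.
x_pre, y_pre = sample(rng, 1000, flipped=False)
x_post, y_post = sample(rng, 1000, flipped=True)

acc_before = accuracy(stale, x_pre, y_pre)
acc_stale = accuracy(stale, x_post, y_post)
acc_rebuilt = accuracy(rebuilt, x_post, y_post)

print(f"stale model, before drift:  {acc_before:.3f}")
print(f"stale model, after drift:   {acc_stale:.3f}")
print(f"rebuilt model, after drift: {acc_rebuilt:.3f}")
```

Because the simulated drift inverts the label rule completely, the gap between the stale and the rebuilt model is extreme here; milder changes would be expected to produce a smaller gap.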