Concatenated Decision Paths Classification for Time Series Shapelets - A New Approach for One Dimensional Data Classification and its Application
Mitzev, Ivan Stefanov.
Time series are a very common way of presenting collected data such as economic indicators, natural phenomena, and control engineering data, among others. In the last decade, interest in time series data mining has grown as the amount of collected data has increased dramatically. Standard approaches to time series classification rely on distance measures, such as the Euclidean distance (ED) and dynamic time warping (DTW), combined with a 1-NN classifier. More recently, more advanced classification methods have been proposed that introduce primitives (named time series shapelets) which consistently represent a certain class. A time series shapelet is a small sub-section of the entire time series that is “particularly discriminating”. Shapelet-based classification appears to produce higher accuracies on some data sets, because global features are more sensitive to noise than local ones. Despite its advantages, time series shapelet classification has an apparent disadvantage: very slow training. This work attempts to improve the training time of the originally proposed time series shapelet classification algorithm and introduces a new approach to time series classification based on concatenated decision tree paths. First, the training time of the classical shapelet-based algorithm for time series classification is significantly reduced. The improvement comes from using randomly generated sequences tuned in a particle swarm optimization (PSO) environment instead of sub-series taken from the original time series. Second, a new, highly accurate classification method based on concatenated decision tree paths is introduced. The approach builds a unique representative pattern for each class from the paths taken in a pool of decision trees. Third, the proposed method has been successfully extended to the two-class classification problem, where only one decision tree can be built. 
A variety of two-class decision trees were built using different splitting criteria (distance to a random shapelet), thus enlarging the pool of decision trees and increasing the overall accuracy. Fourth, the proposed method was successfully applied to a two-class image classification problem by converting each image into a time series. An accuracy of around 95% was achieved for pedestrian detection on the Daimler database.
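To make the splitting criterion concrete, the following is a minimal sketch of a distance-to-shapelet split. It assumes the definition commonly used in the shapelet literature: the smallest Euclidean distance between the shapelet and any equal-length window of the series. The function names, the threshold parameter, and the absence of normalization are illustrative assumptions, not details taken from this work.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series (assumed definition, no normalization)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every window of length m.
    for start in range(len(series) - m + 1):
        sq = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, sq)
    return math.sqrt(best)


def split_by_shapelet(series_list, shapelet, threshold):
    """Hypothetical decision-tree node: True means the series goes to the
    'near' branch (distance <= threshold), False to the 'far' branch."""
    return [shapelet_distance(s, shapelet) <= threshold for s in series_list]
```

For example, `split_by_shapelet([[0, 0, 1, 2, 1, 0], [0, 0, 0, 0]], [1, 2, 1], 1.0)` sends the first series to the near branch, since it contains the shapelet exactly, and the second to the far branch. Under these assumptions, each different shapelet yields a different split, which is one way to read how varying the splitting criterion enlarges the pool of trees.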