Time Series Shapelet Classification through Learned Shapelet-Specific Distances

Carter Blum

Carter Blum

Minneapolis, Minnesota

This project aims to improve the classification ability of time series shapelets by learning a unique distance metric for each shapelet that best separates samples. ...learn more

Project status: Under Development

Artificial Intelligence

Intel Technologies
AI DevCloud / Xeon

Overview / Usage

Shapelets [1] are small subsequences of time series that can be used to classify unlabeled time series. Shapelets can be used in stand-alone classifiers and are an essential part of the most accurate ensemble methods for time series classification. Previous shapelet research has focused on finding the most efficient approaches to discover shapelets and how to incorporate shapelets into different classification methods. However, a shapelet's ability to accurately classify depends on a single measurement: its distance from each sample. Euclidean distance, while robust, is the only distance metric that has been used with shapelets.

This project explores an alternative to Euclidean distance by creating a distance metric specific to each shapelet. If successful, this improvement could improve shapelet accuracy for time series classification across dozens of domains including finance, entertainment, medicine, biometrics, human motion, chemistry, astronomy, robotics, entomology, wildlife management, food spectroscopy, and PMU disturbance event classification.

Methodology / Approach

First, given a shapelet, we use a unique mapping that uses the shapelet to project time series training samples into Euclidean point space. In this space, the current approach of using a fixed Euclidean distance threshold for shapelet classification is equivalent to creating a spherical decision boundary. We then use logistic regression in this Euclidean space to learn a decision boundary that can more accurately capture the structure of the data. The goal is to improve the classification power of the shapelet by providing better separation between samples of different classes.

The number of dimensions in this Euclidean space often exceed the number of training points, meaning that dimensionality reduction is required. We are currently exploring using Principal Component Analysis and Generalized Eigenvectors [2] to reduce dimensions prior to learning the new distance with logistic regression.

We are currently exploring the impact of using shapelet-specific distances instead of Euclidean distance on 4 shapelet classification methods: Fast Shapelets, Shapelet Transform, Learned Shapelets, and FLAG. A brief description of each shapelet method is below.

As a testbed, we're using the UCR/UEA repository [3,4] which contains 128 datasets from dozens of domains.

Shapelet methods

  • Fast Shapelets [5] compresses the shapelet search space using SAX word mapping and builds a decision tree classifier.
  • The Shapelet Transform [6] uses brute force search with minimal speedups to find shapelet candidates. These are then used to train an ensemble of 6 different classifiers with cross-validated weights.
  • Learned Shapelets [7] formulates shapelet discovery as an optimization problem, which is solved using gradient descent. Multiple shapelets are then used to train a linear regression classifier.
  • FLAG [8] searches for shapelets by first creating shapelet indicators through an optimization using Generalized Eigenvectors and Fused Lasso. It then feeds these shapelets through the shapelet transform and classifies with a linear Support Vector Machine.

[1] Ye, Lexiang, and Eamonn Keogh. "Time series shapelets: a new primitive for data mining." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
[2] Karampatziakis, Nikos, and Paul Mineiro. "Discriminative features via generalized eigenvectors." Proceedings of the 31st International Conference on Machine Learning, 2014.
[3] Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi , Chotirat Ann Ratanamahatana, Yanping Chen, Bing Hu, Nurjahan Begum, Anthony Bagnall , Abdullah Mueen and Gustavo Batista (2018). The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018
[4] Anthony Bagnall, Jason Lines, William Vickers and Eamonn Keogh, The UEA & UCR Time Series Classification Repository, www.timeseriesclassification.com
[5] Rakthanmanon, Thanawin, and Eamonn Keogh. "Fast shapelets: A scalable algorithm for discovering time series shapelets." proceedings of the 2013 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2013.
[6] Hills, Jon, et al. "Classification of time series by shapelet transformation." Data Mining and Knowledge Discovery 28.4, 2014.
[7] Grabocka, Josif, et al. "Learning time-series shapelets." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
[8] Hou, Lu, James T. Kwok, and Jacek M. Zurada. "Efficient learning of timeseries shapelets." Thirtieth AAAI Conference on Artificial Intelligence, 2016.

Technologies Used

Computing Resources:
Intel Colfax

Programming Languages:
MATLAB
C++
Java

Dimensionality Reduction Techniques:
Principal Component Analysis
Generalized Eigenvectors

Collaborators

There are no people to show.

Comments (0)