Anomaly detection on streaming data using distributed Singular Value Decomposition
Project status: Published/In Market
Groups: Artificial Intelligence West Coast
Intel Technologies: Other
Overview / Usage
Singular Value Decomposition (SVD) is a matrix decomposition technique with many applications in areas such as genetics, natural language processing (NLP), and social network analysis. All of these areas produce very large matrices with millions of rows and features.
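For reference, these are the standard definitions of the thin SVD of an m × n matrix A of rank r (with U of size m × r and V of size n × r) and of its rank-k truncation. By the Eckart–Young theorem, the truncation is the best rank-k approximation of A in the spectral and Frobenius norms:

```latex
\begin{aligned}
A &= U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i u_i v_i^{\top},
  \qquad U^{\top} U = V^{\top} V = I_r,
  \quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r),
  \quad \sigma_1 \ge \cdots \ge \sigma_r > 0, \\
A_k &= \sum_{i=1}^{k} \sigma_i u_i v_i^{\top},
  \qquad \|A - A_k\|_2 = \sigma_{k+1} \quad (k < r).
\end{aligned}
```

Randomized and streaming methods aim to approximate the leading σ_i, u_i, and v_i without computing the full decomposition.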
In genetics, matrix entries represent the gene response of an individual, while in NLP they represent the frequency of a term in a document. Datasets in these areas are growing rapidly, and a single processor cannot compute the SVD in a feasible amount of time. Randomized SVD algorithms based on sketching have gained traction because, in many cases, they outperform classical deterministic algorithms in speed, accuracy, and robustness. In addition, these algorithms can be implemented to exploit multiprocessor architectures and can process large datasets on clusters with thousands of cores.
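To make the sketching idea concrete, here is a minimal C++ sketch of the range-finding stage of a randomized SVD, in the style of Halko, Martinsson, and Tropp. It multiplies A by a Gaussian test matrix with k + p columns and orthonormalizes the result. The names `randomized_range_finder` and `projection_residual`, the oversampling parameter `p`, and the fixed seed are illustrative assumptions, not the project's code. A complete randomized SVD would go on to compute the SVD of the small matrix Q^T A.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense, row-major

// Range-finding stage of a randomized SVD (hypothetical helper): returns an
// m x r matrix Q with orthonormal columns (r <= k + p) spanning roughly range(A).
Matrix randomized_range_finder(const Matrix& A, std::size_t k, std::size_t p,
                               unsigned seed = 42) {
  assert(!A.empty() && !A[0].empty() && k + p > 0);
  const std::size_t m = A.size(), n = A[0].size(), l = k + p;
  std::mt19937 gen(seed);
  std::normal_distribution<double> gauss(0.0, 1.0);
  Matrix omega(n, std::vector<double>(l));  // Gaussian test matrix
  for (auto& row : omega)
    for (double& x : row) x = gauss(gen);

  std::vector<std::vector<double>> cols;  // accepted columns of Q
  for (std::size_t c = 0; c < l; ++c) {
    std::vector<double> y(m, 0.0);  // column c of the sketch Y = A * Omega
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t j = 0; j < n; ++j) y[i] += A[i][j] * omega[j][c];
    double before = 0.0;
    for (double v : y) before += v * v;
    // Gram-Schmidt against the accepted columns, applied twice for stability.
    for (int pass = 0; pass < 2; ++pass)
      for (const auto& q : cols) {
        double dot = 0.0;
        for (std::size_t i = 0; i < m; ++i) dot += q[i] * y[i];
        for (std::size_t i = 0; i < m; ++i) y[i] -= dot * q[i];
      }
    double after = 0.0;
    for (double v : y) after += v * v;
    if (after <= 1e-20 * before) continue;  // numerically dependent: drop it
    const double norm = std::sqrt(after);
    for (double& v : y) v /= norm;
    cols.push_back(y);
  }

  Matrix Q(m, std::vector<double>(cols.size()));
  for (std::size_t c = 0; c < cols.size(); ++c)
    for (std::size_t i = 0; i < m; ++i) Q[i][c] = cols[c][i];
  return Q;
}

// ||A - Q Q^T A||_F: the part of A that the basis Q fails to capture.
double projection_residual(const Matrix& A, const Matrix& Q) {
  const std::size_t m = A.size(), n = A[0].size(), r = Q[0].size();
  double total = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    std::vector<double> res(m);
    for (std::size_t i = 0; i < m; ++i) res[i] = A[i][j];
    for (std::size_t c = 0; c < r; ++c) {
      double dot = 0.0;
      for (std::size_t i = 0; i < m; ++i) dot += Q[i][c] * A[i][j];
      for (std::size_t i = 0; i < m; ++i) res[i] -= dot * Q[i][c];
    }
    for (double v : res) total += v * v;
  }
  return std::sqrt(total);
}
```

For an input of exact rank k, the residual is at rounding level. The product A * Omega is the step that parallelizes naturally, because each row block of Y depends only on the corresponding row block of A.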
This all sounds good, but there is a challenge. In streaming applications, data arrives incrementally and in arbitrary order, so it is not directly suitable for randomized algorithms, which expect the whole dataset to be available. In this talk, we will present a hybrid approach that combines frequent directions, an algorithm well suited to streaming data, with randomized algorithms for fast SVD computation. We will present results for applications including video and NLP, on datasets with billions of nonzero entries, using clusters with thousands of cores. We will also discuss how Spark performs on these large-scale machine learning challenges.
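The core of frequent directions can be sketched in C++ as follows, assuming dense rows of a fixed dimension d. The sketch B keeps ell rows. When all rows are occupied, it rotates them onto their singular directions and subtracts the smallest squared singular value delta from every squared singular value, which frees at least one row. To avoid external dependencies, this sketch uses a Jacobi eigendecomposition of the small ell × ell Gram matrix B B^T. The class and function names are hypothetical. This is the basic shrink-by-smallest-singular-value variant, not necessarily the one used in the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense, row-major

// Cyclic Jacobi eigensolver for a small symmetric matrix S. On return S is
// (numerically) diagonal and column i of V is the eigenvector for S[i][i].
void jacobi_eigen(Matrix& S, Matrix& V) {
  const std::size_t n = S.size();
  V.assign(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) V[i][i] = 1.0;
  for (int sweep = 0; sweep < 64; ++sweep) {
    double off = 0.0, total = 0.0;
    for (std::size_t p = 0; p < n; ++p)
      for (std::size_t q = 0; q < n; ++q)
        (p == q ? total : off) += S[p][q] * S[p][q];
    if (off <= 1e-28 * (total + off)) break;  // converged
    for (std::size_t p = 0; p + 1 < n; ++p)
      for (std::size_t q = p + 1; q < n; ++q) {
        if (S[p][q] == 0.0) continue;
        const double theta = (S[q][q] - S[p][p]) / (2.0 * S[p][q]);
        const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                         (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
        const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
        for (std::size_t k = 0; k < n; ++k) {  // S <- S J and V <- V J
          const double a = S[k][p], b = S[k][q], va = V[k][p], vb = V[k][q];
          S[k][p] = c * a - s * b;  S[k][q] = s * a + c * b;
          V[k][p] = c * va - s * vb;  V[k][q] = s * va + c * vb;
        }
        for (std::size_t k = 0; k < n; ++k) {  // S <- J^T S
          const double a = S[p][k], b = S[q][k];
          S[p][k] = c * a - s * b;  S[q][k] = s * a + c * b;
        }
      }
  }
}

// Frequent Directions (Liberty, 2013): an ell x d sketch B of a row stream A.
class FrequentDirections {
 public:
  FrequentDirections(std::size_t ell, std::size_t d)
      : B_(ell, std::vector<double>(d, 0.0)), d_(d) { assert(ell >= 1 && d >= 1); }

  void update(const std::vector<double>& row) {  // consume one stream row
    assert(row.size() == d_);
    if (filled_ == B_.size()) shrink();  // always frees at least one row
    B_[filled_++] = row;
  }
  const Matrix& sketch() const { return B_; }

 private:
  // Rotate B onto its right singular vectors and subtract delta (the smallest
  // squared singular value) from every squared singular value.
  void shrink() {
    const std::size_t ell = B_.size();
    Matrix G(ell, std::vector<double>(ell, 0.0)), U;  // G = B B^T
    for (std::size_t i = 0; i < ell; ++i)
      for (std::size_t j = 0; j < ell; ++j)
        for (std::size_t k = 0; k < d_; ++k) G[i][j] += B_[i][k] * B_[j][k];
    jacobi_eigen(G, U);  // B B^T = U diag(lambda) U^T, lambda_i = sigma_i^2
    std::vector<std::size_t> order(ell);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&G](std::size_t a, std::size_t b) { return G[a][a] > G[b][b]; });
    const double delta = std::max(G[order.back()][order.back()], 0.0);
    Matrix next(ell, std::vector<double>(d_, 0.0));
    std::size_t kept = 0;
    for (std::size_t i : order) {
      const double lambda = G[i][i];
      if (lambda <= delta) continue;  // this direction is shrunk to zero
      // Row i of U^T B is sigma_i v_i^T; rescale it to sqrt(lambda - delta) v_i^T.
      const double w = std::sqrt((lambda - delta) / lambda);
      for (std::size_t j = 0; j < ell; ++j)
        for (std::size_t k = 0; k < d_; ++k) next[kept][k] += w * U[j][i] * B_[j][k];
      ++kept;
    }
    B_ = std::move(next);
    filled_ = kept;
  }

  Matrix B_;
  std::size_t d_;
  std::size_t filled_ = 0;
};
```

Each shrink lowers ||B||_F^2 by ell * delta while changing B^T B by at most delta in any direction. This is what bounds the covariance error ||A^T A - B^T B||_2 by ||A||_F^2 / ell. When the stream lies in a subspace of dimension below ell, delta is zero up to rounding, and B^T B reproduces A^T A.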
Methodology / Approach
We combine frequent directions, which maintains a compact sketch of the incoming data stream, with randomized distributed SVD to handle large-scale streaming data.
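The page does not say how anomalies are scored once the sketch or SVD is available. One common choice, assumed here rather than taken from the project, is the squared reconstruction residual of a new row x with respect to the top-k right singular vectors v_1, ..., v_k: ||x||^2 - sum_i (v_i . x)^2. Rows with a large residual are poorly explained by the dominant directions and can be flagged against a threshold. The helper `residual_score` and any thresholding policy are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: squared residual of x after projecting it onto the span
// of the orthonormal rows of `basis` (e.g. top-k right singular vectors).
double residual_score(const std::vector<std::vector<double>>& basis,
                      const std::vector<double>& x) {
  double score = 0.0;
  for (double xi : x) score += xi * xi;  // start from ||x||^2
  for (const auto& v : basis) {
    assert(v.size() == x.size());
    double proj = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) proj += v[i] * x[i];
    score -= proj * proj;  // remove the energy of x along v
  }
  return score > 0.0 ? score : 0.0;  // clamp tiny negative rounding errors
}
```

Because the score needs only k dot products of length d per incoming row, it can be evaluated online as the stream arrives, with the basis refreshed periodically from the current sketch.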
Technologies Used
We used C++ with several libraries, including the STL and Boost, on Intel Xeon processors.