RocketML Distributed Machine Learning
vinay rao
Beaverton, Oregon
RocketML is a fast scale-out system for both training machine learning models and running pre-processing steps. As data gets larger, machine learning steps get slower, making a data scientist's job tedious. A distributed system like RocketML shortens model training and processing tasks from days to minutes. RocketML is built to stitch together a large number of powerful Xeon processors to scale efficiently. Every component of the software is tuned so that the system is pushed to the limits of Amdahl's law. In essence, everybody who uses RocketML gets a supercomputer at their disposal!
RocketML supports tabular, text, video, and image data types in their raw formats, eliminating cost and unnecessary steps. It supports data pre-processing functionality that runs on thousands of cores. A few examples:
1) Object detection on images and videos using pre-built deep learning models
2) Audio-to-text using CMU Sphinx
3) Google APIs, text processing for multiple languages
RocketML provides distributed machine learning algorithms that scale efficiently across nodes. Our current suite of algorithms includes:
1) Linear Regression, Ridge Regression, Lasso, Logistic Regression
2) Support Vector Machine
3) Singular Value Decomposition
4) Principal Component Analysis
5) Non-negative Matrix Factorization
6) K-means clustering
7) Decision Trees, Gradient Boosted Trees
8) Neural Nets
RocketML also allows customers to bring their own models and favorite frameworks, such as MXNet, TensorFlow, scikit-learn, and PyTorch.
Project status: Published/In Market
Robotics, HPC, Artificial Intelligence
Intel Technologies
OpenVINO
AI DevCloud / Xeon
MKL
Intel Opt ML/DL Framework
Intel Python
Intel CPU
Overview / Usage
RocketML is a fast scale-out system for both training machine learning models and running pre-processing steps.
As data gets larger, machine learning steps get slower, making a data scientist's job tedious. A distributed system like RocketML shortens model training and processing tasks from days to minutes.
RocketML is built to stitch together a large number of powerful Xeon processors so that machine learning workloads scale efficiently.
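The project summary says the system is tuned toward the limits of Amdahl's law. As a rough illustration of what that limit means for scale-out, the sketch below computes the ideal speedup for a given parallel fraction and worker count. The function name `amdahl_speedup` and the 99% parallel fraction are illustrative assumptions, not RocketML measurements.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    workers: number of cores or nodes working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload in which 99% of the runtime parallelizes.
    for cores in (1, 16, 256, 4096):
        print(cores, amdahl_speedup(0.99, cores))
```

With a 99% parallel fraction the speedup can never exceed 100x no matter how many cores are added, which is why shrinking the serial part of every component matters for a system meant to run on thousands of cores.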
Methodology / Approach
RocketML is essentially a C++ library for distributed machine learning that covers the following:
- Supervised algorithms: Generalized Linear Models, Logistic Regression, Linear and Nonlinear SVM, Random Forests, and Gradient Boosted Trees.
- Unsupervised learning algorithms: Singular Value Decomposition, Principal Component Analysis (PCA), Sparse PCA, and Clustering.
We have built a wide collection of sequential and distributed-memory operations, including support for
- dense and sparse-direct linear algebra,
- unconstrained, bound-constrained, and least-squares optimization problems,
- scalable linear equation solvers and eigenvalue solvers.
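To make the role of an eigenvalue solver concrete: the leading principal component in PCA is the dominant eigenvector of the data's covariance matrix. The sketch below uses power iteration, one of the simplest eigenvalue methods, on a small symmetric matrix. It is a sequential, pure-Python illustration under those assumptions; the source does not say which eigenvalue solvers RocketML implements, and the helper names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def power_iteration(matrix, max_iterations=500, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    estimate = 0.0
    for _ in range(max_iterations):
        w = matvec(matrix, v)
        new_estimate = dot(v, w)  # Rayleigh quotient, since v has unit length
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            return 0.0, v  # v lies in the null space of the matrix
        v = [x / norm for x in w]
        converged = abs(new_estimate - estimate) < tol
        estimate = new_estimate
        if converged:
            break
    return estimate, v


if __name__ == "__main__":
    covariance = [[2.0, 0.0], [0.0, 1.0]]  # toy symmetric matrix
    value, vector = power_iteration(covariance)
    print(value, vector)
```

In a distributed-memory setting the matrix-vector product is the step that would be split across nodes, with each node holding a block of rows.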
It also includes optimization methods:
- gradient descent method,
- stochastic gradient descent methods,
- coordinate-descent method,
- alternating direction method of multipliers (ADMM),
- quasi-Newton methods like L-BFGS,
- trust region methods,
- line search methods.
These methods are widely used to fit generalized linear models, logistic regression, and SVMs.
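As an illustration of how one of these methods applies to logistic regression in a data-parallel setting, the sketch below runs plain gradient descent where each data shard computes a partial gradient and the partial gradients are summed before the weight update. The shards are simulated as Python lists in one process; in a real distributed-memory system the sum would be a collective reduction across nodes. This is not RocketML's API: every function name and parameter here is hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shard_gradient(weights, rows, labels):
    """Unnormalized log-loss gradient over one shard of the data."""
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xi in enumerate(x):
            grad[j] += error * xi
    return grad


def fit_logistic(shards, n_features, learning_rate=0.5, epochs=500):
    """Full-batch gradient descent with per-shard partial gradients."""
    weights = [0.0] * n_features
    n_total = sum(len(rows) for rows, _ in shards)
    for _ in range(epochs):
        # Each shard works independently (one node per shard in practice).
        partials = [shard_gradient(weights, rows, labels) for rows, labels in shards]
        # Stand-in for a reduction across nodes: sum the partial gradients.
        total = [sum(p[j] for p in partials) for j in range(n_features)]
        weights = [w - learning_rate * g / n_total for w, g in zip(weights, total)]
    return weights


def predict(weights, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x))) >= 0.5 else 0


if __name__ == "__main__":
    # Toy data: first feature is a constant bias term, second is the input.
    shards = [
        ([[1.0, -2.0], [1.0, -1.0]], [0, 0]),
        ([[1.0, 1.0], [1.0, 2.0]], [1, 1]),
    ]
    model = fit_logistic(shards, n_features=2)
    print(predict(model, [1.0, 1.5]), predict(model, [1.0, -1.5]))
```

Because the gradient is a sum over examples, splitting the data into shards changes only where the arithmetic happens, not the result, apart from floating-point rounding.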
Technologies Used
Intel Xeon Processors
Intel MKL
Intel AI DevCloud
OpenVINO (new)
Intel Python Distribution
Repository
https://aws.amazon.com/marketplace/pp/B07C49CVC1?