RocketML Distributed Machine Learning

Vinay Rao

Beaverton, Oregon

RocketML is a fast scale-out system for both training machine learning models and running pre-processing steps. As data gets larger, machine learning steps get slower, which makes a data scientist's job tedious. A distributed system like RocketML shortens model training and processing tasks from days to minutes. RocketML is built to stitch together a large number of powerful Xeon processors and scale efficiently. Every component of the software is tuned so that the system is pushed toward the limits set by Amdahl's law. In essence, everybody who uses RocketML gets a supercomputer at their disposal!

RocketML supports tabular, text, video, and image data in their raw formats, which eliminates cost and unnecessary steps. Its data pre-processing functionality runs on thousands of cores. A few examples:

  1) Object detection on images and videos using pre-built deep learning models
  2) Audio-to-text using CMU Sphinx
  3) Google APIs, text processing for multiple languages

RocketML provides distributed machine learning algorithms that scale efficiently across nodes. The current suite of algorithms includes:

  1) Linear Regression, Ridge Regression, Lasso, Logistic Regression
  2) Support Vector Machine
  3) Singular Value Decomposition
  4) Principal Component Analysis
  5) Non-negative Matrix Factorization
  6) K-means clustering
  7) Decision Trees, Gradient Boosted Trees
  8) Neural Nets

RocketML also lets customers bring their own models and their favorite frameworks, such as MXNet, TensorFlow, scikit-learn, and PyTorch.

Project status: Published/In Market

Robotics, HPC, Artificial Intelligence

Intel Technologies
OpenVINO, AI DevCloud / Xeon, MKL, Intel Opt ML/DL Framework, Intel Python, Intel CPU

Overview / Usage

RocketML is a fast scale-out system for both training machine learning models and running pre-processing steps.
As data gets larger, machine learning steps get slower, which makes a data scientist's job tedious. A distributed system like RocketML shortens model training and processing tasks from days to minutes.

RocketML is built to stitch together a large number of powerful Xeon processors to scale efficiently for machine learning.
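Amdahl's law, which the project summary cites as the limit RocketML is tuned toward, bounds how much a fixed workload can speed up as processors are added. The sketch below is a plain illustration of that formula, not RocketML code; the function names and the 99% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of worker count;
    # only the parallel part is divided among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the worker count grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Assumed workload: 99% of the runtime parallelizes perfectly.
    for workers in (1, 16, 256, 4096):
        print(workers, round(amdahl_speedup(0.99, workers), 2))
```

With 1% of the work left serial, the speedup cannot exceed 100x no matter how many cores are added, which is why shrinking the serial portion of every component matters for a scale-out system.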

Methodology / Approach

RocketML is essentially a C++ library for distributed machine learning. It covers the following:

  • Supervised algorithms: Generalized Linear Models, Logistic Regression, Linear and Nonlinear SVM, Random Forests, and Gradient Boosted Trees.
  • Unsupervised learning algorithms: Singular Value Decomposition, Principal Component Analysis (PCA), Sparse PCA, and Clustering.
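One common way to distribute algorithms like these across nodes is data parallelism: each worker computes a partial result on its own shard of the data, and the partial results are summed. The sketch below simulates that pattern in a single Python process for a least-squares gradient. It is a hypothetical illustration only and does not reflect RocketML's C++ implementation; all function names are invented for this example, and `all_reduce_sum` merely stands in for a real cross-node reduction.

```python
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, target)


def partial_gradient(shard: Sequence[Row], weights: Sequence[float]) -> List[float]:
    """Sum of squared-error gradients over one shard (one worker's data)."""
    grad = [0.0] * len(weights)
    for features, target in shard:
        residual = sum(w * x for w, x in zip(weights, features)) - target
        for j, x in enumerate(features):
            grad[j] += residual * x
    return grad


def split_into_shards(rows: Sequence[Row], num_workers: int) -> List[List[Row]]:
    """Round-robin partition of the rows, one shard per simulated worker."""
    return [list(rows[i::num_workers]) for i in range(num_workers)]


def all_reduce_sum(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum, standing in for an all-reduce across workers."""
    return [sum(column) for column in zip(*vectors)]


def distributed_gradient(
    rows: Sequence[Row], weights: Sequence[float], num_workers: int
) -> List[float]:
    """Mean gradient of 0.5 * (prediction - target)^2, computed shard by shard."""
    shards = split_into_shards(rows, num_workers)
    partials = [partial_gradient(shard, weights) for shard in shards]
    total = all_reduce_sum(partials)
    return [g / len(rows) for g in total]
```

Because the gradient is a sum over rows, the result is the same for any number of shards up to floating-point rounding; only the amount of work per worker changes.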

We have built a wide collection of sequential and distributed-memory operations, including support for

  • dense and sparse-direct linear algebra,
  • unconstrained, bound-constrained, and least-squares optimization problems,
  • scalable linear equation solvers and eigenvalue solvers.

It also includes optimization methods:

  • gradient descent method,
  • stochastic gradient descent methods,
  • coordinate-descent method,
  • alternating direction method of multipliers (ADMM),
  • quasi-Newton methods like L-BFGS,
  • trust region methods,
  • line search methods.

These methods are widely used for fitting generalized linear models, logistic regression, and SVMs.
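As an illustration of the simplest method in the list above, here is a minimal sketch of batch gradient descent applied to logistic regression. It is written in plain Python for readability and is not RocketML's API; the function names, learning rate, and iteration count are assumptions made for this example.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float = 0.1,
    iterations: int = 2000,
) -> List[float]:
    """Batch gradient descent on the mean logistic loss.

    Returns [bias, w_1, ..., w_d].
    """
    n_features = len(X[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
            error = sigmoid(z) - label  # derivative of the loss w.r.t. z
            grad[0] += error
            for j, x in enumerate(features, start=1):
                grad[j] += error * x
        # Step against the mean gradient over all rows.
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad)]
    return weights


def predict(weights: Sequence[float], features: Sequence[float]) -> int:
    """Class label (0 or 1) using a 0.5 probability threshold."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1 if sigmoid(z) >= 0.5 else 0
```

The other methods in the list differ mainly in how the step is chosen: stochastic gradient descent uses a subset of rows per step, while quasi-Newton and trust region methods use curvature information to pick the step direction and length.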

Technologies Used

Intel Xeon Processors
Intel MKL
Intel AI DevCloud
OpenVINO (new)
Intel Python Distribution

Repository

https://aws.amazon.com/marketplace/pp/B07C49CVC1?
