Hybrid training methodologies for distributed mesh computing
Yash Akhauri
Introducing a hybrid training method for decentralizing the neural network training process. This will let us test the feasibility of distributing training across idle CPUs/GPUs in workspaces, and it can be extended to IoT use cases such as decentralized smart-device processing and data-gathering tasks.
Project status: Under Development
Networking, Artificial Intelligence
Intel Technologies
Intel Opt ML/DL Framework
Overview / Usage
A typical workspace has many CPU and GPU resources that sit idle, and these can be harnessed for distributed computing. This project aims to develop and test effective training techniques for such workstations. It builds on research into neural network energy landscapes, which argues that loss minima are not isolated points in parameter space but essentially form a connected manifold; this suggests that networks have enough capacity to tolerate structural changes. I plan to introduce a computationally efficient structural change, weight binarization, for distributed training. I have previously developed inference techniques for Intel Xeon processors that speed up feed-forward GEMM operations by roughly 30x. The XNOR network I worked on earlier also requires about 32x less memory and is well suited to CPUs. In this project, I shall exploit these properties of XNOR-Nets, together with the stability of networks under structural changes, to explore effective decentralized training methodologies.
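To illustrate where those numbers come from, here is a minimal sketch (not the project's actual kernels; all names are illustrative) of binary weight packing and an XNOR/popcount dot product. Storing one sign bit per weight instead of a 32-bit float is what yields the ~32x memory reduction, and replacing multiply-accumulate with XNOR + popcount is what drives the GEMM speedup. `__builtin_popcount` assumes a GCC/Clang toolchain.

```cpp
#include <cstdint>

// Pack 32 real-valued weights into one 32-bit word: bit i is 1 if w[i] >= 0.
// One bit per weight instead of a 32-bit float -> ~32x smaller weight storage.
uint32_t pack_signs(const float* w) {
    uint32_t bits = 0;
    for (int i = 0; i < 32; ++i)
        if (w[i] >= 0.0f) bits |= (1u << i);
    return bits;
}

// Dot product of two 32-element {-1,+1} vectors stored as sign bits.
// XNOR marks positions where the signs agree; popcount counts them, so
// dot = (#agreements) - (#disagreements) = 2 * popcount(~(a ^ b)) - 32.
int xnor_dot32(uint32_t a, uint32_t b) {
    return 2 * __builtin_popcount(~(a ^ b)) - 32;
}
```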
Methodology / Approach
- Implementing xGEMM with OpenMP, with performance benchmarks and accuracy analysis (see the sketch after this list).
- Implementing xCONV with OpenMP.
- Epoch-based analysis of the proposed training methodology and its feasibility for distributed computing.
- Testing a two-phase training technique: first train the network as a BNN in a distributed fashion, then follow up with full-precision training of the network.
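As a rough sketch of what the OpenMP xGEMM kernel might look like (the data layout, function name, and lack of tiling are assumptions, not the project's actual implementation), an XNOR GEMM over bit-packed operands could be structured as follows, assuming K is a multiple of 32 and compiling with -fopenmp:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical xGEMM: C = A * B for binarized A (M x K) and B (K x N), with
// each length-K {-1,+1} vector bit-packed into K/32 uint32_t words
// (A packed row-wise, B packed column-wise so both inner loops stream words).
// The multiply-accumulate of a float GEMM becomes XNOR + popcount.
void xgemm(const std::vector<uint32_t>& A,   // M * (K/32) packed rows
           const std::vector<uint32_t>& B,   // N * (K/32) packed columns
           std::vector<int>& C,              // M * N results, row-major
           int M, int N, int K) {
    const int words = K / 32;                // assumes K is a multiple of 32
    #pragma omp parallel for collapse(2)     // parallelize over output entries
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int agree = 0;                   // count of matching sign bits
            for (int w = 0; w < words; ++w)
                agree += __builtin_popcount(~(A[i * words + w] ^ B[j * words + w]));
            C[i * N + j] = 2 * agree - K;    // dot product over {-1,+1}
        }
    }
}
```

Timing this kernel against a full-precision GEMM baseline on the same matrix sizes, and comparing network accuracy with and without binarization, would cover the benchmarking and accuracy analysis named in the first bullet.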
Technologies Used
Intel Xeon Phi, Intel Xeon Gold, Intel OpenMP, TensorFlow, Theano.