Knowledge Distillation
Ujjwal Upadhyay
Keras implementation of Hinton's knowledge distillation (KD), a way of transferring knowledge from a large model into a smaller model.
Project status: Published/In Market
Groups
- Student Developers for AI
- Artificial Intelligence India
- DeepLearning
Intel Technologies
- AI DevCloud / Xeon
- Intel Python
- Intel Opt ML/DL Framework
Overview / Usage
Knowledge distillation is a simple way to improve the performance of deep learning models on mobile devices. In this process, we first train a large and complex network (or an ensemble of models) that can extract important features from the data and therefore produce better predictions. We then train a small network with the help of this cumbersome model. The small network produces comparable results and, in some cases, can even be made to replicate the results of the cumbersome network.
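A rough sketch of the first stage of this pipeline is shown below: the teacher is trained normally and then used to generate softened predictions for the student. The function name, temperature value, and the assumption that the teacher ends in a logits layer (no softmax) are illustrative, not taken from the repository.

```python
import tensorflow as tf

# Illustrative sketch: after training a large teacher model, produce
# temperature-softened "soft targets" that the small student will learn from.
# `teacher` is assumed to output raw class logits; the temperature is a placeholder.
def get_soft_targets(teacher, x, temperature=5.0):
    logits = teacher.predict(x)                         # raw logits, shape (N, num_classes)
    return tf.nn.softmax(logits / temperature).numpy()  # softened class probabilities
```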
Methodology / Approach
You can ‘distill’ the large and complex network into another, much smaller network, and the smaller network does a reasonable job of approximating the original function learned by the deep network. However, there is a catch: the distilled model (the student) is trained to mimic the output of the larger network (the teacher) rather than being trained on the raw labels directly. This works because the deeper network has already learned hierarchical abstractions of the features. The generalization ability of the cumbersome model is transferred to the small model by using the class probabilities produced by the cumbersome model as “soft targets” for training the small model. For this transfer stage, we can use the same training set that was used to train the cumbersome model, or a separate “transfer” set. When the cumbersome model is a large ensemble of simpler models, we can use the arithmetic or geometric mean of their individual predictive distributions as the soft targets.
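A minimal Keras-style sketch of such a distillation loss is given below. It follows Hinton's formulation, where the student's logits are softened with the same temperature T used for the teacher, and the soft-target term is weighted against the ordinary cross-entropy on the hard labels. The weighting `alpha`, the temperature `T`, and the function names here are assumptions for illustration, not the exact implementation in the repository.

```python
import tensorflow as tf

def distillation_loss(y_true, student_logits, teacher_probs, T=5.0, alpha=0.1):
    """Weighted sum of hard-label cross-entropy and a soft-target term.

    y_true: one-hot ground-truth labels.
    teacher_probs: the teacher's temperature-softened class probabilities
    (the "soft targets"), e.g. produced with the sketch above.
    """
    # Hard-target term: standard cross-entropy against the true labels.
    hard = tf.keras.losses.categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    # Soft-target term: divergence between the teacher's softened distribution
    # and the student's distribution softened with the same temperature.
    soft = tf.keras.losses.kl_divergence(
        teacher_probs, tf.nn.softmax(student_logits / T))
    # The T**2 factor keeps the soft-target gradients on a comparable scale
    # as the temperature changes, as suggested by Hinton et al.
    return alpha * hard + (1.0 - alpha) * (T ** 2) * soft
```

In practice, the student is trained with this combined loss (via a custom training loop or a loss wrapper), so each batch sees both the ground-truth labels and the teacher's soft targets.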
Technologies Used
- Tensorflow
- Keras
Repository
https://github.com/Ujjwal-9/Knowledge-Distillation