Language modelling
Prajjwal Bhargava
Universal Language Modelling (ULMFiT): Transfer learning was largely limited to computer vision until recently, but recent research shows that its impact extends much further, including to natural language processing (NLP) and reinforcement learning (RL). Several recent papers demonstrate that transfer learning and fine-tuning also work well in NLP and deliver strong results. Earlier work focused on incremental learning in computer vision to bring generalization into models, one of the most important ingredients in making learning in neural networks robust. One paper that builds on this is Universal Language Model Fine-tuning for Text Classification (ULMFiT). The project also implements other state-of-the-art language models using different architectures (RNNs/LSTMs/GRUs).
Project status: Published/In Market
Groups
Student Developers for AI
Intel Technologies
AI DevCloud / Xeon, Intel Opt ML/DL Framework
Overview / Usage
This project aims to build universal language models and extends ULMFiT, which has shown substantial accuracy improvements by applying transfer learning to language models in NLP. The goal is to make language models easier to use and more accessible, similar to how transfer learning is used in computer vision. The project also contains different language modelling architectures using RNNs/GRUs/LSTMs.
The project made use of Jupyter* notebook on the Intel AI DevCloud (using Intel Xeon Scalable processors) to write the code and for visualization purposes. Information from the Intel® AI Academy forum was also used for optimization purposes with Intel Xeon processors. The code used can be found in this GitHub* repository or in this Fast.ai original implementation by Jeremy Howard. Some adjustments for optimization on this architecture can be found here.
Methodology / Approach
The project uses an AWD-LSTM pre-trained on WikiText-103 as the base model and applies novel techniques such as the following (a minimal sketch follows the list):
- Classifier fine-tuning for task specific weights
- Discriminative fine-tuning
- Concat pooling
- Training the classifier (gradual unfreezing)
- Backpropagation Through Time (BPTT) for text classification
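Below is a minimal PyTorch sketch, not the repository's exact code, of how three of these ideas fit together: concat pooling in the classifier head, discriminative fine-tuning via per-layer-group learning rates, and gradual unfreezing. The layer grouping, hidden sizes, and the 2.6 learning-rate divisor are illustrative assumptions based on the ULMFiT recipe.

```python
# Illustrative sketch (hypothetical names and hyperparameters), not the repo's code.
import torch
import torch.nn as nn


class ConcatPoolingClassifier(nn.Module):
    """Classifier head that concatenates the last hidden state with
    mean- and max-pooled hidden states over time (concat pooling)."""

    def __init__(self, hidden_dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(3 * hidden_dim, n_classes)

    def forward(self, hiddens):          # hiddens: (batch, seq_len, hidden_dim)
        last = hiddens[:, -1]            # final time step
        mean = hiddens.mean(dim=1)       # average over time
        mx = hiddens.max(dim=1).values   # max over time
        return self.fc(torch.cat([last, mean, mx], dim=1))


# Suppose the pre-trained encoder and the new head form two layer groups.
# Discriminative fine-tuning gives each group its own learning rate,
# lower for earlier (more general) layers.
encoder = nn.LSTM(input_size=400, hidden_size=1150, num_layers=3, batch_first=True)
head = ConcatPoolingClassifier(hidden_dim=1150, n_classes=2)

groups = [encoder, head]                 # illustrative split into layer groups
base_lr = 1e-3
optimizer = torch.optim.Adam(
    [{"params": g.parameters(), "lr": base_lr / (2.6 ** (len(groups) - 1 - i))}
     for i, g in enumerate(groups)]
)

# Gradual unfreezing: start with only the last group trainable, then unfreeze
# one more group at the start of each successive epoch.
for g in groups[:-1]:
    for p in g.parameters():
        p.requires_grad_(False)

for epoch, g in enumerate(reversed(groups)):
    for p in g.parameters():
        p.requires_grad_(True)
    # ... run one epoch of fine-tuning on the target task here ...
```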
This project includes implementations of research papers to demonstrate better and more efficient ways to perform language modelling. More details can be found in the GitHub repository.
Problems solved by ULMFiT
This method can be called universal because it is not dataset-specific: it works across documents and datasets of various lengths. It uses a single architecture (here AWD-LSTM, much like ResNets in computer vision). No custom feature engineering is needed to make it compatible with other tasks, and it does not require additional in-domain documents to work across particular domains.
The model can be further improved by adding attention and skip connections where appropriate, for example as sketched below.
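As an illustration of the suggested skip connections, a hypothetical residual wrapper around an LSTM layer could look like the following; this module is an assumption for illustration, not part of the current repository.

```python
# Hypothetical residual (skip) connection around a recurrent layer.
import torch.nn as nn


class ResidualLSTM(nn.Module):
    """LSTM layer whose output is added back to its input,
    so gradients can also flow around the recurrent transformation."""

    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, x):                # x: (batch, seq_len, dim)
        out, _ = self.rnn(x)
        return x + out                   # skip connection
```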
Technologies Used
PyTorch, fastai, Intel AI DevCloud, TensorFlow
Repository
https://github.com/prajjwal1/language-modelling