DEEP NATURAL LANGUAGE PROCESSING IMPLEMENTATION USING TENSORFLOW
Rajat Sharma
Bengaluru, Karnataka
Project status: Under Development
Intel Technologies
AI DevCloud / Xeon
Intel Opt ML/DL Framework
Overview / Usage
Machine comprehension is a very interesting but extremely challenging task in both natural language processing and artificial intelligence research. There are several approaches to NLP tasks in general; with recent breakthroughs in algorithms (deep learning), hardware (GPUs), and user-friendly APIs (TensorFlow), some tasks have become feasible up to a certain accuracy. This project report contains TensorFlow implementations of various deep learning models, with a focus on problems in Natural Language Processing.
The following models were implemented and trained in this project, carried out at NIT Srinagar using Intel DevCloud and the Intel-optimized TensorFlow framework:
- Mnist_cnn: A three-layer Convolutional Neural Network for the MNIST Handwritten Digit Classification task.
- Langmod_nn: Builds a three-layer Forward Bigram Model neural network consisting of an Embedding Layer, a Hidden Layer, and a final Softmax layer, where the goal is as follows: given a word in a corpus, attempt to predict the next word (a minimal sketch appears at the end of this section).
- Memn2n_nn: A neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network, but unlike that model it is trained end-to-end.
- Variational_autoencoder: A Variational Autoencoder for the MNIST Handwritten Digits dataset; its goal is to simulate a generative model.
The broader aim of this project is to help machines understand the meaning of sentences, which improves the efficiency of machine translation and lets users interact with computing systems to obtain useful information from them.
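As an illustration of the Langmod_nn architecture listed above, here is a minimal sketch of a forward bigram model in TensorFlow 1.x. The vocabulary size, embedding dimension, and hidden-layer width are assumed values for the example; the report does not specify the actual ones.

    import tensorflow as tf

    VOCAB_SIZE = 5000   # assumed vocabulary size (not given in the report)
    EMBED_DIM = 50      # assumed embedding dimensionality
    HIDDEN_DIM = 100    # assumed hidden-layer width

    # Inputs: the current word's ID and the next word's ID (the target).
    word_ids = tf.placeholder(tf.int32, shape=[None], name="word_ids")
    next_ids = tf.placeholder(tf.int32, shape=[None], name="next_ids")

    # Embedding layer: look up a dense vector for each input word.
    embeddings = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_DIM], -1.0, 1.0))
    embedded = tf.nn.embedding_lookup(embeddings, word_ids)

    # Hidden layer with a ReLU nonlinearity.
    w_h = tf.Variable(tf.truncated_normal([EMBED_DIM, HIDDEN_DIM], stddev=0.1))
    b_h = tf.Variable(tf.zeros([HIDDEN_DIM]))
    hidden = tf.nn.relu(tf.matmul(embedded, w_h) + b_h)

    # Softmax (output) layer: one logit per word in the vocabulary.
    w_o = tf.Variable(tf.truncated_normal([HIDDEN_DIM, VOCAB_SIZE], stddev=0.1))
    b_o = tf.Variable(tf.zeros([VOCAB_SIZE]))
    logits = tf.matmul(hidden, w_o) + b_o

    # Cross-entropy loss against the true next word.
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=next_ids,
                                                       logits=logits))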
Methodology / Approach
This project uses deep learning (a neural-network approach), a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers of units between the input and output layers. The training algorithm is the backward propagation of errors (backpropagation), a common method of training artificial neural networks, used in conjunction with the stochastic gradient descent optimization method.

The overall focus is to apply deep learning methods to Natural Language Processing (NLP), a field of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, and in particular with programming computers to fruitfully process large amounts of natural language data.

The open-source software library TensorFlow is used for numerical computation using data flow graphs. TensorFlow provides an extensive suite of functions and classes that allow users to build various models from scratch. The programming language is Python 3.5.2, and training is done on Intel DevCloud. The objective of the project was TensorFlow implementations of various models, with a focus on problems in Natural Language Processing.

Stochastic gradient descent (often shortened to SGD), also known as incremental gradient descent, is used in the models. It is a stochastic approximation of the gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions; in other words, SGD searches for minima (or maxima) by iteration.
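For concreteness, here is a minimal, self-contained sketch of SGD in TensorFlow 1.x: a toy linear model is fitted by repeatedly stepping along the gradient computed on one small random mini-batch at a time. The model, data, learning rate, and batch size are illustrative assumptions, not values from the project.

    import numpy as np
    import tensorflow as tf

    # Toy objective: fit y = w*x + b by minimizing the mean squared error.
    x = tf.placeholder(tf.float32, shape=[None])
    y = tf.placeholder(tf.float32, shape=[None])
    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    loss = tf.reduce_mean(tf.square(w * x + b - y))

    # SGD: each step descends the gradient of the loss on one mini-batch.
    train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

    # Synthetic data: y = 3x + 2 plus a little noise.
    xs = np.random.rand(1000).astype(np.float32)
    ys = 3.0 * xs + 2.0 + 0.05 * np.random.randn(1000).astype(np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(200):
            idx = np.random.randint(0, 1000, size=32)   # random mini-batch
            sess.run(train_op, feed_dict={x: xs[idx], y: ys[idx]})
        print(sess.run([w, b]))   # should approach [3.0, 2.0]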
model.py:
Class definition for the model's neural network. TensorFlow at its core is a system for building symbolic computational graphs, and everything in TensorFlow is expressed either as a raw Tensor or as a Tensor operation. Because of this, building a model consists of building different graphs and operations to handle the inference of the model, to evaluate the loss/cost, and to perform training (via backpropagation). Accordingly, each class definition consists of the following three functions:
inference: This is the crux of any neural network. This function is responsible for building all the layers of the network, from the input, all the way to the final layer, just before the loss is calculated.
loss: Using the output from the inference function, this function evaluates the loss used for training the model. For example, the loss function might take in the logits from the softmax layer of a classification model (as in the Mnist_cnn model) and calculate the cross-entropy loss against the true labels of the input data.
train: The train function builds the training operation, given the cost calculated in the loss function. This function computes the gradients and sets up the optimizer (e.g. SGD, Adam, Adagrad). Any learning rate decay is also performed during this step. A minimal skeleton of this three-function structure is sketched below.
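The following skeleton follows the inference/loss/train split described above. The architecture and layer sizes are assumptions for illustration, not the project's actual Mnist_cnn definition.

    import tensorflow as tf

    class MnistCNN(object):
        """Illustrative skeleton only: the layer sizes and architecture
        are assumptions, not the project's actual Mnist_cnn definition."""

        def __init__(self, learning_rate=0.01):
            self.learning_rate = learning_rate

        def inference(self, images):
            # Build the network from the input to the final logits,
            # stopping just before the loss is calculated.
            x = tf.reshape(images, [-1, 28, 28, 1])
            conv = tf.layers.conv2d(x, filters=32, kernel_size=5,
                                    padding="same", activation=tf.nn.relu)
            pool = tf.layers.max_pooling2d(conv, pool_size=2, strides=2)
            flat = tf.reshape(pool, [-1, 14 * 14 * 32])
            hidden = tf.layers.dense(flat, 128, activation=tf.nn.relu)
            return tf.layers.dense(hidden, 10)   # one logit per digit class

        def loss(self, logits, labels):
            # Cross-entropy between the logits and the true labels.
            return tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=labels, logits=logits))

        def train(self, cost, global_step):
            # Compute gradients and apply an SGD update; any learning-rate
            # decay would also be set up in this step.
            optimizer = tf.train.GradientDescentOptimizer(self.learning_rate)
            return optimizer.minimize(cost, global_step=global_step)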
Technologies Used
Windows 10, 64 bit, x86
TensorFlow with CPU support, installed via "native" pip (a quick install check appears at the end of this section)
Python 3.5.2 (including libraries such as NumPy)
Intel DevCloud: a Xeon Phi powered HPC cluster (DevCloud) hosted on Intel servers. To use Intel DevCloud, the following software needs to be installed:
- SSH client PuTTY and installer package
- Windows Access Key
- File transfer client WinSCP
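As a quick sanity check that the local CPU-only TensorFlow installation above works, the standard "hello world" from the TensorFlow install guide can be run:

    import tensorflow as tf

    # Prints the installed version and runs a trivial graph on the CPU.
    print(tf.__version__)
    hello = tf.constant("Hello, TensorFlow!")
    with tf.Session() as sess:
        print(sess.run(hello))   # b'Hello, TensorFlow!'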