OCR Reader from the form
Divyansh Jha
Processing handwritten customer forms manually requires considerable human resources and is time consuming, so there is a need to extract information from handwritten forms automatically. Convolutional Neural Networks are the de facto state of the art for image classification, localization and detection. Character recognition with baseline architectures is restricted to recognizing a single character in a well-preprocessed image centered on that character, with training carried out on popular datasets such as the MNIST digits dataset or the Chars74K dataset. We explore a new approach to Optical Character Recognition on handwritten forms that employs a modified Region-based Convolutional Neural Network (R-CNN) architecture. We wrap the core model in an end-to-end pipeline and present a product designed to process handwritten forms of any kind and populate a database on a remote server with all the relevant information.
Project status: Concept
Intel Technologies
AI DevCloud / Xeon
Overview / Usage
The first step is to photograph the remittance form that the customer has already
filled in. This can be done with a mobile phone camera, and the photograph can be
stored in any suitable image format such as JPG or PNG. The image file is then uploaded
to the cloud server that hosts our trained model, where it serves as the input to the
model. The model uses Fast R-CNN together with a Region Proposal Network (RPN), which
scans the image with a sliding window, to extract the useful information from the form,
such as the customer's name, the amount paid, the card number and other details. All of
this information is extracted and stored in text format.
This information is then uploaded to and stored in our database, which is synced
with a Django server. The database is updated dynamically, and a notification system
is also included so that no manual labour is needed in the process. Removing manual
labour through automation increases efficiency, reliability and robustness.
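As a rough illustration of the storage step, the sketch below writes one extracted record into a database table. It uses Python's built-in sqlite3 module as a stand-in for the database behind the Django server; the table name remittance, the column names and the sample values are hypothetical and chosen only to mirror the form fields mentioned above.

```python
import sqlite3

# Hypothetical schema: the column names mirror the form fields mentioned in
# the text and are assumptions, not the project's actual database layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS remittance (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT NOT NULL,
    amount_paid TEXT NOT NULL,
    card_number TEXT NOT NULL
)
"""


def store_record(conn, record):
    """Insert one extracted form record and return its row id."""
    conn.execute(SCHEMA)
    cursor = conn.execute(
        "INSERT INTO remittance (customer_name, amount_paid, card_number) "
        "VALUES (?, ?, ?)",
        (record["customer_name"], record["amount_paid"], record["card_number"]),
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
    row_id = store_record(
        conn,
        {
            "customer_name": "A. Customer",
            "amount_paid": "1500",
            "card_number": "0000111122223333",
        },
    )
    print(row_id, conn.execute("SELECT customer_name FROM remittance").fetchall())
```

The values are passed as query parameters rather than formatted into the SQL string, which matters here because the text comes from recognition output and cannot be trusted to be well formed.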
Methodology / Approach
Fast R-CNN takes the complete image as input, along with a set of object
proposals. Three pre-trained ImageNet networks are used, each with five max pooling
layers and between 5 and 13 convolutional layers. To initialize Fast R-CNN from one of
these networks, an RoI pooling layer first replaces the last max pooling layer, and
then two sibling layers replace the last fully connected layer and softmax (which were
trained for 1000-way ImageNet classification). Backpropagation is used to update the
Fast R-CNN weights. Along with hierarchical sampling, Fast R-CNN uses a streamlined
training process with a single fine-tuning stage that jointly optimizes the softmax
classifier and the bounding-box regressors. The components involved include the
multi-task loss, backpropagation through RoI pooling layers, mini-batch sampling and
the SGD hyper-parameters.
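The multi-task loss mentioned above can be sketched for a single RoI as follows. This is a minimal pure-Python illustration of the loss as defined for Fast R-CNN (a log loss on the true class plus a smooth L1 loss on the four box offsets, with the box term switched off for background RoIs), not the project's training code; the function and argument names are our own.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty: quadratic for |x| < 1, linear otherwise."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multi_task_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Per-RoI loss: classification log loss plus box regression loss.

    class_probs: softmax output over K + 1 classes (index 0 is background).
    true_class:  ground-truth class index.
    pred_box:    predicted offsets (tx, ty, tw, th) for the true class.
    target_box:  regression targets (vx, vy, vw, vh).
    lam:         weight balancing the two terms.
    """
    cls_loss = -math.log(class_probs[true_class])
    # Background RoIs (class 0) have no ground-truth box, so no box loss.
    if true_class == 0:
        return cls_loss
    loc_loss = sum(smooth_l1(t - v) for t, v in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.2]  # background plus two character classes
    print(multi_task_loss(probs, 1, (0.1, 0.0, 0.2, 0.0), (0.0, 0.0, 0.0, 0.0)))
```

Because both terms are computed from the same forward pass, one backward pass updates the classifier and the box regressor together, which is what allows training in a single fine-tuning stage.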
We do not train the network from scratch; instead we adopt a VGG network
already trained on the Pascal VOC dataset. This network is then combined with the RPN
to create a Faster R-CNN. The combined architecture is trained for 50,000 iterations
on the IAM Handwriting Dataset. For a word image, the R-CNN outputs a set of region
proposals, each specified by 4 bounding box coordinates, a class label (one of the
alphanumerics 0-9, a-z, A-Z) and a confidence, which corresponds to the probability of
the top class label. We then wrap the complete model in a larger pipeline, resulting
in a product suitable for industrial settings.
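One way to turn the per-character outputs described above into a word is sketched below. The source does not say how detections are assembled into text, so the confidence threshold, the greedy non-maximum suppression and the left-to-right ordering are all assumptions made for illustration, as are the function names and default values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def decode_word(detections, min_conf=0.5, iou_thresh=0.5):
    """Turn character detections into a string.

    detections: list of (box, label, confidence), box = (x1, y1, x2, y2).
    min_conf and iou_thresh are illustrative defaults, not tuned values.
    """
    # Keep confident detections, strongest first.
    candidates = sorted(
        (d for d in detections if d[2] >= min_conf),
        key=lambda d: d[2],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Greedy non-maximum suppression: drop boxes overlapping a stronger one.
        if all(iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    # Read characters left to right by the left edge of each box.
    kept.sort(key=lambda d: d[0][0])
    return "".join(label for _, label, _ in kept)


if __name__ == "__main__":
    sample = [
        ((10, 0, 20, 10), "b", 0.9),
        ((0, 0, 10, 10), "a", 0.8),
        ((1, 0, 11, 10), "o", 0.6),   # overlaps "a" heavily, so it is suppressed
        ((20, 0, 30, 10), "c", 0.3),  # below the confidence threshold
    ]
    print(decode_word(sample))
```

Sorting by the left edge assumes a single line of text per word image; a multi-line form field would need the boxes grouped into lines first.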
I will train it with the Adam optimizer (alpha 0.9, beta 0.995) and Nesterov momentum of 0.95 on Intel AI DevCloud.
Once training is complete, I will run inference on the Intel Neural Compute Stick.