Dynamic Lexicon Generation for Natural Scene Images

Vishal Bidawatka

Hyderabad, Telangana

In this project we propose a method that generates contextualized lexicons for scene images using only visual information. To do so, we exploit the correlation between visual and textual information in a dataset of images and their associated textual content (topic modelling + CNN).

Project status: Under Development

Artificial Intelligence

Overview / Usage

Many scene text understanding methods approach the end-to-end recognition problem from a word-spotting perspective and benefit greatly from small per-image lexicons. Such customized lexicons are normally assumed to be given, and their source is rarely discussed. In this project we propose a method that generates contextualized lexicons for scene images using only visual information. To do so, we exploit the correlation between visual and textual information in a dataset of images and their associated textual content. Using the topic modeling framework to discover a set of latent topics in such a dataset allows us to re-rank a fixed dictionary so that it prioritizes the words most likely to appear in a given image. Moreover, we train a CNN that reproduces those word rankings using only the raw image pixels as input. We demonstrate that the quality of the automatically obtained custom lexicons is superior to that of a generic frequency-based baseline.
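To make the comparison point concrete, here is a minimal sketch of what a generic frequency-based baseline could look like. It assumes the baseline simply ranks the dictionary by how often each word occurs in the text corpus and keeps the top entries; the function name `frequency_lexicon`, the toy corpus, and the dictionary are hypothetical and not taken from the project repository.

```python
from collections import Counter

def frequency_lexicon(corpus_texts, dictionary, size):
    """Rank dictionary words by corpus frequency and keep the top `size`."""
    counts = Counter(
        word
        for text in corpus_texts
        for word in text.lower().split()
    )
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(dictionary, key=lambda w: (-counts[w], w))
    return ranked[:size]

# Toy corpus standing in for the text associated with the images (assumption).
corpus = [
    "coffee shop open daily",
    "fresh coffee and bakery",
    "bus stop ahead",
]
dictionary = ["coffee", "bus", "stop", "bakery", "hotel"]

baseline = frequency_lexicon(corpus, dictionary, size=3)
print(baseline)
```

Such a baseline returns the same lexicon for every image, which is the limitation that a contextualized, per-image lexicon is meant to address.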

Methodology / Approach

The underlying idea of our lexicon generation method is that the topic modeling statistical framework can be used to predict a ranking of the most probable words that may appear in a given image. For this we propose a three-step method. First, we learn an LDA topic model on a text corpus associated with the image dataset. Second, we train a deep CNN model to generate the LDA topic probabilities directly from the image pixels. Third, we use the generated topic probabilities, either from the LDA model (using textual information) or from the CNN (using image pixels), together with the word probabilities from the learned LDA model to re-rank the words of a given dictionary.
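The third step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes each word is scored by marginalizing over topics, that is, the sum over topics k of P(word | topic k) * P(topic k | image). The function name `rerank_dictionary` and all probability values are hypothetical; in practice the word probabilities would come from the trained LDA model and the topic probabilities from either the LDA model or the CNN.

```python
def rerank_dictionary(topic_probs, word_given_topic, dictionary, size):
    """Score each word as sum_k P(word | topic k) * P(topic k | image)."""
    scores = {}
    for word in dictionary:
        scores[word] = sum(
            p_topic * word_given_topic[k].get(word, 0.0)
            for k, p_topic in enumerate(topic_probs)
        )
    # Highest score first; break ties alphabetically for determinism.
    ranked = sorted(dictionary, key=lambda w: (-scores[w], w))
    return ranked[:size], scores

# Hypothetical per-topic word distributions (stand-ins for the learned LDA model).
word_given_topic = [
    {"coffee": 0.5, "bakery": 0.3, "bus": 0.1, "stop": 0.1},  # topic 0: cafes
    {"coffee": 0.1, "bakery": 0.1, "bus": 0.4, "stop": 0.4},  # topic 1: streets
]
dictionary = ["coffee", "bakery", "bus", "stop"]

# Hypothetical topic probabilities for two images (from LDA on text or from the CNN).
street_image_topics = [0.25, 0.75]
cafe_image_topics = [0.9, 0.1]

street_lexicon, street_scores = rerank_dictionary(
    street_image_topics, word_given_topic, dictionary, size=2
)
cafe_lexicon, _ = rerank_dictionary(
    cafe_image_topics, word_given_topic, dictionary, size=2
)
print(street_lexicon)
print(cafe_lexicon)
```

The same dictionary yields a different top-ranked lexicon for each image, depending only on the topic probabilities predicted for that image.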

Technologies Used

  1. gensim library
  2. keras
  3. Topic modelling
  4. CNN

Repository

https://github.com/vishalbidawatka/Dynamic-Lexicon-Generation-for-Natural-Scene-Images
