Visual AID using Image Captioning on Intel OneAPI
Vivek Muskan
Bengaluru, Karnataka
This visual assistant app helps blind users by generating image captions with Intel oneDNN and oneDAL and reading them aloud with the Google text-to-speech library. Built using Streamlit and the oneAPI toolkits, it extracts visual features from an image and turns them into a description, giving the user greater awareness of their surroundings.
Project status: Published/In Market
oneAPI, Artificial Intelligence, Cloud
Intel Technologies
oneAPI, AI DevCloud / Xeon, Intel Python
Overview / Usage
Image captioning is the process of generating a natural language description of an image. It is a task in the field of computer vision and natural language processing. The goal of image captioning is to generate a coherent and fluent sentence that accurately describes the image content.
Methodology / Approach
This image captioning model takes an image as input and generates a textual description of the contents of the image.
The model uses a convolutional neural network (CNN) architecture to analyze the visual aspects of the input image. The CNN encodes the image into a dense feature vector capturing information about the objects, scenes, and relationships depicted.
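The encoding step can be illustrated with a minimal sketch in plain Python. It is not the project's actual network: the two hand-written kernels, the 4x4 toy image, and the function names (`conv2d_valid`, `global_average_pool`, `encode_image`) are assumptions chosen for illustration. A real CNN learns many kernels across many layers, but the idea is the same: convolve, apply a nonlinearity, and pool the result into a fixed-length feature vector.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(acc, 0.0))  # ReLU keeps only positive responses
        out.append(row)
    return out


def global_average_pool(feature_map: Matrix) -> float:
    """Collapse one feature map into a single number."""
    total = sum(sum(row) for row in feature_map)
    return total / (len(feature_map) * len(feature_map[0]))


def encode_image(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """One pooled response per kernel gives a fixed-length feature vector."""
    return [global_average_pool(conv2d_valid(image, k)) for k in kernels]


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
toy_image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Hand-written kernels (assumed for illustration; a trained CNN learns these).
toy_kernels = [
    [[-1.0, 1.0], [-1.0, 1.0]],  # vertical edge: right brighter than left
    [[1.0, 1.0], [-1.0, -1.0]],  # horizontal edge: top brighter than bottom
]
feature_vector = encode_image(toy_image, toy_kernels)
print(feature_vector)
```

The first entry is positive because the image contains a vertical edge, while the second is zero because every row of the image is identical.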
This image vector is passed to a recurrent neural network (RNN), which generates the text caption one word at a time. The RNN uses the context vector from the CNN as it decodes the image features into a natural language sentence describing the image content.
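The word-by-word generation loop can be sketched as follows. This is a toy under stated assumptions, not the project's decoder: the six-word vocabulary, the `ToyDecoder` class, the hidden size, and the seeded random weights are all hypothetical, and a simple tanh recurrent cell stands in for whatever RNN variant the project uses. Because the weights are untrained, the words it produces are meaningless; only the control flow is the point. The image features initialize the state, and each predicted word is fed back in until an end token or a length limit is reached.

```python
import math
import random
from typing import List, Tuple

VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]  # hypothetical vocabulary
HIDDEN = 4  # hypothetical hidden-state size


def make_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyDecoder:
    """Simple tanh RNN with random (untrained) weights, for illustration only."""

    def __init__(self, feature_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_init = make_matrix(HIDDEN, feature_dim, rng)  # image features -> state
        self.w_embed = make_matrix(HIDDEN, len(VOCAB), rng)  # previous word -> state
        self.w_hidden = make_matrix(HIDDEN, HIDDEN, rng)     # previous state -> state
        self.w_out = make_matrix(len(VOCAB), HIDDEN, rng)    # state -> word scores

    def init_state(self, features: List[float]) -> List[float]:
        return [math.tanh(x) for x in matvec(self.w_init, features)]

    def step(self, word_id: int, state: List[float]) -> Tuple[List[float], List[float]]:
        one_hot = [1.0 if i == word_id else 0.0 for i in range(len(VOCAB))]
        pre = [a + b for a, b in zip(matvec(self.w_embed, one_hot),
                                     matvec(self.w_hidden, state))]
        new_state = [math.tanh(x) for x in pre]
        return matvec(self.w_out, new_state), new_state


def greedy_caption(decoder: ToyDecoder, features: List[float],
                   max_len: int = 10) -> List[str]:
    """Pick the highest-scoring word at each step and feed it back in."""
    state = decoder.init_state(features)
    word_id = VOCAB.index("<start>")
    words: List[str] = []
    for _ in range(max_len):
        logits, state = decoder.step(word_id, state)
        word_id = max(range(len(VOCAB)), key=lambda i: logits[i])
        if VOCAB[word_id] == "<end>":
            break
        words.append(VOCAB[word_id])
    return words


caption = greedy_caption(ToyDecoder(feature_dim=2), [0.67, 0.0])
print(" ".join(caption))
```

Greedy selection is the simplest decoding strategy; beam search is a common alternative that keeps several candidate captions alive at each step.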
The model is trained end-to-end on a dataset of images labeled with human-written captions. This allows the model to learn the correlations between image contents and textual descriptions.
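The source does not state which loss the project uses. A common choice for captioning models, assumed here, is the average cross-entropy between the decoder's predicted word distribution at each step and the corresponding word of the human-written caption. The sketch below computes that quantity in plain Python; the function names and the tiny four-word vocabulary are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log of the softmax probabilities."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def caption_loss(step_logits: List[List[float]], target_ids: List[int]) -> float:
    """Mean negative log-likelihood of the reference caption's words."""
    if len(step_logits) != len(target_ids):
        raise ValueError("need one row of logits per target word")
    total = 0.0
    for logits, target in zip(step_logits, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)


# A model that scores every word equally pays log(vocabulary size) per word.
uniform_loss = caption_loss([[0.0, 0.0, 0.0, 0.0]] * 3, [1, 2, 3])

# A model that strongly favors the correct word pays much less.
confident_loss = caption_loss(
    [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]], [1, 2]
)
print(uniform_loss, confident_loss)
```

Minimizing this loss over the whole dataset pushes both the CNN encoder and the RNN decoder toward features and word choices that match the reference captions, which is what training end-to-end refers to.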
After training, the model can generate new captions for images it hasn't seen before. The quality of the generated captions depends on the size and diversity of the training dataset.
This project demonstrates how CNN and RNN architectures can be combined to perform image-to-text translation. The model generates basic descriptions of image contents, though there is still room for improvement in caption quality and diversity.
Technologies Used
Intel oneDNN
Intel oneDAL
TensorFlow
Streamlit