Assistant for visually impaired people using Intel-powered technologies

Pranab Sarkar

Jalpaiguri, West Bengal

According to the World Health Organization, around 40 million people in the world are blind, while another 250 million have some form of visual impairment. The main objective of this project is to solve the daily-life problems faced by visually impaired people using AI and IoT technologies. Users give voice commands to the device, which assists them in the following ways: 1. Environment Activity. 2. Lane Detection. 3. Basic Information (date, time, location, calendar). 4. Drop a message. 5. Emotion Detection.

Project status: Under Development

Internet of Things, Artificial Intelligence, Graphics and Media

Intel Technologies
Intel Opt ML/DL Framework

Overview / Usage

The main objective of this project is to solve the daily-life problems faced by visually impaired people using AI and IoT technologies.

Here, users give voice commands to the device, which assists them in the following ways:

  1. Environment Activity.
  2. Lane Detection.
  3. Basic Information (date, time, location, calendar).
  4. Drop a message.
  5. Emotion Detection.

I have created a prototype version that detects various environmental activities; the working video is attached in the video links.

Dataset for Activity Detection: https://forms.illinois.edu/sec/1713398

In this prototype version, instead of using the camera, the user manually selects images for activity prediction.

The final device will be totally different: it will be an IoT device that takes input from the camera and microphone and processes them with complex deep learning models to assist the user. The architecture of the deep learning models, as well as the device, is attached in the image section.

Methodology / Approach

The entire Approach:

The user asks for assistance --> Microphone --> IoT device --> Camera --> IoT device --> Natural Language Processing and Image Processing --> Deep Learning Models --> Prediction --> Text to Speech --> Speaker.
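
A minimal sketch of this loop in Python follows. Here capture_frame() and predict_activity() are hypothetical placeholders standing in for the device camera and the trained model; neither is part of the current prototype:

```python
# Sketch of the assistance loop above. capture_frame() and
# predict_activity() are hypothetical placeholders for the device camera
# and the trained captioning model.
import os
from datetime import datetime

import speech_recognition as sr
from gtts import gTTS


def capture_frame():
    """Placeholder: grab a frame from the device camera (e.g. with OpenCV)."""
    raise NotImplementedError


def predict_activity(frame):
    """Placeholder: run the trained activity model on the frame."""
    raise NotImplementedError


def speak(text):
    """Text to speech: synthesize with gTTS and play through the speaker."""
    gTTS(text=text, lang="en").save("reply.mp3")
    os.system("mpg123 reply.mp3")  # any command-line audio player works


def listen():
    """Speech to text: capture a voice command from the microphone."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)


def assist():
    command = listen().lower()
    if "activity" in command or "around" in command:
        speak(predict_activity(capture_frame()))
    elif "time" in command:
        speak(datetime.now().strftime("The time is %I:%M %p"))
    else:
        speak("Sorry, I did not understand that.")
```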

The prototype version of activity detection was trained in Google Colab.

While training the activity-detection model, I went through the following steps:

  1. Data collection: Training Set: 6,000 images; Dev Set: 1,000 images; Test Set: 1,000 images.
  2. Understanding the data: One of the files is “Flickr8k.token.txt”, which contains the name of each image along with its captions.
  3. Data Cleaning: Cleaning the text data.
  4. Loading the training set: The text file “Flickr_8k.trainImages.txt” contains the names of the images that belong to the training set.
  5. Data Preprocessing — Images: Implemented transfer learning by using the InceptionV3 model (Convolutional Neural Network).
  6. Data Preprocessing — Captions.
  7. Data Preparation using Generator Function
  8. Word Embeddings: Mapped every word (index) to a 200-dimensional vector using a pre-trained GloVe model.
  9. Main model: Used an LSTM to process the sequence input (partial captions in our case). Compiled the model with categorical_crossentropy loss and the Adam optimizer; RMSProp and SGD with momentum are also worth trying. A sketch of the full model follows this list.
  10. Inference
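
A possible Keras implementation of the model from steps 5-9 is sketched below. The layer sizes are assumptions based on the description above: vocab_size and max_length are example values that depend on the cleaned captions, and embedding_matrix is the pre-trained GloVe matrix from step 8:

```python
# Sketch of the caption model (steps 5-9). Exact sizes are assumptions;
# vocab_size / max_length come from the cleaned captions, and
# embedding_matrix is the pre-trained GloVe matrix from step 8.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 1652   # example value: words kept after cleaning
max_length = 34     # example value: longest training caption

# Image branch: 2048-dim InceptionV3 bottleneck feature (step 5)
inputs1 = Input(shape=(2048,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation="relu")(fe1)

# Text branch: partial caption -> 200-dim GloVe embedding -> LSTM (steps 8-9)
inputs2 = Input(shape=(max_length,))
embedding = Embedding(vocab_size, 200, mask_zero=True)
se1 = Dropout(0.5)(embedding(inputs2))
se2 = LSTM(256)(se1)

# Decoder: merge both branches and predict the next word
decoder = Dense(256, activation="relu")(add([fe2, se2]))
outputs = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
# Load the GloVe weights into the embedding layer and freeze them:
# embedding.set_weights([embedding_matrix]); embedding.trainable = False
model.compile(loss="categorical_crossentropy", optimizer="adam")
```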

After the model is ready, it is wrapped in a conversational interface with the help of gTTS (text to speech) and speech_recognition (speech to text).
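
For step 10, one way to run inference is greedy decoding, sketched below. The names wordtoix/ixtoword (word-index lookup tables from preprocessing) and photo (a (1, 2048) InceptionV3 feature vector) are illustrative, not taken from the original code:

```python
# Greedy-search inference (step 10): repeatedly feed the partial caption
# back into the model until the end token appears. wordtoix / ixtoword are
# assumed lookup tables; photo is a (1, 2048) InceptionV3 feature vector.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences


def greedy_caption(model, photo, wordtoix, ixtoword, max_length):
    caption = "startseq"
    for _ in range(max_length):
        seq = [wordtoix[w] for w in caption.split() if w in wordtoix]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = model.predict([photo, seq], verbose=0)
        word = ixtoword[int(np.argmax(yhat))]
        caption += " " + word
        if word == "endseq":
            break
    # Drop the start/end tokens before handing the sentence to gTTS
    return " ".join(w for w in caption.split() if w not in ("startseq", "endseq"))
```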

Technologies Used

Python
Keras (TensorFlow)
gTTS
speech_recognition
