Speech Recognition System

Niveditha Gokulmuthu

Bengaluru, Karnataka

The Speech Recognition System is an artificial intelligence-based system that recognizes spoken language and transcribes it into written text or other forms of output.

Project status: Under Development

oneAPI, Artificial Intelligence, Cloud

Intel Technologies
oneAPI

Code Samples [1]

Overview / Usage

The inspiration for creating speech recognition systems can be traced back to the desire to enable computers to interact with humans more naturally and efficiently. The ability to understand and interpret human speech would allow computers to perform tasks that previously required human intervention, such as transcribing audio recordings or responding to voice commands.

Early research into speech recognition systems began in the 1950s, and progress accelerated in the 1970s with the development of Hidden Markov Models (HMMs) and Dynamic Time Warping (DTW) algorithms. These techniques enabled computers to recognize isolated words and simple phrases accurately.

In the 1980s, the development of artificial neural networks and the availability of more powerful computers led to significant advances in speech recognition technology. With the adoption of deep learning techniques in the 2010s, speech recognition systems became even more accurate and robust, allowing for real-time transcription and voice-controlled interfaces.

Today, speech recognition systems are used in various applications, from virtual assistants like Siri and Alexa to automated customer service and transcription services. The development of speech recognition technology promises to make human-computer interaction even more seamless and intuitive in the coming years.

Methodology / Approach

A speech recognition system built on an RNN (Recurrent Neural Network) uses a deep learning model that processes sequential data, such as speech. Our system trains the model on a large dataset of speech samples and their corresponding transcriptions. The RNN architecture allows the model to analyze the audio signal in a time-dependent manner, taking into account the temporal dependencies between different parts of the signal.
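To make the "time-dependent" view of audio concrete, here is a minimal sketch of how a raw signal might be cut into a sequence of overlapping frames before being fed to a recurrent model one frame per time step. The function name `frame_signal` and the parameters `frame_length` and `hop_length` are hypothetical and chosen for illustration; the project's actual preprocessing is not described on this page and may differ.

```python
def frame_signal(samples, frame_length, hop_length):
    """Split a 1-D list of audio samples into overlapping frames.

    Illustrative helper (not taken from the project repository).
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    # Slide a fixed-size window over the signal. Trailing samples that
    # do not fill a complete frame are dropped.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames


# Toy signal of 10 samples standing in for real audio (assumption).
signal = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
frames = frame_signal(signal, frame_length=4, hop_length=2)
print(len(frames), "frames of length", len(frames[0]))
```

Each frame overlaps the previous one by `frame_length - hop_length` samples, so neighbouring time steps share context, which is one common way to present audio to a sequence model.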

The RNN model typically consists of several layers of recurrent cells, each of which takes in an input vector and a hidden state vector from the previous time step. The hidden state vector is updated at each time step based on the input vector and the previous hidden state vector, allowing the model to learn temporal patterns in the input data.
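The hidden state update described above can be sketched as a single vanilla recurrent cell computing h_t = tanh(W_x x_t + W_h h_(t-1) + b). This is a plain-Python illustration under stated assumptions: the names `rnn_step`, `run_rnn`, `W_x`, `W_h`, and `b` and the toy weights are hypothetical, and the project itself may use a different cell type (for example LSTM or GRU) through TensorFlow.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One time step: combine the current input with the previous hidden state."""
    from_input = matvec(W_x, x_t)
    from_state = matvec(W_h, h_prev)
    return [math.tanh(a + c + bias)
            for a, c, bias in zip(from_input, from_state, b)]


def run_rnn(inputs, W_x, W_h, b):
    """Run the cell over a sequence, starting from an all-zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states


# Toy weights (assumption): input size 3, hidden size 2.
W_x = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
W_h = [[0.5, 0.0],
       [0.0, 0.5]]
b = [0.0, 0.0]

# The same input vector is presented at two consecutive time steps.
states = run_rnn([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]], W_x, W_h, b)
print(states)
```

Although both time steps receive the same input, the second hidden state differs from the first because it also depends on the state carried over from the previous step. That carried-over state is what lets the model pick up temporal patterns.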

First, we import libraries.

Understand the data.

Compute correlations in the data and visualize them.

Test different models and select the best one.

Train the model using Intel oneAPI and the Speech Recognition API.

Save the Model.
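For the model-comparison step above, one common way to rank speech recognition models is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The page does not state which metric the project uses, so the sketch below is an assumption. The function names, model names, and sample transcripts are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def pick_best_model(reference, outputs):
    """Return the name of the model with the lowest WER, plus all scores."""
    scores = {name: word_error_rate(reference, text)
              for name, text in outputs.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical transcripts from two candidate models (illustrative only).
reference = "turn on the kitchen light"
outputs = {
    "model_a": "turn on the kitchen light",
    "model_b": "turn of the kitchen lights",
}
best, scores = pick_best_model(reference, outputs)
print(best, scores)
```

In practice the score would be averaged over a held-out set of utterances rather than a single sentence, and the lowest-WER model would be the one saved in the final step.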

Technologies Used

oneAPI, Jupyter Notebook, Anaconda, RNN, Google Cloud API, Speech Recognition API, TensorFlow

Repository

https://github.com/nivgokul/Speech-Recognition-System
