Neural Voice Cloning with Few Samples

Sharad Chitlangia

Goa

Implementation of the Neural Voice Cloning with Few Samples project, along with an implementation of efficient multi-speaker speech synthesis on Tacotron-2 ...

Project status: Published/In Market

Artificial Intelligence

Intel Technologies
Intel Python

Overview / Usage

The problem being solved is efficient neural synthesis of a person’s voice given only a few samples of that voice. Current methods either rely heavily on large amounts of data or do not produce good enough results. We aim to solve this by building an encoder that first captures a person’s speech characteristics by encoding their voice in a high-dimensional latent space. A voice generator then generates speech conditioned on this high-dimensional vector.
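As a rough illustration of what "conditioned on this vector" can mean, the sketch below tiles a speaker embedding across time and concatenates it to the text features a generator would consume. This is one common conditioning scheme, assumed here for illustration; it is not confirmed to be what the project does. The function name `condition_on_speaker` and all tensor sizes are hypothetical.

```python
import torch


def condition_on_speaker(text_features: torch.Tensor,
                         speaker_embedding: torch.Tensor) -> torch.Tensor:
    """Attach one speaker vector to every time step of the text features.

    text_features:     (batch, time, feature_dim)
    speaker_embedding: (batch, embed_dim)
    returns:           (batch, time, feature_dim + embed_dim)
    """
    batch, time, _ = text_features.shape
    # Repeat the speaker vector along the time axis (no memory copy with expand).
    tiled = speaker_embedding.unsqueeze(1).expand(batch, time, -1)
    return torch.cat([text_features, tiled], dim=-1)


# Hypothetical sizes: 2 utterances, 50 time steps, 256 text features, 64-dim speaker vector.
text_features = torch.randn(2, 50, 256)
speaker_embedding = torch.randn(2, 64)
conditioned = condition_on_speaker(text_features, speaker_embedding)
```

With these sizes, `conditioned` has 320 features per time step, and the last 64 of them are identical at every step because they carry the speaker identity rather than the text content.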

Methodology / Approach

Two speaker encoders are developed. The first uses a stack of one-dimensional convolutions followed by multi-head attention. The second is an LSTM-based recurrent speaker encoder. Both embed the important characteristics of an individual speaker in a high-dimensional latent space. A generative model conditioned on the resulting vector then generates speech very similar to the original person’s voice.
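The two encoder styles described above can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the code from the repository: the class names, layer counts, kernel sizes, hidden widths, the 64-dimensional embedding, the mel-spectrogram input, and the mean-pooling over time are all hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn


class ConvAttentionSpeakerEncoder(nn.Module):
    """1-D convolutions over time, then multi-head self-attention (sizes are assumptions)."""

    def __init__(self, n_mels=80, hidden=128, embed_dim=64, num_heads=4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.attention = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.project = nn.Linear(hidden, embed_dim)

    def forward(self, mels):  # mels: (batch, frames, n_mels)
        # Conv1d expects (batch, channels, frames), so swap axes around the conv stack.
        x = self.convs(mels.transpose(1, 2)).transpose(1, 2)
        attended, _ = self.attention(x, x, x)  # self-attention over frames
        # Mean-pool over time to get one vector per utterance (an assumed pooling choice).
        return self.project(attended.mean(dim=1))


class LSTMSpeakerEncoder(nn.Module):
    """Recurrent alternative: the final hidden state summarizes the utterance."""

    def __init__(self, n_mels=80, hidden=128, embed_dim=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=num_layers, batch_first=True)
        self.project = nn.Linear(hidden, embed_dim)

    def forward(self, mels):  # mels: (batch, frames, n_mels)
        _, (h_n, _) = self.lstm(mels)
        return self.project(h_n[-1])  # hidden state of the top layer


# Hypothetical input: 3 utterances, 120 frames, 80 mel bins.
mels = torch.randn(3, 120, 80)
conv_embedding = ConvAttentionSpeakerEncoder()(mels)
lstm_embedding = LSTMSpeakerEncoder()(mels)
```

Both encoders map a variable-length utterance to a fixed-size vector, which is what lets a downstream generator treat the speaker identity as a single conditioning input.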

Technologies Used

Python, PyTorch, librosa, GCP, AWS

Repository

https://github.com/Sharad24/Neural-Voice-Cloning-with-Few-Samples
