Neural Voice Cloning with Few Samples
Implementation of the Neural Voice Cloning with Few Samples project, together with an implementation of efficient multi-speaker speech synthesis on Tacotron-2 ...
Project status: Published/In Market
Intel Technologies
Intel Python
Overview / Usage
The problem being solved is efficient neural synthesis of a person’s voice given only a few samples of that voice. Current methods either rely heavily on large amounts of data or are not good enough. We aim to solve this by building an encoder that first captures a person’s speech characteristics by encoding their voice in a high-dimensional latent space. A voice generator then generates speech conditioned on this high-dimensional vector.
Methodology / Approach
Two speaker encoders are developed. The first consists of 1-dimensional convolutions followed by multi-head attention. The second is an LSTM-based recurrent speaker encoder. Both embed the important speaker characteristics of an individual in a high-dimensional latent space. A generative model conditioned on this vector then generates speech very similar to the original person’s voice.
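The two encoder designs described above can be sketched roughly as follows in PyTorch. This is a minimal illustration, not the repository's actual code: the class names, layer sizes, kernel widths, mean pooling over time, and the use of mel spectrograms as input are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class ConvAttentionSpeakerEncoder(nn.Module):
    """Hypothetical encoder: 1-D convolutions, then multi-head self-attention."""

    def __init__(self, n_mels=80, hidden=128, embed_dim=256, n_heads=4):
        super().__init__()
        # Convolutions run along the time axis; padding keeps the frame count.
        self.convs = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.attention = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, mels):
        # mels: (batch, n_mels, frames)
        x = self.convs(mels).transpose(1, 2)   # (batch, frames, hidden)
        x, _ = self.attention(x, x, x)         # self-attention over frames
        # Assumed pooling: average over time to get one vector per utterance.
        return self.proj(x.mean(dim=1))        # (batch, embed_dim)


class LSTMSpeakerEncoder(nn.Module):
    """Hypothetical recurrent encoder: stacked LSTM, final hidden state."""

    def __init__(self, n_mels=80, hidden=128, embed_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, embed_dim)

    def forward(self, mels):
        x = mels.transpose(1, 2)               # (batch, frames, n_mels)
        _, (h_n, _) = self.lstm(x)             # h_n: (num_layers, batch, hidden)
        return self.proj(h_n[-1])              # top layer's last hidden state


if __name__ == "__main__":
    # Two fake utterances, 80 mel bins, 120 frames each.
    mels = torch.randn(2, 80, 120)
    print(ConvAttentionSpeakerEncoder()(mels).shape)
    print(LSTMSpeakerEncoder()(mels).shape)
```

Either module maps a variable-length utterance to a fixed-size speaker embedding, which a generative model could then take as a conditioning input, for example by concatenating it to each step of its text encoding.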
Technologies Used
Python, PyTorch, librosa, GCP, AWS
Repository
https://github.com/Sharad24/Neural-Voice-Cloning-with-Few-Samples