In my previous post, Alexa AI for reminding important emails and reminders, I decided to have my email reminders read out in a celebrity's voice. Here is a small step towards it :D
If you don't know what Priyanka Chopra sounds like, here is a real sample
Sample synthesized voice 1
Sample synthesized voice 2
Sample synthesized voice 3
Real voice of Amitabh Bachchan
Sample synthesized voice for Amitabh Bachchan
This one sounds more robotic. I'm still looking for more data for this voice, and once I have it I'll also need to fine-tune HiFi-GAN
High-level overview of how to build the dataset
- Collect videos + their English transcripts
- Some videos on YouTube have clean transcripts; others only have autogenerated ones, which contain many errors
- Make all transcripts lowercase; if the transcripts have errors, fix them manually
- I used the YouTube transcript downloader Firefox extension to collect the transcripts
- If there are multiple speakers, remove the content from the other speakers
- Use Audacity to remove noise and other disturbances :-)
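The transcript clean-up steps above can be sketched roughly as follows. This is a minimal sketch, not the exact script used for these voices: the function names, the clip paths, and the `path|text` filelist layout (the pipe-separated format used by several Tacotron implementations) are assumptions for illustration. Stripping bracketed cues such as `[Music]` is also an assumption about what autogenerated captions tend to contain. It does not replace the manual error fixing.

```python
import re


def normalize_transcript(text: str) -> str:
    """Lowercase one transcript line and strip common caption residue."""
    # Assumption: cues like [Music] or (applause) are not spoken words.
    text = re.sub(r"\[[^\]]*\]|\([^)]*\)", " ", text)
    text = text.lower()
    # Collapse runs of whitespace left behind by the removals above.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def build_filelist(clips):
    """Turn (wav_path, raw_transcript) pairs into 'path|text' lines.

    Clips whose transcript is empty after cleaning are skipped.
    """
    lines = []
    for wav_path, raw in clips:
        cleaned = normalize_transcript(raw)
        if cleaned:
            lines.append(f"{wav_path}|{cleaned}")
    return lines


if __name__ == "__main__":
    # Hypothetical clip names and transcripts, for illustration only.
    clips = [
        ("wavs/clip_0001.wav", "Thank You (applause) so much"),
        ("wavs/clip_0002.wav", "[Music]"),
    ]
    for line in build_filelist(clips):
        print(line)
```

Each resulting line pairs one audio clip with its cleaned text, so it is easy to spot-check a clip against its transcript before training.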
References:
- You might also need to fine-tune HiFi-GAN using this notebook or colab link to remove the robotic voice
Note: This codebase is a rabbit hole, and it's too difficult for me to write up everything needed to get started with it. If you face any difficulties, please go through the docs and GitHub issues of the Tacotron implementations
Update!!
There was no interface for using these models in real world applications.
- The model is now live on Hugging Face Spaces here
- To use it as a voice assistant, an API has been created here
- Want to use custom audio models in PDFs/ebooks? Here is a custom PDF reader written in Qt