# Tortoise TTS

I'm naming my speech-related repos after Mojave desert flora and fauna. Tortoise is a bit tongue-in-cheek: this model is slow. It leverages both an autoregressive decoder and a diffusion decoder, both known for their low sampling rates. On a K80, expect to generate a medium-sized sentence every 2 minutes.

It is not so slow anymore: you can now get a 0.25-0.3 RTF on 4 GB of VRAM, and with streaming you can get under 500 ms of latency!

Unfortunately, this project no longer seems to be active.

## Demos

See this page for a large list of example outputs.

A cool application of Tortoise + GPT-3 (not affiliated with this repository).

## Installation

If you want to use this on your own computer, you must have an NVIDIA GPU.

On Windows, I highly recommend using the Conda installation path. I have been told that if you do not do this, you will spend a lot of time chasing dependency problems.

Then run the following commands, using the Anaconda prompt as the terminal (or any other terminal configured to work with conda):

- Create a conda environment with the minimal dependencies specified.
- Install PyTorch with the command provided here:
- Change the current directory to tortoise-tts.
- Run the tortoise Python setup install script.

## Usage

```python
from tortoise.api import TextToSpeech

tts = TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
gen = tts.tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')
```

## Acknowledgements

This project has garnered more praise than I expected. I am standing on the shoulders of giants, though, and I want to credit a few of the amazing folks in the community that have helped make this happen:

- Hugging Face, who wrote the GPT model and the generate API used by Tortoise, and who host the model weights.
- Ramesh et al., who authored the DALL-E paper, which is the inspiration behind Tortoise.
- Nichol and Dhariwal, who authored (the revision of) the code that drives the diffusion model.
- Jang et al., who developed and open-sourced UnivNet, the vocoder this repo uses.
- Kim and Jung, who implemented the UnivNet PyTorch model.
- lucidrains, who writes awesome open-source PyTorch models, many of which are used here.
- Patrick von Platen, whose guides on setting up wav2vec were invaluable to building my dataset.

## License

Tortoise TTS is licensed under the Apache 2.0 license.

Tortoise was built entirely by the author (James Betker) using their own hardware. Their employer was not involved in any facet of Tortoise's development.
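The usage snippet above can be expanded into an end-to-end sketch: collect a speaker's reference WAV files, run `tts_with_preset`, and save the result. `TextToSpeech`, `tts_with_preset`, and `load_audio` are the tortoise APIs referenced in this README; the directory layout, helper names, output path, and sample rates here are assumptions for illustration, not part of the documented interface.

```python
from pathlib import Path

def find_reference_clips(voice_dir):
    """Collect the speaker's reference WAV files from a directory (hypothetical layout)."""
    return sorted(Path(voice_dir).glob("*.wav"))

def synthesize(text, voice_dir, preset="fast", out_path="generated.wav"):
    # Imported lazily so the path helper above stays usable without a GPU install.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    # Same constructor flags as shown in this README (DeepSpeed, KV cache, fp16).
    tts = TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)

    # Load each reference clip; 22050 Hz is the rate tortoise's loader expects here.
    clips = [load_audio(str(p), 22050) for p in find_reference_clips(voice_dir)]

    # 'fast' trades some quality for speed; other presets exist on the same API.
    gen = tts.tts_with_preset(text, voice_samples=clips, preset=preset)

    # tortoise generates 24 kHz audio; drop the batch dimension before saving.
    torchaudio.save(out_path, gen.squeeze(0).cpu(), 24000)
    return out_path
```

Keeping the model-heavy imports inside `synthesize` means the module can be imported (e.g. to reuse `find_reference_clips`) on machines without tortoise or a GPU installed.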