Audio & Voice

Which languages are supported?
We support more than 30 major languages, including English, Chinese, Spanish, Arabic, Portuguese, Russian, Japanese, Punjabi, German, French, Korean, Turkish, Tamil, Vietnamese, Hindi, Bengali, Urdu, Persian, Italian, Indonesian, Thai, Marathi, Telugu, Ukrainian, Malay, Romanian, Polish, Dutch, Gujarati, and Kannada.
How many voices are available in VisionStory’s voice library, and can I customise them?
VisionStory provides over 200 voices in its library, which you can filter by gender, age, and use case. If you don’t find a voice that suits your needs, you can also create a custom AI voice clone by uploading or recording your own audio.
Why are there fewer voice options available in my language?
Some languages have fewer voice options because the voices for those languages are specially fine-tuned for them. However, many of our English voices can also speak other languages, so you still have flexibility when choosing a voice for your project.
What is voice cloning, and how do I clone a voice?
Voice cloning lets you create a custom AI voice that imitates a specific person’s voice. To clone a voice, upload or record a sample of that voice. For the best results, make sure the recording is clear and made in a quiet environment.
Is voice cloning free?
Creating a voice clone is free, but to use a cloned voice in video creation, you’ll need to be on the Pro Plan or higher.
How many languages does voice cloning support?
Voice cloning supports more than 32 languages. Because the list of supported languages may change, please check the voice cloning feature for the latest options. Please note: while cloning itself is free, you’ll need a subscription to use a cloned voice in video generation.
What is preview audio, and what are its benefits?
Preview audio lets you generate and listen to the speech for your talking video before creating the final version. It helps you check the voice, pronunciation, and pauses, so you can adjust the voice as needed before spending credits on video generation. Preview audio is free for all subscribers, and your preview quota resets each day. If you use up your daily quota, you can buy extra preview quota with credits.
What does the stopwatch icon and +0.5s mean?
Each stopwatch icon (+0.5s) adds a 0.5-second pause to the generated voice. You can place several icons in a row to create longer pauses in your video.
What is URL import, and which URLs are supported?
URL import lets you bring in audio from a link: it downloads the content at the specified URL and extracts the audio for use in video creation. At the moment, it supports links from YouTube and TikTok. If you’d like support for other sites, please get in touch with us. You can also use the voice changer feature to alter the imported audio while keeping the original content.
What is the remove noise feature?
The remove noise feature removes background noise from audio you import or record, giving your videos clearer sound. To use this feature, you’ll need to be on the Pro Plan or higher.
What is the voice changer feature?
The voice changer feature lets you change the voice in an audio recording while keeping the original spoken content, so you can create unique versions of the audio. To use this feature, you’ll need to be on the Pro Plan or higher.
Can I control the emotion of the voice?
Yes, through the text you enter. The text-to-speech (TTS) system infers the emotion from your wording and applies it automatically, so there are no separate emotion controls.
What should I keep in mind when using the stopwatch (pause) feature?
When using the stopwatch feature, each stopwatch icon adds a 0.5-second pause. You can use them in a row to create longer pauses, up to a maximum of 3 seconds. However, it’s best not to use more than two pauses in a row within a single text segment, as this can sometimes cause the AI to generate unexpected sounds or glitches.
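As a rough illustration of the pause arithmetic above, the short Python sketch below computes the pause produced by a run of stopwatch icons and flags runs longer than the recommended two. The function and constant names are hypothetical and not part of VisionStory. The sketch assumes 0.5 seconds per icon and a 3-second cap (so at most 6 icons count); how the editor actually handles icons beyond the cap isn't stated here, so the sketch simply clamps at 3 seconds.

```python
# Hypothetical model of the stopwatch (pause) arithmetic; not a VisionStory API.
# Assumptions: each icon adds 0.5 s, total pause is capped at 3.0 s,
# and runs of more than two icons are discouraged.

PAUSE_PER_ICON_S = 0.5
MAX_PAUSE_S = 3.0
RECOMMENDED_MAX_ICONS = 2

# Number of icons needed to reach the cap (3.0 / 0.5 = 6).
MAX_ICONS = int(MAX_PAUSE_S / PAUSE_PER_ICON_S)


def pause_length(icons: int) -> float:
    """Return the pause in seconds for a run of consecutive stopwatch icons.

    Assumption: icons beyond the cap add nothing, so the result is clamped.
    """
    if icons < 0:
        raise ValueError("icon count cannot be negative")
    return min(icons * PAUSE_PER_ICON_S, MAX_PAUSE_S)


def exceeds_recommendation(icons: int) -> bool:
    """True if the run is longer than the recommended two icons in a row."""
    return icons > RECOMMENDED_MAX_ICONS


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        print(n, "icons ->", pause_length(n), "s, risky:", exceeds_recommendation(n))
```

For example, two icons give a 1-second pause within the recommended limit, while six icons reach the 3-second maximum but exceed the recommended run length.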