Audio & Voice

Which languages are supported by VisionStory?
VisionStory supports over 30 major languages worldwide, including but not limited to: English, Chinese, Spanish, Arabic, Portuguese, Russian, Japanese, Punjabi, German, French, Korean, Turkish, Tamil, Vietnamese, Hindi, Bengali, Urdu, Persian, Italian, Indonesian, Thai, Marathi, Telugu, Ukrainian, Malay, Romanian, Polish, Dutch, Gujarati, and Kannada.
How many voices are available in VisionStory’s voice library, and can I customise them?
VisionStory provides a library of over 200 voices, which you can filter by gender, age, and use case. If you do not find a suitable voice, you also have the option to create a custom AI voice clone by uploading or recording your own audio.
Why are there fewer voice options available in my language?
Some languages have a smaller selection because the voices listed for them are specially optimised for that language. However, many of the English voices can also speak other languages, which gives you more flexibility when choosing a voice for your project.
What is voice cloning, and how do I clone a voice?
Voice cloning lets you create a custom AI voice that replicates a particular voice. To clone a voice, upload or record an audio sample. For the best results, make sure the audio is recorded clearly in a quiet environment.
Is voice cloning free?
Creating a voice clone is free. However, to use the cloned voice in video creation, you need to subscribe to the Pro Plan or above.
How many languages are supported for voice cloning?
Voice cloning is supported free of charge in more than 32 languages. The list of supported languages may change, so please check the voice cloning feature for the most up-to-date options. Please note: while cloning is free, you will need a subscription to use the cloned voice in video generation.
What is preview audio, and what are its benefits?
Preview audio lets you generate the speech for your talking video before creating the final version, so you can check that the voice, pronunciation, and pauses meet your requirements. Because generating the video uses credits, you can adjust the voice as needed before you commit to it. To access preview audio, you need to be on the Pro Plan or above, and each plan offers a different preview quota.
What do the stopwatch icon and +0.5s mean?
The stopwatch icon and +0.5s feature let you insert a 0.5-second pause into the generated voice. You can add several stopwatch icons in a row to create longer pauses as required in your video.
What is URL import, and which URLs are supported?
URL import enables you to extract and use audio from a provided link for video creation. At present, VisionStory supports links from YouTube and TikTok. If you wish to see support for additional sites, please let us know. You can also use the voice changer feature to alter the imported audio while retaining the original content.
What is the remove noise feature?
The remove noise feature helps to eliminate background noise from audio when you import or record it, resulting in clearer audio quality for your videos. This feature requires a Pro Plan or higher to use.
What is the voice changer feature?
The voice changer feature allows you to alter the voice in a speech, enabling you to create unique versions of the audio while preserving the original content. This feature requires a Pro Plan or higher to use.
Can I control the emotion of the voice?
The emotion in the voice is determined by the text you provide. The text-to-speech (TTS) system automatically applies the appropriate emotion based on your wording, so there is no need for any extra controls.
What should I bear in mind when using the stopwatch (pause) feature?
When using the stopwatch feature, each stopwatch icon adds a 0.5-second pause, and you can use them in succession to create longer pauses, up to a maximum of 3 seconds. However, it’s best not to use more than two consecutive pauses within a single text segment, as this may cause the AI to generate unexpected sounds or artefacts.
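The pause arithmetic above can be illustrated with a short Python sketch. It is not VisionStory code: the function name `pause_plan` and the returned field names are hypothetical, and clamping at the 3-second maximum is an assumption about how the limit behaves. Only the 0.5-second step, the 3-second maximum, and the advice to keep to two consecutive pauses come from the answer above.

```python
PAUSE_STEP_S = 0.5          # each stopwatch icon adds 0.5 s (from the FAQ)
MAX_PAUSE_S = 3.0           # longest pause from consecutive icons (from the FAQ)
RECOMMENDED_MAX_ICONS = 2   # FAQ advice: no more than two in a row


def pause_plan(icons: int) -> dict:
    """Summarise the pause produced by `icons` consecutive stopwatch icons.

    Hypothetical helper for illustration only. Clamping to MAX_PAUSE_S is
    an assumption; the editor may instead refuse further icons.
    """
    if icons < 0:
        raise ValueError("icons must be zero or greater")
    requested = icons * PAUSE_STEP_S
    return {
        "seconds": min(requested, MAX_PAUSE_S),
        "capped": requested > MAX_PAUSE_S,
        "within_recommendation": icons <= RECOMMENDED_MAX_ICONS,
    }


if __name__ == "__main__":
    for n in (1, 2, 6, 8):
        print(n, pause_plan(n))
```

Under these assumptions, two icons give a 1-second pause and stay within the recommendation, while six icons reach the 3-second maximum but exceed the recommended two in a row.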