VisionStory supports over 30 major languages, including English, Chinese, Spanish, Arabic, Portuguese, Russian, Japanese, Punjabi, German, French, Korean, Turkish, Tamil, Vietnamese, Hindi, Bengali, Urdu, Persian, Italian, Indonesian, Thai, Marathi, Telugu, Ukrainian, Malay, Romanian, Polish, Dutch, Gujarati, and Kannada.
How many voices does VisionStory offer, and can I personalize them?
VisionStory provides a library of over 200 voices, which you can filter by gender, age, and use case. If you don’t find a voice that fits your needs, you can create a custom AI voice clone by uploading or recording your own audio.
Why do I see fewer voice choices in my language?
Some languages have fewer voice options because the voices listed for them are specifically optimized for that language. Many English voices can also speak other languages, so you can use one of them if you want more choice.
What is voice cloning and how do I create a cloned voice?
Voice cloning lets you create a custom AI voice that sounds like a specific person from audio you upload or record. To clone a voice, provide a clear audio recording; for the best results, record in a quiet environment.
Is there a cost for voice cloning?
Voice cloning is free for English, Spanish, Japanese, and Chinese, so you can test if the cloned voice matches your own. However, to use your cloned voice in video generation, you need to subscribe to the Pro Plan or higher. For voice cloning in other languages, a Pro Plan or above is also required.
How many languages does VisionStory support for voice cloning?
Voice cloning is free in four languages: English, Spanish, Japanese, and Chinese. Although cloning is free in these four languages, using the cloned voice in video generation requires a Pro Plan or higher. Additional languages are available for cloning but also require a Pro Plan or higher. The list of supported languages may change, so please check the voice cloning feature for the latest options.
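The plan rules described above can be summarized in a short Python sketch. This is only an illustration of the rules as stated in this FAQ, not part of any VisionStory API: the function names and the `has_pro_or_higher` flag are hypothetical, and the sketch assumes the requested language is one that VisionStory currently supports for cloning.

```python
# Languages in which cloning is free, as listed in this FAQ (the list may change).
FREE_CLONING_LANGUAGES = {"English", "Spanish", "Japanese", "Chinese"}


def can_clone_voice(language: str, has_pro_or_higher: bool) -> bool:
    """Hypothetical helper: may a user create a cloned voice in this language?

    Assumes `language` is supported for cloning at all.
    """
    return has_pro_or_higher or language in FREE_CLONING_LANGUAGES


def can_use_clone_in_video(has_pro_or_higher: bool) -> bool:
    """Hypothetical helper: using a cloned voice in a video needs Pro or higher."""
    return has_pro_or_higher


# A free user can test a clone in English but cannot use it in a video.
print(can_clone_voice("English", has_pro_or_higher=False))
print(can_use_clone_in_video(has_pro_or_higher=False))
```

The two checks are kept separate because the FAQ treats creating a clone and using it in video generation as different steps with different requirements.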
What is preview audio and why should I use it?
Preview audio lets you generate and listen to the speech for your talking video before creating the final video. This helps you review the voice, pronunciation, and pauses to make sure they sound right. You can adjust the voice as needed before using credits to generate the video. Preview audio is available to subscribers on the Pro Plan or higher and has a quota that resets daily. If you use up your daily quota, you can buy additional preview quota with credits.
What is the purpose of the stopwatch icon and the +0.5s option?
The stopwatch icon and +0.5s option let you add a 0.5-second pause in the generated voice. You can use multiple stopwatch icons in a row to create longer pauses in your video as needed.
What is the URL import feature, and which types of URLs can I use?
The URL import feature lets you extract and use audio from a video link for your VisionStory project. Currently, you can import audio from YouTube and TikTok links. If you need support for other platforms, please let us know. You can also use the voice changer to modify the imported audio while preserving the original content.
What does the remove noise feature do, and who can use it?
The remove noise feature reduces background noise in your imported or recorded audio, giving your videos clearer sound. This feature requires a Pro Plan or higher.
What does the voice changer feature do?
The voice changer feature lets you alter the voice in your audio, allowing you to create unique versions of the speech while keeping the original content. This feature requires a Pro Plan or higher to use.
Can I adjust the emotion expressed by the AI voice?
The emotion in the AI voice is determined by the text you provide. The text-to-speech (TTS) system automatically interprets and applies the appropriate emotion based on your input, so you don't need to manually control it.
What are the best practices for using the stopwatch (pause) feature?
Each stopwatch icon adds a 0.5-second pause to your video’s speech. You can use multiple stopwatches in a row to create longer pauses, up to a total of 3 seconds. However, we recommend using no more than two consecutive pauses in a single text segment, because longer runs may cause the AI to generate unexpected sounds or artifacts.
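The pause arithmetic above can be worked through in a short Python sketch. It is only an illustration of the numbers given in this FAQ (0.5 seconds per icon, a 3-second total, and a recommended maximum of two icons in a row); the function and constant names are hypothetical and do not correspond to any VisionStory API.

```python
import math

PAUSE_STEP_SECONDS = 0.5   # each stopwatch icon adds 0.5 s (from this FAQ)
MAX_PAUSE_SECONDS = 3.0    # stated upper limit for a run of pauses
RECOMMENDED_MAX_ICONS = 2  # the FAQ advises at most two icons in a row


def stopwatch_icons_for(pause_seconds: float) -> int:
    """Hypothetical helper: icons needed to reach at least the requested pause.

    The request is capped at the 3-second limit before rounding up.
    """
    if pause_seconds <= 0:
        return 0
    capped = min(pause_seconds, MAX_PAUSE_SECONDS)
    return math.ceil(capped / PAUSE_STEP_SECONDS)


def is_within_recommendation(icon_count: int) -> bool:
    """True if the run of icons stays within the recommended two in a row."""
    return icon_count <= RECOMMENDED_MAX_ICONS


icons = stopwatch_icons_for(1.0)
print(icons, is_within_recommendation(icons))
```

Under these numbers, a 1-second pause needs two icons and stays within the recommendation, while the full 3-second pause needs six icons and exceeds it.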