Introduction to the Video Podcast Feature

Feb 19, 2025

Looking for a fast, engaging way to transform your audio podcasts into dynamic visual experiences? Discover VisionStory’s Video Podcast feature! Effortlessly turn any two-person audio conversation into an immersive video podcast—complete with AI-powered scene generation, customisable characters, intelligent shot selection, and more. Here’s how it works:

1. Upload or Import Your Audio

Begin by uploading an audio file (such as .mp3 or .wav) or pasting a link from YouTube, TikTok, or other supported platforms. Once your file is uploaded, you can preview and trim it to highlight the best parts of your conversation—all within our user-friendly interface.

Figure: Uploading audio files for video podcasts

2. Select a Scene and Characters

Next, choose a scene to set the backdrop for your podcast, anything from a cosy studio to a virtual news desk. Then select two speaker characters, either by picking from your previously uploaded images or by adding new ones.

Figure: Selecting scenes and characters for the podcast

3. AI-Generated Storyboard

Once your audio and characters are ready, VisionStory’s AI takes over with intelligent segmenting and automatic shot allocation:

Audio segmentation: The system analyses the conversation flow, detecting when each speaker is talking.
Automatic shot selection: Each audio segment is matched with the most suitable shot type:
- Single-person close-up to highlight a speaker’s expression
- Single-person mid-shot for a balanced view of your host
- Two-person shot when both speakers are interacting

These storyboards are created automatically—ideal for anyone seeking professional results without advanced editing skills.
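
To make the idea of shot allocation concrete, here is a minimal sketch of how speaker segments could be mapped to the three shot types listed above. It is not VisionStory's actual algorithm, which is not documented here. The `Segment` class, the `choose_shot` and `plan_shots` functions, and both duration thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    speaker: str  # label for one of the two hosts, e.g. "A" or "B"
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds


def choose_shot(segment: Segment,
                previous: Optional[Segment] = None,
                short_turn: float = 3.0,
                long_turn: float = 10.0) -> str:
    """Pick a shot type for one segment (assumed heuristic, not the product's)."""
    duration = segment.end - segment.start
    speaker_changed = previous is not None and previous.speaker != segment.speaker
    # A brief reply right after a speaker change suggests back-and-forth
    # interaction, so show both hosts.
    if speaker_changed and duration < short_turn:
        return "two_person"
    # A long uninterrupted turn gets a close-up to highlight expression.
    if duration >= long_turn:
        return "close_up"
    # Everything else falls back to the balanced mid-shot.
    return "mid_shot"


def plan_shots(segments: List[Segment]) -> List[str]:
    """Assign a shot type to every segment, in order."""
    shots = []
    previous = None
    for segment in segments:
        shots.append(choose_shot(segment, previous))
        previous = segment
    return shots


segments = [Segment("A", 0.0, 12.0), Segment("B", 12.0, 14.0), Segment("A", 14.0, 20.0)]
print(plan_shots(segments))
```

In this example the long opening turn gets a close-up, the quick reply gets a two-person shot, and the medium-length follow-up gets a mid-shot. Any of these choices can still be overridden by hand in the storyboard editor.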

4. Fine-Tune Your Scenes and Voices

Within the storyboard editor, you can refine each shot to suit your preferences:

Switch shot types: Move from close-up to mid-shot, or use a two-person shot for both hosts.
Change voices: Select alternative AI voices for each host to match your desired tone or style.
Swap characters: Instantly change which person appears in each segment for optimal visual flow.

Figure: Fine-tuning scenes and voices in podcast

5. One-Click Aspect Ratio Switching

Creating content for multiple platforms? No problem. Effortlessly toggle between 16:9 for standard landscape and 9:16 for vertical formats. The scene, characters, and shots all automatically adjust to the new aspect ratio—ensuring your video looks professional on every platform.
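
How VisionStory re-frames a scene is not described here. As a rough illustration of what moving between 16:9 and 9:16 involves, the sketch below computes the largest centred crop of a frame for a target aspect ratio. The `centre_crop` function and the centred-crop approach itself are assumptions for illustration, not the product's method.

```python
from typing import Tuple


def centre_crop(width: int, height: int,
                ratio_w: int, ratio_h: int) -> Tuple[int, int, int, int]:
    """Return (x, y, crop_width, crop_height) of the largest centred crop
    with aspect ratio ratio_w:ratio_h inside a width x height frame."""
    if width * ratio_h > height * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Source is as tall as or taller than the target: keep full width.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


# A 1920x1080 landscape frame re-framed for vertical 9:16 output.
print(centre_crop(1920, 1080, 9, 16))  # (656, 0, 607, 1080)
```

A plain crop like this keeps under a third of a landscape frame's width, which helps explain why characters and shots need to be repositioned for vertical output rather than simply cut out.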

6. Generate Your Final Video

Once you’re happy with the storyboard and settings, simply click Generate to produce your complete video podcast. VisionStory’s rapid rendering engine brings together your background scene, characters, audio, and camera transitions. In just moments, your immersive, AI-powered video podcast will be ready to engage your audience!

Preparing Your Podcast Audio & Key Usage Tips

1. Getting Your Audio

No ready-made podcast file? Use tools like NotebookLM by Google to generate speech audio from text.
VisionStory will soon offer a similar service, allowing you to create a podcast entirely from text within our platform.

2. Speaker Separation Limitations

Our system cannot yet reliably separate overlapping voices. If both hosts speak at the same time, the voice-changing feature (selecting alternative AI voices for each host) may not work accurately for that part of the audio.
For best results, use clear audio where only one person speaks at a time.
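
If you already have speaker turns with timestamps for your recording (for example from a transcript or a diarisation tool of your choice), you can check for overlapping speech before uploading. The sketch below is an independent pre-check, not part of VisionStory. The `find_overlaps` function and the `(speaker, start, end)` tuple format are assumptions made for illustration.

```python
from typing import List, Tuple

Turn = Tuple[str, float, float]  # (speaker label, start seconds, end seconds)


def find_overlaps(turns: List[Turn]) -> List[Tuple[float, float]]:
    """Return the time ranges where two different speakers talk at once."""
    overlaps = []
    for i, (speaker_a, start_a, end_a) in enumerate(turns):
        for speaker_b, start_b, end_b in turns[i + 1:]:
            if speaker_a == speaker_b:
                continue  # only different speakers can talk over each other
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # the two intervals share a non-empty range
                overlaps.append((start, end))
    return sorted(overlaps)


turns = [("A", 0.0, 5.0), ("B", 4.2, 9.0), ("A", 9.0, 12.0)]
print(find_overlaps(turns))  # [(4.2, 5.0)]
```

Any ranges it reports are good candidates for trimming or re-recording before you rely on voice changes in those segments.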

3. Subscription Requirement

Anyone can upload podcast audio and generate a storyboard with AI-powered speakers, scenes, and shots, but generating the final video podcast requires a Pro Plan subscription or above. If you’re not yet a member, consider subscribing to unlock this feature.

4. Video Length & Credits

Currently, generated videos are limited to 10 minutes in length, regardless of subscription tier.
Monitor your credit usage according to your plan; longer or more complex videos will consume additional credits.
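
A quick local check can tell you whether a recording fits within the 10-minute limit before you upload it. The sketch below uses Python's standard `wave` module, so it only handles uncompressed .wav files, and it assumes the generated video is as long as the audio. The function names are hypothetical.

```python
import wave

MAX_SECONDS = 10 * 60  # the 10-minute limit stated above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_length_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """True if the file is no longer than the limit (assumes video length equals audio length)."""
    return wav_duration_seconds(path) <= max_seconds
```

For example, `fits_length_limit("episode.wav")` returns `False` for a 12-minute recording, which tells you to trim it or split it into parts first.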

Why Choose VisionStory’s Video Podcast Feature?

1. Versatile Use Cases

Content Creators: Effortlessly add a visual element to interviews or co-hosted shows.
Marketing Teams: Promote products or host discussions that captivate audiences on social media.
Educators & Trainers: Create engaging lesson recaps or remote webinars with a more personable approach.

2. AI-Powered Editing

Save hours of manual editing and shot selection. VisionStory’s algorithms handle the technical details for you.

3. Highly Customisable

From backgrounds to voices and aspect ratios, you have full control over the final look and feel.

4. Professional Quality, Minimal Effort

Produce polished, dynamic video content without advanced editing skills or a full production team.

Transform your two-person conversations into immersive video podcasts in just a few simple steps. Thanks to VisionStory’s AI-driven technology, creating professional, visually engaging podcast episodes has never been easier!