From AI storyboard to generated video
A guide to moving from story idea to shot planning, references, image generation, and AI video creation.
TL;DR
The storyboard-to-video pipeline has 5 stages: story breakdown → reference generation → shot planning → video generation → assembly. Each stage uses different AI models, and each model has its own StarPoints (SP) cost. This guide maps the full pipeline and what each stage costs.
1. Break the story into shots
Start by dividing the story into scenes and shots. Each shot should have: shot number, description, visual reference, motion direction, estimated duration. Use DeepSeek V4 (1 SP) or Prompt Polish (3 SP) to help structure the breakdown.
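The shot fields listed above can be kept in a small record so that incomplete shots are easy to spot before any SP is spent. This is a minimal sketch: the class name, field names, and the missing_fields helper are illustrative assumptions, not a schema used by any of the tools named here.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One storyboard entry. Field names are illustrative, not a platform schema."""
    number: int
    description: str
    visual_reference: str   # path or ID of the reference image; empty until generated
    motion_direction: str
    duration_s: float       # estimated duration in seconds


def missing_fields(shot: Shot) -> list[str]:
    """Return the names of fields that are still empty or non-positive."""
    problems = []
    if not shot.description.strip():
        problems.append("description")
    if not shot.visual_reference.strip():
        problems.append("visual_reference")
    if not shot.motion_direction.strip():
        problems.append("motion_direction")
    if shot.duration_s <= 0:
        problems.append("duration_s")
    return problems


shot = Shot(
    number=1,
    description="Detective reads the letter at her desk",
    visual_reference="",  # reference image not generated yet
    motion_direction="Slow push-in on her face",
    duration_s=5,
)
print(missing_fields(shot))  # ['visual_reference']
```

A shot that still reports missing fields is not ready for the reference or video stages.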
2. Generate references and key visuals
For each shot, generate a reference image before generating video. Use Fast Image (10 SP) for drafts, Standard Image (20 SP) for approved references. These reference images become the input frames for video generation.
3. Plan the motion for each shot
For each shot, write a motion prompt describing camera and subject movement. Example: "Camera slowly pushes in on the detective's face as she reads the letter. Shallow depth of field. Noir lighting. 5 seconds."
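Writing every motion prompt in the same order (camera move, subject action, style notes, duration) makes shots easier to compare. The helper below is a hypothetical sketch of that ordering, not a format required by any video model; the function and parameter names are assumptions.

```python
def build_motion_prompt(camera: str, subject: str, style: list[str], duration_s: int) -> str:
    """Join camera move, subject action, style notes, and duration into one prompt."""
    sentences = [f"{camera} as {subject}"] + style + [f"{duration_s} seconds"]
    # Normalize each part to end with exactly one period.
    return " ".join(s.rstrip(".") + "." for s in sentences)


prompt = build_motion_prompt(
    camera="Camera slowly pushes in on the detective's face",
    subject="she reads the letter",
    style=["Shallow depth of field", "Noir lighting"],
    duration_s=5,
)
print(prompt)
```

With these inputs the helper reproduces the example prompt above.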
4. Generate video clips
Use Wan 2.7 (30 SP/5s) for initial motion tests. After validating motion direction, upgrade to Seedance 2.0 (450 SP/5s) for final output. Generate one shot at a time to maintain quality control.
5. Assemble and review
Export the generated clips and assemble them in your video editor. Compare the assembly against the original storyboard to verify shot coverage, and flag any shots that need regeneration.
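The coverage check can be done by hand, but for longer storyboards a short script helps. The sketch below assumes exported clips are named shot_01.mp4, shot_02.mp4, and so on; that naming convention is an assumption made for this example, so adjust the pattern to match your own exports.

```python
import re


def coverage_report(storyboard_shots: list[int], clip_names: list[str]) -> dict[str, list[int]]:
    """Compare planned shot numbers with the shot numbers found in clip file names."""
    pattern = re.compile(r"shot_(\d+)\.mp4$")  # assumed naming convention
    found = set()
    for name in clip_names:
        match = pattern.search(name)
        if match:
            found.add(int(match.group(1)))
    planned = set(storyboard_shots)
    return {
        "missing": sorted(planned - found),      # planned but not exported
        "unexpected": sorted(found - planned),   # exported but not in the storyboard
    }


report = coverage_report(
    [1, 2, 3, 4, 5],
    ["shot_01.mp4", "shot_02.mp4", "shot_04.mp4", "shot_07.mp4"],
)
print(report)  # {'missing': [3, 5], 'unexpected': [7]}
```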
Cost estimate for a 5-shot storyboard
Story breakdown (text): 5 SP
Reference images (5 × 20 SP): 100 SP
Motion tests (5 shots × 2 attempts × 30 SP): 300 SP
Final video (5 × 450 SP): 2,250 SP
Total: 2,655 SP
A Pro plan (3,000 SP/month) covers one 5-shot storyboard per month, leaving about 345 SP for draft images and regenerated shots, which this estimate does not include.
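The same arithmetic can be parameterized by shot count to budget larger storyboards. The prices below are the ones quoted in this guide; the constant and function names are illustrative, and the sketch assumes the story breakdown stays at a flat 5 SP regardless of shot count.

```python
STORY_BREAKDOWN_SP = 5       # flat text cost (assumed not to scale with shots)
REFERENCE_IMAGE_SP = 20      # Standard Image, per shot
MOTION_TEST_SP = 30          # Wan 2.7, per 5-second test
MOTION_TESTS_PER_SHOT = 2
FINAL_VIDEO_SP = 450         # Seedance 2.0, per 5-second clip
PRO_PLAN_SP = 3000


def storyboard_cost(shots: int) -> dict[str, int]:
    """Return the per-stage and total SP cost for a storyboard of `shots` shots."""
    costs = {
        "story_breakdown": STORY_BREAKDOWN_SP,
        "reference_images": shots * REFERENCE_IMAGE_SP,
        "motion_tests": shots * MOTION_TEST_SP * MOTION_TESTS_PER_SHOT,
        "final_video": shots * FINAL_VIDEO_SP,
    }
    costs["total"] = sum(costs.values())
    return costs


costs = storyboard_cost(5)
print(costs["total"])                # 2655
print(PRO_PLAN_SP - costs["total"])  # 345
```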
Limitations
Shot-to-shot consistency remains the biggest challenge: clothing details, lighting, and facial features may drift between shots. To reduce drift, generate all final shots in the same session using the same model and reference assets. Expect to regenerate 30-40% of shots for consistency. Video duration tops out at 15 seconds per shot for most models.
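To see what the 30-40% regeneration rate means for a budget, the sketch below applies it to the 5-shot estimate above. It assumes every regenerated shot is a final clip at the Seedance 2.0 price of 450 SP and rounds the shot count up; both are assumptions made for this example, and the names are illustrative.

```python
FINAL_VIDEO_SP = 450     # Seedance 2.0, per 5-second clip (from this guide)
BASE_TOTAL_SP = 2655     # 5-shot estimate from this guide
PRO_PLAN_SP = 3000


def regeneration_budget(shots: int, regen_percent: int) -> tuple[int, int]:
    """Return (shots to redo, SP cost), rounding the shot count up."""
    redo = -(-shots * regen_percent // 100)  # integer ceiling division
    return redo, redo * FINAL_VIDEO_SP


for percent in (30, 40):
    redo, cost = regeneration_budget(5, percent)
    total = BASE_TOTAL_SP + cost
    over = max(0, total - PRO_PLAN_SP)
    print(f"{percent}%: redo {redo} shots, +{cost} SP, total {total} SP, over Pro plan by {over} SP")
```

Under these assumptions, both ends of the range round up to 2 shots for a 5-shot storyboard, adding 900 SP and bringing the total to 3,555 SP, which is more than a single month's Pro allowance.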
