The Pipelines

How Vid works.

Two distinct pipelines — one self-serve, one managed. Both cinematic.

Pipeline 1 — Self-Serve

VID Automated Workflow

Topic → Claude script → HeyGen v3 avatar video · 1,050 cr · 20–45 min

01

Brief — Topic Input

Topic field: Anything from a product name to a full content brief — Claude fills the gaps
Style selector: Professional, Casual, Educational, or News broadcast tone
Avatar picker: Choose from the HeyGen v3 public avatar library — 50+ photorealistic presenters
Voice selector: 100+ voices by language, gender, and character — or use the avatar default
Credit check: 1,050 credits reserved upfront — refunded automatically on failure
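
The credit flow in this step can be sketched as a reserve-then-settle ledger: the 1,050 credits are held upfront, refunded on failure, and only captured on success. The class and method names here are illustrative, not Vid's actual API:

```typescript
// Sketch of the upfront credit reservation used by the self-serve pipeline.
// CreditLedger, reserve(), and settle() are hypothetical names.
const RENDER_COST = 1050;

class CreditLedger {
  constructor(public balance: number, private reserved = 0) {}

  // Reserve credits before the render starts; throws if the balance is short.
  reserve(amount: number): void {
    if (this.balance - this.reserved < amount) {
      throw new Error("insufficient credits");
    }
    this.reserved += amount;
  }

  // On completion: deduct only on success, release the hold on failure.
  settle(amount: number, success: boolean): void {
    this.reserved -= amount;
    if (success) this.balance -= amount;
  }
}
```

A failed render therefore never touches the visible balance — only the temporary hold.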
02

Script — Claude Sonnet 4.6

Research pass: Claude builds context on the topic — facts, angles, structure
Structured JSON output: Script sections with titles, body text, lower-third cues, and a duration estimate
Style blocks: Each style maps to a visual directive injected into the HeyGen prompt (e.g. GEOMETRIC BOLD, RED WIRE, SWISS PULSE)
Duration target: Word count → speech rate → seconds estimate. The full script is passed to HeyGen as creative direction, not verbatim lines
CTA generation: A clear call to action at the end of every script
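
The duration target above reduces to a pure word-count → seconds estimate. A minimal sketch, assuming a typical presenter pace of 150 words per minute (the rate is an assumption, not a confirmed Vid constant):

```typescript
// Word count → speech rate → seconds, as in the script duration target.
// 150 wpm is an assumed average speaking pace, not Vid's actual constant.
const WORDS_PER_MINUTE = 150;

function estimateSeconds(script: string): number {
  const words = script.trim().split(/\s+/).filter(Boolean).length;
  return Math.round((words / WORDS_PER_MINUTE) * 60);
}
```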
03

Render — HeyGen v3 Video Agent

POST /v3/video-agents: Prompt-based — not slide-by-slide. HeyGen's AI director interprets the full script and style block
B-roll + motion graphics: HeyGen adds stock footage, animated counters, and transitions automatically based on content
Two-step polling: Session ID → video_id assigned → video completed. Wall time 20–45 min; 540 polls × 5 s = 45 min max
Landscape 1080p: 16:9 output, production quality, downloadable MP4
Credits on completion: 1,050 credits deducted only after a successful render — no charge on failure
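
A minimal sketch of the two-step polling loop, with hypothetical getSession/getVideo callbacks standing in for the real HeyGen status endpoints (540 polls at 5 s each gives the 45-minute cap):

```typescript
// Two-step poll: first wait for a video_id on the session, then poll that
// video until completed. getSession/getVideo are stand-ins, not HeyGen's API.
type Session = { video_id?: string };
type Video = { status: "processing" | "completed" | "failed"; url?: string };

const MAX_POLLS = 540;
const INTERVAL_MS = 5000; // 540 × 5 s = 45 min max

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function pollRender(
  getSession: () => Promise<Session>,
  getVideo: (id: string) => Promise<Video>,
  intervalMs: number = INTERVAL_MS,
): Promise<string> {
  let videoId: string | undefined;
  for (let i = 0; i < MAX_POLLS; i++) {
    if (!videoId) {
      videoId = (await getSession()).video_id;   // step 1: session → video_id
    } else {
      const v = await getVideo(videoId);         // step 2: video_id → completed
      if (v.status === "completed" && v.url) return v.url;
      if (v.status === "failed") throw new Error("render failed");
    }
    await sleep(intervalMs);
  }
  throw new Error("timed out after 45 min");
}
```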
Pipeline 2 — Managed Service

Catalog Video Pipeline

Still image → Seedance motion → multi-shot montage · Fashion & editorial · From €499/run

01

Anchor Image — Visual Foundation

FLUX.1 Dev / Pro: Text-to-image with portrait_4_3 format, 28 inference steps
HY-WU Try-On: Virtual garment try-on for fashion catalog accuracy
IP-Adapter: Face-identity-preserving generation from a model anchor
Visual Director: Claude builds a structured 10-field prompt from rough direction
MIRE mapping: 7 layers — Environment, Light, Colour, Props, Pose, Composition, Mood Tags
Character sheet: 4-angle reference grid (front, 3/4, profile, over-shoulder) for consistency locking
Realism Phase 1: Media type profile injected at the image stage — VHS, 16mm, documentary, etc.
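
One way the Visual Director's MIRE layers could be flattened into a single labeled image prompt — the layer names come from the MIRE mapping above, while the "Label: value" assembly format is an assumption, not the actual prompt template:

```typescript
// Sketch: fold the seven MIRE layers into one labeled FLUX prompt string.
// Layer names are from the MIRE mapping; the assembly format is assumed.
interface MireLayers {
  environment: string;
  light: string;
  colour: string;
  props: string;
  pose: string;
  composition: string;
  moodTags: string[];
}

function buildImagePrompt(subject: string, m: MireLayers): string {
  return [
    subject,
    `Environment: ${m.environment}`,
    `Light: ${m.light}`,
    `Colour: ${m.colour}`,
    `Props: ${m.props}`,
    `Pose: ${m.pose}`,
    `Composition: ${m.composition}`,
    `Mood: ${m.moodTags.join(", ")}`,
  ].join(". ");
}
```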
02

Image → Video — Motion Layer

Seedance: Image-anchored video generation, consistency 65, duration 4–15 s
34 Camera Movements: 8 categories — static, horizontal, vertical, depth, circular, aerial, handheld, complex
suggestMovement(): MIRE emotional intent → closest camera movement via keyword matching
ElevenLabs dialogue: Detects quoted speech → extracts lines → generates audio → passes @Audio1 to Seedance
Async 202 pattern: Returns immediately, Seedance runs fire-and-forget, client polls /video-status every 5 s
Golden Negative Prompts: No subtitles, no music, no text overlays, no title cards, no watermarks
Realism Phase 2–3: Composition rules — close-up fills 60%+ of the frame, max 3 subjects, simple continuous motion
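
A minimal sketch of suggestMovement() as keyword matching: score each movement's keyword list against the MIRE mood text and return the best hit. The four-entry table here is illustrative, standing in for the real 34-movement catalog:

```typescript
// Sketch of suggestMovement(): MIRE mood text → best-scoring camera movement.
// The keyword table is illustrative; the real catalog has 34 movements.
const MOVEMENT_KEYWORDS: Record<string, string[]> = {
  "slow push-in":   ["intimate", "tension", "focus"],
  "orbit":          ["reveal", "dynamic", "energy"],
  "handheld drift": ["raw", "documentary", "authentic"],
  "static":         ["calm", "still", "minimal"],
};

function suggestMovement(mood: string): string {
  const words = mood.toLowerCase();
  let best = "static"; // safe default when nothing matches
  let bestScore = 0;
  for (const [movement, keywords] of Object.entries(MOVEMENT_KEYWORDS)) {
    const score = keywords.filter((k) => words.includes(k)).length;
    if (score > bestScore) {
      best = movement;
      bestScore = score;
    }
  }
  return best;
}
```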
03

Multi-Shot Montage — Cinematic Assembly

Shot list scripting: Claude scripts an ELS→LS→MLS→MS→MCU→CU→ECU→LA→HA sequence with hero shot marking (★ HERO)
Camera vocabulary: All 34 movements injected into the system prompt — Claude picks one per shot
MIRE per-shot hints: Cinematography, context, style, and audio atmosphere injected as labeled fields
Seedance parallel: Each shot rendered independently with a shared Image1 identity anchor
Flat prompt assembly: Shots joined with [Shot cut] separators into a single Seedance-ready prompt
Grid shot picker: Select any subset of 9 shot types — FLUX generates them in parallel
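
The flat prompt assembly step can be sketched like this: per-shot descriptions joined with [Shot cut] separators into one Seedance-ready string, with the hero shot marked ★ HERO. The per-shot field layout is an assumption:

```typescript
// Sketch of flat prompt assembly. The [Shot cut] separator and ★ HERO mark
// come from the list above; the per-shot "type | movement | description"
// layout is an illustrative assumption.
interface Shot {
  type: string;        // e.g. "ELS", "MS", "CU"
  movement: string;    // one of the 34 camera movements
  description: string;
  hero?: boolean;
}

function assembleFlatPrompt(shots: Shot[]): string {
  return shots
    .map((s) => `${s.type}${s.hero ? " ★ HERO" : ""} | ${s.movement} | ${s.description}`)
    .join(" [Shot cut] ");
}
```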

Sound Studio

Suno AI music generation — descriptive or custom lyrics + tags + title

MIRE auto-style: mood profile → musical genre tags automatically

buildMusicPrompt() → full sentence from environment + light + colour + mood

Waveform visualizer via Web Audio API OfflineAudioContext

A/B comparison — both Suno tracks shown side by side

Extend to match video duration — repeats from start via /sound/extend

Brand Sound library — job-level signature track stored in metadata

Attach to scene → audio_url travels with unit into video pipeline
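
The Sound Studio prompt helpers above can be sketched as a mood → genre-tag lookup (MIRE auto-style) feeding buildMusicPrompt(); the tag table and sentence wording are illustrative assumptions, not the real mappings:

```typescript
// Sketch of MIRE auto-style plus buildMusicPrompt(): mood profile → genre
// tags, then one descriptive sentence for Suno. Table and wording are assumed.
const MOOD_TO_TAGS: Record<string, string[]> = {
  melancholic: ["ambient", "downtempo"],
  energetic:   ["electronic", "driving beat"],
  elegant:     ["neo-classical", "minimal piano"],
};

function autoStyleTags(mood: string): string[] {
  return MOOD_TO_TAGS[mood] ?? ["cinematic"]; // generic fallback
}

function buildMusicPrompt(env: string, light: string, colour: string, mood: string): string {
  const tags = autoStyleTags(mood).join(", ");
  return `A ${mood} ${tags} track for a ${env} scene with ${light} light and a ${colour} palette.`;
}
```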

Skin Studio

Magnific AI (via Freepik) — creative and precision upscale modes

Engines: Illusio (photorealistic), Sharpy (high detail), Sparkle (editorial)

5 presets: Glass Skin, Natural Editorial, High Fashion, Authentic Raw, Dewy Commercial

4 sliders: Pore Texture (creativity), Definition HDR (hdr), Fidelity (resemblance), Depth Detail (fractality)

Skin character: Finish × Undertone × Pore Preset × Condition

Toggles: SSS Glow, Authenticity Seeds, 2-Pass Pipeline, Set as Approved

Before/After split viewer with download link
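
One way a Skin Studio preset could resolve to the four slider parameters. The preset names and slider-to-parameter mapping come from the list above, but the numeric values are placeholder assumptions, not real tunings:

```typescript
// Sketch: preset name → the four Magnific slider parameters. Preset names
// are from the list above; the numbers are illustrative placeholders.
interface UpscaleParams {
  creativity: number;  // Pore Texture
  hdr: number;         // Definition HDR
  resemblance: number; // Fidelity
  fractality: number;  // Depth Detail
}

const PRESETS: Record<string, UpscaleParams> = {
  "Glass Skin":        { creativity: 2, hdr: 1, resemblance: 8, fractality: 1 },
  "Natural Editorial": { creativity: 3, hdr: 3, resemblance: 7, fractality: 3 },
  "High Fashion":      { creativity: 5, hdr: 4, resemblance: 6, fractality: 4 },
  "Authentic Raw":     { creativity: 4, hdr: 2, resemblance: 9, fractality: 6 },
  "Dewy Commercial":   { creativity: 3, hdr: 5, resemblance: 7, fractality: 2 },
};

function resolvePreset(name: string): UpscaleParams {
  const p = PRESETS[name];
  if (!p) throw new Error(`unknown preset: ${name}`);
  return p;
}
```

The sliders can then still be adjusted individually after a preset is applied.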

Studio Munich Assistant