Cinema Motion offers multiple models for generating video from a text prompt or a reference image. They differ in maximum resolution, duration, and which controls appear in the interface. Some generate native audio alongside video. WAN 2.2 supports LoRA training for consistent characters. Kling O1 offers four distinct generation modes, including video editing.

Model overview
| Model | Resolution | Duration | Audio | Unique controls |
|---|---|---|---|---|
| LTX 2.3 | 720p | 5s | — | — |
| Hailuo 02 | 768p | 6s | — | — |
| Seedance 2.0 | 1080p | ~15s | ✓ | Inline reference prompting, multimodal input (text, image, audio, video) |
| Seedance Pro | 720p | 5s | — | — |
| WAN 2.2 | 720p | 5s | — | LoRA Library, Train Avatar |
| WAN 2.6 | 720p | 5s | ✓ | Video mode, Single/Multi-shot |
| Kling O1 | — | 5s | — | 4 modes: Text / Image / Reference / Edit |
| Runway Gen 4.5 | — | 5s | — | — |
| Runway Gen-4 | — | 5s | — | — |
| Runway Gen-4 Aleph | 720p | — | — | — |
| Kling 2.6 Pro | — | 5s | ✓ | Natural motion mode |
| Kling 3.0 | Standard | 7s | ✓ | — |
| Veo 3 Fast | 1080p | 8s | ✓ | — |
| Veo 3.1 Fast | 1080p | 8s | ✓ | — |
| Veo 3 | 1080p | 8s | ✓ | — |
| Veo 3.1 | 1080p | 6s | ✓ | References input |
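
When choosing a model, it can help to treat the table as data and filter it. The Python sketch below transcribes a few rows by hand and filters on audio, resolution, and duration. It is purely illustrative: Cinema Motion is driven through its interface, and the `VideoModel` class and `find_models` helper are hypothetical names, not part of any Cinema Motion API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoModel:
    """One row of the overview table (hypothetical structure)."""
    name: str
    resolution: Optional[str]  # None where the table shows a dash
    duration_s: Optional[int]  # None where the table shows a dash
    audio: bool


# A hand-transcribed subset of the overview table.
MODELS = [
    VideoModel("LTX 2.3", "720p", 5, False),
    VideoModel("Seedance 2.0", "1080p", 15, True),  # table lists ~15s
    VideoModel("WAN 2.2", "720p", 5, False),
    VideoModel("WAN 2.6", "720p", 5, True),
    VideoModel("Kling O1", None, 5, False),
    VideoModel("Kling 2.6 Pro", None, 5, True),
    VideoModel("Veo 3", "1080p", 8, True),
    VideoModel("Veo 3.1", "1080p", 6, True),
]


def find_models(models, audio=None, resolution=None, min_duration_s=0):
    """Return the names of models that match every filter that is set."""
    matches = []
    for m in models:
        if audio is not None and m.audio != audio:
            continue
        if resolution is not None and m.resolution != resolution:
            continue
        # Models with no listed duration are treated as 0 seconds.
        if (m.duration_s or 0) < min_duration_s:
            continue
        matches.append(m.name)
    return matches


# Models in this subset that offer native audio at 1080p.
print(find_models(MODELS, audio=True, resolution="1080p"))
```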

WAN 2.2 — trainable model
WAN 2.2 supports LoRA training in Cinema Motion. You can teach it a specific person or character by uploading 15–30 photos, then apply that LoRA directly in video generation — keeping your character consistent without generating an intermediate image first.

Kling O1 — four generation modes
Kling O1 has a mode selector in the control bar that switches between four distinct workflows:
- Text — generates video from a prompt only. The default mode.
- Image — animates a starting image. Shows Start Frame and End Frame inputs. Aspect ratio is determined by the uploaded image, not by a selector.
- Reference — uses existing footage as a style or composition guide without animating it directly. Shows two separate inputs: Video (a single reference video) and References (one or more images).
- Edit — edits an existing video based on your prompt. Shows a Video input and an Original audio control (Remove or Keep). Duration and aspect ratio are inherited from the source video.
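
The four modes and the inputs each one shows can be summarized as a small lookup. This Python sketch is illustrative only: `KLING_O1_MODES` and `describe_mode` are hypothetical names that mirror the descriptions above, not a Cinema Motion API.

```python
# Inputs each Kling O1 mode shows besides the prompt, per the descriptions above.
# Hypothetical structure for illustration; not a Cinema Motion API.
KLING_O1_MODES = {
    "Text": [],
    "Image": ["Start Frame", "End Frame"],
    "Reference": ["Video", "References"],
    "Edit": ["Video", "Original audio"],
}

# Modes where a setting is taken from the uploaded media rather than a selector.
ASPECT_RATIO_FROM_SOURCE = {"Image", "Edit"}
DURATION_FROM_SOURCE = {"Edit"}


def describe_mode(mode):
    """Summarize which inputs a mode shows and which settings it inherits."""
    if mode not in KLING_O1_MODES:
        raise ValueError(f"unknown Kling O1 mode: {mode!r}")
    return {
        "inputs": KLING_O1_MODES[mode],
        "aspect_ratio_from_source": mode in ASPECT_RATIO_FROM_SOURCE,
        "duration_from_source": mode in DURATION_FROM_SOURCE,
    }


print(describe_mode("Edit"))
```

Keeping the inherited settings in separate sets makes it easy to see that only Edit takes its duration from the source, while both Image and Edit take their aspect ratio from the uploaded media.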

Models with native audio
Several models generate audio (music, ambience, or speech) alongside the video. For these models, an Audio toggle appears in the control bar.
- Veo 3.1, Veo 3.1 Fast — 1080p output
- Veo 3, Veo 3 Fast — 1080p output, audio toggle
- Kling 3.0 — Standard resolution, audio toggle
- Kling 2.6 Pro — Natural motion mode
- WAN 2.6 — adds Video mode for image-to-video
- Seedance 2.0 — 1080p output, multimodal input (text, image, audio, video)

Veo 3.1 — References alongside Start Frame
Veo 3.1 adds a References input next to Start Frame. References lets you attach multiple images to guide the visual style of the output, while Start Frame controls the opening frame of the video.

WAN 2.6 — multi-shot sequences
WAN 2.6 adds a Single/Multi-shot selector. Multi-shot generates a sequence of connected clips from a single prompt, useful for longer narratives. A Video mode button also appears for image-to-video workflows.

Switch your model
Open the model selector in the top-left corner of Cinema Motion. WAN 2.2 shows a Trainable badge. Selecting any model immediately updates the controls in the bottom bar.