Video generation models

Updated April 23, 2026

Cinema Motion gives you multiple models to generate video from a text prompt or a reference image. They vary in maximum resolution, duration, and what controls appear in the interface. Some generate native audio alongside video. WAN 2.2 supports LoRA training for consistent characters. Kling O1 offers four distinct generation modes, including video editing.

Cinema Motion interface with the model selector in the top-left corner

Model overview

Model | Resolution | Duration | Audio | Unique controls
LTX 2.3 | 720p | 5s | — | —
Hailuo 02 | 768p | 6s | — | —
Seedance 2.0 | 1080p | ~15s | — | Inline reference prompting, multimodal input (text, image, audio, video)
Seedance Pro | 720p | 5s | — | —
WAN 2.2 | 720p | 5s | — | LoRA Library, Train Avatar
WAN 2.6 | 720p | 5s | Yes | Video mode, Single/Multi-shot
Kling O1 | — | 5s | — | 4 modes: Text / Image / Reference / Edit
Runway Gen 4.5 | — | 5s | — | —
Runway Gen-4 | — | 5s | — | —
Runway Gen-4 Aleph | 720p | — | — | —
Kling 2.6 Pro | — | 5s | Yes | Natural motion mode
Kling 3.0 | Standard | 7s | Yes | —
Veo 3 Fast | 1080p | 8s | Yes | —
Veo 3.1 Fast | 1080p | 8s | Yes | —
Veo 3 | 1080p | 8s | Yes | —
Veo 3.1 | 1080p | 6s | Yes | References input
The model list changes as new models are added or retired. Always check the model selector in Cinema Motion for the current selection.
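To compare models programmatically, the capability table above can be expressed as plain data. This is an illustrative sketch only: the dictionary below mirrors a subset of the table (models with unknown fields are omitted), and `models_with` is a hypothetical helper, not part of any Cinema Motion API.

```python
# A subset of the capability table above, as data. Audio flags follow the
# "Models with native audio" section; field names are illustrative.
MODELS = {
    "LTX 2.3":      {"resolution": "720p",  "duration_s": 5,  "audio": False},
    "Hailuo 02":    {"resolution": "768p",  "duration_s": 6,  "audio": False},
    "WAN 2.2":      {"resolution": "720p",  "duration_s": 5,  "audio": False},
    "WAN 2.6":      {"resolution": "720p",  "duration_s": 5,  "audio": True},
    "Veo 3":        {"resolution": "1080p", "duration_s": 8,  "audio": True},
    "Veo 3.1":      {"resolution": "1080p", "duration_s": 6,  "audio": True},
}

def models_with(resolution=None, audio=None):
    """Return names of models matching the given constraints (None = any)."""
    out = []
    for name, caps in MODELS.items():
        if resolution is not None and caps["resolution"] != resolution:
            continue
        if audio is not None and caps["audio"] != audio:
            continue
        out.append(name)
    return out
```

For example, `models_with(resolution="1080p", audio=True)` narrows the list to the Veo entries in this subset.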

WAN 2.2 — trainable model

WAN 2.2 supports LoRA training in Cinema Motion. You can teach it a specific person or character by uploading 15–30 photos, then apply that LoRA directly in video generation — keeping your character consistent without generating an intermediate image first.

When you select WAN 2.2, two extra controls appear: LoRA Library (to pick a trained character) and a Train Avatar toggle in the top-right corner (to start a new training session). See Train an avatar LoRA for the full workflow.
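A pre-flight check for a training session can be sketched as follows. The 15–30 photo range comes from the text above; the function name and the payload shape are hypothetical, not a real Cinema Motion API.

```python
# Hypothetical validation before starting a WAN 2.2 avatar training session.
def validate_training_set(photo_paths):
    """Check the upload count against the documented 15-30 photo range."""
    n = len(photo_paths)
    if not 15 <= n <= 30:
        raise ValueError(f"WAN 2.2 LoRA training expects 15-30 photos, got {n}")
    # Illustrative payload shape for a training request.
    return {"model": "WAN 2.2", "mode": "train_avatar", "photos": photo_paths}
```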

Kling O1 — four generation modes

Kling O1 has a mode selector in the control bar that switches between four distinct workflows:

Text — generates video from a prompt only. The default mode.

Image — animates a starting image. Shows Start Frame and End Frame inputs. Aspect ratio is determined by the uploaded image, not by a selector.

Reference — uses existing footage as a style or composition guide without animating it directly. Shows two separate inputs: Video (a single reference video) and References (one or more images).

Edit — edits an existing video based on your prompt. Shows a Video input and an Original audio control (Remove or Keep). Duration and aspect ratio are inherited from the source video.
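The mode-to-controls mapping above can be summarized as a small lookup. Control names match the article; the dictionary and helper are illustrative, not taken from the product's code.

```python
# Which inputs appear in the Kling O1 control bar for each mode,
# per the four workflows described above.
KLING_O1_CONTROLS = {
    "Text":      [],                             # prompt only
    "Image":     ["Start Frame", "End Frame"],
    "Reference": ["Video", "References"],
    "Edit":      ["Video", "Original audio"],
}

def controls_for(mode):
    """Return the extra controls shown for a Kling O1 mode."""
    if mode not in KLING_O1_CONTROLS:
        raise ValueError(f"Unknown Kling O1 mode: {mode}")
    return KLING_O1_CONTROLS[mode]
```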

Kling O1 in Image mode — Start Frame and End Frame controls

Models with native audio

Several models generate audio — music, ambience, or speech — alongside the video. An Audio toggle appears in the control bar.

  • Veo 3.1, Veo 3.1 Fast — 1080p output
  • Veo 3, Veo 3 Fast — 1080p output
  • Kling 3.0 — Standard resolution
  • Kling 2.6 Pro — Natural motion mode
  • WAN 2.6 — also adds a Video mode for image-to-video

If you don't describe any sounds in your prompt, the model generates audio based on the visual context of the scene. Add specific descriptions — "gentle rain", "crowd cheering", "upbeat music" — for more precise results.

Veo 3.1 — References instead of Start Frame

Veo 3.1 adds a References input alongside Start Frame. Attach multiple images as References to guide the visual style of the output, while Start Frame controls the opening frame of the video.

WAN 2.6 — multi-shot sequences

WAN 2.6 adds a Single/Multi-shot selector. Multi-shot generates a sequence of connected clips from a single prompt, useful for longer narratives. A Video mode button also appears for image-to-video workflows.

Switch your model

Open the model selector in the top-left corner of Cinema Motion. WAN 2.2 shows a Trainable badge. Selecting any model immediately updates the controls in the bottom bar.

Start with LTX 2.3 for general use. Switch to the Veo series for 1080p or audio output, to WAN 2.2 for trained character consistency, and to Kling O1 when you need to edit an existing video.
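The guidance above can be written as a small decision helper. This is an illustrative sketch of the routing logic only; the function and its parameters are hypothetical.

```python
# Hypothetical model-routing helper implementing the closing guidance.
def pick_model(need_1080p=False, need_audio=False,
               trained_character=False, edit_existing_video=False):
    """Suggest a starting model for a Cinema Motion generation."""
    if edit_existing_video:
        return "Kling O1"          # Edit mode works on existing footage
    if trained_character:
        return "WAN 2.2"           # supports LoRA-trained characters
    if need_1080p or need_audio:
        return "Veo 3.1"           # Veo series: 1080p and native audio
    return "LTX 2.3"               # default for general use
```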