Video generation models

Updated April 23, 2026

Cinema Motion gives you multiple models to generate video from a text prompt or a reference image. They vary in maximum resolution, duration, and what controls appear in the interface. Some generate native audio alongside video. WAN 2.2 supports LoRA training for consistent characters. Kling O1 offers four distinct generation modes, including video editing.

Cinema Motion interface with the model selector in the top-left corner

Model overview

Model | Resolution | Duration | Audio | Unique controls
LTX 2.3 | 720p | 5s | — | —
Hailuo 02 | 768p | 6s | — | —
Seedance 2.0 | 1080p | ~15s | — | Inline reference prompting, multimodal input (text, image, audio, video)
Seedance Pro | 720p | 5s | — | —
WAN 2.2 | 720p | 5s | — | LoRA Library, Train Avatar
WAN 2.6 | 720p | 5s | Yes | Video mode, Single/Multi-shot
Kling O1 | — | 5s | — | 4 modes: Text / Image / Reference / Edit
Runway Gen 4.5 | — | 5s | — | —
Runway Gen-4 | — | 5s | — | —
Runway Gen-4 Aleph | 720p | — | — | —
Kling 2.6 Pro | — | 5s | Yes | Natural motion mode
Kling 3.0 | Standard | 7s | Yes | —
Veo 3 Fast | 1080p | 8s | Yes | —
Veo 3.1 Fast | 1080p | 8s | Yes | —
Veo 3 | 1080p | 8s | Yes | —
Veo 3.1 | 1080p | 6s | Yes | References input
The model list changes as new models are added or retired. Always check the model selector in Cinema Motion for the current selection.
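To compare models programmatically, the capability table above can be expressed as plain data. This is an illustrative sketch only: the dictionary below mirrors a subset of the table (models with unknown fields are omitted), and `models_with` is a hypothetical helper, not part of any Cinema Motion API.

```python
# A subset of the capability table above, as data. Audio flags follow the
# "Models with native audio" section; field names are illustrative.
MODELS = {
    "LTX 2.3":      {"resolution": "720p",  "duration_s": 5,  "audio": False},
    "Hailuo 02":    {"resolution": "768p",  "duration_s": 6,  "audio": False},
    "WAN 2.2":      {"resolution": "720p",  "duration_s": 5,  "audio": False},
    "WAN 2.6":      {"resolution": "720p",  "duration_s": 5,  "audio": True},
    "Veo 3":        {"resolution": "1080p", "duration_s": 8,  "audio": True},
    "Veo 3.1":      {"resolution": "1080p", "duration_s": 6,  "audio": True},
}

def models_with(resolution=None, audio=None):
    """Return names of models matching the given constraints (None = any)."""
    out = []
    for name, caps in MODELS.items():
        if resolution is not None and caps["resolution"] != resolution:
            continue
        if audio is not None and caps["audio"] != audio:
            continue
        out.append(name)
    return out
```

For example, `models_with(resolution="1080p", audio=True)` narrows the list to the Veo entries in this subset.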

WAN 2.2 — trainable model

WAN 2.2 supports LoRA training in Cinema Motion. You can teach it a specific person or character by uploading 15–30 photos, then apply that LoRA directly in video generation — keeping your character consistent without generating an intermediate image first.

When you select WAN 2.2, two extra controls appear: LoRA Library (to pick a trained character) and a Train Avatar toggle in the top-right corner (to start a new training session). See Train an avatar LoRA for the full workflow.
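A pre-flight check for a training session can be sketched as follows. The 15–30 photo range comes from the text above; the function name and the payload shape are hypothetical, not a real Cinema Motion API.

```python
# Hypothetical validation before starting a WAN 2.2 avatar training session.
def validate_training_set(photo_paths):
    """Check the upload count against the documented 15-30 photo range."""
    n = len(photo_paths)
    if not 15 <= n <= 30:
        raise ValueError(f"WAN 2.2 LoRA training expects 15-30 photos, got {n}")
    # Illustrative payload shape for a training request.
    return {"model": "WAN 2.2", "mode": "train_avatar", "photos": photo_paths}
```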

Kling O1 — four generation modes

Kling O1 has a mode selector in the control bar that switches between four distinct workflows:

Text — generates video from a prompt only. The default mode.

Image — animates a starting image. Shows Start Frame and End Frame inputs. Aspect ratio is determined by the uploaded image, not by a selector.

Reference — uses existing footage as a style or composition guide without animating it directly. Shows two separate inputs: Video (a single reference video) and References (one or more images).

Edit — edits an existing video based on your prompt. Shows a Video input and an Original audio control (Remove or Keep). Duration and aspect ratio are inherited from the source video.
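The mode-to-controls mapping above can be summarized as a small lookup. Control names match the article; the dictionary and helper are illustrative, not taken from the product's code.

```python
# Which inputs appear in the Kling O1 control bar for each mode,
# per the four workflows described above.
KLING_O1_CONTROLS = {
    "Text":      [],                             # prompt only
    "Image":     ["Start Frame", "End Frame"],
    "Reference": ["Video", "References"],
    "Edit":      ["Video", "Original audio"],
}

def controls_for(mode):
    """Return the extra controls shown for a Kling O1 mode."""
    if mode not in KLING_O1_CONTROLS:
        raise ValueError(f"Unknown Kling O1 mode: {mode}")
    return KLING_O1_CONTROLS[mode]
```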

Kling O1 in Image mode — Start Frame and End Frame controls

Models with native audio

Several models generate audio — music, ambience, or speech — alongside the video. An Audio toggle appears in the control bar.

  • Veo 3.1, Veo 3.1 Fast — 1080p output
  • Veo 3, Veo 3 Fast — 1080p output
  • Kling 3.0 — Standard resolution
  • Kling 2.6 Pro — Natural motion mode
  • WAN 2.6 — also adds a Video mode for image-to-video

If you don't describe any sounds in your prompt, the model generates audio based on the visual context of the scene. Add specific descriptions — "gentle rain", "crowd cheering", "upbeat music" — for more precise results.

Veo 3.1 — References instead of Start Frame

Veo 3.1 adds a References input alongside Start Frame. Attach multiple images as References to guide the visual style of the output, while Start Frame controls the opening frame of the video.

WAN 2.6 — multi-shot sequences

WAN 2.6 adds a Single/Multi-shot selector. Multi-shot generates a sequence of connected clips from a single prompt, useful for longer narratives. A Video mode button also appears for image-to-video workflows.

Switch your model

Open the model selector in the top-left corner of Cinema Motion. WAN 2.2 shows a Trainable badge. Selecting any model immediately updates the controls in the bottom bar.

Start with LTX 2.3 for general use. Switch to the Veo series for 1080p or audio output, to WAN 2.2 for trained character consistency, and to Kling O1 when you need to edit an existing video.
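The guidance above can be written as a small decision helper. This is an illustrative sketch of the routing logic only; the function and its parameters are hypothetical.

```python
# Hypothetical model-routing helper implementing the closing guidance.
def pick_model(need_1080p=False, need_audio=False,
               trained_character=False, edit_existing_video=False):
    """Suggest a starting model for a Cinema Motion generation."""
    if edit_existing_video:
        return "Kling O1"          # Edit mode works on existing footage
    if trained_character:
        return "WAN 2.2"           # supports LoRA-trained characters
    if need_1080p or need_audio:
        return "Veo 3.1"           # Veo series: 1080p and native audio
    return "LTX 2.3"               # default for general use
```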