Model Parameter Documentation

Text to Video

CogVideoX 5b

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
model string "cogvideox-5b"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
height int 480 The height in pixels of the generated video.
width int 720 The width in pixels of the generated video.
num_frames int 48 Number of frames to generate.
num_inference_steps int 50 The number of denoising steps. More steps can improve quality but are slower.
timesteps list Custom timesteps to use for the denoising process; must be in descending order.
guidance_scale float 7.0 Classifier-Free Diffusion guidance scale. Higher values align the video more closely with the prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
negative_prompt_embeds torch.FloatTensor Pre-generated negative text embeddings, used as an alternative to the 'negative_prompt' argument.
output_type str pil The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a CogVideoXPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 226 Maximum sequence length in the encoded prompt.
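
As a usage sketch, these arguments can be assembled into a single keyword payload. The parameter names and defaults below come from the tables above; `generate` is a placeholder for however your application invokes the framework, not a documented entry point.

```python
# Illustrative payload for cogvideox-5b text-to-video. Parameter names follow
# the tables above; `generate` is a placeholder, not the framework's real API.
payload = {
    "prompt": "A golden retriever runs through a sunlit meadow, slow motion",
    "model": "cogvideox-5b",
    "negative_prompt": "blurry, distorted, low quality",
    "height": 480,
    "width": 720,
    "num_frames": 48,
    "num_inference_steps": 50,
    "guidance_scale": 7.0,
}
# video = generate(**payload)
```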

Pusa V1

Because Pusa V1 is relatively new, its required and optional arguments are not documented in one central location. Based on its published examples, we have added some parameters to the optional arguments section.

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
model string "pusa-v1"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation.

Wan 2.1 Text to Video 14b

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
model string "wan2.1-t2v-14b"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.
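
Because `generator` is a standard diffusers argument, reproducible runs follow the usual seeding pattern. A minimal sketch, with an illustrative payload shape:

```python
import torch

# Seed a generator so repeated runs produce the same video. CPU generators
# are accepted everywhere; match the pipeline's device if you prefer.
generator = torch.Generator(device="cpu").manual_seed(42)

payload = {
    "prompt": "A paper boat drifts down a rainy street gutter",
    "model": "wan2.1-t2v-14b",
    "num_frames": 81,
    "guidance_scale": 5.0,
    "generator": generator,  # forwarded to the underlying diffusers pipeline
}
```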

Wan 2.1 14b Text to Video FusionX

This is a LoRA applied on top of Wan 2.1 Vace 14b. All required and optional arguments are the same as for Wan 2.1 Vace 14b, except the model string.

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
model string "wan2.1-14b-t2v-fusionx"

Wan 2.1 Vace 14b

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
model string "wan2.1-vace-14b"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
conditioning_scale float 1.0 The scale applied to the control conditioning latent stream. Can be a float, List[float], or torch.Tensor.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.
flow_shift float 3.0 A value that estimates motion between two frames. A larger flow shift focuses on high motion or transformation. A smaller flow shift focuses on stability.
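
The two callback arguments follow the standard diffusers contract: the function receives the pipeline, the step index, the current timestep, and a dict containing the tensors named in `callback_on_step_end_tensor_inputs`, and must return that dict. A minimal progress logger, with an illustrative payload shape:

```python
def log_step(pipeline, step_index, timestep, callback_kwargs):
    # Standard diffusers step-end callback: report progress, then return the
    # callback_kwargs dict (possibly modified) so the pipeline can continue.
    print(f"denoising step {step_index} (timestep {timestep})")
    return callback_kwargs

payload = {
    "prompt": "An origami crane unfolds and refolds itself",
    "model": "wan2.1-vace-14b",
    "callback_on_step_end": log_step,
    "callback_on_step_end_tensor_inputs": ["latents"],
}
```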

Wan 2.1 Vace 14b Phantom FusionX

This is a LoRA applied on top of Wan 2.1 Vace 14b. All required and optional arguments are the same as for Wan 2.1 Vace 14b, except the model string.

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
model string "wan2.1-vace-14b-phantom-fusionx"

Image to Video

CogVideoX 5b Image to Video

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
Note: this model only supports 720 x 480 resolution. Unlike other model implementations, we do not automatically resize the output video to match the resolution of the input image.
model string "cogvideox-5b-image-to-video"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
height int 480 The height in pixels of the generated video.
width int 720 The width in pixels of the generated video.
num_frames int 48 Number of frames to generate.
num_inference_steps int 50 The number of denoising steps. More steps can improve quality but are slower.
timesteps list Custom timesteps to use for the denoising process; must be in descending order.
guidance_scale float 7.0 Classifier-Free Diffusion guidance scale. Higher values align the video more closely with the prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
negative_prompt_embeds torch.FloatTensor Pre-generated negative text embeddings, used as an alternative to the 'negative_prompt' argument.
output_type str pil The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a CogVideoXPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 226 Maximum sequence length in the encoded prompt.
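
Because this model is fixed at 720 x 480 and the framework does not resize the input for you, resize the image before building the payload. The file names here are placeholders:

```python
from PIL import Image

# cogvideox-5b-image-to-video only supports 720 x 480 and inputs are not
# resized automatically, so resize before submitting.
image = Image.open("character.png").convert("RGB").resize((720, 480))
image.save("character_720x480.png")

payload = {
    "prompt": "The character waves and smiles",
    "image": "character_720x480.png",
    "model": "cogvideox-5b-image-to-video",
}
```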

Framepack I2V HY

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "framepack-i2v-hy"
Optional Arguments
Name Type Default Value Description
prompt_2 string "" A secondary prompt for the second text encoder; defaults to the main prompt if not provided.
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
negative_prompt_2 string "" A secondary negative prompt for the second text encoder.
height int 720 The height in pixels of the generated video.
width int 1280 The width in pixels of the generated video.
num_frames int 129 Number of frames to generate.
num_inference_steps int 50 The number of denoising steps. More steps can improve quality but are slower.
sigmas list Custom sigmas for the denoising scheduler.
true_cfg_scale float 1.0 Enables true classifier-free guidance when > 1.0.
guidance_scale float 6.0 Guidance scale to control how closely the video adheres to the prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
image_latents torch.Tensor Pre-encoded image latents, bypassing the VAE for the first image.
last_image_latents torch.Tensor Pre-encoded image latents, bypassing the VAE for the last image.
prompt_embeds torch.Tensor Pre-generated text embeddings, an alternative to 'prompt'.
pooled_prompt_embeds torch.FloatTensor Pre-generated pooled text embeddings.
negative_prompt_embeds torch.FloatTensor Pre-generated negative text embeddings, an alternative to 'negative_prompt'.
output_type str pil The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a HunyuanVideoFramepackPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
clip_skip int Number of final layers to skip from the CLIP model.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
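
Setting `true_cfg_scale` above 1.0 enables true classifier-free guidance, as noted in the table; a sketch pairing it with a negative prompt (payload shape illustrative, path a placeholder):

```python
payload = {
    "prompt": "The figure turns toward the camera and nods",
    "image": "start.png",    # placeholder path
    "model": "framepack-i2v-hy",
    "negative_prompt": "flicker, static pose, artifacts",
    "true_cfg_scale": 1.5,   # > 1.0 enables true classifier-free guidance
    "guidance_scale": 6.0,
    "num_frames": 129,
}
```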

LTX Video

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "ltx-video"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt to avoid during video generation.
height int 512 The height in pixels of the generated video.
width int 704 The width in pixels of the generated video.
num_frames int 161 Number of frames to generate.
num_inference_steps int 50 The number of denoising steps. More steps can improve quality but are slower.
timesteps list Custom timesteps for the denoising process in descending order.
guidance_scale float 3.0 Scale for classifier-free guidance.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator to make generation deterministic.
latents torch.Tensor Pre-generated noisy latents.
prompt_embeds torch.Tensor Pre-generated text embeddings, an alternative to 'prompt'.
prompt_attention_mask torch.Tensor Pre-generated attention mask for text embeddings.
negative_prompt_embeds torch.FloatTensor Pre-generated negative text embeddings.
negative_prompt_attention_mask torch.FloatTensor Pre-generated attention mask for negative text embeddings.
decode_timestep float 0.0 The timestep at which the generated video is decoded.
decode_noise_scale float None Interpolation factor between random noise and denoised latents at decode time.
output_type str pil The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a LTXPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 128 Maximum sequence length for the prompt.
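
The decode-time arguments are plain floats alongside the usual sampling controls. An illustrative payload built from this model's defaults (the decode_noise_scale value and file name are placeholders):

```python
payload = {
    "prompt": "Waves crash against a lighthouse at dusk",
    "image": "lighthouse.png",    # placeholder path
    "model": "ltx-video",
    "height": 512,
    "width": 704,
    "num_frames": 161,
    "guidance_scale": 3.0,
    "decode_timestep": 0.0,
    "decode_noise_scale": 0.025,  # illustrative: small noise blend at decode time
}
```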

Luma Ray 2

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string URL to input image. Note that Luma does not accept local files.
model string "luma-ray-2"

Pusa V1

Because Pusa V1 is relatively new, its required and optional arguments are not documented in one central location. Based on its published examples, we have added some parameters to the optional arguments section.

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "pusa-v1"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
cond_position str "0" Comma-separated list of frame indices for conditioning. You can use any position from 0 to 20.
noise_multipliers str "0.0" Comma-separated noise multipliers for the conditioning frames. A value of 0 uses the conditioning image fully clean; higher values add more noise.

For I2V, 0.2 works well, though any value from 0 to 1 is accepted.

For Start-End-Frame, 0.2,0.4 works well, though any values from 0 to 1 are accepted.
lora_alpha float 1.0 A larger alpha increases temporal consistency (i.e., generated frames stay closer to the conditioning frames) but can also suppress motion or cause collapse. We recommend a value between 1 and 2.
num_inference_steps int 30 The number of denoising steps. More steps can improve quality but are slower.
num_frames int 81 Number of frames to generate.
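
The two conditioning controls are parallel comma-separated strings: one frame index and one noise multiplier per conditioning frame. A sketch with illustrative values and placeholder paths:

```python
# I2V: condition on frame 0, lightly noised so the model can deviate slightly.
i2v_payload = {
    "prompt": "The character starts walking forward",
    "image": "start.png",      # placeholder path
    "model": "pusa-v1",
    "cond_position": "0",
    "noise_multipliers": "0.2",
    "lora_alpha": 1.5,         # recommended range is roughly 1 to 2
    "num_inference_steps": 30,
}

# Start-End-Frame style: condition on frames 0 and 20 with separate multipliers.
start_end_conditioning = {
    "cond_position": "0,20",
    "noise_multipliers": "0.2,0.4",
}
```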

Veo 2.0 Generate 001

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "veo-2.0-generate-002"
Optional Arguments
Name Type Default Value Description
negativePrompt string "" Text string that describes anything you want to discourage the model from generating.
aspectRatio str "16:9" Defines the aspect ratio of the generated videos. Accepts '16:9' (landscape) or '9:16' (portrait).
personGeneration str "allow_adult" Controls whether people or face generation is allowed. Accepts 'allow_adult' or 'disallow'.
numberOfVideos int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
durationSeconds int 8 Veo 2 only. Length of each output video in seconds, between 5 and 8.
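
Note that Veo's arguments are camelCase, unlike the snake_case arguments of the diffusers-backed models. An illustrative payload with a placeholder path:

```python
payload = {
    "prompt": "A hummingbird hovers at a red flower, macro shot",
    "image": "hummingbird.png",    # placeholder path
    "model": "veo-2.0-generate-001",
    "negativePrompt": "cartoon, low resolution",
    "aspectRatio": "16:9",
    "personGeneration": "allow_adult",
    "durationSeconds": 8,          # Veo 2 only: 5 to 8 seconds
}
```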

Wan 2.1 I2V 14b 720p

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "wan2.1-i2v-14b-720p"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
conditioning_scale float 1.0 The scale applied to the control conditioning latent stream. Can be a float, List[float], or torch.Tensor.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
negative_prompt_embeds torch.Tensor Pre-generated negative text embeddings, used as an alternative to the 'negative_prompt' argument.
image_embeds torch.Tensor Pre-generated image embeddings, used as an alternative to the 'image' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.

Wan 2.1 Vace 14b

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "wan2.1-vace-14b"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
video list The input video (List[PIL.Image.Image]) to be used as a starting point for the generation.

Note: this is created in _process_payload for you.
mask list The input mask (List[PIL.Image.Image]) that defines which video regions to condition on (black) and which to generate (white).

Note: this is created in _process_payload for you.
reference_images list A list of one or more reference images (List[PIL.Image.Image]) as extra conditioning for the generation.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.
flow_shift float 5.0 A value that estimates motion between two frames. A larger flow shift focuses on high motion or transformation. A smaller flow shift focuses on stability.
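
Reference images are passed as a list of PIL images rather than paths. A sketch with placeholder file names and an illustrative payload shape:

```python
from PIL import Image

# Extra conditioning via reference images (List[PIL.Image.Image]).
refs = [Image.open("character_sheet.png").convert("RGB")]

payload = {
    "prompt": "The character jumps over a puddle",
    "image": "start.png",    # placeholder path
    "model": "wan2.1-vace-14b",
    "reference_images": refs,
    "flow_shift": 5.0,
}
```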

Wan 2.1 Vace 14b I2V FusionX

This is a LoRA applied on top of Wan 2.1 Vace 14b. All required and optional arguments are the same as for Wan 2.1 Vace 14b, except the model string.

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Local path or URL to input image.
model string "wan2.1-vace-14b-i2v-fusionx"

Interpolate

Wan 2.1 Flf2v 14B 720p

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
first_frame string Local path or URL to first frame image.
last_frame string Local path or URL to last frame image.
model string "wan2.1-flf2v-14b-720p"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
negative_prompt_embeds torch.Tensor Pre-generated negative text embeddings, used as an alternative to the 'negative_prompt' argument.
image_embeds torch.Tensor Pre-generated image embeddings, used as an alternative to the 'image' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.
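
An interpolation request supplies both endpoint frames. The payload shape is illustrative and the file names are placeholders:

```python
payload = {
    "prompt": "A smooth transition from the seated pose to the standing pose",
    "first_frame": "pose_seated.png",
    "last_frame": "pose_standing.png",
    "model": "wan2.1-flf2v-14b-720p",
    "num_frames": 81,
    "guidance_scale": 5.0,
}
```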

Wan 2.1 Vace 14b

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
first_frame string Local path or URL to first frame image.
last_frame string Local path or URL to last frame image.
model string "wan2.1-vace-14b"
Optional Arguments
Name Type Default Value Description
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
video list The input video (List[PIL.Image.Image]) to be used as a starting point for the generation.

Note: this is created in _process_payload for you.
mask list The input mask (List[PIL.Image.Image]) that defines which video regions to condition on (black) and which to generate (white).

Note: this is created in _process_payload for you.
reference_images list A list of one or more reference images (List[PIL.Image.Image]) as extra conditioning for the generation.
conditioning_scale float 1.0 The scale applied to the control conditioning latent stream. Can be a float, List[float], or torch.Tensor.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.
flow_shift float 5.0 A value that estimates motion between two frames. A larger flow shift focuses on high motion or transformation. A smaller flow shift focuses on stability.

Framepack I2V HY

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
first_frame string Local path or URL to first frame image.
last_frame string Local path or URL to last frame image.
model string "framepack-i2v-hy"
Optional Arguments
Name Type Default Value Description
prompt_2 string "" A secondary prompt for the second text encoder; defaults to the main prompt if not provided.
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
negative_prompt_2 string "" A secondary negative prompt for the second text encoder.
height int 720 The height in pixels of the generated video.
width int 1280 The width in pixels of the generated video.
num_frames int 129 Number of frames to generate.
num_inference_steps int 50 The number of denoising steps. More steps can improve quality but are slower.
sigmas list Custom sigmas for the denoising scheduler.
true_cfg_scale float 1.0 Enables true classifier-free guidance when > 1.0.
guidance_scale float 6.0 Guidance scale to control how closely the video adheres to the prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
image_latents torch.Tensor Pre-encoded image latents, bypassing the VAE for the first image.
last_image_latents torch.Tensor Pre-encoded image latents, bypassing the VAE for the last image.
prompt_embeds torch.Tensor Pre-generated text embeddings, an alternative to 'prompt'.
pooled_prompt_embeds torch.FloatTensor Pre-generated pooled text embeddings.
negative_prompt_embeds torch.FloatTensor Pre-generated negative text embeddings, an alternative to 'negative_prompt'.
output_type str pil The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a HunyuanVideoFramepackPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
clip_skip int Number of final layers to skip from the CLIP model.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.

Luma Ray 2

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
first_frame string URL to first frame image.
last_frame string URL to last frame image.
model string "luma-ray-2"

Pose Guidance

Wan 2.1 Vace 14b

Required Arguments
Name Type Description
prompt string Text prompt to guide generation
image string Path or URL to input image
model string "wan2.1-vace-14b"
Optional Arguments
Name Type Default Value Description
guiding_video string A video whose motion guides the pose of the output video. If provided, a pose_video (List[PIL.Image.Image]) is generated from it for the output video.
pose_video string A pose skeleton video to guide the pose of the output video (List[PIL.Image.Image]).
negative_prompt string "" The prompt or prompts not to guide video generation. Ignored if guidance_scale is less than 1.
video list The input video (List[PIL.Image.Image]) to be used as a starting point for the generation.

Note: this is created in _process_payload for you.
mask list The input mask (List[PIL.Image.Image]) that defines which video regions to condition on (black) and which to generate (white).

Note: this is created in _process_payload for you.
reference_images list A list of one or more reference images (List[PIL.Image.Image]) as extra conditioning for the generation.
conditioning_scale float 1.0 The scale applied to the control conditioning latent stream. Can be a float, List[float], or torch.Tensor.
height int 480 The height in pixels of the generated video.
width int 832 The width in pixels of the generated video.
num_frames int 81 Number of frames in the generated video
num_inference_steps int 50 The number of denoising steps. More steps usually lead to higher quality at the expense of slower inference.
guidance_scale float 5.0 Guidance scale for classifier-free diffusion. Higher values encourage generation to be closely linked to the text prompt.
num_videos_per_prompt int 1 The number of videos to generate for each prompt.

Note: Tio Magic Animation Framework currently only supports 1 video output
generator torch.Generator A torch.Generator or List[torch.Generator] to make generation deterministic.
latents torch.FloatTensor Pre-generated noisy latents to be used as inputs for generation.
prompt_embeds torch.FloatTensor Pre-generated text embeddings, used as an alternative to the 'prompt' argument.
output_type str np The output format of the generated video. Choose between 'pil' or 'np.array'.
return_dict bool True Whether to return a WanPipelineOutput object instead of a plain tuple.
attention_kwargs dict A kwargs dictionary passed to the AttentionProcessor.
callback_on_step_end Callable A function called at the end of each denoising step during inference.
callback_on_step_end_tensor_inputs list The list of tensor inputs for the callback_on_step_end function.
max_sequence_length int 512 Maximum sequence length in the encoded prompt.
flow_shift float 3.0 A value that estimates motion between two frames. A larger flow shift focuses on high motion or transformation. A smaller flow shift focuses on stability.
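
Usually only one of `guiding_video` or `pose_video` is supplied: pass an ordinary video and let the framework extract the pose skeleton, or pass a pre-computed skeleton directly. A sketch with placeholder paths:

```python
# Option 1: an ordinary video; the framework derives a pose skeleton from it.
payload_from_video = {
    "prompt": "The character performs the dance routine",
    "image": "character.png",
    "model": "wan2.1-vace-14b",
    "guiding_video": "reference_dance.mp4",
}

# Option 2: a pre-computed pose-skeleton video, used as-is.
payload_from_skeleton = {
    "prompt": "The character performs the dance routine",
    "image": "character.png",
    "model": "wan2.1-vace-14b",
    "pose_video": "dance_skeleton.mp4",
}
```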