Supervised Fine-Tuning - Veri Documentation

Overview

Veri supports supervised fine-tuning (SFT) of video generation models through the `sft_video_gen` training method. It trains parameter-efficient LoRA adapters on your own video datasets rather than updating the full model weights.

Supported Models

Veri currently supports LoRA SFT for the following video generation model families:

CogVideoX (1.5B, 2B, 5B)
Wan2.1 (1.3B, 14B)
LTX Video (2B, 13B)
Mochi (1B)

Browse the full catalog in the model library and filter by the Video Gen tab.

How It Works

Video gen SFT runs the diffusers LoRA training scripts through accelerate. Instead of a reward function, you provide a dataset of example videos, and the training loop learns to reproduce them directly. Key differences from GRPO:

Aspect	GRPO	Video Gen SFT
Reward function	Required	Not used
Output	Full model checkpoint	LoRA adapter
Training method	Reinforcement learning	Supervised imitation
Use case	Language model behaviors	Video generation style/content
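
To make the "LoRA adapter" output row concrete, the sketch below compares the number of trainable weights for a single linear layer under full fine-tuning and under LoRA. The 3072-wide layer is a hypothetical example, not a dimension taken from any supported model; the rank of 64 matches the default `lora_rank`.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Weights updated when the whole d_out x d_in matrix is trained."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer width, chosen only for illustration.
D_IN = D_OUT = 3072
RANK = 64  # default lora_rank

full = full_param_count(D_IN, D_OUT)
lora = lora_param_count(D_IN, D_OUT, RANK)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# prints "full: 9,437,184  lora: 393,216  ratio: 4.17%"
```

For a square layer the ratio simplifies to 2 * rank / width, so a smaller rank shrinks the adapter proportionally.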

Hyperparameters

When creating a video gen SFT job, use method: "sft_video_gen" and configure these hyperparameters:

Parameter	Default	Description
`learning_rate`	`1e-3`	Learning rate for the optimizer.
`num_epochs`	`30`	Number of training epochs.
`max_steps`	`null`	Optional cap on the total number of training steps; `null` sets no cap.
`lora_rank`	`64`	LoRA adapter rank.
`lora_alpha`	`64`	LoRA scaling factor.
`resolution_height`	`480`	Video frame height in pixels.
`resolution_width`	`720`	Video frame width in pixels.
`num_frames`	`49`	Number of video frames per sample.
`fps`	`8`	Frames per second.
`batch_size`	`1`	Per-device batch size.
`gradient_accumulation_steps`	`4`	Gradient accumulation steps.
`seed`	`42`	Random seed.
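
Several of these values interact, so it can help to compute the derived quantities before launching a job. The following is a minimal sketch using the defaults from the table; the class name and its methods are hypothetical helpers, and the `lora_alpha / lora_rank` scaling is the conventional LoRA formulation, assumed here rather than confirmed for Veri's training scripts.

```python
from dataclasses import dataclass


@dataclass
class VideoSFTHyperparameters:
    """Hypothetical container; defaults copied from the hyperparameter table."""

    learning_rate: float = 1e-3
    num_epochs: int = 30
    lora_rank: int = 64
    lora_alpha: int = 64
    num_frames: int = 49
    fps: int = 8
    batch_size: int = 1
    gradient_accumulation_steps: int = 4

    def effective_batch_size(self, gpu_count: int = 1) -> int:
        """Samples contributing to each optimizer step."""
        return self.batch_size * self.gradient_accumulation_steps * gpu_count

    def lora_scale(self) -> float:
        """Conventional LoRA scaling alpha / rank (assumed, not confirmed for Veri)."""
        return self.lora_alpha / self.lora_rank

    def clip_seconds(self) -> float:
        """Duration of one training clip in seconds."""
        return self.num_frames / self.fps


hp = VideoSFTHyperparameters()
print(hp.effective_batch_size(gpu_count=1))  # 4
print(hp.lora_scale())                       # 1.0
print(hp.clip_seconds())                     # 6.125
```

With the defaults, each optimizer step sees 4 clips of roughly 6 seconds each on a single GPU.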

Example Request

curl -X POST https://api.veri.studio/v1/training_jobs \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "base_model": "THUDM/CogVideoX-2b",
    "dataset_id": "ds_abc123",
    "method": "sft_video_gen",
    "output_name": "cogvideo-2b-custom",
    "hyperparameters": {
      "learning_rate": 1e-3,
      "num_epochs": 30,
      "lora_rank": 64,
      "resolution_height": 480,
      "resolution_width": 720
    },
    "gpu": {
      "gpu_type": "A100-80GB",
      "gpu_count": 1
    }
  }'

Do not include a reward_function_id when using sft_video_gen. The API will reject the request if one is provided.
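
If you build requests programmatically, a client-side check can catch this mistake before the API does. The sketch below assembles the same body as the curl example using only the standard library; `build_video_sft_payload` is a hypothetical helper, not part of any Veri SDK, and it does not send the request.

```python
import json
from typing import Any, Optional


def build_video_sft_payload(
    base_model: str,
    dataset_id: str,
    output_name: str,
    hyperparameters: Optional[dict[str, Any]] = None,
    gpu: Optional[dict[str, Any]] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Assemble a request body for an sft_video_gen training job."""
    # Mirror the documented API rule: reward functions are not accepted here.
    if "reward_function_id" in extra:
        raise ValueError("reward_function_id is not allowed with sft_video_gen")

    payload: dict[str, Any] = {
        "base_model": base_model,
        "dataset_id": dataset_id,
        "method": "sft_video_gen",
        "output_name": output_name,
    }
    if hyperparameters:
        payload["hyperparameters"] = dict(hyperparameters)
    if gpu:
        payload["gpu"] = dict(gpu)
    payload.update(extra)
    return payload


payload = build_video_sft_payload(
    base_model="THUDM/CogVideoX-2b",
    dataset_id="ds_abc123",
    output_name="cogvideo-2b-custom",
    hyperparameters={"learning_rate": 1e-3, "num_epochs": 30, "lora_rank": 64},
    gpu={"gpu_type": "A100-80GB", "gpu_count": 1},
)
body = json.dumps(payload)  # send this as the POST body to /v1/training_jobs
```

The resulting `body` string can be passed to any HTTP client along with the same `Authorization` and `Content-Type` headers shown in the curl example.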

Output

Video gen SFT produces a LoRA adapter checkpoint rather than a full model. The adapter can be loaded with the diffusers library and applied to the base model for inference.
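
As an illustration of that loading step, here is a minimal sketch for a CogVideoX-2b adapter. It assumes the adapter has been downloaded to a local path in a format that diffusers' `load_lora_weights` accepts, and that a CUDA GPU is available; the function name, the local path, and the sampling settings are illustrative assumptions, not values prescribed by Veri.

```python
def generate_with_adapter(
    adapter_path: str,
    prompt: str,
    output_path: str = "sample.mp4",
) -> str:
    """Apply a LoRA adapter to CogVideoX-2b and write one sample video."""
    # Heavy imports stay inside the function so this file can be imported
    # on machines without torch or diffusers installed.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    # Load the same base model the adapter was trained against.
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-2b", torch_dtype=torch.float16
    )
    # Attach the LoRA weights (assumed local path to the downloaded adapter).
    pipe.load_lora_weights(adapter_path)
    pipe.to("cuda")

    # Match the training resolution and frame count from the job's hyperparameters.
    frames = pipe(prompt=prompt, height=480, width=720, num_frames=49).frames[0]
    export_to_video(frames, output_path, fps=8)
    return output_path
```

A call such as `generate_with_adapter("./cogvideo-2b-custom", "a paper boat drifting down a stream")` would write `sample.mp4` to the working directory. Other model families use their own pipeline classes in diffusers.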

Next Steps

Datasets
GRPO Training for reward-based language model fine-tuning
Deployment for serving your fine-tuned model
