Overview

Veri supports supervised fine-tuning (SFT) for video generation models using LoRA via the sft_video_gen training method. This allows you to fine-tune supported video models on your own video datasets with parameter-efficient LoRA adapters.

Supported Models

Veri currently supports LoRA SFT for the following video generation model families:
  • CogVideoX (1.5B, 2B, 5B)
  • Wan2.1 (1.3B, 14B)
  • LTX Video (2B, 13B)
  • Mochi (1B)
Browse the full catalog in the model library and filter by the Video Gen tab.

How It Works

Video gen SFT runs diffusers LoRA training scripts, launched with accelerate. Instead of a reward function, you provide a dataset of example videos, and the training loop learns to reproduce them directly. Key differences from GRPO:
| | GRPO | Video Gen SFT |
| --- | --- | --- |
| Reward function | Required | Not used |
| Output | Full model checkpoint | LoRA adapter |
| Training method | Reinforcement learning | Supervised imitation |
| Use case | Language model behaviors | Video generation style/content |

Hyperparameters

When creating a video gen SFT job, use method: "sft_video_gen" and configure these hyperparameters:
| Parameter | Default | Description |
| --- | --- | --- |
| learning_rate | 1e-3 | Learning rate for the optimizer. |
| num_epochs | 30 | Number of training epochs. |
| max_steps | null | Optional explicit step cap. |
| lora_rank | 64 | LoRA adapter rank. |
| lora_alpha | 64 | LoRA scaling factor. |
| resolution_height | 480 | Video frame height, in pixels. |
| resolution_width | 720 | Video frame width, in pixels. |
| num_frames | 49 | Number of video frames per sample. |
| fps | 8 | Frames per second. |
| batch_size | 1 | Per-device batch size. |
| gradient_accumulation_steps | 4 | Gradient accumulation steps. |
| seed | 42 | Random seed. |
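Several of these defaults interact: a per-device batch_size of 1 with gradient_accumulation_steps of 4 means each optimizer step sees 4 clips on a single GPU, and 49 frames at 8 fps spans about six seconds of video. The bash sketch below makes that arithmetic explicit for planning purposes. The helper names (effective_batch, estimate_steps, clip_seconds) are hypothetical and not part of the Veri API. It assumes standard data-parallel training, where the effective batch is batch_size × gradient_accumulation_steps × gpu_count, and assumes that a non-null max_steps caps the step count derived from num_epochs.

```shell
#!/usr/bin/env bash
# Hypothetical planning helpers for an sft_video_gen job (not part of the Veri API).
# ASSUMPTIONS: data-parallel training, so the effective batch is
# batch_size * gradient_accumulation_steps * gpu_count, and a non-empty
# max_steps caps the step count derived from num_epochs.
set -euo pipefail

# effective_batch BATCH_SIZE GRAD_ACCUM_STEPS GPU_COUNT
# Prints the number of clips consumed per optimizer step.
effective_batch() {
  echo $(( $1 * $2 * $3 ))
}

# estimate_steps NUM_CLIPS NUM_EPOCHS BATCH_SIZE GRAD_ACCUM_STEPS GPU_COUNT [MAX_STEPS]
# Prints an approximate total number of optimizer steps.
estimate_steps() {
  local clips=$1 epochs=$2 batch=$3 accum=$4 gpus=$5 max_steps=${6:-}
  local eff per_epoch total
  eff=$(effective_batch "$batch" "$accum" "$gpus")
  # Ceiling division: a partial final batch still costs one step.
  per_epoch=$(( (clips + eff - 1) / eff ))
  total=$(( per_epoch * epochs ))
  # Apply the optional cap only when it is smaller than the epoch-based total.
  if [[ -n "$max_steps" && "$max_steps" -lt "$total" ]]; then
    total=$max_steps
  fi
  echo "$total"
}

# clip_seconds NUM_FRAMES FPS
# Prints the clip duration in seconds (awk handles the non-integer division).
clip_seconds() {
  awk -v n="$1" -v f="$2" 'BEGIN { printf "%.3f\n", n / f }'
}

# Defaults from the table: batch_size 1, gradient_accumulation_steps 4, 1 GPU,
# num_epochs 30, num_frames 49, fps 8. 100 clips is an illustrative dataset size.
echo "effective batch: $(effective_batch 1 4 1)"
echo "total steps:     $(estimate_steps 100 30 1 4 1)"
echo "clip length (s): $(clip_seconds 49 8)"
```

With those defaults the sketch prints an effective batch of 4, 750 total steps (25 per epoch over 30 epochs) and a clip length of 6.125 seconds. Treat the numbers as estimates; the actual step count depends on how Veri batches and shards your dataset.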

Example Request

curl -X POST https://api.veri.studio/v1/training_jobs \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "base_model": "THUDM/CogVideoX-2b",
    "dataset_id": "ds_abc123",
    "method": "sft_video_gen",
    "output_name": "cogvideo-2b-custom",
    "hyperparameters": {
      "learning_rate": 1e-3,
      "num_epochs": 30,
      "lora_rank": 64,
      "resolution_height": 480,
      "resolution_width": 720
    },
    "gpu": {
      "gpu_type": "A100-80GB",
      "gpu_count": 1
    }
  }'
Do not include a reward_function_id when using sft_video_gen. The API will reject the request if one is provided.
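A rough client-side check can catch a stray reward_function_id before the request reaches the API. The bash function below, check_sft_video_gen_payload, is a hypothetical helper (not part of Veri or any Veri CLI). It does a plain text match on the JSON body rather than a real parse, so it assumes the body is held in a single shell string like the example request; a JSON-aware tool such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for an sft_video_gen request body.
# ASSUMPTION: the body is a JSON string; matching is textual, not a real parse.

check_sft_video_gen_payload() {
  local payload=$1
  # The check only applies to sft_video_gen jobs.
  if ! grep -q '"method"[[:space:]]*:[[:space:]]*"sft_video_gen"' <<<"$payload"; then
    echo "error: method is not \"sft_video_gen\"" >&2
    return 1
  fi
  # The API rejects sft_video_gen requests that include a reward function.
  if grep -q '"reward_function_id"' <<<"$payload"; then
    echo "error: remove reward_function_id from sft_video_gen requests" >&2
    return 1
  fi
  return 0
}

# Abbreviated version of the example request body.
payload='{
  "base_model": "THUDM/CogVideoX-2b",
  "dataset_id": "ds_abc123",
  "method": "sft_video_gen",
  "output_name": "cogvideo-2b-custom"
}'

if check_sft_video_gen_payload "$payload"; then
  echo "payload OK"
fi
```

If the check passes, the same body can be sent with the curl command from the example request by passing -d "$payload" instead of the inline JSON literal.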

Output

Video gen SFT produces a LoRA adapter checkpoint rather than a full model. The adapter can be loaded with the diffusers library and applied to the base model for inference.
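For intuition about what the adapter holds and how lora_rank and lora_alpha enter: in the standard LoRA parameterization (as in Hugging Face PEFT, which recent diffusers LoRA training scripts use), each adapted weight matrix W receives a low-rank update scaled by lora_alpha / lora_rank. This is the general LoRA formulation, not a description of Veri's internals, and the layer width of 1920 below is an illustrative value only.

```latex
% Standard LoRA update for one adapted weight matrix W (d_out x d_in)
W' = W + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\mathrm{in}}}

% Defaults: \alpha = lora_alpha = 64, r = lora_rank = 64
\frac{\alpha}{r} = \frac{64}{64} = 1

% Adapter parameters for an illustrative square layer, d_in = d_out = d = 1920
r\,(d_{\mathrm{in}} + d_{\mathrm{out}}) = 64 \times 3840 = 245{,}760
\quad\text{vs.}\quad d_{\mathrm{in}}\, d_{\mathrm{out}} = 1920^2 = 3{,}686{,}400
\quad\Rightarrow\quad \frac{2r}{d} = \frac{128}{1920} \approx 6.7\%
```

The adapter checkpoint stores only the A and B factors, which is why it is much smaller than a full model checkpoint. Applying it to the base model, for example through a pipeline's load_lora_weights method in recent diffusers releases, adds the scaled BA update back onto the corresponding base weights.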

Next Steps