Create Training Job
HuggingFace model ID (e.g., `Qwen/Qwen3-4B`, `THUDM/CogVideoX-2b`).
ID of the uploaded dataset to train on.
ID of the uploaded reward function. Required for `grpo` unless `environments` is used instead. Must be omitted for `sft_video_gen`.
Training method: `grpo` or `sft_video_gen`.
Name for the output model checkpoint.
Training hyperparameters. The schema depends on the `method` field.
Explicit GPU configuration. Include both `type` and `count`.
Compute provider: `prime_intellect`, `lambda`, `runpod`. Omit to auto-select the cheapest available.
Optional final checkpoint target. Defaults to `{ "type": "veri" }`.
OpenReward environments to use as the reward signal. Mutually exclusive with `reward_function_id`: pick one reward source. Each key is an environment ID, and the value contains configuration for that environment.
Requires OpenReward integration to be configured first via `PUT /v1/settings/integrations/openreward`.
Request (GRPO)
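The reward-source rules described above can be checked client-side before a request is sent. The following is a minimal sketch, not the API's own validation: only `method`, `reward_function_id`, and `environments` are field names taken from this page, the helper name `validate_reward_source` is hypothetical, and the rule that `grpo` needs one of the two reward sources is inferred from the descriptions rather than stated outright.

```python
from typing import Any

VALID_METHODS = {"grpo", "sft_video_gen"}


def validate_reward_source(body: dict[str, Any]) -> list[str]:
    """Return a list of problems with the reward-related fields (empty if none)."""
    problems: list[str] = []
    method = body.get("method")
    has_fn = body.get("reward_function_id") is not None
    has_envs = bool(body.get("environments"))

    if method not in VALID_METHODS:
        problems.append(f"unknown method: {method!r}")
    if has_fn and has_envs:
        problems.append("reward_function_id and environments are mutually exclusive")
    if method == "grpo" and not (has_fn or has_envs):
        # Inferred rule: grpo needs exactly one reward source.
        problems.append("grpo needs reward_function_id or environments")
    if method == "sft_video_gen" and has_fn:
        problems.append("reward_function_id must be omitted for sft_video_gen")
    return problems


# Hypothetical payload fragment: the ID value is a placeholder.
grpo_body = {
    "method": "grpo",
    "reward_function_id": "rf_placeholder",
}
print(validate_reward_source(grpo_body))  # prints []
```

Running a check like this locally only catches the conflicts documented here; the server remains the authority on what a valid request is.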
Request (GRPO with OpenReward Environment)
When using `environments`, omit `reward_function_id`. The environment provides the reward signal via OpenReward’s evaluate endpoint. You must configure your OpenReward API key first via Settings → Integrations.
Request (Video Gen SFT)
Response
List Training Jobs
Maximum number of jobs to return.
Cursor for pagination. Pass the ID of the last item from the previous page.
Filter by job status: `queued`, `provisioning`, `running`, `completed`, `failed`, `cancelled`.
Filter by training method: `grpo`, `sft_video_gen`.
Request
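Cursor pagination as described above (pass the ID of the last item from the previous page) can be wrapped in a small generator. This is a sketch under assumptions: `fetch_page` is a hypothetical caller-supplied function that performs the actual HTTP call, each job is assumed to expose an `id` field, and a page shorter than the requested limit is assumed to mean there are no more results.

```python
from typing import Callable, Iterator, Optional

# fetch_page(cursor, limit) -> list of job dicts; hypothetical, caller-supplied.
FetchPage = Callable[[Optional[str], int], list[dict]]


def iter_jobs(fetch_page: FetchPage, limit: int = 20) -> Iterator[dict]:
    """Yield every job by following the last-item-ID cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page
        if not page or len(page) < limit:
            # Empty or short page: assume nothing is left to fetch.
            return
        cursor = page[-1]["id"]  # assumes each job exposes an "id" field
```

Keeping the transport in `fetch_page` means the same loop works with any HTTP client and is easy to exercise against an in-memory list.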
Response
Get Training Job
Request
Response
Job Statuses
| Status | Description |
|---|---|
| `queued` | Job row created and waiting to be submitted |
| `provisioning` | Backend accepted the job and compute is starting |
| `running` | Training is in progress |
| `completed` | Training finished successfully; checkpoint is available |
| `failed` | Training failed; check `error.message` and `error.code` |
| `cancelled` | Job was cancelled by the user |
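Client code often needs to know whether a job has reached a final state. The sketch below encodes the statuses from the table; it assumes a job is represented as a dict with a `status` key and, on failure, a nested `error` object holding `code` and `message`. That response shape and the helper names are assumptions for illustration.

```python
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})
ACTIVE_STATUSES = frozenset({"queued", "provisioning", "running"})


def is_terminal(status: str) -> bool:
    """True for the statuses a job ends in: completed, failed, cancelled."""
    return status in TERMINAL_STATUSES


def describe(job: dict) -> str:
    """One-line summary of a job dict; includes error details on failure."""
    status = job["status"]
    if status == "failed":
        error = job.get("error") or {}  # assumed nested error object
        return f"failed [{error.get('code')}]: {error.get('message')}"
    return status
```

A polling loop would typically stop as soon as `is_terminal(job["status"])` is true.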
Get Job Events
Returns the lifecycle event history for a job.
Maximum number of events to return.
Request
Response
Get Training Metrics
Returns sampled per-step training metrics (loss, reward, KL, learning rate, and any other numeric value emitted by the trainer). Metrics are sampled at roughly 5-second intervals: the worker throttles status callbacks, so adjacent steps within one window collapse to the most recent.
Comma-separated list of metric keys to return (e.g. `loss,reward`). When omitted, all available keys are returned.
Lower step bound (inclusive). Pass `latest_step + 1` from the previous response for incremental polling.
Upper step bound (inclusive).
Server-side stride downsample target. Long runs are reduced to at most this many evenly-spaced points.
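To illustrate what a stride downsample does, the sketch below keeps every k-th point so that at most a target number remain. It is one plausible reading of "evenly-spaced points", not the server's actual algorithm; the function name and the choice of stride are assumptions.

```python
import math


def stride_downsample(points: list, max_points: int) -> list:
    """Keep every k-th point so that at most max_points remain."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if len(points) <= max_points:
        return list(points)
    # Smallest stride that brings the count down to max_points or fewer.
    stride = math.ceil(len(points) / max_points)
    return points[::stride]
```

With this scheme the final point of a run is not guaranteed to survive, so a client should not infer the most recent step from the last returned point.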
Request
Response
`available_keys` always reflects every metric key persisted for the job, regardless of `keys` filtering. `latest_step` is the absolute maximum step regardless of any stride downsample, so clients can pass `latest_step + 1` as the next `from_step` when polling.
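Incremental polling with `latest_step` and `from_step` can be sketched as a single step function. Here `fetch_metrics` is a hypothetical caller-supplied function that performs the request, and `points` is an assumed name for the list of samples in the response; only `latest_step` and `from_step` come from this page.

```python
from typing import Callable, Optional

# fetch_metrics(from_step) -> response dict; hypothetical, caller-supplied.
FetchMetrics = Callable[[Optional[int]], dict]


def poll_once(
    fetch_metrics: FetchMetrics, from_step: Optional[int]
) -> tuple[list, Optional[int]]:
    """Fetch new points and return (points, from_step for the next call)."""
    response = fetch_metrics(from_step)
    points = response.get("points", [])  # "points" is an assumed field name
    latest = response.get("latest_step")
    if latest is None:
        # Nothing recorded yet: keep the same lower bound.
        return points, from_step
    return points, latest + 1
```

A caller would invoke `poll_once` in a loop, sleeping between calls and feeding the returned lower bound back in, until the job reaches a terminal status.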
Cancel Training Job
Request
Response
Only non-terminal jobs can be cancelled. If the job is already `completed`, `failed`, or `cancelled`, the API returns 400.
Stream Job Logs
Returns job logs as Server-Sent Events (SSE).
Real-time log streaming from training is not yet implemented. Currently this endpoint returns a link to the provider dashboard where logs can be viewed. Full SSE streaming of training progress (step, loss, reward) is planned.
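Because the response uses the SSE wire format, a client can split it into events with a small generic parser. The sketch below handles `data:` lines, other named fields, comment lines, and blank-line event boundaries as in the standard SSE format. It makes no assumption about what this endpoint puts in each event, since the payload shape is not specified here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Group SSE lines into events; a blank line ends each event."""
    event: dict = {}
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # events without data are not dispatched
                yield {**event, "data": "\n".join(data)}
            event, data = {}, []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional leading space
            if field == "data":
                data.append(value)
            else:
                event[field] = value
```

The input can be any iterable of text lines, for example the decoded line iterator of a streaming HTTP response.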
Request
Response (SSE stream)
Get Model Checkpoint
Returns a download URL for the trained model checkpoint. Only available for jobs with `completed` status.
Request
Response
The same URL is also surfaced as `download_url` on completed job responses when checkpoint resolution succeeds.
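Since `download_url` may be absent even on a completed job, a client can check the job response first and fall back to the checkpoint endpoint only when needed. A minimal sketch, assuming the job is a dict with `status` and an optional `download_url` key (the helper name is hypothetical):

```python
from typing import Optional


def checkpoint_url_from_job(job: dict) -> Optional[str]:
    """Return download_url if the job is completed and the field is set."""
    if job.get("status") != "completed":
        return None
    return job.get("download_url") or None
```

A `None` result means either the job has not completed or the URL was not resolved on the job response, in which case the checkpoint endpoint is the place to ask.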