Create Deployment

Body parameters
model (string, required): Model to deploy. Either a training job ID (to deploy that job's checkpoint) or a Hugging Face model name.
source (string, default: training_job): Where the model comes from: training_job or huggingface.
name (string, required): Display name for the deployment endpoint.
gpu (object, required): GPU configuration with the fields gpu_type (for example, H100) and gpu_count.
provider (string, optional): Compute provider. If omitted, one is selected automatically.

Request

curl -X POST https://api.veri.studio/v1/deployments \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "job_abc123",
    "source": "training_job",
    "name": "my-math-model",
    "gpu": {
      "gpu_type": "H100",
      "gpu_count": 1
    }
  }'

Response

{
  "id": "dep_xyz789",
  "object": "deployment",
  "status": "queued",
  "name": "my-math-model",
  "model": "job_abc123",
  "source": "training_job",
  "endpoint_url": null,
  "gpu": {
    "type": "H100",
    "count": 1
  },
  "provider": "in_memory",
  "cost_per_hour_usd": null,
  "total_cost_usd": 0,
  "total_requests": 0,
  "error": null,
  "started_at": null,
  "stopped_at": null,
  "created_at": "2026-05-05T10:00:00Z",
  "updated_at": "2026-05-05T10:00:00Z"
}
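As a sketch of how the body parameters map onto the HTTP call, the following standard-library Python builds the same POST that the curl example sends, without sending it. API_BASE and build_create_deployment_request are hypothetical names used only for illustration; the only client-side check is that source is one of the two documented values.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.veri.studio/v1"  # taken from the curl examples
VALID_SOURCES = {"training_job", "huggingface"}  # documented values of `source`


def build_create_deployment_request(
    api_key: str,
    model: str,
    name: str,
    gpu_type: str,
    gpu_count: int = 1,
    source: str = "training_job",
    provider: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/deployments request. Hypothetical helper."""
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {sorted(VALID_SOURCES)}")
    body = {
        "model": model,
        "source": source,
        "name": name,
        # The request uses gpu_type/gpu_count; the response reports type/count.
        "gpu": {"gpu_type": gpu_type, "gpu_count": gpu_count},
    }
    if provider is not None:
        body["provider"] = provider  # leave out to let the service auto-select
    return urllib.request.Request(
        f"{API_BASE}/deployments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the request to urllib.request.urlopen and decode the JSON body; as in the example response above, the new deployment starts in queued. Note that the request body uses gpu_type and gpu_count, while the response reports the same values as type and count.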

List Deployments

Query parameters
status (string): Filter by status: queued, provisioning, serving, unhealthy, stopped, or failed.
limit (integer, default: 50): Maximum number of deployments to return.
after (string): Pagination cursor. Use it together with has_more in the response to fetch the next page.

Request

curl "https://api.veri.studio/v1/deployments?status=serving&limit=10" \
  -H "Authorization: Bearer vk_your_api_key"

Response

{
  "object": "list",
  "data": [
    {
      "id": "dep_xyz789",
      "object": "deployment",
      "status": "serving",
      "name": "my-math-model",
      "model": "job_abc123",
      "source": "training_job",
      "endpoint_url": "/v1/deployments/dep_xyz789/chat/completions",
      "total_requests": 142,
      "created_at": "2026-05-05T10:00:00Z",
      "updated_at": "2026-05-05T12:30:00Z"
    }
  ],
  "has_more": false
}
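The list endpoint pages with limit, after, and has_more. The sketch below walks every page; it assumes, as in other OpenAI-style APIs, that after takes the id of the last item on the previous page, which this page does not state. fetch_page is injected so the paging logic can be exercised without network access, and http_fetch is a hypothetical helper showing what a real authenticated GET could look like.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, Optional

API_BASE = "https://api.veri.studio/v1"


def list_deployments_url(status: Optional[str] = None, limit: int = 50,
                         after: Optional[str] = None) -> str:
    """Build a GET /v1/deployments URL from the documented query parameters."""
    params = []
    if status is not None:
        params.append(("status", status))
    params.append(("limit", limit))
    if after is not None:
        params.append(("after", after))
    return f"{API_BASE}/deployments?{urllib.parse.urlencode(params)}"


def iter_deployments(fetch_page: Callable[[str], Dict],
                     status: Optional[str] = None,
                     limit: int = 50) -> Iterator[Dict]:
    """Yield every deployment, following has_more across pages.

    Assumption: `after` is the id of the last item on the previous page.
    """
    after = None
    while True:
        page = fetch_page(list_deployments_url(status, limit, after))
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            break
        after = page["data"][-1]["id"]


def http_fetch(api_key: str) -> Callable[[str], Dict]:
    """Hypothetical helper: return a fetch_page that performs a real GET."""
    def fetch(url: str) -> Dict:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch
```

The same loop should carry over to List Deployment Requests, which takes the same limit and after parameters, by swapping in a URL builder for /v1/deployments/{id}/requests.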

Get Deployment

Retrieves a single deployment by its ID.

Request

curl https://api.veri.studio/v1/deployments/dep_xyz789 \
  -H "Authorization: Bearer vk_your_api_key"

Stop Deployment

Stops the deployment, releases its GPU resources, and finalizes billing.

Request

curl -X POST https://api.veri.studio/v1/deployments/dep_xyz789/stop \
  -H "Authorization: Bearer vk_your_api_key"

Response

{
  "id": "dep_xyz789",
  "object": "deployment",
  "status": "stopped",
  "stopped_at": "2026-05-05T14:00:00Z"
}
Only active deployments (queued, provisioning, serving, or unhealthy) can be stopped. If the deployment is already stopped or failed, the call makes no changes and returns the deployment with its current status.
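Because stopped and failed are the only end states in the Deployment Statuses table, "active" here means queued, provisioning, serving, or unhealthy. The sketch below skips the network call for deployments already in an end state; the API would return them unchanged anyway, so this only saves a round trip. build_stop_request, stop_if_active, and the injected send transport are hypothetical names.

```python
import urllib.request
from typing import Callable, Dict

API_BASE = "https://api.veri.studio/v1"
# Derived from the status table: stopped and failed are terminal.
ACTIVE_STATUSES = {"queued", "provisioning", "serving", "unhealthy"}


def build_stop_request(api_key: str, deployment_id: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/deployments/{id}/stop."""
    return urllib.request.Request(
        f"{API_BASE}/deployments/{deployment_id}/stop",
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def stop_if_active(deployment: Dict,
                   send: Callable[[urllib.request.Request], Dict],
                   api_key: str) -> Dict:
    """Stop an active deployment; return a terminal one unchanged, without a request."""
    if deployment["status"] not in ACTIVE_STATUSES:
        return deployment
    return send(build_stop_request(api_key, deployment["id"]))
```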

Chat Completions

Send OpenAI-compatible inference requests to a serving deployment.

Body parameters
model (string, required): Model name; use the deployment's name (in this example, my-math-model).
messages (array, required): Array of message objects, each with role and content.
temperature (float, default: 1.0): Sampling temperature.
max_tokens (integer): Maximum number of tokens to generate.

Request

curl -X POST https://api.veri.studio/v1/deployments/dep_xyz789/chat/completions \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-math-model",
    "messages": [
      {"role": "user", "content": "What is 15% of 240?"}
    ],
    "temperature": 0.7
  }'

Response

{
  "id": "chatcmpl-abc12345",
  "object": "chat.completion",
  "created": 1714900000,
  "model": "my-math-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<reasoning>\n15% of 240 = 0.15 × 240 = 36\n</reasoning>\n<answer>36</answer>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 28,
    "total_tokens": 40
  }
}
The endpoint is OpenAI-compatible, so you can use the OpenAI Python SDK: set base_url to https://api.veri.studio/v1/deployments/DEPLOYMENT_ID and pass your API key (vk_...) as api_key.
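For readers not using the SDK, here is a standard-library sketch of the same call. It builds the request against base_url + "/chat/completions", the path the OpenAI SDK derives from the base_url above, and reads the first choice. build_chat_request, first_message_content, and extract_answer are hypothetical helpers; in particular, the <reasoning>/<answer> tags come from this example model's output format, not from the API, so other models will not necessarily emit them.

```python
import json
import re
import urllib.request
from typing import Dict, List, Optional

API_BASE = "https://api.veri.studio/v1"


def build_chat_request(api_key: str, deployment_id: str, model: str,
                       messages: List[Dict[str, str]], temperature: float = 1.0,
                       max_tokens: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    url = f"{API_BASE}/deployments/{deployment_id}/chat/completions"
    body = {"model": model, "messages": messages, "temperature": temperature}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_message_content(completion: Dict) -> str:
    """Return the assistant text of the first choice."""
    return completion["choices"][0]["message"]["content"]


def extract_answer(content: str) -> Optional[str]:
    """Return the text inside <answer>...</answer>, if the model emitted that tag."""
    match = re.search(r"<answer>(.*?)</answer>", content, re.DOTALL)
    return match.group(1).strip() if match else None
```

With the SDK itself, the equivalent is OpenAI(base_url="https://api.veri.studio/v1/deployments/dep_xyz789", api_key="vk_your_api_key") followed by client.chat.completions.create(model="my-math-model", messages=[...]).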

List Deployment Requests

Returns the per-request log for a deployment.
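
Query parameters
limit (integer, default: 50): Maximum number of request log entries to return.
after (string): Pagination cursor. Use it together with has_more in the response to fetch the next page.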

Request

curl "https://api.veri.studio/v1/deployments/dep_xyz789/requests?limit=20" \
  -H "Authorization: Bearer vk_your_api_key"

Response

{
  "object": "list",
  "data": [
    {
      "id": "req_abc123",
      "object": "deployment_request",
      "deployment_id": "dep_xyz789",
      "prompt_tokens": 12,
      "completion_tokens": 28,
      "latency_ms": 145.2,
      "status_code": 200,
      "error_message": null,
      "created_at": "2026-05-05T12:30:00Z"
    }
  ],
  "has_more": false
}
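Each log entry carries enough to recompute simple statistics client-side, for example over one page of results. The sketch assumes that any status_code of 400 or above counts as an error and that token or latency fields may be null on failed requests; neither rule is stated on this page, and the Get Deployment Metrics endpoint remains the authoritative aggregate. summarize_requests is a hypothetical helper.

```python
from typing import Dict, Iterable


def summarize_requests(entries: Iterable[Dict]) -> Dict:
    """Aggregate deployment_request entries from the request log.

    Assumptions: status_code >= 400 is an error; null token/latency fields are skipped.
    """
    entries = list(entries)
    if not entries:
        return {"count": 0, "avg_latency_ms": 0.0, "error_rate": 0.0,
                "prompt_tokens": 0, "completion_tokens": 0}
    errors = sum(1 for e in entries if e["status_code"] >= 400)
    latencies = [e["latency_ms"] for e in entries if e.get("latency_ms") is not None]
    return {
        "count": len(entries),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0.0,
        "error_rate": errors / len(entries),
        "prompt_tokens": sum(e.get("prompt_tokens") or 0 for e in entries),
        "completion_tokens": sum(e.get("completion_tokens") or 0 for e in entries),
    }
```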

Get Deployment Metrics

Returns aggregate usage, cost, and latency metrics for a deployment.

Request

curl https://api.veri.studio/v1/deployments/dep_xyz789/metrics \
  -H "Authorization: Bearer vk_your_api_key"

Response

{
  "total_requests": 142,
  "total_prompt_tokens": 4260,
  "total_completion_tokens": 8520,
  "total_cost_usd": 2.45,
  "avg_latency_ms": 132.5,
  "error_rate": 0.007,
  "uptime_seconds": 7200.0
}
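A few derived figures follow directly from the metrics response. With the sample values: 4,260 / 142 = 30 prompt tokens and 8,520 / 142 = 60 completion tokens per request; 0.007 × 142 ≈ 0.99, so roughly one failed request; and 7,200 s is 2 hours, giving an effective rate of $2.45 / 2 ≈ $1.23 per hour (about $0.017 per request). The sketch assumes error_rate is a fraction rather than a percentage; the effective hourly rate is derived here and need not equal cost_per_hour_usd on the deployment object. derived_metrics is a hypothetical helper.

```python
from typing import Dict


def derived_metrics(m: Dict[str, float]) -> Dict[str, float]:
    """Per-request and per-hour figures computed from the metrics response.

    Assumption: error_rate is a fraction (0.007 means 0.7%).
    """
    n = m["total_requests"]
    hours = m["uptime_seconds"] / 3600
    return {
        "avg_prompt_tokens": m["total_prompt_tokens"] / n if n else 0.0,
        "avg_completion_tokens": m["total_completion_tokens"] / n if n else 0.0,
        "cost_per_request_usd": m["total_cost_usd"] / n if n else 0.0,
        "approx_failed_requests": round(m["error_rate"] * n),
        "uptime_hours": hours,
        # Effective rate over uptime; not necessarily the billed hourly price.
        "cost_per_hour_usd": m["total_cost_usd"] / hours if hours else 0.0,
    }
```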

Deployment Statuses

queued: Deployment created; waiting for GPU provisioning.
provisioning: GPU starting; model loading.
serving: Endpoint live and accepting inference requests.
unhealthy: Health check failed; the deployment either recovers automatically or transitions to failed.
stopped: Stopped by the user; GPU released.
failed: Deployment failed; see the error field for details.
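Since a new deployment starts in queued, a client typically polls Get Deployment until it reaches serving. The sketch below treats queued, provisioning, and unhealthy as transient (per the table, unhealthy either recovers or becomes failed) and stopped and failed as terminal. get_deployment, the 10-second poll interval, and the 15-minute timeout are illustrative assumptions, not documented limits; sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict

TERMINAL = {"stopped", "failed"}
TRANSIENT = {"queued", "provisioning", "unhealthy"}


def wait_until_serving(get_deployment: Callable[[], Dict],
                       timeout_s: float = 900.0,
                       poll_s: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> Dict:
    """Poll until status is serving; raise on terminal states or timeout."""
    deadline = clock() + timeout_s
    while True:
        dep = get_deployment()  # e.g. GET /v1/deployments/{id}, decoded JSON
        status = dep["status"]
        if status == "serving":
            return dep
        if status in TERMINAL:
            raise RuntimeError(f"deployment {dep.get('id')} ended as {status}: {dep.get('error')}")
        if status not in TRANSIENT:
            raise ValueError(f"unknown status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"deployment still {status} after {timeout_s}s")
        sleep(poll_s)
```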