Create Deployment
model: Model to deploy. Either a training job ID (for checkpoints) or a Hugging Face model name.
source (string, default: "training_job"): Where the model comes from, either training_job or huggingface.
name: Display name for the deployment endpoint.
gpu: GPU configuration with gpu_type and gpu_count.
provider: Compute provider. Auto-selected if omitted.
Request
curl -X POST https://api.veri.studio/v1/deployments \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "job_abc123",
    "source": "training_job",
    "name": "my-math-model",
    "gpu": {
      "gpu_type": "H100",
      "gpu_count": 1
    }
  }'
Response
{
  "id": "dep_xyz789",
  "object": "deployment",
  "status": "queued",
  "name": "my-math-model",
  "model": "job_abc123",
  "source": "training_job",
  "endpoint_url": null,
  "gpu": {
    "type": "H100",
    "count": 1
  },
  "provider": "in_memory",
  "cost_per_hour_usd": null,
  "total_cost_usd": 0,
  "total_requests": 0,
  "error": null,
  "started_at": null,
  "stopped_at": null,
  "created_at": "2026-05-05T10:00:00Z",
  "updated_at": "2026-05-05T10:00:00Z"
}
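The request above can also be assembled from Python. The following is a minimal sketch using only the standard library; the helper name build_create_deployment_request is hypothetical, and the payload fields simply mirror the curl example. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.veri.studio/v1"


def build_create_deployment_request(api_key, model, name, gpu_type, gpu_count,
                                    source="training_job"):
    """Build (but do not send) a POST /deployments request.

    Hypothetical helper: the field names are copied from the curl example.
    """
    payload = {
        "model": model,
        "source": source,
        "name": name,
        "gpu": {"gpu_type": gpu_type, "gpu_count": gpu_count},
    }
    return urllib.request.Request(
        f"{API_BASE}/deployments",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_deployment_request(
    "vk_your_api_key", "job_abc123", "my-math-model", "H100", 1
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Keeping construction separate from sending makes the payload easy to inspect or log before any GPU is provisioned.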
List Deployments
status: Filter by status. One of queued, provisioning, serving, unhealthy, stopped, failed.
limit: Maximum number of deployments to return.
Request
curl "https://api.veri.studio/v1/deployments?status=serving&limit=10" \
  -H "Authorization: Bearer vk_your_api_key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "dep_xyz789",
      "object": "deployment",
      "status": "serving",
      "name": "my-math-model",
      "model": "job_abc123",
      "source": "training_job",
      "endpoint_url": "/v1/deployments/dep_xyz789/chat/completions",
      "total_requests": 142,
      "created_at": "2026-05-05T10:00:00Z",
      "updated_at": "2026-05-05T12:30:00Z"
    }
  ],
  "has_more": false
}
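As an illustration of the two query parameters, here is a small sketch that builds the list URL and rejects status values outside the documented set. The function name list_deployments_url is an assumption made for this example, and the client-side validation is a convenience, not something the API is stated to require.

```python
from urllib.parse import urlencode

API_BASE = "https://api.veri.studio/v1"

# The six status values documented for the status filter.
VALID_STATUSES = {"queued", "provisioning", "serving", "unhealthy", "stopped", "failed"}


def list_deployments_url(status=None, limit=None):
    """Return the GET URL for listing deployments (hypothetical helper)."""
    params = {}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        params["status"] = status
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    return f"{API_BASE}/deployments" + (f"?{query}" if query else "")


print(list_deployments_url(status="serving", limit=10))
# https://api.veri.studio/v1/deployments?status=serving&limit=10
```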
Get Deployment
Request
curl https://api.veri.studio/v1/deployments/dep_xyz789 \
  -H "Authorization: Bearer vk_your_api_key"
Stop Deployment
Stops the deployment, releases GPU resources, and performs final billing.
Request
curl -X POST https://api.veri.studio/v1/deployments/dep_xyz789/stop \
  -H "Authorization: Bearer vk_your_api_key"
Response
{
  "id": "dep_xyz789",
  "object": "deployment",
  "status": "stopped",
  "stopped_at": "2026-05-05T14:00:00Z"
}
A stop request only affects an active deployment. If the deployment is already stopped or failed, nothing changes and its current status is returned as-is.
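The stop rule can be modeled as a small pure function, which is handy for client-side tests. This is a sketch under one assumption: "active" is taken to mean any status other than stopped or failed. The function name status_after_stop is hypothetical.

```python
# Assumption: every status except these two counts as "active".
INACTIVE_STATUSES = {"stopped", "failed"}


def status_after_stop(current_status):
    """Status a stop call is expected to report for a given current status."""
    if current_status in INACTIVE_STATUSES:
        return current_status  # returned as-is; there is nothing to stop
    return "stopped"


print(status_after_stop("serving"))  # stopped
print(status_after_stop("failed"))   # failed
```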
Chat Completions
Send OpenAI-compatible inference requests to a serving deployment.
model: Model name (matches the deployment name).
messages: Array of message objects with role and content.
Maximum tokens to generate.
Request
curl -X POST https://api.veri.studio/v1/deployments/dep_xyz789/chat/completions \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-math-model",
    "messages": [
      {"role": "user", "content": "What is 15% of 240?"}
    ],
    "temperature": 0.7
  }'
Response
{
  "id": "chatcmpl-abc12345",
  "object": "chat.completion",
  "created": 1714900000,
  "model": "my-math-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<reasoning>\n15% of 240 = 0.15 × 240 = 36\n</reasoning>\n<answer>36</answer>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 28,
    "total_tokens": 40
  }
}
The endpoint is OpenAI-compatible. You can use the OpenAI Python SDK by setting base_url to https://api.veri.studio/v1/deployments/DEPLOYMENT_ID.
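To make the base URL convention concrete, the sketch below derives the chat completions URL from a deployment base URL and pulls the final answer out of a response shaped like the example. The helper names are hypothetical, and the `<answer>` tag format is an assumption taken from this one sample response; other models may not emit it.

```python
import re

# Base URL for one deployment; dep_xyz789 is the example ID used on this page.
BASE_URL = "https://api.veri.studio/v1/deployments/dep_xyz789"


def chat_completions_url(base_url):
    """OpenAI-style clients append /chat/completions to the base URL."""
    return base_url.rstrip("/") + "/chat/completions"


def extract_answer(completion):
    """Return the text inside <answer>...</answer>, or the full content if absent."""
    content = completion["choices"][0]["message"]["content"]
    match = re.search(r"<answer>(.*?)</answer>", content, re.DOTALL)
    return match.group(1).strip() if match else content


# Trimmed copy of the example response.
completion = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "<reasoning>\n15% of 240 = 0.15 × 240 = 36\n</reasoning>\n<answer>36</answer>",
            },
            "finish_reason": "stop",
        }
    ]
}

print(chat_completions_url(BASE_URL))
print(extract_answer(completion))  # 36
```

With the OpenAI Python SDK, the same base URL would be passed as OpenAI(base_url=BASE_URL, api_key="vk_your_api_key"), and the SDK adds the /chat/completions path itself.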
List Deployment Requests
Returns the per-request log for a deployment.
limit: Maximum number of requests to return.
Request
curl "https://api.veri.studio/v1/deployments/dep_xyz789/requests?limit=20" \
  -H "Authorization: Bearer vk_your_api_key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "req_abc123",
      "object": "deployment_request",
      "deployment_id": "dep_xyz789",
      "prompt_tokens": 12,
      "completion_tokens": 28,
      "latency_ms": 145.2,
      "status_code": 200,
      "error_message": null,
      "created_at": "2026-05-05T12:30:00Z"
    }
  ],
  "has_more": false
}
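A page of this log is easy to summarize on the client. The following sketch assumes a page has already been fetched and parsed into a dict shaped like the response above. The function name summarize_requests is hypothetical, and treating status codes of 400 and above as errors is an assumption.

```python
from statistics import mean


def summarize_requests(page):
    """Summarize one page of the per-request log (hypothetical helper)."""
    rows = page["data"]
    # Assumption: any HTTP status code >= 400 counts as a failed request.
    errors = sum(1 for r in rows if r["status_code"] >= 400)
    return {
        "count": len(rows),
        "errors": errors,
        "total_tokens": sum(r["prompt_tokens"] + r["completion_tokens"] for r in rows),
        "mean_latency_ms": mean(r["latency_ms"] for r in rows) if rows else None,
    }


page = {
    "object": "list",
    "data": [
        {
            "id": "req_abc123",
            "prompt_tokens": 12,
            "completion_tokens": 28,
            "latency_ms": 145.2,
            "status_code": 200,
        }
    ],
    "has_more": False,
}

summary = summarize_requests(page)
print(summary["total_tokens"])  # 40
```

When has_more is true, the summary covers only the fetched page, not the whole log.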
Get Deployment Metrics
Returns aggregate metrics for a deployment.
Request
curl https://api.veri.studio/v1/deployments/dep_xyz789/metrics \
  -H "Authorization: Bearer vk_your_api_key"
Response
{
  "total_requests": 142,
  "total_prompt_tokens": 4260,
  "total_completion_tokens": 8520,
  "total_cost_usd": 2.45,
  "avg_latency_ms": 132.5,
  "error_rate": 0.007,
  "uptime_seconds": 7200.0
}
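The aggregate fields combine into a few per-request and per-hour figures. The sketch below works through the example values under two assumptions that the reference does not state: error_rate is the fraction of requests that failed, and total_cost_usd accrued over uptime_seconds. The function name derive_metrics is hypothetical.

```python
def derive_metrics(metrics):
    """Derive per-request and per-hour figures from the aggregate metrics."""
    requests = metrics["total_requests"]
    hours = metrics["uptime_seconds"] / 3600
    return {
        # 4260 / 142 = 30 prompt tokens per request in the example
        "avg_prompt_tokens": metrics["total_prompt_tokens"] / requests if requests else None,
        # 8520 / 142 = 60 completion tokens per request in the example
        "avg_completion_tokens": metrics["total_completion_tokens"] / requests if requests else None,
        # Assumption: error_rate is a fraction of requests; 0.007 * 142 is about 1
        "approx_failed_requests": round(metrics["error_rate"] * requests),
        # Assumption: cost accrues over uptime; 2.45 USD / 2 h = 1.225 USD per hour
        "cost_per_hour_usd": metrics["total_cost_usd"] / hours if hours else None,
    }


metrics = {
    "total_requests": 142,
    "total_prompt_tokens": 4260,
    "total_completion_tokens": 8520,
    "total_cost_usd": 2.45,
    "avg_latency_ms": 132.5,
    "error_rate": 0.007,
    "uptime_seconds": 7200.0,
}

derived = derive_metrics(metrics)
print(derived["avg_prompt_tokens"], derived["avg_completion_tokens"])  # 30.0 60.0
```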
Deployment Statuses
| Status | Description |
|---|---|
| queued | Deployment created, waiting for GPU provisioning |
| provisioning | GPU starting, model loading |
| serving | Endpoint live, accepting inference requests |
| unhealthy | Health check failed; will auto-recover or transition to failed |
| stopped | Stopped by the user; GPU released |
| failed | Deployment failed; check the error field |
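A client typically polls a new deployment until it reaches serving. The loop below is one way to sketch that against the statuses listed above. The fetch_status callable is a hypothetical stand-in for a Get Deployment call that returns the status string, and the polling interval and attempt limit are illustrative defaults, not documented values.

```python
import time

# Statuses from which a deployment does not move on to serving.
TERMINAL_STATUSES = {"stopped", "failed"}


def wait_until_serving(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status() until it returns "serving".

    Raises RuntimeError if the deployment stops or fails first, and
    TimeoutError if max_polls is exhausted. queued, provisioning, and
    unhealthy are treated as "keep waiting".
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status == "serving":
            return status
        if status in TERMINAL_STATUSES:
            raise RuntimeError(f"deployment ended in status {status!r}")
        sleep(poll_seconds)
    raise TimeoutError("deployment did not reach 'serving' in time")


# Usage with a canned status sequence instead of real API calls.
statuses = iter(["queued", "provisioning", "serving"])
print(wait_until_serving(lambda: next(statuses), sleep=lambda s: None))  # serving
```

Passing the fetcher and the sleep function as arguments keeps the loop testable without network access or real delays.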