Upload Dataset
name: A human-readable name for the dataset, sent as a form field alongside the file.
file: A JSONL file in which each line is a JSON object containing a prompt field.
Request
curl -X POST https://api.veri.studio/v1/datasets \
  -H "Authorization: Bearer vk_your_api_key" \
  -F "name=math_prompts" \
  -F "file=@math_prompts.jsonl"
Response
{
  "id": "ds_abc123",
  "object": "dataset",
  "name": "math_prompts",
  "source_type": "upload",
  "source_uri": null,
  "huggingface_dataset": null,
  "num_rows": 1500,
  "created_at": "2026-04-14T12:00:00Z"
}
Connect Dataset
name: Display name for the connected dataset.
source_type: One of s3, gs, az, hf, postgres, mysql, snowflake, or bigquery.
source_uri: Required for s3, gs, and az sources.
huggingface_config: Optional Hugging Face config, including split, column_mapping, and token.
Required for SQL, Snowflake, and BigQuery-backed sources.
Required for database-backed sources.
Optional source credentials, used only to validate the connection. They are not stored, so you must provide them again at training time if the source requires authentication.
Request
curl -X POST https://api.veri.studio/v1/datasets/connect \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "gsm8k-train",
    "source_type": "hf",
    "huggingface_dataset": "gsm8k",
    "huggingface_config": {
      "split": "train",
      "column_mapping": {
        "question": "prompt",
        "answer": "label"
      }
    }
  }'
Response
{
  "id": "ds_hf123456789",
  "object": "dataset",
  "name": "gsm8k-train",
  "source_type": "hf",
  "source_uri": null,
  "huggingface_dataset": "gsm8k",
  "num_rows": null,
  "created_at": "2026-04-21T12:00:00Z"
}
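The conditional requirements listed for Connect Dataset can be checked on the client before a request is sent. The following is a minimal sketch that uses only the field names visible in the examples on this page (name, source_type, source_uri, huggingface_dataset, huggingface_config). The helper build_connect_payload is hypothetical and not part of the API, and the fields for database-backed sources are left out because their names are not shown here.

```python
# Source types and the URI rule are taken from the parameter descriptions above.
SOURCE_TYPES = {"s3", "gs", "az", "hf", "postgres", "mysql", "snowflake", "bigquery"}
URI_SOURCES = {"s3", "gs", "az"}


def build_connect_payload(name, source_type, source_uri=None,
                          huggingface_dataset=None, huggingface_config=None):
    """Build a JSON-serializable body for POST /v1/datasets/connect.

    Hypothetical helper: it only enforces the rules stated on this page.
    """
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type!r}")
    if source_type in URI_SOURCES and not source_uri:
        raise ValueError(f"source_uri is required for {source_type} sources")

    payload = {"name": name, "source_type": source_type}
    optional = {
        "source_uri": source_uri,
        "huggingface_dataset": huggingface_dataset,
        "huggingface_config": huggingface_config,
    }
    # Leave out fields the caller did not set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Passing the result through json.dumps gives a body that could be sent with the Content-Type: application/json header used in the curl example.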
Validate Dataset Connection
This endpoint tests a source and returns preview metadata without creating a dataset row.
Request
curl -X POST https://api.veri.studio/v1/datasets/connect/validate \
  -H "Authorization: Bearer vk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "s3",
    "source_uri": "s3://my-bucket/prompts.jsonl"
  }'
Response
{
  "valid": true,
  "num_rows": 1500,
  "columns": ["prompt", "expected_answer"],
  "error": null
}
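A validation response like the one above can be inspected in code before a dataset is connected or a job is launched. The sketch below assumes the response shape shown in this example (valid, columns, error). The helper validation_problems is hypothetical, and the default requirement of a prompt column is an assumption carried over from the upload format; sources that rely on column_mapping may need a different list.

```python
def validation_problems(response, required_columns=("prompt",)):
    """Return a list of human-readable problems found in a validation response.

    Hypothetical helper operating on the parsed JSON body; an empty list
    means nothing was flagged.
    """
    if not response.get("valid"):
        # Fall back to a generic message when the error field is empty.
        return [response.get("error") or "source failed validation"]

    columns = response.get("columns") or []
    return [
        f"missing column: {column}"
        for column in required_columns
        if column not in columns
    ]
```

Called on the example response, the function returns an empty list because the source is valid and exposes a prompt column.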
List Datasets
limit: Maximum number of datasets to return.
Cursor for pagination. Pass the ID of the last item from the previous page.
Request
curl "https://api.veri.studio/v1/datasets?limit=10" \
  -H "Authorization: Bearer vk_your_api_key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "ds_abc123",
      "object": "dataset",
      "name": "math_prompts",
      "source_type": "upload",
      "source_uri": null,
      "huggingface_dataset": null,
      "num_rows": 1500,
      "created_at": "2026-04-14T12:00:00Z"
    },
    {
      "id": "ds_def456",
      "object": "dataset",
      "name": "gsm8k-train",
      "source_type": "hf",
      "source_uri": null,
      "huggingface_dataset": "gsm8k",
      "num_rows": null,
      "created_at": "2026-04-21T12:00:00Z"
    }
  ],
  "has_more": false
}
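Cursor pagination as described for List Datasets amounts to a loop: request a page, take the ID of its last item as the next cursor, and stop once has_more is false. The sketch below shows that loop only. Because the name of the cursor query parameter is not shown on this page, the HTTP call is left to a caller-supplied fetch_page function; both iter_datasets and fetch_page are hypothetical names.

```python
def iter_datasets(fetch_page, limit=10):
    """Yield every dataset by following the cursor until has_more is false.

    fetch_page(limit, cursor) is supplied by the caller and must return one
    parsed list response (a dict with "data" and "has_more"). cursor is None
    for the first page.
    """
    cursor = None
    while True:
        page = fetch_page(limit, cursor)
        items = page["data"]
        yield from items
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not items:
            break
        # The next cursor is the ID of the last item on this page.
        cursor = items[-1]["id"]
```

Keeping the transport out of the loop means the same function can be driven by any HTTP client, or by a stub in tests.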
Get Dataset
Request
curl https://api.veri.studio/v1/datasets/ds_abc123 \
  -H "Authorization: Bearer vk_your_api_key"
Response
{
  "id": "ds_abc123",
  "object": "dataset",
  "name": "math_prompts",
  "source_type": "upload",
  "source_uri": null,
  "huggingface_dataset": null,
  "num_rows": 1500,
  "created_at": "2026-04-14T12:00:00Z"
}
Each uploaded JSONL row should be a JSON object with at least a prompt field:
{"prompt": "What is 15 * 23?"}
{"prompt": "Solve for x: 2x + 5 = 17"}
{"prompt": "Explain the chain rule in calculus."}
The prompt value can be either a string or a list of chat messages. Additional fields stay available to reward functions and data resolvers.
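The row format above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the API: check_jsonl_rows is a hypothetical helper, and two of its rules are assumptions rather than documented behavior (blank lines are skipped, and each chat message is expected to be a JSON object).

```python
import json


def check_jsonl_rows(lines):
    """Return (line_number, problem) pairs for rows that break the upload format.

    Hypothetical local helper; it never contacts the API.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            # Assumption: blank lines (such as a trailing newline) are skipped.
            continue
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((number, "row is not a JSON object"))
            continue
        if "prompt" not in row:
            problems.append((number, "missing prompt field"))
            continue
        prompt = row["prompt"]
        is_text = isinstance(prompt, str)
        # Assumption: a chat message is represented as a JSON object.
        is_chat = isinstance(prompt, list) and all(isinstance(m, dict) for m in prompt)
        if not (is_text or is_chat):
            problems.append((number, "prompt is neither a string nor a list of messages"))
    return problems
```

The function accepts any iterable of lines, so an open file handle can be passed directly; an empty result means no row was flagged. Extra fields such as a label are left alone, matching the note that additional fields remain available downstream.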
Connected datasets are resolved at job start, so validation is the best way to catch auth or schema issues early.