Documentation Index
Fetch the complete documentation index at: https://docs.veri.studio/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Veri datasets can come from two places:- direct JSONL uploads stored in Veri-managed S3
- connected external sources that are resolved into JSONL-like rows when a job starts
Supported Sources
upload- JSONL file uploaded to Veris3-s3://bucket/key.jsonlgs-gs://bucket/key.jsonlaz-az://container/blob.jsonlhf- Hugging Face dataset ID plus optional configpostgresmysqlsnowflakebigquery
Upload Format
Each line must be a valid JSON object with at least aprompt field:
label, expected_answer, or metadata stay attached to the row so reward functions can use them.
Requirements
- File format: JSONL (one JSON object per line)
- Required field:
prompt - Prompt type: string or chat-style message array
- Encoding: UTF-8
Preparing Your Dataset
Collect prompts
Gather prompts that represent the task you want the model to learn. Quality and diversity of prompts matter more than quantity.
Tips
- Diverse prompts: Include a range of difficulty levels and problem types to train a more robust model.
- Enough data: A few hundred prompts can work, but 1,000+ is generally better for stable training.
- Clean data: Remove duplicates, empty prompts, and malformed entries before uploading.
- Map columns explicitly: For external sources, use Hugging Face column mapping or SQL aliases to normalize rows into the fields your reward function expects.
- Match your task: Prompts should closely reflect the types of queries you expect at inference time.