# Object Type Definitions

## Core Object Types
This section defines the data structures used by the Supervised AI Testing API. These objects represent the fundamental entities you will interact with when managing test suites, executing evaluations, and analyzing model performance.
### TestSet
A TestSet is a logical container for a collection of test cases. It is used to organize testing data by version, task type (e.g., summarization, classification), or specific model requirements.
| Property | Type | Description |
| :--- | :--- | :--- |
| id | string | The unique identifier for the TestSet. |
| name | string | A human-readable name for the test suite. |
| description | string | A brief explanation of the TestSet's purpose. |
| version | string | Semantic versioning or custom version string (e.g., v1.0.2). |
| created_at | string | ISO 8601 timestamp of when the set was created. |
Example Usage:
```json
{
  "id": "ts_88219",
  "name": "LLM Hallucination Benchmark",
  "description": "Tests for factual consistency in RAG pipelines.",
  "version": "1.2.0",
  "created_at": "2023-11-15T10:30:00Z"
}
```
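As an informal sketch, the schema above can be mirrored client-side with a Python dataclass. The `TestSet` class and the parse-from-JSON approach below are illustrative assumptions, not part of any official SDK:

```python
import json
from dataclasses import dataclass

@dataclass
class TestSet:
    # Field names mirror the property table above; types are
    # client-side approximations of the wire format.
    id: str
    name: str
    description: str
    version: str
    created_at: str  # ISO 8601 timestamp, kept as a string here

payload = """
{
  "id": "ts_88219",
  "name": "LLM Hallucination Benchmark",
  "description": "Tests for factual consistency in RAG pipelines.",
  "version": "1.2.0",
  "created_at": "2023-11-15T10:30:00Z"
}
"""

# Unpack the decoded JSON object directly into the dataclass.
ts = TestSet(**json.loads(payload))
```

Keeping `created_at` as a string avoids timezone-parsing surprises; convert to a `datetime` only where you actually need date arithmetic.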
### TestCase
A TestCase is the smallest unit of data within a TestSet. It holds the inputs supplied to the model and, where applicable, the ground truth used for evaluation.
| Property | Type | Description |
| :--- | :--- | :--- |
| id | string | The unique identifier for the test case. |
| inputs | object | Key-value pairs representing model prompt variables or parameters. |
| expected_output | string | (Optional) The gold-standard response for comparison. |
| metadata | object | Arbitrary key-value pairs for filtering (e.g., difficulty: "high"). |
Example Usage:
```json
{
  "id": "tc_001",
  "inputs": {
    "prompt": "What is the capital of France?",
    "context": "Geography quiz data."
  },
  "expected_output": "The capital of France is Paris.",
  "metadata": {
    "category": "factual_recall"
  }
}
```
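Since `expected_output` and `metadata` are optional while `id` and `inputs` are required, a light validation pass before upload can catch malformed cases early. The helper below is a hypothetical sketch against the table above, not an API-provided function:

```python
def validate_test_case(case: dict) -> list:
    """Return a list of schema problems; an empty list means the case looks valid."""
    errors = []
    # Required fields.
    if not isinstance(case.get("id"), str):
        errors.append("id must be a string")
    if not isinstance(case.get("inputs"), dict):
        errors.append("inputs must be an object")
    # Optional fields are only checked when present.
    if "expected_output" in case and not isinstance(case["expected_output"], str):
        errors.append("expected_output, if present, must be a string")
    if "metadata" in case and not isinstance(case["metadata"], dict):
        errors.append("metadata, if present, must be an object")
    return errors

sample = {
    "id": "tc_001",
    "inputs": {"prompt": "What is the capital of France?"},
    "metadata": {"category": "factual_recall"},
}
```

For production use, a declarative JSON Schema would be a more maintainable way to express the same checks.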
### EvaluationRun
An EvaluationRun represents a specific execution instance where a model's outputs are generated and scored against a TestSet.
| Property | Type | Description |
| :--- | :--- | :--- |
| run_id | string | Unique identifier for the execution instance. |
| model_id | string | The identifier of the model being tested. |
| status | enum | The current state: PENDING, RUNNING, COMPLETED, FAILED. |
| summary | object | Aggregated scores (e.g., mean accuracy, average latency). |
Example Usage:
```json
{
  "run_id": "run_9942",
  "model_id": "gpt-4-turbo",
  "status": "COMPLETED",
  "summary": {
    "accuracy": 0.94,
    "avg_latency_ms": 450
  }
}
```
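Because a run moves through PENDING and RUNNING before reaching a terminal state, clients typically poll until it is COMPLETED or FAILED. The loop below is a generic sketch; `fetch_run` stands in for whatever call your client uses to retrieve an EvaluationRun by id:

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED"}

def wait_for_run(fetch_run, run_id: str, poll_seconds: float = 2.0,
                 timeout: float = 600.0) -> dict:
    """Poll fetch_run(run_id) until the run reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError("run %s did not finish within %ss" % (run_id, timeout))
```

A fixed poll interval is the simplest choice; for long-running evaluations, exponential backoff reduces needless requests.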
### MetricResult
A MetricResult provides the specific outcome of a single evaluation metric applied to a single TestCase output.
| Property | Type | Description |
| :--- | :--- | :--- |
| metric_name | string | The name of the metric (e.g., BLEU, ROUGE, Toxicity). |
| score | float | The numerical result of the metric evaluation. |
| reasoning | string | (Optional) Explanation of the score when produced by an LLM-based evaluator. |
| status | string | Typically PASS, FAIL, or WARN based on defined thresholds. |
Example Usage:
```json
{
  "metric_name": "exact_match",
  "score": 1.0,
  "reasoning": "The model output matched the expected output string perfectly.",
  "status": "PASS"
}
```
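One common way to derive the PASS/WARN/FAIL status is from two score thresholds. The exact thresholding rules are configuration-dependent; the function below is only an illustrative assumption, with hypothetical threshold names:

```python
def metric_status(score: float, fail_below: float, warn_below: float) -> str:
    """Map a metric score to PASS, WARN, or FAIL using two cutoffs.

    Scores below `fail_below` fail outright; scores between the two
    cutoffs warn; everything at or above `warn_below` passes.
    """
    if score < fail_below:
        return "FAIL"
    if score < warn_below:
        return "WARN"
    return "PASS"

# Example: exact_match scored 1.0 with cutoffs of 0.5 (fail) and 0.8 (warn).
status = metric_status(1.0, fail_below=0.5, warn_below=0.8)
```

This assumes higher scores are better; metrics like Toxicity, where lower is better, would invert the comparisons.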
### ModelOutput
This object captures the raw response from the target model before it is processed by the evaluation engine.
| Property | Type | Description |
| :--- | :--- | :--- |
| raw_response | string | The literal text or JSON returned by the model. |
| latency | float | Time taken in seconds for the model to respond. |
| tokens_used | integer | Total count of prompt and completion tokens. |
Example Usage:
```json
{
  "raw_response": "Paris is the capital.",
  "latency": 0.82,
  "tokens_used": 15
}
```
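A ModelOutput can be assembled by timing the model call itself. The wrapper below is a sketch; the whitespace-based token count is a crude stand-in for the provider's real tokenizer, which is what the API's `tokens_used` field would actually reflect:

```python
import time

def capture_output(call_model, prompt: str) -> dict:
    """Invoke a model callable and package its response as a ModelOutput dict."""
    start = time.perf_counter()
    text = call_model(prompt)
    latency = time.perf_counter() - start  # seconds, per the latency field above
    return {
        "raw_response": text,
        "latency": round(latency, 3),
        # Whitespace word count as a rough proxy; real token counts
        # come from the model provider's tokenizer.
        "tokens_used": len(prompt.split()) + len(text.split()),
    }
```

In practice you would also want to record failures (timeouts, refusals) so the evaluation engine can score them rather than silently dropping them.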