Core Entities
This section defines the fundamental data structures used within the Supervised AI testing-api. Understanding these entities is essential for configuring test suites, interpreting evaluation results, and integrating with the platform's automated testing workflows.
TestCase
The TestCase is the atomic unit of testing. It represents a single input scenario and its corresponding expected behavior or criteria.
| Field | Type | Description |
| :--- | :--- | :--- |
| id | string | A unique identifier for the test case. |
| input | object | The data payload sent to the AI model (e.g., prompt, images, or parameters). |
| expected_output | any | (Optional) The ground truth or reference value to compare against. |
| metadata | object | Key-value pairs used for filtering or categorization (e.g., priority: "high"). |
Usage Example:

```json
{
  "id": "tc_001",
  "input": {
    "prompt": "Summarize the following text..."
  },
  "expected_output": "The text discusses the rise of...",
  "metadata": {
    "feature_area": "summarization",
    "version": "1.2"
  }
}
```
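The same structure can be sketched as a client-side Python dataclass. This is illustrative only: the field names mirror the table above, but the class itself is an assumption, not part of the official SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class TestCase:
    """Client-side sketch of a TestCase, mirroring the field table above."""
    id: str
    input: dict                             # payload sent to the model (prompt, images, params)
    expected_output: Optional[Any] = None   # ground truth or reference value, if any
    metadata: dict = field(default_factory=dict)  # e.g. {"priority": "high"}

# Hypothetical construction matching the JSON example:
tc = TestCase(
    id="tc_001",
    input={"prompt": "Summarize the following text..."},
    expected_output="The text discusses the rise of...",
    metadata={"feature_area": "summarization", "version": "1.2"},
)
```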
Metric
A Metric defines how the performance of the AI model is measured for a given test execution. Metrics can be quantitative (e.g., latency) or qualitative (e.g., toxicity score).
| Field | Type | Description |
| :--- | :--- | :--- |
| name | string | The human-readable name of the metric (e.g., accuracy). |
| value | float/int | The calculated score for the metric. |
| threshold | float | The minimum or maximum acceptable value for a PASS status. |
| unit | string | The unit of measurement (e.g., ms, percent). |
Usage Example:

```python
metric = Metric(
    name="response_latency",
    value=150.5,
    threshold=200.0,
    unit="ms"
)
```
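Whether a threshold acts as a floor or a ceiling depends on the metric: accuracy-style scores must meet or exceed it, while latency must stay under it. A minimal sketch of that check, using a hypothetical `higher_is_better` flag that is not a field of the Metric entity itself:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metric:
    name: str
    value: float
    threshold: Optional[float] = None
    unit: Optional[str] = None

def passes(metric: Metric, higher_is_better: bool = True) -> bool:
    """Return True if the metric meets its threshold.

    `higher_is_better` is an illustrative flag: accuracy-style metrics must
    meet or exceed the threshold; latency-style metrics must stay at or
    below it.
    """
    if metric.threshold is None:
        return True  # no threshold configured: nothing to fail against
    if higher_is_better:
        return metric.value >= metric.threshold
    return metric.value <= metric.threshold

latency = Metric(name="response_latency", value=150.5, threshold=200.0, unit="ms")
print(passes(latency, higher_is_better=False))  # 150.5 ms <= 200.0 ms -> True
```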
TestResult
The TestResult captures the outcome of an individual TestCase execution. It aggregates the raw output from the AI model and the calculated metrics.
| Field | Type | Description |
| :--- | :--- | :--- |
| test_case_id | string | Reference to the original TestCase. |
| actual_output | any | The raw response generated by the system under test. |
| metrics | List[Metric] | A collection of metric objects evaluated for this specific result. |
| status | string | The final state: PASS, FAIL, or ERROR. |
Usage Example:

```json
{
  "test_case_id": "tc_001",
  "actual_output": "The provided text is summarized as...",
  "status": "PASS",
  "metrics": [
    { "name": "semantic_similarity", "value": 0.92 }
  ]
}
```
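The status field can be understood as a function of the metrics and any execution error. The sketch below shows one plausible derivation, not the platform's actual logic, and assumes higher metric values are better:

```python
def derive_status(metrics: list, error: bool = False) -> str:
    """Illustrative derivation of a TestResult status:
    ERROR if execution failed, FAIL if any metric with a threshold falls
    short of it, PASS otherwise. Assumes higher values are better."""
    if error:
        return "ERROR"
    for m in metrics:
        threshold = m.get("threshold")
        if threshold is not None and m["value"] < threshold:
            return "FAIL"
    return "PASS"

metrics = [{"name": "semantic_similarity", "value": 0.92, "threshold": 0.85}]
print(derive_status(metrics))  # PASS
```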
EvaluationRun
An EvaluationRun represents a complete execution of a test suite against a specific version of an AI model. It acts as a container for multiple TestResults.
| Field | Type | Description |
| :--- | :--- | :--- |
| run_id | string | Unique identifier for the execution session. |
| model_config | object | Configuration of the model tested (e.g., temperature, model_name). |
| results | List[TestResult] | The list of all results generated during the run. |
| summary | object | Aggregated statistics (e.g., total pass/fail count). |
Usage Example:

```python
# Initiating a run via the API
run = testing_client.create_run(
    model_id="gpt-4-summarizer",
    dataset_id="prod_eval_set_v1",
    config={"temperature": 0.7}
)
```
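The summary object aggregates per-result statuses. A minimal sketch of that aggregation (illustrative only; the platform computes its own summary, and the exact key names here are assumptions):

```python
from collections import Counter

def summarize(results: list) -> dict:
    """Aggregate TestResult statuses into run-level statistics."""
    counts = Counter(r["status"] for r in results)
    total = len(results)
    return {
        "total": total,
        "passed": counts.get("PASS", 0),
        "failed": counts.get("FAIL", 0),
        "errors": counts.get("ERROR", 0),
        "pass_rate": counts.get("PASS", 0) / total if total else 0.0,
    }

results = [
    {"test_case_id": "tc_001", "status": "PASS"},
    {"test_case_id": "tc_002", "status": "FAIL"},
]
print(summarize(results))  # {'total': 2, 'passed': 1, 'failed': 1, ...}
```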
Dataset Reference (Internal)
While datasets are managed via the platform UI or Storage API, the testing-api uses a DatasetReference to link test cases to a specific run.
Note: This is an internal pointer used by the API to fetch batches of TestCases during execution and is typically not modified by the user directly.