Schema Design Principles
Core Philosophy
The testing-api schema is built to serve as the backbone for the Supervised AI platform's evaluation and benchmarking suite. Its primary objective is to bridge the gap between human-led quality assurance and automated AI evaluation. To achieve this, the schema follows three primary pillars: Strict Consistency, Semantic Clarity, and Machine Interpretability.
1. Semantic AI-Readability
Since this API often feeds into LLM-based evaluators or automated grading scripts, the schema avoids cryptic abbreviations. Keys are descriptive and context-rich to ensure that an AI model can parse the intent of a field without additional documentation.
- Descriptive Keys: Use `expected_output` instead of `exp_out`.
- Contextual Metadata: Every test object includes a `metadata` block to store environmental variables (model version, temperature, prompt ID) that influence the outcome.
2. Structural Consistency
To ensure seamless integration across different modules of the Supervised AI platform, the schema enforces a predictable hierarchy. This consistency allows developers to write generic parsers that work across various test suites.
- Flat Hierarchies: Where possible, we favor flat structures over deeply nested objects. This reduces parsing complexity and makes it easier for AI models to attend to all relevant fields.
- Uniform Error Envelopes: All validation failures and API errors follow a standardized format, providing a `code`, `message`, and `path` to the offending field.
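The uniform envelope can be sketched as a small helper. This is a minimal illustration, not the platform's actual implementation; only the `code`, `message`, and `path` fields come from the text above, and the example values are hypothetical.

```python
def make_error_envelope(code: str, message: str, path: str) -> dict:
    """Build a standardized error payload with code, message, and path.

    Hypothetical sketch: the real API may carry additional fields.
    """
    return {"code": code, "message": message, "path": path}


# Example: a validation failure pointing at the offending field.
envelope = make_error_envelope(
    code="INVALID_ENUM_VALUE",
    message="'in-progress' is not a valid status",
    path="/tests/0/status",
)
```

Because every error shares this shape, a generic client can route failures on `code` and highlight the field at `path` without suite-specific handling.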
3. Strong Typing and Validation
The schema utilizes strict typing to prevent "hallucinations" in data entry and to ensure that programmatic evaluations are performed on valid data sets.
- Enum Enforcement: Fields such as `status` or `evaluation_method` use strict Enums to prevent data fragmentation.
- Strict Null Handling: We explicitly define which fields are `nullable`. In a testing context, an empty string is often different from a `null` value (missing data), and our schema respects this distinction.
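Both rules can be sketched together in Python. The `Status` values and the `TestResult` shape below are illustrative assumptions, not the schema's actual definitions; the point is that an enum rejects fragmented variants and that `None` is kept distinct from an empty string.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical values; the real enum is defined by the schema.
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class TestResult:
    status: Status
    # None means the output is missing; "" means it was produced but empty.
    actual_output: Optional[str]

    def __post_init__(self) -> None:
        # Reject raw strings like "pass" or "Passed" that would
        # fragment the data set.
        if not isinstance(self.status, Status):
            raise ValueError(f"invalid status: {self.status!r}")


# An empty string is valid data, not missing data.
result = TestResult(status=Status.PASSED, actual_output="")
```

A loader that enforces these two checks at the boundary guarantees that downstream evaluators never have to guess what an ambiguous string or absent value meant.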
4. Extensibility via Metadata
While the core schema is rigid to ensure stability, we provide a `custom_properties` or `metadata` object in every primary entity. This allows users to attach platform-specific or experiment-specific data without breaking the standard API interface.
```json
{
  "test_id": "eval_01HGP",
  "input_data": {
    "prompt": "Summarize the following text...",
    "context": "..."
  },
  "expected_output": "A concise three-sentence summary.",
  "metadata": {
    "version": "1.0.4",
    "tags": ["regression", "summarization-v2"],
    "model_parameters": {
      "temperature": 0.7,
      "top_p": 1.0
    }
  }
}
```
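A generic parser for this layout only needs to know the stable core fields and can pass `metadata` through opaquely, which is what keeps experiment-specific keys from breaking the standard interface. The function name and return shape below are assumptions for illustration.

```python
import json


def parse_test_case(payload: str) -> dict:
    """Extract the stable core fields; treat metadata as an opaque bag.

    Hypothetical sketch of a generic parser over the schema's core keys.
    """
    obj = json.loads(payload)
    return {
        "test_id": obj["test_id"],
        "input_data": obj["input_data"],
        "expected_output": obj["expected_output"],
        # Unknown metadata keys are preserved, not validated.
        "metadata": obj.get("metadata", {}),
    }


case = parse_test_case(
    '{"test_id": "eval_01HGP",'
    ' "input_data": {"prompt": "Summarize the following text..."},'
    ' "expected_output": "A concise three-sentence summary.",'
    ' "metadata": {"tags": ["regression"]}}'
)
```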
5. Versioning and Compatibility
The schema version is included in the payload or the request header. We follow semantic versioning for the API structure:
- Major updates: Introduce breaking changes to the field requirements.
- Minor updates: Add optional fields that enhance AI-readability or provide more context.
- Patch updates: Clarify documentation or refine validation regex without changing the data model.
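The compatibility rule these tiers imply is that only a major-version mismatch is breaking. A minimal sketch of that check, with a hypothetical function name:

```python
def is_compatible(client_version: str, server_version: str) -> bool:
    """Under semantic versioning, minor and patch differences are
    backward compatible; only a major-version mismatch is breaking."""
    client_major = int(client_version.split(".")[0])
    server_major = int(server_version.split(".")[0])
    return client_major == server_major
```

For example, a client built against `1.0.4` can safely consume a `1.2.0` payload, but must refuse a `2.0.0` one.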