Object Data Models
The testing-api uses a structured data hierarchy to manage the lifecycle of AI model evaluation. Three core entities (Test Cases, Sessions, and Predictions) underpin any integration with the Supervised AI testing platform.
Test Case
A Test Case represents the foundational unit of testing. It defines the specific input provided to an AI model and the ground truth (expected output) used for validation.
Schema
| Field | Type | Description |
| :--- | :--- | :--- |
| id | string | A unique identifier for the test case. |
| input_data | object | The payload sent to the model (e.g., prompts, images, or structured data). |
| expected_output | any | The target value or ground truth to compare the model's prediction against. |
| tags | array<string> | Optional labels for filtering (e.g., ["regression", "edge-case"]). |
| metadata | object | Arbitrary key-value pairs for additional context (e.g., versioning). |
Example
{
  "id": "tc_001",
  "input_data": {
    "prompt": "Translate 'Hello' to French"
  },
  "expected_output": "Bonjour",
  "tags": ["translation", "v1-baseline"]
}
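On the client side, the schema above could be mirrored with a small typed container. The following is a minimal sketch, not part of the testing-api itself: the `TestCase` class and its `from_dict` helper are hypothetical names, and `metadata` is treated as optional only because the example payload omits it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TestCase:
    """Hypothetical client-side mirror of the Test Case schema."""
    id: str
    input_data: dict[str, Any]
    expected_output: Any
    tags: list[str] = field(default_factory=list)            # optional per the schema
    metadata: dict[str, Any] = field(default_factory=dict)   # assumed optional

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "TestCase":
        # Required fields raise KeyError when absent; optional ones fall back to empty.
        return cls(
            id=raw["id"],
            input_data=raw["input_data"],
            expected_output=raw["expected_output"],
            tags=list(raw.get("tags", [])),
            metadata=dict(raw.get("metadata", {})),
        )


case = TestCase.from_dict({
    "id": "tc_001",
    "input_data": {"prompt": "Translate 'Hello' to French"},
    "expected_output": "Bonjour",
    "tags": ["translation", "v1-baseline"],
})
```

Failing fast on missing required fields keeps malformed test cases from reaching a test run; whether the platform itself enforces this is not stated here.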
Session
A Session is a logical grouping of test executions. It records the context of a specific test run, such as which model version is being evaluated and the environment the tests run in.
Schema
| Field | Type | Description |
| :--- | :--- | :--- |
| session_id | string | Unique identifier for the test run. |
| model_id | string | The identifier of the AI model being tested. |
| environment | string | The stage of the lifecycle (e.g., development, staging, production). |
| status | enum | The current state: pending, running, completed, or failed. |
| created_at | ISO8601 | Timestamp of when the session was initialized. |
Example
{
  "session_id": "sess_88291",
  "model_id": "gpt-4-turbo",
  "environment": "staging",
  "status": "completed",
  "created_at": "2023-10-27T10:00:00Z"
}
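A Session payload can be parsed so that `status` is restricted to the four documented values and `created_at` becomes a timezone-aware datetime. This is a sketch under assumptions: `SessionStatus`, `Session`, and `is_finished` are hypothetical names, and treating `completed` and `failed` as terminal states is an inference from their names rather than documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class SessionStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Session:
    """Hypothetical client-side mirror of the Session schema."""
    session_id: str
    model_id: str
    environment: str
    status: SessionStatus
    created_at: datetime

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Session":
        return cls(
            session_id=raw["session_id"],
            model_id=raw["model_id"],
            environment=raw["environment"],
            # Raises ValueError for any status outside the documented enum.
            status=SessionStatus(raw["status"]),
            # Older Python versions reject a trailing "Z", so map it to an explicit offset.
            created_at=datetime.fromisoformat(raw["created_at"].replace("Z", "+00:00")),
        )

    def is_finished(self) -> bool:
        # Assumption: completed and failed are terminal states.
        return self.status in (SessionStatus.COMPLETED, SessionStatus.FAILED)


session = Session.from_dict({
    "session_id": "sess_88291",
    "model_id": "gpt-4-turbo",
    "environment": "staging",
    "status": "completed",
    "created_at": "2023-10-27T10:00:00Z",
})
```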
Prediction
A Prediction is the output generated by the model for a specific Test Case within a Session. It stores the raw model output together with the evaluation metrics computed from it.
Schema
| Field | Type | Description |
| :--- | :--- | :--- |
| prediction_id | string | Unique identifier for this specific prediction entry. |
| test_case_id | string | Reference to the associated Test Case. |
| session_id | string | Reference to the parent Session. |
| actual_output | any | The raw output returned by the model. |
| latency_ms | integer | Time taken for the model to respond, in milliseconds. |
| metrics | object | Calculated scores (e.g., accuracy, f1_score) relative to the expected_output. |
Example
{
  "prediction_id": "pred_4452",
  "test_case_id": "tc_001",
  "session_id": "sess_88291",
  "actual_output": "Bonjour",
  "latency_ms": 120,
  "metrics": {
    "exact_match": true,
    "confidence": 0.98
  }
}
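To illustrate how a metric such as `exact_match` relates `actual_output` to a Test Case's `expected_output`, here is a minimal sketch. The `Prediction` class and the `exact_match` function are hypothetical; the platform's actual scoring logic is not documented here and may normalize case or whitespace before comparing.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Prediction:
    """Hypothetical client-side mirror of the Prediction schema."""
    prediction_id: str
    test_case_id: str
    session_id: str
    actual_output: Any
    latency_ms: int
    metrics: dict[str, Any] = field(default_factory=dict)


def exact_match(actual: Any, expected: Any) -> bool:
    # Strict equality only; an assumption, not the platform's documented rule.
    return actual == expected


pred = Prediction(
    prediction_id="pred_4452",
    test_case_id="tc_001",
    session_id="sess_88291",
    actual_output="Bonjour",
    latency_ms=120,
)

# "Bonjour" is the expected_output of test case tc_001.
pred.metrics["exact_match"] = exact_match(pred.actual_output, "Bonjour")
```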
Relationships
- Session (1:N) Prediction: A single session contains multiple predictions.
- Test Case (1:N) Prediction: A single test case can be reused across multiple sessions, resulting in multiple prediction records over time.
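
Both one-to-many relationships can be seen by grouping prediction records on their two foreign keys. The sketch below uses plain dictionaries; apart from `pred_4452`, `tc_001`, and `sess_88291`, the identifiers are invented for illustration, and `group_by` is a hypothetical helper.

```python
from collections import defaultdict

# Minimal prediction records; only the reference fields are shown.
predictions = [
    {"prediction_id": "pred_4452", "test_case_id": "tc_001", "session_id": "sess_88291"},
    {"prediction_id": "pred_4453", "test_case_id": "tc_002", "session_id": "sess_88291"},
    {"prediction_id": "pred_5001", "test_case_id": "tc_001", "session_id": "sess_88305"},
]


def group_by(records: list[dict[str, str]], key: str) -> dict[str, list[str]]:
    """Map each value of `key` to the prediction IDs that reference it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["prediction_id"])
    return dict(groups)


by_session = group_by(predictions, "session_id")      # Session (1:N) Prediction
by_test_case = group_by(predictions, "test_case_id")  # Test Case (1:N) Prediction
```

Here `tc_001` appears under two sessions, which is the reuse across sessions that the second relationship describes.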