Data Modeling
Overview of Data Entities
In the Supervised AI platform, data modeling is centered around three primary entities: Test Cases, Supervision Results, and Outcomes. The API follows a structured schema to ensure that AI model evaluations are consistent, measurable, and traceable across different versions and environments.
Modeling Supervision Results
A SupervisionResult represents the granular data captured when an AI model's output is evaluated—either by an automated system or a human supervisor. This entity bridges the gap between raw model predictions and the final quality score.
Schema Definition
| Field | Type | Description |
| :--- | :--- | :--- |
| result_id | string | Unique identifier for the supervision event. |
| model_version | string | The specific version of the AI model being tested. |
| input_payload | object | The original data sent to the model. |
| model_output | object | The raw response generated by the model. |
| score | float | A normalized value (0.0 to 1.0) indicating accuracy or quality. |
| feedback | string | Optional qualitative notes from the supervisor. |
| metadata | object | Key-value pairs for custom attributes (e.g., latency, region). |
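The schema above can be sketched as a Python dataclass. Field names mirror the table; the range check in `__post_init__` follows the stated 0.0–1.0 normalization of `score` (the exact validation behavior is an illustrative assumption, not a platform guarantee):

```python
from dataclasses import dataclass, field

@dataclass
class SupervisionResult:
    """One supervision event for a single model output."""
    result_id: str
    model_version: str
    input_payload: dict
    model_output: dict
    score: float
    feedback: str = ""                            # optional qualitative notes
    metadata: dict = field(default_factory=dict)  # custom attributes (latency, region, ...)

    def __post_init__(self):
        # The schema defines score as a normalized value in [0.0, 1.0].
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0.0, 1.0], got {self.score}")
```

Rejecting out-of-range scores at construction time keeps downstream aggregation (averages, failure rates) trustworthy.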
Example Payload
```json
{
  "result_id": "res_88291_ax",
  "model_version": "gpt-4-turbo-v2",
  "input_payload": {
    "prompt": "Summarize the quarterly earnings report."
  },
  "model_output": {
    "summary": "The company saw a 15% increase in revenue..."
  },
  "score": 0.95,
  "feedback": "Highly accurate summary, captured all key KPIs.",
  "metadata": {
    "processing_time_ms": 450,
    "environment": "staging"
  }
}
```
Modeling Testing Outcomes
Testing Outcomes aggregate individual supervision results into a high-level report. Use this structure to define whether a specific test suite or deployment gate has met the required threshold for production.
Outcome Status Types
Outcomes are categorized using the following states:
- PASSED: All assertions met the defined thresholds.
- FAILED: One or more metrics fell below the acceptable limit.
- INCONCLUSIVE: Insufficient data or supervision results to determine quality.
- PENDING: Supervision is currently in progress (common in human-in-the-loop workflows).
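The four states map naturally onto an enum; a minimal Python sketch (the class name `OutcomeStatus` is ours, not a platform identifier):

```python
from enum import Enum

class OutcomeStatus(str, Enum):
    """Terminal and in-progress states for a testing outcome."""
    PASSED = "PASSED"              # all assertions met thresholds
    FAILED = "FAILED"              # one or more metrics below the limit
    INCONCLUSIVE = "INCONCLUSIVE"  # not enough supervision results to judge
    PENDING = "PENDING"            # supervision still in progress
```

Mixing in `str` keeps the values JSON-serializable as plain strings, matching the payloads shown in this section.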
Schema Definition
| Field | Type | Description |
| :--- | :--- | :--- |
| test_suite_id | string | Identifier for the group of tests. |
| status | enum | One of: PASSED, FAILED, INCONCLUSIVE, PENDING. |
| metrics | object | Aggregated data such as mean_accuracy, f1_score, etc. |
| timestamp | string (ISO 8601) | When the testing outcome was finalized. |
Example Usage
```json
{
  "test_suite_id": "suite_regression_v4",
  "status": "PASSED",
  "metrics": {
    "total_samples": 1000,
    "average_score": 0.88,
    "failure_rate": 0.02
  },
  "timestamp": "2023-11-01T14:30:00Z"
}
```
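Rolling individual supervision scores up into an outcome might look like the sketch below. The pass threshold and the "score below 0.5 counts as a failure" cutoff are illustrative assumptions, not platform defaults:

```python
from datetime import datetime, timezone

def aggregate_outcome(test_suite_id, scores, pass_threshold=0.8, failure_cutoff=0.5):
    """Aggregate individual supervision scores into a testing-outcome dict."""
    if not scores:
        # Insufficient data: no scores means no verdict.
        status = "INCONCLUSIVE"
        metrics = {"total_samples": 0}
    else:
        avg = sum(scores) / len(scores)
        failures = sum(1 for s in scores if s < failure_cutoff)
        metrics = {
            "total_samples": len(scores),
            "average_score": round(avg, 4),
            "failure_rate": round(failures / len(scores), 4),
        }
        status = "PASSED" if avg >= pass_threshold else "FAILED"
    return {
        "test_suite_id": test_suite_id,
        "status": status,
        "metrics": metrics,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

For example, scores of 0.9, 0.95, and 0.4 average to 0.75, which falls short of the 0.8 threshold and yields a FAILED outcome.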
Relationships and Hierarchy
The data model follows a hierarchical structure to maintain data integrity:
- Project: The highest level container.
- Test Suite: A collection of related test cases.
- Supervision Result: The individual evaluation of a model's response to a single test case.
- Outcome: The final report generated by aggregating results within a Test Suite.
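In code, the hierarchy is a straightforward containment relationship. A minimal sketch with hypothetical container names (`Project`, `TestSuite` here are ours; only `SupervisionResult` and `Outcome` shapes come from the schemas above):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TestSuite:
    test_suite_id: str
    results: list = field(default_factory=list)  # SupervisionResult entries
    outcome: Optional[dict] = None               # finalized Outcome, if any

@dataclass
class Project:
    name: str                                    # highest-level container
    suites: list = field(default_factory=list)   # TestSuite entries

# Build the hierarchy bottom-up: result -> suite -> project.
proj = Project("quarterly-eval")
suite = TestSuite("suite_regression_v4")
suite.results.append({"result_id": "res_88291_ax", "score": 0.95})
proj.suites.append(suite)
```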
Best Practices for Data Modeling
- Immutable Results: Once a SupervisionResult is submitted, it should be treated as immutable. If a model is re-tested, a new result ID should be generated to preserve the audit trail.
- Extensible Metadata: Use the metadata object to store platform-specific information that doesn't fit into the standard schema. This keeps the API flexible across AI domains (NLP, computer vision, etc.).
- Versioning: Always include the model_version in your data models to allow side-by-side performance comparisons over time.
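The immutability guideline can be enforced with a frozen dataclass: re-testing then produces a fresh record instead of mutating the old one. The `uuid4`-based ID scheme below is an illustrative assumption, and this trimmed class omits the full schema for brevity:

```python
import uuid
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SupervisionResult:
    """Trimmed-down, immutable supervision record."""
    result_id: str
    model_version: str
    score: float

def retest(previous: SupervisionResult, new_score: float) -> SupervisionResult:
    """Re-testing never mutates: it yields a new record with a new ID."""
    return replace(previous, result_id=f"res_{uuid.uuid4().hex}", score=new_score)

original = SupervisionResult("res_88291_ax", "gpt-4-turbo-v2", 0.95)
rerun = retest(original, 0.91)
# 'original' is untouched; assigning to its fields raises FrozenInstanceError,
# so the audit trail of prior evaluations is preserved.
```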