Core Data Types
This section defines the fundamental data structures used to interact with the Supervised AI testing API. These types ensure consistency across model evaluations, dataset management, and reporting.
Primitive Types and Enums
TestStatus
Represents the current lifecycle state of a testing job.
| Value | Description |
| :--- | :--- |
| PENDING | The test has been queued but not yet picked up by a worker. |
| RUNNING | The test is currently being evaluated against the model. |
| COMPLETED | The test finished successfully and results are available. |
| FAILED | An error occurred during execution. |
MetricType
Standard identifiers for automated evaluation metrics.
| Value | Description |
| :--- | :--- |
| ACCURACY | Percentage of correct predictions. |
| LATENCY | Response time of the model in milliseconds. |
| F1_SCORE | Weighted average of precision and recall. |
| TOKEN_USAGE | Count of tokens consumed (specific to LLMs). |
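Client code often mirrors these enums locally so that serialized values can be validated before use. A minimal sketch in Python (the class names follow the type names above; no SDK is assumed):

```python
from enum import Enum

class TestStatus(str, Enum):
    """Lifecycle state of a testing job (mirrors the table above)."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

class MetricType(str, Enum):
    """Identifiers for automated evaluation metrics."""
    ACCURACY = "ACCURACY"
    LATENCY = "LATENCY"
    F1_SCORE = "F1_SCORE"
    TOKEN_USAGE = "TOKEN_USAGE"

# Subclassing str lets the values round-trip through JSON unchanged.
status = TestStatus("COMPLETED")
print(status is TestStatus.COMPLETED)  # True
```

Because the members are also strings, they compare equal to the raw values returned by the API, so no explicit conversion layer is needed.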
Complex Objects
TestInput
The TestInput object represents the payload sent to your model for evaluation.
| Field | Type | Description |
| :--- | :--- | :--- |
| id | string | A unique identifier for the specific test case. |
| payload | object | The raw data (JSON) to be processed by the model. |
| metadata | object | (Optional) Key-value pairs for filtering and categorization. |
Example Usage:
```json
{
  "id": "case_001",
  "payload": {
    "prompt": "Summarize the following text...",
    "temperature": 0.7
  },
  "metadata": {
    "tier": "enterprise",
    "region": "us-east-1"
  }
}
```
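A TestInput can be modeled client-side as a simple dataclass and serialized into the JSON shape shown above. This is a local convenience sketch, not a type shipped by the API:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class TestInput:
    id: str                          # unique identifier for the test case
    payload: dict                    # raw JSON data processed by the model
    metadata: Optional[dict] = None  # optional filtering/categorization tags

case = TestInput(
    id="case_001",
    payload={"prompt": "Summarize the following text...", "temperature": 0.7},
    metadata={"tier": "enterprise", "region": "us-east-1"},
)

# asdict() produces the wire format from the example above.
print(json.dumps(asdict(case), indent=2))
```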
EvaluationFrame
The bridge between input data and the expected output (Ground Truth). Use this when defining datasets for supervised testing.
| Field | Type | Description |
| :--- | :--- | :--- |
| input | TestInput | The data to be sent to the model. |
| expected_output | string \| object | The reference answer or "ground truth". |
| weight | number | The importance of this specific frame (default: 1.0). |
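The default weight of 1.0 means an unweighted frame counts once toward aggregate scores; heavier frames count proportionally more. A sketch of a small dataset, using illustrative dataclasses rather than the SDK's own types:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class TestInput:
    id: str
    payload: dict
    metadata: Optional[dict] = None

@dataclass
class EvaluationFrame:
    input: TestInput
    expected_output: Union[str, dict]  # the ground-truth reference
    weight: float = 1.0                # default per the table above

frames = [
    EvaluationFrame(TestInput("case_001", {"prompt": "2 + 2 = ?"}), "4"),
    EvaluationFrame(TestInput("case_002", {"prompt": "Capital of France?"}),
                    "Paris", weight=2.0),  # counts double in aggregates
]

# Total weight governs the denominator in weighted averages.
print(sum(f.weight for f in frames))  # 3.0
```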
MetricResult
The output generated by the evaluation engine for a specific metric.
| Field | Type | Description |
| :--- | :--- | :--- |
| name | MetricType | The metric being reported. |
| value | float | The numerical score calculated. |
| threshold_passed | boolean | Whether the score met the pre-defined success criteria. |
Example Usage:
```json
{
  "name": "LATENCY",
  "value": 145.2,
  "threshold_passed": true
}
```
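The example above reports a latency of 145.2 ms as passing, which implies the threshold comparison direction depends on the metric. A hedged sketch of that logic (the helper and the pass/fail direction are assumptions, not documented API behavior):

```python
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str               # a MetricType value
    value: float            # the calculated numerical score
    threshold_passed: bool  # whether the success criteria were met

# Hypothetical helper: assumes lower is better for LATENCY and
# higher is better for score-style metrics such as ACCURACY.
def score(name: str, value: float, threshold: float) -> MetricResult:
    passed = value <= threshold if name == "LATENCY" else value >= threshold
    return MetricResult(name, value, passed)

result = score("LATENCY", 145.2, threshold=200.0)
print(result.threshold_passed)  # True
```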
API Response Objects
TestReport
The top-level object returned when querying the results of a testing suite.
| Field | Type | Description |
| :--- | :--- | :--- |
| test_id | UUID | Unique identifier for the test run. |
| status | TestStatus | Current state of the run. |
| summary | object | Aggregated scores (e.g., mean accuracy). |
| results | Array<MetricResult> | Detailed breakdown of individual metrics. |
| created_at | ISO8601 | Timestamp of initiation. |
Example Response:
```json
{
  "test_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "COMPLETED",
  "summary": {
    "total_cases": 100,
    "passed": 98
  },
  "results": [
    {
      "name": "ACCURACY",
      "value": 0.98,
      "threshold_passed": true
    }
  ],
  "created_at": "2023-10-27T10:00:00Z"
}
```
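A consumer of this response typically checks the run's status before reading results. A minimal parsing sketch over the example payload (the traversal logic is illustrative; the field names come from the table above):

```python
import json

raw = """
{
  "test_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "COMPLETED",
  "summary": {"total_cases": 100, "passed": 98},
  "results": [
    {"name": "ACCURACY", "value": 0.98, "threshold_passed": true}
  ],
  "created_at": "2023-10-27T10:00:00Z"
}
"""

report = json.loads(raw)
if report["status"] == "COMPLETED":
    # Collect any metrics that missed their success criteria.
    failing = [r["name"] for r in report["results"]
               if not r["threshold_passed"]]
    print(f"{report['summary']['passed']}/{report['summary']['total_cases']}"
          f" cases passed; failing metrics: {failing}")
```

A report in the PENDING or RUNNING state should be polled again rather than read, since `results` may be empty or partial until the run completes.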
Internal Components
Note: These are used by the system internally to manage state but may appear in advanced configuration logs.
- WorkerNode: Represents the compute instance executing the test logic.
- DataBuffer: A transient internal stream used to pipe large datasets from storage to the evaluation engine. Avoid manual instantiation of this type.