Testing Endpoints Catalog
Overview
The Testing API provides a standardized interface for executing evaluations, running regression tests, and benchmarking models within the Supervised AI platform. It allows users to programmatically trigger test suites and retrieve performance metrics.
Authentication
All requests to the Testing API must include a Bearer token in the Authorization header:
```
Authorization: Bearer <YOUR_ACCESS_TOKEN>
```
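In Python, the header can be built once and reused across requests. This is a minimal sketch; the SUPERVISED_API_TOKEN environment variable name is an illustrative choice, not one the platform mandates:

```python
import os

# Read the access token from the environment rather than hard-coding it.
# "SUPERVISED_API_TOKEN" is a hypothetical variable name for this example.
token = os.environ.get("SUPERVISED_API_TOKEN", "<YOUR_ACCESS_TOKEN>")

# Every Testing API call carries this header.
headers = {"Authorization": f"Bearer {token}"}
```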
Endpoints Catalog
1. Execute Test Suite
Triggers a predefined test suite against a specific model or deployment.
- URL: /v1/test/run
- Method: POST
- Content-Type: application/json
Request Body
| Field | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| model_id | string | Yes | The unique identifier of the model to be tested. |
| test_suite_id | string | Yes | The ID of the test suite (e.g., "regression-v1", "latency-bench"). |
| environment | string | No | The target environment (staging, production). Defaults to staging. |
| parameters | object | No | Key-value pairs for dynamic test configuration. |
Example Usage
```bash
curl -X POST https://api.supervised.ai/v1/test/run \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "gpt-4-custom-01",
    "test_suite_id": "accuracy-validation-set",
    "parameters": {
      "threshold": 0.85,
      "sample_size": 100
    }
  }'
```
Response
- 202 Accepted
```json
{
  "job_id": "test_98765abc",
  "status": "queued",
  "estimated_completion": "2023-10-27T15:00:00Z"
}
```
2. Get Job Status
Returns the current status of a test execution job. Clients typically poll this endpoint until the job reaches a terminal state.
- URL: /v1/test/status/{job_id}
- Method: GET
Parameters
| Parameter | Type | Description |
| :--- | :--- | :--- |
| job_id | string | The unique ID returned by the /run endpoint. |
Response
- 200 OK
```json
{
  "job_id": "test_98765abc",
  "status": "processing",
  "progress": "45%",
  "started_at": "2023-10-27T14:55:00Z"
}
```
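A minimal polling loop might look like the following sketch. The fetch_status callable is a stand-in for an HTTP GET against /v1/test/status/{job_id} (injected so the example works without network access), and the retry interval and timeout are arbitrary choices:

```python
import time

def poll_job(job_id, fetch_status, interval_s=5.0, timeout_s=600.0):
    """Poll until the job leaves the queued/processing states.

    fetch_status is any callable returning the parsed JSON body of
    GET /v1/test/status/{job_id}.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        body = fetch_status(job_id)
        if body["status"] not in ("queued", "processing"):
            return body  # terminal state: completed, failed, or cancelled
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
```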
3. Retrieve Test Results
Fetches the detailed metrics and logs once a test job has reached the completed state.
- URL: /v1/test/results/{job_id}
- Method: GET
Response
- 200 OK
```json
{
  "job_id": "test_98765abc",
  "summary": {
    "total_cases": 100,
    "passed": 92,
    "failed": 8
  },
  "metrics": {
    "accuracy": 0.92,
    "p99_latency_ms": 450,
    "f1_score": 0.89
  },
  "artifacts_url": "https://storage.supervised.ai/results/test_98765abc.json"
}
```
4. Evaluate Custom Input
Runs a single ad-hoc test case against a model endpoint for real-time validation.
- URL: /v1/eval/single
- Method: POST
Request Body
| Field | Type | Description |
| :--- | :--- | :--- |
| model_id | string | The model to query. |
| input_data | object | The payload to send to the model. |
| expected_output | object | (Optional) The ground truth for comparison. |
Example Usage
```python
import requests

payload = {
    "model_id": "sentiment-analyzer-v2",
    "input_data": {"text": "This platform is incredibly intuitive."},
    "expected_output": {"label": "positive"}
}

response = requests.post(
    "https://api.supervised.ai/v1/eval/single",
    json=payload,
    headers={"Authorization": "Bearer YOUR_TOKEN"}
)
print(response.json())
```
Data Objects
Test Status Types
The following strings represent the possible states of a testing job:
- queued: Job is waiting for an available runner.
- processing: Job is currently executing.
- completed: Job finished successfully.
- failed: Job encountered a system error (distinct from a "failed test case").
- cancelled: Job was manually terminated by a user.
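When polling, it is useful to distinguish in-flight from terminal states. The sets below simply mirror the list above; the helper name is an illustrative choice:

```python
# States in which the job may still change.
IN_FLIGHT = {"queued", "processing"}

# States from which the job will not transition further.
TERMINAL = {"completed", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status in TERMINAL
```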
Metric Schema
Standard metrics returned in the results object:
- accuracy: (float) Ratio of correct predictions to total.
- latency_ms: (integer) Time taken for the model to respond, in milliseconds.
- cost_estimate: (float) Estimated API cost for the run.
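The standard metrics can be mapped onto a small dataclass for typed access. This is a sketch assuming the schema above; extra keys in the payload (such as suite-specific metrics) are simply ignored:

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    accuracy: float        # ratio of correct predictions to total
    latency_ms: int        # model response time in milliseconds
    cost_estimate: float   # estimated API cost for the run

def parse_metrics(raw: dict) -> Metrics:
    """Build a Metrics record from a raw metrics object, ignoring extras."""
    return Metrics(
        accuracy=float(raw["accuracy"]),
        latency_ms=int(raw["latency_ms"]),
        cost_estimate=float(raw["cost_estimate"]),
    )
```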