Introduction to Supervised AI API
Overview
The Supervised AI Testing API serves as the bridge between model development and production readiness. It provides a standardized framework for validating model performance, ensuring data integrity, and automating regression testing within the Supervised AI ecosystem.
This API allows developers to programmatically trigger evaluation pipelines, compare model versions, and retrieve detailed diagnostic metrics. By integrating this API into your CI/CD workflows, you can enforce quality gates that prevent suboptimal models from reaching deployment.
Key Capabilities
- Automated Evaluation: Trigger comprehensive test suites against specific datasets or model endpoints.
- Performance Benchmarking: Programmatically compare current model outputs against "Gold Standard" datasets.
- Regression Testing: Ensure that new iterations of a model maintain accuracy on critical edge cases.
- Metric Retrieval: Fetch granular performance data including precision, recall, F1 scores, and custom business logic metrics.
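As a concrete illustration of the quality-gate idea mentioned above, the sketch below checks a Metric Report summary against a minimum accuracy threshold. The report shape mirrors the example response later in this document; the threshold value and function name are illustrative assumptions, not part of the API.

```python
def enforce_quality_gate(report: dict, min_accuracy: float = 0.9) -> bool:
    """Return True if the Metric Report summary meets the accuracy bar.

    `report` follows the summary shape shown in this document; the
    0.9 default threshold is an illustrative assumption.
    """
    summary = report.get("summary", {})
    return summary.get("accuracy", 0.0) >= min_accuracy


# A CI step could fail the build when the gate does not pass.
report = {"summary": {"accuracy": 0.942, "status": "PASS"}}
gate_passed = enforce_quality_gate(report, min_accuracy=0.9)
```

In a CI/CD pipeline, a falsy return value here would typically translate into a non-zero exit code so the deployment stage is blocked.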
Core Concepts
To effectively use the Testing API, it is important to understand the following primary entities:
| Entity | Description |
| :--- | :--- |
| Test Suite | A collection of test cases and configurations used to evaluate a model's performance. |
| Test Run | A single execution of a Test Suite. Each run generates a unique set of results and logs. |
| Evaluation Dataset | The specific subset of data used as the ground truth during the testing process. |
| Metric Report | The structured output of a Test Run, containing raw data and calculated performance scores. |
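The relationships between these entities can be sketched with simple data classes: a Test Run references exactly one Test Suite, and a Metric Report belongs to exactly one Test Run. The class and field names below are illustrative, not part of the API schema.

```python
from dataclasses import dataclass, field


@dataclass
class TestSuite:
    """A collection of test cases and configuration for evaluating a model."""
    suite_id: str
    test_case_ids: list


@dataclass
class TestRun:
    """A single execution of a Test Suite; each run has its own results."""
    run_id: str
    suite_id: str  # the one Test Suite this run executes
    status: str = "PENDING"


@dataclass
class MetricReport:
    """Structured output of a Test Run: raw data plus calculated scores."""
    run_id: str  # the one Test Run this report describes
    summary: dict = field(default_factory=dict)
```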
Basic Usage
The Testing API follows RESTful principles. Most interactions involve defining a test configuration and submitting it to the execution engine.
Triggering a Test Run
To initiate a new evaluation, send a POST request to the test execution endpoint. You must specify the model identifier and the dataset to be used.
Endpoint: POST /v1/tests/run
Input Parameters:
| Parameter | Type | Required | Description |
| :--- | :--- | :--- | :--- |
| model_id | string | Yes | The unique identifier of the model version to test. |
| suite_id | string | Yes | The ID of the predefined test suite to execute. |
| callback_url | string | No | A URL to receive a webhook notification once the test completes. |
Example Request:
```json
{
  "model_id": "ner-model-v2.4",
  "suite_id": "production-validation-set",
  "callback_url": "https://api.yourdomain.com/hooks/testing"
}
```
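A minimal client sketch for submitting this request is shown below, using only the Python standard library. The base URL is a placeholder, and the validation mirrors the required/optional parameters in the table above; sending the request is left as a comment since it requires a live endpoint.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.example.com/v1"  # placeholder; substitute your API host


def build_test_run_request(model_id: str, suite_id: str,
                           callback_url: Optional[str] = None) -> urllib.request.Request:
    """Build the POST /v1/tests/run request, enforcing required parameters."""
    if not model_id or not suite_id:
        raise ValueError("model_id and suite_id are required")
    payload = {"model_id": model_id, "suite_id": suite_id}
    if callback_url:  # optional webhook notification target
        payload["callback_url"] = callback_url
    return urllib.request.Request(
        f"{API_BASE}/tests/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_test_run_request("ner-model-v2.4", "production-validation-set")
# urllib.request.urlopen(req) would submit the run (network call omitted here).
```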
Retrieving Results
Once a Test Run has moved to a COMPLETED state, you can retrieve the Metric Report to analyze performance.
Endpoint: GET /v1/results/{run_id}
Example Response:
```json
{
  "run_id": "tr_987654321",
  "status": "COMPLETED",
  "summary": {
    "accuracy": 0.942,
    "latency_ms": 120,
    "status": "PASS"
  },
  "detailed_metrics": [
    {
      "category": "edge-cases",
      "score": 0.88,
      "passed": true
    }
  ]
}
```
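A client typically verifies the COMPLETED state before reading scores, then scans `detailed_metrics` for failing categories. The helper below is a sketch of that pattern against the response shape above; field names beyond that example are not guaranteed.

```python
import json


def summarize_report(raw: str) -> dict:
    """Parse a Metric Report JSON string and list failing categories.

    Raises if the run has not reached the COMPLETED state, since scores
    are only meaningful once the run is finished.
    """
    report = json.loads(raw)
    if report.get("status") != "COMPLETED":
        raise RuntimeError(f"run {report.get('run_id')} is not finished")
    failing = [m["category"]
               for m in report.get("detailed_metrics", [])
               if not m.get("passed", False)]
    return {
        "run_id": report["run_id"],
        "accuracy": report["summary"]["accuracy"],
        "failing_categories": failing,
    }


# Applied to the example response shown above:
example = json.dumps({
    "run_id": "tr_987654321",
    "status": "COMPLETED",
    "summary": {"accuracy": 0.942, "latency_ms": 120, "status": "PASS"},
    "detailed_metrics": [{"category": "edge-cases", "score": 0.88, "passed": True}],
})
result = summarize_report(example)
```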
Internal Components
While the majority of the Testing API is exposed via the public endpoints described above, the repository contains internal modules responsible for environment isolation and raw log processing.
- Test Orchestrator (Internal): Handles the lifecycle of a test execution, from provisioning resources to cleanup.
- Result Parser (Internal): Normalizes output from various model types into the standardized JSON format used by the API.
Users do not interact with these components directly; however, the components underpin the consistency and reliability of the data returned by the public interface.