System Architecture
System Overview
The testing-api serves as the centralized orchestration layer for model evaluation and data validation within the Supervised AI platform. It acts as the bridge between the core platform services (where datasets and models are managed) and the distributed execution environments where testing logic is applied.
By standardizing the communication protocol for testing, this API ensures that diverse AI models—ranging from LLMs to computer vision systems—can be evaluated against a unified set of performance metrics and safety benchmarks.
Integration Architecture
The testing-api is designed as a stateless service that interacts with three primary architectural tiers:
- Platform Core (Client): Initiates testing jobs, manages versioning of models/datasets, and consumes the final evaluation reports.
- Execution Engines: Specialized workers or ephemeral containers that run the actual test scripts and validation logic.
- Data Lake/Storage: Provides the source data (ground truth) and receives the telemetry/artifacts generated during the testing process.
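The handoff between these tiers can be sketched in a few lines. This is an illustrative model only: the `TestJob` type, function names, and statuses shown are invented for this example, not part of the actual service.

```python
from dataclasses import dataclass, field

# Illustrative sketch of how a test job moves across the three tiers.
# All names here are hypothetical; only the tier roles come from the doc.
@dataclass
class TestJob:
    model_id: str
    dataset_id: str
    status: str = "PENDING"
    artifacts: list = field(default_factory=list)

def dispatch(job: TestJob) -> TestJob:
    """Platform Core (client) submits the job to the testing-api."""
    job.status = "RUNNING"
    return job

def execute(job: TestJob, ground_truth: list) -> TestJob:
    """Execution engine runs tests against ground truth from storage,
    then pushes artifacts back to the data lake."""
    job.artifacts.append({"samples_evaluated": len(ground_truth)})
    job.status = "COMPLETED"
    return job

job = execute(
    dispatch(TestJob("llama-3-finetuned-v1", "qa-benchmark-gold")),
    ground_truth=["q1", "q2", "q3"],
)
```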
High-Level Data Flow
Public Interface & Usage
The system exposes a RESTful interface to manage the lifecycle of a test. Users primarily interact with the API to define test suites, trigger executions, and retrieve metrics.
Test Execution Lifecycle
To initiate a test through the platform, the client must provide a configuration payload defining the scope and parameters of the evaluation.
1. Dispatching a Test Job
POST /v1/tests/run
Request Schema:
| Field | Type | Description |
| :--- | :--- | :--- |
| model_id | String | Unique identifier of the model version under test. |
| dataset_id | String | The reference dataset used for evaluation. |
| test_suite | Array | List of specific test cases or metric names to run (e.g., accuracy, latency, bias_check). |
| callback_url | String | (Optional) The URL where the platform will send results upon completion. |
Example Usage:
```shell
curl -X POST https://api.supervised.ai/testing/v1/tests/run \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "llama-3-finetuned-v1",
    "dataset_id": "qa-benchmark-gold",
    "test_suite": ["hallucination_rate", "response_time"],
    "callback_url": "https://hooks.supervised.ai/results/123"
  }'
```
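For clients not using curl, a minimal Python sketch of the same dispatch is shown below. The endpoint, headers, and field names mirror the schema and example above; the helper function name is our own, and the token is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://api.supervised.ai/testing"

def build_run_request(token, model_id, dataset_id, test_suite,
                      callback_url=None):
    """Construct (but do not send) a POST /v1/tests/run request.

    Field names follow the request schema; callback_url is optional.
    """
    payload = {
        "model_id": model_id,
        "dataset_id": dataset_id,
        "test_suite": test_suite,
    }
    if callback_url:
        payload["callback_url"] = callback_url
    return urllib.request.Request(
        f"{API_BASE}/v1/tests/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_request(
    "<token>",
    "llama-3-finetuned-v1",
    "qa-benchmark-gold",
    ["hallucination_rate", "response_time"],
    callback_url="https://hooks.supervised.ai/results/123",
)
# urllib.request.urlopen(req) would actually dispatch the job.
```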
2. Monitoring Status
GET /v1/tests/{job_id}/status
The API provides real-time status updates. Common states include PENDING, RUNNING, COMPLETED, and FAILED.
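A typical client polls this endpoint until the job reaches a terminal state. The sketch below uses the states listed above; `fetch_status` is a stand-in for the actual HTTP call, stubbed here with a canned sequence.

```python
import time

# Terminal states taken from the status list above.
TERMINAL_STATES = {"COMPLETED", "FAILED"}

def poll_until_done(fetch_status, job_id, interval=5.0, max_attempts=120):
    """Poll GET /v1/tests/{job_id}/status until a terminal state is seen."""
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")

# Example with a stubbed status sequence instead of a real HTTP call:
sequence = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
final = poll_until_done(lambda _job: next(sequence), "job-123", interval=0.0)
# final == "COMPLETED"
```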
Result Reporting
Once execution is complete, the testing-api aggregates metrics from the execution engine and returns a structured JSON object containing:
- Summary Metrics: Aggregate scores for the entire test run.
- Trace Details: Per-sample logs or failure reasons for debugging.
- Artifact Links: Signed URLs to download generated reports or visualization plots.
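A consumer of the report might process those three sections as follows. The key names and values in this sketch are hypothetical, shaped after the three bullets above; the real response schema may differ.

```python
import json

# Hypothetical result payload; key names are assumptions, not the
# documented schema.
raw = json.dumps({
    "summary_metrics": {"hallucination_rate": 0.04, "response_time_p95_ms": 820},
    "trace_details": [
        {"sample_id": "s-001", "passed": False, "reason": "timeout"},
        {"sample_id": "s-002", "passed": True, "reason": None},
    ],
    "artifact_links": ["https://storage.supervised.ai/reports/123"],  # placeholder URL
})

report = json.loads(raw)

# Pull the aggregate scores and isolate failing samples for debugging.
scores = report["summary_metrics"]
failures = [t for t in report["trace_details"] if not t["passed"]]
```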
Security and Authentication
- Bearer Token Authentication: All requests to the testing-api must be authenticated via the Supervised AI Identity Provider.
- Isolation: Each test execution is scoped to a specific project or organization, ensuring that model parameters and dataset contents are never leaked across different platform tenants.