Introduction
Overview
The testing-api serves as the foundational interface for managing and executing evaluations within the Supervised AI platform. It provides a standardized structure for defining test suites, executing model evaluations, and capturing performance metrics.
Designed for developers and data scientists, this API ensures that testing workflows are consistent across different models and environments, enabling high-fidelity quality assurance for AI-driven applications.
Core Objectives
The primary goal of the testing-api is to streamline the validation phase of the AI lifecycle. It achieves this through three core pillars:
- Standardization: Providing a uniform schema for test cases and results to ensure compatibility across the Supervised AI ecosystem.
- Automation: Enabling programmatic triggers for testing suites within CI/CD pipelines or model training workflows.
- Reliability: Delivering precise, reproducible evaluation data that helps teams make informed decisions regarding model deployment.
Public Interface and Integration
The API is structured to handle various testing modalities, including unit tests for model outputs, integration tests for platform workflows, and benchmarking against gold-standard datasets.
Interaction Pattern
Users typically interact with the testing-api via structured JSON requests to define test parameters and retrieve evaluation summaries.
// Example: Initiating a test run
POST /v1/tests/run
{
  "model_id": "sup-ai-v2-alpha",
  "test_suite": "sentiment-analysis-core",
  "config": {
    "threshold": 0.85,
    "concurrency": 10
  }
}
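To make the request shape concrete, the payload above can be assembled programmatically before being posted to /v1/tests/run. The helper below is a minimal sketch; `build_test_run_request` and its defaults are illustrative assumptions, not part of the testing-api itself, though the field names mirror the example request.

```python
import json

# Hypothetical helper that builds the JSON body for POST /v1/tests/run.
# Field names follow the example request above; the function name and
# default values are assumptions for illustration.
def build_test_run_request(model_id, test_suite, threshold=0.85, concurrency=10):
    """Return the JSON payload for a test-run request."""
    return {
        "model_id": model_id,
        "test_suite": test_suite,
        "config": {
            "threshold": threshold,
            "concurrency": concurrency,
        },
    }

payload = build_test_run_request("sup-ai-v2-alpha", "sentiment-analysis-core")
print(json.dumps(payload, indent=2))
```

Centralizing payload construction in one helper keeps test definitions consistent when the same suite is triggered from multiple pipelines.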
Response Structure
The API returns detailed execution objects that include status codes, performance metrics (latency, throughput), and evaluation scores (accuracy, F1-score, etc.).
// Example: Evaluation Result
{
  "test_id": "test_88291",
  "status": "completed",
  "results": {
    "accuracy": 0.94,
    "avg_latency_ms": 120
  },
  "timestamp": "2023-10-27T10:00:00Z"
}
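A client consuming this response might reduce it to a human-readable summary for logs or dashboards. The sketch below assumes a result object shaped like the example above; `summarize_result` is a hypothetical name, not a function provided by the API.

```python
# Hypothetical helper that turns an evaluation-result object (shaped like
# the example above) into a one-line summary. Not part of the testing-api.
def summarize_result(result):
    if result["status"] != "completed":
        return f"test {result['test_id']} is {result['status']}"
    metrics = result["results"]
    return (f"test {result['test_id']}: "
            f"accuracy={metrics['accuracy']:.2f}, "
            f"latency={metrics['avg_latency_ms']}ms")

example = {
    "test_id": "test_88291",
    "status": "completed",
    "results": {"accuracy": 0.94, "avg_latency_ms": 120},
    "timestamp": "2023-10-27T10:00:00Z",
}
print(summarize_result(example))
# → test test_88291: accuracy=0.94, latency=120ms
```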
Role in the Supervised AI Platform
Within the broader Supervised AI platform, the testing-api acts as the bridge between model development and production readiness. By centralizing the testing logic, it allows the platform to maintain a "source of truth" for model performance, which can be queried by the dashboard for visualization or by deployment services for automated gating.
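The automated-gating role described above can be sketched as a simple threshold check over an evaluation result. The function name `passes_gate` and the specific thresholds are illustrative assumptions, not platform defaults; the result shape follows the earlier response example.

```python
# Illustrative deployment gate: a result object (shaped like the earlier
# example) passes only if the run completed and its metrics clear the
# configured thresholds. passes_gate and its defaults are assumptions.
def passes_gate(result, min_accuracy=0.90, max_latency_ms=200):
    if result.get("status") != "completed":
        return False
    metrics = result.get("results", {})
    return (metrics.get("accuracy", 0.0) >= min_accuracy
            and metrics.get("avg_latency_ms", float("inf")) <= max_latency_ms)

result = {
    "status": "completed",
    "results": {"accuracy": 0.94, "avg_latency_ms": 120},
}
print(passes_gate(result))  # → True
```

A deployment service could call a check like this after querying the centralized results, so promotion decisions stay tied to the same "source of truth" the dashboard visualizes.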