# Introduction & Platform Vision

## Overview
The testing-api serves as the foundational communication layer for the Supervised AI platform. It provides a standardized interface for defining, executing, and monitoring AI-driven tests. By acting as the central gateway between raw data inputs and the Supervised AI evaluation engine, this API ensures that model performance and safety metrics are consistently captured and reported.
This repository defines the structures necessary to programmatically interact with the testing lifecycle, allowing developers to integrate rigorous AI supervision directly into their existing CI/CD pipelines or production monitoring tools.
## Platform Vision
Supervised AI is built on the principle that AI models require continuous, automated oversight to remain reliable in production environments. The testing-api is the cornerstone of this vision, enabling a transition from manual spot-checking to systematic, data-driven validation.
Our goal is to provide a unified "source of truth" for model quality. Whether you are validating a Large Language Model (LLM) for hallucinations or testing a computer vision model for edge-case accuracy, the testing-api provides the consistent schema and interface needed to ensure every model deployment meets your organization's safety and performance benchmarks.
## Core Integration Patterns
The testing-api is designed to be consumed by developers building integration layers or automated testing suites. The primary interface revolves around submitting Test Jobs and retrieving Evaluation Reports.
### Initializing a Test Session
Users interact with the platform by defining a test configuration that specifies the target model and the metrics to be evaluated.
```typescript
// Example: Initializing a test suite via the API structure
import { TestingClient, TestCriteria } from '@supervised-ai/testing-api';

const client = new TestingClient({
  apiKey: process.env.SUPERVISED_AI_KEY,
  environment: 'production'
});

const suite = await client.createSuite({
  name: "LLM Safety Guardrails",
  modelId: "gpt-4-eval-v1",
  criteria: [TestCriteria.Accuracy, TestCriteria.Safety, TestCriteria.Latency]
});
```
### Executing Evaluation Jobs
Once a suite is defined, the API allows for the submission of datasets for batch processing.
```typescript
// Submitting a payload for evaluation
const evaluation = await client.runTest({
  suiteId: suite.id,
  inputData: {
    prompt: "Analyze the following financial report...",
    context: "User-provided PDF document"
  },
  expectedOutput: "The report indicates a 5% growth..."
});

console.log(`Test Job ID: ${evaluation.jobId}`);
```
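Because a submitted evaluation yields a job ID rather than a finished report, batch consumers typically poll until the job reaches a terminal state. The helper below is a minimal sketch of that pattern; the `fetchStatus` callback, the `JobStatus` values, and the retry limits are illustrative assumptions, not part of the published client.

```typescript
// Hypothetical polling helper. `fetchStatus` stands in for a real status
// lookup on the client; its name and the status strings are assumptions.
type JobStatus = 'pending' | 'running' | 'completed' | 'failed';

async function waitForJob(
  fetchStatus: (jobId: string) => Promise<JobStatus>,
  jobId: string,
  intervalMs = 1000,
  maxAttempts = 30
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(jobId);
    // Terminal states end the wait; anything else means the job is still queued or running.
    if (status === 'completed' || status === 'failed') {
      return status;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish within ${maxAttempts} polls`);
}
```

A production integration would likely add exponential backoff and surface partial progress, but the loop above captures the essential submit-then-wait flow.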
## Public Interface Components
| Component | Role | Access Level |
| :--- | :--- | :--- |
| `TestingClient` | Primary entry point for all API interactions. | Public |
| `TestCriteria` | Enum defining supported evaluation metrics (e.g., Accuracy, Bias, Toxicity). | Public |
| `ReportGenerator` | Internal utility that formats raw test data into human-readable summaries. | Internal (accessible via `client.getReport()`) |
| `ValidationSchema` | Ensures input data meets the requirements of the specific model type being tested. | Public |
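As the table notes, formatted report data is reached through `client.getReport()` rather than by constructing `ReportGenerator` directly. The sketch below assumes a minimal shape for the report and a client-like object exposing that method; both are illustrative stand-ins, not the package's published types.

```typescript
// Illustrative shapes only: the real report type is defined by the API package.
interface Report {
  jobId: string;
  summary: string;
}

// Anything exposing getReport() satisfies this, including the real client.
interface ReportSource {
  getReport(jobId: string): Promise<Report>;
}

async function formatSummary(source: ReportSource, jobId: string): Promise<string> {
  const report = await source.getReport(jobId);
  return `Report ${report.jobId}: ${report.summary}`;
}
```

Coding against a narrow interface like `ReportSource` also makes the report-handling path easy to unit-test with a stub in place of the live client.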
## Input/Output Types
### TestConfiguration
This object defines the parameters of a testing session.
- `modelId` (string): The unique identifier of the model under test.
- `version` (string): The semantic version of the model.
- `thresholds` (`Map<string, number>`): Key-value pairs defining the minimum passing score for each metric.
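Expressed as a TypeScript interface, that shape might look like the following. This is a sketch inferred from the field list above, not the package's published type, and the example values are assumed.

```typescript
// Sketch of the TestConfiguration shape described above; inferred, not authoritative.
interface TestConfiguration {
  modelId: string;                  // unique identifier of the model under test
  version: string;                  // semantic version of the model
  thresholds: Map<string, number>;  // minimum passing score per metric
}

const config: TestConfiguration = {
  modelId: 'gpt-4-eval-v1',
  version: '1.2.0', // assumed example version
  thresholds: new Map([
    ['accuracy', 0.95],
    ['safety', 0.99],
  ]),
};
```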
### EvaluationResult
The output returned by the API after a test job completes.
- `status` (`'passed' | 'failed' | 'error'`): The final result of the test based on the configured thresholds.
- `scores` (`Record<string, number>`): The raw numerical value for each requested metric.
- `metadata` (object): Telemetry data, including execution time and token usage.
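Since the status field is described as derived from the configured thresholds, the comparison can be sketched as below. The helper name, the missing-score handling, and the metadata fields are assumptions for illustration; the platform performs this evaluation server-side.

```typescript
type EvaluationStatus = 'passed' | 'failed' | 'error';

// Illustrative shape; the metadata field names are assumptions.
interface EvaluationResult {
  status: EvaluationStatus;
  scores: Record<string, number>;
  metadata: { executionTimeMs: number; tokensUsed: number };
}

// Hypothetical helper: a test passes only if every thresholded metric
// meets its minimum; a score missing entirely is treated as an error.
function deriveStatus(
  scores: Record<string, number>,
  thresholds: Map<string, number>
): EvaluationStatus {
  for (const [metric, minimum] of Array.from(thresholds.entries())) {
    const score = scores[metric];
    if (score === undefined) return 'error';
    if (score < minimum) return 'failed';
  }
  return 'passed';
}
```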