# Introduction & Platform Vision

## Overview
The testing-api serves as the foundational communication layer for the Supervised AI platform. It provides a standardized interface for defining, executing, and monitoring AI-driven tests. By acting as the central gateway between raw data inputs and the Supervised AI evaluation engine, this API ensures that model performance and safety metrics are consistently captured and reported.
This repository defines the structures necessary to programmatically interact with the testing lifecycle, allowing developers to integrate rigorous AI supervision directly into their existing CI/CD pipelines or production monitoring tools.
## Platform Vision
Supervised AI is built on the principle that AI models require continuous, automated oversight to remain reliable in production environments. The testing-api is the cornerstone of this vision, enabling a transition from manual spot-checking to systematic, data-driven validation.
Our goal is to provide a unified "source of truth" for model quality. Whether you are validating a Large Language Model (LLM) for hallucinations or testing a computer vision model for edge-case accuracy, the testing-api provides the consistent schema and interface needed to ensure every model deployment meets your organization's safety and performance benchmarks.
## Core Integration Patterns
The testing-api is designed to be consumed by developers building integration layers or automated testing suites. The primary interface revolves around submitting Test Jobs and retrieving Evaluation Reports.
### Initializing a Test Session
Users interact with the platform by defining a test configuration that specifies the target model and the metrics to be evaluated.
```typescript
// Example: Initializing a test suite via the API structure
import { TestingClient, TestCriteria } from '@supervised-ai/testing-api';

const client = new TestingClient({
  apiKey: process.env.SUPERVISED_AI_KEY,
  environment: 'production'
});

const suite = await client.createSuite({
  name: "LLM Safety Guardrails",
  modelId: "gpt-4-eval-v1",
  criteria: [TestCriteria.Accuracy, TestCriteria.Safety, TestCriteria.Latency]
});
```
### Executing Evaluation Jobs
Once a suite is defined, the API allows for the submission of datasets for batch processing.
```typescript
// Submitting a payload for evaluation
const evaluation = await client.runTest({
  suiteId: suite.id,
  inputData: {
    prompt: "Analyze the following financial report...",
    context: "User-provided PDF document"
  },
  expectedOutput: "The report indicates a 5% growth..."
});

console.log(`Test Job ID: ${evaluation.jobId}`);
```
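Because a submitted evaluation yields a job ID rather than a finished report, batch consumers typically poll until the job reaches a terminal state. The helper below is a minimal sketch of that pattern; the `fetchStatus` callback, the `JobStatus` values, and the retry limits are illustrative assumptions, not part of the published client.

```typescript
// Hypothetical polling helper. `fetchStatus` stands in for a real status
// lookup on the client; its name and the status strings are assumptions.
type JobStatus = 'pending' | 'running' | 'completed' | 'failed';

async function waitForJob(
  fetchStatus: (jobId: string) => Promise<JobStatus>,
  jobId: string,
  intervalMs = 1000,
  maxAttempts = 30
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(jobId);
    // Terminal states end the wait; anything else means the job is still queued or running.
    if (status === 'completed' || status === 'failed') {
      return status;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish within ${maxAttempts} polls`);
}
```

A production integration would likely add exponential backoff and surface partial progress, but the loop above captures the essential submit-then-wait flow.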
## Public Interface Components
| Component | Role | Access Level |
| :--- | :--- | :--- |
| `TestingClient` | Primary entry point for all API interactions. | Public |
| `TestCriteria` | Enum defining supported evaluation metrics (e.g., Accuracy, Bias, Toxicity). | Public |
| `ReportGenerator` | Internal utility that formats raw test data into human-readable summaries. | Internal (accessible via `client.getReport()`) |
| `ValidationSchema` | Ensures input data meets the requirements of the specific model type being tested. | Public |
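As the table notes, formatted report data is reached through `client.getReport()` rather than by constructing `ReportGenerator` directly. The sketch below assumes a minimal shape for the report and a client-like object exposing that method; both are illustrative stand-ins, not the package's published types.

```typescript
// Illustrative shapes only: the real report type is defined by the API package.
interface Report {
  jobId: string;
  summary: string;
}

// Anything exposing getReport() satisfies this, including the real client.
interface ReportSource {
  getReport(jobId: string): Promise<Report>;
}

async function formatSummary(source: ReportSource, jobId: string): Promise<string> {
  const report = await source.getReport(jobId);
  return `Report ${report.jobId}: ${report.summary}`;
}
```

Coding against a narrow interface like `ReportSource` also makes the report-handling path easy to unit-test with a stub in place of the live client.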
## Input/Output Types
### TestConfiguration
This object defines the parameters of a testing session.
- `modelId` (string): The unique identifier of the model under test.
- `version` (string): The semantic version of the model.
- `thresholds` (`Map<string, number>`): Key-value pairs defining the minimum passing score for each metric.
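Expressed as a TypeScript interface, that shape might look like the following. This is a sketch inferred from the field list above, not the package's published type, and the example values are assumed.

```typescript
// Sketch of the TestConfiguration shape described above; inferred, not authoritative.
interface TestConfiguration {
  modelId: string;                  // unique identifier of the model under test
  version: string;                  // semantic version of the model
  thresholds: Map<string, number>;  // minimum passing score per metric
}

const config: TestConfiguration = {
  modelId: 'gpt-4-eval-v1',
  version: '1.2.0', // assumed example version
  thresholds: new Map([
    ['accuracy', 0.95],
    ['safety', 0.99],
  ]),
};
```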
### EvaluationResult
The output returned by the API after a test job completes.
- `status` (`'passed' | 'failed' | 'error'`): The final result of the test based on the configured thresholds.
- `scores` (`Record<string, number>`): The raw numerical value for each requested metric.
- `metadata` (object): Telemetry data, including execution time and token usage.
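Since the status field is described as derived from the configured thresholds, the comparison can be sketched as below. The helper name, the missing-score handling, and the metadata fields are assumptions for illustration; the platform performs this evaluation server-side.

```typescript
type EvaluationStatus = 'passed' | 'failed' | 'error';

// Illustrative shape; the metadata field names are assumptions.
interface EvaluationResult {
  status: EvaluationStatus;
  scores: Record<string, number>;
  metadata: { executionTimeMs: number; tokensUsed: number };
}

// Hypothetical helper: a test passes only if every thresholded metric
// meets its minimum; a score missing entirely is treated as an error.
function deriveStatus(
  scores: Record<string, number>,
  thresholds: Map<string, number>
): EvaluationStatus {
  for (const [metric, minimum] of Array.from(thresholds.entries())) {
    const score = scores[metric];
    if (score === undefined) return 'error';
    if (score < minimum) return 'failed';
  }
  return 'passed';
}
```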