# Supervised AI Workflow Integration

The `testing-api` serves as the bridge between your model's training/inference logic and the Supervised AI platform's evaluation engine. It standardizes how metrics are calculated, validated, and reported back to the platform dashboard.
## Overview of the Evaluation Loop
The testing workflow typically occurs at the end of an epoch or after a full training run. The integration follows a three-step process:
- Payload Preparation: Collecting model predictions and ground truth labels.
- Validation: Ensuring the data format matches the platform's requirements.
- Reporting: Sending the processed results to the Supervised AI backend.
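Conceptually, the three steps form a small driver loop. The sketch below illustrates the flow only; the function names (`prepare_payload`, `validate_payload`, `send_report`) are illustrative stand-ins, not part of the actual `testing-api`:

```python
from typing import Any, Dict, List

def prepare_payload(predictions: List[Any], labels: List[Any]) -> Dict[str, Any]:
    """Step 1: collect model predictions and ground-truth labels."""
    return {"predictions": predictions, "ground_truth": labels}

def validate_payload(payload: Dict[str, Any]) -> None:
    """Step 2: check the payload against the platform's basic requirements."""
    if len(payload["predictions"]) != len(payload["ground_truth"]):
        raise ValueError("predictions and ground_truth must be the same length")

def send_report(payload: Dict[str, Any]) -> Dict[str, str]:
    """Step 3: report results (stubbed here; the real client performs an HTTP call)."""
    return {"status": "accepted"}

payload = prepare_payload([1, 0, 1], [1, 1, 1])
validate_payload(payload)
print(send_report(payload)["status"])  # accepted
```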
## Integrating with Training Loops

To integrate the `testing-api` into your existing training scripts (e.g., PyTorch, TensorFlow, or scikit-learn), use the standard evaluation interface.
### Example: Post-Epoch Evaluation
```python
import torch

from testing_api import SupervisedTester

# Initialize the tester with your platform credentials
tester = SupervisedTester(api_key="your_api_key", project_id="proj_123")

def evaluation_step(model, validation_loader):
    model.eval()
    predictions = []
    targets = []
    with torch.no_grad():  # inference only; no gradients needed
        for inputs, labels in validation_loader:
            output = model(inputs)
            predictions.extend(output.cpu().numpy())
            targets.extend(labels.cpu().numpy())

    # Send results to the platform for metric calculation
    response = tester.log_evaluation(
        epoch=1,
        predictions=predictions,
        ground_truth=targets,
        task_type="classification"
    )
    print(f"Platform Analysis: {response['status']}")
```
## Core Public Interface

### `log_evaluation()`
Sends a batch of predictions and labels to the platform for analysis.
Inputs:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `epoch` | `int` | The current iteration or training cycle. |
| `predictions` | `List[Any]` | The raw outputs or labels predicted by your model. |
| `ground_truth` | `List[Any]` | The actual labels (y-true) from your dataset. |
| `task_type` | `str` | The ML task (e.g., `classification`, `regression`, `ner`). |
| `metadata` | `dict` | (Optional) Key-value pairs for additional context (e.g., hyperparameters). |
Output:
Returns a JSON object containing the `test_run_id` and a URL to view the results on the Supervised AI dashboard.
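A typical pattern is to capture the run ID for later comparison. The response shape below is illustrative: only `test_run_id` is documented above, so the `status` and `dashboard_url` field names (and the example URL) are assumptions:

```python
# Hypothetical response -- only `test_run_id` is a documented field;
# `status`, `dashboard_url`, and the URL itself are placeholders.
response = {
    "status": "ok",
    "test_run_id": "run_8f2c",
    "dashboard_url": "https://example.com/runs/run_8f2c",
}

# Persist the run ID so later scripts can fetch or compare this run's results.
run_id = response["test_run_id"]
print(f"View results at: {response['dashboard_url']}")
```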
## Internal Validation Component

While the API handles most logic via the public `log_evaluation` method, it uses an internal `DataValidator` class to ensure schema consistency.
- Role: Validates that the dimensions of `predictions` and `ground_truth` match and that the data types are compatible with the specified `task_type`.
- Behavior: If validation fails, the API raises a `ValidationError` before attempting to transmit data to the platform, saving bandwidth and compute resources.
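These checks can be approximated locally. The following is a simplified sketch of the described behavior, not the platform's actual `DataValidator` implementation:

```python
class ValidationError(Exception):
    """Raised when an evaluation payload fails schema checks."""

def validate(predictions, ground_truth, task_type):
    # Dimension check: exactly one prediction per ground-truth label.
    if len(predictions) != len(ground_truth):
        raise ValidationError(
            f"length mismatch: {len(predictions)} predictions vs "
            f"{len(ground_truth)} labels"
        )
    # Type check: each task type accepts a narrow set of label types
    # (mapping shown here is illustrative).
    allowed = {"classification": (int, str), "regression": (float,)}
    for value in list(predictions) + list(ground_truth):
        if task_type in allowed and not isinstance(value, allowed[task_type]):
            raise ValidationError(f"{value!r} is invalid for task_type={task_type!r}")

validate([1, 0, 1], [1, 1, 0], "classification")  # passes silently
```

Failing fast like this avoids uploading a payload the backend would reject anyway.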
## Handling Real-time Inference Tests

For production monitoring, the `testing-api` can run "Golden Dataset" checks against a live endpoint.
```python
# Run a specific test suite against a production-ready model
test_results = tester.run_test_suite(
    suite_id="production_sanity_check",
    input_data=test_dataset_samples
)

if not test_results.passed:
    raise Exception("Model failed critical quality gates.")
```
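The idea behind a golden-dataset check is simple: run the model on a frozen set of inputs with known answers and fail if accuracy drops below a threshold. A minimal local sketch (the `golden_check` helper and its threshold are illustrative, not part of the `testing-api`):

```python
def golden_check(model_fn, golden_inputs, golden_labels, min_accuracy=0.95):
    """Compare predictions on a frozen 'golden' dataset against a pass threshold."""
    predictions = [model_fn(x) for x in golden_inputs]
    correct = sum(p == y for p, y in zip(predictions, golden_labels))
    accuracy = correct / len(golden_labels)
    return accuracy >= min_accuracy, accuracy

# Trivial stand-in model: classify by the sign of the input.
passed, accuracy = golden_check(lambda x: int(x > 0), [-2, -1, 1, 2], [0, 0, 1, 1])
```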
## Data Types and Formatting

- Classification: Expects `int` or `string` labels.
- Regression: Expects `float` values.
- Object Detection: Expects a dictionary containing bounding box coordinates `[x, y, w, h]` and class IDs.
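In practice these formats translate into payloads like the following. The shapes match the descriptions above, but the key names inside the object-detection dictionary (`boxes`, `class_ids`) are assumptions for illustration:

```python
# Classification: int or string labels, one entry per example.
classification_preds = ["cat", "dog", "cat"]

# Regression: float values.
regression_preds = [0.72, 1.05, -0.33]

# Object detection: bounding boxes as [x, y, w, h] plus a class ID per box
# (the "boxes"/"class_ids" key names are illustrative).
detection_pred = {
    "boxes": [[10, 20, 50, 80], [120, 40, 30, 30]],
    "class_ids": [3, 7],
}
```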