# Supervised AI Workflow Integration

The `testing-api` serves as the bridge between your model's training/inference logic and the Supervised AI platform's evaluation engine. It standardizes how metrics are calculated, validated, and reported back to the platform dashboard.
## Overview of the Evaluation Loop
The testing workflow typically occurs at the end of an epoch or after a full training run. The integration follows a three-step process:
- Payload Preparation: Collecting model predictions and ground truth labels.
- Validation: Ensuring the data format matches the platform's requirements.
- Reporting: Sending the processed results to the Supervised AI backend.
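Conceptually, the three steps form a small driver loop. The sketch below illustrates the flow only; the function names (`prepare_payload`, `validate_payload`, `send_report`) are illustrative stand-ins, not part of the actual `testing-api`:

```python
from typing import Any, Dict, List

def prepare_payload(predictions: List[Any], labels: List[Any]) -> Dict[str, Any]:
    """Step 1: collect model predictions and ground-truth labels."""
    return {"predictions": predictions, "ground_truth": labels}

def validate_payload(payload: Dict[str, Any]) -> None:
    """Step 2: check the payload against the platform's basic requirements."""
    if len(payload["predictions"]) != len(payload["ground_truth"]):
        raise ValueError("predictions and ground_truth must be the same length")

def send_report(payload: Dict[str, Any]) -> Dict[str, str]:
    """Step 3: report results (stubbed here; the real client performs an HTTP call)."""
    return {"status": "accepted"}

payload = prepare_payload([1, 0, 1], [1, 1, 1])
validate_payload(payload)
print(send_report(payload)["status"])  # accepted
```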
## Integrating with Training Loops

To integrate the `testing-api` into your existing training scripts (e.g., PyTorch, TensorFlow, or scikit-learn), use the standard evaluation interface.
### Example: Post-Epoch Evaluation
```python
import torch

from testing_api import SupervisedTester

# Initialize the tester with your platform credentials
tester = SupervisedTester(api_key="your_api_key", project_id="proj_123")

def evaluation_step(model, validation_loader):
    model.eval()
    predictions = []
    targets = []
    with torch.no_grad():  # inference only; no gradients needed
        for inputs, labels in validation_loader:
            output = model(inputs)
            predictions.extend(output.cpu().numpy())
            targets.extend(labels.cpu().numpy())

    # Send results to the platform for metric calculation
    response = tester.log_evaluation(
        epoch=1,
        predictions=predictions,
        ground_truth=targets,
        task_type="classification"
    )
    print(f"Platform Analysis: {response['status']}")
```
## Core Public Interface

### `log_evaluation()`
Sends a batch of predictions and labels to the platform for analysis.
Inputs:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| `epoch` | `int` | The current iteration or training cycle. |
| `predictions` | `List[Any]` | The raw outputs or labels predicted by your model. |
| `ground_truth` | `List[Any]` | The actual labels (y-true) from your dataset. |
| `task_type` | `str` | The ML task (e.g., `classification`, `regression`, `ner`). |
| `metadata` | `dict` | (Optional) Key-value pairs for additional context (e.g., hyperparameters). |
Output:
Returns a JSON object containing the `test_run_id` and a URL to view the results on the Supervised AI dashboard.
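A typical pattern is to capture the run ID for later comparison. The response shape below is illustrative: only `test_run_id` is documented above, so the `status` and `dashboard_url` field names (and the example URL) are assumptions:

```python
# Hypothetical response -- only `test_run_id` is a documented field;
# `status`, `dashboard_url`, and the URL itself are placeholders.
response = {
    "status": "ok",
    "test_run_id": "run_8f2c",
    "dashboard_url": "https://example.com/runs/run_8f2c",
}

# Persist the run ID so later scripts can fetch or compare this run's results.
run_id = response["test_run_id"]
print(f"View results at: {response['dashboard_url']}")
```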
## Internal Validation Component

While the API handles most logic via the public `log_evaluation` method, it uses an internal `DataValidator` class to ensure schema consistency.
- Role: Validates that the dimensions of `predictions` and `ground_truth` match and that the data types are compatible with the specified `task_type`.
- Behavior: If validation fails, the API raises a `ValidationError` before attempting to transmit data to the platform, saving bandwidth and compute resources.
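These checks can be approximated locally. The following is a simplified sketch of the described behavior, not the platform's actual `DataValidator` implementation:

```python
class ValidationError(Exception):
    """Raised when an evaluation payload fails schema checks."""

def validate(predictions, ground_truth, task_type):
    # Dimension check: exactly one prediction per ground-truth label.
    if len(predictions) != len(ground_truth):
        raise ValidationError(
            f"length mismatch: {len(predictions)} predictions vs "
            f"{len(ground_truth)} labels"
        )
    # Type check: each task type accepts a narrow set of label types
    # (mapping shown here is illustrative).
    allowed = {"classification": (int, str), "regression": (float,)}
    for value in list(predictions) + list(ground_truth):
        if task_type in allowed and not isinstance(value, allowed[task_type]):
            raise ValidationError(f"{value!r} is invalid for task_type={task_type!r}")

validate([1, 0, 1], [1, 1, 0], "classification")  # passes silently
```

Failing fast like this avoids uploading a payload the backend would reject anyway.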
## Handling Real-time Inference Tests

For production monitoring, the `testing-api` can run "Golden Dataset" checks against a live endpoint.
```python
# Run a specific test suite against a production-ready model
test_results = tester.run_test_suite(
    suite_id="production_sanity_check",
    input_data=test_dataset_samples
)

if not test_results.passed:
    raise Exception("Model failed critical quality gates.")
```
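The idea behind a golden-dataset check is simple: run the model on a frozen set of inputs with known answers and fail if accuracy drops below a threshold. A minimal local sketch (the `golden_check` helper and its threshold are illustrative, not part of the `testing-api`):

```python
def golden_check(model_fn, golden_inputs, golden_labels, min_accuracy=0.95):
    """Compare predictions on a frozen 'golden' dataset against a pass threshold."""
    predictions = [model_fn(x) for x in golden_inputs]
    correct = sum(p == y for p, y in zip(predictions, golden_labels))
    accuracy = correct / len(golden_labels)
    return accuracy >= min_accuracy, accuracy

# Trivial stand-in model: classify by the sign of the input.
passed, accuracy = golden_check(lambda x: int(x > 0), [-2, -1, 1, 2], [0, 0, 1, 1])
```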
## Data Types and Formatting

- Classification: Expects `int` or `string` labels.
- Regression: Expects `float` values.
- Object Detection: Expects a dictionary containing bounding box coordinates `[x, y, w, h]` and class IDs.
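In practice these formats translate into payloads like the following. The shapes match the descriptions above, but the key names inside the object-detection dictionary (`boxes`, `class_ids`) are assumptions for illustration:

```python
# Classification: int or string labels, one entry per example.
classification_preds = ["cat", "dog", "cat"]

# Regression: float values.
regression_preds = [0.72, 1.05, -0.33]

# Object detection: bounding boxes as [x, y, w, h] plus a class ID per box
# (the "boxes"/"class_ids" key names are illustrative).
detection_pred = {
    "boxes": [[10, 20, 50, 80], [120, 40, 30, 30]],
    "class_ids": [3, 7],
}
```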