# Performance Benchmarking
This section provides the performance baselines and throughput constraints for the `testing-api`. These metrics help you calibrate automated tests and integration pipelines to the capabilities of the Supervised AI platform's testing environment.
## Latency Expectations
The following table outlines the expected response times for standard API operations within the testing environment. These metrics assume a stable network connection and standard payload sizes.
| Operation Type | Average Latency (p50) | Tail Latency (p99) | Notes |
| :--- | :--- | :--- | :--- |
| Health Checks / Heartbeat | < 50ms | < 150ms | Basic connectivity verification. |
| Metadata Retrieval | 100ms - 200ms | < 400ms | Fetching test configurations or schemas. |
| Inference Mocking | 300ms - 800ms | < 1.5s | Simulated model execution latency. |
| Data Batch Submission | 500ms - 1.2s | < 2.5s | Processing payloads up to 5MB. |
> [!NOTE]
> Latency may vary based on the complexity of the test suite and the geographic region of the requester.
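The p99 figures above can double as assertion budgets in automated tests. A minimal sketch, assuming you record latencies yourself; the budget values mirror the table, and `check_latency` is a hypothetical helper, not part of the SDK:

```python
# p99 latency budgets in milliseconds, taken from the table above.
P99_BUDGET_MS = {
    "health_check": 150,
    "metadata_retrieval": 400,
    "inference_mock": 1500,
    "batch_submission": 2500,
}

def check_latency(operation: str, measured_ms: float) -> bool:
    """Return True if a measured latency stays within the p99 budget."""
    return measured_ms <= P99_BUDGET_MS[operation]

print(check_latency("inference_mock", 620.0))  # True: within the 1.5s budget
print(check_latency("health_check", 210.0))    # False: exceeds the 150ms budget
```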
## Throughput Limits

To ensure stability for all users, the testing environment enforces rate limits. Exceeding these limits will result in `429 Too Many Requests` responses.
- Rate Limit: 50 requests per second (RPS) per API Key.
- Burst Capacity: Up to 100 requests in a 2-second window.
- Concurrent Connections: Maximum 20 simultaneous persistent connections.
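A client-side token bucket tuned to these limits (50 RPS steady-state, 100-request burst) can keep a test harness under the threshold before the server ever returns a 429. A minimal sketch; the class and its structure are illustrative, not part of the SDK:

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; each request costs one token."""

    def __init__(self, rate: float = 50.0, capacity: float = 100.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def acquire(self) -> bool:
        """Take a token if available; return False when the caller should wait."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket()
granted = sum(bucket.acquire() for _ in range(150))
print(f"Granted {granted} of 150 immediate requests")  # roughly the 100-request burst
```

When `acquire()` returns `False`, sleep briefly and retry rather than sending the request and consuming server-side quota on a guaranteed 429.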
### Handling Rate Limits
The API returns standard headers to help you manage your request frequency:
```
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1625097600
```
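These headers can drive a simple pacing decision: proceed while quota remains, otherwise sleep until the reset timestamp. A sketch under the assumption that the headers arrive as a plain dict; `wait_if_exhausted` is an illustrative helper, not an SDK function:

```python
import time

def wait_if_exhausted(headers: dict) -> float:
    """Seconds to pause before the next request, based on the rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left; no need to pause
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix epoch seconds
    return max(0.0, reset_at - time.time())

# Plenty of quota remaining, so no pause is needed:
print(wait_if_exhausted({"X-RateLimit-Remaining": "42",
                         "X-RateLimit-Reset": "1625097600"}))  # 0.0
```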
## Benchmarking Usage Example
You can use the following pattern to measure the latency of your integration within your test suite. This ensures your local environment aligns with the platform's performance expectations.
```python
import time

from testing_api import Client

client = Client(api_key="your_test_key")

def benchmark_inference():
    start_time = time.perf_counter()
    # Trigger a standard test inference
    response = client.test_inference(payload={"input": "sample_data"})
    end_time = time.perf_counter()

    latency_ms = (end_time - start_time) * 1000
    print(f"Status Code: {response.status_code}")
    print(f"Latency: {latency_ms:.2f} ms")

if __name__ == "__main__":
    benchmark_inference()
```
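A single measurement is noisy; to compare against the p50/p99 targets in the table, aggregate many runs. A sketch using only the standard library; `run_once` stands in for a timed call to `client.test_inference` and is simulated here with random latencies in the inference-mock band:

```python
import random
import statistics

def run_once() -> float:
    """Placeholder for one timed API call; returns a simulated latency in ms."""
    return random.uniform(300, 800)  # mimics the 300ms - 800ms inference band

random.seed(0)  # deterministic for illustration only
samples = sorted(run_once() for _ in range(200))
p50 = statistics.median(samples)
p99 = statistics.quantiles(samples, n=100)[98]  # 99th percentile
print(f"p50: {p50:.0f} ms, p99: {p99:.0f} ms")
```

In a real suite, replace `run_once` with the timed `benchmark_inference` body above and assert the aggregated p50/p99 against the table's budgets.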
## Performance Best Practices

To optimize your interaction with the `testing-api` and avoid hitting throughput bottlenecks:
- Connection Pooling: Reuse TCP connections via the `Client` object to reduce overhead from repeated handshakes.
- Payload Optimization: Keep test payloads under 2MB where possible. Large JSON structures increase serialization and network transit time.
- Backoff Strategy: Implement an exponential backoff strategy when encountering `429` errors.
- Asynchronous Calls: For high-volume testing, utilize the asynchronous methods provided in the SDK to handle concurrent requests efficiently without blocking the main execution thread.
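The backoff recommendation can be sketched as a small retry wrapper. Assumptions: `call` is any zero-argument function, and `RuntimeError` stands in for whatever exception the SDK raises on a 429; substitute the real exception type:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry `call` with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # substitute the SDK's rate-limit exception here
            # Delays grow as 0.5s, 1s, 2s, 4s, 8s, each with a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    raise RuntimeError("rate limit persisted after retries")
```

The jitter term spreads out retries from parallel workers so they do not re-synchronize and hit the limit together.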