# Performance Benchmarking
This section provides the performance baselines and throughput constraints for the `testing-api`. These metrics help you calibrate automated tests and integration pipelines to the capabilities of the Supervised AI platform's testing environment.
## Latency Expectations
The following table outlines the expected response times for standard API operations within the testing environment. These metrics assume a stable network connection and standard payload sizes.
| Operation Type | Average Latency (p50) | Tail Latency (p99) | Notes |
| :--- | :--- | :--- | :--- |
| Health Checks / Heartbeat | < 50ms | < 150ms | Basic connectivity verification. |
| Metadata Retrieval | 100ms - 200ms | < 400ms | Fetching test configurations or schemas. |
| Inference Mocking | 300ms - 800ms | < 1.5s | Simulated model execution latency. |
| Data Batch Submission | 500ms - 1.2s | < 2.5s | Processing payloads up to 5MB. |
> [!NOTE]
> Latency may vary based on the complexity of the test suite and the geographic region of the requester.
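The p99 figures above can double as assertion budgets in automated tests. A minimal sketch, assuming you record latencies yourself; the budget values mirror the table, and `check_latency` is a hypothetical helper, not part of the SDK:

```python
# p99 latency budgets in milliseconds, taken from the table above.
P99_BUDGET_MS = {
    "health_check": 150,
    "metadata_retrieval": 400,
    "inference_mock": 1500,
    "batch_submission": 2500,
}

def check_latency(operation: str, measured_ms: float) -> bool:
    """Return True if a measured latency stays within the p99 budget."""
    return measured_ms <= P99_BUDGET_MS[operation]

print(check_latency("inference_mock", 620.0))  # True: within the 1.5s budget
print(check_latency("health_check", 210.0))    # False: exceeds the 150ms budget
```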
## Throughput Limits

To ensure stability for all users, the testing environment enforces rate limits. Exceeding these limits will result in `429 Too Many Requests` responses.
- Rate Limit: 50 requests per second (RPS) per API Key.
- Burst Capacity: Up to 100 requests in a 2-second window.
- Concurrent Connections: Maximum 20 simultaneous persistent connections.
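A client-side token bucket tuned to these limits (50 RPS steady-state, 100-request burst) can keep a test harness under the threshold before the server ever returns a 429. A minimal sketch; the class and its structure are illustrative, not part of the SDK:

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; each request costs one token."""

    def __init__(self, rate: float = 50.0, capacity: float = 100.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def acquire(self) -> bool:
        """Take a token if available; return False when the caller should wait."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket()
granted = sum(bucket.acquire() for _ in range(150))
print(f"Granted {granted} of 150 immediate requests")  # roughly the 100-request burst
```

When `acquire()` returns `False`, sleep briefly and retry rather than sending the request and consuming server-side quota on a guaranteed 429.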
### Handling Rate Limits
The API returns standard headers to help you manage your request frequency:
```
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1625097600
```
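These headers can drive a simple pacing decision: proceed while quota remains, otherwise sleep until the reset timestamp. A sketch under the assumption that the headers arrive as a plain dict; `wait_if_exhausted` is an illustrative helper, not an SDK function:

```python
import time

def wait_if_exhausted(headers: dict) -> float:
    """Seconds to pause before the next request, based on the rate-limit headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left; no need to pause
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix epoch seconds
    return max(0.0, reset_at - time.time())

# Plenty of quota remaining, so no pause is needed:
print(wait_if_exhausted({"X-RateLimit-Remaining": "42",
                         "X-RateLimit-Reset": "1625097600"}))  # 0.0
```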
## Benchmarking Usage Example
You can use the following pattern to measure the latency of your integration within your test suite. This ensures your local environment aligns with the platform's performance expectations.
```python
import time

from testing_api import Client

client = Client(api_key="your_test_key")

def benchmark_inference():
    start_time = time.perf_counter()
    # Trigger a standard test inference
    response = client.test_inference(payload={"input": "sample_data"})
    end_time = time.perf_counter()

    latency_ms = (end_time - start_time) * 1000
    print(f"Status Code: {response.status_code}")
    print(f"Latency: {latency_ms:.2f} ms")

if __name__ == "__main__":
    benchmark_inference()
```
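A single measurement is noisy; to compare against the p50/p99 targets in the table, aggregate many runs. A sketch using only the standard library; `run_once` stands in for a timed call to `client.test_inference` and is simulated here with random latencies in the inference-mock band:

```python
import random
import statistics

def run_once() -> float:
    """Placeholder for one timed API call; returns a simulated latency in ms."""
    return random.uniform(300, 800)  # mimics the 300ms - 800ms inference band

random.seed(0)  # deterministic for illustration only
samples = sorted(run_once() for _ in range(200))
p50 = statistics.median(samples)
p99 = statistics.quantiles(samples, n=100)[98]  # 99th percentile
print(f"p50: {p50:.0f} ms, p99: {p99:.0f} ms")
```

In a real suite, replace `run_once` with the timed `benchmark_inference` body above and assert the aggregated p50/p99 against the table's budgets.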
## Performance Best Practices

To optimize your interaction with the `testing-api` and avoid hitting throughput bottlenecks:
- Connection Pooling: Reuse TCP connections via the `Client` object to reduce overhead from repeated handshakes.
- Payload Optimization: Keep test payloads under 2MB where possible. Large JSON structures increase serialization and network transit time.
- Backoff Strategy: Implement an exponential backoff strategy when encountering `429` errors.
- Asynchronous Calls: For high-volume testing, utilize the asynchronous methods provided in the SDK to handle concurrent requests efficiently without blocking the main execution thread.
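The backoff recommendation can be sketched as a small retry wrapper. Assumptions: `call` is any zero-argument function, and `RuntimeError` stands in for whatever exception the SDK raises on a 429; substitute the real exception type:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry `call` with exponential backoff plus jitter on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # substitute the SDK's rate-limit exception here
            # Delays grow as 0.5s, 1s, 2s, 4s, 8s, each with a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    raise RuntimeError("rate limit persisted after retries")
```

The jitter term spreads out retries from parallel workers so they do not re-synchronize and hit the limit together.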