Rate Limiting & Concurrency
To ensure the stability of the Supervised AI testing environment and provide equitable resource distribution, the testing-api enforces specific usage quotas and concurrency limits. Users should design their test suites to handle throttling gracefully.
Rate Limits
The API uses a sliding window algorithm to monitor request volume. Limits are applied based on your API Key or origin IP address.
| Environment | Rate Limit | Burst Capacity |
| :--- | :--- | :--- |
| Sandbox / Testing | 100 requests per minute | 20 requests |
| Staging | 500 requests per minute | 50 requests |
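To stay under the per-minute budget proactively rather than reacting to 429s, a client-side throttle can pace outgoing requests. The sketch below is a minimal token-bucket helper (not part of any testing-api SDK; the class name and defaults are illustrative), parameterized with the Sandbox tier's limits from the table above:

```python
import time

class TokenBucket:
    """Client-side throttle: stay under a requests-per-minute budget.

    Hypothetical helper, not provided by the testing-api. Defaults
    mirror the Sandbox tier (100 req/min, burst of 20).
    """

    def __init__(self, rate_per_minute=100, burst=20):
        self.capacity = burst              # burst capacity in tokens
        self.tokens = float(burst)         # start with a full bucket
        self.refill_per_sec = rate_per_minute / 60.0
        self.last = time.monotonic()

    def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity
            self.tokens = min(
                self.capacity,
                self.tokens + (now - self.last) * self.refill_per_sec,
            )
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to accrue
            time.sleep((1 - self.tokens) / self.refill_per_sec)
```

Calling `bucket.acquire()` before each request lets short bursts through immediately while smoothing sustained traffic to the configured rate.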
Rate Limit Headers
Every response from the API includes headers to help you track your current usage:
- `X-RateLimit-Limit`: The maximum number of requests allowed in the current window.
- `X-RateLimit-Remaining`: The number of requests remaining in the current window.
- `X-RateLimit-Reset`: The time (in UTC epoch seconds) when the current rate limit window resets.
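These headers can be read off any response to track remaining quota. A minimal sketch (the helper name is illustrative; it takes any dict-like headers object, such as `response.headers` from `requests`):

```python
def remaining_quota(headers):
    """Parse the rate-limit headers documented above into integers.

    Illustrative helper; header values arrive as strings and must be
    converted before arithmetic.
    """
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
    }
```

For example, pausing a test suite when `remaining` drops near zero avoids tripping the limit mid-run.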
Concurrency Limits
In addition to the total number of requests, the API limits the number of simultaneous active connections.
- Maximum Concurrent Requests: 5
- Behavior: If you exceed this limit, the server will queue the request for a short period (up to 5 seconds) before rejecting it.
Handling Throttling (HTTP 429)
When a rate limit is exceeded, the API returns an HTTP 429 Too Many Requests error. The response body will provide a retry suggestion:
```json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Please try again in 15 seconds.",
  "retry_after_seconds": 15
}
```
Example: Implementing Backoff in Python
When writing automated tests, it is recommended to implement an exponential backoff strategy to handle these limits:
```python
import time
import requests

def call_testing_api(endpoint):
    """GET an endpoint, retrying on HTTP 429 with exponential backoff."""
    for i in range(5):  # retry up to 5 times
        response = requests.get(endpoint)
        if response.status_code == 429:
            # Honor the Retry-After header when present;
            # otherwise back off 2^i seconds
            wait = int(response.headers.get("Retry-After", 2 ** i))
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Rate limit retries exhausted")
```
Best Practices for Testing
- Reuse Connections: Use persistent HTTP connections (Keep-Alive) to reduce overhead, but respect the concurrency cap.
- Stagger Test Suites: If running multiple CI/CD pipelines simultaneously, stagger the start times to avoid hitting the burst capacity.
- Mocking: For high-volume unit tests that do not require actual model output, consider mocking the `testing-api` responses locally to preserve your quota for integration testing.
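The mocking approach in the last bullet can be sketched with the standard library's `unittest.mock`. The payload shape and function names below are illustrative stand-ins, not the real testing-api response format:

```python
from unittest import mock

# Stand-in for the real HTTP helper; returns a canned payload so that
# high-volume unit tests spend no quota and generate no network traffic.
fake_api = mock.Mock(return_value={"error": None, "output": "stubbed"})

def run_unit_test(api_call):
    """Exercise application logic against an injected API callable."""
    result = api_call("/v1/echo")  # hypothetical endpoint path
    return result["output"]

print(run_unit_test(fake_api))
fake_api.assert_called_once_with("/v1/echo")
```

Injecting the API callable (rather than importing it directly inside the function under test) makes it trivial to swap the mock in for unit tests and the real client in for integration tests.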