Rate Limiting & Concurrency
To ensure the stability of the Supervised AI testing environment and provide equitable resource distribution, the testing-api enforces specific usage quotas and concurrency limits. Users should design their test suites to handle throttling gracefully.
Rate Limits
The API uses a sliding window algorithm to monitor request volume. Limits are applied based on your API Key or origin IP address.
| Environment | Rate Limit | Burst Capacity |
| :--- | :--- | :--- |
| Sandbox / Testing | 100 requests per minute | 20 requests |
| Staging | 500 requests per minute | 50 requests |
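To stay under the per-minute budget proactively rather than reacting to 429s, a client-side throttle can pace outgoing requests. The sketch below is a minimal token-bucket helper (not part of any testing-api SDK; the class name and defaults are illustrative), parameterized with the Sandbox tier's limits from the table above:

```python
import time

class TokenBucket:
    """Client-side throttle: stay under a requests-per-minute budget.

    Hypothetical helper, not provided by the testing-api. Defaults
    mirror the Sandbox tier (100 req/min, burst of 20).
    """

    def __init__(self, rate_per_minute=100, burst=20):
        self.capacity = burst              # burst capacity in tokens
        self.tokens = float(burst)         # start with a full bucket
        self.refill_per_sec = rate_per_minute / 60.0
        self.last = time.monotonic()

    def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity
            self.tokens = min(
                self.capacity,
                self.tokens + (now - self.last) * self.refill_per_sec,
            )
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to accrue
            time.sleep((1 - self.tokens) / self.refill_per_sec)
```

Calling `bucket.acquire()` before each request lets short bursts through immediately while smoothing sustained traffic to the configured rate.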
Rate Limit Headers
Every response from the API includes headers to help you track your current usage:
- `X-RateLimit-Limit`: The maximum number of requests allowed in the current window.
- `X-RateLimit-Remaining`: The number of requests remaining in the current window.
- `X-RateLimit-Reset`: The time (in UTC epoch seconds) when the current rate limit window resets.
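These headers can be read off any response to track remaining quota. A minimal sketch (the helper name is illustrative; it takes any dict-like headers object, such as `response.headers` from `requests`):

```python
def remaining_quota(headers):
    """Parse the rate-limit headers documented above into integers.

    Illustrative helper; header values arrive as strings and must be
    converted before arithmetic.
    """
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "reset_epoch": int(headers["X-RateLimit-Reset"]),
    }
```

For example, pausing a test suite when `remaining` drops near zero avoids tripping the limit mid-run.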
Concurrency Limits
In addition to the total number of requests, the API limits the number of simultaneous active connections.
- Maximum Concurrent Requests: 5
- Behavior: If you exceed this limit, the server will queue the request for a short period (up to 5 seconds) before rejecting it.
Handling Throttling (HTTP 429)
When a rate limit is exceeded, the API returns an HTTP 429 Too Many Requests error. The response body will provide a retry suggestion:
```json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Please try again in 15 seconds.",
  "retry_after_seconds": 15
}
```
Example: Implementing Backoff in Python
When writing automated tests, it is recommended to implement an exponential backoff strategy to handle these limits:
```python
import time
import requests

def call_testing_api(endpoint):
    """GET an endpoint, retrying on HTTP 429 with exponential backoff."""
    for i in range(5):  # retry up to 5 times
        response = requests.get(endpoint)
        if response.status_code == 429:
            # Honor the Retry-After header when present;
            # otherwise back off 2^i seconds
            wait = int(response.headers.get("Retry-After", 2 ** i))
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Rate limit retries exhausted")
```
Best Practices for Testing
- Reuse Connections: Use persistent HTTP connections (Keep-Alive) to reduce overhead, but respect the concurrency cap.
- Stagger Test Suites: If running multiple CI/CD pipelines simultaneously, stagger the start times to avoid hitting the burst capacity.
- Mocking: For high-volume unit tests that do not require actual model output, consider mocking the `testing-api` responses locally to preserve your quota for integration testing.
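The mocking approach in the last bullet can be sketched with the standard library's `unittest.mock`. The payload shape and function names below are illustrative stand-ins, not the real testing-api response format:

```python
from unittest import mock

# Stand-in for the real HTTP helper; returns a canned payload so that
# high-volume unit tests spend no quota and generate no network traffic.
fake_api = mock.Mock(return_value={"error": None, "output": "stubbed"})

def run_unit_test(api_call):
    """Exercise application logic against an injected API callable."""
    result = api_call("/v1/echo")  # hypothetical endpoint path
    return result["output"]

print(run_unit_test(fake_api))
fake_api.assert_called_once_with("/v1/echo")
```

Injecting the API callable (rather than importing it directly inside the function under test) makes it trivial to swap the mock in for unit tests and the real client in for integration tests.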