Logging & Observability

Monitoring and Logging

The testing-api emits structured logs, propagates request IDs, and exposes health endpoints so that you can monitor the Supervised AI platform and debug issues during the testing lifecycle. All logs follow a standardized format for compatibility with log aggregators and monitoring tools.

Log Format and Structure

By default, the API outputs logs in structured JSON format to stdout. This allows for easy parsing by tools like ELK Stack, Datadog, or CloudWatch.

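Each log entry typically contains the following fields:

timestamp: Time of the event in ISO 8601 format (UTC).
level: Severity of the entry (DEBUG, INFO, WARN, or ERROR).
request_id: ID that ties the entry to a single request.
method: HTTP method of the request.
path: Request path.
status: HTTP status code of the response.
latency_ms: Time taken to handle the request, in milliseconds.
message: Human-readable description of the event.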

Example Log Entry:

{
  "timestamp": "2023-10-27T10:15:30.452Z",
  "level": "INFO",
  "request_id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "method": "POST",
  "path": "/v1/test/execute",
  "status": 200,
  "latency_ms": 145.2,
  "message": "Test execution completed successfully"
}
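
Because each entry is a JSON object, filtering logs takes only a few lines of code. The sketch below is a minimal illustration, not part of the testing-api: it assumes the service writes one JSON object per line to stdout, and the helper names `parse_log_lines` and `slow_requests` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_log_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON log line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output (e.g. a startup banner) is ignored
        if isinstance(entry, dict):
            yield entry


def slow_requests(entries: Iterable[dict], threshold_ms: float) -> list[dict]:
    """Return the entries whose latency_ms exceeds threshold_ms."""
    return [e for e in entries if e.get("latency_ms", 0) > threshold_ms]


# Sample input modelled on the example entry above (assumed one entry per line).
sample = [
    '{"timestamp": "2023-10-27T10:15:30.452Z", "level": "INFO", '
    '"method": "POST", "path": "/v1/test/execute", "status": 200, '
    '"latency_ms": 145.2, "message": "Test execution completed successfully"}',
    "not a JSON line",
]

for entry in slow_requests(parse_log_lines(sample), threshold_ms=100):
    print(entry["path"], entry["latency_ms"])
```

In practice a log aggregator does this filtering for you; the script is mainly useful for quick checks against a local log file.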

Request Tracing

To track a request across multiple microservices within the Supervised AI platform, the testing-api utilizes the X-Request-ID header.

Incoming Requests: If you provide an X-Request-ID in your request headers, the API will use that ID in all associated logs.
Auto-generation: If no header is provided, the API automatically generates a UUID for the request.
Response Header: The X-Request-ID is always returned in the response headers, allowing you to map client-side errors directly to server-side logs.
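
The rules above can be sketched in a few lines. This is an illustration of the described behaviour under stated assumptions, not the testing-api's actual implementation: the function names are hypothetical, and the choice of a version 4 UUID is an assumption, since the text only says "UUID".

```python
import uuid
from typing import Mapping

REQUEST_ID_HEADER = "X-Request-ID"


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Return the caller-supplied X-Request-ID, or a fresh UUID if absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == REQUEST_ID_HEADER.lower() and value.strip():
            return value.strip()
    return str(uuid.uuid4())  # assumption: version 4 UUID


def with_request_id(headers: Mapping[str, str]) -> dict[str, str]:
    """Client-side helper: copy headers and ensure X-Request-ID is set."""
    request_id = resolve_request_id(headers)
    # Drop any differently-cased variant so the header appears only once.
    out = {
        name: value
        for name, value in headers.items()
        if name.lower() != REQUEST_ID_HEADER.lower()
    }
    out[REQUEST_ID_HEADER] = request_id
    return out
```

Setting the header on the client side, for example with `with_request_id({"Accept": "application/json"})`, means you know the ID before the request is sent and can log it alongside any client-side error.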

Health Check Endpoints

You can programmatically monitor the status of the testing-api using the following health endpoints. These are essential for liveness and readiness probes in containerized environments (like Kubernetes).

GET `/health`

Returns the general status of the API service.

Response (200 OK):

{
  "status": "UP",
  "version": "1.0.4",
  "uptime": "2d 4h 12m"
}

GET `/health/ready`

Checks if the API is ready to handle traffic, including connectivity to internal dependencies (databases, AI model workers).

Response (200 OK): API is ready.
Response (503 Service Unavailable): One or more downstream dependencies are failing.
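
Outside Kubernetes, a script can poll the readiness endpoint itself. The following is a minimal sketch using only the Python standard library; the function name `is_ready`, the default timeout, and the injectable `opener` parameter (there to make the function testable without a network) are assumptions for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_ready(base_url: str, timeout: float = 2.0,
             opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if GET /health/ready answers 200, False otherwise."""
    url = base_url.rstrip("/") + "/health/ready"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx/5xx responses, including 503.
        return False
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

A call such as `is_ready("http://localhost:8080")` (a hypothetical address) returns `False` both when the API reports 503 and when it cannot be reached at all, which is usually the right behaviour for a gate in a deployment script.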

Debugging Failed Requests

When a request fails (4xx or 5xx status codes), the API includes a detailed error object in the response body. Use the request_id from the error object, which matches the X-Request-ID response header, to locate the full stack trace in the platform logs.

Error Response Structure:

{
  "error": {
    "code": "VALIDATION_FAILED",
    "message": "The field 'test_suite_id' is required.",
    "request_id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
  }
}
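
Going from a failed response to the matching log lines can be sketched as follows. This is an illustrative helper rather than a tool shipped with the testing-api: the function names are hypothetical, and it assumes the logs are available as one JSON object per line.

```python
import json
from typing import Iterable, Optional


def request_id_from_error(body: str) -> Optional[str]:
    """Pull error.request_id out of an error response body, if present."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if isinstance(error, dict):
        return error.get("request_id")
    return None


def matching_log_lines(lines: Iterable[str], request_id: str) -> list[str]:
    """Return the raw log lines whose request_id equals request_id."""
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if isinstance(entry, dict) and entry.get("request_id") == request_id:
            matches.append(line)
    return matches
```

Most log aggregators offer the same lookup as a field query on request_id, so the script is mainly useful when you only have raw log output.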

Log Levels Configuration

You can adjust the verbosity of the logs by setting the LOG_LEVEL environment variable.

DEBUG: Detailed information, typically of interest only when diagnosing problems.
INFO: (Default) Confirmation that things are working as expected.
WARN: An indication that something unexpected happened (e.g., a retried network request).
ERROR: A more serious problem that prevented a specific function from completing.
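
If you run helper scripts alongside the API, they can honour the same variable. The sketch below shows one way to map LOG_LEVEL onto Python's standard logging levels. It is not the testing-api's own configuration code, and the fallback to INFO for unrecognised values and the case-insensitive matching are assumptions.

```python
import logging
import os
from typing import Mapping

# Names taken from the list above; WARN maps to Python's WARNING level.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
}


def level_from_env(env: Mapping[str, str] = os.environ) -> int:
    """Resolve LOG_LEVEL to a logging constant, defaulting to INFO."""
    name = env.get("LOG_LEVEL", "INFO").strip().upper()
    # Assumption: unknown values fall back to INFO instead of raising.
    return LEVELS.get(name, logging.INFO)


logging.basicConfig(level=level_from_env())
```

Passing the environment as a parameter keeps the function easy to test, for example `level_from_env({"LOG_LEVEL": "DEBUG"})`.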