Object Definitions & Types
Core Object Definitions
The testing-api utilizes a set of standardized objects to ensure consistency across the Supervised AI platform. These objects represent the core entities involved in the lifecycle of an AI test—from dataset definition to performance evaluation.
TestSet
A TestSet is a curated collection of TestCase objects. It serves as the primary container for benchmarking a specific version of a model or agent.
| Field | Type | Description |
| :--- | :--- | :--- |
| id | string (UUID) | Unique identifier for the TestSet. |
| name | string | A human-readable label for the test collection. |
| version | string | Semantic versioning string (e.g., "1.0.4"). |
| metadata | object | Key-value pairs for custom categorization (e.g., environment: "staging"). |
| cases | array[TestCase] | List of individual test instances. |
Example Usage:
```json
{
  "id": "ts_f47ac10b-58cc",
  "name": "Production Sentiment Benchmarks",
  "version": "2.1.0",
  "metadata": {
    "model_type": "LLM",
    "priority": "high"
  }
}
```
TestCase
A TestCase represents a single unit of evaluation. It defines the input provided to the AI and the expected ground truth or constraints for the output.
| Field | Type | Description |
| :--- | :--- | :--- |
| case_id | string | Unique identifier within the TestSet. |
| input_data | object | The payload sent to the model (e.g., a prompt or feature vector). |
| expected_output | any | The gold-standard answer or target result. |
| weight | float | Relative importance of this test case (default: 1.0). |
Example Usage:
```json
{
  "case_id": "case_001",
  "input_data": {
    "prompt": "Summarize the following text: [Text Content]"
  },
  "expected_output": "This text discusses the impact of AI on writing.",
  "weight": 1.5
}
```
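Client code often assembles these payloads programmatically. The sketch below builds a TestSet containing one TestCase using Python dataclasses; the class and field names mirror the tables above, but the helper itself (and the `id` format, modeled on the example) is an illustrative assumption rather than an official SDK:

```python
from dataclasses import dataclass, field, asdict
from typing import Any
import uuid

@dataclass
class TestCase:
    case_id: str
    input_data: dict
    expected_output: Any
    weight: float = 1.0  # relative importance; defaults to 1.0 per the field table

@dataclass
class TestSet:
    name: str
    version: str
    metadata: dict = field(default_factory=dict)
    cases: list = field(default_factory=list)
    # illustrative id format, loosely modeled on the "ts_..." example above
    id: str = field(default_factory=lambda: "ts_" + uuid.uuid4().hex[:12])

suite = TestSet(
    name="Production Sentiment Benchmarks",
    version="2.1.0",
    metadata={"model_type": "LLM", "priority": "high"},
)
suite.cases.append(TestCase(
    case_id="case_001",
    input_data={"prompt": "Summarize the following text: [Text Content]"},
    expected_output="This text discusses the impact of AI on writing.",
    weight=1.5,
))

# asdict() recurses into nested dataclasses, yielding a JSON-serializable dict
payload = asdict(suite)
```

`payload` can then be serialized with `json.dumps` and sent to whatever endpoint accepts TestSet submissions.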
EvaluationResult
This object is returned after a model processes a TestCase. It contains the actual model response and the platform's calculated performance metrics.
| Field | Type | Description |
| :--- | :--- | :--- |
| case_id | string | Reference to the original TestCase. |
| actual_output | any | The raw response generated by the model. |
| status | enum | The result status: passed, failed, or error. |
| score | float | Normalized similarity or accuracy score (0.0 to 1.0). |
| latency_ms | integer | Time taken for the model to respond in milliseconds. |
Example Usage:
```json
{
  "case_id": "case_001",
  "actual_output": "The text explores how AI affects the writing process.",
  "status": "passed",
  "score": 0.92,
  "latency_ms": 450
}
```
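Since each TestCase carries a `weight`, a natural way to roll EvaluationResult records up into a suite-level number is a weighted mean of the scores. The aggregation below is a minimal sketch of that idea, not a documented platform formula:

```python
def aggregate_results(results, weights):
    """Summarize EvaluationResult records.

    `results` is a list of EvaluationResult dicts; `weights` maps
    case_id -> TestCase.weight (1.0 when a case_id is absent).
    """
    total_w = sum(weights.get(r["case_id"], 1.0) for r in results)
    weighted_score = sum(
        r["score"] * weights.get(r["case_id"], 1.0) for r in results
    ) / total_w
    pass_rate = sum(r["status"] == "passed" for r in results) / len(results)
    return {"weighted_score": weighted_score, "pass_rate": pass_rate}

results = [
    {"case_id": "case_001", "status": "passed", "score": 0.92, "latency_ms": 450},
    {"case_id": "case_002", "status": "failed", "score": 0.40, "latency_ms": 610},
]
summary = aggregate_results(results, {"case_001": 1.5})
# weighted_score = (0.92 * 1.5 + 0.40 * 1.0) / 2.5 = 0.712
```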
Data Types & Constraints
The following standard types are used across all API endpoints to ensure data integrity.
MetricTypes
When configuring evaluations, you must specify the metric type used for scoring the actual_output against the expected_output.
- EXACT_MATCH: Binary comparison (1.0 for identical, 0.0 otherwise).
- FUZZY_MATCH: String similarity scoring (Levenshtein distance).
- SEMANTIC_SIMILARITY: Vector-based similarity using embeddings.
- REGEX_VALIDATION: Validates output against a provided pattern.
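The three string-based metrics can be sketched locally as follows. This is an illustration of the scoring semantics, not the platform's actual implementation; in particular, normalizing Levenshtein distance by the longer string's length is an assumption, and SEMANTIC_SIMILARITY is omitted because it requires an embedding model:

```python
import re

def exact_match(expected: str, actual: str) -> float:
    # EXACT_MATCH: binary comparison
    return 1.0 if expected == actual else 0.0

def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(expected: str, actual: str) -> float:
    # FUZZY_MATCH: edit distance normalized to [0.0, 1.0] (assumed normalization)
    if not expected and not actual:
        return 1.0
    return 1.0 - levenshtein(expected, actual) / max(len(expected), len(actual))

def regex_validation(pattern: str, actual: str) -> float:
    # REGEX_VALIDATION: pass/fail against a provided pattern
    return 1.0 if re.fullmatch(pattern, actual) else 0.0
```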
ModelConfiguration
Used to define the parameters of the model being tested.
| Field | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| model_name | string | N/A | The specific model identifier (e.g., "gpt-4"). |
| temperature | float | 0.7 | Controls randomness in the output. |
| max_tokens | integer | 512 | The maximum length of the generated response. |
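An illustrative ModelConfiguration payload (the values here are examples; only model_name has no default, so it is the only field you must supply):

```json
{
  "model_name": "gpt-4",
  "temperature": 0.2,
  "max_tokens": 1024
}
```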
Internal Utility Objects
While the following objects are primarily used internally for platform orchestration, they may appear in debug logs or advanced configuration headers.
TraceContext (Internal)
Used to track a single request across the Supervised AI microservices architecture.
- Role: Facilitates distributed tracing.
- Usage: Should not be modified by the user; however, it can be passed in headers for correlation in enterprise support tickets.
ValidationSchema (Internal)
An internal JSON Schema used to validate incoming TestSet payloads before they are persisted to the database. If a request returns a 422 Unprocessable Entity error, the response will reference the ValidationSchema violation.
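A 422 response might resemble the sketch below. The field names (`error`, `violations`, `path`, `message`) are illustrative assumptions for orientation only, not a documented response contract:

```json
{
  "error": "Unprocessable Entity",
  "status": 422,
  "violations": [
    {
      "path": "cases[0].weight",
      "message": "expected a number, got a string"
    }
  ]
}
```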