Core Data Objects
Overview
The testing-api uses a set of standardized data objects to ensure consistency across the Supervised AI platform. These objects define how test data is structured, how models are evaluated, and how results are reported.
TestSet
The TestSet is the primary container for evaluation data. It bundles individual test cases with metadata the platform uses to identify the version and purpose of the evaluation.
| Property | Type | Description |
| :--- | :--- | :--- |
| id | string | A unique identifier for the test set. |
| name | string | A human-readable name for the dataset. |
| version | string | Semantic versioning string (e.g., 1.0.2). |
| cases | Array<TestCase> | A list of individual test instances. |
Usage Example:
```json
{
  "id": "ts_98765",
  "name": "Customer Intent Classification - Production",
  "version": "2.1.0",
  "cases": [...]
}
```
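The TestSet shape above can be modeled as TypeScript interfaces. This is a sketch based on the property table; the official platform typings may differ, and the TestCase shape is detailed in the next section.

```typescript
// Sketch of the TestCase shape (detailed in the next section).
interface TestCase {
  caseId: string;
  input: object;
  expectedOutput: unknown;
  metadata?: Record<string, unknown>;
}

// Sketch of the TestSet shape from the table above.
interface TestSet {
  id: string;
  name: string;   // human-readable dataset name
  version: string; // semantic version string, e.g. "2.1.0"
  cases: TestCase[];
}

const testSet: TestSet = {
  id: "ts_98765",
  name: "Customer Intent Classification - Production",
  version: "2.1.0",
  cases: [],
};
```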
TestCase
The TestCase represents a single unit of evaluation. It contains the input payload sent to the AI model and the ground truth used for validation.
| Property | Type | Description |
| :--- | :--- | :--- |
| caseId | string | Unique ID for the specific test case. |
| input | Object | The data object sent to the model (e.g., prompt, image URL, or features). |
| expectedOutput | any | The target ground truth used to calculate accuracy/performance. |
| metadata | Record<string, any> | (Optional) Key-value pairs for filtering results (e.g., difficulty: "high"). |
Usage Example:
```json
{
  "caseId": "case_001",
  "input": {
    "text": "How do I reset my password?"
  },
  "expectedOutput": "account_security_intent",
  "metadata": {
    "category": "security",
    "priority": 1
  }
}
```
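Because metadata is a free-form record, it can drive result filtering as the table suggests. The helper below is a hypothetical illustration of that pattern, not a platform API:

```typescript
interface TestCase {
  caseId: string;
  input: object;
  expectedOutput: unknown;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: keep only cases whose metadata matches every
// key/value pair in the given criteria.
function filterCases(
  cases: TestCase[],
  criteria: Record<string, unknown>
): TestCase[] {
  return cases.filter((c) =>
    Object.entries(criteria).every(([key, value]) => c.metadata?.[key] === value)
  );
}

const cases: TestCase[] = [
  {
    caseId: "case_001",
    input: { text: "How do I reset my password?" },
    expectedOutput: "account_security_intent",
    metadata: { category: "security", priority: 1 },
  },
  {
    caseId: "case_002",
    input: { text: "Where is my order?" },
    expectedOutput: "order_status_intent",
    metadata: { category: "orders", priority: 2 },
  },
];

const securityCases = filterCases(cases, { category: "security" });
// securityCases contains only case_001
```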
ModelResponse
When a model is invoked via the testing API, it must return a ModelResponse. This object normalizes the output of various AI models into a format the evaluation engine can parse.
| Property | Type | Description |
| :--- | :--- | :--- |
| rawOutput | any | The direct response from the model provider. |
| parsedOutput | string \| number | The extracted value used for comparison against expectedOutput. |
| latencyMs | number | Time taken for the model to respond in milliseconds. |
| tokensUsed | number | (Optional) Total tokens consumed for LLM-based evaluations. |
Usage Example:
```typescript
const response: ModelResponse = {
  rawOutput: { choices: [{ message: "The user wants to reset password" }] },
  parsedOutput: "account_security_intent",
  latencyMs: 145,
  tokensUsed: 42
};
```
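An adapter that times a model call and normalizes its output into a ModelResponse might look like the sketch below. The adapter function, the `parse` callback, and the `usage.total_tokens` field on the raw response are illustrative assumptions, not part of the documented API:

```typescript
interface ModelResponse {
  rawOutput: any;
  parsedOutput: string | number;
  latencyMs: number;
  tokensUsed?: number;
}

// Hypothetical adapter: invokes a model, measures latency, and normalizes
// the provider-specific response into a ModelResponse.
async function invokeAndNormalize(
  callModel: () => Promise<any>,
  parse: (raw: any) => string | number
): Promise<ModelResponse> {
  const start = Date.now();
  const raw = await callModel();
  const latencyMs = Date.now() - start;
  return {
    rawOutput: raw,
    parsedOutput: parse(raw),
    latencyMs,
    tokensUsed: raw?.usage?.total_tokens, // undefined for non-LLM providers
  };
}
```

Keeping `rawOutput` alongside `parsedOutput` lets the evaluation engine compare a single extracted value while preserving the full provider response for debugging.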
EvaluationResult
After running a TestCase, the API generates an EvaluationResult. This object records the success or failure of the test based on the configured metrics.
| Property | Type | Description |
| :--- | :--- | :--- |
| caseId | string | Reference to the original test case. |
| status | enum | Result status: PASSED, FAILED, or ERROR. |
| score | number | A normalized score between 0.0 and 1.0. |
| reason | string | (Optional) Explanation for failures or discrepancy details. |
Usage Example:
```json
{
  "caseId": "case_001",
  "status": "PASSED",
  "score": 1.0,
  "reason": "Exact match found between parsed output and expected intent."
}
```
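The relationship between score, threshold, and status can be sketched as follows. The helper and its at-or-above-threshold semantics are assumptions for illustration; the platform's actual pass/fail rule may differ:

```typescript
type Status = "PASSED" | "FAILED" | "ERROR";

interface EvaluationResult {
  caseId: string;
  status: Status;
  score: number; // normalized to [0.0, 1.0]
  reason?: string;
}

// Hypothetical helper: map a computed score to a result, treating scores
// at or above the threshold as PASSED (the >= semantics are an assumption).
function toResult(
  caseId: string,
  score: number,
  threshold: number
): EvaluationResult {
  const passed = score >= threshold;
  return {
    caseId,
    status: passed ? "PASSED" : "FAILED",
    score,
    reason: passed ? undefined : `Score ${score} below threshold ${threshold}`,
  };
}
```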
MetricConfiguration
The MetricConfiguration object allows users to define how the API should calculate "correctness" for a TestSet.
| Property | Type | Description |
| :--- | :--- | :--- |
| metricType | string | The algorithm to use (e.g., EXACT_MATCH, FUZZY_MATCH, COSINE_SIMILARITY). |
| threshold | number | The minimum score required for a PASSED status. |
| caseSensitive | boolean | (Optional) Whether string comparisons are case-sensitive. Defaults to false. |
Usage Example:
```json
{
  "metricType": "COSINE_SIMILARITY",
  "threshold": 0.85,
  "caseSensitive": false
}
```
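For the simplest metric type, EXACT_MATCH, the caseSensitive flag might be applied as in the sketch below. This scorer is an illustration of the configuration's intent, not the platform's implementation:

```typescript
interface MetricConfiguration {
  metricType: string;
  threshold: number;
  caseSensitive?: boolean; // defaults to false
}

// Hypothetical scorer for EXACT_MATCH: returns 1.0 on a match, 0.0 otherwise.
// When caseSensitive is false (or omitted), comparison ignores case.
function exactMatchScore(
  parsed: string,
  expected: string,
  config: MetricConfiguration
): number {
  const a = config.caseSensitive ? parsed : parsed.toLowerCase();
  const b = config.caseSensitive ? expected : expected.toLowerCase();
  return a === b ? 1.0 : 0.0;
}

const config: MetricConfiguration = { metricType: "EXACT_MATCH", threshold: 1.0 };
const score = exactMatchScore(
  "Account_Security_Intent",
  "account_security_intent",
  config
);
// score is 1.0 because case is ignored by default
```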