# Data Models

## Core Entity Models
The testing-api uses a structured set of data models to ensure consistency across the Supervised AI platform. These entities define how tests are configured, how datasets are structured, and how evaluation results are reported.
### Evaluation
The Evaluation model is the primary object representing a testing session. It encapsulates the configuration, the target model details, and the dataset being used for the assessment.
| Property | Type | Description |
| :--- | :--- | :--- |
| id | string | Unique identifier for the evaluation. |
| name | string | Human-readable name for the test run. |
| status | enum | Current state: PENDING, RUNNING, COMPLETED, FAILED. |
| config | EvaluationConfig | Parameters governing the execution of the test. |
| createdAt | timestamp | ISO 8601 timestamp of creation. |
```json
{
  "id": "eval_882910",
  "name": "LLM-Sentiment-Benchmark-v1",
  "status": "COMPLETED",
  "config": {
    "threshold": 0.85,
    "retryCount": 3
  }
}
```
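The table above can be sketched as a TypeScript interface. This is a hypothetical sketch, not a published SDK type: the `EvaluationConfig` fields are inferred from the JSON example, and the `isFinished` helper is purely illustrative.

```typescript
// Hypothetical sketch of the Evaluation model; field names mirror the
// table above, and EvaluationConfig fields come from the JSON example.
type EvaluationStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED";

interface EvaluationConfig {
  threshold: number;  // success criterion, e.g. 0.85
  retryCount: number; // retries per test case
}

interface Evaluation {
  id: string;
  name: string;
  status: EvaluationStatus;
  config: EvaluationConfig;
  createdAt: string; // ISO 8601 timestamp
}

// Illustrative helper: has the evaluation reached a terminal state?
function isFinished(e: Evaluation): boolean {
  return e.status === "COMPLETED" || e.status === "FAILED";
}
```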
### Dataset
Datasets represent the collection of inputs and expected outputs (ground truth) used to validate model performance.
| Property | Type | Description |
| :--- | :--- | :--- |
| datasetId | string | Reference to the source dataset. |
| version | string | Semantic version of the dataset used. |
| entries | Array<DataEntry> | The actual test cases (inputs and expected labels). |
**DataEntry Structure:**

```typescript
interface DataEntry {
  id: string;
  input: Record<string, any>;       // The prompt or raw data
  expectedOutput?: any;             // Ground truth for comparison
  metadata?: Record<string, any>;   // Optional context (tags, difficulty, etc.)
}
```
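As an illustration of how these shapes fit together, a dataset's entries can be indexed by `id` so that results can later be joined back to their source cases. This is a minimal sketch; the `Dataset` interface simply mirrors the table above, and `indexEntries` is a hypothetical helper.

```typescript
interface DataEntry {
  id: string;
  input: Record<string, any>;
  expectedOutput?: any;
  metadata?: Record<string, any>;
}

interface Dataset {
  datasetId: string;
  version: string;
  entries: DataEntry[];
}

// Build an id -> entry index so a TestResult can be matched back
// to its source entry via testCaseId.
function indexEntries(ds: Dataset): Map<string, DataEntry> {
  return new Map(ds.entries.map((e) => [e.id, e]));
}
```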
### ModelConfiguration
This model defines the interface for the AI model being tested. It specifies the endpoint, parameters, and authentication required to invoke the model during the evaluation process.
| Property | Type | Description |
| :--- | :--- | :--- |
| provider | string | The hosting provider (e.g., openai, anthropic, custom). |
| modelName | string | The specific model identifier. |
| parameters | object | Hyperparameters like temperature, max_tokens, or top_p. |
| endpointUrl | string | (Optional) Custom API endpoint for private deployments. |
```json
{
  "provider": "custom",
  "modelName": "supervised-ai-finetune-v2",
  "parameters": {
    "temperature": 0.2,
    "max_tokens": 512
  }
}
```
### TestResult

A TestResult object is generated for every individual DataEntry processed during an evaluation. It records the model's actual response so it can be compared against the expected output.
| Property | Type | Description |
| :--- | :--- | :--- |
| testCaseId | string | Reference to the input DataEntry. |
| actualOutput | any | The raw response received from the model. |
| scores | Map<string, number> | Calculated metrics (e.g., accuracy, latency, f1_score). |
| error | string | (Optional) Error message if the specific test case failed. |
```json
{
  "testCaseId": "case_001",
  "actualOutput": "The sentiment is positive.",
  "scores": {
    "exact_match": 1.0,
    "semantic_similarity": 0.98,
    "latency_ms": 145
  }
}
```
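For illustration, a per-case pass check might compare one of the reported scores against the evaluation's configured threshold. This is a hedged sketch: which metric drives the pass decision is an assumption, not something the API specifies, and `casePassed` is a hypothetical helper.

```typescript
interface TestResult {
  testCaseId: string;
  actualOutput: any;
  scores: Record<string, number>; // serialized as a JSON object
  error?: string;
}

// Assumption: a case passes when it produced no error and the chosen
// metric meets or exceeds the threshold from EvaluationConfig.
function casePassed(r: TestResult, metric: string, threshold: number): boolean {
  if (r.error !== undefined) return false;
  const score = r.scores[metric];
  return score !== undefined && score >= threshold;
}
```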
### EvaluationSummary

The EvaluationSummary is a high-level aggregation of results produced once an evaluation finishes. It is typically used for dashboarding and reporting within the Supervised AI platform.
| Property | Type | Description |
| :--- | :--- | :--- |
| totalCases | number | Total number of test cases processed. |
| passedCases | number | Number of cases meeting the success criteria. |
| aggregateMetrics | object | Mean, median, and percentiles for all scores. |
| duration | number | Total time taken for the evaluation in milliseconds. |
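Putting the pieces together, a summary could be derived from a list of TestResults roughly as follows. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, "passed" is assumed to mean a single metric meeting the threshold, and only the mean is computed here for brevity (the real `aggregateMetrics` also carries medians and percentiles).

```typescript
interface TestResult {
  testCaseId: string;
  actualOutput: any;
  scores: Record<string, number>;
  error?: string;
}

interface EvaluationSummary {
  totalCases: number;
  passedCases: number;
  aggregateMetrics: Record<string, { mean: number }>;
  duration: number; // milliseconds
}

// Hypothetical aggregation: count passes against a single metric and
// threshold, and compute the mean of every reported score key.
function summarize(
  results: TestResult[],
  metric: string,
  threshold: number,
  duration: number
): EvaluationSummary {
  const passedCases = results.filter(
    (r) => r.error === undefined && (r.scores[metric] ?? -Infinity) >= threshold
  ).length;

  const sums: Record<string, { total: number; n: number }> = {};
  for (const r of results) {
    for (const [k, v] of Object.entries(r.scores)) {
      if (!sums[k]) sums[k] = { total: 0, n: 0 };
      sums[k].total += v;
      sums[k].n += 1;
    }
  }
  const aggregateMetrics: Record<string, { mean: number }> = {};
  for (const [k, s] of Object.entries(sums)) {
    aggregateMetrics[k] = { mean: s.total / s.n };
  }

  return { totalCases: results.length, passedCases, aggregateMetrics, duration };
}
```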