Mock Data Generation
Overview
The testing-api provides utilities that generate synthetic data mirroring the Supervised AI platform's data structures. With these tools, developers can simulate edge cases, run load tests, and validate UI components without relying on production databases or manual data entry.
Mock Data Factory
The primary interface for creating synthetic records is the MockFactory class. It generates single entities or large batches from predefined schemas that match the platform's core entities.
Basic Usage
To generate a single mock object, use the create() method. To generate multiple records, use the create_batch() method.
```python
from testing_api import MockFactory

# Generate a single mock user profile
user = MockFactory.create(schema="user")

# Generate 50 mock dataset entries
datasets = MockFactory.create_batch(schema="dataset", count=50)
```
Supported Data Schemas
The generator supports several specific schemas relevant to the Supervised AI ecosystem:
| Schema Name | Description | Key Fields Included |
| :--- | :--- | :--- |
| user | Platform user profile | uuid, email, role, api_key |
| dataset | Metadata for an AI training set | id, name, sample_count, created_at |
| model_output | Simulated inference results | prediction_id, confidence_score, label |
| audit_log | System activity logs | timestamp, action_type, user_id |
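When asserting on generated records in a test, it can help to check that the key fields from the table are present. The helper below is a minimal sketch and not part of the testing-api: `EXPECTED_FIELDS` and `missing_fields` are hypothetical names, and the sample record is written by hand rather than produced by MockFactory.

```python
# Key fields per schema, copied from the table above. The table lists "key"
# fields only, so real records may carry additional ones.
EXPECTED_FIELDS = {
    "user": {"uuid", "email", "role", "api_key"},
    "dataset": {"id", "name", "sample_count", "created_at"},
    "model_output": {"prediction_id", "confidence_score", "label"},
    "audit_log": {"timestamp", "action_type", "user_id"},
}


def missing_fields(schema: str, record: dict) -> set:
    """Return the key fields for `schema` that are absent from `record`."""
    if schema not in EXPECTED_FIELDS:
        raise ValueError(f"unknown schema: {schema!r}")
    return EXPECTED_FIELDS[schema] - set(record)


# Hand-written stand-in for a generated record; the values are illustrative.
sample = {"prediction_id": "p-001", "confidence_score": 0.91}
print(missing_fields("model_output", sample))  # {'label'}
```

In a real test, the `sample` dictionary would be replaced by the result of `MockFactory.create(schema="model_output")`.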
API Reference
MockFactory.create(schema, overrides=None)
Generates a single dictionary representing a platform entity.
- Parameters:
  - schema (string): The identifier for the data structure (e.g., "user").
  - overrides (dict, optional): Specific key-value pairs to overwrite default generated values.
- Returns:
  - dict: A populated data object.
MockFactory.create_batch(schema, count, seed=None)
Generates a list of mock entities.
- Parameters:
  - schema (string): The identifier for the data structure.
  - count (int): Number of records to generate.
  - seed (int, optional): A seed value to ensure deterministic, reproducible output for automated testing.
- Returns:
  - list[dict]: A list of populated data objects.
Deterministic Data Generation
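For regression testing, it is often necessary to generate the same "random" data across multiple test runs. Pass the seed argument to create_batch() to get identical output on every run. Per the API reference above, seed is a parameter of create_batch(); create() does not list one.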
```python
# This will always produce the same set of data
stable_data = MockFactory.create_batch(
    schema="model_output",
    count=10,
    seed=42
)
```
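To illustrate why a seed makes output reproducible, here is a minimal stand-in built on Python's standard `random.Random`. It is a sketch under assumptions, not the MockFactory implementation: `fake_model_outputs`, the `LABELS` list, and the value formats are all hypothetical.

```python
import random
from typing import Optional

LABELS = ["cat", "dog", "bird"]  # hypothetical label set, for illustration only


def fake_model_outputs(count: int, seed: Optional[int] = None) -> list:
    """Build `count` model_output-like dicts from a private, seedable generator."""
    # A dedicated Random instance keeps the global random state untouched,
    # so other tests in the same process are not affected by the seed.
    rng = random.Random(seed)
    return [
        {
            "prediction_id": f"pred-{rng.getrandbits(32):08x}",
            "confidence_score": round(rng.random(), 4),
            "label": rng.choice(LABELS),
        }
        for _ in range(count)
    ]


first = fake_model_outputs(10, seed=42)
second = fake_model_outputs(10, seed=42)
assert first == second  # same seed, same records
```

Reproducibility in this sketch depends on drawing values in the same order each time; changing the field order or the record count changes every value that follows.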
Customizing Mock Data
If the values generated by default for a schema do not cover a specific test scenario, use the overrides parameter of create() to pin specific fields while the rest of the object stays randomized.
```python
# Create a user with the "admin" role
admin_user = MockFactory.create(
    schema="user",
    overrides={"role": "admin", "is_verified": True}
)
```
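One plausible reading of the overrides behavior is a shallow dictionary merge in which override values win. The sketch below shows that reading only; `apply_overrides` is a hypothetical helper, the default values are made up, and the library may handle nested values or unknown keys differently.

```python
from typing import Optional


def apply_overrides(defaults: dict, overrides: Optional[dict] = None) -> dict:
    """Return a new dict in which `overrides` take precedence over `defaults`."""
    # Shallow merge: later keys win, and `defaults` itself is left unmodified.
    return {**defaults, **(overrides or {})}


# Hand-written stand-in for randomly generated defaults.
generated = {
    "uuid": "00000000-0000-0000-0000-000000000000",
    "email": "user@example.com",
    "role": "viewer",
    "api_key": "key-123",
}

admin_user = apply_overrides(generated, {"role": "admin", "is_verified": True})
print(admin_user["role"])         # admin
print(admin_user["is_verified"])  # True
print(generated["role"])          # viewer
```

Under this reading, a key that is absent from the generated defaults, such as `is_verified` here, is simply added to the result.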
Internal Providers
While MockFactory is the public entry point, it uses internal Data Providers to map fields to specific data types (for example, standardizing how UUIDs and timestamps are formatted across the API). Users generally do not need to interact with these providers directly, because they are selected through the schema argument.
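The provider idea can be pictured as a registry that maps field names to small generator functions, so every field of the same kind is formatted the same way. The following is a rough sketch of that pattern, not the internal implementation: `PROVIDERS`, `build_record`, and the chosen formats (UUID version 4 strings, ISO 8601 UTC timestamps) are assumptions.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, List


def uuid_provider() -> str:
    """Format every identifier as a canonical UUID version 4 string."""
    return str(uuid.uuid4())


def timestamp_provider() -> str:
    """Format every timestamp as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


# Hypothetical registry: fields of the same kind share one provider, which
# keeps their formatting consistent across schemas.
PROVIDERS: Dict[str, Callable[[], str]] = {
    "uuid": uuid_provider,
    "user_id": uuid_provider,
    "timestamp": timestamp_provider,
    "created_at": timestamp_provider,
}


def build_record(fields: List[str]) -> dict:
    """Generate one value per requested field using its registered provider."""
    return {name: PROVIDERS[name]() for name in fields}


record = build_record(["user_id", "timestamp"])
print(record)
```

Centralizing formats this way means a change to how timestamps are rendered is made in one function rather than in every schema.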