The Analytics Store¶
This notebook introduces the analytics store — a structured storage layer that makes capability run results queryable via SQL.
We cover why the store exists, how to use and extend it, and the on-disk file layout.
In local synchronous execution, compute/write/read all happen in one process. In distributed job submission, those responsibilities split across client and worker processes, so the client must explicitly pass analytics-store configuration (configure_job_backend(..., analytics_store=...)) to tell workers where to write durable data that the client will later read.
See Analytics store in distributed execution and Job backend configuration.
Part 1: Why¶
The problem¶
When capabilities run, they produce rich Python objects — numpy arrays, nested dicts, per-image statistics, binary tensors. These are useful for programmatic access, but they aren't queryable:
- No cross-run queries. To answer "which of my CIFAR-10 runs had the highest accuracy?", you'd need to load every run object, check its type and dataset, and extract the metric manually. There's no index, no schema, and no way to filter without loading everything.
- Not accessible outside Python. Run results contain Python-specific structures (class paths, binary blob references) that only checkmaite can resolve. A data analyst with DuckDB or a BI dashboard cannot query them.
- Too much data for comparison. A DataevalCleaningRun includes raw per-image dimension arrays, per-image visual statistics, full outlier dictionaries, etc. Most cross-run analysis only needs aggregate summaries: "how many duplicates?", "what was the mean brightness?", "what was the class imbalance ratio?"
What the analytics store provides¶
The analytics store is the public API for querying capability results across runs. It provides:
- Flat, typed tables — one per capability — where each row is a scalar-only summary of a run. When a run produces variable-length data (e.g. multiple metrics), multiple rows are created. No nested objects, no binary references, no Python class paths.
- SQL as the primary query interface. Filter, join, aggregate across runs and across capabilities using standard SQL. Every StorageBackend implementation exposes a query_sql() method.
- Plain Parquet files on disk with the default backend. Readable by DuckDB, Spark, pandas, pyarrow, Snowflake, or any other tool that speaks Parquet, with no Python required. Other backends (DuckDB, Postgres, etc.) provide their own native access paths.
- A StorageBackend protocol that allows swapping Parquet for DuckDB, Delta Lake, Postgres, etc. when scale demands it.
Beyond local: platform integration¶
Note: This notebook demonstrates local store.write(...) usage. In job submission mode, workers can be remote and must be told where to write analytics data.
In local synchronous execution, the process that computes the run already knows:
- where the analytics store lives,
- how to write to it,
- and how to read it back later.
In distributed job submission, those responsibilities are split:
- the client chooses the durable store location,
- the worker needs that information so it can persist results,
- and the client later needs a stable way to find/read the payload data the worker wrote.
That is why configure_job_backend(...) requires explicit analytics-store configuration:
from checkmaite.jobs import configure_job_backend

configure_job_backend(
    "ray",
    analytics_store={
        "backend": "parquet",
        "uri": "./analytics_store",
    },
)
checkmaite.jobs._store defines the typed AnalyticsStoreConfig; the job backend then forwards that config with each submission so workers build/write to the intended store location. See Analytics store in distributed execution and Job backend configuration.
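The hand-off described above can be pictured with a small stand-alone sketch. Everything in it is hypothetical: StoreConfigSketch, build_submission and worker_store_config are illustrative names, not the checkmaite API, and JSON is only one plausible way to carry the config with a submission.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StoreConfigSketch:
    """Hypothetical stand-in for a typed analytics-store config."""

    backend: str
    uri: str

    @classmethod
    def from_mapping(cls, raw: dict) -> "StoreConfigSketch":
        # Reject unknown keys early; a missing key raises KeyError below.
        unknown = set(raw) - {"backend", "uri"}
        if unknown:
            raise ValueError(f"unknown analytics_store keys: {sorted(unknown)}")
        return cls(backend=raw["backend"], uri=raw["uri"])


def build_submission(job_name: str, config: StoreConfigSketch) -> str:
    # Client side: attach the store config to every job payload.
    return json.dumps({"job": job_name, "analytics_store": asdict(config)})


def worker_store_config(payload: str) -> StoreConfigSketch:
    # Worker side: rebuild the same config from the payload it received.
    return StoreConfigSketch.from_mapping(json.loads(payload)["analytics_store"])


client_config = StoreConfigSketch.from_mapping(
    {"backend": "parquet", "uri": "./analytics_store"}
)
payload = build_submission("cleaning-run", client_config)
worker_config = worker_store_config(payload)
print(worker_config == client_config)
```

The point of the sketch is only that client and worker end up with the same store location without sharing process state.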
The store's design — scalar-only Parquet tables behind a swappable backend protocol — also opens a path to platform-level integration.
Consider a Databricks deployment:
- Store Parquet files on cloud storage (S3, ADLS, GCS) and they are immediately queryable as Delta Lake tables, external tables, or via read_parquet() in Databricks SQL or Spark, with no Python glue needed.
- Implement a DeltaLakeBackend (or a DatabricksBackend using Unity Catalog) and the store writes directly into managed tables. Capability results become first-class catalog objects: discoverable, governed, and queryable by any team member with SQL access, not just the Python users who ran the capabilities.
- Cross-team analytics become possible. A data scientist runs capabilities locally or in a Databricks notebook; an ML engineer queries the results via Databricks SQL; a program manager builds a dashboard on the same tables. Everyone operates on the same structured data without any custom export step.
None of this requires changes to the store API or to capability extract() implementations — only a new StorageBackend.
Key design decisions¶
These decisions were taken deliberately and inform everything that follows.
| Decision | Rationale |
|---|---|
| One table per capability | For example, there are tables named dataeval_cleaning, maite_evaluation, etc. Each table has a fixed schema defined by a BaseRecord subclass for each capability. Capabilities produce structurally different outputs, so separate tables with distinct schemas are the natural representation. |
| Flat, scalar-only records | Every field must be str, int, float, bool, bytes, datetime, or Optional variants. No lists, dicts, or nested models. This guarantees that every record maps to a Parquet/SQL row without transformation. Variable-length data (e.g. per-metric results) uses multiple records instead. |
| StorageBackend protocol | The store doesn't know or care how data is persisted. The default Parquet backend is the simplest thing that works. When you outgrow it, swap in a DuckDB, Delta Lake, or Databricks backend without changing any store or record code. |
| Append-only, immutable writes | Run results are historical facts. Each write() call persists records via the configured backend. No updates, no deletes. Erroneous runs are handled by re-running and filtering in queries. |
| Idempotent writes (across calls) | Writing the same run_uid twice across separate write() calls is a no-op (deduplicated by run_uid). Safe for notebook re-execution. Note: deduplication is checked against previously written files — duplicate run_uid values within a single write() call are not deduplicated. |
| Automatic runs table | Maps every run_uid to its datasets, models, and metrics. Capability authors don't manage this; the store writes it automatically. |
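The idempotency rule in the table (deduplicate against earlier writes, but not within one call) is easy to misread, so here is a minimal in-memory sketch of that behaviour. InMemoryDedupSketch is a hypothetical toy, not the ParquetBackend implementation, which checks previously written files instead of a Python list.

```python
class InMemoryDedupSketch:
    """Toy append-only table that skips run_uids seen in earlier write() calls."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, records: list[dict]) -> int:
        # Snapshot the run_uids persisted by *previous* calls only.
        already_written = {row["run_uid"] for row in self.rows}
        fresh = [r for r in records if r["run_uid"] not in already_written]
        # Rows sharing a run_uid inside this batch are all kept.
        self.rows.extend(fresh)
        return len(fresh)


table = InMemoryDedupSketch()
first = table.write([{"run_uid": "run_1", "score": 0.92}, {"run_uid": "run_2", "score": 0.98}])
second = table.write([{"run_uid": "run_1", "score": 0.92}])  # re-execution: skipped
third = table.write([{"run_uid": "run_3", "score": 0.5}, {"run_uid": "run_3", "score": 0.6}])
print(first, second, third, len(table.rows))
```

Keeping same-batch rows with a shared run_uid is presumably what lets a single run contribute several records in one call, as the multi-record capabilities described later do.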
What the store is NOT¶
- Not an experiment tracker. There is no tagging, no run comparison UI, no artifact storage. It is a structured data layer that those tools can be built on top of.
Part 2: How¶
Architecture overview¶
capability.run()
    │
    └── returns CapabilityRunBase
            │
            │  .extract()
            ▼
        [BaseRecord, ...]  ── scalar summaries
            │
            ▼
        AnalyticsStore.write()
            │            │
            ▼            ▼
        capability    runs table
        records       (auto-generated)
            │            │
            ▼            ▼
        StorageBackend.write()
            │
    ┌───────┼────────┐
    ▼       ▼        ▼
 Parquet  DuckDB  Postgres
(default) (future) (future)
The store is populated explicitly by calling store.write([run1, run2]), which invokes each run's extract() method to produce flat records.
Setup¶
Let's create a store backed by a temporary Parquet directory.
import tempfile
from pathlib import Path
from checkmaite.core.analytics_store import AnalyticsStore, ParquetBackend
# In practice you'd use a persistent path like "./analytics_store"
store_dir = tempfile.mkdtemp(prefix="analytics_store_")
store = AnalyticsStore(ParquetBackend(store_dir))
print(f"Store path: {store_dir}")
Store path: /tmp/analytics_store_20fvb7wc
Writing records directly (understanding the primitives)¶
Before using the full store.write(runs) workflow, let's see what records look like and how the backend works. This makes the abstractions concrete.
A BaseRecord subclass defines a table schema. Every field must be a scalar type — this is enforced at class definition time.
from checkmaite.core.analytics_store import BaseRecord
# A valid record: all fields are scalar
class ExampleRecord(BaseRecord, table_name="example"):
    dataset_id: str
    score: float
    sample_count: int
    notes: str | None = None

record = ExampleRecord(
    run_uid="abc123",
    dataset_id="cifar10",
    score=0.95,
    sample_count=50000,
)
print(record)
print(f"\nTable: {record.table_name}")
print(f"Serialised: {record.model_dump(mode='python')}")
run_uid='abc123' created_at=datetime.datetime(2026, 5, 22, 18, 32, 15, 820380, tzinfo=datetime.timezone.utc) dataset_id='cifar10' score=0.95 sample_count=50000 notes=None
Table: example
Serialised: {'run_uid': 'abc123', 'created_at': datetime.datetime(2026, 5, 22, 18, 32, 15, 820380, tzinfo=datetime.timezone.utc), 'dataset_id': 'cifar10', 'score': 0.95, 'sample_count': 50000, 'notes': None}
# This will fail: list is not a scalar type
try:
    class BadRecord(BaseRecord, table_name="bad"):
        tags: list[str]  # NOT allowed
except TypeError as e:
    print(f"Rejected: {e}")
Rejected: Field 'tags' on BadRecord uses non-scalar type list[str]. BaseRecord subclasses must use only flat types (str, int, float, bool, bytes, datetime, or Optional variants). If you need variable-length data, return multiple records from extract().
The scalar constraint is what makes every record directly queryable via SQL — no binary blobs, no nested structures, no Python-specific references.
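One way a check like this can run at class definition time is through __init_subclass__. The sketch below is an assumption about the general technique, not the BaseRecord source: FlatRecordSketch and is_flat are hypothetical names, and the real class may well rely on a different mechanism.

```python
import datetime
import types
import typing

SCALARS = (str, int, float, bool, bytes, datetime.datetime, type(None))
UNION_ORIGINS = {typing.Union}
if hasattr(types, "UnionType"):  # `X | None` syntax, Python 3.10+
    UNION_ORIGINS.add(types.UnionType)


def is_flat(annotation: object) -> bool:
    """Return True for a scalar type or a Union/Optional made only of scalars."""
    if any(annotation is scalar for scalar in SCALARS):
        return True
    if typing.get_origin(annotation) in UNION_ORIGINS:
        return all(is_flat(arg) for arg in typing.get_args(annotation))
    return False


class FlatRecordSketch:
    """Hypothetical base class: subclasses may only declare flat fields."""

    table_name = ""

    def __init_subclass__(cls, table_name: str = "", **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        cls.table_name = table_name
        # Runs once, when the subclass body has just been evaluated.
        for name, annotation in typing.get_type_hints(cls).items():
            if not is_flat(annotation):
                raise TypeError(
                    f"Field {name!r} on {cls.__name__} uses non-scalar type {annotation}"
                )


class GoodRecord(FlatRecordSketch, table_name="good"):
    dataset_id: str
    notes: typing.Optional[str] = None


try:
    class BadRecordSketch(FlatRecordSketch, table_name="bad"):
        tags: typing.List[str]

    rejected = False
except TypeError as exc:
    rejected = True
    print(f"Rejected: {exc}")
```

Because the check fires when the class statement executes, a bad schema fails at import time rather than on the first write.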
If a capability produces variable-length data (e.g. one metric value per class), the extract() method returns multiple records — one per logical row. This is the Entity-Attribute-Value (EAV) pattern.
Writing and querying via the backend¶
The ParquetBackend handles file layout and SQL execution. Let's write some records and query them.
backend = ParquetBackend(store_dir)
# Write records from two different "runs"
backend.write([
    ExampleRecord(run_uid="run_1", dataset_id="cifar10", score=0.92, sample_count=50000),
    ExampleRecord(run_uid="run_2", dataset_id="mnist", score=0.98, sample_count=60000),
    ExampleRecord(run_uid="run_3", dataset_id="cifar10", score=0.94, sample_count=50000, notes="augmented"),
])
print("Tables:", backend.list_tables())
print("\nSchema:", backend.describe_table("example"))
Tables: ['example']
Schema: {'run_uid': 'String', 'created_at': "Datetime(time_unit='us', time_zone='UTC')", 'dataset_id': 'String', 'score': 'Float64', 'sample_count': 'Int64', 'notes': 'String'}
# SQL queries work directly
df = backend.query_sql("SELECT dataset_id, score, notes FROM example ORDER BY score DESC")
print(df)
shape: (3, 3)
┌────────────┬───────┬───────────┐
│ dataset_id ┆ score ┆ notes     │
│ ---        ┆ ---   ┆ ---       │
│ str        ┆ f64   ┆ str       │
╞════════════╪═══════╪═══════════╡
│ mnist      ┆ 0.98  ┆ null      │
│ cifar10    ┆ 0.94  ┆ augmented │
│ cifar10    ┆ 0.92  ┆ null      │
└────────────┴───────┴───────────┘
# Aggregation across runs for the same dataset
df = backend.query_sql("""
    SELECT dataset_id, COUNT(*) AS run_count, AVG(score) AS avg_score
    FROM example
    GROUP BY dataset_id
""")
print(df)
shape: (2, 3)
┌────────────┬───────────┬───────────┐
│ dataset_id ┆ run_count ┆ avg_score │
│ ---        ┆ ---       ┆ ---       │
│ str        ┆ u32       ┆ f64       │
╞════════════╪═══════════╪═══════════╡
│ mnist      ┆ 1         ┆ 0.98      │
│ cifar10    ┆ 2         ┆ 0.93      │
└────────────┴───────────┴───────────┘
# Idempotent writes — writing the same run_uid again is a no-op
backend.write([
    ExampleRecord(run_uid="run_1", dataset_id="cifar10", score=0.92, sample_count=50000),
])
count = backend.query_sql("SELECT COUNT(*) AS n FROM example")
print(f"Still 3 rows (not 4): {count}")
Still 3 rows (not 4): shape: (1, 1)
┌─────┐
│ n   │
│ --- │
│ u32 │
╞═════╡
│ 3   │
└─────┘
Without the store, getting the same result would require keeping every capability run result in memory, filtering by capability type and dataset, extracting the metric value, and aggregating manually. The store makes this a one-line SQL query. Additionally, the Parquet backend leverages columnar storage — queries that touch only a subset of columns read only those columns from disk, avoiding full-file scans.
Schema evolution¶
The Parquet backend supports adding and removing fields over time. This is a property of how the backend reads Parquet files: it uses Polars' diagonal_relaxed concatenation, which aligns columns by name and fills missing columns with null. Future backend implementations should provide equivalent behaviour.
- Adding a field: Old data gets None for the new column.
- Removing a field: Old data retains the column; new records simply don't populate it.
- Renaming or changing types: Not supported; this requires manual migration.
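To make the "align by name, fill with null" rule concrete without depending on Polars, here is a pure-Python sketch of the same idea. concat_diagonal is a hypothetical helper that only mimics the column alignment; it ignores dtype handling, which the real diagonal_relaxed concatenation also takes care of.

```python
def concat_diagonal(batches: list[list[dict]]) -> list[dict]:
    """Stack row batches, aligning columns by name and filling gaps with None."""
    # Preserve first-seen column order across all batches.
    columns: list[str] = []
    for batch in batches:
        for row in batch:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [{name: row.get(name) for name in columns} for batch in batches for row in batch]


# An "old" file written before the field existed, and a "new" file that has it.
old_file = [{"run_uid": "run_1", "score": 0.92}]
new_file = [{"run_uid": "run_4", "score": 0.89, "augmented": True}]

combined = concat_diagonal([old_file, new_file])
for row in combined:
    print(row)
```

The same function also covers a removed field: rows from newer files simply get None in the column that only older files carry.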
# Simulate schema evolution: add an "augmented" boolean field
class ExampleRecordV2(BaseRecord, table_name="example"):  # Same table name!
    dataset_id: str
    score: float
    sample_count: int
    notes: str | None = None
    augmented: bool | None = None  # New field

backend.write([
    ExampleRecordV2(run_uid="run_4", dataset_id="svhn", score=0.89, sample_count=73000, augmented=True),
])
# Old rows have None for 'augmented'; new row has the value
df = backend.query_sql("SELECT dataset_id, score, augmented FROM example ORDER BY dataset_id")
print(df)
shape: (4, 3)
┌────────────┬───────┬───────────┐
│ dataset_id ┆ score ┆ augmented │
│ ---        ┆ ---   ┆ ---       │
│ str        ┆ f64   ┆ bool      │
╞════════════╪═══════╪═══════════╡
│ cifar10    ┆ 0.92  ┆ null      │
│ cifar10    ┆ 0.94  ┆ null      │
│ mnist      ┆ 0.98  ┆ null      │
│ svhn       ┆ 0.89  ┆ true      │
└────────────┴───────┴───────────┘
The full workflow: store.write(runs)¶
In normal usage, you don't create records manually. If you're writing a new capability, you run the capability, then pass the run objects to store.write(). The store calls each run's extract() method and auto-populates the runs metadata table.
Let's trace what happens step by step.
Step 1: extract() projects capability outputs into flat records¶
Each CapabilityRunBase subclass implements extract() to select and aggregate the fields from its outputs that are useful for cross-run comparison. extract() produces a curated summary, not a full serialisation.
DataevalCleaningRun.extract() returns a single record per run. The full output contains raw per-image arrays (widths, heights, brightness values, outlier dictionaries, etc.), but the record distils these into ~20 aggregate scalars:
DataevalCleaningRecord(
    run_uid=self.run_uid,
    dataset_id="cifar10",
    exact_duplicate_count=12,
    exact_duplicate_ratio=0.00024,
    image_outlier_count=47,
    image_outlier_ratio=0.00094,
    mean_width=480.0,
    mean_brightness=0.48,
    class_imbalance_ratio=1.0,
    ...  # ~10 more scalar fields
)
MaiteEvaluationRun.extract() returns multiple records — one per output of Metric.compute(). The full output stores all metrics in a single dict; the store unpacks them into separate records for SQL filtering:
[
    MaiteEvaluationRecord(run_uid=..., dataset_id="cifar10", model_id="resnet50",
                          metric_id="coco_metrics", output_key="map50",
                          output_value=0.45, scope="overall"),
    MaiteEvaluationRecord(run_uid=..., dataset_id="cifar10", model_id="resnet50",
                          metric_id="coco_metrics", output_key="map75",
                          output_value=0.32, scope="overall"),
    # Plus one record per class if class_metrics were computed:
    MaiteEvaluationRecord(run_uid=..., dataset_id="cifar10", model_id="resnet50",
                          metric_id="coco_metrics", output_key="map50",
                          output_value=0.52, scope="class", class_name="person"),
    MaiteEvaluationRecord(run_uid=..., dataset_id="cifar10", model_id="resnet50",
                          metric_id="coco_metrics", output_key="map50",
                          output_value=0.38, scope="class", class_name="car"),
]
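The unpacking step behind records like these can be sketched as a plain function. The input layout is an assumption made for illustration: a dict of overall metric values with an optional "class_metrics" sub-dict keyed by class name. to_eav_rows is a hypothetical helper, and it emits dicts rather than MaiteEvaluationRecord instances so the example stays self-contained.

```python
def to_eav_rows(run_uid: str, metric_id: str, outputs: dict) -> list[dict]:
    """Flatten a nested metric dict into one flat row per (output_key, scope)."""
    rows = []
    # Overall metrics: every top-level numeric value becomes its own row.
    for key, value in outputs.items():
        if isinstance(value, (int, float)):
            rows.append({
                "run_uid": run_uid, "metric_id": metric_id,
                "output_key": key, "output_value": float(value),
                "scope": "overall", "class_name": None,
            })
    # Per-class metrics (assumed layout): {"person": {"map50": 0.52}, ...}
    for class_name, per_class in outputs.get("class_metrics", {}).items():
        for key, value in per_class.items():
            rows.append({
                "run_uid": run_uid, "metric_id": metric_id,
                "output_key": key, "output_value": float(value),
                "scope": "class", "class_name": class_name,
            })
    return rows


eav_rows = to_eav_rows(
    "c3d4",
    "coco_metrics",
    {"map50": 0.45, "map75": 0.32,
     "class_metrics": {"person": {"map50": 0.52}, "car": {"map50": 0.38}}},
)
for row in eav_rows:
    print(row["scope"], row["class_name"], row["output_key"], row["output_value"])
```

Each row now has a fixed set of scalar columns, so a filter such as output_key = 'map50' works regardless of how many metrics a run produced.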
DataevalFeasibilityRun.extract() returns a single record per run. The IC variant populates BER bounds only; the OD variant also includes instance counts and dataset health statistics:
# IC run (simple: two scalar BER bounds)
DataevalFeasibilityRecord(
    run_uid=self.run_uid,
    dataset_id="cifar10",
    ber_upper=0.15,
    ber_lower=0.08,
)

# OD run (includes health stats)
DataevalFeasibilityRecord(
    run_uid=self.run_uid,
    dataset_id="coco-val",
    ber_upper=0.25,
    ber_lower=0.12,
    num_instances=500,
    num_classes=10,
    small_object_ratio=0.05,
    truncated_bbox_ratio=0.03,
    overlap_image_ratio=0.02,
    health_warning_count=0,
)
DataevalShiftRun.extract() returns a single record per run. Shift is a two-dataset capability, so it uses reference_dataset_id and evaluation_dataset_id instead of the single dataset_id convention. Drift test results are stored as direct scalars; OOD per-sample arrays are aggregated into summary statistics:
DataevalShiftRecord(
    run_uid=self.run_uid,
    reference_dataset_id="coco-train",
    evaluation_dataset_id="coco-val",
    # Drift: 3 tests x (drifted, distance, p_val, threshold) + per-feature counts for CVM/KS
    mmd_drifted=True,
    mmd_distance=0.45,
    mmd_p_val=0.01,
    mmd_threshold=0.05,
    cvm_drifted=True,
    cvm_distance=0.38,
    cvm_p_val=0.02,
    cvm_threshold=0.005,
    cvm_feature_drift_count=12,
    ks_drifted=False,
    ks_distance=0.12,
    ks_p_val=0.15,
    ks_threshold=0.005,
    ks_feature_drift_count=3,
    # OOD: aggregated from per-sample arrays
    ood_count=3,
    ood_total=50,
    ood_ratio=0.06,
    ood_mean_instance_score=0.72,
    ood_std_instance_score=0.15,
    ood_max_instance_score=1.05,
)
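The OOD fields above are summaries of per-sample arrays. A rough sketch of that kind of aggregation follows; summarise_ood is a hypothetical helper, and both the "score above threshold" rule and the use of the population standard deviation are assumptions, since the source does not say how the capability computes them.

```python
import statistics


def summarise_ood(scores: list[float], threshold: float) -> dict:
    """Collapse per-sample OOD scores into the scalar fields a record can hold."""
    if not scores:
        raise ValueError("at least one score is required")
    # Assumption: a sample counts as OOD when its score exceeds the threshold.
    flagged = [s for s in scores if s > threshold]
    return {
        "ood_count": len(flagged),
        "ood_total": len(scores),
        "ood_ratio": len(flagged) / len(scores),
        "ood_mean_instance_score": statistics.fmean(scores),
        "ood_std_instance_score": statistics.pstdev(scores),
        "ood_max_instance_score": max(scores),
    }


ood_summary = summarise_ood([0.2, 0.4, 0.9, 1.1], threshold=0.8)
print(ood_summary)
```

The per-sample list never reaches the store; only these six scalars do, which is what keeps the table flat.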
NrtkRobustnessRun.extract() returns multiple records — one per (theta_value, metric_key) pair. This is the same Entity-Attribute-Value pattern used by MaiteEvaluationRun. The is_primary flag marks rows for the capability's return_key metric:
[
    NrtkRobustnessRecord(
        run_uid=self.run_uid,
        dataset_id="cifar10",
        model_id="resnet50",
        metric_id="coco_metrics",
        perturber_class="BrightnessPerturber",
        perturber_type="Brightness Perturber",
        theta_key="factor",
        theta_index=0,
        theta_value=1.0,
        metric_key="accuracy",
        metric_value=0.95,
        is_primary=True,
    ),
    NrtkRobustnessRecord(
        run_uid=self.run_uid,
        dataset_id="cifar10",
        model_id="resnet50",
        metric_id="coco_metrics",
        perturber_class="BrightnessPerturber",
        perturber_type="Brightness Perturber",
        theta_key="factor",
        theta_index=0,
        theta_value=1.0,
        metric_key="f1_score",
        metric_value=0.90,
        is_primary=False,
    ),
    # ... one record per (theta, metric_key) pair
]
Step 2: The runs table is auto-populated¶
When you call store.write([run1, run2]), the store also writes rows to the runs table — one row per (run_uid, entity_type, entity_id) combination:
| run_uid | capability_id | capability_table | entity_type | entity_id |
|---|---|---|---|---|
| a1b2... | checkmaite.core.DataevalCleaning | dataeval_cleaning | dataset | cifar10 |
| c3d4... | checkmaite.core.MaiteEvaluation | maite_evaluation | dataset | cifar10 |
| c3d4... | checkmaite.core.MaiteEvaluation | maite_evaluation | model | resnet50 |
| c3d4... | checkmaite.core.MaiteEvaluation | maite_evaluation | metric | map50 |
This table enables cross-capability queries filtered by any entity:
-- Find all capability tables that have results for a specific dataset
SELECT DISTINCT capability_table
FROM runs
WHERE entity_type = 'dataset' AND entity_id = 'cifar10'
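The fan-out into runs rows and the query above can be tried end to end with the standard library alone. This sketch uses an in-memory SQLite database as a stand-in for the store; runs_rows is a hypothetical helper, the capability_id column is omitted for brevity, and the real store builds these rows internally from each run's metadata.

```python
import sqlite3


def runs_rows(run_uid, capability_table, datasets=(), models=(), metrics=()):
    """Build one row per (run_uid, entity_type, entity_id) combination."""
    rows = []
    for entity_type, ids in (("dataset", datasets), ("model", models), ("metric", metrics)):
        rows.extend((run_uid, capability_table, entity_type, entity_id) for entity_id in ids)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_uid TEXT, capability_table TEXT, entity_type TEXT, entity_id TEXT)"
)

rows = runs_rows("a1b2", "dataeval_cleaning", datasets=["cifar10"])
rows += runs_rows(
    "c3d4", "maite_evaluation",
    datasets=["cifar10"], models=["resnet50"], metrics=["map50"],
)
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)

# Same query as above, with ORDER BY added so the result is deterministic.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT capability_table FROM runs "
        "WHERE entity_type = 'dataset' AND entity_id = 'cifar10' "
        "ORDER BY capability_table"
    )
]
print(tables)
```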
Querying across capability runs¶
The two primary query patterns are listed first; the remaining examples apply the same joins to other capability tables:
1. Direct JOIN via dataset_id (single-dataset capabilities)
Both DataevalCleaningRecord and MaiteEvaluationRecord include a dataset_id field. This enables direct joins:
-- Correlate dataset quality with model accuracy
SELECT
d.dataset_id,
d.exact_duplicate_ratio,
d.image_outlier_ratio,
m.output_value AS accuracy
FROM dataeval_cleaning d
JOIN maite_evaluation m ON d.dataset_id = m.dataset_id
WHERE m.output_key = 'accuracy' AND m.scope = 'overall'
2. JOIN via the runs table (general case)
When you need to filter by model, metric, or any other entity:
-- Get all evaluation results for a specific model
SELECT e.*
FROM maite_evaluation e
JOIN runs r ON e.run_uid = r.run_uid
WHERE r.entity_type = 'model' AND r.entity_id = 'resnet50'
3. Correlate feasibility with dataset quality
-- Compare BER with cleaning metrics for each dataset
SELECT
f.dataset_id,
f.ber_upper,
f.ber_lower,
c.exact_duplicate_ratio,
c.image_outlier_ratio
FROM dataeval_feasibility f
JOIN dataeval_cleaning c ON f.dataset_id = c.dataset_id
4. Correlate drift with dataset feasibility
-- Compare drift detection with BER for the reference dataset
SELECT
s.reference_dataset_id,
s.mmd_drifted,
s.mmd_p_val,
f.ber_upper,
f.ber_lower
FROM dataeval_shift s
JOIN dataeval_feasibility f ON s.reference_dataset_id = f.dataset_id
5. Query robustness curves alongside dataset quality
-- Correlate model robustness with dataset cleaning metrics
SELECT
n.model_id,
n.perturber_type,
MIN(n.metric_value) AS worst_score,
c.image_outlier_ratio
FROM nrtk_robustness n
JOIN dataeval_cleaning c ON n.dataset_id = c.dataset_id
WHERE n.is_primary = true
GROUP BY n.model_id, n.perturber_type, c.image_outlier_ratio
Using the store from Python (Polars DataFrames)¶
query_sql() returns a Polars DataFrame, so you can chain SQL with Polars operations:
# SQL for filtering, Polars for transformation
df = backend.query_sql("SELECT * FROM example WHERE dataset_id = 'cifar10'")
# Continue with Polars API
print(df.select("score").describe())
shape: (9, 2)
┌────────────┬──────────┐
│ statistic  ┆ score    │
│ ---        ┆ ---      │
│ str        ┆ f64      │
╞════════════╪══════════╡
│ count      ┆ 2.0      │
│ null_count ┆ 0.0      │
│ mean       ┆ 0.93     │
│ std        ┆ 0.014142 │
│ min        ┆ 0.92     │
│ 25%        ┆ 0.92     │
│ 50%        ┆ 0.94     │
│ 75%        ┆ 0.94     │
│ max        ┆ 0.94     │
└────────────┴──────────┘
Using the store from external SQL tools¶
Note: This section is specific to the ParquetBackend. Other backends (e.g. a future DuckDB or SQL database backend) would provide their own native access paths.
Because the Parquet backend writes plain Parquet files with only scalar columns, any tool that reads Parquet can query the store directly — no Python required.
DuckDB (CLI or any SQL client):
-- Point DuckDB at the store directory
SELECT * FROM read_parquet('./analytics_store/dataeval_cleaning/*.parquet');
-- Cross-capability join
SELECT d.dataset_id, d.exact_duplicate_ratio, m.output_value
FROM read_parquet('./analytics_store/dataeval_cleaning/*.parquet') d
JOIN read_parquet('./analytics_store/maite_evaluation/*.parquet') m
ON d.dataset_id = m.dataset_id
WHERE m.output_key = 'accuracy';
The store files are self-contained Parquet with native types — any Parquet reader in any language can consume them.
Adding analytics support to a new capability takes two steps: define a BaseRecord subclass for the table, then override extract() on the run class to return instances of it:
from checkmaite.core.analytics_store import BaseRecord
# Step 1: Define the record class
class MyCapabilityRecord(BaseRecord, table_name="my_capability"):
    # Convention: include dataset_id for cross-capability JOINs
    dataset_id: str
    # Capability-specific fields (all must be scalar)
    primary_metric: float
    sample_count: int
    status: str  # e.g. "pass" or "fail"

# Step 2: Override extract() on the run class
# (shown as pseudocode; in practice this goes on your CapabilityRunBase subclass)
#
# def extract(self) -> list[MyCapabilityRecord]:
#     return [
#         MyCapabilityRecord(
#             run_uid=self.run_uid,
#             dataset_id=self.dataset_metadata[0]["id"],
#             primary_metric=self.outputs.some_value,
#             sample_count=len(self.outputs.results),
#             status="pass" if self.outputs.some_value > 0.9 else "fail",
#         )
#     ]
print("Record class is valid:", MyCapabilityRecord.table_name)
Record class is valid: my_capability
Design guidance for extract():
- Summarise, don't serialise. The record should contain what an analyst needs to filter, group, and compare — not a dump of the full output.
- One record per logical entity for fixed-schema outputs (e.g. DataevalCleaning returns one record per dataset).
- One record per variable-length item for EAV-style outputs (e.g. MaiteEvaluation returns one record per metric output).
- Include dataset_id if the capability operates on a single dataset. This is the primary JOIN key across capabilities.
- Use Optional for fields that may not always be present (e.g. target outliers only exist for object detection).
Implementing a custom StorageBackend¶
The StorageBackend protocol has four methods. Any class implementing them can replace ParquetBackend:
from collections.abc import Sequence
import polars as pl
from checkmaite.core.analytics_store import BaseRecord, StorageBackend
class DuckDBBackend:
    """Example: a DuckDB-backed storage backend."""

    def __init__(self, db_path: str) -> None:
        import duckdb
        self.conn = duckdb.connect(db_path)

    def write(self, records: Sequence[BaseRecord]) -> None:
        # Group by table, create table if needed, INSERT
        ...

    def list_tables(self) -> list[str]:
        # SELECT table_name FROM information_schema.tables
        ...

    def describe_table(self, table_name: str) -> dict[str, str]:
        # DESCRIBE {table_name}
        ...

    def query_sql(self, sql: str) -> pl.DataFrame:
        # Execute SQL, return as Polars DataFrame
        return self.conn.execute(sql).pl()

# Usage is identical:
# store = AnalyticsStore(DuckDBBackend("./analytics.duckdb"))
# store.write([run1, run2])
# store.query_sql("SELECT ...")
This is the intended scale pathway. The Parquet backend is the starting point; DuckDB, Delta Lake, or a SQL database is the destination when you need transactions, concurrency, or better query performance.
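Since the DuckDB example leaves three of the four methods elided, a fully working toy may help show what "any class implementing them" means in practice. The sketch below is hypothetical throughout: BackendSketch only mirrors the four method names, records are plain dicts carrying a "_table" key instead of BaseRecord instances, query_sql returns a list of tuples rather than a Polars DataFrame, and SQLite stands in for a real analytical engine. The f-string SQL is acceptable for a toy but not for untrusted input.

```python
import sqlite3
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class BackendSketch(Protocol):
    """Structural interface: any class with these four methods qualifies."""

    def write(self, records: Sequence[dict]) -> None: ...
    def list_tables(self) -> list: ...
    def describe_table(self, table_name: str) -> dict: ...
    def query_sql(self, sql: str) -> list: ...


_SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL"}


class SQLiteBackendSketch:
    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")

    def write(self, records: Sequence[dict]) -> None:
        for record in records:
            row = dict(record)
            table = row.pop("_table")
            # Create the table on first use, typed from the first row's values.
            column_defs = ", ".join(
                f"{name} {_SQL_TYPES.get(type(value), 'TEXT')}" for name, value in row.items()
            )
            self.conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_defs})")
            placeholders = ", ".join("?" for _ in row)
            self.conn.execute(
                f"INSERT INTO {table} ({', '.join(row)}) VALUES ({placeholders})",
                list(row.values()),
            )

    def list_tables(self) -> list:
        cursor = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in cursor]

    def describe_table(self, table_name: str) -> dict:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        cursor = self.conn.execute(f"PRAGMA table_info({table_name})")
        return {row[1]: row[2] for row in cursor}

    def query_sql(self, sql: str) -> list:
        return self.conn.execute(sql).fetchall()


sketch_backend = SQLiteBackendSketch()
sketch_backend.write([
    {"_table": "example", "run_uid": "run_1", "dataset_id": "cifar10", "score": 0.92},
    {"_table": "example", "run_uid": "run_2", "dataset_id": "mnist", "score": 0.98},
])
print(isinstance(sketch_backend, BackendSketch))
print(sketch_backend.list_tables())
print(sketch_backend.describe_table("example"))
print(sketch_backend.query_sql("SELECT dataset_id FROM example ORDER BY score DESC"))
```

Note that SQLiteBackendSketch never inherits from the protocol class; matching the method names is enough, which is the property that makes backends swappable.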
Part 4: File layout and external access¶
The Parquet backend produces this directory structure:
analytics_store/
    dataeval_cleaning/
        1706000000000_a1b2c3d4.parquet
        1706000060000_e5f6a7b8.parquet
    maite_evaluation/
        1706000000000_c9d0e1f2.parquet
    runs/
        1706000000000_g3h4i5j6.parquet
Each write() call creates one file per table. File names are {timestamp_ms}_{uuid_8char}.parquet for uniqueness and chronological sorting.
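A file name in that shape can be produced with a few lines of standard-library code. This is a guess at the approach rather than the backend's actual implementation: parquet_file_name is a hypothetical helper, and taking the first eight hex characters of a uuid4 is an assumption.

```python
import re
import time
import uuid


def parquet_file_name(now_ms=None):
    """Return '{timestamp_ms}_{uuid_8char}.parquet' for the given or current time."""
    timestamp_ms = int(time.time() * 1000) if now_ms is None else now_ms
    # Eight hex characters keep names unique when two writes share a millisecond.
    return f"{timestamp_ms}_{uuid.uuid4().hex[:8]}.parquet"


name = parquet_file_name(1706000000000)
earlier = parquet_file_name(1706000000000)
later = parquet_file_name(1706000060000)
print(name)
print(sorted([later, earlier])[0] == earlier)
```

Plain lexicographic sorting matches chronological order here because current millisecond timestamps all have the same number of digits.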
The files are plain Parquet with scalar columns only — no custom metadata, no manifest files, no lock files, no binary references. Any tool that reads Parquet (DuckDB, Spark, pandas, pyarrow, Snowflake external tables) can read them directly.
# Inspect the files on disk
for p in sorted(Path(store_dir).rglob("*.parquet")):
    print(f"  {p.relative_to(store_dir)} ({p.stat().st_size:,} bytes)")
  example/1779474735838_0ec8e733.parquet (2,337 bytes)
  example/1779474735871_078c806a.parquet (2,365 bytes)
Part 5: Record Schema Reference¶
Each capability that supports the analytics store defines a record class with scalar-only fields. All records share two common fields from BaseRecord:
| Field | Type | Description |
|---|---|---|
| run_uid | str | SHA-256 hash linking to the capability run |
| created_at | datetime | Timestamp when the record was created (auto-generated) |
Use store.describe_table("table_name") at runtime to inspect the schema of any table, or store.list_tables() to see which tables have data.
dataeval_cleaning¶
One record per dataset. Summarises dataset quality: duplicates, outliers, visual properties, and class balance.
| Field | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier (cross-capability JOIN key) |
| exact_duplicate_count | int | Number of exact duplicate images |
| exact_duplicate_ratio | float | Fraction of exact duplicates |
| near_duplicate_count | int | Number of near-duplicate images |
| near_duplicate_ratio | float | Fraction of near duplicates |
| image_outlier_count | int | Number of image-level outliers |
| image_outlier_ratio | float | Fraction of image outliers |
| class_count | int | Number of unique classes |
| label_count | int | Total label count across all images |
| image_count | int | Total number of images |
| target_outlier_count | int \| None | Target-level outlier count (OD only) |
| target_outlier_ratio | float \| None | Fraction of target outliers (OD only) |
| mean_width | float | Mean image width |
| mean_height | float | Mean image height |
| std_aspect_ratio | float | Standard deviation of aspect ratios |
| mean_brightness | float | Mean image brightness |
| mean_contrast | float | Mean image contrast |
| mean_sharpness | float | Mean image sharpness |
| class_imbalance_ratio | float | Ratio of largest to smallest class |
| min_class_image_count | int | Smallest class size |
| max_class_image_count | int | Largest class size |
| mean_labels_per_image | float | Average labels per image |
dataeval_bias¶
One record per dataset. Summarises coverage, balance, and diversity metrics. Balance and diversity fields are None when the dataset has no usable metadata factors.
| Field | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier (cross-capability JOIN key) |
| coverage_total | int | Total number of images |
| coverage_uncovered_count | int | Number of under-represented images |
| coverage_uncovered_ratio | float | Fraction of uncovered images |
| coverage_radius | float | Coverage radius used for detection |
| balance_num_factors | int \| None | Number of metadata factors analysed |
| balance_mean | float \| None | Mean balance score across factors |
| balance_max | float \| None | Maximum balance score |
| balance_factors_above_05 | int \| None | Factors with balance >= 0.5 |
| diversity_num_factors | int \| None | Number of diversity factors |
| diversity_mean | float \| None | Mean diversity index |
| diversity_min | float \| None | Minimum diversity index |
| diversity_factors_below_04 | int \| None | Factors with diversity < 0.4 |
maite_evaluation¶
One record per (output_key, scope) pair. Stores metric results in Entity-Attribute-Value format, with optional per-class breakdown.
| Field | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier (cross-capability JOIN key) |
| model_id | str | Model identifier |
| metric_id | str | Metric identifier |
| output_key | str | Metric output key (e.g., "accuracy", "map50") |
| output_value | float | Metric value |
| scope | str | "overall" or "class" |
| class_name | str \| None | Class name (when scope is "class") |
dataeval_feasibility¶
One record per dataset. Stores Bayes Error Rate bounds. OD-specific health stats are None for IC runs.
| Field | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier (cross-capability JOIN key) |
| ber_upper | float | Upper bound on Bayes Error Rate |
| ber_lower | float | Lower bound on Bayes Error Rate |
| num_instances | int \| None | Total valid instance crops (OD only) |
| num_classes | int \| None | Number of unique classes (OD only) |
| small_object_ratio | float \| None | Fraction of small objects (OD only) |
| truncated_bbox_ratio | float \| None | Fraction of boundary-touching boxes (OD only) |
| overlap_image_ratio | float \| None | Fraction of images with high-IoU box pairs (OD only) |
| health_warning_count | int \| None | Number of health warnings (OD only) |
dataeval_shift¶
One record per run. Stores drift detection and OOD summary metrics. Uses two dataset IDs (reference and evaluation) instead of the single dataset_id convention.
| Field | Type | Description |
|---|---|---|
| reference_dataset_id | str | Reference (baseline) dataset identifier |
| evaluation_dataset_id | str | Evaluation (test) dataset identifier |
| mmd_drifted | bool | Whether MMD detected drift |
| mmd_distance | float | MMD test statistic |
| mmd_p_val | float | MMD p-value |
| mmd_threshold | float | MMD significance threshold |
| cvm_drifted | bool | Whether CVM detected drift |
| cvm_distance | float | CVM mean test statistic |
| cvm_p_val | float | CVM combined p-value |
| cvm_threshold | float | CVM significance threshold |
| cvm_feature_drift_count | int | Number of individually drifted features (CVM) |
| ks_drifted | bool | Whether KS detected drift |
| ks_distance | float | KS mean test statistic |
| ks_p_val | float | KS combined p-value |
| ks_threshold | float | KS significance threshold |
| ks_feature_drift_count | int | Number of individually drifted features (KS) |
| ood_count | int | Number of OOD samples in evaluation set |
| ood_total | int | Total samples in evaluation set |
| ood_ratio | float | Fraction of OOD samples |
| ood_mean_instance_score | float | Mean OOD instance score |
| ood_std_instance_score | float | Std dev of OOD instance scores |
| ood_max_instance_score | float | Maximum OOD instance score |
nrtk_robustness¶
One record per (theta_value, metric_key) pair. Stores per-perturbation-point metric values in Entity-Attribute-Value format, enabling full robustness curve reconstruction via SQL.
| Field | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier (cross-capability JOIN key) |
| model_id | str | Model identifier |
| metric_id | str | Metric identifier |
| perturber_class | str | Perturber class name (e.g., "BrightnessPerturber") |
| perturber_type | str | Human-readable perturber label (e.g., "Brightness Perturber") |
| theta_key | str | Perturbation parameter name (e.g., "factor", "ksize") |
| theta_index | int | Ordinal position in the sweep (0-based) |
| theta_value | float | Parameter value at this perturbation level |
| metric_key | str | Metric output key (e.g., "accuracy", "f1_score") |
| metric_value | float | Score at this perturbation level |
| is_primary | bool | True when metric_key matches the capability's return_key |
Summary¶
| Aspect | Detail |
|---|---|
| Purpose | Query and compare capability results across runs |
| Content | Curated scalar summaries from extract() |
| Format | Scalar columns only (Parquet by default; pluggable via StorageBackend) |
| Query interface | SQL, Polars, any Parquet reader |
| Cross-run queries | Native (GROUP BY, JOIN, WHERE) |
| Non-Python access | Any Parquet-capable tool (DuckDB, Spark, etc.) |
| Populated by | store.write([run1, ...]) (explicit) |
| Deduplicated by | run_uid (on write) |
| Extensibility | StorageBackend protocol — swap Parquet for DuckDB, Delta Lake, Postgres, etc. |