Ray Simple Job Submission Tutorial¶
A short guide to running a checkmaite capability asynchronously with the lightweight ray-simple job backend.
When to use this backend¶
Use ray-simple for local notebooks, demos, and single-driver workflows where you want one Ray task per submitted capability run.
Important assumptions:
- job handles live only in the current Python process;
- every submission creates a new Ray task;
- completed results are written to the configured analytics store before job.result() succeeds.
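The last two assumptions fit together. Each submission schedules its own task, and the store write happens inside that task, so a successful result() implies the data is already persisted. The following is a minimal sketch of that ordering. It assumes a thread pool as a stand-in for Ray tasks and a dict as a stand-in for the analytics store. STORE, task, run_capability, and submit_and_persist are hypothetical names, not checkmaite APIs.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for the analytics store: run_uid -> stored row.
STORE: dict[str, dict] = {}


def run_capability(dataset_id: str) -> dict:
    # Hypothetical capability body; a real run would compute dataset metrics.
    return {"dataset_id": dataset_id, "exact_duplicate_count": 0}


def task(dataset_id: str) -> str:
    # Worker side: compute, persist, and only then return a reference.
    row = run_capability(dataset_id)
    run_uid = uuid.uuid4().hex
    STORE[run_uid] = row  # the write finishes before the future resolves
    return run_uid


def submit_and_persist(pool: ThreadPoolExecutor, dataset_id: str) -> Future:
    # Every call schedules a brand-new task: one task per submission.
    return pool.submit(task, dataset_id)


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [submit_and_persist(pool, "coco-job-tutorial") for _ in range(2)]
    # result() can only return after task() has written to STORE.
    run_uids = [f.result(timeout=10) for f in futures]

print(len(run_uids), all(uid in STORE for uid in run_uids))  # 2 True
```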
Setup¶
import tempfile
import uuid
from pathlib import Path
from checkmaite.core.analytics_store import AnalyticsStore, ParquetBackend
from checkmaite.core.object_detection import DataevalCleaning
from checkmaite.core.object_detection.dataset_loaders import CocoDetectionDataset
from checkmaite.jobs import (
    JobStatus,
    configure_job_backend,
    get_job,
    list_jobs,
    shutdown_job_backend,
    submit_capability,
)
# Find the repository root whether the notebook is run from docs/tool-usage or elsewhere.
REPO_ROOT = next(
    path for path in [Path.cwd(), *Path.cwd().parents]
    if (path / "pyproject.toml").exists()
)
# Use the tiny COCO fixture included with the repository.
dataset_root = REPO_ROOT / "tests/data_for_tests/coco_dataset"
dataset_ann = dataset_root / "ann_file.json"
dataset = CocoDetectionDataset(
    root=str(dataset_root),
    ann_file=str(dataset_ann),
    dataset_id="coco-job-tutorial",
)
store_dir = Path(tempfile.mkdtemp(prefix="checkmaite_jobs_")) / "analytics_store"
analytics_store_config = {"backend": "parquet", "uri": str(store_dir)}
print(f"Dataset: {dataset.metadata['id']} ({len(dataset)} images)")
print(f"Analytics store: {store_dir}")
Dataset: coco-job-tutorial (4 images)
Analytics store: /tmp/checkmaite_jobs_vi9rlyra/analytics_store
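The next(...) expression in the setup cell raises a bare StopIteration when no ancestor directory contains pyproject.toml. That is an unhelpful error in a notebook. Below is a minimal alternative that reports what it searched. It assumes pyproject.toml remains the repository marker, and find_repo_root is a hypothetical helper, not part of checkmaite.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk from `start` up to the filesystem root and return the first
    directory that contains `marker`."""
    start = start.resolve()
    for path in [start, *start.parents]:
        if (path / marker).exists():
            return path
    # Fail with a message that says what was searched, instead of StopIteration.
    raise FileNotFoundError(f"No {marker} found in {start} or any parent directory")


# Usage in the tutorial: REPO_ROOT = find_repo_root(Path.cwd())
```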
Configure ray-simple¶
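The analytics-store config is forwarded to the Ray worker so that the worker knows where to write completed run data. With address="local", the backend starts a local Ray instance for this session, as the log output confirms.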
configure_job_backend(
    "ray-simple",
    address="local",
    force_reinit=True,
    analytics_store=analytics_store_config,
)
print("Configured ray-simple job backend.")
2026-05-22 18:42:40,567 INFO worker.py:2004 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Configured ray-simple job backend.
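Ray tasks run in worker processes, so whatever the backend forwards arrives there as a serialized copy, not as the driver's object. This is one reason to pass a plain dict of strings rather than an open AnalyticsStore. The following minimal sketch simulates that hand-off. It uses pickle as a stand-in for Ray's own serializer (which is cloudpickle-based). rebuild_store_location is a hypothetical worker-side helper, and the example URI is illustrative.

```python
import pickle
from pathlib import Path


def rebuild_store_location(config: dict) -> Path:
    # Hypothetical worker-side step: turn the forwarded config back into a path.
    if config.get("backend") != "parquet":
        raise ValueError(f"Unsupported backend in this sketch: {config.get('backend')!r}")
    return Path(config["uri"])


# Illustrative config with the same shape as analytics_store_config in the setup cell.
analytics_store_config = {"backend": "parquet", "uri": "/tmp/checkmaite_jobs_example/analytics_store"}

# Simulate crossing the process boundary: the worker only ever sees the bytes.
payload = pickle.dumps(analytics_store_config)
worker_config = pickle.loads(payload)

store_path = rebuild_store_location(worker_config)
print(store_path)
```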
Submit a capability job¶
capability = DataevalCleaning()
job = submit_capability(
    capability,
    datasets=[dataset],
    use_cache=False,
)
print(f"Job ID: {job.job_id}")
print(f"Initial status: {job.status}")
Job ID: 5a2e815c3eb242c88a0073100469c141
Initial status: JobStatus.PENDING
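submit_capability returns without waiting for the run, which is why the first status printed is PENDING. The caller then has to wait for a final state separately, and the tutorial does this with job.wait(timeout=120). One way to implement such a wait is a timeout-bounded polling loop. The sketch below rests on several assumptions. wait_for_terminal and the FAILED state are hypothetical, and statuses are plain strings rather than JobStatus members. The real backend may also block on the Ray task instead of polling.

```python
import time
from typing import Callable

# PENDING and COMPLETED appear in this tutorial's output; FAILED is an assumption.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_terminal(
    get_status: Callable[[], str], timeout: float, interval: float = 0.05
) -> str:
    """Poll get_status() until it reports a terminal state or `timeout` seconds pass.

    Returns the last observed status, which may still be non-terminal on timeout.
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status not in TERMINAL_STATES and time.monotonic() < deadline:
        time.sleep(interval)
        status = get_status()
    return status


# Fake job that reports PENDING twice and then COMPLETED.
states = iter(["PENDING", "PENDING", "COMPLETED"])
print(wait_for_terminal(lambda: next(states), timeout=5))  # COMPLETED
```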
Wait for completion and get the result reference¶
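Call job.wait() to wait for the job to reach a final status, then call job.result(). It returns a CapabilityRunRef, a reference to the stored run (its run_uid and store_uri), not the full capability output object.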
final_status = job.wait(timeout=120)
print(f"Final status: {final_status}")
ref = job.result(timeout=10)
print(f"Run UID: {ref.run_uid}")
print(f"Store URI: {ref.store_uri}")
Final status: JobStatus.COMPLETED
Run UID: 7a553ac2b7b42fd836159672b761d777fba055ad19f77f39cdf2c3c17b9234ed
Store URI: /tmp/checkmaite_jobs_vi9rlyra/analytics_store/dataeval_cleaning/1779475366156_d9674754.parquet
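Returning a reference instead of the capability output keeps the value passed back from the worker small. The data itself is already in the store, and the reference records where. The following is a minimal stand-in that uses only the two fields printed above and their printed values. RunRefSketch is hypothetical, and the real CapabilityRunRef may carry more fields. The table_name property assumes the <store>/<table>/<file>.parquet layout visible in this run's store URI.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class RunRefSketch:
    """Hypothetical stand-in for CapabilityRunRef: a pointer to stored results."""

    run_uid: str
    store_uri: str

    @property
    def table_name(self) -> str:
        # Assumes the layout seen in this tutorial: <store>/<table>/<file>.parquet
        return PurePosixPath(self.store_uri).parent.name


ref = RunRefSketch(
    run_uid="7a553ac2b7b42fd836159672b761d777fba055ad19f77f39cdf2c3c17b9234ed",
    store_uri="/tmp/checkmaite_jobs_vi9rlyra/analytics_store/dataeval_cleaning/1779475366156_d9674754.parquet",
)
print(ref.table_name)  # dataeval_cleaning
```

A frozen dataclass makes the reference immutable and hashable, which suits a value that only identifies a stored run.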
List jobs remembered by this backend object¶
for remembered_job in list_jobs(limit=10):
    print(remembered_job.job_id, remembered_job.status)
print("Fetched by ID:", get_job(job.job_id).status)
5a2e815c3eb242c88a0073100469c141 JobStatus.COMPLETED
Fetched by ID: JobStatus.COMPLETED
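list_jobs() and get_job() can only see handles created in this Python process, since job handles live only there. A new kernel therefore starts with an empty list, even though completed results remain in the analytics store. The following is a minimal sketch of such a process-local registry. It assumes newest-first ordering for the limit. JobRegistry and its methods are hypothetical, and this tutorial does not state the real backend's ordering.

```python
class JobRegistry:
    """Hypothetical process-local registry mapping job_id -> handle."""

    def __init__(self) -> None:
        # A plain dict preserves insertion order, i.e. submission order.
        self._jobs: dict[str, object] = {}

    def add(self, job_id: str, handle: object) -> None:
        self._jobs[job_id] = handle

    def get(self, job_id: str) -> object:
        # KeyError here is the sketch's analogue of "unknown in this process".
        return self._jobs[job_id]

    def recent(self, limit: int = 10) -> list[object]:
        # Newest first, capped at `limit` (ordering is an assumption of this sketch).
        return list(self._jobs.values())[::-1][:limit]


registry = JobRegistry()
for i in range(3):
    registry.add(f"job-{i}", f"handle-{i}")
print(registry.recent(limit=2))  # ['handle-2', 'handle-1']
```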
Query the analytics store¶
store = AnalyticsStore(ParquetBackend(str(store_dir)))
print(f"Tables: {store.list_tables()}")
cleaning_results = store.query_sql("""
SELECT
dataset_id,
exact_duplicate_count,
image_outlier_count,
image_outlier_ratio,
mean_brightness
FROM dataeval_cleaning
""")
print(cleaning_results)
Tables: ['dataeval_cleaning', 'runs']
shape: (1, 5)
┌───────────────────┬───────────────────────┬─────────────────────┬─────────────────────┬─────────────────┐
│ dataset_id        ┆ exact_duplicate_count ┆ image_outlier_count ┆ image_outlier_ratio ┆ mean_brightness │
│ ---               ┆ ---                   ┆ ---                 ┆ ---                 ┆ ---             │
│ str               ┆ i64                   ┆ i64                 ┆ f64                 ┆ f64             │
╞═══════════════════╪═══════════════════════╪═════════════════════╪═════════════════════╪═════════════════╡
│ coco-job-tutorial ┆ 0                     ┆ 0                   ┆ 0.0                 ┆ 0.305882        │
└───────────────────┴───────────────────────┴─────────────────────┴─────────────────────┴─────────────────┘
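store.query_sql accepts a SQL string, so follow-up questions such as filtering or counting runs take a single query. The following is a minimal sketch of two such queries. It uses Python's built-in sqlite3 as a stand-in engine, loaded with the single row shown above. The SQL dialect behind query_sql may differ, and the 0.1 outlier-ratio threshold is an arbitrary example value.

```python
import sqlite3

# Stand-in table with the columns selected in the tutorial query.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataeval_cleaning (
        dataset_id TEXT,
        exact_duplicate_count INTEGER,
        image_outlier_count INTEGER,
        image_outlier_ratio REAL,
        mean_brightness REAL
    )
    """
)
# The one row produced by the tutorial run.
conn.execute(
    "INSERT INTO dataeval_cleaning VALUES (?, ?, ?, ?, ?)",
    ("coco-job-tutorial", 0, 0, 0.0, 0.305882),
)

# Follow-up 1: how many cleaning runs are stored?
total = conn.execute("SELECT COUNT(*) FROM dataeval_cleaning").fetchone()[0]

# Follow-up 2: which datasets exceed an outlier-ratio threshold?
flagged = conn.execute(
    "SELECT dataset_id, image_outlier_ratio FROM dataeval_cleaning "
    "WHERE image_outlier_ratio > ?",
    (0.1,),
).fetchall()
conn.close()

print(total, flagged)  # 1 []
```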
Shut down¶
shutdown_job_backend(wait=True)
print("Job backend shut down.")
Job backend shut down.
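In a script rather than a notebook, the backend should be shut down even when a job raises. The following is a minimal sketch of that guarantee using a context manager. It assumes only that the backend is configured once and shut down once, as in this tutorial. job_backend_session is a hypothetical helper. The demo passes recording callables in place of the real functions so that it runs without Ray.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def job_backend_session(
    configure: Callable[[], None], shutdown: Callable[[], None]
) -> Iterator[None]:
    """Configure a job backend on entry and always shut it down on exit."""
    configure()
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        shutdown()


calls: list[str] = []
try:
    with job_backend_session(lambda: calls.append("configure"), lambda: calls.append("shutdown")):
        raise RuntimeError("job failed")
except RuntimeError:
    pass
print(calls)  # ['configure', 'shutdown']
```

With the real backend, the configure callable would wrap the configure_job_backend call from the configure step, and the shutdown callable would wrap shutdown_job_backend(wait=True).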