Analytics store in distributed execution
The analytics store is straightforward in a single-process workflow:
- run a capability,
- write records,
- query them later from the same environment.
Distributed execution makes that more subtle because compute and orchestration no longer run in the same process.
Why distributed execution changes the problem
In local synchronous execution, the process that computes the run also already knows:
- where the analytics store lives,
- how to write to it,
- and how to read it back later.
In distributed job submission, those responsibilities are split:
- the client chooses the durable store location,
- the worker needs that information so it can persist results,
- and the client later needs a stable way to find the data that the worker wrote.
That is why configure_job_backend(...) requires explicit analytics-store
configuration:
configure_job_backend(
"ray",
analytics_store={
"backend": "parquet",
"uri": "./analytics_store",
},
)
The job backend forwards that store configuration to worker tasks. Workers then
build their own AnalyticsStore instance from the forwarded config rather than
guessing a local default.
For a focused configure_job_backend(...) reference, see
Job backend configuration.
End-to-end store flow
sequenceDiagram
participant Client
participant JobBackend as JobBackend
participant Worker
participant Store as AnalyticsStore
Client->>JobBackend: configure_job_backend(..., analytics_store=...)
Client->>JobBackend: submit_capability(...)
JobBackend->>Worker: send task + analytics_store config
Worker->>Store: build store from forwarded config
Worker->>Store: write completed run
Store-->>Worker: durable reference metadata
Worker-->>Client: CapabilityRunRef(store_uri=...)
Client->>Store: open same store later to query/read
The key point is that store configuration is part of job submission semantics, not an incidental local default.
What job submission expects from the store
The job-submission layer relies on a few store-level properties:
- workers can construct the store from the forwarded configuration,
- completed run data is written durably before a job reports success,
- repeated writes for the same logical completed run are safe,
- the worker can return a stable
CapabilityRunRef, includingstore_uri, and - the client can later use the same store location to query or read results.
Backend-specific storage mechanics, table layouts, deduplication keys, and concurrency guarantees belong to the analytics-store backend implementation.
Storage visibility matters
In a single-machine workflow, it is easy to forget that "where results are written" is itself configuration. Once workers run remotely, that assumption breaks down:
- a node-local default path may not be durable,
- a worker-local filesystem path may be meaningless to the client,
- and different machines may not share the same storage namespace.
Distributed execution therefore requires two explicit decisions:
- where workers should write, and
- how the client should later identify the written result.
The current job-submission path solves that by forwarding the configured
analytics store to workers and returning a CapabilityRunRef only after the
worker write succeeds.
Failure policy
The analytics store is the durable system of record for completed job results. That is why the current policy is strict:
- worker-side analytics-store write happens before success is returned,
- if the store write fails, the task fails,
- and the client observes
JobFailedErrorrather than a silent partial success.
A successful job should imply that durable analytics-store persistence succeeded.