Wire protocol

You don't need this to build workflows - the SDK handles all of it. This reference is for people building or porting an SDK, or integrating with the engine at the protocol level.

This is the contract between the engine and a runner (your code plus an SDK). It covers the core step operations - running a step, sleeping, waiting for an event, invoking a child workflow, and emitting an event - over HTTP; the Opcode table below lists the full set, which also includes SleepUntil and Webhook. More transports are planned; the shapes here are designed to grow without breaking.

How the engine drives a workflow

The engine never holds your workflow in memory. It makes progress by calling your runner's /invoke endpoint once per pass, sending along the results of every step that has already completed. Your handler runs from the top each time: completed steps return their saved result, and the first unfinished step (or batch of parallel steps) does real work and reports back. The engine saves that result and calls again, until the handler returns.

This is why a runner is stateless and a run survives an engine restart: all progress lives in the engine's store and is replayed to the runner on each call.
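The replay loop can be modelled in a few lines. The sketch below is illustrative only and is not the SDK's implementation: `runPass`, `drive`, and the `UNFINISHED` sentinel are hypothetical names, steps are synchronous for brevity, and step ids are left unhashed.

```typescript
// Illustrative only: a synchronous model of the replay loop, not the SDK's API.
type SavedSteps = Record<string, { data: unknown }>;
type StepFn = (id: string, fn: () => unknown) => unknown;
type Pass =
  | { status: 200; data: unknown }
  | { status: 206; opcodes: { op: "StepRun"; id: string; data: unknown }[] };

// Sentinel thrown to unwind the handler after the first unfinished step has run.
const UNFINISHED = { unfinished: true };

// Runner side: one /invoke pass over the memo the engine sent.
function runPass(handler: (step: StepFn) => unknown, steps: SavedSteps): Pass {
  const opcodes: { op: "StepRun"; id: string; data: unknown }[] = [];
  const step: StepFn = (id, fn) => {
    const saved = steps[id];
    if (saved !== undefined) return saved.data; // completed: replay the saved result
    opcodes.push({ op: "StepRun", id, data: fn() }); // unfinished: do the real work once
    throw UNFINISHED;
  };
  try {
    return { status: 200, data: handler(step) };
  } catch (err) {
    if (err === UNFINISHED) return { status: 206, opcodes };
    throw err;
  }
}

// Engine side: persist each reported result and invoke again until the handler returns.
function drive(handler: (step: StepFn) => unknown): { data: unknown; passes: number } {
  const store: SavedSteps = {};
  let passes = 0;
  while (true) {
    passes++;
    const res = runPass(handler, store);
    if (res.status === 200) return { data: res.data, passes };
    for (const op of res.opcodes) store[op.id] = { data: op.data };
  }
}
```

A two-step handler takes three passes in this model: one per step, plus a final pass in which both steps replay from the memo and the handler returns.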

Endpoints

Method + path	Served by	Purpose
`POST /register`	engine	Runner announces its app, optional stable runner id, workflows, and invoke URL.
`POST /events`	engine	Ingest an event: resume any `waitForEvent` waiters and fan out to every workflow whose triggers match (optionally pinned to a runner). Returns `202 {runId?, woke, triggered}`.
`GET /runs`, `GET /runs/{id}`, `GET /runs/{id}/steps`	engine	Inspect runs and steps.
`GET /workflows`	engine	List registered workflow definitions (name, app, retry policy, and `scheduled` + `schedules[]` for cron workflows).
`GET /runners`	engine	List registered runners (id, app, url, runtime/version, last-seen, live). Filter `?app=`. Backs the console's connected-runners view.
`GET /events`, `GET /events/{id}`	engine	The event log: each ingested event with what it triggered/woke. Filter `?app=&name=&limit=`, newest first.
`GET /events/stream`	engine	Live tail of ingested events as Server-Sent Events (`text/event-stream`).
`POST /invoke`	runner	The engine drives one pass. Returns `200` (done) or `206` (more work).
`GET /connect`	engine	WebSocket upgrade for the Connect transport. A runner with no inbound URL dials this, registers, and receives invokes over the socket.

The runner's invoke URL is whatever it advertises in /register - unless it connects over the Connect transport (see below), in which case it has no inbound URL at all.

Messages

`POST /register` (runner → engine)

{
  "app": "order-app",
  "runner": "node-7",
  "url": "http://localhost:6773/invoke",
  "runtime": "bun",
  "language": "typescript",
  "version": "0.1.0",
  "protocolVersion": 1,
  "workflows": [
    {
      "name": "order.created",
      "retry": { "maxAttempts": 3 },
      "concurrency": { "limit": 5, "key": "accountId" }
    }
  ]
}

Field	Type	Description
`app`	`string`	App the runner serves.
`runner`	`string` (optional)	Stable runner id; omitted → the engine keys the endpoint by `url`.
`url`	`string`	The runner's `/invoke` endpoint.
`runtime` / `language` / `version`	`string` (optional)	Handshake metadata describing the runner (e.g. `bun` / `typescript` / the SDK version), surfaced in the console's connected-runners view. The SDK sends them automatically.
`protocolVersion`	`number` (optional)	The wire version the runner speaks (see Protocol version). Omitted → assumed compatible.
`workflows`	`object[]`	Each entry has a `name` plus optional `triggers`, `retry`, `onFailure` (`true` if the workflow has an onFailure handler), and flow-control fields (see Flow Control).
`workflows[].triggers`	`object[]` (optional)	What starts the workflow: event triggers `{ event, if? }` (`event` may end in a `*` wildcard; `if` is a CEL filter) and cron triggers `{ cron }`. Omitted → an implicit event trigger on the workflow name. See Triggers.
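The trigger rules in the table can be sketched as a small matcher. This is an assumption-laden illustration: `effectiveTriggers`, `eventMatches`, and `matchingWorkflows` are hypothetical names, the trailing `*` is treated as a plain prefix match (the exact wildcard semantics are defined in Triggers), and the CEL `if` filter is not evaluated.

```typescript
// Hypothetical helper types; only the fields discussed in the table are modelled.
type Trigger = { event: string; if?: string } | { cron: string };
interface WorkflowDef {
  name: string;
  triggers?: Trigger[];
}

// Omitted triggers fall back to an implicit event trigger on the workflow name.
function effectiveTriggers(w: WorkflowDef): Trigger[] {
  return w.triggers ?? [{ event: w.name }];
}

// Assumption: a trailing "*" matches any event name that shares the prefix.
function eventMatches(pattern: string, eventName: string): boolean {
  return pattern.endsWith("*")
    ? eventName.startsWith(pattern.slice(0, -1))
    : pattern === eventName;
}

// Names of every workflow whose event triggers match, sorted by name.
// Cron triggers never match an ingested event; `if` filters are ignored here.
function matchingWorkflows(defs: WorkflowDef[], eventName: string): string[] {
  return defs
    .filter((w) =>
      effectiveTriggers(w).some((t) => "event" in t && eventMatches(t.event, eventName)),
    )
    .map((w) => w.name)
    .sort();
}
```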

`POST /events` (caller → engine)

{ "name": "order.created", "app": "order-app", "runner": "node-7", "dedupeId": "evt-A1", "data": { "orderId": "A1" } }

Field	Type	Description
`name`	`string`	Event name; matched against every workflow's event triggers (exact or `*` wildcard) and resumes awaiting `waitForEvent` steps.
`app`	`string`	Target app.
`runner`	`string` (optional)	Present → pin the run to that runner; absent → anycast.
`dedupeId`	`string` (optional)	A repeat of the same id (per app) within 24h is dropped entirely - no waiters woken, no fan-out, no new log row - so an at-least-once caller can safely retry. The response is `202 { "deduped": true }`.
`data`	`JSON`	Event payload. Any valid JSON (object, array, or scalar); may be omitted.

name is required (non-blank) and length-bounded, as are app, runner, targetApp, and dedupeId; data, when present, must be valid JSON. A request that violates any of these is rejected with 400.
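The `dedupeId` behaviour amounts to a per-app seen-set with a 24-hour window. The sketch below is one way to model it, not the engine's storage: `DedupeWindow` is a hypothetical class, it keeps state in memory, and it assumes the window is measured from the first sighting of an id.

```typescript
// Window length taken from the dedupeId description (24h).
const DEDUPE_WINDOW_MS = 24 * 60 * 60 * 1000;

// Hypothetical in-memory model; a real engine would persist this.
class DedupeWindow {
  // (app, dedupeId) -> time the id was first seen, in epoch ms
  private seen = new Map<string, number>();

  // Returns true when the event should be dropped as a duplicate.
  isDuplicate(app: string, dedupeId: string | undefined, nowMs: number): boolean {
    if (dedupeId === undefined) return false; // no id: never deduplicated here
    const key = JSON.stringify([app, dedupeId]); // ids are scoped per app
    const firstSeen = this.seen.get(key);
    if (firstSeen !== undefined && nowMs - firstSeen < DEDUPE_WINDOW_MS) return true;
    this.seen.set(key, nowMs); // first sighting, or the old one has expired
    return false;
  }
}
```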

The 202 response reports what the event did:

{ "runId": "01H...", "woke": 0, "skipped": false, "dropped": false, "debounced": false, "batched": false, "deduped": false }

Field	Type	Description
`runId`	`string`	Set if the event triggered a run; mirrors the first matched workflow (sorted by name) for the common single-match case.
`woke`	`number`	`waitForEvent` runs resumed (broadcast).
`skipped`	`boolean`	A singleton `skip` policy dropped the trigger.
`dropped`	`boolean`	A `rateLimit` policy shed the trigger.
`debounced`	`boolean`	Coalesced into a debounce buffer.
`batched`	`boolean`	Buffered into a batch.
`deduped`	`boolean`	The event was a duplicate `dedupeId`, or a workflow's idempotency key was already seen in its window.
`triggered`	`object[]`	One entry per workflow the event matched (event triggers can fan out); each has `workflow`, `runId?`, and the gate booleans.

`POST /invoke` (engine → runner)

{
  "event": { "name": "order.created", "data": { "orderId": "A1" } },
  "steps": { "9f2b8c...": { "data": { "chargeId": "ch_A1" } } },
  "ctx": { "runId": "01H...", "workflow": "fulfillment", "attempt": 1, "app": "order-app", "runner": "node-7" }
}

Field	Type	Description
`event`	`object`	The triggering event (`name`, `data`); `name` is informational and may differ from the workflow name.
`steps`	`object`	Memo map: hashed step id → its saved state.
`ctx.runId`	`string`	The run being replayed.
`ctx.workflow`	`string`	The dispatch key: the registered workflow name the runner routes to (distinct from `event.name`).
`ctx.attempt`	`number`	Run-level attempt counter.
`ctx.app`	`string`	The run's app.
`ctx.runner`	`string`	The run's pin; `""` for an anycast run.
`ctx.onFailure`	`boolean` (optional)	Set on an onFailure invocation; the runner dispatches to the workflow's onFailure handler.
`ctx.error`	`StepError` (optional)	The terminal error, present only when `ctx.onFailure` is set.
`ctx.traceparent`	`string` (optional)	W3C trace context of the engine's invoke span; the runner extracts it to nest its spans in the run's distributed trace. Rides the body (not a header) so it propagates the same over HTTP and the Connect WebSocket.

Each steps entry carries one of:

Field	Type	Description
`data`	`JSON`	A completed step's result.
`error`	`StepError`	A completed step that threw.
`pending`	`boolean`	Step already started (a parked sleep / wait / child); the runner blocks on it without re-running or re-emitting.

The runner replies 200 { "data": <result>, "logs": [LogLine, ...] } when the run completes, or 206 { "opcodes": [Opcode, ...], "logs": [LogLine, ...] } listing the steps discovered this pass. The logs array carries any ctx.log lines captured during the pass (see Logs); it is [] when none were emitted.
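A runner has to decide what to do with each step it reaches based on the memo entry it was sent. A minimal sketch of that decision follows; `MemoEntry`, `StepAction`, and `decide` are hypothetical names, and the precedence between fields is an assumption, since each entry is documented as carrying only one of them.

```typescript
type StepError = { message: string; stack?: string };
type MemoEntry = { data?: unknown; error?: StepError; pending?: boolean };

type StepAction =
  | { kind: "run" } // not in the memo: do the work and report an opcode
  | { kind: "block" } // parked sleep / wait / child: neither re-run nor re-emit
  | { kind: "throw"; error: StepError } // completed step that threw
  | { kind: "return"; data: unknown }; // completed step with a saved result

function decide(entry: MemoEntry | undefined): StepAction {
  if (entry === undefined) return { kind: "run" };
  if (entry.pending) return { kind: "block" };
  if (entry.error) return { kind: "throw", error: entry.error };
  return { kind: "return", data: entry.data };
}
```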

Opcode

{ "op": "StepRun", "id": "9f2b8c...", "name": "charge", "data": { "chargeId": "ch_A1" } }

Field	Type	Used by	Description
`op`	`enum`	all	`StepRun` \| `Sleep` \| `SleepUntil` \| `WaitForEvent` \| `RunWorkflow` \| `Emit` \| `Webhook`.
`id`	`string`	all	Hashed step id; the engine stores the result under this key.
`name`	`string`	all	Human-readable step id (for the console).
`data`	`JSON`	StepRun, Emit, Webhook	Step result / event payload / webhook body.
`error`	`StepError`	StepRun	Step failure.
`retriable`	`boolean` (optional)	StepRun	`false` fails the run now, skipping remaining attempts (NonRetriableError).
`retryAfterMs`	`number` (optional)	StepRun	Overrides the policy backoff for this retry (RetryAfterError).
`sleepMs`	`number`	Sleep	Duration in ms.
`sleepUntilMs`	`number`	SleepUntil	Absolute wake time (UTC epoch ms).
`eventName`	`string`	WaitForEvent, Emit	Awaited / emitted event name.
`timeoutMs`	`number`	WaitForEvent	Timeout in ms.
`childName`	`string`	RunWorkflow	Child workflow to invoke.
`childData`	`JSON`	RunWorkflow	Input passed to the child.
`webhookUrl`	`string`	Webhook	Destination URL for a `ctx.webhook.send`; the engine enqueues a durable outbound delivery to it carrying `data` (a custom send has no endpoint secret, so it is delivered unsigned).

StepError is { "message": string, "stack"?: string }.
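For an SDK port, the opcode table maps naturally onto a discriminated union keyed on `op`. The types below are a sketch derived from the table, not a published schema; the `Opcode` union and `describeOpcode` are illustrative names, and required versus optional fields follow the table as written.

```typescript
type StepError = { message: string; stack?: string };
interface OpBase {
  id: string; // hashed step id
  name: string; // human-readable step id
}

type Opcode =
  | (OpBase & { op: "StepRun"; data?: unknown; error?: StepError; retriable?: boolean; retryAfterMs?: number })
  | (OpBase & { op: "Sleep"; sleepMs: number })
  | (OpBase & { op: "SleepUntil"; sleepUntilMs: number })
  | (OpBase & { op: "WaitForEvent"; eventName: string; timeoutMs: number })
  | (OpBase & { op: "RunWorkflow"; childName: string; childData: unknown })
  | (OpBase & { op: "Emit"; eventName: string; data: unknown })
  | (OpBase & { op: "Webhook"; webhookUrl: string; data: unknown });

// The switch narrows each case to the fields that op actually uses.
function describeOpcode(o: Opcode): string {
  switch (o.op) {
    case "StepRun":
      return o.error ? `step ${o.name} failed: ${o.error.message}` : `step ${o.name} completed`;
    case "Sleep":
      return `sleep ${o.sleepMs}ms`;
    case "SleepUntil":
      return `sleep until ${o.sleepUntilMs}`;
    case "WaitForEvent":
      return `wait for ${o.eventName} (timeout ${o.timeoutMs}ms)`;
    case "RunWorkflow":
      return `invoke child ${o.childName}`;
    case "Emit":
      return `emit ${o.eventName}`;
    case "Webhook":
      return `send webhook to ${o.webhookUrl}`;
  }
}
```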

Logs

A pass's response also carries the structured logs the handler emitted via ctx.log. The same array shape rides every status (200, 206, and 500 - logs up to a throw still ship), so no log is lost whichever way a pass ends:

{ "level": "info", "message": "charging card", "fields": { "amount": 4200 }, "scope": "charge", "index": 0, "tsMs": 1718900000000 }

Field	Type	Description
`level`	`enum`	`debug` \| `info` \| `warn` \| `error`.
`message`	`string`	The log message.
`fields`	`object` (optional)	Structured fields. Sensitive keys are redacted engine-side before storage.
`scope`	`string`	The enclosing step name, or `@root` for a handler-level log.
`index`	`number`	A per-scope counter the engine uses (with `scope` + attempt) to give each line a replay-stable dedupe id.
`tsMs`	`number`	Runner wall clock (advisory).

Because the handler body re-runs on every pass, a top-level (@root) log re-emits each pass; the engine deduplicates it by (runId, attempt, scope, index) so it persists once. An in-step log only runs on the pass where its step executes, and its dedupe is keyed on the step's attempt, so a retried step's logs stay distinct per attempt. Read them back via GET /runs/{id}/logs.
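The dedupe rule can be sketched as a keyed insert. This is a hypothetical model of the engine side, not its storage code: `logKey` and `persistLogs` are illustrative names, and the caller is assumed to pass the run-level attempt for `@root` lines and the step's attempt for in-step lines.

```typescript
interface LogLine {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  fields?: Record<string, unknown>;
  scope: string;
  index: number;
  tsMs: number;
}

// Replay-stable id: the same line re-emitted on a later pass maps to the same key.
function logKey(runId: string, attempt: number, line: LogLine): string {
  return JSON.stringify([runId, attempt, line.scope, line.index]);
}

// Stores each line at most once; returns how many lines were newly persisted.
function persistLogs(store: Map<string, LogLine>, runId: string, attempt: number, lines: LogLine[]): number {
  let added = 0;
  for (const line of lines) {
    const key = logKey(runId, attempt, line);
    if (store.has(key)) continue; // already persisted by an earlier pass
    store.set(key, line);
    added++;
  }
  return added;
}
```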

Sleep.sleepMs is a duration, not an absolute time. The runner has no clock authority; it says "sleep 10s" and the engine resolves the wake time when it persists the sleep, which keeps the directive idempotent across passes. SleepUntil.sleepUntilMs is the absolute counterpart (UTC epoch ms) for step.sleepUntil: a fixed wall-clock target the engine stores verbatim as the deadline and parks against its own clock until it arrives. A target already in the past wakes on the next cycle.
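The difference between the two sleep opcodes, and why persisting the deadline makes the directive idempotent, can be shown with a small model. `persistSleep` and the in-memory `deadlines` map are assumptions for illustration; the engine's actual store is not described here.

```typescript
type SleepOp =
  | { op: "Sleep"; id: string; sleepMs: number }
  | { op: "SleepUntil"; id: string; sleepUntilMs: number };

// Resolves and records the wake time the first time a sleep is seen.
// A later pass that reports the same id gets the original deadline back.
function persistSleep(deadlines: Map<string, number>, op: SleepOp, nowMs: number): number {
  const existing = deadlines.get(op.id);
  if (existing !== undefined) return existing;
  const wakeAt =
    op.op === "Sleep"
      ? nowMs + op.sleepMs // duration: resolved against the engine's clock
      : op.sleepUntilMs; // absolute target: stored verbatim
  deadlines.set(op.id, wakeAt);
  return wakeAt;
}
```

In this model a deadline that is already in the past is simply stored as-is; whatever polls the deadlines would see it as due on its next cycle.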

A Webhook opcode (ctx.webhook.send) is a durable side effect, not a step result: the engine enqueues an outbound delivery to webhookUrl carrying data, then records the step. A custom send has no endpoint secret, so the delivery is unsigned. Like Emit, it is at-least-once on the wire but made exactly-once by the deterministic step id, so a replayed pass never re-sends. See the webhooks guide.

A single pass can return several opcodes: a handler that runs steps with Promise.all discovers the whole batch at once, and the 206 array carries them all (sequential steps are just the one-opcode case). The engine persists every opcode, parks at the earliest deadline across the batch, and re-invokes as each branch is ready; the handler proceeds once all are memoized. A terminal failure in one branch fails the run and cancels its in-flight siblings. A step the engine has already started but not finished comes back as "pending": true in the memo - the runner must not re-run or re-emit it, so a parked branch's timer survives the re-invokes that its siblings trigger.
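For a parallel batch, the runner-side rule reduces to: emit opcodes only for branches missing from the memo, and proceed only when no branch is missing or pending. The sketch below illustrates that rule along with the engine's earliest-deadline choice; `planBatch` and `nextWake` are hypothetical helpers, not SDK or engine functions.

```typescript
type BranchState = { data?: unknown; error?: { message: string }; pending?: boolean };

// Runner side: which branches still need an opcode, and can the handler move on?
function planBatch(
  ids: string[],
  memo: Record<string, BranchState | undefined>,
): { emit: string[]; proceed: boolean } {
  const emit: string[] = [];
  let proceed = true;
  for (const id of ids) {
    const state = memo[id];
    if (state === undefined) {
      emit.push(id); // newly discovered this pass
      proceed = false;
    } else if (state.pending) {
      proceed = false; // already started: block without re-emitting
    }
  }
  return { emit, proceed };
}

// Engine side: park until the earliest deadline across the batch.
function nextWake(deadlinesMs: number[]): number | undefined {
  return deadlinesMs.length === 0 ? undefined : Math.min(...deadlinesMs);
}
```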

Step id hashing

hashedStepId = lowercase_hex( SHA-256( utf8(stepId) ) )

The runner hashes the human-readable step id (e.g. "charge") to produce the steps map key and the Opcode.id; the engine treats it as an opaque key. Implementations in different languages must produce byte-identical output. A step id reused within one run is disambiguated runner-side before hashing by a positional suffix ("x", then "x:1", "x:2", ...), so each occurrence gets a distinct key; see the full spec for the exact scheme.
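A sketch of the hash in TypeScript, using Node's built-in `crypto` module. `hashStepId` and `StepIdAllocator` are illustrative names; the suffix handling follows only the "x", "x:1", "x:2" example given above and should be checked against the full spec before relying on it.

```typescript
import { createHash } from "node:crypto";

// lowercase_hex(SHA-256(utf8(stepId)))
function hashStepId(stepId: string): string {
  return createHash("sha256").update(stepId, "utf8").digest("hex");
}

// Assumed disambiguation: the first use of an id is hashed as-is,
// later uses within the same run get ":1", ":2", ... appended before hashing.
class StepIdAllocator {
  private counts = new Map<string, number>();

  next(stepId: string): string {
    const n = this.counts.get(stepId) ?? 0;
    this.counts.set(stepId, n + 1);
    return hashStepId(n === 0 ? stepId : `${stepId}:${n}`);
  }
}
```

A port in another language can be checked against the standard SHA-256 test vectors: the empty string and "abc" have well-known digests, which makes byte-identical output easy to verify.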

Routing (`{app, runner?}`)

A run is owned by an app and executed by one of that app's registered runners (an app may have many). The runner id is the routing handle:

Anycast (no runner): each invoke goes to any one registered runner of the app (random pick). Runners are stateless and the full step memo is resent every invoke, so different passes may safely hit different replicas. If none is registered yet, the run parks and retries (the event-before-register race self-heals).
Pinned (runner set on the event): routed only to that runner id. If it isn't registered, the run fails fast - a pin to a non-existent runner is a caller error in v0. Bounded waiting and offline drain for pinned runs are a future addition.

A runner that omits an id in /register is keyed by its URL. A child workflow inherits its parent's pin only in the same app. ctx.runner carries the pin to your handler (empty for anycast).
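The routing rules above can be sketched as a single function. This is a simplified model rather than the engine's code: `Registration`, `Route`, and `route` are hypothetical names, liveness is ignored, and the random source is injectable so the behaviour is testable.

```typescript
interface Registration {
  app: string;
  runner?: string; // stable runner id, if the runner sent one
  url: string;
}

type Route =
  | { kind: "dispatch"; url: string }
  | { kind: "park" } // anycast with no runner registered yet: retry later
  | { kind: "fail"; reason: string }; // pin to an unknown runner

// A runner that omitted its id is keyed by its URL.
const runnerKey = (r: Registration): string => r.runner ?? r.url;

// `pin` mirrors ctx.runner: "" means anycast.
function route(regs: Registration[], app: string, pin: string, rand: () => number = Math.random): Route {
  const candidates = regs.filter((r) => r.app === app);
  if (pin !== "") {
    const target = candidates.find((r) => runnerKey(r) === pin);
    return target
      ? { kind: "dispatch", url: target.url }
      : { kind: "fail", reason: `runner ${pin} is not registered for app ${app}` };
  }
  const pick = candidates[Math.floor(rand() * candidates.length)];
  return pick ? { kind: "dispatch", url: pick.url } : { kind: "park" };
}
```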

Connect transport (WebSocket)

Two transports drive a runner, and the choice is invisible to your workflow code:

Serve (default): you stand up an HTTP server and the engine POSTs your /invoke URL. Simple, but the runner must be inbound-reachable.
Connect: your runner dials the engine over a WebSocket (GET /connect) and receives invokes on that socket, so it needs no inbound address - the model for an agent on a node behind NAT. In the SDK this is connect({ engineUrl, app, runner?, workflows }) instead of serve(...) + register(...).

The execution model is identical (same step-memoization replay); only the connection direction differs. Connect uses durablex's own protocol (subprotocol durablex.connect.v0, a generic {type, id, payload} envelope, message types hello / runner.register / invoke / invoke.result), and the stable runner id is part of the handshake, so a connected runner is pinnable by {app, runner} exactly like an HTTP one. The hello and runner.register frames also carry the wire version (see Protocol version). A dropped socket is detected by heartbeat and the runner is evicted from routing until it reconnects (the SDK reconnects automatically).
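The envelope can be typed and validated as follows. Only the `{type, id, payload}` shape and the four message types come from the description above; treating `id` as a string and echoing the invoke's `id` on its `invoke.result` are assumptions, and `parseFrame` and `resultFor` are hypothetical helpers.

```typescript
const FRAME_TYPES = ["hello", "runner.register", "invoke", "invoke.result"] as const;
type FrameType = (typeof FRAME_TYPES)[number];

interface Envelope {
  type: FrameType;
  id: string; // assumed to be a string
  payload: unknown;
}

// Parses one text frame and rejects anything that is not a known envelope.
function parseFrame(raw: string): Envelope {
  const msg: unknown = JSON.parse(raw);
  if (typeof msg !== "object" || msg === null) throw new Error("frame must be a JSON object");
  const { type, id, payload } = msg as { type?: unknown; id?: unknown; payload?: unknown };
  if (typeof type !== "string" || (FRAME_TYPES as readonly string[]).indexOf(type) === -1) {
    throw new Error(`unknown frame type: ${String(type)}`);
  }
  if (typeof id !== "string") throw new Error("frame id must be a string");
  return { type: type as FrameType, id, payload };
}

// Assumption: a result is correlated with its invoke by reusing the same id.
function resultFor(invoke: Envelope, result: unknown): Envelope {
  return { type: "invoke.result", id: invoke.id, payload: result };
}
```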

Status codes

Code	Meaning
`200`	Handler returned. Body `{ data, logs }` carries the final result. The run succeeds.
`206`	Handler emitted new opcodes and isn't finished. Body is `{ opcodes, logs }`.
`4xx`	Non-retriable error (bad request, unknown workflow). The engine fails the run.
`5xx`	Retriable transport error. The engine retries the invoke with backoff on a fixed transient budget, then fails the run.

A step failure is different from a transport error: the runner reports it as a 206 whose opcode carries error, and the engine retries that step per the workflow's retry policy. Per-step retries and the transient transport budget are counted separately.
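The two retry paths can be kept apart with two small functions. This is a sketch under assumptions: the names are hypothetical, the transient budget is a plain count passed in by the caller (its real value is not stated here), and backoff timing is omitted.

```typescript
type InvokeAction = "complete-run" | "persist-opcodes" | "fail-run" | "retry-invoke";

// Transport level: what the engine does with the HTTP status of one invoke.
function onInvokeStatus(status: number, transientFailures: number, transientBudget: number): InvokeAction {
  if (status === 200) return "complete-run";
  if (status === 206) return "persist-opcodes";
  if (status >= 400 && status < 500) return "fail-run"; // non-retriable
  if (status >= 500 && status < 600) {
    // Counted against the transient budget, not the workflow's retry policy.
    return transientFailures + 1 > transientBudget ? "fail-run" : "retry-invoke";
  }
  throw new Error(`unexpected invoke status ${status}`);
}

// Step level: a 206 whose StepRun opcode carries `error`.
function onStepError(attempt: number, maxAttempts: number, retriable?: boolean): "retry-step" | "fail-run" {
  if (retriable === false) return "fail-run"; // NonRetriableError skips remaining attempts
  return attempt < maxAttempts ? "retry-step" : "fail-run";
}
```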

The engine caps the invoke response at 1 MiB (the same wire-message limit the connect transport applies to a result frame), so a runaway runner can't exhaust engine memory with an unbounded result; a response over the cap fails the invoke. Large step state is offloaded by API, not carried inline.

When a run fails terminally and its workflow registered onFailure: true, the engine marks the run failed and spawns a separate follow-on run invoked with ctx.onFailure + ctx.error to run the onFailure handler. The failed run is retained: queryable via GET /runs?status=failed and redrivable via POST /runs/{id}/replay. See Retries.

Protocol version

The engine and runner share a single integer wire version (currently 1; v1 wrapped the invoke response in an object on every status so logs ride alongside the result - v0 returned a bare {data} on 200 and a bare opcode array on 206). Each side advertises it and checks the peer's at every boundary, so a breaking wire change fails loudly instead of misparsing:

Boundary	Carried as
HTTP invoke (engine → runner)	the `X-Durablex-Protocol` header
HTTP register (runner → engine)	`RegisterRequest.protocolVersion`
Connect handshake	`protocolVersion` on the `hello` and `runner.register` frames

The rule is lenient on absence, strict on a present mismatch: a peer that sends no version is assumed compatible, so the field is additive and never breaks an older peer, but a version that is present and differs is rejected (400 on the HTTP paths; the socket closes on Connect). The SDK sets this for you - you only encounter it if a runner and engine are on incompatible releases.
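The "lenient on absence, strict on mismatch" rule is small enough to show directly. `checkPeerVersion` and `WIRE_VERSION` are illustrative names; the 400 applies to the HTTP paths, and a Connect implementation would close the socket instead.

```typescript
const WIRE_VERSION = 1;

type VersionCheck = { ok: true } | { ok: false; status: 400; reason: string };

// `peer` is undefined when the header or field was not sent at all.
function checkPeerVersion(peer: number | undefined, ours: number = WIRE_VERSION): VersionCheck {
  if (peer === undefined) return { ok: true }; // absent: assumed compatible
  if (peer === ours) return { ok: true };
  return { ok: false, status: 400, reason: `protocol version mismatch: peer=${peer}, ours=${ours}` };
}
```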

Signing

Every request between engine and runner carries an integrity header. Today this is a pass-through stub behind a signing interface, so real HMAC-SHA256 can drop in later without changing call sites.
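One way such a signing interface could look is sketched below. Nothing here is the engine's actual interface: `Signer`, `passThroughSigner`, and `hmacSigner` are hypothetical, the hex encoding is an assumption, and the header name is deliberately left out because it is not specified above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical interface that call sites would depend on.
interface Signer {
  sign(body: string): string;
  verify(body: string, signature: string): boolean;
}

// Today's behaviour as described: a stub that accepts everything.
const passThroughSigner: Signer = {
  sign: () => "",
  verify: () => true,
};

// A possible drop-in: HMAC-SHA256 over the raw body, hex-encoded.
function hmacSigner(secret: string): Signer {
  const sign = (body: string): string =>
    createHmac("sha256", secret).update(body, "utf8").digest("hex");
  return {
    sign,
    verify(body, signature) {
      const expected = Buffer.from(sign(body), "utf8");
      const actual = Buffer.from(signature, "utf8");
      // timingSafeEqual throws on unequal lengths, so compare lengths first.
      return expected.length === actual.length && timingSafeEqual(expected, actual);
    },
  };
}
```

Because both values satisfy the same interface, swapping the stub for the HMAC version would not touch the code that attaches or checks the header.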
