Wire protocol

The HTTP contract between the Durablex engine and a runner.

You don't need this to build workflows - the SDK handles all of it. This reference is for people building or porting an SDK, or integrating with the engine at the protocol level.

A runner is your code plus an SDK. The contract covers the five step operations (running a step, sleeping, waiting for an event, invoking a child workflow, and emitting an event) over HTTP. More transports are planned; the shapes here are designed to grow without breaking.

How the engine drives a workflow

The engine never holds your workflow in memory. It makes progress by calling your runner's /invoke endpoint, once per step, sending along the results of every step that has already completed. Your handler runs from the top each time: completed steps return their saved result, and the first unfinished step does real work and reports back. The engine saves that result and calls again, until the handler returns.

This is why a runner is stateless and a run survives an engine restart: all progress lives in the engine's store and is replayed to the runner on each call.
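
The replay loop can be modelled in a few lines. This is a minimal sketch, not the engine's implementation: `drive`, `toyRunner`, and the in-memory `memo` are hypothetical names, the invoke call is a plain function instead of an HTTP request, and step ids are left unhashed for brevity.

```typescript
// Hypothetical in-memory model of the drive loop (not the real engine).
type Memo = Record<string, { data: unknown }>;
type PassResult =
  | { status: 200; data: unknown }
  | { status: 206; opcodes: { id: string; data: unknown }[] };

// Stand-in for POST /invoke: the runner replays from the top with the memo.
type Invoke = (memo: Memo) => PassResult;

function drive(invoke: Invoke, maxPasses = 100): { result: unknown; passes: number } {
  const memo: Memo = {}; // the engine's store: the only place progress lives
  for (let pass = 1; pass <= maxPasses; pass++) {
    const res = invoke(memo);
    if (res.status === 200) return { result: res.data, passes: pass };
    // Persist every reported step, then call the runner again.
    for (const op of res.opcodes) memo[op.id] = { data: op.data };
  }
  throw new Error("handler did not finish within maxPasses");
}

// A toy runner with two steps; the handler body re-runs on every pass.
const executed: string[] = [];
const toyRunner: Invoke = (memo) => {
  const step = (id: string, fn: () => unknown): unknown => {
    if (id in memo) return memo[id].data; // completed: return the saved result
    executed.push(id);
    throw { opcode: { id, data: fn() } }; // first unfinished step: do the work, report back
  };
  try {
    const a = step("a", () => 1) as number;
    const b = step("b", () => a + 1) as number;
    return { status: 200, data: a + b };
  } catch (e) {
    const signal = e as { opcode?: { id: string; data: unknown } };
    if (!signal.opcode) throw e; // a real error, not a step report
    return { status: 206, opcodes: [signal.opcode] };
  }
};

const outcome = drive(toyRunner);
```

Each step body runs exactly once even though the handler itself runs three times, which is the property that lets a run survive a restart of either side.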

Endpoints

| Method + path | On | Purpose |
| --- | --- | --- |
| POST /register | engine | Runner announces its app, optional stable runner id, workflows, and invoke URL. |
| POST /events | engine | Ingest an event: resume any waitForEvent waiters and fan out to every workflow whose triggers match (optionally pinned to a runner). Returns 202 {runId?, woke, triggered}. |
| GET /runs, GET /runs/{id}, GET /runs/{id}/steps | engine | Inspect runs and steps. |
| GET /workflows | engine | List registered workflow definitions (name, app, retry policy, and scheduled + schedules[] for cron workflows). |
| GET /runners | engine | List registered runners (id, app, url, runtime/version, last-seen, live). Filter ?app=. Backs the console's connected-runners view. |
| GET /events, GET /events/{id} | engine | The event log: each ingested event with what it triggered/woke. Filter ?app=&name=&limit=, newest first. |
| GET /events/stream | engine | Live tail of ingested events as Server-Sent Events (text/event-stream). |
| POST /invoke | runner | The engine drives one pass. Returns 200 (done) or 206 (more work). |
| GET /connect | engine | WebSocket upgrade for the Connect transport. A runner with no inbound URL dials this, registers, and receives invokes over the socket. |

The runner's invoke URL is whatever it advertises in /register - unless it connects over the Connect transport (see below), in which case it has no inbound URL at all.

Messages

POST /register (runner → engine)

{
  "app": "order-app",
  "runner": "node-7",
  "url": "http://localhost:6773/invoke",
  "runtime": "bun",
  "language": "typescript",
  "version": "0.1.0",
  "protocolVersion": 1,
  "workflows": [
    {
      "name": "order.created",
      "retry": { "maxAttempts": 3 },
      "concurrency": { "limit": 5, "key": "accountId" }
    }
  ]
}
| Field | Type | Description |
| --- | --- | --- |
| app | string | App the runner serves. |
| runner | string (optional) | Stable runner id; omitted → the engine keys the endpoint by url. |
| url | string | The runner's /invoke endpoint. |
| runtime / language / version | string (optional) | Handshake metadata describing the runner (e.g. bun / typescript / the SDK version), surfaced in the console's connected-runners view. The SDK sends them automatically. |
| protocolVersion | number (optional) | The wire version the runner speaks (see Protocol version). Omitted → assumed compatible. |
| workflows | object[] | Each entry has a name plus optional triggers, retry, onFailure (true if the workflow has an onFailure handler), and flow-control fields (see Flow Control). |
| workflows[].triggers | object[] (optional) | What starts the workflow: event triggers { event, if? } (event may end in a * wildcard; if is a CEL filter) and cron triggers { cron }. Omitted → an implicit event trigger on the workflow name. See Triggers. |
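
The trigger defaults can be illustrated with a small matcher. This is a sketch under stated assumptions: a trailing `*` is treated as a prefix match, the CEL `if` filter is ignored entirely, and `effectiveTriggers`, `matchesEvent`, and `matchingWorkflows` are hypothetical helper names, not engine APIs.

```typescript
type Trigger = { event: string; if?: string } | { cron: string };
interface WorkflowDef {
  name: string;
  triggers?: Trigger[];
}

// Omitted triggers fall back to an implicit event trigger on the workflow name.
function effectiveTriggers(wf: WorkflowDef): Trigger[] {
  return wf.triggers ?? [{ event: wf.name }];
}

// Assumption: a trailing "*" means "any event name with this prefix".
function matchesEvent(pattern: string, eventName: string): boolean {
  return pattern.endsWith("*")
    ? eventName.startsWith(pattern.slice(0, -1))
    : pattern === eventName;
}

// Names of every workflow an event fans out to, sorted by name.
// Cron triggers never match an event; CEL filters are not evaluated here.
function matchingWorkflows(defs: WorkflowDef[], eventName: string): string[] {
  return defs
    .filter((wf) =>
      effectiveTriggers(wf).some((t) => "event" in t && matchesEvent(t.event, eventName)),
    )
    .map((wf) => wf.name)
    .sort();
}
```

For example, with a workflow named order.created (no triggers) and a workflow named audit with an `order.*` event trigger, an order.created event matches both.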

POST /events (caller → engine)

{ "name": "order.created", "app": "order-app", "runner": "node-7", "dedupeId": "evt-A1", "data": { "orderId": "A1" } }
| Field | Type | Description |
| --- | --- | --- |
| name | string | Event name; matched against every workflow's event triggers (exact or * wildcard) and resumes awaiting waitForEvent steps. |
| app | string | Target app. |
| runner | string (optional) | Present → pin the run to that runner; absent → anycast. |
| dedupeId | string (optional) | A repeat of the same id (per app) within 24h is dropped entirely - no waiters woken, no fan-out, no new log row - so an at-least-once caller can safely retry. The response is 202 { "deduped": true }. |
| data | JSON | Event payload. Any valid JSON (object, array, or scalar); may be omitted. |

name is required (non-blank) and length-bounded, as are app, runner, targetApp, and dedupeId; data, when present, must be valid JSON. A request that violates any of these is rejected with 400.
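
A validator along these lines would enforce those rules. It is only a sketch: the actual length bound is not stated here, so `MAX_FIELD_LEN` is a placeholder value, only name is treated as required, and `validateEventRequest` is a hypothetical name.

```typescript
const MAX_FIELD_LEN = 256; // assumption: the real bound is not documented here

type Validation = { ok: true } | { ok: false; status: 400; reason: string };
const reject = (reason: string): Validation => ({ ok: false, status: 400, reason });

function validateEventRequest(raw: string): Validation {
  let body: unknown;
  try {
    body = JSON.parse(raw); // also proves that data, when present, is valid JSON
  } catch {
    return reject("body is not valid JSON");
  }
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return reject("body must be a JSON object");
  }
  const fields = body as Record<string, unknown>;
  const name = fields.name;
  if (typeof name !== "string" || name.trim() === "") {
    return reject("name is required and must be non-blank");
  }
  // The length-bounded string fields; all but name may be absent.
  for (const key of ["name", "app", "runner", "targetApp", "dedupeId"]) {
    const value = fields[key];
    if (value === undefined) continue;
    if (typeof value !== "string") return reject(`${key} must be a string`);
    if (value.length > MAX_FIELD_LEN) return reject(`${key} is too long`);
  }
  return { ok: true };
}
```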

The 202 response reports what the event did:

{ "runId": "01H...", "woke": 0, "skipped": false, "dropped": false, "debounced": false, "batched": false, "deduped": false }
| Field | Type | Description |
| --- | --- | --- |
| runId | string | Set if the event triggered a run; mirrors the first matched workflow (sorted by name) for the common single-match case. |
| woke | number | waitForEvent runs resumed (broadcast). |
| skipped | boolean | A singleton skip policy dropped the trigger. |
| dropped | boolean | A rateLimit policy shed the trigger. |
| debounced | boolean | Coalesced into a debounce buffer. |
| batched | boolean | Buffered into a batch. |
| deduped | boolean | The event was a duplicate dedupeId, or a workflow's idempotency key was already seen in its window. |
| triggered | object[] | One entry per workflow the event matched (event triggers can fan out); each has workflow, runId?, and the gate booleans. |

POST /invoke (engine → runner)

{
  "event": { "name": "order.created", "data": { "orderId": "A1" } },
  "steps": { "9f2b8c...": { "data": { "chargeId": "ch_A1" } } },
  "ctx": { "runId": "01H...", "workflow": "fulfillment", "attempt": 1, "app": "order-app", "runner": "node-7" }
}
| Field | Type | Description |
| --- | --- | --- |
| event | object | The triggering event (name, data); name is informational and may differ from the workflow. |
| steps | object | Memo map: hashed step id → its saved state. |
| ctx.runId | string | The run being replayed. |
| ctx.workflow | string | The dispatch key: the registered workflow name the runner routes to (distinct from event.name). |
| ctx.attempt | number | Run-level attempt counter. |
| ctx.app | string | The run's app. |
| ctx.runner | string | The run's pin; "" for an anycast run. |
| ctx.onFailure | boolean (optional) | Set on an onFailure invocation; the runner dispatches to the workflow's onFailure handler. |
| ctx.error | StepError (optional) | The terminal error, present only when ctx.onFailure is set. |
| ctx.traceparent | string (optional) | W3C trace context of the engine's invoke span; the runner extracts it to nest its spans in the run's distributed trace. Rides the body (not a header) so it propagates the same over HTTP and the Connect WebSocket. |

Each steps entry carries one of:

| Field | Type | Description |
| --- | --- | --- |
| data | JSON | A completed step's result. |
| error | StepError | A completed step that threw. |
| pending | boolean | Step already started (a parked sleep / wait / child); the runner blocks on it without re-running or re-emitting. |

The runner replies 200 { "data": <result>, "logs": [LogLine, ...] } when the run completes, or 206 { "opcodes": [Opcode, ...], "logs": [LogLine, ...] } listing the steps discovered this pass. The logs array carries any ctx.log lines captured during the pass (see Logs); it is [] when none were emitted.
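
On the runner side, each step call has to decide what its memo entry means. The sketch below shows one way to make that decision; `resolveStep` and the `Resolution` type are hypothetical names, and an SDK may structure this differently.

```typescript
interface StepError {
  message: string;
  stack?: string;
}
type StepEntry = { data: unknown } | { error: StepError } | { pending: true };

type Resolution =
  | { kind: "value"; data: unknown } // completed: hand back the saved result
  | { kind: "throw"; error: StepError } // completed with a failure: rethrow it
  | { kind: "blocked" } // started but unfinished: wait, never re-run or re-emit
  | { kind: "execute" }; // not in the memo: this pass does the real work

function resolveStep(steps: Record<string, StepEntry>, hashedId: string): Resolution {
  const entry = steps[hashedId];
  if (entry === undefined) return { kind: "execute" };
  if ("pending" in entry) return { kind: "blocked" };
  if ("error" in entry) return { kind: "throw", error: entry.error };
  return { kind: "value", data: entry.data };
}
```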

Opcode

{ "op": "StepRun", "id": "9f2b8c...", "name": "charge", "data": { "chargeId": "ch_A1" } }
| Field | Type | Used by | Description |
| --- | --- | --- | --- |
| op | enum | all | One of StepRun, Sleep, SleepUntil, WaitForEvent, RunWorkflow, Emit, Webhook. |
| id | string | all | Hashed step id; the engine stores the result under this key. |
| name | string | all | Human-readable step id (for the console). |
| data | JSON | StepRun, Emit, Webhook | Step result / event payload / webhook body. |
| error | StepError | StepRun | Step failure. |
| retriable | boolean (optional) | StepRun | false fails the run now, skipping remaining attempts (NonRetriableError). |
| retryAfterMs | number (optional) | StepRun | Overrides the policy backoff for this retry (RetryAfterError). |
| sleepMs | number | Sleep | Duration in ms. |
| sleepUntilMs | number | SleepUntil | Absolute wake time (UTC epoch ms). |
| eventName | string | WaitForEvent, Emit | Awaited / emitted event name. |
| timeoutMs | number | WaitForEvent | Timeout in ms. |
| childName | string | RunWorkflow | Child workflow to invoke. |
| childData | JSON | RunWorkflow | Input passed to the child. |
| webhookUrl | string | Webhook | Destination URL for a ctx.webhook.send; the engine enqueues a durable outbound delivery to it carrying data (a custom send has no endpoint secret, so it is delivered unsigned). |

StepError is { "message": string, "stack"?: string }.
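
As an illustration of the failure fields, a runner might turn a thrown error into a StepRun opcode roughly like this. The two error classes below are local stand-ins named after the SDK errors mentioned in the table, not the SDK's own definitions, and `failureOpcode` is a hypothetical helper. The sketch assumes an ES2015 or later compile target so that `instanceof` works on Error subclasses.

```typescript
interface StepError {
  message: string;
  stack?: string;
}
interface StepRunOpcode {
  op: "StepRun";
  id: string;
  name: string;
  data?: unknown;
  error?: StepError;
  retriable?: boolean;
  retryAfterMs?: number;
}

// Local stand-ins for illustration only.
class NonRetriableError extends Error {}
class RetryAfterError extends Error {
  constructor(message: string, readonly retryAfterMs: number) {
    super(message);
  }
}

function failureOpcode(id: string, name: string, err: unknown): StepRunOpcode {
  const e = err instanceof Error ? err : new Error(String(err));
  const opcode: StepRunOpcode = {
    op: "StepRun",
    id,
    name,
    error: { message: e.message, stack: e.stack },
  };
  if (e instanceof NonRetriableError) opcode.retriable = false; // fail the run now
  if (e instanceof RetryAfterError) opcode.retryAfterMs = e.retryAfterMs; // override backoff
  return opcode;
}
```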

Logs

A pass's response also carries the structured logs the handler emitted via ctx.log. The same array shape rides every status (200, 206, and 500 - logs up to a throw still ship), so a log is never lost to the path a pass took:

{ "level": "info", "message": "charging card", "fields": { "amount": 4200 }, "scope": "charge", "index": 0, "tsMs": 1718900000000 }
| Field | Type | Description |
| --- | --- | --- |
| level | enum | One of debug, info, warn, error. |
| message | string | The log message. |
| fields | object (optional) | Structured fields. Sensitive keys are redacted engine-side before storage. |
| scope | string | The enclosing step name, or @root for a handler-level log. |
| index | number | A per-scope counter the engine uses (with scope + attempt) to give each line a replay-stable dedupe id. |
| tsMs | number | Runner wall clock (advisory). |

Because the handler body re-runs on every pass, a top-level (@root) log re-emits each pass; the engine deduplicates it by (runId, attempt, scope, index) so it persists once. An in-step log only runs on the pass where its step executes, and its dedupe is keyed on the step's attempt, so a retried step's logs stay distinct per attempt. Read them back via GET /runs/{id}/logs.
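
The dedupe can be pictured as a set keyed on that tuple. This is a simplified sketch, not the engine's storage code: `LogStore` is a hypothetical class, and the caller is assumed to pass whichever attempt applies to the line (the run's attempt for @root, the step's attempt for an in-step log).

```typescript
interface LogLine {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  fields?: Record<string, unknown>;
  scope: string;
  index: number;
  tsMs: number;
}

class LogStore {
  private readonly seen = new Set<string>();
  readonly lines: LogLine[] = [];

  // Persists only lines not seen before; returns how many were new.
  ingest(runId: string, attempt: number, logs: LogLine[]): number {
    let added = 0;
    for (const line of logs) {
      const key = JSON.stringify([runId, attempt, line.scope, line.index]);
      if (this.seen.has(key)) continue; // replayed line: already stored
      this.seen.add(key);
      this.lines.push(line);
      added++;
    }
    return added;
  }
}
```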

Sleep.sleepMs is a duration, not an absolute time. The runner has no clock authority; it says "sleep 10s" and the engine resolves the wake time when it persists the sleep, which keeps the directive idempotent across passes. SleepUntil.sleepUntilMs is the absolute counterpart (UTC epoch ms) for step.sleepUntil: a fixed wall-clock target the engine stores verbatim as the deadline and parks against its own clock until it arrives. A target already in the past wakes on the next cycle.
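
The difference between the two opcodes shows up when the deadline is resolved. A minimal sketch, assuming a hypothetical `TimerStore` that stands in for the engine's persistence and takes the engine clock as an explicit `nowMs` argument:

```typescript
type SleepOpcode =
  | { op: "Sleep"; id: string; sleepMs: number }
  | { op: "SleepUntil"; id: string; sleepUntilMs: number };

class TimerStore {
  private readonly deadlines = new Map<string, number>();

  // Resolves the wake time once, on first persist; later passes are no-ops.
  persist(op: SleepOpcode, nowMs: number): number {
    const existing = this.deadlines.get(op.id);
    if (existing !== undefined) return existing; // idempotent across passes
    const wakeAt =
      op.op === "Sleep"
        ? nowMs + op.sleepMs // duration: the engine's clock decides the deadline
        : op.sleepUntilMs; // absolute target: stored verbatim
    this.deadlines.set(op.id, wakeAt);
    return wakeAt;
  }

  isDue(id: string, nowMs: number): boolean {
    const wakeAt = this.deadlines.get(id);
    return wakeAt !== undefined && wakeAt <= nowMs;
  }
}
```

Because the deadline is fixed at first persist, re-sending the same Sleep opcode on a later pass does not push the wake time back.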

A Webhook opcode (ctx.webhook.send) is a durable side effect, not a step result: the engine enqueues an outbound delivery to webhookUrl carrying data, then records the step. A custom send has no endpoint secret, so the delivery is unsigned. Like Emit, it is at-least-once on the wire but made exactly-once by the deterministic step id, so a replayed pass never re-sends. See the webhooks guide.

A single pass can return several opcodes: a handler that runs steps with Promise.all discovers the whole batch at once, and the 206 array carries them all (sequential steps are just the one-opcode case). The engine persists every opcode, parks at the earliest deadline across the batch, and re-invokes as each branch becomes ready; the handler proceeds once all are memoized. A terminal failure in one branch fails the run and cancels its in-flight siblings. A step the engine has already started but not finished comes back as "pending": true in the memo. The runner must not re-run or re-emit it, so a parked branch's timer survives the re-invokes that its siblings trigger.

Step id hashing

hashedStepId = lowercase_hex( SHA-256( utf8(stepId) ) )

The runner hashes the human-readable step id (e.g. "charge") to produce the steps map key and the Opcode.id; the engine treats it as an opaque key. Implementations in different languages must produce byte-identical output. A step id reused within one run is disambiguated runner-side before hashing by a positional suffix ("x", then "x:1", "x:2", ...), so each occurrence gets a distinct key; see the full spec for the exact scheme.
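
In TypeScript on a Node-compatible runtime, the hash and the positional suffix could look like the sketch below. `hashStepId` and `StepIdAllocator` are hypothetical names, and the suffix handling only approximates the scheme described above; the full spec remains the authority on edge cases.

```typescript
import { createHash } from "node:crypto";

// lowercase_hex(SHA-256(utf8(stepId))); digest("hex") already emits lowercase.
function hashStepId(stepId: string): string {
  return createHash("sha256").update(stepId, "utf8").digest("hex");
}

// Tracks reuse within one run (one allocator per pass, so replays line up).
class StepIdAllocator {
  private readonly counts = new Map<string, number>();

  next(stepId: string): string {
    const seen = this.counts.get(stepId) ?? 0;
    this.counts.set(stepId, seen + 1);
    // First use hashes "x"; later uses hash "x:1", "x:2", ...
    const unique = seen === 0 ? stepId : `${stepId}:${seen}`;
    return hashStepId(unique);
  }
}
```

A port in another language can check itself against any published SHA-256 test vector before comparing step ids across SDKs.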

Routing ({app, runner?})

A run is owned by an app and executed by one of that app's registered runners (an app may have many). The runner id is the routing handle:

  • Anycast (no runner): each invoke goes to any one registered runner of the app (random pick). Runners are stateless and the full step memo is resent every invoke, so different passes may safely hit different replicas. If none is registered yet, the run parks and retries (the event-before-register race self-heals).
  • Pinned (runner set on the event): routed only to that runner id. If it isn't registered, the run fails fast - a pin to a non-existent runner is a caller error in v0. Bounded waiting and offline drain for pinned runs are a future addition.

A runner that omits an id in /register is keyed by its URL. A child workflow inherits its parent's pin only in the same app. ctx.runner carries the pin to your handler (empty for anycast).
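
The two routing modes reduce to a small decision function. The following is a sketch with hypothetical names (`route`, `Route`, `Runner`); the random source is injectable only so the behaviour can be tested deterministically.

```typescript
interface Runner {
  id: string; // the stable runner id, or the URL when no id was registered
  app: string;
}

type Route =
  | { kind: "invoke"; runner: Runner }
  | { kind: "park" } // anycast with nobody registered yet: retry later
  | { kind: "fail"; reason: string }; // pin to an unregistered runner

function route(
  registered: Runner[],
  app: string,
  pin: string, // "" means anycast, mirroring ctx.runner
  random: () => number = Math.random,
): Route {
  const candidates = registered.filter((r) => r.app === app);
  if (pin !== "") {
    const target = candidates.find((r) => r.id === pin);
    return target
      ? { kind: "invoke", runner: target }
      : { kind: "fail", reason: `runner "${pin}" is not registered for app "${app}"` };
  }
  if (candidates.length === 0) return { kind: "park" };
  return { kind: "invoke", runner: candidates[Math.floor(random() * candidates.length)] };
}
```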

Connect transport (WebSocket)

Two transports drive a runner, and the choice is invisible to your workflow code:

  • Serve (default): you stand up an HTTP server and the engine POSTs your /invoke URL. Simple, but the runner must be inbound-reachable.
  • Connect: your runner dials the engine over a WebSocket (GET /connect) and receives invokes on that socket, so it needs no inbound address - the model for an agent on a node behind NAT. In the SDK this is connect({ engineUrl, app, runner?, workflows }) instead of serve(...) + register(...).

The execution model is identical (same step-memoization replay); only the connection direction differs. Connect uses Durablex's own protocol: subprotocol durablex.connect.v0, a generic {type, id, payload} envelope, and the message types hello, runner.register, invoke, and invoke.result. The stable runner id is part of the handshake, so a connected runner is pinnable by {app, runner} exactly like an HTTP one. The hello and runner.register frames also carry the wire version (see Protocol version). A dropped socket is detected by heartbeat, and the runner is evicted from routing until it reconnects (the SDK reconnects automatically).
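
A frame parser for that envelope might look like the sketch below. Only the envelope keys and the four message types come from this page. Everything else is an assumption made for illustration: frames are taken to be JSON text, id is taken to be a string, an invoke.result is taken to reuse the id of the invoke it answers, and `parseEnvelope` and `resultFor` are hypothetical names.

```typescript
type MessageType = "hello" | "runner.register" | "invoke" | "invoke.result";

interface Envelope {
  type: MessageType;
  id: string; // assumption: a string identifier
  payload: unknown;
}

const MESSAGE_TYPES: readonly string[] = ["hello", "runner.register", "invoke", "invoke.result"];

function parseEnvelope(raw: string): Envelope {
  const msg = JSON.parse(raw) as Partial<Envelope> | null;
  if (msg === null || typeof msg !== "object") throw new Error("frame is not an object");
  if (typeof msg.type !== "string" || !MESSAGE_TYPES.includes(msg.type)) {
    throw new Error(`unknown message type: ${String(msg.type)}`);
  }
  if (typeof msg.id !== "string") throw new Error("missing envelope id");
  return { type: msg.type, id: msg.id, payload: msg.payload };
}

// Assumption: the result frame echoes the invoke's id so the peer can correlate them.
function resultFor(invoke: Envelope, payload: unknown): Envelope {
  if (invoke.type !== "invoke") throw new Error("can only answer an invoke frame");
  return { type: "invoke.result", id: invoke.id, payload };
}
```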

Status codes

| Code | Meaning |
| --- | --- |
| 200 | Handler returned. Body { data, logs } carries the final result. The run succeeds. |
| 206 | Handler emitted new opcodes and isn't finished. Body is { opcodes, logs }. |
| 4xx | Non-retriable error (bad request, unknown workflow). The engine fails the run. |
| 5xx | Retriable transport error. The engine retries the invoke with backoff on a fixed transient budget, then fails the run. |

A step failure is different from a transport error: the runner reports it as a 206 whose opcode carries error, and the engine retries that step per the workflow's retry policy. Per-step retries and the transient transport budget are counted separately.
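
Put together, an engine-side classifier for the invoke response could be sketched as follows. `classifyInvokeStatus` and the action names are hypothetical, and the handling of codes outside the table is an assumption.

```typescript
type InvokeAction =
  | "succeed" // 200: store the final result
  | "persist-and-reinvoke" // 206: save opcodes (including step errors), call again
  | "fail-run" // 4xx: non-retriable
  | "retry-invoke"; // 5xx: retry with backoff on the transient budget

function classifyInvokeStatus(status: number): InvokeAction {
  if (status === 200) return "succeed";
  if (status === 206) return "persist-and-reinvoke";
  if (status >= 400 && status < 500) return "fail-run";
  if (status >= 500 && status < 600) return "retry-invoke";
  return "fail-run"; // assumption: anything the table does not cover fails the run
}
```

Note that a failed step never reaches the 4xx or 5xx branches: it arrives on the 206 path as an opcode carrying error, so step retries and transport retries draw on different counters.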

The engine caps the invoke response at 1 MiB (the same wire-message limit the connect transport applies to a result frame), so a runaway runner can't exhaust engine memory with an unbounded result; a response over the cap fails the invoke. Large step state is offloaded by API, not carried inline.

When a run fails terminally and its workflow registered onFailure: true, the engine marks the run failed and spawns a separate follow-on run invoked with ctx.onFailure + ctx.error to run the onFailure handler. The failed run is retained: queryable via GET /runs?status=failed and redrivable via POST /runs/{id}/replay. See Retries.

Protocol version

The engine and runner share a single integer wire version, currently 1. Version 1 wraps the invoke response in an object on every status so logs ride alongside the result; version 0 returned a bare {data} on 200 and a bare opcode array on 206. Each side advertises its version and checks the peer's at every boundary, so a breaking wire change fails loudly instead of misparsing:

| Boundary | Carried as |
| --- | --- |
| HTTP invoke (engine → runner) | the X-Durablex-Protocol header |
| HTTP register (runner → engine) | RegisterRequest.protocolVersion |
| Connect handshake | protocolVersion on the hello and runner.register frames |

The rule is lenient on absence, strict on a present mismatch: a peer that sends no version is assumed compatible, so the field is additive and never breaks an older peer, but a version that is present and differs is rejected (400 on the HTTP paths; the socket closes on Connect). The SDK sets this for you - you only encounter it if a runner and engine are on incompatible releases.
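
The rule fits in one function. A minimal sketch, assuming hypothetical helper names (`checkPeerVersion`, `versionFromHeader`) and assuming the header carries the integer as decimal text:

```typescript
const PROTOCOL_VERSION = 1;

type VersionCheck = { ok: true } | { ok: false; status: 400; message: string };

// Lenient on absence, strict on a present mismatch.
function checkPeerVersion(peer: number | undefined, mine: number = PROTOCOL_VERSION): VersionCheck {
  if (peer === undefined) return { ok: true }; // older peer: assumed compatible
  if (peer === mine) return { ok: true };
  return {
    ok: false,
    status: 400,
    message: `protocol version mismatch: peer ${peer}, local ${mine}`,
  };
}

// Reads the X-Durablex-Protocol header value; a missing header means "no version sent".
function versionFromHeader(value: string | null | undefined): number | undefined {
  return value == null ? undefined : Number(value);
}
```

An unparsable header value becomes NaN, which never equals the local version and is therefore rejected like any other mismatch.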

Signing

Every request between engine and runner carries an integrity header. Today this is a pass-through stub behind a signing interface, so real HMAC-SHA256 can drop in later without changing call sites.
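
One way to picture a signing interface with a drop-in HMAC implementation is sketched below. None of these names come from the engine: `Signer`, `passThroughSigner`, and `hmacSigner` are hypothetical, the header name and secret distribution are left out entirely, and the real interface may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

interface Signer {
  sign(body: string): string;
  verify(body: string, signature: string): boolean;
}

// Stub in the spirit of today's behaviour: nothing is signed, everything verifies.
const passThroughSigner: Signer = {
  sign: () => "",
  verify: () => true,
};

// A possible later implementation behind the same interface.
function hmacSigner(secret: string): Signer {
  const sign = (body: string): string =>
    createHmac("sha256", secret).update(body, "utf8").digest("hex");
  return {
    sign,
    verify(body: string, signature: string): boolean {
      const expected = Buffer.from(sign(body), "hex");
      const given = Buffer.from(signature, "hex");
      // timingSafeEqual throws on unequal lengths, so compare lengths first.
      return given.length === expected.length && timingSafeEqual(given, expected);
    },
  };
}
```

Because both objects satisfy the same interface, call sites that sign or verify a request body would not need to change when the stub is replaced.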
