docs: Updated the graph agents docs to discuss parallelization and clarify fan-out

2026-05-20 16:49:23 -06:00
parent 93801ea16e
commit c88b5be6da
+428 -13
@@ -11,6 +11,9 @@ Graph agents are best for workflows that:
- Mix LLM calls with deterministic steps (scripts, user prompts)
- Need explicit human-in-the-loop checkpoints
- Benefit from per-step model / tool / temperature overrides
- **Fan out work across parallel branches:** Research three sources at once,
map an LLM call over a runtime-determined list of inputs, then merge results
back through declared reducers (see [Parallel Execution](#parallel-execution))
If you just want an agent that takes a goal and figures out the steps on its
own, stick with a regular [agent](Agents).
@@ -73,6 +76,11 @@ settings:
log_state_snapshots: true # log state JSON before each node executes
validate_before_run: true # run the graph validator on startup
timeout: 600 # optional overall timeout in seconds
max_concurrency: 8 # cap on simultaneously running parallel branches; default 8
reducers: # optional; declares how parallel branches merge writes
results: append # see "Parallel Execution" below
cost_usd: sum
initial_state: # optional seed state for the run
topic: "auth"
@@ -108,6 +116,16 @@ nodes:
- **`initial_state`:** A JSON-compatible object. Values are seeded into
state before any node runs and are referenced from any node via `{{key}}`
templates.
- **`max_concurrency`:** Caps the number of branches that can execute
concurrently in any single parallel super-step. Default 8. Per-`map` nodes
can override this with their own `max_concurrency` field. The graph-wide
cap is a safety net against LLM rate-limit blowups when a `next: [a, b, c]`
fan-out is followed by deeper parallelism. Must be `>= 1`.
- **`reducers`:** Declares how state-key writes from concurrent branches are
combined. Required for any state key that two or more parallel branches
write in the same super-step (the validator catches missing reducers at
load time). See [Parallel Execution](#parallel-execution) for the full
reducer reference and merge semantics.
### `{{initial_prompt}}`: Automatically Seeded
@@ -134,20 +152,26 @@ You do not need to (and should not) put `initial_prompt` in `initial_state` as i
# Node Types
There are seven node types: **agent**, **script**, **approval**, **input**,
**llm**, **rag**, and **end**. Every node has these common fields:
There are eight node types: **agent**, **script**, **approval**, **input**,
**llm**, **rag**, **map**, and **end**. Every node has these common fields:
```yaml
my_node:
id: my_node # must match the map key
type: <one of the seven>
type: <one of the eight>
description: optional # free-form
next: another_node # optional default next node; semantics vary per type
# OR
next: [a, b, c] # fan out to multiple branches in parallel
```
The `next` field defines the default routing edge. Node types interpret it
differently (some types ignore it in favor of internal routing; see each type
below).
below). **`next:` is polymorphic**: a single string (`next: foo`) routes
sequentially; a list (`next: [a, b, c]`) declares a static fan-out. Every
target runs as a parallel branch in the next super-step. Approval and input
nodes cannot use the list form (the validator rejects parallel user
prompts). See [Parallel Execution](#parallel-execution) for the full model.
---
@@ -494,6 +518,99 @@ embedding/chunking cost is paid once, at load time.
---
## map
Dynamic fan-out over a runtime-determined list. The `map` node reads a JSON
array out of state (via the `over:` template), spawns one parallel
sub-branch per item invoking the same branch node, and collects each
sub-branch's output into a state array in input order.
This is how a graph agent runs "research these N subjects in parallel" when
N is determined at run-time. For a fixed-cardinality fan-out (you know the
exact set of branches at YAML-write time), use `next: [a, b, c]` on the
preceding node instead.
```yaml
fan_out_subjects:
id: fan_out_subjects
type: map
over: "{{subjects}}" # required; template must resolve to a JSON array
as: subject # required; bound name inside each sub-branch
branch: research_subject # required; node id to invoke once per item
output_key: output # optional; state key the branch writes (default "output")
collect_into: research_results # required; state key to receive the array of outputs
max_concurrency: 5 # optional; overrides settings.max_concurrency
next: rank # required; where to go after the map collects
research_subject: # the branch node (atomic — runs N times)
id: research_subject
type: llm
prompt: "Research {{subject}} in depth."
state_updates:
output: "{{output}}" # branch must write to `output_key`
# Note: no `next:` on a map branch; map branches are atomic
```
### Field reference
- **`over`:** A template expression that must resolve to a JSON array of
items. Each item becomes one sub-branch invocation. If the array is
empty, the map runs zero sub-branches and writes `[]` to `collect_into`
(which is correct semantically. I.e. mapping over nothing yields nothing).
- **`as`:** The state key under which each item is bound inside its
sub-branch. The branch node then references it via `{{<as>}}` in its
prompt / templates.
- **`branch`:** The id of the node to invoke per item. The branch's
outputs across all sub-branches are aggregated into `collect_into`.
- **`output_key`:** The state key each sub-branch must write its result
to. Default `"output"`. The map executor reads this from each
sub-branch's state, extracts the value, and inserts it into the
`collect_into` array.
- **`collect_into`:** The state key in the **parent's** state that
receives the array of all sub-branch outputs, **in input-list order
regardless of which sub-branch finished first**. This is the map's
user-visible determinism contract.
- **`max_concurrency`:** Per-map cap on concurrent sub-branches; falls
back to `settings.max_concurrency` (default 8) when unset. Must be
`>= 1` if set.
- **`next`:** Required. Where the parent super-step continues after the
map collects.
### Branch node constraints
Map branches are **atomic**. One node, one execution per item, no
chaining inside the branch. The validator enforces several rules at load
time:
- Branch type must be `llm`, `agent`, `rag`, or `script`. Approval, input,
end, or another map node as the branch is a load-time error.
- Branch must **not** have `next:` declared. Anything chained off the
branch belongs after the map (downstream of the `collect_into` write).
- Branch's `state_updates` keys must be a subset of `output_key`. The
branch can write to its own `output_key` and nowhere else.
- Branch must **not** declare `output_schema`. Top-level schema-properties
would auto-merge into state, polluting it across N parallel branches.
Use explicit `state_updates: { output: "{{output}}" }` instead.
If a sub-branch fails to write `output_key`, the map node errors loudly
(no silent missing-output behavior).
### Empty `over`
If `over` resolves to `[]` (an empty array), the map invokes zero
sub-branches and writes an empty array to `collect_into`. This is **not**
an error. It's the correct semantic for "map over nothing yields
nothing". If you want to reject empty `over` as an error, gate it with a
script node upstream.
### Nested map nodes are not supported in v1
A map branch cannot itself be another map node. The validator rejects this
at load time. If you need M * N parallel work, restructure to a single map
over a flattened list.
---
## end
Terminates execution and returns a final result.
@@ -589,15 +706,16 @@ Nodes route via three mechanisms in priority order:
### Routing requirements per node type
| Node type | Needs `next`? |
|-------------|---------------------------------------------------------------------------------------------------|
| `agent` | **Yes** - `next` is required (unless the agent node is unreachable). Error at runtime if missing. |
| `script` | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither. |
| `approval` | No - routing is via `routes` and `on_other`. `next` is ignored. |
| `input` | **Yes** - `next` is the success route. |
| `llm` | **Yes** - `next` is the success route (and the default for failures without `fallback`). |
| `rag` | **Yes** - `next` is required. Error at runtime if missing. |
| `end` | No - terminal. |
| Node type | Needs `next`? | Supports fan-out (`next: [...]`)? |
|-------------|---------------------------------------------------------------------------------------------------|-----------------------------------|
| `agent` | **Yes** - `next` is required (unless the agent node is unreachable). Error at runtime if missing. | Yes |
| `script` | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither. | Yes (when `_next` is not emitted) |
| `approval` | No - routing is via `routes` and `on_other`. `next` is ignored. | No - forbidden by validator |
| `input` | **Yes** - `next` is the success route. | No - forbidden by validator |
| `llm` | **Yes** - `next` is the success route (and the default for failures without `fallback`). | Yes (success path; failure with `fallback` routes to single target) |
| `rag` | **Yes** - `next` is required. Error at runtime if missing. | Yes |
| `map` | **Yes** - `next` is where the parent super-step continues after the map collects. | Yes |
| `end` | No - terminal. | n/a |
### Tolerant-fail contract
@@ -613,6 +731,254 @@ their failures propagate as graph failures.
---
# Parallel Execution
Graph agents support two flavors of parallel execution:
- **Static fan-out:** A node declares `next: [a, b, c]` to dispatch to
multiple branches at once.
- **Dynamic fan-out:** A `map` node spawns one parallel sub-branch per
item in a runtime-determined list.
Both share the same underlying execution model (a **super-step
scheduler**) and the same merge primitive (**declared reducers**).
## The super-step model
The executor advances the graph in discrete **super-steps**. Each
super-step starts with a **frontier** (a set of nodes to run), spawns all
of them concurrently via `tokio::spawn`, waits for every branch to
finish, merges their writes, and computes the next frontier from each
branch's routing decision.
```
super-step 1: frontier = { triage } -> triage runs alone
super-step 2: frontier = { fetch_web, fetch_db } -> both run in parallel
super-step 3: frontier = { synthesize } -> single node, sequential
super-step 4: frontier = { done } -> terminal
```
This is the same Bulk Synchronous Parallel model that
[LangGraph](https://langchain-ai.github.io/langgraph/) uses for the same
problem. Each super-step is **transactional**: if any branch in a
super-step errors, the entire super-step is rolled back. That means no partial
writes leak into live state.
## Static fan-out
The simplest parallel pattern: write a list in `next:`.
```yaml
triage:
type: llm
prompt: "Decide which sources to consult."
next: [fetch_web, fetch_local, fetch_docs] # three branches run in parallel
```
After `triage` completes, the next super-step runs `fetch_web`,
`fetch_local`, and `fetch_docs` concurrently. Each is a normal node that
does its work, writes to state, and routes to its own `next:`. When all
three branches converge on the same target (e.g. `next: synthesize`), the
join happens automatically: the executor dedups the next frontier, so
`synthesize` runs once after all three predecessors finish.
```yaml
fetch_web: { type: llm, prompt: "Search web for {{topic}}.", next: synthesize }
fetch_local: { type: rag, documents: ["./knowledge/"], next: synthesize }
fetch_docs: { type: llm, prompt: "Cite docs for {{topic}}.", next: synthesize }
synthesize: # runs once, after all three finish
type: llm
prompt: "Combine: {{web}}, {{local}}, {{docs}}"
```
**Which node types support static fan-out?** `agent`, `script`, `llm`,
`rag`, and `map`. Approval and input nodes cannot fan out (the validator
rejects them as immediate fan-out targets, meaning N concurrent user prompts
would be unusable).
## Dynamic fan-out via `map`
When the number of branches depends on runtime data (the count of search
results, the items returned by a script, etc.), use the `map` node type.
See the [map node](#map) section above for a full reference. Briefly:
```yaml
fan_out_subjects:
type: map
over: "{{subjects}}" # list determined at run-time
as: subject
branch: research_subject
collect_into: research_results # array, in input order
next: rank
```
The map executor spawns one sub-branch per `subjects` item, each
invoking `research_subject` with `{{subject}}` bound to the item.
Outputs are collected into `research_results` **in input order**
regardless of which sub-branch finished first.
## Reducers
When two or more parallel branches write to the same state key in the
same super-step, the executor merges their writes through a declared
**reducer**. Without a reducer for a contended key, the load-time
validator errors with a clear message naming the conflicting writers.
Declare reducers at the graph root:
```yaml
reducers:
sources: append # see table below
cost_usd: sum
context: concat
config: merge
```
### Built-in reducers
| Reducer | Semantics | Required types |
|-------------|---------------------------------------------------------------------------|-----------------------------------------|
| `append` | Push each branch's value onto an array (creates `[v1]` from nothing, then appends `v2`, `v3`, …) | Any value type |
| `extend` | Concatenate arrays (`[a, b] + [c, d] = [a, b, c, d]`) | Both sides must be Array |
| `concat` | Join strings with `\n` separator | Both sides must be String |
| `sum` | Numeric addition | Both sides must be Number |
| `max` | Numeric max | Both sides must be Number |
| `min` | Numeric min | Both sides must be Number |
| `merge` | Object union (incoming wins on key collision) | Both sides must be Object |
| `overwrite` | Last write wins (explicit opt-in to non-deterministic behavior) | Any value type |
Integer typing is preserved through numeric reducers when possible;
e.g. `sum` of `5` and `7` stays `Number::i64(12)`, not `Number::f64(12.0)`.
### Determinism
The merge order is deterministic across runs: branches are sorted by
`(node_id, invocation_index)` before reducers are applied. For static
fan-out the node ids are distinct so the ordering is clear. For map
sub-branches all sharing the same `branch:` node id, the
`invocation_index` is the input-list position, so order matches the
`over:` list.
This means non-commutative reducers (`concat`, `merge`) produce
reproducible output across runs.
### Single-writer keys don't need reducers
A state key written by only one branch in a super-step doesn't require a
declaration. Only keys with 2+ writers in the same super-step trigger
the "missing reducer" load-time error.
### Validator-enforced
When the graph is loaded, the validator computes each node's **write set**
(`state_updates` keys unioned with the `output_schema` top-level properties for LLM /
agent nodes) and intersects them across parallel groups. Any key with
multiple writers and no reducer in the `reducers:` block is rejected at
load time with a message naming all the writers:
```
nodes [retrieve_web, retrieve_docs] all write key 'summary' in the same
parallel super-step but no reducer is declared for 'summary'. Add
`reducers: { summary: <reducer> }` at the graph root, or rename one
node's output.
```
## Branch isolation
Each parallel branch runs against an **independent state fork**. Writes
to one branch's state don't affect siblings. At the super-step
boundary, the executor extracts each branch's writes (the diff between
the branch's final state and the snapshot at super-step start) and
merges them via the reducer pipeline.
This means:
- Branches can freely mutate `output`, internal counters, etc. without
worrying about siblings
- Race conditions are impossible since there is no shared mutable state during
the parallel phase
- Branches cannot communicate with each other mid-super-step. If you
need that pattern, sequentialize the work or write it as a single
multi-step subgraph
## Multi-branch UX
In a TTY:
- Each parallel branch gets its own labeled spinner (e.g. `[fetch_web]
running… 2.3s`), drawn side-by-side via
[indicatif](https://docs.rs/indicatif/).
- When a branch completes successfully, its spinner finalizes to
`✓ done (2.3s)`. On failure: `✗ failed (2.3s) — <error excerpt>`.
- Map sub-branches are labeled `[<branch_id>[<idx>]]` so multiple
invocations of the same branch node stay distinguishable.
In non-TTY mode (CI, piped output), spinners are suppressed entirely so
no spinner garbage in captured logs.
## Streaming behavior
When a single node runs (sequential super-step, `frontier.len() == 1`),
LLM tokens stream to stdout normally and thus the user sees them as they
arrive.
When multiple branches run concurrently (multi-node super-step or any
map fan-out), **token streaming is suppressed across all parallel
branches**. Tokens are still buffered internally so the full response
reaches state via the normal `output_schema` / `state_updates` pathway;
they just don't print mid-flight. This avoids the interleaved-tokens
mess that would otherwise happen with N concurrent LLM streams writing
to one stdout.
After the super-step joins, downstream nodes resume normal streaming.
## Error propagation
Errors in a parallel super-step are transactional:
- If any branch returns an error and isn't recovered by its node's
`fallback:`, the **entire super-step is aborted**. This means no partial writes
applied, and the error propagates up.
- The error message includes the failing branch's node id, e.g.
`at node 'worker_b': simulated failure`.
- Sibling branches that already completed are dropped (their writes are
discarded along with the failing branch's).
This matches LangGraph's transactional super-step semantics.
## Multi-End rejection
If two or more parallel branches in the same super-step both route to
End nodes, the executor errors:
```
super-step ended with multiple End targets (end_a, end_b). Fan-out
branches must converge at a join node before terminating. To fix: route
all parallel branches to a single shared next-node, then terminate from
there.
```
This is intentional. Terminating from two places in one super-step
almost always indicates a graph-design mistake (the user probably meant
for both branches to feed into a shared end node).
## Approval and input nodes are forbidden in fan-out
The validator rejects any approval or input node that's an immediate
fan-out target (`next: [approve, ...]` or as a map's `branch:`). The
reason is UX: ten concurrent "type your answer" prompts would
fundamentally not work in a CLI. Put approval / input nodes **after**
the join, downstream of the merge.
## What sequential graphs see
If your graph has no fan-out and no map node, **nothing changes**. The
super-step scheduler runs sequentially (one node per super-step).
The parallel infrastructure is opt-in via `next: [...]` or a `map` node.
---
# Structured Output (`output_schema`)
Both `llm` and `agent` nodes can specify an `output_schema` field: a JSON
@@ -797,6 +1163,25 @@ startup.
- `llm` node referencing an unknown tool or `mcp:<server>` in its `tools`
whitelist, or an unknown `model`. Validated against the agent's tool,
MCP-server, and model sets
- **Parallel write collision without a reducer**: two or more nodes in the
same parallel super-step (the immediate `next: [...]` targets of one
source node) write to the same state key, and the `reducers:` block has
no entry for that key. Error names all the colliding writers
- **Approval or input as a fan-out target**: the immediate target of a
`next: [...]` is an `approval` or `input` node. These would queue N
concurrent user prompts, which is a UX disaster. So they're rejected at load time
- **Map branch violates strict mode**: the validator enforces that a
`map` node's `branch:` target (1) is `llm`, `agent`, `rag`, or `script`
(not approval/input/end/another map); (2) has no `next:` declared;
(3) writes only to its `output_key` via `state_updates` (if any); (4)
has no `output_schema` declared
- **Script in a parallel branch with no `state_updates`**: a script node
whose immediate predecessor is a fan-out has unknown writes from the
validator's perspective (scripts can emit arbitrary JSON to state).
Declare an empty `state_updates: {}` to acknowledge known writes, or
add explicit declarations
- **`max_concurrency` set to 0**: either `settings.max_concurrency: 0` or
`MapNode.max_concurrency: 0` is rejected (would deadlock the semaphore)
**Warnings (printed, execution continues)**:
@@ -933,6 +1318,36 @@ A short, honest list of things that bite people:
- **Script extensions are exactly `.sh`, `.py`, `.ts`**. No JavaScript,
no Ruby, no Lua. Python must be available as `python3` and TypeScript
requires `npx tsx` on PATH.
- **Token streaming is suppressed during parallel super-steps**.
Sequential graphs stream tokens to stdout normally; multi-branch
super-steps (`next: [...]` fan-out or any map fan-out) buffer tokens
silently and emit only the final outputs after the join. Avoids the
interleaved-tokens mess that would otherwise mix N LLM streams into
one stdout. Log lines (node entry / routing decisions) still appear,
just without the per-token output.
- **Map sub-branches always run in Silent mode** even when there's only
one item in `over:`. If you want streaming visibility for a 1-element
map, restructure the graph to avoid the map entirely.
- **`max_loop_iterations` doesn't count map sub-branches**. The
per-node visit cap fires for cycle / runaway-loop detection on graph
edges. A `map` over a 1,000-element `over:` runs the branch 1,000
times but increments the branch's loop count by zero. The `over:`
array bounds the iteration count, not the cycle detector.
- **Map sub-branches' tool-call history doesn't accumulate** across
invocations. Each sub-branch gets a fresh `ToolCallTracker`. The
parent's tool-call history is unchanged after the map completes (the
branch RequestContext clones are discarded at super-step boundary).
- **A reducer's type-mismatch error fires at runtime, not load time**.
(e.g. `sum` declared on a key but a branch writes a string). The
validator can't tell statically what type a node will write, so the
reducer apply phase errors loudly with a message like
`reducer 'sum' on key 'cost' requires numeric values, got String("forty two")`.
- **`fallback:` distinguishes failure from success cleanly now**. LLM
nodes (and RAG via no-fallback default) return a typed outcome
(`Continue` for success, `FellBack(target)` for failure-with-fallback),
so a fan-out LLM with `fallback:` matching one of its `next:` targets
works correctly. The fallback path routes only to the fallback, not
all Many targets.
---