docs: Updated the graph agents docs to discuss parallelization and clarify fan-out

2026-05-20 16:49:23 -06:00
parent 93801ea16e
commit c88b5be6da
1 changed files with 428 additions and 13 deletions
@@ -11,6 +11,9 @@ Graph agents are best for workflows that:
 - Mix LLM calls with deterministic steps (scripts, user prompts)
 - Need explicit human-in-the-loop checkpoints
 - Benefit from per-step model / tool / temperature overrides
+- **Fan out work across parallel branches:** Research three sources at once,
+  map an LLM call over a runtime-determined list of inputs, then merge results
+  back through declared reducers (see [Parallel Execution](#parallel-execution))

 If you just want an agent that takes a goal and figures out the steps on its
 own, stick with a regular [agent](Agents).
@@ -73,6 +76,11 @@ settings:
  log_state_snapshots: true    # log state JSON before each node executes
  validate_before_run: true    # run the graph validator on startup
  timeout: 600                 # optional overall timeout in seconds
+  max_concurrency: 8           # cap on simultaneously running parallel branches; default 8
+
+reducers:                      # optional; declares how parallel branches merge writes
+  results: append              # see "Parallel Execution" below
+  cost_usd: sum

 initial_state:                 # optional seed state for the run
  topic: "auth"
@@ -108,6 +116,16 @@ nodes:
 - **`initial_state`:** A JSON-compatible object. Values are seeded into
  state before any node runs and are referenced from any node via `{{key}}`
  templates.
+- **`max_concurrency`:** Caps the number of branches that can execute
+  concurrently in any single parallel super-step. Default 8. Per-`map` nodes
+  can override this with their own `max_concurrency` field. The graph-wide
+  cap is a safety net against LLM rate-limit blowups when a `next: [a, b, c]`
+  fan-out is followed by deeper parallelism. Must be `>= 1`.
+- **`reducers`:** Declares how state-key writes from concurrent branches are
+  combined. Required for any state key that two or more parallel branches
+  write in the same super-step (the validator catches missing reducers at
+  load time). See [Parallel Execution](#parallel-execution) for the full
+  reducer reference and merge semantics.

 ### `{{initial_prompt}}`: Automatically Seeded

@@ -134,20 +152,26 @@ You do not need to (and should not) put `initial_prompt` in `initial_state` as i

 # Node Types

-There are seven node types: **agent**, **script**, **approval**, **input**,
-**llm**, **rag**, and **end**. Every node has these common fields:
+There are eight node types: **agent**, **script**, **approval**, **input**,
+**llm**, **rag**, **map**, and **end**. Every node has these common fields:

 ```yaml
 my_node:
  id: my_node               # must match the map key
-  type: <one of the seven>
+  type: <one of the eight>
  description: optional      # free-form
  next: another_node         # optional default next node; semantics vary per type
+  # OR
+  next: [a, b, c]            # fan out to multiple branches in parallel
 ```

 The `next` field defines the default routing edge. Node types interpret it
 differently (some types ignore it in favor of internal routing; see each type
-below).
+below). **`next:` is polymorphic**: a single string (`next: foo`) routes
+sequentially; a list (`next: [a, b, c]`) declares a static fan-out. Every
+target runs as a parallel branch in the next super-step. Approval and input
+nodes cannot use the list form (the validator rejects parallel user
+prompts). See [Parallel Execution](#parallel-execution) for the full model.

 ---

@@ -494,6 +518,99 @@ embedding/chunking cost is paid once, at load time.

 ---

+## map
+
+Dynamic fan-out over a runtime-determined list. The `map` node reads a JSON
+array out of state (via the `over:` template), spawns one parallel
+sub-branch per item invoking the same branch node, and collects each
+sub-branch's output into a state array in input order.
+
+This is how a graph agent runs "research these N subjects in parallel" when
+N is determined at run-time. For a fixed-cardinality fan-out (you know the
+exact set of branches at YAML-write time), use `next: [a, b, c]` on the
+preceding node instead.
+
+```yaml
+fan_out_subjects:
+  id: fan_out_subjects
+  type: map
+  over: "{{subjects}}"            # required; template must resolve to a JSON array
+  as: subject                     # required; bound name inside each sub-branch
+  branch: research_subject        # required; node id to invoke once per item
+  output_key: output              # optional; state key the branch writes (default "output")
+  collect_into: research_results  # required; state key to receive the array of outputs
+  max_concurrency: 5              # optional; overrides settings.max_concurrency
+  next: rank                      # required; where to go after the map collects
+
+research_subject:                 # the branch node (atomic — runs N times)
+  id: research_subject
+  type: llm
+  prompt: "Research {{subject}} in depth."
+  state_updates:
+    output: "{{output}}"          # branch must write to `output_key`
+  # Note: no `next:` on a map branch; map branches are atomic
+```
+
+### Field reference
+
+- **`over`:** A template expression that must resolve to a JSON array of
+  items. Each item becomes one sub-branch invocation. If the array is
+  empty, the map runs zero sub-branches and writes `[]` to `collect_into`
+  (which is correct semantically. I.e. mapping over nothing yields nothing).
+- **`as`:** The state key under which each item is bound inside its
+  sub-branch. The branch node then references it via `{{<as>}}` in its
+  prompt / templates.
+- **`branch`:** The id of the node to invoke per item. The branch's
+  outputs across all sub-branches are aggregated into `collect_into`.
+- **`output_key`:** The state key each sub-branch must write its result
+  to. Default `"output"`. The map executor reads this from each
+  sub-branch's state, extracts the value, and inserts it into the
+  `collect_into` array.
+- **`collect_into`:** The state key in the **parent's** state that
+  receives the array of all sub-branch outputs, **in input-list order
+  regardless of which sub-branch finished first**. This is the map's
+  user-visible determinism contract.
+- **`max_concurrency`:** Per-map cap on concurrent sub-branches; falls
+  back to `settings.max_concurrency` (default 8) when unset. Must be
+  `>= 1` if set.
+- **`next`:** Required. Where the parent super-step continues after the
+  map collects.
+
+### Branch node constraints
+
+Map branches are **atomic**. One node, one execution per item, no
+chaining inside the branch. The validator enforces several rules at load
+time:
+
+- Branch type must be `llm`, `agent`, `rag`, or `script`. Approval, input,
+  end, or another map node as the branch is a load-time error.
+- Branch must **not** have `next:` declared. Anything chained off the
+  branch belongs after the map (downstream of the `collect_into` write).
+- Branch's `state_updates` keys must be a subset of `output_key`. The
+  branch can write to its own `output_key` and nowhere else.
+- Branch must **not** declare `output_schema`. Top-level schema-properties
+  would auto-merge into state, polluting it across N parallel branches.
+  Use explicit `state_updates: { output: "{{output}}" }` instead.
+
+If a sub-branch fails to write `output_key`, the map node errors loudly
+(no silent missing-output behavior).
+
+### Empty `over`
+
+If `over` resolves to `[]` (an empty array), the map invokes zero
+sub-branches and writes an empty array to `collect_into`. This is **not**
+an error. It's the correct semantic for "map over nothing yields
+nothing". If you want to reject empty `over` as an error, gate it with a
+script node upstream.
+
+### Nested map nodes are not supported in v1
+
+A map branch cannot itself be another map node. The validator rejects this
+at load time. If you need M * N parallel work, restructure to a single map
+over a flattened list.
+
+---
+
 ## end

 Terminates execution and returns a final result.
@@ -589,15 +706,16 @@ Nodes route via three mechanisms in priority order:

 ### Routing requirements per node type

-| Node type   | Needs `next`?                                                                                     |
-|-------------|---------------------------------------------------------------------------------------------------|
-| `agent`     | **Yes** - `next` is required (unless the agent node is unreachable). Error at runtime if missing. |
-| `script`    | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither.  |
-| `approval`  | No - routing is via `routes` and `on_other`. `next` is ignored.                                   |
-| `input`     | **Yes** - `next` is the success route.                                                            |
-| `llm`       | **Yes** - `next` is the success route (and the default for failures without `fallback`).          |
-| `rag`       | **Yes** - `next` is required. Error at runtime if missing.                                        |
-| `end`       | No - terminal.                                                                                    |
+| Node type   | Needs `next`?                                                                                     | Supports fan-out (`next: [...]`)? |
+|-------------|---------------------------------------------------------------------------------------------------|-----------------------------------|
+| `agent`     | **Yes** - `next` is required (unless the agent node is unreachable). Error at runtime if missing. | Yes                               |
+| `script`    | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither.  | Yes (when `_next` is not emitted) |
+| `approval`  | No - routing is via `routes` and `on_other`. `next` is ignored.                                   | No - forbidden by validator       |
+| `input`     | **Yes** - `next` is the success route.                                                            | No - forbidden by validator       |
+| `llm`       | **Yes** - `next` is the success route (and the default for failures without `fallback`).          | Yes (success path; failure with `fallback` routes to single target) |
+| `rag`       | **Yes** - `next` is required. Error at runtime if missing.                                        | Yes                               |
+| `map`       | **Yes** - `next` is where the parent super-step continues after the map collects.                 | Yes                               |
+| `end`       | No - terminal.                                                                                    | n/a                               |

 ### Tolerant-fail contract

@@ -613,6 +731,254 @@ their failures propagate as graph failures.

 ---

+# Parallel Execution
+
+Graph agents support two flavors of parallel execution:
+
+- **Static fan-out:** A node declares `next: [a, b, c]` to dispatch to
+  multiple branches at once.
+- **Dynamic fan-out:** A `map` node spawns one parallel sub-branch per
+  item in a runtime-determined list.
+
+Both share the same underlying execution model (a **super-step
+scheduler**) and the same merge primitive (**declared reducers**).
+
+## The super-step model
+
+The executor advances the graph in discrete **super-steps**. Each
+super-step starts with a **frontier** (a set of nodes to run), spawns all
+of them concurrently via `tokio::spawn`, waits for every branch to
+finish, merges their writes, and computes the next frontier from each
+branch's routing decision.
+
+```
+super-step 1: frontier = { triage }              -> triage runs alone
+super-step 2: frontier = { fetch_web, fetch_db } -> both run in parallel
+super-step 3: frontier = { synthesize }          -> single node, sequential
+super-step 4: frontier = { done }                -> terminal
+```
+
+This is the same Bulk Synchronous Parallel model that
+[LangGraph](https://langchain-ai.github.io/langgraph/) uses for the same
+problem. Each super-step is **transactional**: if any branch in a
+super-step errors, the entire super-step is rolled back. That means no partial
+writes leak into live state.
+
+## Static fan-out
+
+The simplest parallel pattern: write a list in `next:`.
+
+```yaml
+triage:
+  type: llm
+  prompt: "Decide which sources to consult."
+  next: [fetch_web, fetch_local, fetch_docs]   # three branches run in parallel
+```
+
+After `triage` completes, the next super-step runs `fetch_web`,
+`fetch_local`, and `fetch_docs` concurrently. Each is a normal node that
+does its work, writes to state, and routes to its own `next:`. When all
+three branches converge on the same target (e.g. `next: synthesize`), the
+join happens automatically: the executor dedups the next frontier, so
+`synthesize` runs once after all three predecessors finish.
+
+```yaml
+fetch_web:   { type: llm, prompt: "Search web for {{topic}}.", next: synthesize }
+fetch_local: { type: rag, documents: ["./knowledge/"], next: synthesize }
+fetch_docs:  { type: llm, prompt: "Cite docs for {{topic}}.", next: synthesize }
+
+synthesize:                          # runs once, after all three finish
+  type: llm
+  prompt: "Combine: {{web}}, {{local}}, {{docs}}"
+```
+
+**Which node types support static fan-out?** `agent`, `script`, `llm`,
+`rag`, and `map`. Approval and input nodes cannot fan out (the validator
+rejects them as immediate fan-out targets, meaning N concurrent user prompts
+would be unusable).
+
+## Dynamic fan-out via `map`
+
+When the number of branches depends on runtime data (the count of search
+results, the items returned by a script, etc.), use the `map` node type.
+See the [map node](#map) section above for a full reference. Briefly:
+
+```yaml
+fan_out_subjects:
+  type: map
+  over: "{{subjects}}"            # list determined at run-time
+  as: subject
+  branch: research_subject
+  collect_into: research_results  # array, in input order
+  next: rank
+```
+
+The map executor spawns one sub-branch per `subjects` item, each
+invoking `research_subject` with `{{subject}}` bound to the item.
+Outputs are collected into `research_results` **in input order**
+regardless of which sub-branch finished first.
+
+## Reducers
+
+When two or more parallel branches write to the same state key in the
+same super-step, the executor merges their writes through a declared
+**reducer**. Without a reducer for a contended key, the load-time
+validator errors with a clear message naming the conflicting writers.
+
+Declare reducers at the graph root:
+
+```yaml
+reducers:
+  sources: append      # see table below
+  cost_usd: sum
+  context: concat
+  config: merge
+```
+
+### Built-in reducers
+
+| Reducer     | Semantics                                                                 | Required types                          |
+|-------------|---------------------------------------------------------------------------|-----------------------------------------|
+| `append`    | Push each branch's value onto an array (creates `[v1]` from nothing, then appends `v2`, `v3`, …) | Any value type |
+| `extend`    | Concatenate arrays (`[a, b] + [c, d] = [a, b, c, d]`)                     | Both sides must be Array                |
+| `concat`    | Join strings with `\n` separator                                          | Both sides must be String               |
+| `sum`       | Numeric addition                                                          | Both sides must be Number               |
+| `max`       | Numeric max                                                               | Both sides must be Number               |
+| `min`       | Numeric min                                                               | Both sides must be Number               |
+| `merge`     | Object union (incoming wins on key collision)                             | Both sides must be Object               |
+| `overwrite` | Last write wins (explicit opt-in to non-deterministic behavior)           | Any value type                          |
+
+Integer typing is preserved through numeric reducers when possible; 
+e.g. `sum` of `5` and `7` stays `Number::i64(12)`, not `Number::f64(12.0)`.
+
+### Determinism
+
+The merge order is deterministic across runs: branches are sorted by
+`(node_id, invocation_index)` before reducers are applied. For static
+fan-out the node ids are distinct so the ordering is clear. For map
+sub-branches all sharing the same `branch:` node id, the
+`invocation_index` is the input-list position, so order matches the
+`over:` list.
+
+This means non-commutative reducers (`concat`, `merge`) produce
+reproducible output across runs.
+
+### Single-writer keys don't need reducers
+
+A state key written by only one branch in a super-step doesn't require a
+declaration. Only keys with 2+ writers in the same super-step trigger
+the "missing reducer" load-time error.
+
+### Validator-enforced
+
+When the graph is loaded, the validator computes each node's **write set**
+(`state_updates` keys unioned with the `output_schema` top-level properties for LLM /
+agent nodes) and intersects them across parallel groups. Any key with
+multiple writers and no reducer in the `reducers:` block is rejected at
+load time with a message naming all the writers:
+
+```
+nodes [retrieve_web, retrieve_docs] all write key 'summary' in the same
+parallel super-step but no reducer is declared for 'summary'. Add
+`reducers: { summary: <reducer> }` at the graph root, or rename one
+node's output.
+```
+
+## Branch isolation
+
+Each parallel branch runs against an **independent state fork**. Writes
+to one branch's state don't affect siblings. At the super-step
+boundary, the executor extracts each branch's writes (the diff between
+the branch's final state and the snapshot at super-step start) and
+merges them via the reducer pipeline.
+
+This means:
+
+- Branches can freely mutate `output`, internal counters, etc. without
+  worrying about siblings
+- Race conditions are impossible since there is no shared mutable state during
+  the parallel phase
+- Branches cannot communicate with each other mid-super-step. If you
+  need that pattern, sequentialize the work or write it as a single
+  multi-step subgraph
+
+## Multi-branch UX
+
+In a TTY:
+
+- Each parallel branch gets its own labeled spinner (e.g. `[fetch_web]
+  running… 2.3s`), drawn side-by-side via
+  [indicatif](https://docs.rs/indicatif/).
+- When a branch completes successfully, its spinner finalizes to
+  `✓ done (2.3s)`. On failure: `✗ failed (2.3s) — <error excerpt>`.
+- Map sub-branches are labeled `[<branch_id>[<idx>]]` so multiple
+  invocations of the same branch node stay distinguishable.
+
+In non-TTY mode (CI, piped output), spinners are suppressed entirely so
+no spinner garbage in captured logs.
+
+## Streaming behavior
+
+When a single node runs (sequential super-step, `frontier.len() == 1`),
+LLM tokens stream to stdout normally and thus the user sees them as they
+arrive.
+
+When multiple branches run concurrently (multi-node super-step or any
+map fan-out), **token streaming is suppressed across all parallel
+branches**. Tokens are still buffered internally so the full response
+reaches state via the normal `output_schema` / `state_updates` pathway;
+they just don't print mid-flight. This avoids the interleaved-tokens
+mess that would otherwise happen with N concurrent LLM streams writing
+to one stdout.
+
+After the super-step joins, downstream nodes resume normal streaming.
+
+## Error propagation
+
+Errors in a parallel super-step are transactional:
+
+- If any branch returns an error and isn't recovered by its node's
+  `fallback:`, the **entire super-step is aborted**. This means no partial writes
+  applied, and the error propagates up.
+- The error message includes the failing branch's node id, e.g.
+  `at node 'worker_b': simulated failure`.
+- Sibling branches that already completed are dropped (their writes are
+  discarded along with the failing branch's).
+
+This matches LangGraph's transactional super-step semantics.
+
+## Multi-End rejection
+
+If two or more parallel branches in the same super-step both route to
+End nodes, the executor errors:
+
+```
+super-step ended with multiple End targets (end_a, end_b). Fan-out
+branches must converge at a join node before terminating. To fix: route
+all parallel branches to a single shared next-node, then terminate from
+there.
+```
+
+This is intentional. Terminating from two places in one super-step
+almost always indicates a graph-design mistake (the user probably meant
+for both branches to feed into a shared end node).
+
+## Approval and input nodes are forbidden in fan-out
+
+The validator rejects any approval or input node that's an immediate
+fan-out target (`next: [approve, ...]` or as a map's `branch:`). The
+reason is UX: ten concurrent "type your answer" prompts would
+fundamentally not work in a CLI. Put approval / input nodes **after**
+the join, downstream of the merge.
+
+## What sequential graphs see
+
+If your graph has no fan-out and no map node, **nothing changes**. The
+super-step scheduler runs sequentially (one node per super-step).
+The parallel infrastructure is opt-in via `next: [...]` or a `map` node.
+
+---
+
 # Structured Output (`output_schema`)

 Both `llm` and `agent` nodes can specify an `output_schema` field: a JSON
@@ -797,6 +1163,25 @@ startup.
 - `llm` node referencing an unknown tool or `mcp:<server>` in its `tools`
  whitelist, or an unknown `model`. Validated against the agent's tool,
  MCP-server, and model sets
+- **Parallel write collision without a reducer**: two or more nodes in the
+  same parallel super-step (the immediate `next: [...]` targets of one
+  source node) write to the same state key, and the `reducers:` block has
+  no entry for that key. Error names all the colliding writers
+- **Approval or input as a fan-out target**: the immediate target of a
+  `next: [...]` is an `approval` or `input` node. These would queue N
+  concurrent user prompts, which is a UX disaster. So they're rejected at load time
+- **Map branch violates strict mode**: the validator enforces that a
+  `map` node's `branch:` target (1) is `llm`, `agent`, `rag`, or `script`
+  (not approval/input/end/another map); (2) has no `next:` declared;
+  (3) writes only to its `output_key` via `state_updates` (if any); (4)
+  has no `output_schema` declared
+- **Script in a parallel branch with no `state_updates`**: a script node
+  whose immediate predecessor is a fan-out has unknown writes from the
+  validator's perspective (scripts can emit arbitrary JSON to state).
+  Declare an empty `state_updates: {}` to acknowledge known writes, or
+  add explicit declarations
+- **`max_concurrency` set to 0**: either `settings.max_concurrency: 0` or
+  `MapNode.max_concurrency: 0` is rejected (would deadlock the semaphore)

 **Warnings (printed, execution continues)**:

@@ -933,6 +1318,36 @@ A short, honest list of things that bite people:
 - **Script extensions are exactly `.sh`, `.py`, `.ts`**. No JavaScript,
  no Ruby, no Lua. Python must be available as `python3` and TypeScript
  requires `npx tsx` on PATH.
+- **Token streaming is suppressed during parallel super-steps**.
+  Sequential graphs stream tokens to stdout normally; multi-branch
+  super-steps (`next: [...]` fan-out or any map fan-out) buffer tokens
+  silently and emit only the final outputs after the join. Avoids the
+  interleaved-tokens mess that would otherwise mix N LLM streams into
+  one stdout. Log lines (node entry / routing decisions) still appear,
+  just without the per-token output.
+- **Map sub-branches always run in Silent mode** even when there's only
+  one item in `over:`. If you want streaming visibility for a 1-element
+  map, restructure the graph to avoid the map entirely.
+- **`max_loop_iterations` doesn't count map sub-branches**. The
+  per-node visit cap fires for cycle / runaway-loop detection on graph
+  edges. A `map` over a 1,000-element `over:` runs the branch 1,000
+  times but increments the branch's loop count by zero. The `over:`
+  array bounds the iteration count, not the cycle detector.
+- **Map sub-branches' tool-call history doesn't accumulate** across
+  invocations. Each sub-branch gets a fresh `ToolCallTracker`. The
+  parent's tool-call history is unchanged after the map completes (the
+  branch RequestContext clones are discarded at super-step boundary).
+- **A reducer's type-mismatch error fires at runtime, not load time**.
+  (e.g. `sum` declared on a key but a branch writes a string). The
+  validator can't tell statically what type a node will write, so the
+  reducer apply phase errors loudly with a message like
+  `reducer 'sum' on key 'cost' requires numeric values, got String("forty two")`.
+- **`fallback:` distinguishes failure from success cleanly now**. LLM
+  nodes (and RAG via no-fallback default) return a typed outcome
+  (`Continue` for success, `FellBack(target)` for failure-with-fallback),
+  so a fan-out LLM with `fallback:` matching one of its `next:` targets
+  works correctly. The fallback path routes only to the fallback, not
+  all Many targets.

 ---