diff --git a/Graph-Agents.md b/Graph-Agents.md index a33d6f5..c4f5d4e 100644 --- a/Graph-Agents.md +++ b/Graph-Agents.md @@ -11,6 +11,9 @@ Graph agents are best for workflows that: - Mix LLM calls with deterministic steps (scripts, user prompts) - Need explicit human-in-the-loop checkpoints - Benefit from per-step model / tool / temperature overrides +- **Fan out work across parallel branches:** Research three sources at once, + map an LLM call over a runtime-determined list of inputs, then merge results + back through declared reducers (see [Parallel Execution](#parallel-execution)) If you just want an agent that takes a goal and figures out the steps on its own, stick with a regular [agent](Agents). @@ -73,6 +76,11 @@ settings: log_state_snapshots: true # log state JSON before each node executes validate_before_run: true # run the graph validator on startup timeout: 600 # optional overall timeout in seconds + max_concurrency: 8 # cap on simultaneously running parallel branches; default 8 + +reducers: # optional; declares how parallel branches merge writes + results: append # see "Parallel Execution" below + cost_usd: sum initial_state: # optional seed state for the run topic: "auth" @@ -108,6 +116,16 @@ nodes: - **`initial_state`:** A JSON-compatible object. Values are seeded into state before any node runs and are referenced from any node via `{{key}}` templates. +- **`max_concurrency`:** Caps the number of branches that can execute + concurrently in any single parallel super-step. Default 8. Per-`map` nodes + can override this with their own `max_concurrency` field. The graph-wide + cap is a safety net against LLM rate-limit blowups when a `next: [a, b, c]` + fan-out is followed by deeper parallelism. Must be `>= 1`. +- **`reducers`:** Declares how state-key writes from concurrent branches are + combined. Required for any state key that two or more parallel branches + write in the same super-step (the validator catches missing reducers at + load time). See [Parallel Execution](#parallel-execution) for the full + reducer reference and merge semantics. ### `{{initial_prompt}}`: Automatically Seeded @@ -134,20 +152,26 @@ You do not need to (and should not) put `initial_prompt` in `initial_state` as i # Node Types -There are seven node types: **agent**, **script**, **approval**, **input**, -**llm**, **rag**, and **end**. Every node has these common fields: +There are eight node types: **agent**, **script**, **approval**, **input**, +**llm**, **rag**, **map**, and **end**. Every node has these common fields: ```yaml my_node: id: my_node # must match the map key - type: + type: description: optional # free-form next: another_node # optional default next node; semantics vary per type + # OR + next: [a, b, c] # fan out to multiple branches in parallel ``` The `next` field defines the default routing edge. Node types interpret it differently (some types ignore it in favor of internal routing; see each type -below). +below). **`next:` is polymorphic**: a single string (`next: foo`) routes +sequentially; a list (`next: [a, b, c]`) declares a static fan-out. Every +target runs as a parallel branch in the next super-step. Approval and input +nodes cannot use the list form (the validator rejects parallel user +prompts). See [Parallel Execution](#parallel-execution) for the full model. --- @@ -494,6 +518,99 @@ embedding/chunking cost is paid once, at load time. --- +## map + +Dynamic fan-out over a runtime-determined list. The `map` node reads a JSON +array out of state (via the `over:` template), spawns one parallel +sub-branch per item invoking the same branch node, and collects each +sub-branch's output into a state array in input order. + +This is how a graph agent runs "research these N subjects in parallel" when +N is determined at run-time. For a fixed-cardinality fan-out (you know the +exact set of branches at YAML-write time), use `next: [a, b, c]` on the +preceding node instead. + +```yaml +fan_out_subjects: + id: fan_out_subjects + type: map + over: "{{subjects}}" # required; template must resolve to a JSON array + as: subject # required; bound name inside each sub-branch + branch: research_subject # required; node id to invoke once per item + output_key: output # optional; state key the branch writes (default "output") + collect_into: research_results # required; state key to receive the array of outputs + max_concurrency: 5 # optional; overrides settings.max_concurrency + next: rank # required; where to go after the map collects + +research_subject: # the branch node (atomic — runs N times) + id: research_subject + type: llm + prompt: "Research {{subject}} in depth." + state_updates: + output: "{{output}}" # branch must write to `output_key` + # Note: no `next:` on a map branch; map branches are atomic +``` + +### Field reference + +- **`over`:** A template expression that must resolve to a JSON array of + items. Each item becomes one sub-branch invocation. If the array is + empty, the map runs zero sub-branches and writes `[]` to `collect_into` + (which is correct semantically. I.e. mapping over nothing yields nothing). +- **`as`:** The state key under which each item is bound inside its + sub-branch. The branch node then references it via `{{}}` in its + prompt / templates. +- **`branch`:** The id of the node to invoke per item. The branch's + outputs across all sub-branches are aggregated into `collect_into`. +- **`output_key`:** The state key each sub-branch must write its result + to. Default `"output"`. The map executor reads this from each + sub-branch's state, extracts the value, and inserts it into the + `collect_into` array. +- **`collect_into`:** The state key in the **parent's** state that + receives the array of all sub-branch outputs, **in input-list order + regardless of which sub-branch finished first**. This is the map's + user-visible determinism contract. +- **`max_concurrency`:** Per-map cap on concurrent sub-branches; falls + back to `settings.max_concurrency` (default 8) when unset. Must be + `>= 1` if set. +- **`next`:** Required. Where the parent super-step continues after the + map collects. + +### Branch node constraints + +Map branches are **atomic**. One node, one execution per item, no +chaining inside the branch. The validator enforces several rules at load +time: + +- Branch type must be `llm`, `agent`, `rag`, or `script`. Approval, input, + end, or another map node as the branch is a load-time error. +- Branch must **not** have `next:` declared. Anything chained off the + branch belongs after the map (downstream of the `collect_into` write). +- Branch's `state_updates` keys must be a subset of `output_key`. The + branch can write to its own `output_key` and nowhere else. +- Branch must **not** declare `output_schema`. Top-level schema-properties + would auto-merge into state, polluting it across N parallel branches. + Use explicit `state_updates: { output: "{{output}}" }` instead. + +If a sub-branch fails to write `output_key`, the map node errors loudly +(no silent missing-output behavior). + +### Empty `over` + +If `over` resolves to `[]` (an empty array), the map invokes zero +sub-branches and writes an empty array to `collect_into`. This is **not** +an error. It's the correct semantic for "map over nothing yields +nothing". If you want to reject empty `over` as an error, gate it with a +script node upstream. + +### Nested map nodes are not supported in v1 + +A map branch cannot itself be another map node. The validator rejects this +at load time. If you need M * N parallel work, restructure to a single map +over a flattened list. + +--- + ## end Terminates execution and returns a final result. @@ -589,15 +706,16 @@ Nodes route via three mechanisms in priority order: ### Routing requirements per node type -| Node type | Needs `next`? | -|-------------|---------------------------------------------------------------------------------------------------| -| `agent` | **Yes** - `next` is required (unless the agent node is unreachable). Error at runtime if missing. | -| `script` | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither. | -| `approval` | No - routing is via `routes` and `on_other`. `next` is ignored. | -| `input` | **Yes** - `next` is the success route. | -| `llm` | **Yes** - `next` is the success route (and the default for failures without `fallback`). | -| `rag` | **Yes** - `next` is required. Error at runtime if missing. | -| `end` | No - terminal. | +| Node type | Needs `next`? | Supports fan-out (`next: [...]`)? | +|-------------|---------------------------------------------------------------------------------------------------|-----------------------------------| +| `agent` | **Yes** - `next` is required (unless the agent node is unreachable). Error at runtime if missing. | Yes | +| `script` | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither. | Yes (when `_next` is not emitted) | +| `approval` | No - routing is via `routes` and `on_other`. `next` is ignored. | No - forbidden by validator | +| `input` | **Yes** - `next` is the success route. | No - forbidden by validator | +| `llm` | **Yes** - `next` is the success route (and the default for failures without `fallback`). | Yes (success path; failure with `fallback` routes to single target) | +| `rag` | **Yes** - `next` is required. Error at runtime if missing. | Yes | +| `map` | **Yes** - `next` is where the parent super-step continues after the map collects. | Yes | +| `end` | No - terminal. | n/a | ### Tolerant-fail contract @@ -613,6 +731,254 @@ their failures propagate as graph failures. --- +# Parallel Execution + +Graph agents support two flavors of parallel execution: + +- **Static fan-out:** A node declares `next: [a, b, c]` to dispatch to + multiple branches at once. +- **Dynamic fan-out:** A `map` node spawns one parallel sub-branch per + item in a runtime-determined list. + +Both share the same underlying execution model (a **super-step +scheduler**) and the same merge primitive (**declared reducers**). + +## The super-step model + +The executor advances the graph in discrete **super-steps**. Each +super-step starts with a **frontier** (a set of nodes to run), spawns all +of them concurrently via `tokio::spawn`, waits for every branch to +finish, merges their writes, and computes the next frontier from each +branch's routing decision. + +``` +super-step 1: frontier = { triage } -> triage runs alone +super-step 2: frontier = { fetch_web, fetch_db } -> both run in parallel +super-step 3: frontier = { synthesize } -> single node, sequential +super-step 4: frontier = { done } -> terminal +``` + +This is the same Bulk Synchronous Parallel model that +[LangGraph](https://langchain-ai.github.io/langgraph/) uses for the same +problem. Each super-step is **transactional**: if any branch in a +super-step errors, the entire super-step is rolled back. That means no partial +writes leak into live state. + +## Static fan-out + +The simplest parallel pattern: write a list in `next:`. + +```yaml +triage: + type: llm + prompt: "Decide which sources to consult." + next: [fetch_web, fetch_local, fetch_docs] # three branches run in parallel +``` + +After `triage` completes, the next super-step runs `fetch_web`, +`fetch_local`, and `fetch_docs` concurrently. Each is a normal node that +does its work, writes to state, and routes to its own `next:`. When all +three branches converge on the same target (e.g. `next: synthesize`), the +join happens automatically: the executor dedups the next frontier, so +`synthesize` runs once after all three predecessors finish. + +```yaml +fetch_web: { type: llm, prompt: "Search web for {{topic}}.", next: synthesize } +fetch_local: { type: rag, documents: ["./knowledge/"], next: synthesize } +fetch_docs: { type: llm, prompt: "Cite docs for {{topic}}.", next: synthesize } + +synthesize: # runs once, after all three finish + type: llm + prompt: "Combine: {{web}}, {{local}}, {{docs}}" +``` + +**Which node types support static fan-out?** `agent`, `script`, `llm`, +`rag`, and `map`. Approval and input nodes cannot fan out (the validator +rejects them as immediate fan-out targets, meaning N concurrent user prompts +would be unusable). + +## Dynamic fan-out via `map` + +When the number of branches depends on runtime data (the count of search +results, the items returned by a script, etc.), use the `map` node type. +See the [map node](#map) section above for a full reference. Briefly: + +```yaml +fan_out_subjects: + type: map + over: "{{subjects}}" # list determined at run-time + as: subject + branch: research_subject + collect_into: research_results # array, in input order + next: rank +``` + +The map executor spawns one sub-branch per `subjects` item, each +invoking `research_subject` with `{{subject}}` bound to the item. +Outputs are collected into `research_results` **in input order** +regardless of which sub-branch finished first. + +## Reducers + +When two or more parallel branches write to the same state key in the +same super-step, the executor merges their writes through a declared +**reducer**. Without a reducer for a contended key, the load-time +validator errors with a clear message naming the conflicting writers. + +Declare reducers at the graph root: + +```yaml +reducers: + sources: append # see table below + cost_usd: sum + context: concat + config: merge +``` + +### Built-in reducers + +| Reducer | Semantics | Required types | +|-------------|---------------------------------------------------------------------------|-----------------------------------------| +| `append` | Push each branch's value onto an array (creates `[v1]` from nothing, then appends `v2`, `v3`, …) | Any value type | +| `extend` | Concatenate arrays (`[a, b] + [c, d] = [a, b, c, d]`) | Both sides must be Array | +| `concat` | Join strings with `\n` separator | Both sides must be String | +| `sum` | Numeric addition | Both sides must be Number | +| `max` | Numeric max | Both sides must be Number | +| `min` | Numeric min | Both sides must be Number | +| `merge` | Object union (incoming wins on key collision) | Both sides must be Object | +| `overwrite` | Last write wins (explicit opt-in to non-deterministic behavior) | Any value type | + +Integer typing is preserved through numeric reducers when possible; +e.g. `sum` of `5` and `7` stays `Number::i64(12)`, not `Number::f64(12.0)`. + +### Determinism + +The merge order is deterministic across runs: branches are sorted by +`(node_id, invocation_index)` before reducers are applied. For static +fan-out the node ids are distinct so the ordering is clear. For map +sub-branches all sharing the same `branch:` node id, the +`invocation_index` is the input-list position, so order matches the +`over:` list. + +This means non-commutative reducers (`concat`, `merge`) produce +reproducible output across runs. + +### Single-writer keys don't need reducers + +A state key written by only one branch in a super-step doesn't require a +declaration. Only keys with 2+ writers in the same super-step trigger +the "missing reducer" load-time error. + +### Validator-enforced + +When the graph is loaded, the validator computes each node's **write set** +(`state_updates` keys unioned with the `output_schema` top-level properties for LLM / +agent nodes) and intersects them across parallel groups. Any key with +multiple writers and no reducer in the `reducers:` block is rejected at +load time with a message naming all the writers: + +``` +nodes [retrieve_web, retrieve_docs] all write key 'summary' in the same +parallel super-step but no reducer is declared for 'summary'. Add +`reducers: { summary: }` at the graph root, or rename one +node's output. +``` + +## Branch isolation + +Each parallel branch runs against an **independent state fork**. Writes +to one branch's state don't affect siblings. At the super-step +boundary, the executor extracts each branch's writes (the diff between +the branch's final state and the snapshot at super-step start) and +merges them via the reducer pipeline. + +This means: + +- Branches can freely mutate `output`, internal counters, etc. without + worrying about siblings +- Race conditions are impossible since there is no shared mutable state during + the parallel phase +- Branches cannot communicate with each other mid-super-step. If you + need that pattern, sequentialize the work or write it as a single + multi-step subgraph + +## Multi-branch UX + +In a TTY: + +- Each parallel branch gets its own labeled spinner (e.g. `[fetch_web] + running… 2.3s`), drawn side-by-side via + [indicatif](https://docs.rs/indicatif/). +- When a branch completes successfully, its spinner finalizes to + `✓ done (2.3s)`. On failure: `✗ failed (2.3s) — `. +- Map sub-branches are labeled `[[]]` so multiple + invocations of the same branch node stay distinguishable. + +In non-TTY mode (CI, piped output), spinners are suppressed entirely so +no spinner garbage in captured logs. + +## Streaming behavior + +When a single node runs (sequential super-step, `frontier.len() == 1`), +LLM tokens stream to stdout normally and thus the user sees them as they +arrive. + +When multiple branches run concurrently (multi-node super-step or any +map fan-out), **token streaming is suppressed across all parallel +branches**. Tokens are still buffered internally so the full response +reaches state via the normal `output_schema` / `state_updates` pathway; +they just don't print mid-flight. This avoids the interleaved-tokens +mess that would otherwise happen with N concurrent LLM streams writing +to one stdout. + +After the super-step joins, downstream nodes resume normal streaming. + +## Error propagation + +Errors in a parallel super-step are transactional: + +- If any branch returns an error and isn't recovered by its node's + `fallback:`, the **entire super-step is aborted**. This means no partial writes + applied, and the error propagates up. +- The error message includes the failing branch's node id, e.g. + `at node 'worker_b': simulated failure`. +- Sibling branches that already completed are dropped (their writes are + discarded along with the failing branch's). + +This matches LangGraph's transactional super-step semantics. + +## Multi-End rejection + +If two or more parallel branches in the same super-step both route to +End nodes, the executor errors: + +``` +super-step ended with multiple End targets (end_a, end_b). Fan-out +branches must converge at a join node before terminating. To fix: route +all parallel branches to a single shared next-node, then terminate from +there. +``` + +This is intentional. Terminating from two places in one super-step +almost always indicates a graph-design mistake (the user probably meant +for both branches to feed into a shared end node). + +## Approval and input nodes are forbidden in fan-out + +The validator rejects any approval or input node that's an immediate +fan-out target (`next: [approve, ...]` or as a map's `branch:`). The +reason is UX: ten concurrent "type your answer" prompts would +fundamentally not work in a CLI. Put approval / input nodes **after** +the join, downstream of the merge. + +## What sequential graphs see + +If your graph has no fan-out and no map node, **nothing changes**. The +super-step scheduler runs sequentially (one node per super-step). +The parallel infrastructure is opt-in via `next: [...]` or a `map` node. + +--- + # Structured Output (`output_schema`) Both `llm` and `agent` nodes can specify an `output_schema` field: a JSON @@ -797,6 +1163,25 @@ startup. - `llm` node referencing an unknown tool or `mcp:` in its `tools` whitelist, or an unknown `model`. Validated against the agent's tool, MCP-server, and model sets +- **Parallel write collision without a reducer**: two or more nodes in the + same parallel super-step (the immediate `next: [...]` targets of one + source node) write to the same state key, and the `reducers:` block has + no entry for that key. Error names all the colliding writers +- **Approval or input as a fan-out target**: the immediate target of a + `next: [...]` is an `approval` or `input` node. These would queue N + concurrent user prompts, which is a UX disaster. So they're rejected at load time +- **Map branch violates strict mode**: the validator enforces that a + `map` node's `branch:` target (1) is `llm`, `agent`, `rag`, or `script` + (not approval/input/end/another map); (2) has no `next:` declared; + (3) writes only to its `output_key` via `state_updates` (if any); (4) + has no `output_schema` declared +- **Script in a parallel branch with no `state_updates`**: a script node + whose immediate predecessor is a fan-out has unknown writes from the + validator's perspective (scripts can emit arbitrary JSON to state). + Declare an empty `state_updates: {}` to acknowledge known writes, or + add explicit declarations +- **`max_concurrency` set to 0**: either `settings.max_concurrency: 0` or + `MapNode.max_concurrency: 0` is rejected (would deadlock the semaphore) **Warnings (printed, execution continues)**: @@ -933,6 +1318,36 @@ A short, honest list of things that bite people: - **Script extensions are exactly `.sh`, `.py`, `.ts`**. No JavaScript, no Ruby, no Lua. Python must be available as `python3` and TypeScript requires `npx tsx` on PATH. +- **Token streaming is suppressed during parallel super-steps**. + Sequential graphs stream tokens to stdout normally; multi-branch + super-steps (`next: [...]` fan-out or any map fan-out) buffer tokens + silently and emit only the final outputs after the join. Avoids the + interleaved-tokens mess that would otherwise mix N LLM streams into + one stdout. Log lines (node entry / routing decisions) still appear, + just without the per-token output. +- **Map sub-branches always run in Silent mode** even when there's only + one item in `over:`. If you want streaming visibility for a 1-element + map, restructure the graph to avoid the map entirely. +- **`max_loop_iterations` doesn't count map sub-branches**. The + per-node visit cap fires for cycle / runaway-loop detection on graph + edges. A `map` over a 1,000-element `over:` runs the branch 1,000 + times but increments the branch's loop count by zero. The `over:` + array bounds the iteration count, not the cycle detector. +- **Map sub-branches' tool-call history doesn't accumulate** across + invocations. Each sub-branch gets a fresh `ToolCallTracker`. The + parent's tool-call history is unchanged after the map completes (the + branch RequestContext clones are discarded at super-step boundary). +- **A reducer's type-mismatch error fires at runtime, not load time**. + (e.g. `sum` declared on a key but a branch writes a string). The + validator can't tell statically what type a node will write, so the + reducer apply phase errors loudly with a message like + `reducer 'sum' on key 'cost' requires numeric values, got String("forty two")`. +- **`fallback:` distinguishes failure from success cleanly now**. LLM + nodes (and RAG via no-fallback default) return a typed outcome + (`Continue` for success, `FellBack(target)` for failure-with-fallback), + so a fan-out LLM with `fallback:` matching one of its `next:` targets + works correctly. The fallback path routes only to the fallback, not + all Many targets. ---