docs: updated graph agent docs to reflect improved llm node failure behavior

2026-05-22 18:33:35 -06:00
parent d64231280c
commit 687b0abb32
+25 -21
@@ -428,17 +428,18 @@ whitelist is enforced against global tools, agent custom tools, and MCP
alike. Each entry is validated at startup against the active agent's tool
list; an unknown entry is a startup error.
### Tolerant-fail routing
### Failure routing
| Outcome | Routes to |
|------------------------------------------|----------------------------|
| Success | `next` |
| Failure WITH `fallback` set | `fallback` |
| Failure WITHOUT `fallback` | `next` (output is "LLM node failed: ...") |
| Outcome | Behavior |
|------------------------------------------|-------------------------------------------------------------------------------------------|
| Success | Routes via `next`. |
| Failure with `fallback` set | Routes via `fallback`; `state_updates` are still applied; `{{output}}` holds the error. |
| Failure without `fallback` | **Graph fails at this node** with a clear error message naming the underlying cause. |
`state_updates` are always applied (success or failure). On failure,
`{{output}}` resolves to an error description so downstream nodes can detect
it.
`state_updates` are always applied when the node has a `fallback` route
(success or failure). On failure with no `fallback`, the graph aborts before
downstream nodes run, so downstream `{{output}}` references never see error
strings; the upstream cause is reported instead.
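
The three outcomes in the new table can be sketched as a small decision function. This is an illustrative model only, not the engine's code: `route_llm_node`, `Routing`, and `GraphFailure` are hypothetical names, and the exact text of the error placed in `{{output}}` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


class GraphFailure(Exception):
    """Hypothetical error type: the graph aborts at the failing node."""


@dataclass
class Routing:
    target: str   # node the graph continues at
    output: str   # value that {{output}} would resolve to


def route_llm_node(node_id: str, next_target: str, fallback: Optional[str],
                   result: Optional[str] = None,
                   error: Optional[str] = None) -> Routing:
    """Model the documented routing for a single LLM node execution."""
    if error is None:
        # Success: route via `next`.
        return Routing(next_target, result or "")
    if fallback is not None:
        # Failure with `fallback`: route there; {{output}} holds the error.
        # (state_updates would still be applied at this point.)
        return Routing(fallback, error)
    # Failure without `fallback`: abort before any downstream node runs,
    # naming the node and the underlying cause.
    raise GraphFailure(f"node '{node_id}' failed: {error}")
```

In this model a downstream node can only ever see an error string in `{{output}}` when the failing node declared a `fallback`, which is the property the new behavior is meant to guarantee.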
### Retries (`max_attempts`)
@@ -746,22 +747,24 @@ Nodes route via three mechanisms in priority order:
| `script` | Either `_next` from script output OR static `next` (or `fallback` on failure). Error if neither. | Yes (when `_next` is not emitted) |
| `approval` | No - routing is via `routes` and `on_other`. `next` is ignored. | No - forbidden by validator |
| `input` | **Yes** - `next` is the success route. | No - forbidden by validator |
| `llm` | **Yes** - `next` is the success route (and the default for failures without `fallback`). | Yes (success path; failure with `fallback` routes to single target) |
| `llm` | **Yes** - `next` is the success route. Failures without `fallback` halt the graph. | Yes (success path; failure with `fallback` routes to single target) |
| `rag` | **Yes** - `next` is required. Error at runtime if missing. | Yes |
| `map` | **Yes** - `next` is where the parent super-step continues after the map collects. | Yes |
| `end` | No - terminal. | n/a |
### Tolerant-fail contract
### Failure-handling contract
Currently honored by `script` and `llm` nodes:
| Node type | Success | Failure with `fallback` | Failure without `fallback` |
|----------------------------|-----------------------------|-------------------------|-----------------------------------------------------------|
| `llm` | Routes via `next` | Routes via `fallback` | **Graph fails at this node** (use `fallback:` to recover) |
| `script` | Routes via `next` | Routes via `fallback` | Routes via `next`; `{{output}}` holds the error |
| `agent` / `input` | Routes via `next` | n/a (no `fallback`) | Graph fails at this node |
| `rag` / `map` / `approval` | Routes via configured edges | n/a | Graph fails at this node |
- Success -> default routing
- Failure with `fallback` set -> `fallback` target
- Failure without `fallback` -> default routing, with the error description
exposed in state so the next node can react
`agent` and `input` nodes do NOT have a tolerant-fail `fallback` path;
their failures propagate as graph failures.
**Note:** LLM node failures without a `fallback` halt the graph with a
clear error message. This prevents downstream nodes from running against
garbage state when an upstream LLM call fails (HTTP 4xx/5xx, timeout,
structured-extraction error, etc.).
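
As a quick reference, the per-node-type contract in the table can be written as a lookup. This is a hedged sketch: `failure_outcome` and the outcome labels are hypothetical, and treating `fallback` on a node type without a fallback path as a caller error is a choice made for this example, not a statement about how the validator behaves.

```python
# Node types that have a `fallback` path, per the contract table.
FALLBACK_CAPABLE = {"llm", "script"}

# What happens on failure when no `fallback` is declared.
WITHOUT_FALLBACK = {
    "llm": "graph_fails",
    "script": "route:next",   # {{output}} holds the error
    "agent": "graph_fails",
    "input": "graph_fails",
    "rag": "graph_fails",
    "map": "graph_fails",
    "approval": "graph_fails",
}


def failure_outcome(node_type: str, has_fallback: bool) -> str:
    """Return the documented failure outcome for a node type."""
    if node_type not in WITHOUT_FALLBACK:
        raise KeyError(f"unknown node type: {node_type}")
    if has_fallback:
        if node_type not in FALLBACK_CAPABLE:
            # Assumption for this sketch only: reject the combination.
            raise ValueError(f"'{node_type}' nodes have no fallback path")
        return "route:fallback"
    return WITHOUT_FALLBACK[node_type]
```

The only node type that still continues via `next` after a failure without `fallback` is `script`; every other type stops the graph.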
---
@@ -1127,8 +1130,9 @@ Schema:
### Tolerant-fail for extraction
- **LLM node**: extraction failure = node failure -> routes via `fallback`
or `next`.
- **LLM node**: extraction failure = node failure. Routes via `fallback`
if declared; otherwise the graph fails at this node with the extractor
error message.
- **Agent node**: extraction failure propagates as a graph error (agent
nodes have no `fallback`).