feat: created new graph-based deep-research agent

2026-05-21 11:27:55 -06:00
parent 738b600fa6
commit 4e88cebe28
13 changed files with 1037 additions and 0 deletions
@@ -0,0 +1,274 @@
+# deep-research
+
+A deep web research agent, built as a Loki graph agent. It plans an
+investigation, decomposes it into sub-questions researched in
+parallel, grounds the work in a local knowledge corpus, vets the
+credibility of cited sources, runs a reflexion self-critique loop to
+revise weak findings, delegates the final write-up to a focused
+sub-agent, checks that the cited sources are reachable, and gates the
+result behind human approval.
+
+Unlike a regular agent (which takes a goal and improvises the steps),
+this agent runs a fixed graph: every request goes through the same
+`plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve`
+pipeline.
+
+This agent is also the **canonical reference for the Loki graph
+system**: it exercises every node type (`script`, `llm`, `rag`, `map`,
+`agent`, `input`, `approval`, `end`) and both static fan-out and
+dynamic `map` fan-out. If you are learning how to build a graph
+agent, this is the file to read alongside the
+[Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents).
+
+## Workflow
+
+17 nodes. `->` is the static route; a script node can also route
+dynamically via `_next`. The `▶▶` line is a parallel super-step —
+those branches run concurrently:
+
+```
+parse_request (script)              -> bootstrap_research   (or -> ask_topic if no topic)
+ask_topic (input)                   -> bootstrap_research
+bootstrap_research (script)         -> [plan, knowledge_lookup]   ▶▶ parallel
+plan (llm + output_schema)          -> research_each_question
+knowledge_lookup (rag)              -> research_each_question
+research_each_question (map)        -> combine_findings    (spawns one branch per question)
+  └─ research_one_question (llm)    (atomic; runs N×, joins at map)
+combine_findings (script)           -> vet_sources
+vet_sources (llm + custom tool)     -> critique
+critique (llm)                      -> reflexion_gate
+reflexion_gate (script)             -> synthesize  (or -> research_each_question: reflexion loop)
+synthesize (agent: report-writer)   -> verify_sources
+verify_sources (script)             -> approve
+approve (approval)                  -> end_accepted          ("accept")
+                                    -> end_rejected          ("reject")
+                                    -> incorporate_feedback   (any free-form answer)
+incorporate_feedback (script)       -> research_each_question (the human-feedback loop)
+```
+
+### Node-type breakdown
+
+| Type | Nodes |
+|---|---|
+| `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` |
+| `llm` (tools: `[]`) | `plan`, `critique` |
+| `llm` (with tool whitelist) | `research_one_question`, `vet_sources` |
+| `rag` | `knowledge_lookup` — local corpus retrieval |
+| `map` | `research_each_question` — dynamic fan-out per sub-question |
+| `agent` | `synthesize` — spawns the `report-writer` sub-agent |
+| `input` | `ask_topic` |
+| `approval` | `approve` |
+| `end` | `end_accepted`, `end_rejected` |
+
+## Parallel execution
+
+The graph has two parallel super-steps where Loki's BSP scheduler runs
+branches concurrently.
+
+**1. Context loading (`plan` ‖ `knowledge_lookup`)** — after
+`bootstrap_research`, the LLM planner (which decomposes the topic into
+sub-questions) and the RAG retrieval over the local `knowledge/`
+corpus run side by side. They write disjoint state keys (`plan` writes
+`research_plan` and `questions`; `knowledge_lookup` writes
+`local_context` and `local_sources`) so no reducer is needed.
+
+**2. Per-question research (`research_each_question` map)** — the
+plan emits a `questions` array (3-5 entries, enforced by its
+`output_schema`). The `map` node spawns one parallel branch per
+question (`max_concurrency: 3`). Each branch is an isolated
+`research_one_question` LLM invocation with web tools, instructed to
+investigate exactly its assigned question. Outputs collect into
+`question_findings` in input order, then `combine_findings` joins
+them into a single `findings` Markdown document for downstream nodes.
+
+`settings.max_concurrency: 4` is the graph-wide cap; the per-`map`
+override (`max_concurrency: 3` on `research_each_question`) is
+deliberately lower to leave headroom for the planner's tool calls
+running alongside RAG.
+
+## Local knowledge corpus
+
+`knowledge_lookup` is a `rag` node — it runs hybrid (vector + keyword)
+retrieval over every file in `knowledge/`. The directory ships with a
+small `research-style-notes.md` so the RAG node has something to
+retrieve against on a clean install; drop your own Markdown notes,
+PDFs, or text files into `knowledge/` to bias the research toward
+your local context.
+
+The knowledge base is built once, at agent-load time, into
+`~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because
+the node fully specifies its build config (`embedding_model`,
+`chunk_size`, `chunk_overlap`), the build is non-interactive. Delete
+that cached file after adding or changing knowledge to force a
+rebuild.
+
+## Sub-agent: report-writer
+
+The `synthesize` node is an `agent` node that spawns the
+`report-writer` sub-agent (`assets/agents/report-writer/`). This is
+the agent-as-tool pattern: the orchestrating graph delegates the
+writing phase to a focused sub-agent dedicated to coherent prose,
+while the research phase uses different (typically cheaper) LLM nodes
+for fast-and-many-question investigation.
+
+The `report-writer` sub-agent has no tools — it cannot access the
+web, cannot search, and cannot invent facts. It reads only the
+findings it is given and produces a final Markdown report preserving
+every inline citation. See `assets/agents/report-writer/README.md`
+for details.
+
+## Tools and tool scoping
+
+This agent demonstrates Loki's three tool sources and how an `llm`
+node's `tools:` whitelist scopes them per node.
+
+The agent's full tool universe, declared in `graph.yaml`:
+
+- **Global tools** (`global_tools`): `web_search_loki`,
+  `fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts.
+- **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web
+  search MCP server. Referenced in a whitelist as `mcp:ddg-search`.
+- **Custom agent tool** (`tools.sh`): `classify_source` - a
+  deterministic source-credibility classifier shipped with this agent.
+
+No node receives all of these. Each `llm` node's `tools:` whitelist
+narrows the universe to exactly what that step needs:
+
+| Node | `tools:` whitelist | Draws from |
+|---|---|---|
+| `plan`, `critique` | `[]` | nothing - pure reasoning |
+| `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP |
+| `vet_sources` | `classify_source` | the custom tool only |
+
+`research_one_question` (each parallel branch of the map) can search
+and fetch but cannot classify sources; `vet_sources` can classify
+sources but cannot touch the web. That separation is the point of the
+`tools:` whitelist: a node gets only the tools its job calls for,
+never the agent's full set.
+
+The `classify_source` custom tool (`tools.sh`) takes a URL and returns
+a credibility tier (government, academic, preprint, organization,
+unverified) derived from the host and top-level domain. It is
+deterministic - exactly the kind of logic a tool should own rather than
+the LLM guessing.
+
+Web search may require API-key configuration; see the
+[Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs.
+`fetch_url_via_curl`, `search_arxiv`, and `classify_source` work
+without a key.
+
+## Setup
+
+`research_one_question` (each parallel branch of the `map`) uses the
+`ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's
+default MCP servers; make sure it is registered in
+`~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore
+the default template if it is missing). If `ddg-search` is unavailable,
+the branches still have their global web-search tools to fall back on.
+
+The `synthesize` node spawns the `report-writer` sub-agent. Both
+agents ship with `loki agents install`; if you install one manually,
+install both so the agent reference resolves.
+
+## Reflexion
+
+The agent has two loops, both built with script nodes that route via
+`_next`. The engine allows back-edges at runtime; the validator only
+rejects cycles built from static `next` / `routes` edges, so script
+`_next` loops are always allowed.
+
+**Automated reflexion loop.** After the parallel research map and
+`vet_sources`, the `critique` node reviews the merged findings
+against the research plan and the source credibility assessment, and
+emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback.
+`reflexion_gate.py` then:
+
+- `PASS` -> continue to `synthesize`.
+- `REVISE`, budget remaining -> loop back to `research_each_question`,
+  with the critique injected as `research_feedback` so every parallel
+  branch sees it on the retry.
+- `REVISE`, budget spent -> continue to `synthesize` anyway (the human
+  approval step is the final backstop).
+
+The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`
+(default 2, so the research map runs at most 3 times per pass).
+
+**Human-feedback loop.** At `approve` the user answers `accept`,
+`reject`, or types their own feedback. A free-form answer routes via
+the approval node's `on_other` to `incorporate_feedback.py`, which
+folds that text into `research_feedback` and loops back to
+`research_each_question` for another parallel pass.
+
+`settings.max_loop_iterations` (40) is the engine's infinite-loop
+backstop: it caps the total visits to any single node.
+
+## Running
+
+```sh
+loki agents install                  # ships deep-research
+loki -a deep-research "How does HTTP/3 differ from HTTP/2?"
+loki -a deep-research "Recent advances in solid-state batteries"
+loki -a deep-research                # no prompt -> triggers ask_topic
+```
+
+## Anti-hallucination
+
+- `research_one_question` (each map branch) is instructed to back
+  every claim with a real retrieved source and never to fabricate
+  URLs, titles, or DOIs.
+- `vet_sources` classifies every cited source so weak sources are
+  visible to the critique step.
+- `critique` independently reviews the merged findings and sends weak
+  or uncited work back for another parallel research pass.
+- `synthesize` (the `report-writer` sub-agent) is grounded: it may use
+  only the gathered findings and must keep each claim's inline source.
+  It has no tools and cannot browse the web.
+- `verify_sources` probes every cited URL / DOI with an HTTP HEAD
+  request and reports which are unreachable, so the human reviewer
+  sees broken citations before approving.
+
+## Customizing
+
+- **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`.
+- **Map concurrency.** The `research_each_question` node's
+  `max_concurrency: 3` caps simultaneous web-research branches.
+  Raise to investigate more questions in parallel; lower to be gentle
+  on rate-limited providers.
+- **Per-node model.** Add `model: anthropic:...` to any `llm` node.
+  Cheap models work well for `plan` / `critique` / `vet_sources`; the
+  heavy intelligence is needed in `research_one_question` and the
+  `report-writer` sub-agent.
+- **Tool scope.** Narrow the `research_one_question` node's `tools:`
+  list to constrain where each branch looks (for example, drop
+  `web_search_loki` and `mcp:ddg-search` to force arXiv-only
+  research).
+- **Local knowledge.** Drop files into `knowledge/` to bias every
+  research branch toward your local context (see the *Local
+  knowledge corpus* section above).
+- **Different writer.** Replace `agent: report-writer` on the
+  `synthesize` node with the name of any other agent. The
+  orchestrator does not care what kind of agent the writer is.
+- **Skip approval.** Point both `approve` routes at `end_accepted`,
+  or wire `verify_sources` straight to an `end` node.
+
+## Files
+
+```
+assets/agents/deep-research/
+  graph.yaml                    - agent config + 17-node workflow
+  tools.sh                      - classify_source custom tool
+  README.md                     - this file
+  knowledge/
+    README.md                   - corpus-format notes
+    research-style-notes.md     - starter knowledge file (replace with your notes)
+  scripts/
+    parse_request.py            - _next: bootstrap_research, or ask_topic if no topic
+    bootstrap_research.py       - fan-out source: next [plan, knowledge_lookup]
+    combine_findings.py         - joins map output (question_findings) into findings
+    reflexion_gate.py           - _next: research_each_question (revise) or synthesize
+    verify_sources.py           - HTTP HEAD on cited URLs / DOIs
+    incorporate_feedback.py     - _next: research_each_question, with user feedback
+```
+
+See also `assets/agents/report-writer/` — the sub-agent the
+`synthesize` node spawns.