coyote/assets/agents/deep-research/README.md

# deep-research

A deep web research agent, built as a Loki graph agent. It plans an
investigation, decomposes it into sub-questions researched in
parallel, grounds the work in a local knowledge corpus, vets the
credibility of cited sources, runs a reflexion self-critique loop to
revise weak findings, delegates the final write-up to a focused
sub-agent, checks that the cited sources are reachable, and gates the
result behind human approval.

Unlike a regular agent (which takes a goal and improvises the steps),
this agent runs a fixed graph: every request goes through the same
`plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve`
pipeline.

This agent is also the **canonical reference for the Loki graph
system**: it exercises every node type (`script`, `llm`, `rag`, `map`,
`agent`, `input`, `approval`, `end`) and both static fan-out and
dynamic `map` fan-out. If you are learning how to build a graph
agent, this is the file to read alongside the
[Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents).

## Workflow

17 nodes. `->` is the static route; a script node can also route
dynamically via `_next`. The `▶▶` line is a parallel super-step —
those branches run concurrently:

```
parse_request (script)              -> bootstrap_research   (or -> ask_topic if no topic)
ask_topic (input)                   -> bootstrap_research
bootstrap_research (script)         -> [plan, knowledge_lookup]   ▶▶ parallel
plan (llm + output_schema)          -> research_each_question
knowledge_lookup (rag)              -> research_each_question
research_each_question (map)        -> combine_findings    (spawns one branch per question)
  └─ research_one_question (llm)    (atomic; runs N×, joins at map)
combine_findings (script)           -> vet_sources
vet_sources (llm + custom tool)     -> critique
critique (llm)                      -> reflexion_gate
reflexion_gate (script)             -> synthesize  (or -> research_each_question: reflexion loop)
synthesize (agent: report-writer)   -> verify_sources
verify_sources (script)             -> approve
approve (approval)                  -> end_accepted          ("accept")
                                    -> end_rejected          ("reject")
                                    -> incorporate_feedback   (any free-form answer)
incorporate_feedback (script)       -> research_each_question (the human-feedback loop)
```

### Node-type breakdown

| Type | Nodes |
|---|---|
| `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` |
| `llm` (tools: `[]`) | `plan`, `critique` |
| `llm` (with tool whitelist) | `research_one_question`, `vet_sources` |
| `rag` | `knowledge_lookup` — local corpus retrieval |
| `map` | `research_each_question` — dynamic fan-out per sub-question |
| `agent` | `synthesize` — spawns the `report-writer` sub-agent |
| `input` | `ask_topic` |
| `approval` | `approve` |
| `end` | `end_accepted`, `end_rejected` |

## Parallel execution

The graph has two parallel super-steps where Loki's BSP scheduler runs
branches concurrently.

**1. Context loading (`plan` ‖ `knowledge_lookup`)** — after
`bootstrap_research`, the LLM planner (which decomposes the topic into
sub-questions) and the RAG retrieval over the local `knowledge/`
corpus run side by side. They write disjoint state keys (`plan` writes
`research_plan` and `questions`; `knowledge_lookup` writes
`local_context` and `local_sources`) so no reducer is needed.

**2. Per-question research (`research_each_question` map)** — the
plan emits a `questions` array (3-5 entries, enforced by its
`output_schema`). The `map` node spawns one parallel branch per
question (`max_concurrency: 3`). Each branch is an isolated
`research_one_question` LLM invocation with web tools, instructed to
investigate exactly its assigned question. Outputs collect into
`question_findings` in input order, then `combine_findings` joins
them into a single `findings` Markdown document for downstream nodes.

`settings.max_concurrency: 4` is the graph-wide cap; the per-`map`
override (`max_concurrency: 3` on `research_each_question`) is
deliberately lower to leave headroom for the planner's tool calls
running alongside RAG.

## Local knowledge corpus

`knowledge_lookup` is a `rag` node — it runs hybrid (vector + keyword)
retrieval over every file in `knowledge/`. The directory ships with a
small `research-style-notes.md` so the RAG node has something to
retrieve against on a clean install; drop your own Markdown notes,
PDFs, or text files into `knowledge/` to bias the research toward
your local context.

The knowledge base is built once, at agent-load time, into
`~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because
the node fully specifies its build config (`embedding_model`,
`chunk_size`, `chunk_overlap`), the build is non-interactive. Delete
that cached file after adding or changing knowledge to force a
rebuild.

## Sub-agent: report-writer

The `synthesize` node is an `agent` node that spawns the
`report-writer` sub-agent (`assets/agents/report-writer/`). This is
the agent-as-tool pattern: the orchestrating graph delegates the
writing phase to a focused sub-agent dedicated to coherent prose,
while the research phase uses different (typically cheaper) LLM nodes
for fast-and-many-question investigation.

The `report-writer` sub-agent has no tools — it cannot access the
web, cannot search, and cannot invent facts. It reads only the
findings it is given and produces a final Markdown report preserving
every inline citation. See `assets/agents/report-writer/README.md`
for details.

## Tools and tool scoping

This agent demonstrates Loki's three tool sources and how an `llm`
node's `tools:` whitelist scopes them per node.

The agent's full tool universe, declared in `graph.yaml`:

- **Global tools** (`global_tools`): `web_search_loki`,
  `fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts.
- **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web
  search MCP server. Referenced in a whitelist as `mcp:ddg-search`.
- **Custom agent tool** (`tools.sh`): `classify_source` - a
  deterministic source-credibility classifier shipped with this agent.

No node receives all of these. Each `llm` node's `tools:` whitelist
narrows the universe to exactly what that step needs:

| Node | `tools:` whitelist | Draws from |
|---|---|---|
| `plan`, `critique` | `[]` | nothing - pure reasoning |
| `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP |
| `vet_sources` | `classify_source` | the custom tool only |

`research_one_question` (each parallel branch of the map) can search
and fetch but cannot classify sources; `vet_sources` can classify
sources but cannot touch the web. That separation is the point of the
`tools:` whitelist: a node gets only the tools its job calls for,
never the agent's full set.

The `classify_source` custom tool (`tools.sh`) takes a URL and returns
a credibility tier (government, academic, preprint, organization,
unverified) derived from the host and top-level domain. It is
deterministic - exactly the kind of logic a tool should own rather than
the LLM guessing.

Web search may require API-key configuration; see the
[Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs.
`fetch_url_via_curl`, `search_arxiv`, and `classify_source` work
without a key.

## Setup

`research_one_question` (each parallel branch of the `map`) uses the
`ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's
default MCP servers; make sure it is registered in
`~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore
the default template if it is missing). If `ddg-search` is unavailable,
the branches still have their global web-search tools to fall back on.

The `synthesize` node spawns the `report-writer` sub-agent. Both
agents ship with `loki agents install`; if you install one manually,
install both so the agent reference resolves.

## Reflexion

The agent has two loops, both built with script nodes that route via
`_next`. The engine allows back-edges at runtime; the validator only
rejects cycles built from static `next` / `routes` edges, so script
`_next` loops are always allowed.

**Automated reflexion loop.** After the parallel research map and
`vet_sources`, the `critique` node reviews the merged findings
against the research plan and the source credibility assessment, and
emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback.
`reflexion_gate.py` then:

- `PASS` -> continue to `synthesize`.
- `REVISE`, budget remaining -> loop back to `research_each_question`,
  with the critique injected as `research_feedback` so every parallel
  branch sees it on the retry.
- `REVISE`, budget spent -> continue to `synthesize` anyway (the human
  approval step is the final backstop).

The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`
(default 2, so the research map runs at most 3 times per pass).

**Human-feedback loop.** At `approve` the user answers `accept`,
`reject`, or types their own feedback. A free-form answer routes via
the approval node's `on_other` to `incorporate_feedback.py`, which
folds that text into `research_feedback` and loops back to
`research_each_question` for another parallel pass.

`settings.max_loop_iterations` (40) is the engine's infinite-loop
backstop: it caps the total visits to any single node.

## Running

```sh
loki agents install                  # ships deep-research
loki -a deep-research "How does HTTP/3 differ from HTTP/2?"
loki -a deep-research "Recent advances in solid-state batteries"
loki -a deep-research                # no prompt -> triggers ask_topic
```

## Anti-hallucination

- `research_one_question` (each map branch) is instructed to back
  every claim with a real retrieved source and never to fabricate
  URLs, titles, or DOIs.
- `vet_sources` classifies every cited source so weak sources are
  visible to the critique step.
- `critique` independently reviews the merged findings and sends weak
  or uncited work back for another parallel research pass.
- `synthesize` (the `report-writer` sub-agent) is grounded: it may use
  only the gathered findings and must keep each claim's inline source.
  It has no tools and cannot browse the web.
- `verify_sources` probes every cited URL / DOI with an HTTP HEAD
  request and reports which are unreachable, so the human reviewer
  sees broken citations before approving.

## Customizing

- **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`.
- **Map concurrency.** The `research_each_question` node's
  `max_concurrency: 3` caps simultaneous web-research branches.
  Raise to investigate more questions in parallel; lower to be gentle
  on rate-limited providers.
- **Per-node model.** Add `model: anthropic:...` to any `llm` node.
  Cheap models work well for `plan` / `critique` / `vet_sources`; the
  heavy intelligence is needed in `research_one_question` and the
  `report-writer` sub-agent.
- **Tool scope.** Narrow the `research_one_question` node's `tools:`
  list to constrain where each branch looks (for example, drop
  `web_search_loki` and `mcp:ddg-search` to force arXiv-only
  research).
- **Local knowledge.** Drop files into `knowledge/` to bias every
  research branch toward your local context (see the *Local
  knowledge corpus* section above).
- **Different writer.** Replace `agent: report-writer` on the
  `synthesize` node with the name of any other agent. The
  orchestrator does not care what kind of agent the writer is.
- **Skip approval.** Point both `approve` routes at `end_accepted`,
  or wire `verify_sources` straight to an `end` node.

## Files

```
assets/agents/deep-research/
  graph.yaml                    - agent config + 17-node workflow
  tools.sh                      - classify_source custom tool
  README.md                     - this file
  knowledge/
    README.md                   - corpus-format notes
    research-style-notes.md     - starter knowledge file (replace with your notes)
  scripts/
    parse_request.py            - _next: bootstrap_research, or ask_topic if no topic
    bootstrap_research.py       - fan-out source: next [plan, knowledge_lookup]
    combine_findings.py         - joins map output (question_findings) into findings
    reflexion_gate.py           - _next: research_each_question (revise) or synthesize
    verify_sources.py           - HTTP HEAD on cited URLs / DOIs
    incorporate_feedback.py     - _next: research_each_question, with user feedback
```

See also `assets/agents/report-writer/` — the sub-agent the
`synthesize` node spawns.