feat: created new graph-based deep-research agent

2026-05-21 11:27:55 -06:00
parent 597f823bdf
commit d81d233527
13 changed files with 1037 additions and 0 deletions
@@ -0,0 +1,274 @@
+# deep-research
+
+A deep web research agent, built as a Loki graph agent. It plans an
+investigation, decomposes it into sub-questions researched in
+parallel, grounds the work in a local knowledge corpus, vets the
+credibility of cited sources, runs a reflexion self-critique loop to
+revise weak findings, delegates the final write-up to a focused
+sub-agent, checks that the cited sources are reachable, and gates the
+result behind human approval.
+
+Unlike a regular agent (which takes a goal and improvises the steps),
+this agent runs a fixed graph: every request goes through the same
+`plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve`
+pipeline.
+
+This agent is also the **canonical reference for the Loki graph
+system**: it exercises every node type (`script`, `llm`, `rag`, `map`,
+`agent`, `input`, `approval`, `end`) and both static fan-out and
+dynamic `map` fan-out. If you are learning how to build a graph
+agent, this is the file to read alongside the
+[Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents).
+
+## Workflow
+
+17 nodes. `->` is the static route; a script node can also route
+dynamically via `_next`. The `▶▶` line is a parallel super-step —
+those branches run concurrently:
+
+```
+parse_request (script)              -> bootstrap_research   (or -> ask_topic if no topic)
+ask_topic (input)                   -> bootstrap_research
+bootstrap_research (script)         -> [plan, knowledge_lookup]   ▶▶ parallel
+plan (llm + output_schema)          -> research_each_question
+knowledge_lookup (rag)              -> research_each_question
+research_each_question (map)        -> combine_findings    (spawns one branch per question)
+  └─ research_one_question (llm)    (atomic; runs N×, joins at map)
+combine_findings (script)           -> vet_sources
+vet_sources (llm + custom tool)     -> critique
+critique (llm)                      -> reflexion_gate
+reflexion_gate (script)             -> synthesize  (or -> research_each_question: reflexion loop)
+synthesize (agent: report-writer)   -> verify_sources
+verify_sources (script)             -> approve
+approve (approval)                  -> end_accepted          ("accept")
+                                    -> end_rejected          ("reject")
+                                    -> incorporate_feedback   (any free-form answer)
+incorporate_feedback (script)       -> research_each_question (the human-feedback loop)
+```
+
+### Node-type breakdown
+
+| Type | Nodes |
+|---|---|
+| `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` |
+| `llm` (tools: `[]`) | `plan`, `critique` |
+| `llm` (with tool whitelist) | `research_one_question`, `vet_sources` |
+| `rag` | `knowledge_lookup` — local corpus retrieval |
+| `map` | `research_each_question` — dynamic fan-out per sub-question |
+| `agent` | `synthesize` — spawns the `report-writer` sub-agent |
+| `input` | `ask_topic` |
+| `approval` | `approve` |
+| `end` | `end_accepted`, `end_rejected` |
+
+## Parallel execution
+
+The graph has two parallel super-steps where Loki's BSP scheduler runs
+branches concurrently.
+
+**1. Context loading (`plan` ‖ `knowledge_lookup`)** — after
+`bootstrap_research`, the LLM planner (which decomposes the topic into
+sub-questions) and the RAG retrieval over the local `knowledge/`
+corpus run side by side. They write disjoint state keys (`plan` writes
+`research_plan` and `questions`; `knowledge_lookup` writes
+`local_context` and `local_sources`) so no reducer is needed.
+
+**2. Per-question research (`research_each_question` map)** — the
+plan emits a `questions` array (the planner is asked for 3-5 entries;
+its `output_schema` accepts 1 to 6). The `map` node spawns one
+parallel branch per question, at most three at a time
+(`max_concurrency: 3`). Each branch is an isolated
+`research_one_question` LLM invocation with web tools, instructed to
+investigate exactly its assigned question. Outputs collect into
+`question_findings` in input order, then `combine_findings` joins
+them into a single `findings` Markdown document for downstream nodes.
+
+`settings.max_concurrency: 4` is the graph-wide cap; the per-`map`
+override (`max_concurrency: 3` on `research_each_question`) is
+deliberately lower to stay gentle on rate-limited search providers
+(`plan` has no tools and finishes before the map starts).
+
+## Local knowledge corpus
+
+`knowledge_lookup` is a `rag` node — it runs hybrid (vector + keyword)
+retrieval over every file in `knowledge/`. The directory ships with a
+small `research-style-notes.md` so the RAG node has something to
+retrieve against on a clean install; drop your own Markdown notes,
+PDFs, or text files into `knowledge/` to bias the research toward
+your local context.
+
+The knowledge base is built once, at agent-load time, into
+`~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because
+the node fully specifies its build config (`embedding_model`,
+`chunk_size`, `chunk_overlap`), the build is non-interactive. Delete
+that cached file after adding or changing knowledge to force a
+rebuild.
+
+## Sub-agent: report-writer
+
+The `synthesize` node is an `agent` node that spawns the
+`report-writer` sub-agent (`assets/agents/report-writer/`). This is
+the agent-as-tool pattern: the orchestrating graph delegates the
+writing phase to a focused sub-agent dedicated to coherent prose,
+while the research phase runs as separate `llm` nodes that
+investigate many questions quickly and in parallel.
+
+The `report-writer` sub-agent has no tools: it cannot access the web
+or search, so it has no way to pull in new material. It reads only the
+findings it is given and produces a final Markdown report preserving
+every inline citation. See `assets/agents/report-writer/README.md`
+for details.
+
+## Tools and tool scoping
+
+This agent demonstrates Loki's three tool sources and how an `llm`
+node's `tools:` whitelist scopes them per node.
+
+The agent's full tool universe, declared in `graph.yaml`:
+
+- **Global tools** (`global_tools`): `web_search_loki`,
+  `fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts.
+- **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web
+  search MCP server. Referenced in a whitelist as `mcp:ddg-search`.
+- **Custom agent tool** (`tools.sh`): `classify_source` - a
+  deterministic source-credibility classifier shipped with this agent.
+
+No node receives all of these. Each `llm` node's `tools:` whitelist
+narrows the universe to exactly what that step needs:
+
+| Node | `tools:` whitelist | Draws from |
+|---|---|---|
+| `plan`, `critique` | `[]` | nothing - pure reasoning |
+| `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP |
+| `vet_sources` | `classify_source` | the custom tool only |
+
+`research_one_question` (each parallel branch of the map) can search
+and fetch but cannot classify sources; `vet_sources` can classify
+sources but cannot touch the web. That separation is the point of the
+`tools:` whitelist: a node gets only the tools its job calls for,
+never the agent's full set.
+
+The `classify_source` custom tool (`tools.sh`) takes a URL and returns
+a credibility tier (government, academic, preprint, organization,
+unverified) derived from the host and top-level domain. It is
+deterministic - exactly the kind of logic a tool should own rather than
+leaving the LLM to guess.
+
+Web search may require API-key configuration; see the
+[Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs.
+`fetch_url_via_curl`, `search_arxiv`, and `classify_source` work
+without a key.
+
+## Setup
+
+`research_one_question` (each parallel branch of the `map`) uses the
+`ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's
+default MCP servers; make sure it is registered in
+`~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore
+the default template if it is missing). If `ddg-search` is unavailable,
+the branches still have their global web-search tools to fall back on.
+
+The `synthesize` node spawns the `report-writer` sub-agent. Both
+agents ship with `loki agents install`; if you install one manually,
+install both so the agent reference resolves.
+
+## Reflexion
+
+The agent has two loops, both built with script nodes that route via
+`_next`. The engine allows back-edges at runtime; the validator only
+rejects cycles built from static `next` / `routes` edges, so script
+`_next` loops are always allowed.
+
+**Automated reflexion loop.** After the parallel research map and
+`vet_sources`, the `critique` node reviews the merged findings
+against the research plan and the source credibility assessment, and
+emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback.
+`reflexion_gate.py` then:
+
+- `PASS` -> continue to `synthesize`.
+- `REVISE`, budget remaining -> loop back to `research_each_question`,
+  with the critique injected as `research_feedback` so every parallel
+  branch sees it on the retry.
+- `REVISE`, budget spent -> continue to `synthesize` anyway (the human
+  approval step is the final backstop).
+
+The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`
+(default 2, so the research map runs at most 3 times per pass).
+
+**Human-feedback loop.** At `approve` the user answers `accept`,
+`reject`, or types their own feedback. A free-form answer routes via
+the approval node's `on_other` to `incorporate_feedback.py`, which
+folds that text into `research_feedback` and loops back to
+`research_each_question` for another parallel pass.
+
+`settings.max_loop_iterations` (40) is the engine's infinite-loop
+backstop: it caps the total visits to any single node.
+
+## Running
+
+```sh
+loki agents install                  # ships deep-research
+loki -a deep-research "How does HTTP/3 differ from HTTP/2?"
+loki -a deep-research "Recent advances in solid-state batteries"
+loki -a deep-research                # no prompt -> triggers ask_topic
+```
+
+## Anti-hallucination
+
+- `research_one_question` (each map branch) is instructed to back
+  every claim with a real retrieved source and never to fabricate
+  URLs, titles, or DOIs.
+- `vet_sources` classifies every cited source so weak sources are
+  visible to the critique step.
+- `critique` independently reviews the merged findings and sends weak
+  or uncited work back for another parallel research pass.
+- `synthesize` (the `report-writer` sub-agent) is grounded: it may use
+  only the gathered findings and must keep each claim's inline source.
+  It has no tools and cannot browse the web.
+- `verify_sources` probes every cited URL / DOI with an HTTP HEAD
+  request and reports which are unreachable, so the human reviewer
+  sees broken citations before approving.
+
+## Customizing
+
+- **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`.
+- **Map concurrency.** The `research_each_question` node's
+  `max_concurrency: 3` caps simultaneous web-research branches.
+  Raise to investigate more questions in parallel; lower to be gentle
+  on rate-limited providers.
+- **Per-node model.** Add `model: anthropic:...` to any `llm` node.
+  Cheap models work well for `plan` / `critique` / `vet_sources`; the
+  heavy intelligence is needed in `research_one_question` and the
+  `report-writer` sub-agent.
+- **Tool scope.** Narrow the `research_one_question` node's `tools:`
+  list to constrain where each branch looks (for example, drop
+  `web_search_loki` and `mcp:ddg-search` to force arXiv-only
+  research).
+- **Local knowledge.** Drop files into `knowledge/` to bias every
+  research branch toward your local context (see the *Local
+  knowledge corpus* section above).
+- **Different writer.** Replace `agent: report-writer` on the
+  `synthesize` node with the name of any other agent. The
+  orchestrator does not care what kind of agent the writer is.
+- **Skip approval.** Point both `approve` routes at `end_accepted`,
+  or wire `verify_sources` straight to an `end` node.
+
+## Files
+
+```
+assets/agents/deep-research/
+  graph.yaml                    - agent config + 17-node workflow
+  tools.sh                      - classify_source custom tool
+  README.md                     - this file
+  knowledge/
+    README.md                   - corpus-format notes
+    research-style-notes.md     - starter knowledge file (replace with your notes)
+  scripts/
+    parse_request.py            - _next: bootstrap_research, or ask_topic if no topic
+    bootstrap_research.py       - fan-out source: next [plan, knowledge_lookup]
+    combine_findings.py         - joins map output (question_findings) into findings
+    reflexion_gate.py           - _next: research_each_question (revise) or synthesize
+    verify_sources.py           - HTTP HEAD on cited URLs / DOIs
+    incorporate_feedback.py     - _next: research_each_question, with user feedback
+```
+
+See also `assets/agents/report-writer/` — the sub-agent the
+`synthesize` node spawns.
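
Reviewer sketch for the README's *Parallel execution* section: the per-question fan-out can be pictured in plain Python as a bounded worker pool whose results come back in input order. This is only an illustration of the described behaviour, not Loki's scheduler; `run_map` and the stand-in `research_one_question` function are hypothetical names, and the real branch is an LLM call with web tools.

```python
from concurrent.futures import ThreadPoolExecutor


def research_one_question(question: str) -> str:
    """Hypothetical stand-in for one map branch (really an LLM invocation)."""
    return f"findings for: {question}"


def run_map(questions, branch, max_concurrency=3):
    """Run `branch` once per question, at most `max_concurrency` at a time.

    Executor.map yields results in input order, regardless of which
    branch happens to finish first.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(branch, questions))


questions = [
    "What transport does HTTP/3 use?",
    "How does HTTP/2 multiplex streams?",
    "What changed in header compression?",
    "How is connection migration handled?",
]
# Mirrors `collect_into: question_findings` on the map node.
question_findings = run_map(questions, research_one_question)
print(len(question_findings))
```

Because the collected list lines up index-for-index with `questions`, a join step like `combine_findings.py` can pair each question with its findings by position.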
@@ -0,0 +1,294 @@
+name: deep-research
+description: |
+  Deep web research workflow. Plans an investigation, decomposes it
+  into sub-questions researched in parallel, grounds the work in a
+  local knowledge corpus, vets the credibility of cited sources, runs
+  a reflexion self-critique loop to revise weak or incomplete findings,
+  delegates the final write-up to a focused sub-agent, checks that the
+  cited sources are reachable, and gates the result behind human
+  approval. A reviewer's free-form feedback at the approval step feeds
+  back into another research pass.
+
+  This is the canonical Loki graph-agent reference: it exercises every
+  node type (script, llm, rag, map, agent, input, approval, end) and
+  both static fan-out and dynamic map fan-out.
+
+version: "1.0"
+
+temperature: 0.0
+
+global_tools:
+  - web_search_loki.sh
+  - fetch_url_via_curl.sh
+  - search_arxiv.sh
+
+mcp_servers:
+  - ddg-search
+
+conversation_starters:
+  - "How does HTTP/3 differ from HTTP/2?"
+  - "Summarize recent advances in solid-state battery chemistry"
+
+settings:
+  max_loop_iterations: 40
+  log_state_snapshots: false
+  validate_before_run: true
+  max_concurrency: 4
+
+initial_state:
+  research_feedback: ""
+  research_attempts: 0
+  local_context: ""
+  local_sources: ""
+
+start: parse_request
+
+nodes:
+
+  parse_request:
+    id: parse_request
+    type: script
+    script: scripts/parse_request.py
+    next: bootstrap_research
+
+  ask_topic:
+    id: ask_topic
+    type: input
+    question: "What would you like me to research?"
+    validation: "len(input) > 0"
+    state_updates:
+      topic: "{{input}}"
+    next: bootstrap_research
+
+  bootstrap_research:
+    id: bootstrap_research
+    type: script
+    script: scripts/bootstrap_research.py
+    next: [plan, knowledge_lookup]
+
+  plan:
+    id: plan
+    type: llm
+    instructions: |
+      You are a research planner. Given a topic, produce a focused
+      research plan and decompose it into 3-5 specific sub-questions
+      that can each be researched independently in parallel.
+
+      The plan is a short narrative naming the key questions and the
+      kinds of sources that would be authoritative. The sub-questions
+      are precise, self-contained queries (each one is sent on its own
+      to a separate research worker, so they must be answerable
+      without each other's context).
+    prompt: "Research topic: {{topic}}"
+    tools: []
+    output_schema:
+      type: object
+      properties:
+        research_plan:
+          type: string
+          description: A short plan narrative.
+        questions:
+          type: array
+          items: { type: string }
+          minItems: 1
+          maxItems: 6
+          description: 3-5 specific, self-contained sub-questions.
+      required: [research_plan, questions]
+    next: research_each_question
+
+  knowledge_lookup:
+    id: knowledge_lookup
+    type: rag
+    documents:
+      - ./knowledge/
+    query: "{{topic}}"
+    top_k: 6
+    embedding_model: openai:text-embedding-3-small
+    chunk_size: 1000
+    chunk_overlap: 100
+    state_updates:
+      local_context: "{{output.context}}"
+      local_sources: "{{output.sources}}"
+    next: research_each_question
+
+  research_each_question:
+    id: research_each_question
+    type: map
+    over: "{{questions}}"
+    as: question
+    branch: research_one_question
+    collect_into: question_findings
+    max_concurrency: 3
+    next: combine_findings
+
+  research_one_question:
+    id: research_one_question
+    type: llm
+    instructions: |
+      You are a web research assistant. Investigate the SINGLE question
+      given to you using your tools: search the web, fetch and read
+      pages, and search arXiv for academic sources.
+
+      Rules:
+        - Every factual claim must be backed by a real source you
+          actually retrieved. Never fabricate URLs, page titles,
+          authors, or DOIs.
+        - Prefer primary and authoritative sources over aggregators.
+        - Where sources disagree, report the disagreement rather than
+          papering over it.
+        - Put the URL (or DOI) inline next to each claim it supports.
+
+      Return organized findings in plain text. Do not include
+      meta-commentary about the process.
+    prompt: |
+      Research question: {{question}}
+
+      Local context that may help:
+      {{local_context}}
+
+      {{research_feedback}}
+    tools:
+      - web_search_loki
+      - fetch_url_via_curl
+      - search_arxiv
+      - mcp:ddg-search
+    max_iterations: 10
+    max_attempts: 2
+    temperature: 0.1
+
+  combine_findings:
+    id: combine_findings
+    type: script
+    script: scripts/combine_findings.py
+    next: vet_sources
+
+  vet_sources:
+    id: vet_sources
+    type: llm
+    instructions: |
+      You assess the credibility of the sources cited in a set of
+      research findings. For every distinct source URL in the findings,
+      call the `classify_source` tool to get its credibility tier. Then
+      summarize: which claims rest on HIGH-credibility sources, and
+      which rest on PREPRINT or UNVERIFIED sources and so need
+      corroboration. Do NOT do any new research -- assess only what is
+      already cited.
+    prompt: |
+      Findings to assess:
+      {{findings}}
+    tools:
+      - classify_source
+    max_iterations: 15
+    state_updates:
+      source_assessment: "{{output}}"
+    next: critique
+
+  critique:
+    id: critique
+    type: llm
+    instructions: |
+      You are a meticulous research reviewer. Judge whether the
+      findings below are good enough to synthesize a complete,
+      well-supported report that answers the research plan.
+
+      Mark the findings REVISE if ANY of these hold:
+        - A research-plan question is unanswered or only weakly
+          addressed.
+        - A factual claim has no source, or cites a source that looks
+          fabricated.
+        - The findings lean on a single source where corroboration is
+          needed.
+        - A key claim rests only on a PREPRINT or UNVERIFIED source,
+          per the source credibility assessment below.
+        - An obvious counter-perspective or recent development is
+          missing.
+      Otherwise mark them PASS.
+
+      Respond in EXACTLY this format, nothing else:
+
+      VERDICT: <PASS or REVISE>
+      FEEDBACK: <if REVISE, be specific and actionable -- name the gaps
+      and what kind of source would close them; if PASS, write "none">
+    prompt: |
+      Research plan:
+      {{research_plan}}
+
+      Findings under review:
+      {{findings}}
+
+      Source credibility assessment:
+      {{source_assessment}}
+    tools: []
+    state_updates:
+      critique: "{{output}}"
+    next: reflexion_gate
+
+  reflexion_gate:
+    id: reflexion_gate
+    type: script
+    script: scripts/reflexion_gate.py
+    next: synthesize
+
+  synthesize:
+    id: synthesize
+    type: agent
+    agent: report-writer
+    prompt: |
+      Research topic: {{topic}}
+
+      Findings (organized by sub-question, with inline citations):
+      {{findings}}
+
+      Source credibility assessment:
+      {{source_assessment}}
+
+      Produce the final report following your instructions.
+    timeout: 300
+    state_updates:
+      report: "{{output}}"
+    next: verify_sources
+
+  verify_sources:
+    id: verify_sources
+    type: script
+    script: scripts/verify_sources.py
+    next: approve
+
+  approve:
+    id: approve
+    type: approval
+    question: |
+      Research report on: {{topic}}
+
+      {{report}}
+
+      ----
+      {{source_check}}
+      ----
+
+      Accept this report? Pick "accept" or "reject", or type specific
+      feedback to send the research back for another pass.
+    options:
+      - "accept"
+      - "reject"
+    routes:
+      "accept": end_accepted
+      "reject": end_rejected
+    on_other: incorporate_feedback
+    state_updates:
+      decision: "{{choice}}"
+
+  incorporate_feedback:
+    id: incorporate_feedback
+    type: script
+    script: scripts/incorporate_feedback.py
+
+  end_accepted:
+    id: end_accepted
+    type: end
+    output: "{{report}}"
+
+  end_rejected:
+    id: end_rejected
+    type: end
+    output: "Research on '{{topic}}' was rejected and discarded."
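
Reviewer sketch of how the `approve` node's `routes` and `on_other` fields appear to interact, based only on the README's description (a listed option follows its route, any other answer is treated as free-form feedback). The function name `route_approval` and the exact-match comparison are assumptions; Loki's actual matching rules may differ.

```python
# Mirrors the `routes` mapping and `on_other` target declared on `approve`.
ROUTES = {"accept": "end_accepted", "reject": "end_rejected"}
ON_OTHER = "incorporate_feedback"


def route_approval(answer: str, routes=ROUTES, on_other=ON_OTHER) -> str:
    """Return the id of the next node for a reviewer's answer.

    Assumption: an answer must match a route key exactly; everything
    else is free-form feedback and goes to `on_other`.
    """
    return routes.get(answer, on_other)


for answer in ("accept", "reject", "add more primary sources"):
    print(answer, "->", route_approval(answer))
```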
@@ -0,0 +1,23 @@
+# Local knowledge corpus for deep-research
+
+The `knowledge_lookup` node in `graph.yaml` is a `rag` node that runs
+hybrid (vector + keyword) retrieval over every file in this directory.
+Drop your own notes, papers (PDFs), Markdown docs, or text files here
+and they will be indexed into a per-agent knowledge base on first run.
+
+Loki supports common file types out of the box: `.md`, `.txt`, `.pdf`,
+`.html`, and others. Subdirectories are walked recursively.
+
+A small starter file (`research-style-notes.md`) ships so the RAG
+node has something non-empty to retrieve against on a clean install.
+Replace or extend it with your own materials to bias the research
+phase toward your local context.
+
+To force the knowledge base to rebuild after you add or change files,
+delete the cached index:
+
+```sh
+rm ~/.config/loki/agents/deep-research/knowledge_lookup.yaml
+```
+
+The next run will rebuild from the current contents of this directory.
@@ -0,0 +1,49 @@
+# Research style notes
+
+These are general principles the `deep-research` agent should keep in
+mind regardless of topic. Replace this file with your own notes if you
+want to bias retrieval toward your local context.
+
+## What "good research" means here
+
+- **Every factual claim cites a source you actually retrieved.** Never
+  fabricate URLs, page titles, authors, or DOIs.
+- **Primary sources beat aggregators.** Prefer the original paper, the
+  RFC, the standards body, or the manufacturer over a blog summarizing
+  them.
+- **Corroboration matters where stakes are high.** If a single source
+  makes a strong claim, look for a second independent source before
+  taking it as established.
+- **Disagreement is information, not noise.** If two credible sources
+  disagree, report the disagreement and the reasoning on each side.
+- **Old does not mean wrong.** A 2014 RFC is still authoritative if no
+  newer one has obsoleted it; check before assuming a source is stale.
+
+## Source-tier heuristics
+
+The `vet_sources` node uses these rough tiers to weigh credibility.
+The custom tool `classify_source` (see `tools.sh`) implements this
+deterministically by hostname / TLD.
+
+- **HIGH:** government domains (`.gov`, `.mil`), academic institutions
+  (`.edu`, university subdomains), peer-reviewed journals, standards
+  bodies (IETF/RFCs, W3C, ISO, IEEE, NIST), and primary documents from
+  the entities being researched (e.g. a vendor's official spec page).
+- **PREPRINT:** arXiv, bioRxiv, medRxiv, SSRN. Useful but not yet
+  peer-reviewed; treat numeric claims with extra caution.
+- **ORGANIZATION:** established nonprofits, standards-adjacent groups,
+  industry consortia. Reliable for their stated mission but may have a
+  perspective.
+- **UNVERIFIED:** general web pages, blogs, news aggregators, social
+  media. Useful for leads but should not be the only source for a
+  factual claim.
+
+## Common pitfalls to flag in critique
+
+- A claim cited only to a PREPRINT or UNVERIFIED source on a numeric
+  or contested point.
+- A research-plan question that the findings address only obliquely.
+- "Findings" that paraphrase a single source three times rather than
+  triangulating.
+- Citation collisions where two sources are listed but turn out to
+  be the same study reported via different aggregators.
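
Reviewer sketch of the source-tier heuristics above as a deterministic host check, written in Python for readability. The shipped `classify_source` tool lives in `tools.sh`; the rules, the host list, and the tier strings below are assumptions drawn from these notes, not a transcription of that script.

```python
from urllib.parse import urlsplit

# Assumed preprint hosts, taken from the PREPRINT bullet in the notes.
PREPRINT_HOSTS = ("arxiv.org", "biorxiv.org", "medrxiv.org", "ssrn.com")


def classify_source(url: str) -> str:
    """Map a URL to a credibility tier using only its hostname."""
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith((".gov", ".mil", ".edu")):
        return "HIGH"
    # Checked before the generic .org rule, since arxiv.org ends in .org.
    if any(host == h or host.endswith("." + h) for h in PREPRINT_HOSTS):
        return "PREPRINT"
    if host.endswith(".org"):
        return "ORGANIZATION"
    return "UNVERIFIED"


for url in (
    "https://www.nist.gov/publications",
    "https://export.arxiv.org/abs/0000.00000",
    "https://example.org/about",
    "https://example.com/blog/post",
):
    print(classify_source(url), url)
```

A hostname-only check cannot recognize peer-reviewed journals or standards bodies on `.org` or `.com` domains, so a real classifier would likely need an explicit allow-list for those.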
@@ -0,0 +1,18 @@
+#!/usr/bin/env python3
+"""Fan-out source for context loading.
+
+Has no logic of its own. Exists so the static `next: [plan, knowledge_lookup]`
+list on this node fans out into two parallel branches (the LLM planner and
+the RAG knowledge lookup) as a single super-step. The validator requires
+declared parallel-branch script outputs, so we emit an empty JSON object
+explicitly here.
+"""
+import json
+
+
+def main():
+    print(json.dumps({}))
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,39 @@
+#!/usr/bin/env python3
+"""Join the per-question map outputs into a single `findings` string.
+
+The `research_each_question` map writes `question_findings` (an array,
+one entry per sub-question, in input order). Downstream nodes
+(`vet_sources`, `critique`, `synthesize`) read `{{findings}}` as a
+single block, so this script renders the array as a Markdown document
+with one section per question.
+"""
+import json
+import os
+
+
+def load_state():
+    path = os.environ.get("GRAPH_STATE_FILE")
+    if path:
+        with open(path) as f:
+            return json.load(f)
+    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
+
+
+def main():
+    state = load_state()
+    questions = state.get("questions") or []
+    per_question = state.get("question_findings") or []
+
+    sections = []
+    for idx, q in enumerate(questions):
+        body = per_question[idx] if idx < len(per_question) else ""
+        if isinstance(body, (dict, list)):
+            body = json.dumps(body, indent=2)
+        sections.append(f"## {q}\n\n{body}")
+
+    findings = "\n\n".join(sections) if sections else "No findings gathered."
+    print(json.dumps({"findings": findings}))
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,41 @@
+#!/usr/bin/env python3
+"""Fold a reviewer's free-form feedback back into the research loop.
+
+Runs when the user answers the approval step with their own text
+instead of "accept" or "reject". That text (saved by the approval node
+as `decision`) becomes `research_feedback`, and the graph loops back to
+`research_each_question` for another informed pass (each sub-question is
+re-researched in parallel with the new feedback in context). The
+reflexion counter is reset so the user-driven pass gets a fresh revision
+budget.
+
+Routing (`_next`): always research_each_question.
+"""
+import json
+import os
+
+
+def load_state():
+    path = os.environ.get("GRAPH_STATE_FILE")
+    if path:
+        with open(path) as f:
+            return json.load(f)
+    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
+
+
+def main():
+    state = load_state()
+    feedback = (state.get("decision") or "").strip()
+    output = {
+        "_next": "research_each_question",
+        "research_attempts": 0,
+        "research_feedback": (
+            "The user reviewed the report and asked for changes. Treat "
+            "this as the top priority for the next pass:\n\n" + feedback
+        ),
+    }
+    print(json.dumps(output))
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,35 @@
+#!/usr/bin/env python3
+"""Entry router for deep-research.
+
+Reads the caller's prompt from state. If it contains a usable research
+topic, stores it as `topic` and falls through to the static `next`
+(bootstrap_research). If the prompt is empty, routes to `ask_topic` so
+the user can supply one interactively.
+
+Routing (`_next`):
+  - prompt present -> (no _next; static next: bootstrap_research)
+  - prompt empty   -> ask_topic
+"""
+import json
+import os
+
+
+def load_state():
+    path = os.environ.get("GRAPH_STATE_FILE")
+    if path:
+        with open(path) as f:
+            return json.load(f)
+    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
+
+
+def main():
+    state = load_state()
+    prompt = (state.get("initial_prompt") or "").strip()
+    if prompt:
+        print(json.dumps({"topic": prompt}))
+    else:
+        print(json.dumps({"_next": "ask_topic"}))
+
+
+if __name__ == "__main__":
+    main()
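
Reviewer sketch of the state contract every script node in this commit relies on: state arrives either as a JSON file named by `GRAPH_STATE_FILE` or inline in `GRAPH_STATE`, and the file takes precedence. The snippet exercises both paths locally. How Loki actually populates these variables is inferred from the scripts alone, and passing the environment as a parameter is an addition made here for testability.

```python
import json
import os
import tempfile


def load_state(env=os.environ):
    """Same precedence as the scripts: file first, inline JSON second."""
    path = env.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(env.get("GRAPH_STATE", "{}"))


def parse_request(state):
    """Pure version of the entry router's decision."""
    prompt = (state.get("initial_prompt") or "").strip()
    return {"topic": prompt} if prompt else {"_next": "ask_topic"}


# Inline state with a prompt: the script only sets `topic`.
inline = load_state({"GRAPH_STATE": json.dumps({"initial_prompt": " HTTP/3 vs HTTP/2 "})})

# File state with an empty prompt: the file wins over GRAPH_STATE.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"initial_prompt": ""}, tmp)
from_file = load_state({
    "GRAPH_STATE_FILE": tmp.name,
    "GRAPH_STATE": json.dumps({"initial_prompt": "ignored"}),
})
os.unlink(tmp.name)

print(parse_request(inline))     # {'topic': 'HTTP/3 vs HTTP/2'}
print(parse_request(from_file))  # {'_next': 'ask_topic'}
```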
@@ -0,0 +1,76 @@
+#!/usr/bin/env python3
+"""Reflexion gate for deep-research.
+
+Runs after `critique` has reviewed the current research findings. If the
+critique's verdict is REVISE and the reflexion budget is not spent,
+loops back to `research_each_question` with the critique attached as
+`research_feedback`, so the retry is informed rather than a blind
+re-run. Otherwise it proceeds to `synthesize`.
+
+Routing (`_next`):
+  - verdict PASS                     -> synthesize
+  - verdict REVISE, budget remaining -> research_each_question  (+ research_feedback)
+  - verdict REVISE, budget spent     -> synthesize
+
+Reflexion is a best-effort quality booster, not a hard gate: once the
+budget is spent the workflow proceeds anyway, and the human approval
+step is the final backstop.
+"""
+import json
+import os
+import re
+
+# Automated revision passes allowed. `research_each_question` runs at
+# most MAX_REFLEXION_REVISIONS + 1 times per user pass. Bump to allow more.
+MAX_REFLEXION_REVISIONS = 2
+
+
+def load_state():
+    path = os.environ.get("GRAPH_STATE_FILE")
+    if path:
+        with open(path) as f:
+            return json.load(f)
+    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
+
+
+def as_int(value, default=0):
+    try:
+        return int(value)
+    except (TypeError, ValueError):
+        return default
+
+
+def parse_verdict(critique):
+    """Pull PASS/REVISE from the critique's `VERDICT:` line. Defaults to
+    PASS when no verdict line is found, so a malformed critique lets the
+    workflow proceed instead of burning the whole revision budget."""
+    match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE)
+    if not match:
+        return "PASS"
+    return match.group(1).upper()
+
+
+def main():
+    state = load_state()
+    critique = state.get("critique") or ""
+    verdict = parse_verdict(critique)
+    attempts = as_int(state.get("research_attempts"))
+
+    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
+        feedback = (
+            "A reviewer judged the previous research pass incomplete. "
+            "Address every point in the critique below:\n\n" + critique
+        )
+        output = {
+            "_next": "research_each_question",
+            "research_attempts": attempts + 1,
+            "research_feedback": feedback,
+        }
+    else:
+        output = {"_next": "synthesize"}
+
+    print(json.dumps(output))
+
+
+if __name__ == "__main__":
+    main()
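
Reviewer sketch that replays the gate's routing rule against a critic that never passes, to confirm the bound stated in the README (a budget of 2 revisions means the research map runs at most 3 times per pass). `decide` and `count_map_runs` are hypothetical helpers that restate the script's logic as pure functions.

```python
import re

MAX_REFLEXION_REVISIONS = 2


def decide(critique: str, attempts: int) -> str:
    """Return the `_next` target the gate would pick."""
    match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE)
    # A missing verdict line is treated as PASS, as in the script.
    verdict = match.group(1).upper() if match else "PASS"
    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        return "research_each_question"
    return "synthesize"


def count_map_runs(critique: str) -> int:
    """Count research passes when the critic returns the same critique each time."""
    attempts, map_runs = 0, 1  # the first pass always happens
    while decide(critique, attempts) == "research_each_question":
        attempts += 1
        map_runs += 1
    return map_runs


print(count_map_runs("VERDICT: REVISE\nFEEDBACK: cite primary sources"))  # 3
print(count_map_runs("VERDICT: PASS\nFEEDBACK: none"))                    # 1
print(count_map_runs("malformed output with no verdict line"))           # 1
```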
@@ -0,0 +1,69 @@
+#!/usr/bin/env python3
+"""Check that the sources cited in the research report are reachable.
+
+Scans the final report for URLs and DOIs, probes each with a HEAD
+request, and writes a `source_check` summary into state so the human
+reviewer sees broken citations at the approval step.
+
+Times out per request so a slow source cannot stall the graph.
+"""
+import json
+import os
+import re
+import urllib.error
+import urllib.request
+
+DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
+URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")
+
+
+def load_state():
+    path = os.environ.get("GRAPH_STATE_FILE")
+    if path:
+        with open(path) as f:
+            return json.load(f)
+    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
+
+
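+def reachable(url, timeout=5.0):
+    """Return True if a HEAD request to `url` gets a 2xx/3xx response
+    within `timeout` seconds. Servers that reject HEAD (for example with
+    403 or 405) are reported as unreachable."""
+    req = urllib.request.Request(url, method="HEAD")
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return 200 <= resp.status < 400
+    except urllib.error.HTTPError as e:
+        # urlopen raises HTTPError for 4xx/5xx and for 3xx it cannot follow.
+        return 200 <= e.code < 400
+    except Exception:
+        # DNS failure, timeout, TLS error, malformed URL, and so on.
+        return False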
+
+
+def main():
+    state = load_state()
+    report = state.get("report") or ""
+
+    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
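+    # DOI_RE's character class includes '.', ';' and ')', so a match can end
+    # in sentence punctuation; strip it the same way as for URLs.
+    dois = sorted({d.rstrip(".,;)") for d in DOI_RE.findall(report)})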
+
+    results = []
+    for url in urls:
+        ok = reachable(url)
+        results.append(f"  {'OK' if ok else 'UNREACHABLE'}  {url}")
+    for doi in dois:
+        url = f"https://doi.org/{doi}"
+        if url in urls:
+            continue
+        ok = reachable(url)
+        results.append(f"  {'OK' if ok else 'UNREACHABLE'}  DOI {doi} ({url})")
+
+    if not results:
+        summary = "No web sources were cited in the report."
+    else:
+        summary = (
+            f"Source reachability ({len(results)} checked):\n"
+            + "\n".join(results)
+        )
+
+    print(json.dumps({"source_check": summary}))
+
+
+if __name__ == "__main__":
+    main()
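Note on the extraction step above: it can be tried without any network access. The sketch below reuses the two regexes from the file on a made-up report string; the URL and DOI are placeholders, not real citations. Because the character class in `DOI_RE` includes `.`, `;` and `)`, a raw match can carry trailing sentence punctuation, so the sketch strips it in the same way the script strips URLs.

```python
import re

# Regexes copied from the script.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")

# Placeholder report text, invented for this example.
report = (
    "See https://example.org/paper. "
    "Also (doi:10.1234/abc.def)."
)

raw_dois = DOI_RE.findall(report)
urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
dois = sorted({d.rstrip(".,;)") for d in raw_dois})

print("raw DOI matches:", raw_dois)
print("urls:", urls)
print("dois:", dois)
```

Stripping `)` unconditionally would also truncate a DOI that legitimately ends in a closing parenthesis; this is the same trade-off the script already accepts for URLs.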
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+
+set -e
+
+# @env LLM_OUTPUT=/dev/stdout The output path
+
+# @cmd Classify the credibility tier of a web source from its URL.
+# A deterministic check based on the host and top-level domain. Use it
+# to weigh how much trust to place in a source before relying on it.
+# @option --url!  The full source URL to classify
+classify_source() {
+    # shellcheck disable=SC2154
+    local url="$argc_url"
+    local host="${url#*://}"
+    host="${host%%/*}"
+    host="${host##*@}"
+    host="${host%%:*}"
+    host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"
+
+    local tier
+    case "$host" in
+        '')
+            tier="UNKNOWN - no host could be parsed from the URL" ;;
+        *.gov | *.gov.* | *.mil)
+            tier="HIGH - government source" ;;
+        *.edu | *.edu.* | *.ac.*)
+            tier="HIGH - academic institution" ;;
+        arxiv.org | *.arxiv.org | biorxiv.org | *.biorxiv.org | medrxiv.org | *.medrxiv.org | ssrn.com | *.ssrn.com)
+            tier="PREPRINT - not yet peer reviewed, corroborate before citing" ;;
+        wikipedia.org | *.wikipedia.org)
+            tier="TERTIARY - encyclopedia, good for orientation not citation" ;;
+        *.org | *.org.*)
+            tier="MEDIUM - organization site, check for institutional bias" ;;
+        *)
+            tier="UNVERIFIED - general web source, corroborate before citing" ;;
+    esac
+
+    printf '%s: %s\n' "${host:-<none>}" "$tier" >> "$LLM_OUTPUT"
+}
@@ -0,0 +1,46 @@
+# report-writer
+
+A tiny, focused sub-agent that turns a set of research findings into a
+single coherent final report. It reads only what it is given: it does
+no independent research, does not access the web, and does not invent
+facts. It exists so that orchestrating agents can delegate the writing
+phase to it.
+
+## Why a separate agent?
+
+This is an example of the **agent-as-tool** pattern in graph agents.
+The `deep-research` graph agent's `synthesize` node is an `agent` node
+that spawns this one (see `assets/agents/deep-research/graph.yaml`).
+Separating the role has two practical benefits:
+
+- The orchestrating agent can use a cheap model (or a high-temperature
+  exploratory one) for the research phase, while letting the writing
+  phase use a different (typically lower-temperature, possibly larger)
+  model dedicated to coherent prose.
+- The writing prompt is owned by this agent's `config.yaml` rather
+  than buried inside another agent's graph. You can polish it
+  independently without touching the research flow.
+
+## Standalone use
+
+You can also use this agent directly if you have a set of findings you
+want polished:
+
+```sh
+loki -a report-writer "Topic: X. Findings: <paste findings here>"
+```
+
+It will produce a single Markdown report following the rules in its
+system prompt: an executive summary at the top, sections grouped by
+related sub-questions, every inline citation preserved verbatim, and a
+final "Open questions / disagreements" section.
+
+## What it will NOT do
+
+- Search the web, fetch URLs, query an MCP server, or use any tool.
+  It has no tools configured.
+- Invent facts beyond what is in the findings you give it.
+- Strip or rewrite citations.
+
+These constraints are the reason the agent exists: a writer that the
+orchestrator can trust to stay in its lane.
@@ -0,0 +1,34 @@
+name: report-writer
+description: Polishes research findings into a clear, citation-preserving final report
+version: 1.0.0
+temperature: 0.2
+
+instructions: |
+  You are a technical writer. You will be given:
+    - a research topic
+    - a set of findings, organized per sub-question, with inline
+      citations next to each claim
+    - a source-credibility assessment of the cited sources
+
+  Your job is to produce a single, well-organized final report.
+
+  Rules:
+    - Use ONLY the findings provided. Do not introduce facts from
+      your own memory. Do not speculate beyond what the findings
+      support.
+    - Preserve every inline citation. If a sentence in the findings
+      had a URL or DOI, the equivalent sentence in your report must
+      keep the same citation.
+    - Lead with a 2-3 sentence executive summary.
+    - Organize the body so that related sub-questions are grouped,
+      not strictly one section per question. The findings are raw
+      material; the report should read as a single coherent answer
+      to the original topic.
+    - End with a short "Open questions / disagreements" section
+      naming anything the findings flagged as unresolved or
+      contested.
+
+  Output plain Markdown. No metadata, no JSON wrapper.
+
+conversation_starters:
+  - "Polish these findings into a cited report"