From d81d233527c1e010a3c52512b764e307f06ee838 Mon Sep 17 00:00:00 2001 From: Alex Clarke Date: Thu, 21 May 2026 11:27:55 -0600 Subject: [PATCH] feat: created new graph-based deep-research agent --- assets/agents/deep-research/README.md | 274 ++++++++++++++++ assets/agents/deep-research/graph.yaml | 294 ++++++++++++++++++ .../agents/deep-research/knowledge/README.md | 23 ++ .../knowledge/research-style-notes.md | 49 +++ .../scripts/bootstrap_research.py | 18 ++ .../deep-research/scripts/combine_findings.py | 39 +++ .../scripts/incorporate_feedback.py | 41 +++ .../deep-research/scripts/parse_request.py | 35 +++ .../deep-research/scripts/reflexion_gate.py | 76 +++++ .../deep-research/scripts/verify_sources.py | 69 ++++ assets/agents/deep-research/tools.sh | 39 +++ assets/agents/report-writer/README.md | 46 +++ assets/agents/report-writer/config.yaml | 34 ++ 13 files changed, 1037 insertions(+) create mode 100644 assets/agents/deep-research/README.md create mode 100644 assets/agents/deep-research/graph.yaml create mode 100644 assets/agents/deep-research/knowledge/README.md create mode 100644 assets/agents/deep-research/knowledge/research-style-notes.md create mode 100644 assets/agents/deep-research/scripts/bootstrap_research.py create mode 100644 assets/agents/deep-research/scripts/combine_findings.py create mode 100644 assets/agents/deep-research/scripts/incorporate_feedback.py create mode 100644 assets/agents/deep-research/scripts/parse_request.py create mode 100644 assets/agents/deep-research/scripts/reflexion_gate.py create mode 100644 assets/agents/deep-research/scripts/verify_sources.py create mode 100644 assets/agents/deep-research/tools.sh create mode 100644 assets/agents/report-writer/README.md create mode 100644 assets/agents/report-writer/config.yaml diff --git a/assets/agents/deep-research/README.md b/assets/agents/deep-research/README.md new file mode 100644 index 0000000..fbb1f2f --- /dev/null +++ b/assets/agents/deep-research/README.md @@ -0,0 +1,274 @@ 
+# deep-research + +A deep web research agent, built as a Loki graph agent. It plans an +investigation, decomposes it into sub-questions researched in +parallel, grounds the work in a local knowledge corpus, vets the +credibility of cited sources, runs a reflexion self-critique loop to +revise weak findings, delegates the final write-up to a focused +sub-agent, checks that the cited sources are reachable, and gates the +result behind human approval. + +Unlike a regular agent (which takes a goal and improvises the steps), +this agent runs a fixed graph: every request goes through the same +`plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve` +pipeline. + +This agent is also the **canonical reference for the Loki graph +system**: it exercises every node type (`script`, `llm`, `rag`, `map`, +`agent`, `input`, `approval`, `end`) and both static fan-out and +dynamic `map` fan-out. If you are learning how to build a graph +agent, this is the file to read alongside the +[Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents). + +## Workflow + +17 nodes. `->` is the static route; a script node can also route +dynamically via `_next`. 
The `▶▶` line is a parallel super-step — +those branches run concurrently: + +``` +parse_request (script) -> bootstrap_research (or -> ask_topic if no topic) +ask_topic (input) -> bootstrap_research +bootstrap_research (script) -> [plan, knowledge_lookup] ▶▶ parallel +plan (llm + output_schema) -> research_each_question +knowledge_lookup (rag) -> research_each_question +research_each_question (map) -> combine_findings (spawns one branch per question) + └─ research_one_question (llm) (atomic; runs N×, joins at map) +combine_findings (script) -> vet_sources +vet_sources (llm + custom tool) -> critique +critique (llm) -> reflexion_gate +reflexion_gate (script) -> synthesize (or -> research_each_question: reflexion loop) +synthesize (agent: report-writer) -> verify_sources +verify_sources (script) -> approve +approve (approval) -> end_accepted ("accept") + -> end_rejected ("reject") + -> incorporate_feedback (any free-form answer) +incorporate_feedback (script) -> research_each_question (the human-feedback loop) +``` + +### Node-type breakdown + +| Type | Nodes | +|---|---| +| `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` | +| `llm` (tools: `[]`) | `plan`, `critique` | +| `llm` (with tool whitelist) | `research_one_question`, `vet_sources` | +| `rag` | `knowledge_lookup` — local corpus retrieval | +| `map` | `research_each_question` — dynamic fan-out per sub-question | +| `agent` | `synthesize` — spawns the `report-writer` sub-agent | +| `input` | `ask_topic` | +| `approval` | `approve` | +| `end` | `end_accepted`, `end_rejected` | + +## Parallel execution + +The graph has two parallel super-steps where Loki's BSP scheduler runs +branches concurrently. + +**1. Context loading (`plan` ‖ `knowledge_lookup`)** — after +`bootstrap_research`, the LLM planner (which decomposes the topic into +sub-questions) and the RAG retrieval over the local `knowledge/` +corpus run side by side. 
They write disjoint state keys (`plan` writes +`research_plan` and `questions`; `knowledge_lookup` writes +`local_context` and `local_sources`) so no reducer is needed. + +**2. Per-question research (`research_each_question` map)** - the +plan emits a `questions` array (the planner is asked for 3-5 entries; +its `output_schema` accepts 1-6). The `map` node spawns one parallel branch per +question (`max_concurrency: 3`). Each branch is an isolated +`research_one_question` LLM invocation with web tools, instructed to +investigate exactly its assigned question. Outputs collect into +`question_findings` in input order, then `combine_findings` joins +them into a single `findings` Markdown document for downstream nodes. + +`settings.max_concurrency: 4` is the graph-wide cap; the per-`map` +override (`max_concurrency: 3` on `research_each_question`) is +deliberately lower to leave headroom under that cap. + +## Local knowledge corpus + +`knowledge_lookup` is a `rag` node: it runs hybrid (vector + keyword) +retrieval over every file in `knowledge/`. The directory ships with a +small `research-style-notes.md` so the RAG node has something to +retrieve against on a clean install; drop your own Markdown notes, +PDFs, or text files into `knowledge/` to bias the research toward +your local context. + +The knowledge base is built once, at agent-load time, into +`~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because +the node fully specifies its build config (`embedding_model`, +`chunk_size`, `chunk_overlap`), the build is non-interactive. Delete +that cached file after adding or changing knowledge to force a +rebuild. + +## Sub-agent: report-writer + +The `synthesize` node is an `agent` node that spawns the +`report-writer` sub-agent (`assets/agents/report-writer/`). 
This is +the agent-as-tool pattern: the orchestrating graph delegates the +writing phase to a focused sub-agent dedicated to coherent prose, +while the research phase uses different (typically cheaper) LLM nodes +for fast-and-many-question investigation. + +The `report-writer` sub-agent has no tools — it cannot access the +web, cannot search, and cannot invent facts. It reads only the +findings it is given and produces a final Markdown report preserving +every inline citation. See `assets/agents/report-writer/README.md` +for details. + +## Tools and tool scoping + +This agent demonstrates Loki's three tool sources and how an `llm` +node's `tools:` whitelist scopes them per node. + +The agent's full tool universe, declared in `graph.yaml`: + +- **Global tools** (`global_tools`): `web_search_loki`, + `fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts. +- **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web + search MCP server. Referenced in a whitelist as `mcp:ddg-search`. +- **Custom agent tool** (`tools.sh`): `classify_source` - a + deterministic source-credibility classifier shipped with this agent. + +No node receives all of these. Each `llm` node's `tools:` whitelist +narrows the universe to exactly what that step needs: + +| Node | `tools:` whitelist | Draws from | +|---|---|---| +| `plan`, `critique` | `[]` | nothing - pure reasoning | +| `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP | +| `vet_sources` | `classify_source` | the custom tool only | + +`research_one_question` (each parallel branch of the map) can search +and fetch but cannot classify sources; `vet_sources` can classify +sources but cannot touch the web. That separation is the point of the +`tools:` whitelist: a node gets only the tools its job calls for, +never the agent's full set. 
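As a rough illustration of the scoping rule (not Loki's actual resolver), the per-node whitelists in the table above can be modelled as a lookup against the agent's tool universe. The function name `resolve_tools` and the dictionary layout are assumptions made for this sketch; only the tool and node names come from `graph.yaml`.

```python
# Hypothetical model of per-node tool scoping; Loki's real resolver may differ.

# The agent's full tool universe, grouped by source (names from graph.yaml).
TOOL_UNIVERSE = {
    "global": {"web_search_loki", "fetch_url_via_curl", "search_arxiv"},
    "mcp": {"mcp:ddg-search"},
    "custom": {"classify_source"},
}

# Per-node `tools:` whitelists, copied from the table above.
NODE_WHITELISTS = {
    "plan": [],
    "critique": [],
    "research_one_question": [
        "web_search_loki",
        "fetch_url_via_curl",
        "search_arxiv",
        "mcp:ddg-search",
    ],
    "vet_sources": ["classify_source"],
}


def resolve_tools(node):
    """Return the set of tools a node may call, checked against the universe."""
    known = set().union(*TOOL_UNIVERSE.values())
    allowed = set(NODE_WHITELISTS[node])
    unknown = allowed - known
    if unknown:
        # A whitelist may only narrow the universe, never extend it.
        raise ValueError(f"{node} whitelists undeclared tools: {sorted(unknown)}")
    return allowed


if __name__ == "__main__":
    for node in NODE_WHITELISTS:
        print(node, sorted(resolve_tools(node)))
```

In this model an empty whitelist yields an empty set, which matches the "pure reasoning" rows for `plan` and `critique`.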
+ +The `classify_source` custom tool (`tools.sh`) takes a URL and returns +a credibility tier (government, academic, preprint, organization, +unverified) derived from the host and top-level domain. It is +deterministic - exactly the kind of logic a tool should own rather than +the LLM guessing. + +Web search may require API-key configuration; see the +[Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs. +`fetch_url_via_curl`, `search_arxiv`, and `classify_source` work +without a key. + +## Setup + +`research_one_question` (each parallel branch of the `map`) uses the +`ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's +default MCP servers; make sure it is registered in +`~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore +the default template if it is missing). If `ddg-search` is unavailable, +the branches still have their global web-search tools to fall back on. + +The `synthesize` node spawns the `report-writer` sub-agent. Both +agents ship with `loki agents install`; if you install one manually, +install both so the agent reference resolves. + +## Reflexion + +The agent has two loops, both built with script nodes that route via +`_next`. The engine allows back-edges at runtime; the validator only +rejects cycles built from static `next` / `routes` edges, so script +`_next` loops are always allowed. + +**Automated reflexion loop.** After the parallel research map and +`vet_sources`, the `critique` node reviews the merged findings +against the research plan and the source credibility assessment, and +emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback. +`reflexion_gate.py` then: + +- `PASS` -> continue to `synthesize`. +- `REVISE`, budget remaining -> loop back to `research_each_question`, + with the critique injected as `research_feedback` so every parallel + branch sees it on the retry. +- `REVISE`, budget spent -> continue to `synthesize` anyway (the human + approval step is the final backstop). 
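The three cases above reduce to a small pure function. The sketch below mirrors that routing under a revision budget of 2; `route` and `count_research_passes` are illustrative names and are not part of the shipped `reflexion_gate.py`.

```python
# Illustrative sketch of the reflexion routing; not the shipped reflexion_gate.py.

def route(verdict, attempts, max_revisions=2):
    """Return (next_node, attempts) for one pass through the gate."""
    if verdict == "REVISE" and attempts < max_revisions:
        # Budget remaining: loop back for another parallel research pass.
        return "research_each_question", attempts + 1
    # PASS, or REVISE with the budget spent: move on to the write-up.
    return "synthesize", attempts


def count_research_passes(verdicts, max_revisions=2):
    """Count research-map runs until the gate routes to synthesize."""
    attempts = 0
    passes = 0
    for verdict in verdicts:
        passes += 1  # the research map ran and the critique produced this verdict
        next_node, attempts = route(verdict, attempts, max_revisions)
        if next_node == "synthesize":
            break
    return passes


if __name__ == "__main__":
    print(route("PASS", 0))
    print(route("REVISE", 0))
    print(route("REVISE", 2))
    print(count_research_passes(["REVISE"] * 10))
```

Even if every critique returns `REVISE`, the counter stops the loop after the budget is used up, so the research map runs a bounded number of times before `synthesize`.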
+ +The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py` +(default 2, so the research map runs at most 3 times per pass). + +**Human-feedback loop.** At `approve` the user answers `accept`, +`reject`, or types their own feedback. A free-form answer routes via +the approval node's `on_other` to `incorporate_feedback.py`, which +folds that text into `research_feedback` and loops back to +`research_each_question` for another parallel pass. + +`settings.max_loop_iterations` (40) is the engine's infinite-loop +backstop: it caps the total visits to any single node. + +## Running + +```sh +loki agents install # ships deep-research +loki -a deep-research "How does HTTP/3 differ from HTTP/2?" +loki -a deep-research "Recent advances in solid-state batteries" +loki -a deep-research # no prompt -> triggers ask_topic +``` + +## Anti-hallucination + +- `research_one_question` (each map branch) is instructed to back + every claim with a real retrieved source and never to fabricate + URLs, titles, or DOIs. +- `vet_sources` classifies every cited source so weak sources are + visible to the critique step. +- `critique` independently reviews the merged findings and sends weak + or uncited work back for another parallel research pass. +- `synthesize` (the `report-writer` sub-agent) is grounded: it may use + only the gathered findings and must keep each claim's inline source. + It has no tools and cannot browse the web. +- `verify_sources` probes every cited URL / DOI with an HTTP HEAD + request and reports which are unreachable, so the human reviewer + sees broken citations before approving. + +## Customizing + +- **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`. +- **Map concurrency.** The `research_each_question` node's + `max_concurrency: 3` caps simultaneous web-research branches. + Raise to investigate more questions in parallel; lower to be gentle + on rate-limited providers. +- **Per-node model.** Add `model: anthropic:...` to any `llm` node. 
+ Cheap models work well for `plan` / `critique` / `vet_sources`; the + heavy intelligence is needed in `research_one_question` and the + `report-writer` sub-agent. +- **Tool scope.** Narrow the `research_one_question` node's `tools:` + list to constrain where each branch looks (for example, drop + `web_search_loki` and `mcp:ddg-search` to force arXiv-only + research). +- **Local knowledge.** Drop files into `knowledge/` to bias every + research branch toward your local context (see the *Local + knowledge corpus* section above). +- **Different writer.** Replace `agent: report-writer` on the + `synthesize` node with the name of any other agent. The + orchestrator does not care what kind of agent the writer is. +- **Skip approval.** Point both `approve` routes at `end_accepted`, + or wire `verify_sources` straight to an `end` node. + +## Files + +``` +assets/agents/deep-research/ + graph.yaml - agent config + 17-node workflow + tools.sh - classify_source custom tool + README.md - this file + knowledge/ + README.md - corpus-format notes + research-style-notes.md - starter knowledge file (replace with your notes) + scripts/ + parse_request.py - _next: bootstrap_research, or ask_topic if no topic + bootstrap_research.py - fan-out source: next [plan, knowledge_lookup] + combine_findings.py - joins map output (question_findings) into findings + reflexion_gate.py - _next: research_each_question (revise) or synthesize + verify_sources.py - HTTP HEAD on cited URLs / DOIs + incorporate_feedback.py - _next: research_each_question, with user feedback +``` + +See also `assets/agents/report-writer/` — the sub-agent the +`synthesize` node spawns. diff --git a/assets/agents/deep-research/graph.yaml b/assets/agents/deep-research/graph.yaml new file mode 100644 index 0000000..9b33716 --- /dev/null +++ b/assets/agents/deep-research/graph.yaml @@ -0,0 +1,294 @@ +name: deep-research +description: | + Deep web research workflow. 
Plans an investigation, decomposes it + into sub-questions researched in parallel, grounds the work in a + local knowledge corpus, vets the credibility of cited sources, runs + a reflexion self-critique loop to revise weak or incomplete findings, + delegates the final write-up to a focused sub-agent, checks that the + cited sources are reachable, and gates the result behind human + approval. A reviewer's free-form feedback at the approval step feeds + back into another research pass. + + This is the canonical Loki graph-agent reference: it exercises every + node type (script, llm, rag, map, agent, input, approval, end) and + both static fan-out and dynamic map fan-out. + +version: "1.0" + +temperature: 0.0 + +global_tools: + - web_search_loki.sh + - fetch_url_via_curl.sh + - search_arxiv.sh + +mcp_servers: + - ddg-search + +conversation_starters: + - "How does HTTP/3 differ from HTTP/2?" + - "Summarize recent advances in solid-state battery chemistry" + +settings: + max_loop_iterations: 40 + log_state_snapshots: false + validate_before_run: true + max_concurrency: 4 + +initial_state: + research_feedback: "" + research_attempts: 0 + local_context: "" + local_sources: "" + +start: parse_request + +nodes: + + parse_request: + id: parse_request + type: script + script: scripts/parse_request.py + next: bootstrap_research + + ask_topic: + id: ask_topic + type: input + question: "What would you like me to research?" + validation: "len(input) > 0" + state_updates: + topic: "{{input}}" + next: bootstrap_research + + bootstrap_research: + id: bootstrap_research + type: script + script: scripts/bootstrap_research.py + next: [plan, knowledge_lookup] + + plan: + id: plan + type: llm + instructions: | + You are a research planner. Given a topic, produce a focused + research plan and decompose it into 3-5 specific sub-questions + that can each be researched independently in parallel. 
+ + The plan is a short narrative naming the key questions and the + kinds of sources that would be authoritative. The sub-questions + are precise, self-contained queries (each one is sent on its own + to a separate research worker, so they must be answerable + without each other's context). + prompt: "Research topic: {{topic}}" + tools: [] + output_schema: + type: object + properties: + research_plan: + type: string + description: A short plan narrative. + questions: + type: array + items: { type: string } + minItems: 1 + maxItems: 6 + description: 3-5 specific, self-contained sub-questions. + required: [research_plan, questions] + next: research_each_question + + knowledge_lookup: + id: knowledge_lookup + type: rag + documents: + - ./knowledge/ + query: "{{topic}}" + top_k: 6 + embedding_model: openai:text-embedding-3-small + chunk_size: 1000 + chunk_overlap: 100 + state_updates: + local_context: "{{output.context}}" + local_sources: "{{output.sources}}" + next: research_each_question + + research_each_question: + id: research_each_question + type: map + over: "{{questions}}" + as: question + branch: research_one_question + collect_into: question_findings + max_concurrency: 3 + next: combine_findings + + research_one_question: + id: research_one_question + type: llm + instructions: | + You are a web research assistant. Investigate the SINGLE question + given to you using your tools: search the web, fetch and read + pages, and search arXiv for academic sources. + + Rules: + - Every factual claim must be backed by a real source you + actually retrieved. Never fabricate URLs, page titles, + authors, or DOIs. + - Prefer primary and authoritative sources over aggregators. + - Where sources disagree, report the disagreement rather than + papering over it. + - Put the URL (or DOI) inline next to each claim it supports. + + Return organized findings in plain text. Do not include + meta-commentary about the process. 
+ prompt: | + Research question: {{question}} + + Local context that may help: + {{local_context}} + + {{research_feedback}} + tools: + - web_search_loki + - fetch_url_via_curl + - search_arxiv + - mcp:ddg-search + max_iterations: 10 + max_attempts: 2 + temperature: 0.1 + + combine_findings: + id: combine_findings + type: script + script: scripts/combine_findings.py + next: vet_sources + + vet_sources: + id: vet_sources + type: llm + instructions: | + You assess the credibility of the sources cited in a set of + research findings. For every distinct source URL in the findings, + call the `classify_source` tool to get its credibility tier. Then + summarize: which claims rest on HIGH-credibility sources, and + which rest on PREPRINT or UNVERIFIED sources and so need + corroboration. Do NOT do any new research -- assess only what is + already cited. + prompt: | + Findings to assess: + {{findings}} + tools: + - classify_source + max_iterations: 15 + state_updates: + source_assessment: "{{output}}" + next: critique + + critique: + id: critique + type: llm + instructions: | + You are a meticulous research reviewer. Judge whether the + findings below are good enough to synthesize a complete, + well-supported report that answers the research plan. + + Mark the findings REVISE if ANY of these hold: + - A research-plan question is unanswered or only weakly + addressed. + - A factual claim has no source, or cites a source that looks + fabricated. + - The findings lean on a single source where corroboration is + needed. + - A key claim rests only on a PREPRINT or UNVERIFIED source, + per the source credibility assessment below. + - An obvious counter-perspective or recent development is + missing. + Otherwise mark them PASS. 
+ + Respond in EXACTLY this format, nothing else: + + VERDICT: + FEEDBACK: + prompt: | + Research plan: + {{research_plan}} + + Findings under review: + {{findings}} + + Source credibility assessment: + {{source_assessment}} + tools: [] + state_updates: + critique: "{{output}}" + next: reflexion_gate + + reflexion_gate: + id: reflexion_gate + type: script + script: scripts/reflexion_gate.py + next: synthesize + + synthesize: + id: synthesize + type: agent + agent: report-writer + prompt: | + Research topic: {{topic}} + + Findings (organized by sub-question, with inline citations): + {{findings}} + + Source credibility assessment: + {{source_assessment}} + + Produce the final report following your instructions. + timeout: 300 + state_updates: + report: "{{output}}" + next: verify_sources + + verify_sources: + id: verify_sources + type: script + script: scripts/verify_sources.py + next: approve + + approve: + id: approve + type: approval + question: | + Research report on: {{topic}} + + {{report}} + + ---- + {{source_check}} + ---- + + Accept this report? Pick "accept" or "reject", or type specific + feedback to send the research back for another pass. + options: + - "accept" + - "reject" + routes: + "accept": end_accepted + "reject": end_rejected + on_other: incorporate_feedback + state_updates: + decision: "{{choice}}" + + incorporate_feedback: + id: incorporate_feedback + type: script + script: scripts/incorporate_feedback.py + + end_accepted: + id: end_accepted + type: end + output: "{{report}}" + + end_rejected: + id: end_rejected + type: end + output: "Research on '{{topic}}' was rejected and discarded." 
diff --git a/assets/agents/deep-research/knowledge/README.md b/assets/agents/deep-research/knowledge/README.md new file mode 100644 index 0000000..52b578b --- /dev/null +++ b/assets/agents/deep-research/knowledge/README.md @@ -0,0 +1,23 @@ +# Local knowledge corpus for deep-research + +The `knowledge_lookup` node in `graph.yaml` is a `rag` node that runs +hybrid (vector + keyword) retrieval over every file in this directory. +Drop your own notes, papers (PDFs), Markdown docs, or text files here +and they will be indexed into a per-agent knowledge base on first run. + +Loki supports common file types out of the box: `.md`, `.txt`, `.pdf`, +`.html`, and others. Subdirectories are walked recursively. + +A small starter file (`research-style-notes.md`) ships so the RAG +node has something non-empty to retrieve against on a clean install. +Replace or extend it with your own materials to bias the research +phase toward your local context. + +To force the knowledge base to rebuild after you add or change files, +delete the cached index: + +```sh +rm ~/.config/loki/agents/deep-research/knowledge_lookup.yaml +``` + +The next run will rebuild from the current contents of this directory. diff --git a/assets/agents/deep-research/knowledge/research-style-notes.md b/assets/agents/deep-research/knowledge/research-style-notes.md new file mode 100644 index 0000000..a64848d --- /dev/null +++ b/assets/agents/deep-research/knowledge/research-style-notes.md @@ -0,0 +1,49 @@ +# Research style notes + +These are general principles the `deep-research` agent should keep in +mind regardless of topic. Replace this file with your own notes if you +want to bias retrieval toward your local context. + +## What "good research" means here + +- **Every factual claim cites a source you actually retrieved.** Never + fabricate URLs, page titles, authors, or DOIs. 
+- **Primary sources beat aggregators.** Prefer the original paper, the + RFC, the standards body, or the manufacturer over a blog summarizing + them. +- **Corroboration matters where stakes are high.** If a single source + makes a strong claim, look for a second independent source before + taking it as established. +- **Disagreement is information, not noise.** If two credible sources + disagree, report the disagreement and the reasoning on each side. +- **Old does not mean wrong.** A 2014 RFC is still authoritative if no + newer one has obsoleted it; check before assuming a source is stale. + +## Source-tier heuristics + +The `vet_sources` node uses these rough tiers to weigh credibility. +The custom tool `classify_source` (see `tools.sh`) implements this +deterministically by hostname / TLD. + +- **HIGH:** government domains (`.gov`, `.mil`), academic institutions + (`.edu`, university subdomains), peer-reviewed journals, standards + bodies (IETF/RFCs, W3C, ISO, IEEE, NIST), and primary documents from + the entities being researched (e.g. a vendor's official spec page). +- **PREPRINT:** arXiv, bioRxiv, medRxiv, SSRN. Useful but not yet + peer-reviewed; treat numeric claims with extra caution. +- **ORGANIZATION:** established nonprofits, standards-adjacent groups, + industry consortia. Reliable for their stated mission but may have a + perspective. +- **UNVERIFIED:** general web pages, blogs, news aggregators, social + media. Useful for leads but should not be the only source for a + factual claim. + +## Common pitfalls to flag in critique + +- A claim cited only to a PREPRINT or UNVERIFIED source on a numeric + or contested point. +- A research-plan question that the findings address only obliquely. +- "Findings" that paraphrase a single source three times rather than + triangulating. +- Citation collisions where two sources are listed but turn out to + be the same study reported via different aggregators. 
diff --git a/assets/agents/deep-research/scripts/bootstrap_research.py b/assets/agents/deep-research/scripts/bootstrap_research.py new file mode 100644 index 0000000..d230afc --- /dev/null +++ b/assets/agents/deep-research/scripts/bootstrap_research.py @@ -0,0 +1,18 @@ +#!/usr/bin/env python3 +"""Fan-out source for context loading. + +Has no logic of its own. Exists so the static `next: [plan, knowledge_lookup]` +list on this node fans out into two parallel branches (the LLM planner and +the RAG knowledge lookup) as a single super-step. The validator requires +declared parallel-branch script outputs, so we emit an empty JSON object +explicitly here. +""" +import json + + +def main(): + print(json.dumps({})) + + +if __name__ == "__main__": + main() diff --git a/assets/agents/deep-research/scripts/combine_findings.py b/assets/agents/deep-research/scripts/combine_findings.py new file mode 100644 index 0000000..b55d7aa --- /dev/null +++ b/assets/agents/deep-research/scripts/combine_findings.py @@ -0,0 +1,39 @@ +#!/usr/bin/env python3 +"""Join the per-question map outputs into a single `findings` string. + +The `research_each_question` map writes `question_findings` (an array, +one entry per sub-question, in input order). Downstream nodes +(`vet_sources`, `critique`, `synthesize`) read `{{findings}}` as a +single block, so this script renders the array as a Markdown document +with one section per question. 
+""" +import json +import os + + +def load_state(): + path = os.environ.get("GRAPH_STATE_FILE") + if path: + with open(path) as f: + return json.load(f) + return json.loads(os.environ.get("GRAPH_STATE", "{}")) + + +def main(): + state = load_state() + questions = state.get("questions") or [] + per_question = state.get("question_findings") or [] + + sections = [] + for idx, q in enumerate(questions): + body = per_question[idx] if idx < len(per_question) else "" + if isinstance(body, dict) or isinstance(body, list): + body = json.dumps(body, indent=2) + sections.append(f"## {q}\n\n{body}") + + findings = "\n\n".join(sections) if sections else "No findings gathered." + print(json.dumps({"findings": findings})) + + +if __name__ == "__main__": + main() diff --git a/assets/agents/deep-research/scripts/incorporate_feedback.py b/assets/agents/deep-research/scripts/incorporate_feedback.py new file mode 100644 index 0000000..27c22f2 --- /dev/null +++ b/assets/agents/deep-research/scripts/incorporate_feedback.py @@ -0,0 +1,41 @@ +#!/usr/bin/env python3 +"""Fold a reviewer's free-form feedback back into the research loop. + +Runs when the user answers the approval step with their own text +instead of "accept" or "reject". That text (saved by the approval node +as `decision`) becomes `research_feedback`, and the graph loops back to +`research_each_question` for another informed pass (each sub-question is +re-researched in parallel with the new feedback in context). The +reflexion counter is reset so the user-driven pass gets a fresh revision +budget. + +Routing (`_next`): always research_each_question. 
+""" +import json +import os + + +def load_state(): + path = os.environ.get("GRAPH_STATE_FILE") + if path: + with open(path) as f: + return json.load(f) + return json.loads(os.environ.get("GRAPH_STATE", "{}")) + + +def main(): + state = load_state() + feedback = (state.get("decision") or "").strip() + output = { + "_next": "research_each_question", + "research_attempts": 0, + "research_feedback": ( + "The user reviewed the report and asked for changes. Treat " + "this as the top priority for the next pass:\n\n" + feedback + ), + } + print(json.dumps(output)) + + +if __name__ == "__main__": + main() diff --git a/assets/agents/deep-research/scripts/parse_request.py b/assets/agents/deep-research/scripts/parse_request.py new file mode 100644 index 0000000..faae28c --- /dev/null +++ b/assets/agents/deep-research/scripts/parse_request.py @@ -0,0 +1,35 @@ +#!/usr/bin/env python3 +"""Entry router for deep-research. + +Reads the caller's prompt from state. If it contains a usable research +topic, stores it as `topic` and falls through to the static `next` +(bootstrap_research). If the prompt is empty, routes to `ask_topic` so the user can +supply one interactively. 
+ +Routing (`_next`): + - prompt present -> (no _next; static next: bootstrap_research) + - prompt empty -> ask_topic +""" +import json +import os + + +def load_state(): + path = os.environ.get("GRAPH_STATE_FILE") + if path: + with open(path) as f: + return json.load(f) + return json.loads(os.environ.get("GRAPH_STATE", "{}")) + + +def main(): + state = load_state() + prompt = (state.get("initial_prompt") or "").strip() + if prompt: + print(json.dumps({"topic": prompt})) + else: + print(json.dumps({"_next": "ask_topic"})) + + +if __name__ == "__main__": + main() diff --git a/assets/agents/deep-research/scripts/reflexion_gate.py b/assets/agents/deep-research/scripts/reflexion_gate.py new file mode 100644 index 0000000..2dd1e6b --- /dev/null +++ b/assets/agents/deep-research/scripts/reflexion_gate.py @@ -0,0 +1,76 @@ +#!/usr/bin/env python3 +"""Reflexion gate for deep-research. + +Runs after `critique` has reviewed the current research findings. If the +critique's verdict is REVISE and the reflexion budget is not spent, +loops back to `research_each_question` with the critique attached as +`research_feedback`, so the retry is informed rather than a blind +re-run. Otherwise it proceeds to `synthesize`. + +Routing (`_next`): + - verdict PASS -> synthesize + - verdict REVISE, budget remaining -> research_each_question (+ research_feedback) + - verdict REVISE, budget spent -> synthesize + +Reflexion is a best-effort quality booster, not a hard gate: once the +budget is spent the workflow proceeds anyway, and the human approval +step is the final backstop. +""" +import json +import os +import re + +# Automated revision passes allowed. `research_each_question` runs at most +# MAX_REFLEXION_REVISIONS + 1 times per user pass. Bump to allow more. 
+MAX_REFLEXION_REVISIONS = 2 + + +def load_state(): + path = os.environ.get("GRAPH_STATE_FILE") + if path: + with open(path) as f: + return json.load(f) + return json.loads(os.environ.get("GRAPH_STATE", "{}")) + + +def as_int(value, default=0): + try: + return int(value) + except (TypeError, ValueError): + return default + + +def parse_verdict(critique): + """Pull PASS/REVISE from the critique's `VERDICT:` line. Defaults to + PASS when no verdict line is found, so a malformed critique lets the + workflow proceed instead of burning the whole revision budget.""" + match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE) + if not match: + return "PASS" + return match.group(1).upper() + + +def main(): + state = load_state() + critique = state.get("critique") or "" + verdict = parse_verdict(critique) + attempts = as_int(state.get("research_attempts")) + + if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS: + feedback = ( + "A reviewer judged the previous research pass incomplete. " + "Address every point in the critique below:\n\n" + critique + ) + output = { + "_next": "research_each_question", + "research_attempts": attempts + 1, + "research_feedback": feedback, + } + else: + output = {"_next": "synthesize"} + + print(json.dumps(output)) + + +if __name__ == "__main__": + main() diff --git a/assets/agents/deep-research/scripts/verify_sources.py b/assets/agents/deep-research/scripts/verify_sources.py new file mode 100644 index 0000000..9828341 --- /dev/null +++ b/assets/agents/deep-research/scripts/verify_sources.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python3 +"""Check that the sources cited in the research report are reachable. + +Scans the final report for URLs and DOIs, probes each with a HEAD +request, and writes a `source_check` summary into state so the human +reviewer sees broken citations at the approval step. + +Times out per request so a slow source cannot stall the graph. 
+"""
+import json
+import os
+import re
+import urllib.error
+import urllib.request
+
+DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
+URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")
+
+
+def load_state():
+    path = os.environ.get("GRAPH_STATE_FILE")
+    if path:
+        with open(path) as f:
+            return json.load(f)
+    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
+
+
+def reachable(url, timeout=5.0):
+    req = urllib.request.Request(url, method="HEAD")
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            return 200 <= resp.status < 400
+    except urllib.error.HTTPError as e:
+        return 200 <= e.code < 400
+    except Exception:
+        return False
+
+
+def main():
+    state = load_state()
+    report = state.get("report") or ""
+
+    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
+    dois = sorted(set(DOI_RE.findall(report)))
+
+    results = []
+    for url in urls:
+        ok = reachable(url)
+        results.append(f" {'OK' if ok else 'UNREACHABLE'} {url}")
+    for doi in dois:
+        url = f"https://doi.org/{doi}"
+        if url in urls:
+            continue
+        ok = reachable(url)
+        results.append(f" {'OK' if ok else 'UNREACHABLE'} DOI {doi} ({url})")
+
+    if not results:
+        summary = "No web sources were cited in the report."
+    else:
+        summary = (
+            f"Source reachability ({len(results)} checked):\n"
+            + "\n".join(results)
+        )
+
+    print(json.dumps({"source_check": summary}))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/assets/agents/deep-research/tools.sh b/assets/agents/deep-research/tools.sh
new file mode 100644
index 0000000..b715364
--- /dev/null
+++ b/assets/agents/deep-research/tools.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+
+set -e
+
+# @env LLM_OUTPUT=/dev/stdout The output path
+
+# @cmd Classify the credibility tier of a web source from its URL.
+# A deterministic check based on the host and top-level domain. Use it
+# to weigh how much trust to place in a source before relying on it.
+# @option --url!
The full source URL to classify
+classify_source() {
+    # shellcheck disable=SC2154
+    local url="$argc_url"
+    local host="${url#*://}"
+    host="${host%%/*}"
+    host="${host##*@}"
+    host="${host%%:*}"
+    host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"
+
+    local tier
+    case "$host" in
+        '')
+            tier="UNKNOWN - no host could be parsed from the URL" ;;
+        *.gov | *.gov.* | *.mil)
+            tier="HIGH - government source" ;;
+        *.edu | *.edu.* | *.ac.*)
+            tier="HIGH - academic institution" ;;
+        arxiv.org | *.arxiv.org | biorxiv.org | *.biorxiv.org | medrxiv.org | *.medrxiv.org | ssrn.com | *.ssrn.com)
+            tier="PREPRINT - not yet peer reviewed, corroborate before citing" ;;
+        wikipedia.org | *.wikipedia.org)
+            tier="TERTIARY - encyclopedia, good for orientation not citation" ;;
+        *.org | *.org.*)
+            tier="MEDIUM - organization site, check for institutional bias" ;;
+        *)
+            tier="UNVERIFIED - general web source, corroborate before citing" ;;
+    esac
+
+    printf '%s: %s\n' "${host:-}" "$tier" >> "$LLM_OUTPUT"
+}
diff --git a/assets/agents/report-writer/README.md b/assets/agents/report-writer/README.md
new file mode 100644
index 0000000..0adf134
--- /dev/null
+++ b/assets/agents/report-writer/README.md
@@ -0,0 +1,46 @@
+# report-writer
+
+A tiny, focused sub-agent that turns a set of research findings into a
+single coherent final report. Reads only what it is given — does not
+do independent research, does not access the web, does not invent
+facts. It exists as a focused tool for orchestrating agents to
+delegate the writing phase to.
+
+## Why a separate agent?
+
+This is an example of the **agent-as-tool** pattern in graph agents.
+The `deep-research` graph agent's `synthesize` node is an `agent` node
+that spawns this one (see `assets/agents/deep-research/graph.yaml`).
+Separating the role has two practical benefits:
+
+- The orchestrating agent can use a cheap model (or a high-temperature
+  exploratory one) for the research phase, while letting the writing
+  phase use a different (typically lower-temperature, possibly larger)
+  model dedicated to coherent prose.
+- The writing prompt is owned by this agent's `config.yaml` rather
+  than buried inside another agent's graph. You can polish it
+  independently without touching the research flow.
+
+## Standalone use
+
+You can also use this agent directly if you have a set of findings you
+want polished:
+
+```sh
+loki -a report-writer "Topic: X. Findings: "
+```
+
+It will produce a single Markdown report following the rules in its
+system prompt: executive summary at the top, grouped sections by
+related sub-questions, every inline citation preserved verbatim, and a
+final "Open questions / disagreements" section.
+
+## What it will NOT do
+
+- Search the web, fetch URLs, query an MCP server, or use any tool.
+  It has no tools configured.
+- Invent facts beyond what is in the findings you give it.
+- Strip or rewrite citations.
+
+These constraints are the point of the agent existing: a writer that
+the orchestrator can trust to stay in its lane.
diff --git a/assets/agents/report-writer/config.yaml b/assets/agents/report-writer/config.yaml
new file mode 100644
index 0000000..2940c7d
--- /dev/null
+++ b/assets/agents/report-writer/config.yaml
@@ -0,0 +1,34 @@
+name: report-writer
+description: Polishes research findings into a clear, citation-preserving final report
+version: 1.0.0
+temperature: 0.2
+
+instructions: |
+  You are a technical writer. You will be given:
+  - a research topic
+  - a set of findings, organized per sub-question, with inline
+    citations next to each claim
+  - a source-credibility assessment of the cited sources
+
+  Your job is to produce a single, well-organized final report:
+
+  Rules:
+  - Use ONLY the findings provided.
Do not introduce facts from
+    your own memory. Do not speculate beyond what the findings
+    support.
+  - Preserve every inline citation. If a sentence in the findings
+    had a URL or DOI, the equivalent sentence in your report must
+    keep the same citation.
+  - Lead with a 2-3 sentence executive summary at the top.
+  - Organize the body so that related sub-questions are grouped,
+    not strictly one section per question. The findings are raw
+    material; the report should read as a single coherent answer
+    to the original topic.
+  - End with a short "Open questions / disagreements" section
+    naming anything the findings flagged as unresolved or
+    contested.
+
+  Output plain Markdown. No metadata, no JSON wrapper.
+
+conversation_starters:
+  - "Polish these findings into a cited report"