feat: created new graph-based deep-research agent

2026-05-21 11:27:55 -06:00
parent 738b600fa6
commit 4e88cebe28
13 changed files with 1037 additions and 0 deletions
@@ -0,0 +1,274 @@
 # deep-research
 A deep web research agent, built as a Loki graph agent. It plans an
 investigation, decomposes it into sub-questions researched in
 parallel, grounds the work in a local knowledge corpus, vets the
 credibility of cited sources, runs a reflexion self-critique loop to
 revise weak findings, delegates the final write-up to a focused
 sub-agent, checks that the cited sources are reachable, and gates the
 result behind human approval.
 Unlike a regular agent (which takes a goal and improvises the steps),
 this agent runs a fixed graph: every request goes through the same
 `plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve`
 pipeline.
 This agent is also the **canonical reference for the Loki graph
 system**: it exercises every node type (`script`, `llm`, `rag`, `map`,
 `agent`, `input`, `approval`, `end`) and both static fan-out and
 dynamic `map` fan-out. If you are learning how to build a graph
 agent, this is the file to read alongside the
 [Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents).
 ## Workflow
 17 nodes. `->` is the static route; a script node can also route
 dynamically via `_next`. The `▶▶` line is a parallel super-step —
 those branches run concurrently:
 ```
 parse_request (script)              -> bootstrap_research   (or -> ask_topic if no topic)
 ask_topic (input)                   -> bootstrap_research
 bootstrap_research (script)         -> [plan, knowledge_lookup]   ▶▶ parallel
 plan (llm + output_schema)          -> research_each_question
 knowledge_lookup (rag)              -> research_each_question
 research_each_question (map)        -> combine_findings    (spawns one branch per question)
  └─ research_one_question (llm)    (atomic; runs N×, joins at map)
 combine_findings (script)           -> vet_sources
 vet_sources (llm + custom tool)     -> critique
 critique (llm)                      -> reflexion_gate
 reflexion_gate (script)             -> synthesize  (or -> research_each_question: reflexion loop)
 synthesize (agent: report-writer)   -> verify_sources
 verify_sources (script)             -> approve
 approve (approval)                  -> end_accepted          ("accept")
                                    -> end_rejected          ("reject")
                                    -> incorporate_feedback   (any free-form answer)
 incorporate_feedback (script)       -> research_each_question (the human-feedback loop)
 ```
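The routing table above can be restated as a plain adjacency map, which makes it easy to check that the workflow has 17 nodes and that each one is reachable from `parse_request`. This is an illustrative sketch in plain Python, not how Loki stores or validates a graph: `EDGES` and `reachable_from` are hypothetical names, and dynamic `_next` targets, the `map` branch, and `on_other` are folded in as ordinary edges.

```python
from collections import deque

# Every edge from the routing table, static and dynamic alike.
EDGES = {
    "parse_request": ["bootstrap_research", "ask_topic"],
    "ask_topic": ["bootstrap_research"],
    "bootstrap_research": ["plan", "knowledge_lookup"],
    "plan": ["research_each_question"],
    "knowledge_lookup": ["research_each_question"],
    "research_each_question": ["research_one_question", "combine_findings"],
    "research_one_question": [],  # map branch; joins back at the map node
    "combine_findings": ["vet_sources"],
    "vet_sources": ["critique"],
    "critique": ["reflexion_gate"],
    "reflexion_gate": ["synthesize", "research_each_question"],
    "synthesize": ["verify_sources"],
    "verify_sources": ["approve"],
    "approve": ["end_accepted", "end_rejected", "incorporate_feedback"],
    "incorporate_feedback": ["research_each_question"],
    "end_accepted": [],
    "end_rejected": [],
}


def reachable_from(start, edges):
    """Breadth-first search: return the set of nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


visited = reachable_from("parse_request", EDGES)
print(len(EDGES), len(visited))
```

The two back-edges (`reflexion_gate` and `incorporate_feedback` pointing at `research_each_question`) are what make this a cyclic graph rather than a straight pipeline.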
 ### Node-type breakdown
 | Type | Nodes |
 |---|---|
 | `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` |
 | `llm` (tools: `[]`) | `plan`, `critique` |
 | `llm` (with tool whitelist) | `research_one_question`, `vet_sources` |
 | `rag` | `knowledge_lookup` — local corpus retrieval |
 | `map` | `research_each_question` — dynamic fan-out per sub-question |
 | `agent` | `synthesize` — spawns the `report-writer` sub-agent |
 | `input` | `ask_topic` |
 | `approval` | `approve` |
 | `end` | `end_accepted`, `end_rejected` |
 ## Parallel execution
 The graph has two parallel super-steps where Loki's BSP scheduler runs
 branches concurrently.
 **1. Context loading (`plan` ‖ `knowledge_lookup`)** — after
 `bootstrap_research`, the LLM planner (which decomposes the topic into
 sub-questions) and the RAG retrieval over the local `knowledge/`
 corpus run side by side. They write disjoint state keys (`plan` writes
 `research_plan` and `questions`; `knowledge_lookup` writes
 `local_context` and `local_sources`) so no reducer is needed.
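Why disjoint keys need no reducer can be seen in a small sketch. It assumes, hypothetically, that a super-step merges each branch's output into shared state and treats a key written by two branches as a conflict. `merge_super_step` is an illustrative name, not a Loki API, and the values are placeholders.

```python
def merge_super_step(state, branch_outputs):
    """Merge the outputs of concurrently run branches into a copy of state.

    Raises ValueError if two branches write the same key, because without
    a reducer there is no rule for choosing between the two values.
    """
    merged = dict(state)
    written_by = {}
    for branch, output in branch_outputs.items():
        for key, value in output.items():
            if key in written_by:
                raise ValueError(
                    f"{branch} and {written_by[key]} both write {key!r}"
                )
            written_by[key] = branch
            merged[key] = value
    return merged


state = {"topic": "HTTP/3 vs HTTP/2", "local_context": "", "local_sources": ""}
outputs = {
    "plan": {
        "research_plan": "placeholder plan",
        "questions": ["q1", "q2", "q3"],
    },
    "knowledge_lookup": {
        "local_context": "placeholder notes",
        "local_sources": "research-style-notes.md",
    },
}
merged = merge_super_step(state, outputs)
print(sorted(merged))
```

Because `plan` and `knowledge_lookup` never touch the same key, the order in which their outputs are merged does not matter.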
 **2. Per-question research (`research_each_question` map)** — the
 plan emits a `questions` array (the planner is asked for 3-5 entries;
 its `output_schema` accepts 1 to 6). The `map` node spawns one
 parallel branch per
 question (`max_concurrency: 3`). Each branch is an isolated
 `research_one_question` LLM invocation with web tools, instructed to
 investigate exactly its assigned question. Outputs collect into
 `question_findings` in input order, then `combine_findings` joins
 them into a single `findings` Markdown document for downstream nodes.
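The fan-out and ordered join described above resemble a bounded worker pool. The sketch below uses Python's `concurrent.futures` as a stand-in. It is not Loki's scheduler, and `research_one` is a hypothetical placeholder for the `research_one_question` LLM call.

```python
from concurrent.futures import ThreadPoolExecutor


def research_one(question):
    # Placeholder for one isolated research_one_question invocation.
    return f"findings for: {question}"


def run_map(questions, max_concurrency=3):
    # Executor.map yields results in input order, regardless of which
    # branch finishes first, which mirrors collection "in input order".
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(research_one, questions))


def combine(questions, question_findings):
    # Roughly the shape combine_findings.py produces: one Markdown
    # section per question, joined by blank lines.
    sections = [
        f"## {q}\n\n{body}" for q, body in zip(questions, question_findings)
    ]
    return "\n\n".join(sections) if sections else "No findings gathered."


questions = ["What is QUIC?", "How does HTTP/3 avoid head-of-line blocking?"]
question_findings = run_map(questions)
findings = combine(questions, question_findings)
print(findings)
```

Keeping the join in a separate step means downstream nodes only ever see one `findings` string, however many branches ran.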
 `settings.max_concurrency: 4` is the graph-wide cap; the per-`map`
 override (`max_concurrency: 3` on `research_each_question`) is
 deliberately lower, which keeps the web-research branches gentle on
 rate-limited search providers.
 ## Local knowledge corpus
 `knowledge_lookup` is a `rag` node — it runs hybrid (vector + keyword)
 retrieval over every file in `knowledge/`. The directory ships with a
 small `research-style-notes.md` so the RAG node has something to
 retrieve against on a clean install; drop your own Markdown notes,
 PDFs, or text files into `knowledge/` to bias the research toward
 your local context.
 The knowledge base is built once, at agent-load time, into
 `~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because
 the node fully specifies its build config (`embedding_model`,
 `chunk_size`, `chunk_overlap`), the build is non-interactive. Delete
 that cached file after adding or changing knowledge to force a
 rebuild.
 ## Sub-agent: report-writer
 The `synthesize` node is an `agent` node that spawns the
 `report-writer` sub-agent (`assets/agents/report-writer/`). This is
 the agent-as-tool pattern: the orchestrating graph delegates the
 writing phase to a focused sub-agent dedicated to coherent prose,
 while the research phase runs separate `llm` nodes, one invocation
 per sub-question.
 The `report-writer` sub-agent has no tools, so it cannot access the
 web or run searches, and it is instructed not to add facts of its
 own. It reads only the findings it is given and produces a final
 Markdown report preserving
 every inline citation. See `assets/agents/report-writer/README.md`
 for details.
 ## Tools and tool scoping
 This agent demonstrates Loki's three tool sources and how an `llm`
 node's `tools:` whitelist scopes them per node.
 The agent's full tool universe, declared in `graph.yaml`:
 - **Global tools** (`global_tools`): `web_search_loki`,
  `fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts.
 - **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web
  search MCP server. Referenced in a whitelist as `mcp:ddg-search`.
 - **Custom agent tool** (`tools.sh`): `classify_source` - a
  deterministic source-credibility classifier shipped with this agent.
 No node receives all of these. Each `llm` node's `tools:` whitelist
 narrows the universe to exactly what that step needs:
 | Node | `tools:` whitelist | Draws from |
 |---|---|---|
 | `plan`, `critique` | `[]` | nothing - pure reasoning |
 | `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP |
 | `vet_sources` | `classify_source` | the custom tool only |
 `research_one_question` (each parallel branch of the map) can search
 and fetch but cannot classify sources; `vet_sources` can classify
 sources but cannot touch the web. That separation is the point of the
 `tools:` whitelist: a node gets only the tools its job calls for,
 never the agent's full set.
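A minimal sketch of whitelist scoping, assuming a node's `tools:` list is simply checked against the agent's declared tool universe. `TOOL_UNIVERSE`, `NODE_WHITELISTS`, and `scope_tools` are illustrative names. How Loki actually resolves entries such as `mcp:ddg-search` is not shown here.

```python
# The agent's full tool universe, as listed in this section.
TOOL_UNIVERSE = {
    "web_search_loki",
    "fetch_url_via_curl",
    "search_arxiv",
    "mcp:ddg-search",
    "classify_source",
}

# Per-node whitelists, copied from the table above.
NODE_WHITELISTS = {
    "plan": [],
    "critique": [],
    "research_one_question": [
        "web_search_loki",
        "fetch_url_via_curl",
        "search_arxiv",
        "mcp:ddg-search",
    ],
    "vet_sources": ["classify_source"],
}


def scope_tools(node):
    """Return the tools a node may call, rejecting undeclared names."""
    whitelist = NODE_WHITELISTS[node]
    unknown = [t for t in whitelist if t not in TOOL_UNIVERSE]
    if unknown:
        raise ValueError(f"{node} whitelists undeclared tools: {unknown}")
    return set(whitelist)


for node in NODE_WHITELISTS:
    print(node, sorted(scope_tools(node)))
```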
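 The `classify_source` custom tool (`tools.sh`) takes a URL and returns
 a credibility tier (`HIGH` for government and academic hosts,
 `PREPRINT`, `TERTIARY` for Wikipedia, `MEDIUM` for other `.org`
 sites, `UNVERIFIED` for everything else, or `UNKNOWN` when no host
 can be parsed) derived from the host and top-level domain. It is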
 deterministic - exactly the kind of logic a tool should own rather than
 the LLM guessing.
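For readers who prefer Python to shell pattern matching, the same first-match-wins classification can be sketched as below. This is an illustrative port of the `case` statement in `tools.sh`, not code shipped with the agent; `TIERS` and `parse_host` are names chosen for this sketch.

```python
from fnmatch import fnmatchcase

# Ordered: the first tier with a matching pattern wins, as in a shell `case`.
TIERS = [
    (("*.gov", "*.gov.*", "*.mil"), "HIGH - government source"),
    (("*.edu", "*.edu.*", "*.ac.*"), "HIGH - academic institution"),
    (
        ("arxiv.org", "*.arxiv.org", "biorxiv.org", "*.biorxiv.org",
         "medrxiv.org", "*.medrxiv.org", "ssrn.com", "*.ssrn.com"),
        "PREPRINT - not yet peer reviewed, corroborate before citing",
    ),
    (
        ("wikipedia.org", "*.wikipedia.org"),
        "TERTIARY - encyclopedia, good for orientation not citation",
    ),
    (
        ("*.org", "*.org.*"),
        "MEDIUM - organization site, check for institutional bias",
    ),
]


def parse_host(url):
    host = url.split("://", 1)[-1]  # drop the scheme, if any
    host = host.split("/", 1)[0]    # drop the path
    host = host.rsplit("@", 1)[-1]  # drop userinfo
    host = host.split(":", 1)[0]    # drop the port
    return host.lower()


def classify_source(url):
    host = parse_host(url)
    if not host:
        return "UNKNOWN - no host could be parsed from the URL"
    for patterns, tier in TIERS:
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            return tier
    return "UNVERIFIED - general web source, corroborate before citing"


for url in ("https://www.nasa.gov/missions", "https://arxiv.org/abs/0000.00000",
            "https://en.wikipedia.org/wiki/QUIC", "https://example.com/post"):
    print(url, "->", classify_source(url))
```

Order matters: `arxiv.org` and `wikipedia.org` would also match `*.org`, so their more specific tiers have to be tested first.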
 Web search may require API-key configuration; see the
 [Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs.
 `fetch_url_via_curl`, `search_arxiv`, and `classify_source` work
 without a key.
 ## Setup
 `research_one_question` (each parallel branch of the `map`) uses the
 `ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's
 default MCP servers; make sure it is registered in
 `~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore
 the default template if it is missing). If `ddg-search` is unavailable,
 the branches still have their global web-search tools to fall back on.
 The `synthesize` node spawns the `report-writer` sub-agent. Both
 agents ship with `loki agents install`; if you install one manually,
 install both so the agent reference resolves.
 ## Reflexion
 The agent has two loops, both built with script nodes that route via
 `_next`. The engine allows back-edges at runtime; the validator only
 rejects cycles built from static `next` / `routes` edges, so script
 `_next` loops are always allowed.
 **Automated reflexion loop.** After the parallel research map and
 `vet_sources`, the `critique` node reviews the merged findings
 against the research plan and the source credibility assessment, and
 emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback.
 `reflexion_gate.py` then:
 - `PASS` -> continue to `synthesize`.
 - `REVISE`, budget remaining -> loop back to `research_each_question`,
  with the critique injected as `research_feedback` so every parallel
  branch sees it on the retry.
 - `REVISE`, budget spent -> continue to `synthesize` anyway (the human
  approval step is the final backstop).
 The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`
 (default 2, so the research map runs at most 3 times per pass).
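The "at most 3 times" figure follows directly from the gate's rule. The sketch below replays that rule for a sequence of critique verdicts; `gate` applies the same comparison as `reflexion_gate.py`, while `simulate` is a hypothetical helper written only for this illustration.

```python
MAX_REFLEXION_REVISIONS = 2  # same default as reflexion_gate.py


def gate(verdict, attempts):
    """Return (next_node, attempts) using the rule from reflexion_gate.py."""
    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        return "research_each_question", attempts + 1
    return "synthesize", attempts


def simulate(verdicts):
    """Count how many times the research map runs for these verdicts."""
    attempts = 0
    map_runs = 0
    for verdict in verdicts:
        map_runs += 1  # the map ran once before this critique was written
        next_node, attempts = gate(verdict, attempts)
        if next_node == "synthesize":
            break
    return map_runs


print(simulate(["PASS"]))
print(simulate(["REVISE", "PASS"]))
print(simulate(["REVISE", "REVISE", "REVISE", "REVISE"]))
```

Even when the critique never passes, the third `REVISE` finds the budget spent and the workflow moves on to `synthesize`.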
 **Human-feedback loop.** At `approve` the user answers `accept`,
 `reject`, or types their own feedback. A free-form answer routes via
 the approval node's `on_other` to `incorporate_feedback.py`, which
 folds that text into `research_feedback` and loops back to
 `research_each_question` for another parallel pass.
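The three-way routing at `approve` amounts to a lookup with a fallback. The sketch mirrors the `routes` and `on_other` values from `graph.yaml`. `route_approval` is a hypothetical illustration, and exact matching after trimming whitespace is an assumption about how answers are compared.

```python
ROUTES = {"accept": "end_accepted", "reject": "end_rejected"}
ON_OTHER = "incorporate_feedback"


def route_approval(answer):
    """Pick the next node: a listed option, or the on_other fallback."""
    return ROUTES.get(answer.strip(), ON_OTHER)


for answer in ("accept", "reject", "add more on QUIC loss recovery"):
    print(repr(answer), "->", route_approval(answer))
```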
 `settings.max_loop_iterations` (40) is the engine's infinite-loop
 backstop: it caps the total visits to any single node.
 ## Running
 ```sh
 loki agents install                  # ships deep-research
 loki -a deep-research "How does HTTP/3 differ from HTTP/2?"
 loki -a deep-research "Recent advances in solid-state batteries"
 loki -a deep-research                # no prompt -> triggers ask_topic
 ```
 ## Anti-hallucination
 - `research_one_question` (each map branch) is instructed to back
  every claim with a real retrieved source and never to fabricate
  URLs, titles, or DOIs.
 - `vet_sources` classifies every cited source so weak sources are
  visible to the critique step.
 - `critique` independently reviews the merged findings and sends weak
  or uncited work back for another parallel research pass.
 - `synthesize` (the `report-writer` sub-agent) is grounded: it may use
  only the gathered findings and must keep each claim's inline source.
  It has no tools and cannot browse the web.
 - `verify_sources` probes every cited URL / DOI with an HTTP HEAD
  request and reports which are unreachable, so the human reviewer
  sees broken citations before approving.
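To see what `verify_sources` would probe, the extraction step can be run on its own without any network access. The sketch reuses the two regular expressions from `verify_sources.py`. The sample report text and its identifiers are made up, `extract_sources` is a name chosen for this sketch, and trailing punctuation is stripped from both URLs and DOIs here.

```python
import re

DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")


def extract_sources(report):
    """Return (urls, bare_dois) that a reachability check would probe."""
    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
    dois = sorted({d.rstrip(".,;)") for d in DOI_RE.findall(report)})
    # A DOI already cited as a doi.org URL does not need a second probe.
    bare_dois = [d for d in dois if f"https://doi.org/{d}" not in urls]
    return urls, bare_dois


# Made-up report text with placeholder identifiers.
report = (
    "HTTP/3 runs over QUIC (https://example.org/http3-notes). "
    "See doi 10.1000/xyz123. Also https://doi.org/10.1000/abc."
)
urls, bare_dois = extract_sources(report)
print(urls)
print(bare_dois)
```

Stripping trailing punctuation is a heuristic: it handles citations that end a sentence, at the cost of mangling the rare identifier that genuinely ends in one of those characters.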
 ## Customizing
 - **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`.
 - **Map concurrency.** The `research_each_question` node's
  `max_concurrency: 3` caps simultaneous web-research branches.
  Raise to investigate more questions in parallel; lower to be gentle
  on rate-limited providers.
 - **Per-node model.** Add `model: anthropic:...` to any `llm` node.
  Cheap models work well for `plan` / `critique` / `vet_sources`; the
  heavy intelligence is needed in `research_one_question` and the
  `report-writer` sub-agent.
 - **Tool scope.** Narrow the `research_one_question` node's `tools:`
  list to constrain where each branch looks (for example, drop
  `web_search_loki` and `mcp:ddg-search` to force arXiv-only
  research).
 - **Local knowledge.** Drop files into `knowledge/` to bias every
  research branch toward your local context (see the *Local
  knowledge corpus* section above).
 - **Different writer.** Replace `agent: report-writer` on the
  `synthesize` node with the name of any other agent. The
  orchestrator does not care what kind of agent the writer is.
 - **Skip approval.** Point both `approve` routes at `end_accepted`,
  or wire `verify_sources` straight to an `end` node.
 ## Files
 ```
 assets/agents/deep-research/
  graph.yaml                    - agent config + 17-node workflow
  tools.sh                      - classify_source custom tool
  README.md                     - this file
  knowledge/
    README.md                   - corpus-format notes
    research-style-notes.md     - starter knowledge file (replace with your notes)
  scripts/
    parse_request.py            - _next: bootstrap_research, or ask_topic if no topic
    bootstrap_research.py       - fan-out source: next [plan, knowledge_lookup]
    combine_findings.py         - joins map output (question_findings) into findings
    reflexion_gate.py           - _next: research_each_question (revise) or synthesize
    verify_sources.py           - HTTP HEAD on cited URLs / DOIs
    incorporate_feedback.py     - _next: research_each_question, with user feedback
 ```
 See also `assets/agents/report-writer/` — the sub-agent the
 `synthesize` node spawns.
@@ -0,0 +1,294 @@
 name: deep-research
 description: |
  Deep web research workflow. Plans an investigation, decomposes it
  into sub-questions researched in parallel, grounds the work in a
  local knowledge corpus, vets the credibility of cited sources, runs
  a reflexion self-critique loop to revise weak or incomplete findings,
  delegates the final write-up to a focused sub-agent, checks that the
  cited sources are reachable, and gates the result behind human
  approval. A reviewer's free-form feedback at the approval step feeds
  back into another research pass.
  This is the canonical Loki graph-agent reference: it exercises every
  node type (script, llm, rag, map, agent, input, approval, end) and
  both static fan-out and dynamic map fan-out.
 version: "1.0"
 temperature: 0.0
 global_tools:
  - web_search_loki.sh
  - fetch_url_via_curl.sh
  - search_arxiv.sh
 mcp_servers:
  - ddg-search
 conversation_starters:
  - "How does HTTP/3 differ from HTTP/2?"
  - "Summarize recent advances in solid-state battery chemistry"
 settings:
  max_loop_iterations: 40
  log_state_snapshots: false
  validate_before_run: true
  max_concurrency: 4
 initial_state:
  research_feedback: ""
  research_attempts: 0
  local_context: ""
  local_sources: ""
 start: parse_request
 nodes:
  parse_request:
    id: parse_request
    type: script
    script: scripts/parse_request.py
    next: bootstrap_research
  ask_topic:
    id: ask_topic
    type: input
    question: "What would you like me to research?"
    validation: "len(input) > 0"
    state_updates:
      topic: "{{input}}"
    next: bootstrap_research
  bootstrap_research:
    id: bootstrap_research
    type: script
    script: scripts/bootstrap_research.py
    next: [plan, knowledge_lookup]
  plan:
    id: plan
    type: llm
    instructions: |
      You are a research planner. Given a topic, produce a focused
      research plan and decompose it into 3-5 specific sub-questions
      that can each be researched independently in parallel.
      The plan is a short narrative naming the key questions and the
      kinds of sources that would be authoritative. The sub-questions
      are precise, self-contained queries (each one is sent on its own
      to a separate research worker, so they must be answerable
      without each other's context).
    prompt: "Research topic: {{topic}}"
    tools: []
    output_schema:
      type: object
      properties:
        research_plan:
          type: string
          description: A short plan narrative.
        questions:
          type: array
          items: { type: string }
          minItems: 1
          maxItems: 6
          description: 3-5 specific, self-contained sub-questions.
      required: [research_plan, questions]
    next: research_each_question
  knowledge_lookup:
    id: knowledge_lookup
    type: rag
    documents:
      - ./knowledge/
    query: "{{topic}}"
    top_k: 6
    embedding_model: openai:text-embedding-3-small
    chunk_size: 1000
    chunk_overlap: 100
    state_updates:
      local_context: "{{output.context}}"
      local_sources: "{{output.sources}}"
    next: research_each_question
  research_each_question:
    id: research_each_question
    type: map
    over: "{{questions}}"
    as: question
    branch: research_one_question
    collect_into: question_findings
    max_concurrency: 3
    next: combine_findings
  research_one_question:
    id: research_one_question
    type: llm
    instructions: |
      You are a web research assistant. Investigate the SINGLE question
      given to you using your tools: search the web, fetch and read
      pages, and search arXiv for academic sources.
      Rules:
        - Every factual claim must be backed by a real source you
          actually retrieved. Never fabricate URLs, page titles,
          authors, or DOIs.
        - Prefer primary and authoritative sources over aggregators.
        - Where sources disagree, report the disagreement rather than
          papering over it.
        - Put the URL (or DOI) inline next to each claim it supports.
      Return organized findings in plain text. Do not include
      meta-commentary about the process.
    prompt: |
      Research question: {{question}}
      Local context that may help:
      {{local_context}}
      {{research_feedback}}
    tools:
      - web_search_loki
      - fetch_url_via_curl
      - search_arxiv
      - mcp:ddg-search
    max_iterations: 10
    max_attempts: 2
    temperature: 0.1
  combine_findings:
    id: combine_findings
    type: script
    script: scripts/combine_findings.py
    next: vet_sources
  vet_sources:
    id: vet_sources
    type: llm
    instructions: |
      You assess the credibility of the sources cited in a set of
      research findings. For every distinct source URL in the findings,
      call the `classify_source` tool to get its credibility tier. Then
      summarize: which claims rest on HIGH-credibility sources, and
      which rest on PREPRINT or UNVERIFIED sources and so need
      corroboration. Do NOT do any new research -- assess only what is
      already cited.
    prompt: |
      Findings to assess:
      {{findings}}
    tools:
      - classify_source
    max_iterations: 15
    state_updates:
      source_assessment: "{{output}}"
    next: critique
  critique:
    id: critique
    type: llm
    instructions: |
      You are a meticulous research reviewer. Judge whether the
      findings below are good enough to synthesize a complete,
      well-supported report that answers the research plan.
      Mark the findings REVISE if ANY of these hold:
        - A research-plan question is unanswered or only weakly
          addressed.
        - A factual claim has no source, or cites a source that looks
          fabricated.
        - The findings lean on a single source where corroboration is
          needed.
        - A key claim rests only on a PREPRINT or UNVERIFIED source,
          per the source credibility assessment below.
        - An obvious counter-perspective or recent development is
          missing.
      Otherwise mark them PASS.
      Respond in EXACTLY this format, nothing else:
      VERDICT: <PASS or REVISE>
      FEEDBACK: <if REVISE, be specific and actionable -- name the gaps
      and what kind of source would close them; if PASS, write "none">
    prompt: |
      Research plan:
      {{research_plan}}
      Findings under review:
      {{findings}}
      Source credibility assessment:
      {{source_assessment}}
    tools: []
    state_updates:
      critique: "{{output}}"
    next: reflexion_gate
  reflexion_gate:
    id: reflexion_gate
    type: script
    script: scripts/reflexion_gate.py
    next: synthesize
  synthesize:
    id: synthesize
    type: agent
    agent: report-writer
    prompt: |
      Research topic: {{topic}}
      Findings (organized by sub-question, with inline citations):
      {{findings}}
      Source credibility assessment:
      {{source_assessment}}
      Produce the final report following your instructions.
    timeout: 300
    state_updates:
      report: "{{output}}"
    next: verify_sources
  verify_sources:
    id: verify_sources
    type: script
    script: scripts/verify_sources.py
    next: approve
  approve:
    id: approve
    type: approval
    question: |
      Research report on: {{topic}}
      {{report}}
      ----
      {{source_check}}
      ----
      Accept this report? Pick "accept" or "reject", or type specific
      feedback to send the research back for another pass.
    options:
      - "accept"
      - "reject"
    routes:
      "accept": end_accepted
      "reject": end_rejected
    on_other: incorporate_feedback
    state_updates:
      decision: "{{choice}}"
  incorporate_feedback:
    id: incorporate_feedback
    type: script
    script: scripts/incorporate_feedback.py
  end_accepted:
    id: end_accepted
    type: end
    output: "{{report}}"
  end_rejected:
    id: end_rejected
    type: end
    output: "Research on '{{topic}}' was rejected and discarded."
@@ -0,0 +1,23 @@
 # Local knowledge corpus for deep-research
 The `knowledge_lookup` node in `graph.yaml` is a `rag` node that runs
 hybrid (vector + keyword) retrieval over every file in this directory.
 Drop your own notes, papers (PDFs), Markdown docs, or text files here
 and they will be indexed into a per-agent knowledge base on first run.
 Loki supports common file types out of the box: `.md`, `.txt`, `.pdf`,
 `.html`, and others. Subdirectories are walked recursively.
 A small starter file (`research-style-notes.md`) ships so the RAG
 node has something non-empty to retrieve against on a clean install.
 Replace or extend it with your own materials to bias the research
 phase toward your local context.
 To force the knowledge base to rebuild after you add or change files,
 delete the cached index:
 ```sh
 rm ~/.config/loki/agents/deep-research/knowledge_lookup.yaml
 ```
 The next run will rebuild from the current contents of this directory.
@@ -0,0 +1,49 @@
 # Research style notes
 These are general principles the `deep-research` agent should keep in
 mind regardless of topic. Replace this file with your own notes if you
 want to bias retrieval toward your local context.
 ## What "good research" means here
 - **Every factual claim cites a source you actually retrieved.** Never
  fabricate URLs, page titles, authors, or DOIs.
 - **Primary sources beat aggregators.** Prefer the original paper, the
  RFC, the standards body, or the manufacturer over a blog summarizing
  them.
 - **Corroboration matters where stakes are high.** If a single source
  makes a strong claim, look for a second independent source before
  taking it as established.
 - **Disagreement is information, not noise.** If two credible sources
  disagree, report the disagreement and the reasoning on each side.
 - **Old does not mean wrong.** A 2014 RFC is still authoritative if no
  newer one has obsoleted it; check before assuming a source is stale.
 ## Source-tier heuristics
 The `vet_sources` node uses these rough tiers to weigh credibility.
 The custom tool `classify_source` (see `tools.sh`) approximates them
 deterministically from the hostname / TLD alone; it does not
 special-case journals, standards bodies, or vendor spec pages.
 - **HIGH:** government domains (`.gov`, `.mil`), academic institutions
  (`.edu`, university subdomains), peer-reviewed journals, standards
  bodies (IETF/RFCs, W3C, ISO, IEEE, NIST), and primary documents from
  the entities being researched (e.g. a vendor's official spec page).
 - **PREPRINT:** arXiv, bioRxiv, medRxiv, SSRN. Useful but not yet
  peer-reviewed; treat numeric claims with extra caution.
 - **ORGANIZATION** (`MEDIUM` in `classify_source` output): established
  nonprofits, standards-adjacent groups,
  industry consortia. Reliable for their stated mission but may have a
  perspective.
 - **UNVERIFIED:** general web pages, blogs, news aggregators, social
  media. Useful for leads but should not be the only source for a
  factual claim.
 ## Common pitfalls to flag in critique
 - A claim cited only to a PREPRINT or UNVERIFIED source on a numeric
  or contested point.
 - A research-plan question that the findings address only obliquely.
 - "Findings" that paraphrase a single source three times rather than
  triangulating.
 - Citation collisions where two sources are listed but turn out to
  be the same study reported via different aggregators.
@@ -0,0 +1,18 @@
 #!/usr/bin/env python3
 """Fan-out source for context loading.
 Has no logic of its own. Exists so the static `next: [plan, knowledge_lookup]`
 list on this node fans out into two parallel branches (the LLM planner and
 the RAG knowledge lookup) as a single super-step. The validator requires
 declared parallel-branch script outputs, so we emit an empty JSON object
 explicitly here.
 """
 import json
 def main():
    print(json.dumps({}))
 if __name__ == "__main__":
    main()
@@ -0,0 +1,39 @@
 #!/usr/bin/env python3
 """Join the per-question map outputs into a single `findings` string.
 The `research_each_question` map writes `question_findings` (an array,
 one entry per sub-question, in input order). Downstream nodes
 (`vet_sources`, `critique`, `synthesize`) read `{{findings}}` as a
 single block, so this script renders the array as a Markdown document
 with one section per question.
 """
 import json
 import os
 def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
 def main():
    state = load_state()
    questions = state.get("questions") or []
    per_question = state.get("question_findings") or []
    sections = []
    for idx, q in enumerate(questions):
        body = per_question[idx] if idx < len(per_question) else ""
        if isinstance(body, (dict, list)):
            body = json.dumps(body, indent=2)
        sections.append(f"## {q}\n\n{body}")
    findings = "\n\n".join(sections) if sections else "No findings gathered."
    print(json.dumps({"findings": findings}))
 if __name__ == "__main__":
    main()
@@ -0,0 +1,41 @@
 #!/usr/bin/env python3
 """Fold a reviewer's free-form feedback back into the research loop.
 Runs when the user answers the approval step with their own text
 instead of "accept" or "reject". That text (saved by the approval node
 as `decision`) becomes `research_feedback`, and the graph loops back to
 `research_each_question` for another informed pass (each sub-question is
 re-researched in parallel with the new feedback in context). The
 reflexion counter is reset so the user-driven pass gets a fresh revision
 budget.
 Routing (`_next`): always research_each_question.
 """
 import json
 import os
 def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
 def main():
    state = load_state()
    feedback = (state.get("decision") or "").strip()
    output = {
        "_next": "research_each_question",
        "research_attempts": 0,
        "research_feedback": (
            "The user reviewed the report and asked for changes. Treat "
            "this as the top priority for the next pass:\n\n" + feedback
        ),
    }
    print(json.dumps(output))
 if __name__ == "__main__":
    main()
@@ -0,0 +1,35 @@
 #!/usr/bin/env python3
 """Entry router for deep-research.
 Reads the caller's prompt from state. If it contains a usable research
 topic, stores it as `topic` and falls through to the static `next`
 (bootstrap_research). If the prompt is empty, routes to `ask_topic` so
 the user can supply one interactively.
 Routing (`_next`):
  - prompt present -> (no _next; static next: bootstrap_research)
  - prompt empty   -> ask_topic
 """
 import json
 import os
 def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
 def main():
    state = load_state()
    prompt = (state.get("initial_prompt") or "").strip()
    if prompt:
        print(json.dumps({"topic": prompt}))
    else:
        print(json.dumps({"_next": "ask_topic"}))
 if __name__ == "__main__":
    main()
@@ -0,0 +1,76 @@
 #!/usr/bin/env python3
 """Reflexion gate for deep-research.
 Runs after `critique` has reviewed the current research findings. If the
 critique's verdict is REVISE and the reflexion budget is not spent,
 loops back to `research_each_question` with the critique attached as
 `research_feedback`, so the retry is informed rather than a blind
 re-run. Otherwise it proceeds to `synthesize`.
 Routing (`_next`):
  - verdict PASS                     -> synthesize
  - verdict REVISE, budget remaining -> research_each_question  (+ research_feedback)
  - verdict REVISE, budget spent     -> synthesize
 Reflexion is a best-effort quality booster, not a hard gate: once the
 budget is spent the workflow proceeds anyway, and the human approval
 step is the final backstop.
 """
 import json
 import os
 import re
 # Automated revision passes allowed. `research_each_question` runs at
 # most MAX_REFLEXION_REVISIONS + 1 times per user pass. Bump to allow more.
 MAX_REFLEXION_REVISIONS = 2
 def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
 def as_int(value, default=0):
    try:
        return int(value)
    except (TypeError, ValueError):
        return default
 def parse_verdict(critique):
    """Pull PASS/REVISE from the critique's `VERDICT:` line. Defaults to
    PASS when no verdict line is found, so a malformed critique lets the
    workflow proceed instead of burning the whole revision budget."""
    match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE)
    if not match:
        return "PASS"
    return match.group(1).upper()
 def main():
    state = load_state()
    critique = state.get("critique") or ""
    verdict = parse_verdict(critique)
    attempts = as_int(state.get("research_attempts"))
    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        feedback = (
            "A reviewer judged the previous research pass incomplete. "
            "Address every point in the critique below:\n\n" + critique
        )
        output = {
            "_next": "research_each_question",
            "research_attempts": attempts + 1,
            "research_feedback": feedback,
        }
    else:
        output = {"_next": "synthesize"}
    print(json.dumps(output))
 if __name__ == "__main__":
    main()
@@ -0,0 +1,69 @@
 #!/usr/bin/env python3
 """Check that the sources cited in the research report are reachable.
 Scans the final report for URLs and DOIs, probes each with a HEAD
 request, and writes a `source_check` summary into state so the human
 reviewer sees broken citations at the approval step.
 Times out per request so a slow source cannot stall the graph.
 """
 import json
 import os
 import re
 import urllib.error
 import urllib.request
 DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
 URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")
 def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))
 def reachable(url, timeout=5.0):
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as e:
        return 200 <= e.code < 400
    except Exception:
        return False
 def main():
    state = load_state()
    report = state.get("report") or ""
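    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
    # Strip trailing sentence punctuation from DOIs as well, so a DOI that
    # ends a sentence resolves and matches an already-probed doi.org URL.
    dois = sorted({d.rstrip(".,;)") for d in DOI_RE.findall(report)})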
    results = []
    for url in urls:
        ok = reachable(url)
        results.append(f"  {'OK' if ok else 'UNREACHABLE'}  {url}")
    for doi in dois:
        url = f"https://doi.org/{doi}"
        if url in urls:
            continue
        ok = reachable(url)
        results.append(f"  {'OK' if ok else 'UNREACHABLE'}  DOI {doi} ({url})")
    if not results:
        summary = "No web sources were cited in the report."
    else:
        summary = (
            f"Source reachability ({len(results)} checked):\n"
            + "\n".join(results)
        )
    print(json.dumps({"source_check": summary}))
 if __name__ == "__main__":
    main()
@@ -0,0 +1,39 @@
 #!/usr/bin/env bash
 set -e
 # @env LLM_OUTPUT=/dev/stdout The output path
 # @cmd Classify the credibility tier of a web source from its URL.
 # A deterministic check based on the host and top-level domain. Use it
 # to weigh how much trust to place in a source before relying on it.
 # @option --url!  The full source URL to classify
 classify_source() {
    # shellcheck disable=SC2154
    local url="$argc_url"
    local host="${url#*://}"
    # Cut at the first '/', '?' or '#' so a query or fragment never ends up in the host.
    host="${host%%[/?#]*}"
    host="${host##*@}"
    host="${host%%:*}"
    host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"
    local tier
    case "$host" in
        '')
            tier="UNKNOWN - no host could be parsed from the URL" ;;
        *.gov | *.gov.* | *.mil)
            tier="HIGH - government source" ;;
        *.edu | *.edu.* | *.ac.*)
            tier="HIGH - academic institution" ;;
        arxiv.org | *.arxiv.org | biorxiv.org | *.biorxiv.org | medrxiv.org | *.medrxiv.org | ssrn.com | *.ssrn.com)
            tier="PREPRINT - not yet peer reviewed, corroborate before citing" ;;
        wikipedia.org | *.wikipedia.org)
            tier="TERTIARY - encyclopedia, good for orientation not citation" ;;
        *.org | *.org.*)
            tier="MEDIUM - organization site, check for institutional bias" ;;
        *)
            tier="UNVERIFIED - general web source, corroborate before citing" ;;
    esac
    printf '%s: %s\n' "${host:-<none>}" "$tier" >> "$LLM_OUTPUT"
 }
@@ -0,0 +1,46 @@
 # report-writer
 A tiny, focused sub-agent that turns a set of research findings into a
 single coherent final report. It reads only what it is given: it does
 not do independent research, access the web, or invent facts. It exists
 so that orchestrating agents can delegate the writing phase to it.
 ## Why a separate agent?
 This is an example of the **agent-as-tool** pattern in graph agents.
 The `deep-research` graph agent's `synthesize` node is an `agent` node
 that spawns this one (see `assets/agents/deep-research/graph.yaml`).
 Separating the role has two practical benefits:
 - The orchestrating agent can use a cheap model (or a high-temperature
  exploratory one) for the research phase, while letting the writing
  phase use a different (typically lower-temperature, possibly larger)
  model dedicated to coherent prose.
 - The writing prompt is owned by this agent's `config.yaml` rather
  than buried inside another agent's graph. You can polish it
  independently without touching the research flow.
 ## Standalone use
 You can also use this agent directly if you have a set of findings you
 want polished:
 ```sh
 loki -a report-writer "Topic: X. Findings: <paste findings here>"
 ```
 It will produce a single Markdown report following the rules in its
 system prompt: an executive summary at the top, sections grouped by
 related sub-questions, every inline citation preserved verbatim, and a
 final "Open questions / disagreements" section.
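For scripted use, the same invocation can be driven from Python. The sketch below is a minimal illustration, not part of this agent. `build_prompt` and `run_report_writer` are hypothetical helper names, the per-question headings are an assumed layout, and the only assumptions about the CLI are the `loki -a report-writer "<prompt>"` form shown above and that the report is printed to stdout.

```python
import subprocess


def build_prompt(topic, findings):
    """Format a topic and per-sub-question findings as one prompt string.

    `findings` maps each sub-question to its findings text, with inline
    citations already in place. The "Topic: ... Findings: ..." layout
    mirrors the shell example; the "## question" headings are an
    assumption made for this sketch.
    """
    sections = [
        f"## {question}\n{text.strip()}" for question, text in findings.items()
    ]
    return f"Topic: {topic.rstrip('.')}.\n\nFindings:\n\n" + "\n\n".join(sections)


def run_report_writer(topic, findings):
    """Invoke the agent via the CLI and return stdout (assumed to hold the report)."""
    result = subprocess.run(
        ["loki", "-a", "report-writer", build_prompt(topic, findings)],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return result.stdout
```

Keeping prompt construction separate from the subprocess call means the formatting can be tested without the CLI installed.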
 ## What it will NOT do
 - Search the web, fetch URLs, query an MCP server, or use any tool.
  It has no tools configured.
 - Invent facts beyond what is in the findings you give it.
 - Strip or rewrite citations.
 These constraints are the reason the agent exists: it is a writer that
 the orchestrator can trust to stay in its lane.
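The citation rule lends itself to a mechanical check on the orchestrator side. The following is a minimal sketch, not something this agent ships. `extract_citations` and `missing_citations` are hypothetical helpers, and the URL and DOI patterns are borrowed from the deep-research source-reachability script on the assumption that findings cite sources in those two forms.

```python
import re

# Patterns mirror the ones used by the deep-research reachability script.
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)


def extract_citations(text):
    """Return the set of URLs and DOIs in `text`, minus trailing punctuation."""
    found = URL_RE.findall(text) + DOI_RE.findall(text)
    return {c.rstrip(".,;)") for c in found}


def missing_citations(findings, report):
    """List citations present in the findings but absent from the report."""
    return sorted(extract_citations(findings) - extract_citations(report))
```

An empty list means every citation string survived the rewrite. The check compares citation strings only and says nothing about whether each one stayed attached to the right claim.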
@@ -0,0 +1,34 @@
 name: report-writer
 description: Polishes research findings into a clear, citation-preserving final report
 version: 1.0.0
 temperature: 0.2
 instructions: |
  You are a technical writer. You will be given:
    - a research topic
    - a set of findings, organized per sub-question, with inline
      citations next to each claim
    - a source-credibility assessment of the cited sources
  Your job is to produce a single, well-organized final report.
  Rules:
    - Use ONLY the findings provided. Do not introduce facts from
      your own memory. Do not speculate beyond what the findings
      support.
    - Preserve every inline citation. If a sentence in the findings
      had a URL or DOI, the equivalent sentence in your report must
      keep the same citation.
    - Lead with a 2-3 sentence executive summary at the top.
    - Organize the body so that related sub-questions are grouped,
      not strictly one section per question. The findings are raw
      material; the report should read as a single coherent answer
      to the original topic.
    - End with a short "Open questions / disagreements" section
      naming anything the findings flagged as unresolved or
      contested.
  Output plain Markdown. No metadata, no JSON wrapper.
 conversation_starters:
  - "Polish these findings into a cited report"