feat: created new graph-based deep-research agent

2026-05-21 11:27:55 -06:00
parent 597f823bdf
commit d81d233527
13 changed files with 1037 additions and 0 deletions
@@ -0,0 +1,274 @@
# deep-research
A deep web research agent, built as a Loki graph agent. It plans an
investigation, decomposes it into sub-questions researched in
parallel, grounds the work in a local knowledge corpus, vets the
credibility of cited sources, runs a reflexion self-critique loop to
revise weak findings, delegates the final write-up to a focused
sub-agent, checks that the cited sources are reachable, and gates the
result behind human approval.
Unlike a regular agent (which takes a goal and improvises the steps),
this agent runs a fixed graph: every request goes through the same
`plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve`
pipeline.
This agent is also the **canonical reference for the Loki graph
system**: it exercises every node type (`script`, `llm`, `rag`, `map`,
`agent`, `input`, `approval`, `end`) and both static fan-out and
dynamic `map` fan-out. If you are learning how to build a graph
agent, this is the file to read alongside the
[Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents).
## Workflow
17 nodes. `->` is the static route; a script node can also route
dynamically via `_next`. The `▶▶` line is a parallel super-step —
those branches run concurrently:
```
parse_request (script)             -> bootstrap_research       (or -> ask_topic if no topic)
ask_topic (input)                  -> bootstrap_research
bootstrap_research (script)        -> [plan, knowledge_lookup] ▶▶ parallel
plan (llm + output_schema)         -> research_each_question
knowledge_lookup (rag)             -> research_each_question
research_each_question (map)       -> combine_findings         (spawns one branch per question)
  └─ research_one_question (llm)                               (atomic; runs N×, joins at map)
combine_findings (script)          -> vet_sources
vet_sources (llm + custom tool)    -> critique
critique (llm)                     -> reflexion_gate
reflexion_gate (script)            -> synthesize               (or -> research_each_question: reflexion loop)
synthesize (agent: report-writer)  -> verify_sources
verify_sources (script)            -> approve
approve (approval)                 -> end_accepted             ("accept")
                                   -> end_rejected             ("reject")
                                   -> incorporate_feedback     (any free-form answer)
incorporate_feedback (script)      -> research_each_question   (the human-feedback loop)
```
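The script-node contract behind the `script` rows above can be sketched as follows. This is a minimal illustration modeled on the scripts shipped with this agent: state arrives through `GRAPH_STATE_FILE` or `GRAPH_STATE`, and the node prints one JSON object whose keys are written into state, with an optional `_next` key overriding the static route. The function name `route` is hypothetical and exists only for this example.

```python
import json
import os


def load_state(environ=os.environ):
    """Read graph state from GRAPH_STATE_FILE if set, else from GRAPH_STATE."""
    path = environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(environ.get("GRAPH_STATE", "{}"))


def route(state):
    """Hypothetical router: build the JSON output for one script node."""
    topic = (state.get("initial_prompt") or "").strip()
    if topic:
        # No `_next` key: the engine follows the node's static `next`.
        return {"topic": topic}
    # `_next` overrides the static route for this visit.
    return {"_next": "ask_topic"}


if __name__ == "__main__":
    print(json.dumps(route(load_state())))
```

Keeping the decision in a pure function (`route`) and the I/O in `load_state` makes a routing script easy to test without running the graph.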
### Node-type breakdown
| Type | Nodes |
|---|---|
| `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` |
| `llm` (tools: `[]`) | `plan`, `critique` |
| `llm` (with tool whitelist) | `research_one_question`, `vet_sources` |
| `rag` | `knowledge_lookup` — local corpus retrieval |
| `map` | `research_each_question` — dynamic fan-out per sub-question |
| `agent` | `synthesize` — spawns the `report-writer` sub-agent |
| `input` | `ask_topic` |
| `approval` | `approve` |
| `end` | `end_accepted`, `end_rejected` |
## Parallel execution
The graph has two parallel super-steps where Loki's BSP
(bulk-synchronous parallel) scheduler runs branches concurrently.
**1. Context loading (`plan` ‖ `knowledge_lookup`)** — after
`bootstrap_research`, the LLM planner (which decomposes the topic into
sub-questions) and the RAG retrieval over the local `knowledge/`
corpus run side by side. They write disjoint state keys (`plan` writes
`research_plan` and `questions`; `knowledge_lookup` writes
`local_context` and `local_sources`) so no reducer is needed.
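Why disjoint keys need no reducer can be illustrated with a small merge routine. This is a sketch under stated assumptions, not Loki's scheduler: `merge_branch_outputs` is a hypothetical helper that folds the outputs of one super-step into state and fails as soon as two branches write the same key.

```python
def merge_branch_outputs(state, branch_outputs):
    """Merge parallel branch outputs into state, refusing overlapping keys.

    Hypothetical helper for illustration only: when every key has exactly
    one writer, a plain dictionary update is unambiguous.
    """
    merged = dict(state)
    written_by = {}
    for branch, output in branch_outputs.items():
        for key, value in output.items():
            if key in written_by:
                raise ValueError(
                    f"{branch!r} and {written_by[key]!r} both write {key!r}; "
                    "a reducer would be needed"
                )
            written_by[key] = branch
            merged[key] = value
    return merged


outputs = {
    "plan": {"research_plan": "...", "questions": ["q1", "q2", "q3"]},
    "knowledge_lookup": {"local_context": "...", "local_sources": "..."},
}
state = merge_branch_outputs({"topic": "HTTP/3"}, outputs)
print(sorted(state))
```

If both branches wrote, say, `local_context`, the merge order would decide the winner, and that is the situation a reducer exists to resolve.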
**2. Per-question research (`research_each_question` map)** — the
plan emits a `questions` array (3-5 entries requested by its
instructions; its `output_schema` accepts 1-6). The `map` node
spawns one parallel branch per
question (`max_concurrency: 3`). Each branch is an isolated
`research_one_question` LLM invocation with web tools, instructed to
investigate exactly its assigned question. Outputs collect into
`question_findings` in input order, then `combine_findings` joins
them into a single `findings` Markdown document for downstream nodes.
`settings.max_concurrency: 4` is the graph-wide cap; the per-`map`
override (`max_concurrency: 3` on `research_each_question`) is
deliberately lower, which keeps the web-research branches gentle on
rate-limited search providers.
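The fan-out behavior described here, bounded concurrency with results collected in input order, can be sketched with the standard library. This is an analogy rather than Loki's implementation: `run_map` is a hypothetical helper, and `research_one_question` is a stand-in for the real LLM branch.

```python
from concurrent.futures import ThreadPoolExecutor


def research_one_question(question):
    """Stand-in for the LLM branch; a real branch would call web tools."""
    return f"findings for: {question}"


def run_map(items, branch, max_concurrency=3):
    """Run `branch` once per item, at most `max_concurrency` at a time.

    `Executor.map` yields results in input order, regardless of which
    branch finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(branch, items))


questions = ["q1", "q2", "q3", "q4", "q5"]
question_findings = run_map(questions, research_one_question)
```

With five questions and a cap of three, the last two branches start only as earlier ones finish, yet `question_findings[i]` still corresponds to `questions[i]`.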
## Local knowledge corpus
`knowledge_lookup` is a `rag` node — it runs hybrid (vector + keyword)
retrieval over every file in `knowledge/`. The directory ships with a
small `research-style-notes.md` so the RAG node has something to
retrieve against on a clean install; drop your own Markdown notes,
PDFs, or text files into `knowledge/` to bias the research toward
your local context.
The knowledge base is built once, at agent-load time, into
`~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because
the node fully specifies its build config (`embedding_model`,
`chunk_size`, `chunk_overlap`), the build is non-interactive. Delete
that cached file after adding or changing knowledge to force a
rebuild.
## Sub-agent: report-writer
The `synthesize` node is an `agent` node that spawns the
`report-writer` sub-agent (`assets/agents/report-writer/`). This is
the agent-as-tool pattern: the orchestrating graph delegates the
writing phase to a focused sub-agent dedicated to coherent prose,
while the research phase uses different (typically cheaper) LLM nodes
to investigate many questions quickly.
The `report-writer` sub-agent has no tools: it cannot access the web
or search, and its instructions forbid introducing facts of its own.
It reads only the
findings it is given and produces a final Markdown report preserving
every inline citation. See `assets/agents/report-writer/README.md`
for details.
## Tools and tool scoping
This agent demonstrates Loki's three tool sources and how an `llm`
node's `tools:` whitelist scopes them per node.
The agent's full tool universe, declared in `graph.yaml`:
- **Global tools** (`global_tools`): `web_search_loki`,
`fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts.
- **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web
search MCP server. Referenced in a whitelist as `mcp:ddg-search`.
- **Custom agent tool** (`tools.sh`): `classify_source` - a
deterministic source-credibility classifier shipped with this agent.
No node receives all of these. Each `llm` node's `tools:` whitelist
narrows the universe to exactly what that step needs:
| Node | `tools:` whitelist | Draws from |
|---|---|---|
| `plan`, `critique` | `[]` | nothing - pure reasoning |
| `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP |
| `vet_sources` | `classify_source` | the custom tool only |
`research_one_question` (each parallel branch of the map) can search
and fetch but cannot classify sources; `vet_sources` can classify
sources but cannot touch the web. That separation is the point of the
`tools:` whitelist: a node gets only the tools its job calls for,
never the agent's full set.
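The scoping rule in the table can be expressed as a lookup against the agent's declared tools. The sketch below is illustrative only: `TOOL_UNIVERSE`, `NODE_WHITELISTS`, and `tools_for` are hypothetical names, and rejecting undeclared tool names is an assumption about how a validator might behave, not a documented Loki guarantee.

```python
# Every tool the agent declares, tagged with the source it comes from.
TOOL_UNIVERSE = {
    "web_search_loki": "global",
    "fetch_url_via_curl": "global",
    "search_arxiv": "global",
    "mcp:ddg-search": "mcp",
    "classify_source": "custom",
}

# Per-node `tools:` whitelists, copied from the table above.
NODE_WHITELISTS = {
    "plan": [],
    "critique": [],
    "research_one_question": [
        "web_search_loki",
        "fetch_url_via_curl",
        "search_arxiv",
        "mcp:ddg-search",
    ],
    "vet_sources": ["classify_source"],
}


def tools_for(node):
    """Return the tools a node may call; undeclared names are rejected."""
    whitelist = NODE_WHITELISTS[node]
    unknown = [name for name in whitelist if name not in TOOL_UNIVERSE]
    if unknown:
        raise KeyError(f"{node}: tools not declared by the agent: {unknown}")
    return {name: TOOL_UNIVERSE[name] for name in whitelist}
```

For example, `tools_for("vet_sources")` contains only `classify_source`, while `tools_for("plan")` is empty.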
The `classify_source` custom tool (`tools.sh`) takes a URL and returns
a credibility tier derived from the host and top-level domain: `HIGH`
(government or academic), `PREPRINT`, `TERTIARY` (Wikipedia), `MEDIUM`
(other `.org` sites), `UNVERIFIED` (everything else), or `UNKNOWN` when
no host can be parsed. It is deterministic, which is exactly the kind
of logic a tool should own rather than leaving the LLM to guess.
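For readers who prefer Python to shell pattern matching, the host-to-tier logic in `tools.sh` can be mirrored roughly as below. This is a sketch, not the shipped tool: `host_of` and `classify` are hypothetical names, the glob patterns are copied from the `case` statement, and the function returns only the tier label rather than the tool's full `host: tier - description` line.

```python
from fnmatch import fnmatchcase

# (patterns, tier) pairs in the same order as the `case` arms in tools.sh.
RULES = [
    (("*.gov", "*.gov.*", "*.mil"), "HIGH"),
    (("*.edu", "*.edu.*", "*.ac.*"), "HIGH"),
    (("arxiv.org", "*.arxiv.org", "biorxiv.org", "*.biorxiv.org",
      "medrxiv.org", "*.medrxiv.org", "ssrn.com", "*.ssrn.com"), "PREPRINT"),
    (("wikipedia.org", "*.wikipedia.org"), "TERTIARY"),
    (("*.org", "*.org.*"), "MEDIUM"),
]


def host_of(url):
    """Extract the lowercase host the same way the shell expansions do."""
    host = url.split("://", 1)[-1]   # drop the scheme, if any
    host = host.split("/", 1)[0]     # drop the path
    host = host.rsplit("@", 1)[-1]   # drop any userinfo
    host = host.split(":", 1)[0]     # drop the port
    return host.lower()


def classify(url):
    """Return the first matching tier; order matters, as in `case`."""
    host = host_of(url)
    if not host:
        return "UNKNOWN"
    for patterns, tier in RULES:
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            return tier
    return "UNVERIFIED"
```

Rule order carries meaning: `en.wikipedia.org` also matches `*.org`, so the Wikipedia rule has to come first for the `TERTIARY` tier to apply.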
Web search may require API-key configuration; see the
[Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs.
`fetch_url_via_curl`, `search_arxiv`, and `classify_source` work
without a key.
## Setup
`research_one_question` (each parallel branch of the `map`) uses the
`ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's
default MCP servers; make sure it is registered in
`~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore
the default template if it is missing). If `ddg-search` is unavailable,
the branches still have their global web-search tools to fall back on.
The `synthesize` node spawns the `report-writer` sub-agent. Both
agents ship with `loki agents install`; if you install one manually,
install both so the agent reference resolves.
## Reflexion
The agent has two loops, both built with script nodes that route via
`_next`. The engine allows back-edges at runtime; the validator only
rejects cycles built from static `next` / `routes` edges, so script
`_next` loops are always allowed.
**Automated reflexion loop.** After the parallel research map and
`vet_sources`, the `critique` node reviews the merged findings
against the research plan and the source credibility assessment, and
emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback.
`reflexion_gate.py` then:
- `PASS` -> continue to `synthesize`.
- `REVISE`, budget remaining -> loop back to `research_each_question`,
with the critique injected as `research_feedback` so every parallel
branch sees it on the retry.
- `REVISE`, budget spent -> continue to `synthesize` anyway (the human
approval step is the final backstop).
The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`
(default 2, so the research map runs at most 3 times per pass).
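The budget arithmetic can be checked with a small simulation. The sketch below reuses the verdict regex and the `attempts < MAX_REFLEXION_REVISIONS` comparison from `reflexion_gate.py`; `gate` and `count_research_runs` are hypothetical helpers written for this example.

```python
import re

MAX_REFLEXION_REVISIONS = 2


def parse_verdict(critique):
    """Pull PASS/REVISE from the `VERDICT:` line; default to PASS."""
    match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE)
    return match.group(1).upper() if match else "PASS"


def gate(critique, attempts):
    """Return (next node, updated attempt counter) for one critique."""
    if parse_verdict(critique) == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        return "research_each_question", attempts + 1
    return "synthesize", attempts


def count_research_runs(critiques):
    """Count research-map runs given one critique per research pass."""
    attempts, runs = 0, 0
    for critique in critiques:
        runs += 1  # the map ran, then `critique` reviewed its findings
        target, attempts = gate(critique, attempts)
        if target == "synthesize":
            break
    return runs


always_revise = ["VERDICT: REVISE\nFEEDBACK: cite primary sources"] * 10
print(count_research_runs(always_revise))
```

Even when every critique says `REVISE`, the third pass exhausts the budget and the gate proceeds to `synthesize`.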
**Human-feedback loop.** At `approve` the user answers `accept`,
`reject`, or types their own feedback. A free-form answer routes via
the approval node's `on_other` to `incorporate_feedback.py`, which
folds that text into `research_feedback` and loops back to
`research_each_question` for another parallel pass.
`settings.max_loop_iterations` (40) is the engine's infinite-loop
backstop: it caps the total visits to any single node.
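A per-node visit cap of this kind can be sketched as a counter around the traversal loop. This is an illustration under assumptions: `walk` and `LoopLimitExceeded` are hypothetical names, and whether the real engine stops at exactly the cap or one visit past it is not stated here.

```python
from collections import Counter


class LoopLimitExceeded(RuntimeError):
    """Raised when one node is visited more often than the cap allows."""


def walk(start, next_node, max_loop_iterations=40):
    """Follow `next_node` from `start`, capping the visits to each node."""
    visits = Counter()
    path = []
    node = start
    while node is not None:
        visits[node] += 1
        if visits[node] > max_loop_iterations:
            raise LoopLimitExceeded(
                f"{node} visited more than {max_loop_iterations} times"
            )
        path.append(node)
        node = next_node(node)
    return path
```

A two-node cycle such as `research_each_question -> critique -> research_each_question` raises `LoopLimitExceeded` once either node passes the cap, while an acyclic route returns its path unchanged.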
## Running
```sh
loki agents install # ships deep-research
loki -a deep-research "How does HTTP/3 differ from HTTP/2?"
loki -a deep-research "Recent advances in solid-state batteries"
loki -a deep-research # no prompt -> triggers ask_topic
```
## Anti-hallucination
- `research_one_question` (each map branch) is instructed to back
every claim with a real retrieved source and never to fabricate
URLs, titles, or DOIs.
- `vet_sources` classifies every cited source so weak sources are
visible to the critique step.
- `critique` independently reviews the merged findings and sends weak
or uncited work back for another parallel research pass.
- `synthesize` (the `report-writer` sub-agent) is grounded: it may use
only the gathered findings and must keep each claim's inline source.
It has no tools and cannot browse the web.
- `verify_sources` probes every cited URL / DOI with an HTTP HEAD
request and reports which are unreachable, so the human reviewer
sees broken citations before approving.
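The extraction half of `verify_sources` can be tried without touching the network. The sketch below uses the same two regular expressions as `verify_sources.py`; `extract_citations` is a hypothetical helper, the sample report and its URL and DOI are placeholders, and the sketch trims trailing sentence punctuation from DOIs as well as URLs.

```python
import re

DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")


def extract_citations(report):
    """Return (urls, dois) found in a report, deduplicated and sorted."""
    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
    dois = sorted({d.rstrip(".,;") for d in DOI_RE.findall(report)})
    return urls, dois


# Placeholder citations, for illustration only.
report = (
    "The protocol is specified in a public document "
    "(https://example.org/spec). See also doi:10.1000/xyz123."
)
urls, dois = extract_citations(report)
print(urls, dois)
```

Each extracted URL, plus `https://doi.org/<doi>` for each DOI, is what the script then probes with a HEAD request.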
## Customizing
- **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`.
- **Map concurrency.** The `research_each_question` node's
`max_concurrency: 3` caps simultaneous web-research branches.
Raise to investigate more questions in parallel; lower to be gentle
on rate-limited providers.
- **Per-node model.** Add `model: anthropic:...` to any `llm` node.
Cheap models work well for `plan` / `critique` / `vet_sources`;
stronger models pay off most in `research_one_question` and the
`report-writer` sub-agent.
- **Tool scope.** Narrow the `research_one_question` node's `tools:`
list to constrain where each branch looks (for example, drop
`web_search_loki` and `mcp:ddg-search` to force arXiv-only
research).
- **Local knowledge.** Drop files into `knowledge/` to bias every
research branch toward your local context (see the *Local
knowledge corpus* section above).
- **Different writer.** Replace `agent: report-writer` on the
`synthesize` node with the name of any other agent. The
orchestrator does not care what kind of agent the writer is.
- **Skip approval.** Point both `approve` routes at `end_accepted`,
or wire `verify_sources` straight to an `end` node.
## Files
```
assets/agents/deep-research/
  graph.yaml                   - agent config + 17-node workflow
  tools.sh                     - classify_source custom tool
  README.md                    - this file
  knowledge/
    README.md                  - corpus-format notes
    research-style-notes.md    - starter knowledge file (replace with your notes)
  scripts/
    parse_request.py           - _next: bootstrap_research, or ask_topic if no topic
    bootstrap_research.py      - fan-out source: next [plan, knowledge_lookup]
    combine_findings.py        - joins map output (question_findings) into findings
    reflexion_gate.py          - _next: research_each_question (revise) or synthesize
    verify_sources.py          - HTTP HEAD on cited URLs / DOIs
    incorporate_feedback.py    - _next: research_each_question, with user feedback
```
See also `assets/agents/report-writer/` — the sub-agent the
`synthesize` node spawns.
@@ -0,0 +1,294 @@
name: deep-research
description: |
  Deep web research workflow. Plans an investigation, decomposes it
  into sub-questions researched in parallel, grounds the work in a
  local knowledge corpus, vets the credibility of cited sources, runs
  a reflexion self-critique loop to revise weak or incomplete findings,
  delegates the final write-up to a focused sub-agent, checks that the
  cited sources are reachable, and gates the result behind human
  approval. A reviewer's free-form feedback at the approval step feeds
  back into another research pass.

  This is the canonical Loki graph-agent reference: it exercises every
  node type (script, llm, rag, map, agent, input, approval, end) and
  both static fan-out and dynamic map fan-out.
version: "1.0"
temperature: 0.0

global_tools:
  - web_search_loki.sh
  - fetch_url_via_curl.sh
  - search_arxiv.sh

mcp_servers:
  - ddg-search

conversation_starters:
  - "How does HTTP/3 differ from HTTP/2?"
  - "Summarize recent advances in solid-state battery chemistry"

settings:
  max_loop_iterations: 40
  log_state_snapshots: false
  validate_before_run: true
  max_concurrency: 4

initial_state:
  research_feedback: ""
  research_attempts: 0
  local_context: ""
  local_sources: ""

start: parse_request
nodes:
  parse_request:
    id: parse_request
    type: script
    script: scripts/parse_request.py
    next: bootstrap_research

  ask_topic:
    id: ask_topic
    type: input
    question: "What would you like me to research?"
    validation: "len(input) > 0"
    state_updates:
      topic: "{{input}}"
    next: bootstrap_research

  bootstrap_research:
    id: bootstrap_research
    type: script
    script: scripts/bootstrap_research.py
    next: [plan, knowledge_lookup]
  plan:
    id: plan
    type: llm
    instructions: |
      You are a research planner. Given a topic, produce a focused
      research plan and decompose it into 3-5 specific sub-questions
      that can each be researched independently in parallel.

      The plan is a short narrative naming the key questions and the
      kinds of sources that would be authoritative. The sub-questions
      are precise, self-contained queries (each one is sent on its own
      to a separate research worker, so they must be answerable
      without each other's context).
    prompt: "Research topic: {{topic}}"
    tools: []
    output_schema:
      type: object
      properties:
        research_plan:
          type: string
          description: A short plan narrative.
        questions:
          type: array
          items: { type: string }
          minItems: 1
          maxItems: 6
          description: 3-5 specific, self-contained sub-questions.
      required: [research_plan, questions]
    next: research_each_question
  knowledge_lookup:
    id: knowledge_lookup
    type: rag
    documents:
      - ./knowledge/
    query: "{{topic}}"
    top_k: 6
    embedding_model: openai:text-embedding-3-small
    chunk_size: 1000
    chunk_overlap: 100
    state_updates:
      local_context: "{{output.context}}"
      local_sources: "{{output.sources}}"
    next: research_each_question

  research_each_question:
    id: research_each_question
    type: map
    over: "{{questions}}"
    as: question
    branch: research_one_question
    collect_into: question_findings
    max_concurrency: 3
    next: combine_findings
  research_one_question:
    id: research_one_question
    type: llm
    instructions: |
      You are a web research assistant. Investigate the SINGLE question
      given to you using your tools: search the web, fetch and read
      pages, and search arXiv for academic sources.

      Rules:
      - Every factual claim must be backed by a real source you
        actually retrieved. Never fabricate URLs, page titles,
        authors, or DOIs.
      - Prefer primary and authoritative sources over aggregators.
      - Where sources disagree, report the disagreement rather than
        papering over it.
      - Put the URL (or DOI) inline next to each claim it supports.

      Return organized findings in plain text. Do not include
      meta-commentary about the process.
    prompt: |
      Research question: {{question}}

      Local context that may help:
      {{local_context}}

      {{research_feedback}}
    tools:
      - web_search_loki
      - fetch_url_via_curl
      - search_arxiv
      - mcp:ddg-search
    max_iterations: 10
    max_attempts: 2
    temperature: 0.1
  combine_findings:
    id: combine_findings
    type: script
    script: scripts/combine_findings.py
    next: vet_sources

  vet_sources:
    id: vet_sources
    type: llm
    instructions: |
      You assess the credibility of the sources cited in a set of
      research findings. For every distinct source URL in the findings,
      call the `classify_source` tool to get its credibility tier. Then
      summarize: which claims rest on HIGH-credibility sources, and
      which rest on PREPRINT or UNVERIFIED sources and so need
      corroboration. Do NOT do any new research -- assess only what is
      already cited.
    prompt: |
      Findings to assess:

      {{findings}}
    tools:
      - classify_source
    max_iterations: 15
    state_updates:
      source_assessment: "{{output}}"
    next: critique
  critique:
    id: critique
    type: llm
    instructions: |
      You are a meticulous research reviewer. Judge whether the
      findings below are good enough to synthesize a complete,
      well-supported report that answers the research plan.

      Mark the findings REVISE if ANY of these hold:
      - A research-plan question is unanswered or only weakly
        addressed.
      - A factual claim has no source, or cites a source that looks
        fabricated.
      - The findings lean on a single source where corroboration is
        needed.
      - A key claim rests only on a PREPRINT or UNVERIFIED source,
        per the source credibility assessment below.
      - An obvious counter-perspective or recent development is
        missing.
      Otherwise mark them PASS.

      Respond in EXACTLY this format, nothing else:
      VERDICT: <PASS or REVISE>
      FEEDBACK: <if REVISE, be specific and actionable -- name the gaps
      and what kind of source would close them; if PASS, write "none">
    prompt: |
      Research plan:
      {{research_plan}}

      Findings under review:
      {{findings}}

      Source credibility assessment:
      {{source_assessment}}
    tools: []
    state_updates:
      critique: "{{output}}"
    next: reflexion_gate
  reflexion_gate:
    id: reflexion_gate
    type: script
    script: scripts/reflexion_gate.py
    next: synthesize

  synthesize:
    id: synthesize
    type: agent
    agent: report-writer
    prompt: |
      Research topic: {{topic}}

      Findings (organized by sub-question, with inline citations):
      {{findings}}

      Source credibility assessment:
      {{source_assessment}}

      Produce the final report following your instructions.
    timeout: 300
    state_updates:
      report: "{{output}}"
    next: verify_sources

  verify_sources:
    id: verify_sources
    type: script
    script: scripts/verify_sources.py
    next: approve
  approve:
    id: approve
    type: approval
    question: |
      Research report on: {{topic}}

      {{report}}

      ----
      {{source_check}}
      ----

      Accept this report? Pick "accept" or "reject", or type specific
      feedback to send the research back for another pass.
    options:
      - "accept"
      - "reject"
    routes:
      "accept": end_accepted
      "reject": end_rejected
    on_other: incorporate_feedback
    state_updates:
      decision: "{{choice}}"

  incorporate_feedback:
    id: incorporate_feedback
    type: script
    script: scripts/incorporate_feedback.py

  end_accepted:
    id: end_accepted
    type: end
    output: "{{report}}"

  end_rejected:
    id: end_rejected
    type: end
    output: "Research on '{{topic}}' was rejected and discarded."
@@ -0,0 +1,23 @@
# Local knowledge corpus for deep-research
The `knowledge_lookup` node in `graph.yaml` is a `rag` node that runs
hybrid (vector + keyword) retrieval over every file in this directory.
Drop your own notes, papers (PDFs), Markdown docs, or text files here
and they will be indexed into a per-agent knowledge base on first run.
Loki supports common file types out of the box: `.md`, `.txt`, `.pdf`,
`.html`, and others. Subdirectories are walked recursively.
A small starter file (`research-style-notes.md`) ships so the RAG
node has something non-empty to retrieve against on a clean install.
Replace or extend it with your own materials to bias the research
phase toward your local context.
To force the knowledge base to rebuild after you add or change files,
delete the cached index:
```sh
rm ~/.config/loki/agents/deep-research/knowledge_lookup.yaml
```
The next run will rebuild from the current contents of this directory.
@@ -0,0 +1,49 @@
# Research style notes
These are general principles the `deep-research` agent should keep in
mind regardless of topic. Replace this file with your own notes if you
want to bias retrieval toward your local context.
## What "good research" means here
- **Every factual claim cites a source you actually retrieved.** Never
fabricate URLs, page titles, authors, or DOIs.
- **Primary sources beat aggregators.** Prefer the original paper, the
RFC, the standards body, or the manufacturer over a blog summarizing
them.
- **Corroboration matters where stakes are high.** If a single source
makes a strong claim, look for a second independent source before
taking it as established.
- **Disagreement is information, not noise.** If two credible sources
disagree, report the disagreement and the reasoning on each side.
- **Old does not mean wrong.** A 2014 RFC is still authoritative if no
newer one has obsoleted it; check before assuming a source is stale.
## Source-tier heuristics
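The `vet_sources` node uses these rough tiers to weigh credibility.
The custom tool `classify_source` (see `tools.sh`) approximates them
deterministically from the hostname / TLD alone: it reports organization
sites as `MEDIUM`, adds a `TERTIARY` tier for Wikipedia, and cannot
recognize journals or standards bodies that live on generic domains.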
- **HIGH:** government domains (`.gov`, `.mil`), academic institutions
(`.edu`, university subdomains), peer-reviewed journals, standards
bodies (IETF/RFCs, W3C, ISO, IEEE, NIST), and primary documents from
the entities being researched (e.g. a vendor's official spec page).
- **PREPRINT:** arXiv, bioRxiv, medRxiv, SSRN. Useful but not yet
peer-reviewed; treat numeric claims with extra caution.
- **ORGANIZATION:** established nonprofits, standards-adjacent groups,
industry consortia. Reliable for their stated mission but may have a
perspective.
- **UNVERIFIED:** general web pages, blogs, news aggregators, social
media. Useful for leads but should not be the only source for a
factual claim.
## Common pitfalls to flag in critique
- A claim cited only to a PREPRINT or UNVERIFIED source on a numeric
or contested point.
- A research-plan question that the findings address only obliquely.
- "Findings" that paraphrase a single source three times rather than
triangulating.
- Citation collisions where two sources are listed but turn out to
be the same study reported via different aggregators.
@@ -0,0 +1,18 @@
#!/usr/bin/env python3
"""Fan-out source for context loading.

Has no logic of its own. Exists so the static `next: [plan, knowledge_lookup]`
list on this node fans out into two parallel branches (the LLM planner and
the RAG knowledge lookup) as a single super-step. The validator requires
declared parallel-branch script outputs, so we emit an empty JSON object
explicitly here.
"""
import json


def main():
    print(json.dumps({}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
"""Join the per-question map outputs into a single `findings` string.

The `research_each_question` map writes `question_findings` (an array,
one entry per sub-question, in input order). Downstream nodes
(`vet_sources`, `critique`, `synthesize`) read `{{findings}}` as a
single block, so this script renders the array as a Markdown document
with one section per question.
"""
import json
import os


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def main():
    state = load_state()
    questions = state.get("questions") or []
    per_question = state.get("question_findings") or []
    sections = []
    for idx, q in enumerate(questions):
        body = per_question[idx] if idx < len(per_question) else ""
        if isinstance(body, (dict, list)):
            body = json.dumps(body, indent=2)
        sections.append(f"## {q}\n\n{body}")
    findings = "\n\n".join(sections) if sections else "No findings gathered."
    print(json.dumps({"findings": findings}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
"""Fold a reviewer's free-form feedback back into the research loop.

Runs when the user answers the approval step with their own text
instead of "accept" or "reject". That text (saved by the approval node
as `decision`) becomes `research_feedback`, and the graph loops back to
`research_each_question` for another informed pass (each sub-question is
re-researched in parallel with the new feedback in context). The
reflexion counter is reset so the user-driven pass gets a fresh revision
budget.

Routing (`_next`): always research_each_question.
"""
import json
import os


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def main():
    state = load_state()
    feedback = (state.get("decision") or "").strip()
    output = {
        "_next": "research_each_question",
        "research_attempts": 0,
        "research_feedback": (
            "The user reviewed the report and asked for changes. Treat "
            "this as the top priority for the next pass:\n\n" + feedback
        ),
    }
    print(json.dumps(output))


if __name__ == "__main__":
    main()
@@ -0,0 +1,35 @@
#!/usr/bin/env python3
"""Entry router for deep-research.

Reads the caller's prompt from state. If it contains a usable research
topic, stores it as `topic` and falls through to the static `next`
(bootstrap_research). If the prompt is empty, routes to `ask_topic` so
the user can supply one interactively.

Routing (`_next`):
  - prompt present -> (no _next; static next: bootstrap_research)
  - prompt empty   -> ask_topic
"""
import json
import os


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def main():
    state = load_state()
    prompt = (state.get("initial_prompt") or "").strip()
    if prompt:
        print(json.dumps({"topic": prompt}))
    else:
        print(json.dumps({"_next": "ask_topic"}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,76 @@
#!/usr/bin/env python3
"""Reflexion gate for deep-research.

Runs after `critique` has reviewed the current research findings. If the
critique's verdict is REVISE and the reflexion budget is not spent,
loops back to `research_each_question` with the critique attached as
`research_feedback`, so the retry is informed rather than a blind
re-run. Otherwise it proceeds to `synthesize`.

Routing (`_next`):
  - verdict PASS                     -> synthesize
  - verdict REVISE, budget remaining -> research_each_question (+ research_feedback)
  - verdict REVISE, budget spent     -> synthesize

Reflexion is a best-effort quality booster, not a hard gate: once the
budget is spent the workflow proceeds anyway, and the human approval
step is the final backstop.
"""
import json
import os
import re

# Automated revision passes allowed. `research_each_question` runs at most
# MAX_REFLEXION_REVISIONS + 1 times per user pass. Bump to allow more.
MAX_REFLEXION_REVISIONS = 2


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def as_int(value, default=0):
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def parse_verdict(critique):
    """Pull PASS/REVISE from the critique's `VERDICT:` line. Defaults to
    PASS when no verdict line is found, so a malformed critique lets the
    workflow proceed instead of burning the whole revision budget."""
    match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE)
    if not match:
        return "PASS"
    return match.group(1).upper()


def main():
    state = load_state()
    critique = state.get("critique") or ""
    verdict = parse_verdict(critique)
    attempts = as_int(state.get("research_attempts"))
    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        feedback = (
            "A reviewer judged the previous research pass incomplete. "
            "Address every point in the critique below:\n\n" + critique
        )
        output = {
            "_next": "research_each_question",
            "research_attempts": attempts + 1,
            "research_feedback": feedback,
        }
    else:
        output = {"_next": "synthesize"}
    print(json.dumps(output))


if __name__ == "__main__":
    main()
@@ -0,0 +1,69 @@
#!/usr/bin/env python3
"""Check that the sources cited in the research report are reachable.

Scans the final report for URLs and DOIs, probes each with a HEAD
request, and writes a `source_check` summary into state so the human
reviewer sees broken citations at the approval step.

Times out per request so a slow source cannot stall the graph.
"""
import json
import os
import re
import urllib.error
import urllib.request

DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def reachable(url, timeout=5.0):
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as e:
        return 200 <= e.code < 400
    except Exception:
        return False


def main():
    state = load_state()
    report = state.get("report") or ""
    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
    # The DOI pattern accepts '.', ',' and ';', so trim sentence punctuation
    # that trails a DOI at the end of a clause.
    dois = sorted({d.rstrip(".,;") for d in DOI_RE.findall(report)})
    results = []
    for url in urls:
        ok = reachable(url)
        results.append(f" {'OK' if ok else 'UNREACHABLE'} {url}")
    for doi in dois:
        url = f"https://doi.org/{doi}"
        if url in urls:
            continue  # already probed as a plain URL
        ok = reachable(url)
        results.append(f" {'OK' if ok else 'UNREACHABLE'} DOI {doi} ({url})")
    if not results:
        summary = "No web sources were cited in the report."
    else:
        summary = (
            f"Source reachability ({len(results)} checked):\n"
            + "\n".join(results)
        )
    print(json.dumps({"source_check": summary}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
set -e

# @env LLM_OUTPUT=/dev/stdout The output path

# @cmd Classify the credibility tier of a web source from its URL.
# A deterministic check based on the host and top-level domain. Use it
# to weigh how much trust to place in a source before relying on it.
# @option --url! The full source URL to classify
classify_source() {
    # shellcheck disable=SC2154
    local url="$argc_url"
    local host="${url#*://}"    # drop the scheme
    host="${host%%/*}"          # drop the path
    host="${host##*@}"          # drop any userinfo
    host="${host%%:*}"          # drop the port
    host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"
    local tier
    # First matching pattern wins, so specific hosts precede the *.org catch-all.
    case "$host" in
        '')
            tier="UNKNOWN - no host could be parsed from the URL" ;;
        *.gov | *.gov.* | *.mil)
            tier="HIGH - government source" ;;
        *.edu | *.edu.* | *.ac.*)
            tier="HIGH - academic institution" ;;
        arxiv.org | *.arxiv.org | biorxiv.org | *.biorxiv.org | medrxiv.org | *.medrxiv.org | ssrn.com | *.ssrn.com)
            tier="PREPRINT - not yet peer reviewed, corroborate before citing" ;;
        wikipedia.org | *.wikipedia.org)
            tier="TERTIARY - encyclopedia, good for orientation not citation" ;;
        *.org | *.org.*)
            tier="MEDIUM - organization site, check for institutional bias" ;;
        *)
            tier="UNVERIFIED - general web source, corroborate before citing" ;;
    esac
    printf '%s: %s\n' "${host:-<none>}" "$tier" >> "$LLM_OUTPUT"
}
@@ -0,0 +1,46 @@
# report-writer
A tiny, focused sub-agent that turns a set of research findings into a
single coherent final report. Reads only what it is given — does not
do independent research, does not access the web, does not invent
facts. It exists as a focused tool for orchestrating agents to
delegate the writing phase to.
## Why a separate agent?
This is an example of the **agent-as-tool** pattern in graph agents.
The `deep-research` graph agent's `synthesize` node is an `agent` node
that spawns this one (see `assets/agents/deep-research/graph.yaml`).
Separating the role has two practical benefits:
- The orchestrating agent can use a cheap model (or a high-temperature
exploratory one) for the research phase, while letting the writing
phase use a different (typically lower-temperature, possibly larger)
model dedicated to coherent prose.
- The writing prompt is owned by this agent's `config.yaml` rather
than buried inside another agent's graph. You can polish it
independently without touching the research flow.
## Standalone use
You can also use this agent directly if you have a set of findings you
want polished:
```sh
loki -a report-writer "Topic: X. Findings: <paste findings here>"
```
It will produce a single Markdown report following the rules in its
system prompt: executive summary at the top, grouped sections by
related sub-questions, every inline citation preserved verbatim, and a
final "Open questions / disagreements" section.
## What it will NOT do
- Search the web, fetch URLs, query an MCP server, or use any tool.
It has no tools configured.
- Invent facts beyond what is in the findings you give it.
- Strip or rewrite citations.
These constraints are the point of the agent existing: a writer that
the orchestrator can trust to stay in its lane.
@@ -0,0 +1,34 @@
name: report-writer
description: Polishes research findings into a clear, citation-preserving final report
version: 1.0.0
temperature: 0.2
instructions: |
  You are a technical writer. You will be given:
    - a research topic
    - a set of findings, organized per sub-question, with inline
      citations next to each claim
    - a source-credibility assessment of the cited sources

  Your job is to produce a single, well-organized final report:

  Rules:
    - Use ONLY the findings provided. Do not introduce facts from
      your own memory. Do not speculate beyond what the findings
      support.
    - Preserve every inline citation. If a sentence in the findings
      had a URL or DOI, the equivalent sentence in your report must
      keep the same citation.
    - Lead with a 2-3 sentence executive summary at the top.
    - Organize the body so that related sub-questions are grouped,
      not strictly one section per question. The findings are raw
      material; the report should read as a single coherent answer
      to the original topic.
    - End with a short "Open questions / disagreements" section
      naming anything the findings flagged as unresolved or
      contested.

  Output plain Markdown. No metadata, no JSON wrapper.
conversation_starters:
  - "Polish these findings into a cited report"