feat: created new graph-based deep-research agent
@@ -0,0 +1,274 @@
# deep-research

A deep web research agent, built as a Loki graph agent. It plans an
investigation, decomposes it into sub-questions researched in
parallel, grounds the work in a local knowledge corpus, vets the
credibility of cited sources, runs a reflexion self-critique loop to
revise weak findings, delegates the final write-up to a focused
sub-agent, checks that the cited sources are reachable, and gates the
result behind human approval.

Unlike a regular agent (which takes a goal and improvises the steps),
this agent runs a fixed graph: every request goes through the same
`plan -> parallel research -> vet -> critique -> synthesize -> verify -> approve`
pipeline.

This agent is also the **canonical reference for the Loki graph
system**: it exercises every node type (`script`, `llm`, `rag`, `map`,
`agent`, `input`, `approval`, `end`) and both static fan-out and
dynamic `map` fan-out. If you are learning how to build a graph
agent, this is the file to read alongside the
[Graph-Agents wiki](https://github.com/Dark-Alex-17/loki/wiki/Graph-Agents).

## Workflow

17 nodes. `->` is the static route; a script node can also route
dynamically via `_next`. The `▶▶` line marks a parallel super-step,
whose branches run concurrently:

```
parse_request          (script)               -> bootstrap_research (or -> ask_topic if no topic)
ask_topic              (input)                -> bootstrap_research
bootstrap_research     (script)               -> [plan, knowledge_lookup]  ▶▶ parallel
plan                   (llm + output_schema)  -> research_each_question
knowledge_lookup       (rag)                  -> research_each_question
research_each_question (map)                  -> combine_findings (spawns one branch per question)
  └─ research_one_question (llm)                 (atomic; runs N×, joins at map)
combine_findings       (script)               -> vet_sources
vet_sources            (llm + custom tool)    -> critique
critique               (llm)                  -> reflexion_gate
reflexion_gate         (script)               -> synthesize (or -> research_each_question: reflexion loop)
synthesize             (agent: report-writer) -> verify_sources
verify_sources         (script)               -> approve
approve                (approval)             -> end_accepted ("accept")
                                              -> end_rejected ("reject")
                                              -> incorporate_feedback (any free-form answer)
incorporate_feedback   (script)               -> research_each_question (the human-feedback loop)
```

### Node-type breakdown

| Type | Nodes |
|---|---|
| `script` (Python) | `parse_request`, `bootstrap_research`, `combine_findings`, `reflexion_gate`, `verify_sources`, `incorporate_feedback` |
| `llm` (tools: `[]`) | `plan`, `critique` |
| `llm` (with tool whitelist) | `research_one_question`, `vet_sources` |
| `rag` | `knowledge_lookup` - local corpus retrieval |
| `map` | `research_each_question` - dynamic fan-out per sub-question |
| `agent` | `synthesize` - spawns the `report-writer` sub-agent |
| `input` | `ask_topic` |
| `approval` | `approve` |
| `end` | `end_accepted`, `end_rejected` |

## Parallel execution

The graph has two parallel super-steps where Loki's BSP
(bulk-synchronous parallel) scheduler runs branches concurrently.

**1. Context loading (`plan` ‖ `knowledge_lookup`)** - after
`bootstrap_research`, the LLM planner (which decomposes the topic into
sub-questions) and the RAG retrieval over the local `knowledge/`
corpus run side by side. They write disjoint state keys (`plan` writes
`research_plan` and `questions`; `knowledge_lookup` writes
`local_context` and `local_sources`) so no reducer is needed.
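
To illustrate why disjoint keys need no reducer, here is a minimal sketch of a super-step merge. It is not Loki's scheduler: `merge_branch_updates` and the sample values are hypothetical, and only the state key names come from this graph.

```python
def merge_branch_updates(state, branch_updates):
    """Merge the state updates of branches that ran in the same super-step.

    Hypothetical helper: raises if two branches write the same key, which is
    the situation where a reducer would be needed to combine the values.
    """
    merged = dict(state)
    written_by = {}
    for branch, updates in branch_updates.items():
        for key, value in updates.items():
            if key in written_by:
                raise ValueError(
                    f"{key!r} written by both {written_by[key]!r} and {branch!r}"
                )
            written_by[key] = branch
            merged[key] = value
    return merged


# Sample values are made up; the key names match the two nodes described above.
state = {"topic": "HTTP/3 vs HTTP/2", "local_context": "", "local_sources": ""}
branch_updates = {
    "plan": {"research_plan": "short narrative", "questions": ["q1", "q2", "q3"]},
    "knowledge_lookup": {
        "local_context": "style notes",
        "local_sources": "research-style-notes.md",
    },
}
merged = merge_branch_updates(state, branch_updates)
print(sorted(merged))
```

Because each key has exactly one writer, the order in which the two branches finish does not affect the merged state.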

**2. Per-question research (`research_each_question` map)** - the
plan emits a `questions` array (its instructions ask for 3-5 entries;
its `output_schema` accepts 1-6). The `map` node spawns one parallel
branch per question (`max_concurrency: 3`). Each branch is an isolated
`research_one_question` LLM invocation with web tools, instructed to
investigate exactly its assigned question. Outputs collect into
`question_findings` in input order, then `combine_findings` joins
them into a single `findings` Markdown document for downstream nodes.
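
The fan-out described above (a bounded number of concurrent branches, results collected in input order) can be sketched with the standard library. This is an illustration only, not how Loki implements `map`; `run_map` and the stand-in branch function are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def research_one_question(question):
    # Stand-in for the LLM branch; a real branch would call web tools.
    return f"findings for: {question}"


def run_map(items, branch, max_concurrency=3):
    """Run `branch` once per item with at most `max_concurrency` in flight.

    Executor.map yields results in the order of `items`, regardless of
    which branch finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(branch, items))


questions = ["What is QUIC?", "How does HTTP/3 handle loss?", "Who deploys HTTP/3?"]
question_findings = run_map(questions, research_one_question, max_concurrency=3)
print(len(question_findings))
```

Preserving input order is what lets a later join step pair `questions[i]` with `question_findings[i]` by index.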

`settings.max_concurrency: 4` is the graph-wide cap; the per-`map`
override (`max_concurrency: 3` on `research_each_question`) is
deliberately lower, to stay gentle on rate-limited search providers.

## Local knowledge corpus

`knowledge_lookup` is a `rag` node: it runs hybrid (vector + keyword)
retrieval over every file in `knowledge/`. The directory ships with a
small `research-style-notes.md` so the RAG node has something to
retrieve against on a clean install; drop your own Markdown notes,
PDFs, or text files into `knowledge/` to bias the research toward
your local context.

The knowledge base is built once, at agent-load time, into
`~/.config/loki/agents/deep-research/knowledge_lookup.yaml`. Because
the node fully specifies its build config (`embedding_model`,
`chunk_size`, `chunk_overlap`), the build is non-interactive. Delete
that cached file after adding or changing knowledge to force a
rebuild.

## Sub-agent: report-writer

The `synthesize` node is an `agent` node that spawns the
`report-writer` sub-agent (`assets/agents/report-writer/`). This is
the agent-as-tool pattern: the orchestrating graph delegates the
writing phase to a focused sub-agent dedicated to coherent prose,
while the research phase uses different (typically cheaper) LLM nodes
to investigate many questions quickly.

The `report-writer` sub-agent has no tools: it cannot search or access
the web, so it has nothing to draw on except the findings it is given.
It turns those findings into a final Markdown report that preserves
every inline citation. See `assets/agents/report-writer/README.md`
for details.

## Tools and tool scoping

This agent demonstrates Loki's three tool sources and how an `llm`
node's `tools:` whitelist scopes them per node.

The agent's full tool universe, declared in `graph.yaml`:

- **Global tools** (`global_tools`): `web_search_loki`,
  `fetch_url_via_curl`, `search_arxiv` - Loki's built-in tool scripts.
- **MCP server** (`mcp_servers`): `ddg-search` - a DuckDuckGo web
  search MCP server. Referenced in a whitelist as `mcp:ddg-search`.
- **Custom agent tool** (`tools.sh`): `classify_source` - a
  deterministic source-credibility classifier shipped with this agent.

No node receives all of these. Each `llm` node's `tools:` whitelist
narrows the universe to exactly what that step needs:

| Node | `tools:` whitelist | Draws from |
|---|---|---|
| `plan`, `critique` | `[]` | nothing - pure reasoning |
| `research_one_question` | `web_search_loki`, `fetch_url_via_curl`, `search_arxiv`, `mcp:ddg-search` | global tools + MCP |
| `vet_sources` | `classify_source` | the custom tool only |

`research_one_question` (each parallel branch of the map) can search
and fetch but cannot classify sources; `vet_sources` can classify
sources but cannot touch the web. That separation is the point of the
`tools:` whitelist: a node gets only the tools its job calls for,
never the agent's full set.
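
The scoping rule can be pictured as a lookup that hands each node only its whitelisted subset. The sketch below is illustrative and is not Loki's resolver: `tools_for` is a hypothetical helper, and the tool and node names are copied from the table above.

```python
# The agent's full tool universe: global tools, the MCP server, the custom tool.
TOOL_UNIVERSE = {
    "web_search_loki",
    "fetch_url_via_curl",
    "search_arxiv",
    "mcp:ddg-search",
    "classify_source",
}

# Per-node whitelists, as listed in the table above.
WHITELISTS = {
    "plan": [],
    "critique": [],
    "research_one_question": [
        "web_search_loki",
        "fetch_url_via_curl",
        "search_arxiv",
        "mcp:ddg-search",
    ],
    "vet_sources": ["classify_source"],
}


def tools_for(node):
    """Return the tools a node may call (hypothetical helper).

    Rejects whitelist entries that are not part of the declared universe.
    """
    allowed = WHITELISTS[node]
    unknown = [tool for tool in allowed if tool not in TOOL_UNIVERSE]
    if unknown:
        raise ValueError(f"{node}: unknown tools {unknown}")
    return list(allowed)


print(tools_for("vet_sources"))
```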

The `classify_source` custom tool (`tools.sh`) takes a URL and returns
a credibility tier (government, academic, preprint, organization,
unverified) derived from the host and top-level domain. It is
deterministic - exactly the kind of logic a tool should own rather
than leaving the LLM to guess.
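
For intuition, a host-and-TLD classifier of this kind might look like the Python sketch below. It is not the shipped `tools.sh`: the tier names come from this README, but the matching rules (which suffixes and hosts map to which tier) are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed preprint hosts; the real tool may use a different list.
PREPRINT_HOSTS = {"arxiv.org", "biorxiv.org", "medrxiv.org", "ssrn.com"}


def classify_source(url):
    """Map a URL to a credibility tier using only its hostname (sketch)."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        return "unverified"
    if host.endswith(".gov") or host.endswith(".mil"):
        return "government"
    if host.endswith(".edu"):
        return "academic"
    # Checked before the generic .org rule, since arxiv.org ends in .org.
    if host in PREPRINT_HOSTS or any(host.endswith("." + h) for h in PREPRINT_HOSTS):
        return "preprint"
    if host.endswith(".org"):
        return "organization"
    return "unverified"


print(classify_source("https://arxiv.org/abs/1706.03762"))
```

The same URL always yields the same tier, which is what makes the result safe to feed into the critique step.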

Web search may require API-key configuration; see the
[Tools](https://github.com/Dark-Alex-17/loki/wiki/Tools) docs.
`fetch_url_via_curl`, `search_arxiv`, and `classify_source` work
without a key.

## Setup

`research_one_question` (each parallel branch of the `map`) uses the
`ddg-search` MCP server via `mcp:ddg-search`. It is one of Loki's
default MCP servers; make sure it is registered in
`~/.config/loki/mcp.json` (run `loki --install mcp_config` to restore
the default template if it is missing). If `ddg-search` is unavailable,
the branches still have their global web-search tools to fall back on.

The `synthesize` node spawns the `report-writer` sub-agent. Both
agents ship with `loki agents install`; if you install one manually,
install both so the agent reference resolves.

## Reflexion

The agent has two loops, both built with script nodes that route via
`_next`. The engine allows back-edges at runtime; the validator only
rejects cycles built from static `next` / `routes` edges, so script
`_next` loops are always allowed.
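
To make the distinction concrete, here is a sketch of a cycle check over static edges only. It is not Loki's validator: `has_static_cycle` is a hypothetical helper, and the edge table is transcribed from the `next`, `routes`, and `on_other` entries in `graph.yaml`. The dynamic `_next` targets are deliberately left out.

```python
def has_static_cycle(edges):
    """Depth-first search for a cycle in a dict of node -> static successors."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for successor in edges.get(node, []):
            state = color.get(successor, WHITE)
            if state == GRAY:  # edge back into the current path
                return True
            if state == WHITE and visit(successor):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


STATIC_EDGES = {
    "parse_request": ["bootstrap_research"],
    "ask_topic": ["bootstrap_research"],
    "bootstrap_research": ["plan", "knowledge_lookup"],
    "plan": ["research_each_question"],
    "knowledge_lookup": ["research_each_question"],
    "research_each_question": ["combine_findings"],
    "combine_findings": ["vet_sources"],
    "vet_sources": ["critique"],
    "critique": ["reflexion_gate"],
    "reflexion_gate": ["synthesize"],
    "synthesize": ["verify_sources"],
    "verify_sources": ["approve"],
    "approve": ["end_accepted", "end_rejected", "incorporate_feedback"],
    "incorporate_feedback": [],  # loops back only via _next at runtime
}

print(has_static_cycle(STATIC_EDGES))
```

Declaring the feedback loop as a static `next` on `incorporate_feedback` would make this check report a cycle, which is why both loops are routed from scripts instead.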

**Automated reflexion loop.** After the parallel research map and
`vet_sources`, the `critique` node reviews the merged findings
against the research plan and the source credibility assessment, and
emits `VERDICT: PASS` or `VERDICT: REVISE` with specific feedback.
`reflexion_gate.py` then:

- `PASS` -> continue to `synthesize`.
- `REVISE`, budget remaining -> loop back to `research_each_question`,
  with the critique injected as `research_feedback` so every parallel
  branch sees it on the retry.
- `REVISE`, budget spent -> continue to `synthesize` anyway (the human
  approval step is the final backstop).

The budget is `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`
(default 2, so the research map runs at most 3 times per pass).
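
The "at most 3 times" figure follows from the gate's counter. The small simulation below mirrors the gate's branching under stated assumptions: `gate` and `count_research_runs` are hypothetical helpers written for this example, and the verdict sequences are invented.

```python
MAX_REFLEXION_REVISIONS = 2  # same default as reflexion_gate.py


def gate(verdict, attempts):
    """Return (next_node, attempts) using the same rule as the gate script."""
    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        return "research_each_question", attempts + 1
    return "synthesize", attempts


def count_research_runs(verdicts):
    """Count how often the research map runs for a sequence of critique verdicts."""
    attempts = 0
    runs = 0
    for verdict in verdicts:
        runs += 1  # one research pass produced the findings being critiqued
        next_node, attempts = gate(verdict, attempts)
        if next_node == "synthesize":
            break
    return runs


# Worst case: the critique never passes.
print(count_research_runs(["REVISE"] * 10))
```

With the default budget, an always-`REVISE` critique yields one initial pass plus two revisions before the gate proceeds to `synthesize`.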

**Human-feedback loop.** At `approve` the user answers `accept`,
`reject`, or types their own feedback. A free-form answer routes via
the approval node's `on_other` to `incorporate_feedback.py`, which
folds that text into `research_feedback` and loops back to
`research_each_question` for another parallel pass.

`settings.max_loop_iterations` (40) is the engine's infinite-loop
backstop: it caps the total visits to any single node.

## Running

```sh
loki agents install                                              # ships deep-research
loki -a deep-research "How does HTTP/3 differ from HTTP/2?"
loki -a deep-research "Recent advances in solid-state batteries"
loki -a deep-research                                            # no prompt -> triggers ask_topic
```

## Anti-hallucination

- `research_one_question` (each map branch) is instructed to back
  every claim with a real retrieved source and never to fabricate
  URLs, titles, or DOIs.
- `vet_sources` classifies every cited source so weak sources are
  visible to the critique step.
- `critique` independently reviews the merged findings and sends weak
  or uncited work back for another parallel research pass.
- `synthesize` (the `report-writer` sub-agent) is grounded: it may use
  only the gathered findings and must keep each claim's inline source.
  It has no tools and cannot browse the web.
- `verify_sources` probes every cited URL / DOI with an HTTP HEAD
  request and reports which are unreachable, so the human reviewer
  sees broken citations before approving.
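
As a rough picture of the reachability check in the last bullet, the sketch below extracts URLs and bare DOIs from a report and probes each with a HEAD request. It is not the shipped `verify_sources.py`: the regular expressions, the `https://doi.org/` resolution of bare DOIs, and both function names are assumptions for illustration.

```python
import re
import urllib.request

# Assumed patterns; a production check may need stricter parsing.
URL_RE = re.compile(r"https?://[^\s)>\]]+")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s)>\]]+")


def extract_targets(text):
    """Collect cited URLs, plus bare DOIs rewritten as doi.org URLs."""
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)]
    # Blank out URLs first so a DOI inside a URL is not counted twice.
    remainder = URL_RE.sub(" ", text)
    dois = [d.rstrip(".,;:") for d in DOI_RE.findall(remainder)]
    targets = urls + [f"https://doi.org/{d}" for d in dois]
    return list(dict.fromkeys(targets))  # de-duplicate, keep first-seen order


def is_reachable(url, timeout=10):
    """Send an HTTP HEAD request; treat errors and 4xx/5xx as unreachable."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "source-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):  # URLError and HTTPError are OSError subclasses
        return False


report = (
    "HTTP/3 runs over QUIC (https://www.rfc-editor.org/rfc/rfc9114). "
    "See also doi 10.1000/xyz123, and https://example.com/a."
)
print(extract_targets(report))
```

Some servers reject HEAD requests even for pages that exist, so a failed probe is a prompt for the reviewer to look, not proof that a citation is fabricated.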

## Customizing

- **Loop budget.** `MAX_REFLEXION_REVISIONS` in `reflexion_gate.py`.
- **Map concurrency.** The `research_each_question` node's
  `max_concurrency: 3` caps simultaneous web-research branches.
  Raise it to investigate more questions in parallel; lower it to be
  gentle on rate-limited providers.
- **Per-node model.** Add `model: anthropic:...` to any `llm` node.
  Cheap models work well for `plan` / `critique` / `vet_sources`; the
  heavy intelligence is needed in `research_one_question` and the
  `report-writer` sub-agent.
- **Tool scope.** Narrow the `research_one_question` node's `tools:`
  list to constrain where each branch looks (for example, drop
  `web_search_loki` and `mcp:ddg-search` to force arXiv-only
  research).
- **Local knowledge.** Drop files into `knowledge/` to bias every
  research branch toward your local context (see the *Local
  knowledge corpus* section above).
- **Different writer.** Replace `agent: report-writer` on the
  `synthesize` node with the name of any other agent. The
  orchestrator does not care what kind of agent the writer is.
- **Skip approval.** Point both `approve` routes at `end_accepted`,
  or wire `verify_sources` straight to an `end` node.

## Files

```
assets/agents/deep-research/
  graph.yaml                  - agent config + 17-node workflow
  tools.sh                    - classify_source custom tool
  README.md                   - this file
  knowledge/
    README.md                 - corpus-format notes
    research-style-notes.md   - starter knowledge file (replace with your notes)
  scripts/
    parse_request.py          - static next: bootstrap_research; _next: ask_topic if no topic
    bootstrap_research.py     - fan-out source: next [plan, knowledge_lookup]
    combine_findings.py       - joins map output (question_findings) into findings
    reflexion_gate.py         - _next: research_each_question (revise) or synthesize
    verify_sources.py         - HTTP HEAD on cited URLs / DOIs
    incorporate_feedback.py   - _next: research_each_question, with user feedback
```

See also `assets/agents/report-writer/`, the sub-agent the
`synthesize` node spawns.
@@ -0,0 +1,294 @@
name: deep-research
description: |
  Deep web research workflow. Plans an investigation, decomposes it
  into sub-questions researched in parallel, grounds the work in a
  local knowledge corpus, vets the credibility of cited sources, runs
  a reflexion self-critique loop to revise weak or incomplete findings,
  delegates the final write-up to a focused sub-agent, checks that the
  cited sources are reachable, and gates the result behind human
  approval. A reviewer's free-form feedback at the approval step feeds
  back into another research pass.

  This is the canonical Loki graph-agent reference: it exercises every
  node type (script, llm, rag, map, agent, input, approval, end) and
  both static fan-out and dynamic map fan-out.

version: "1.0"

temperature: 0.0

global_tools:
  - web_search_loki.sh
  - fetch_url_via_curl.sh
  - search_arxiv.sh

mcp_servers:
  - ddg-search

conversation_starters:
  - "How does HTTP/3 differ from HTTP/2?"
  - "Summarize recent advances in solid-state battery chemistry"

settings:
  max_loop_iterations: 40
  log_state_snapshots: false
  validate_before_run: true
  max_concurrency: 4

initial_state:
  research_feedback: ""
  research_attempts: 0
  local_context: ""
  local_sources: ""

start: parse_request

nodes:

  parse_request:
    id: parse_request
    type: script
    script: scripts/parse_request.py
    next: bootstrap_research

  ask_topic:
    id: ask_topic
    type: input
    question: "What would you like me to research?"
    validation: "len(input) > 0"
    state_updates:
      topic: "{{input}}"
    next: bootstrap_research

  bootstrap_research:
    id: bootstrap_research
    type: script
    script: scripts/bootstrap_research.py
    next: [plan, knowledge_lookup]

  plan:
    id: plan
    type: llm
    instructions: |
      You are a research planner. Given a topic, produce a focused
      research plan and decompose it into 3-5 specific sub-questions
      that can each be researched independently in parallel.

      The plan is a short narrative naming the key questions and the
      kinds of sources that would be authoritative. The sub-questions
      are precise, self-contained queries (each one is sent on its own
      to a separate research worker, so they must be answerable
      without each other's context).
    prompt: "Research topic: {{topic}}"
    tools: []
    output_schema:
      type: object
      properties:
        research_plan:
          type: string
          description: A short plan narrative.
        questions:
          type: array
          items: { type: string }
          minItems: 1
          maxItems: 6
          description: 3-5 specific, self-contained sub-questions.
      required: [research_plan, questions]
    next: research_each_question

  knowledge_lookup:
    id: knowledge_lookup
    type: rag
    documents:
      - ./knowledge/
    query: "{{topic}}"
    top_k: 6
    embedding_model: openai:text-embedding-3-small
    chunk_size: 1000
    chunk_overlap: 100
    state_updates:
      local_context: "{{output.context}}"
      local_sources: "{{output.sources}}"
    next: research_each_question

  research_each_question:
    id: research_each_question
    type: map
    over: "{{questions}}"
    as: question
    branch: research_one_question
    collect_into: question_findings
    max_concurrency: 3
    next: combine_findings

  research_one_question:
    id: research_one_question
    type: llm
    instructions: |
      You are a web research assistant. Investigate the SINGLE question
      given to you using your tools: search the web, fetch and read
      pages, and search arXiv for academic sources.

      Rules:
      - Every factual claim must be backed by a real source you
        actually retrieved. Never fabricate URLs, page titles,
        authors, or DOIs.
      - Prefer primary and authoritative sources over aggregators.
      - Where sources disagree, report the disagreement rather than
        papering over it.
      - Put the URL (or DOI) inline next to each claim it supports.

      Return organized findings in plain text. Do not include
      meta-commentary about the process.
    prompt: |
      Research question: {{question}}

      Local context that may help:
      {{local_context}}

      {{research_feedback}}
    tools:
      - web_search_loki
      - fetch_url_via_curl
      - search_arxiv
      - mcp:ddg-search
    max_iterations: 10
    max_attempts: 2
    temperature: 0.1

  combine_findings:
    id: combine_findings
    type: script
    script: scripts/combine_findings.py
    next: vet_sources

  vet_sources:
    id: vet_sources
    type: llm
    instructions: |
      You assess the credibility of the sources cited in a set of
      research findings. For every distinct source URL in the findings,
      call the `classify_source` tool to get its credibility tier. Then
      summarize: which claims rest on HIGH-credibility sources, and
      which rest on PREPRINT or UNVERIFIED sources and so need
      corroboration. Do NOT do any new research -- assess only what is
      already cited.
    prompt: |
      Findings to assess:
      {{findings}}
    tools:
      - classify_source
    max_iterations: 15
    state_updates:
      source_assessment: "{{output}}"
    next: critique

  critique:
    id: critique
    type: llm
    instructions: |
      You are a meticulous research reviewer. Judge whether the
      findings below are good enough to synthesize a complete,
      well-supported report that answers the research plan.

      Mark the findings REVISE if ANY of these hold:
      - A research-plan question is unanswered or only weakly
        addressed.
      - A factual claim has no source, or cites a source that looks
        fabricated.
      - The findings lean on a single source where corroboration is
        needed.
      - A key claim rests only on a PREPRINT or UNVERIFIED source,
        per the source credibility assessment below.
      - An obvious counter-perspective or recent development is
        missing.
      Otherwise mark them PASS.

      Respond in EXACTLY this format, nothing else:

      VERDICT: <PASS or REVISE>
      FEEDBACK: <if REVISE, be specific and actionable -- name the gaps
      and what kind of source would close them; if PASS, write "none">
    prompt: |
      Research plan:
      {{research_plan}}

      Findings under review:
      {{findings}}

      Source credibility assessment:
      {{source_assessment}}
    tools: []
    state_updates:
      critique: "{{output}}"
    next: reflexion_gate

  reflexion_gate:
    id: reflexion_gate
    type: script
    script: scripts/reflexion_gate.py
    next: synthesize

  synthesize:
    id: synthesize
    type: agent
    agent: report-writer
    prompt: |
      Research topic: {{topic}}

      Findings (organized by sub-question, with inline citations):
      {{findings}}

      Source credibility assessment:
      {{source_assessment}}

      Produce the final report following your instructions.
    timeout: 300
    state_updates:
      report: "{{output}}"
    next: verify_sources

  verify_sources:
    id: verify_sources
    type: script
    script: scripts/verify_sources.py
    next: approve

  approve:
    id: approve
    type: approval
    question: |
      Research report on: {{topic}}

      {{report}}

      ----
      {{source_check}}
      ----

      Accept this report? Pick "accept" or "reject", or type specific
      feedback to send the research back for another pass.
    options:
      - "accept"
      - "reject"
    routes:
      "accept": end_accepted
      "reject": end_rejected
    on_other: incorporate_feedback
    state_updates:
      decision: "{{choice}}"

  incorporate_feedback:
    id: incorporate_feedback
    type: script
    script: scripts/incorporate_feedback.py

  end_accepted:
    id: end_accepted
    type: end
    output: "{{report}}"

  end_rejected:
    id: end_rejected
    type: end
    output: "Research on '{{topic}}' was rejected and discarded."
@@ -0,0 +1,23 @@
# Local knowledge corpus for deep-research

The `knowledge_lookup` node in `graph.yaml` is a `rag` node that runs
hybrid (vector + keyword) retrieval over every file in this directory.
Drop your own notes, papers (PDFs), Markdown docs, or text files here
and they will be indexed into a per-agent knowledge base on first run.

Loki supports common file types out of the box: `.md`, `.txt`, `.pdf`,
`.html`, and others. Subdirectories are walked recursively.

A small starter file (`research-style-notes.md`) ships so the RAG
node has something non-empty to retrieve against on a clean install.
Replace or extend it with your own materials to bias the research
phase toward your local context.

To force the knowledge base to rebuild after you add or change files,
delete the cached index:

```sh
rm ~/.config/loki/agents/deep-research/knowledge_lookup.yaml
```

The next run will rebuild from the current contents of this directory.
@@ -0,0 +1,49 @@
# Research style notes

These are general principles the `deep-research` agent should keep in
mind regardless of topic. Replace this file with your own notes if you
want to bias retrieval toward your local context.

## What "good research" means here

- **Every factual claim cites a source you actually retrieved.** Never
  fabricate URLs, page titles, authors, or DOIs.
- **Primary sources beat aggregators.** Prefer the original paper, the
  RFC, the standards body, or the manufacturer over a blog summarizing
  them.
- **Corroboration matters where stakes are high.** If a single source
  makes a strong claim, look for a second independent source before
  taking it as established.
- **Disagreement is information, not noise.** If two credible sources
  disagree, report the disagreement and the reasoning on each side.
- **Old does not mean wrong.** A 2014 RFC is still authoritative if no
  newer one has obsoleted it; check before assuming a source is stale.

## Source-tier heuristics

The `vet_sources` node uses these rough tiers to weigh credibility.
The custom tool `classify_source` (see `tools.sh`) implements this
deterministically by hostname / TLD.

- **HIGH:** government domains (`.gov`, `.mil`), academic institutions
  (`.edu`, university subdomains), peer-reviewed journals, standards
  bodies (IETF/RFCs, W3C, ISO, IEEE, NIST), and primary documents from
  the entities being researched (e.g. a vendor's official spec page).
- **PREPRINT:** arXiv, bioRxiv, medRxiv, SSRN. Useful but not yet
  peer-reviewed; treat numeric claims with extra caution.
- **ORGANIZATION:** established nonprofits, standards-adjacent groups,
  industry consortia. Reliable for their stated mission but may have a
  perspective.
- **UNVERIFIED:** general web pages, blogs, news aggregators, social
  media. Useful for leads but should not be the only source for a
  factual claim.

## Common pitfalls to flag in critique

- A claim cited only to a PREPRINT or UNVERIFIED source on a numeric
  or contested point.
- A research-plan question that the findings address only obliquely.
- "Findings" that paraphrase a single source three times rather than
  triangulating.
- Citation collisions where two sources are listed but turn out to
  be the same study reported via different aggregators.
@@ -0,0 +1,18 @@
#!/usr/bin/env python3
"""Fan-out source for context loading.

Has no logic of its own. Exists so the static `next: [plan, knowledge_lookup]`
list on this node fans out into two parallel branches (the LLM planner and
the RAG knowledge lookup) as a single super-step. The validator requires
declared parallel-branch script outputs, so we emit an empty JSON object
explicitly here.
"""
import json


def main():
    # No state updates and no `_next`: the static fan-out list does the routing.
    print(json.dumps({}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
"""Join the per-question map outputs into a single `findings` string.

The `research_each_question` map writes `question_findings` (an array,
one entry per sub-question, in input order). Downstream nodes
(`vet_sources`, `critique`, `synthesize`) read `{{findings}}` as a
single block, so this script renders the array as a Markdown document
with one section per question.
"""
import json
import os


def load_state():
    # Prefer the state file when the engine provides one; otherwise fall
    # back to the inline JSON in GRAPH_STATE.
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def main():
    state = load_state()
    questions = state.get("questions") or []
    per_question = state.get("question_findings") or []

    sections = []
    for idx, q in enumerate(questions):
        # Pair each question with its finding by index; tolerate a short array.
        body = per_question[idx] if idx < len(per_question) else ""
        if isinstance(body, (dict, list)):
            body = json.dumps(body, indent=2)
        sections.append(f"## {q}\n\n{body}")

    findings = "\n\n".join(sections) if sections else "No findings gathered."
    print(json.dumps({"findings": findings}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
"""Fold a reviewer's free-form feedback back into the research loop.

Runs when the user answers the approval step with their own text
instead of "accept" or "reject". That text (saved by the approval node
as `decision`) becomes `research_feedback`, and the graph loops back to
`research_each_question` for another informed pass (each sub-question is
re-researched in parallel with the new feedback in context). The
reflexion counter is reset so the user-driven pass gets a fresh revision
budget.

Routing (`_next`): always research_each_question.
"""
import json
import os


def load_state():
    # Prefer the state file when the engine provides one; otherwise fall
    # back to the inline JSON in GRAPH_STATE.
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def main():
    state = load_state()
    feedback = (state.get("decision") or "").strip()
    output = {
        "_next": "research_each_question",
        # Reset the counter so the automated loop gets a fresh budget.
        "research_attempts": 0,
        "research_feedback": (
            "The user reviewed the report and asked for changes. Treat "
            "this as the top priority for the next pass:\n\n" + feedback
        ),
    }
    print(json.dumps(output))


if __name__ == "__main__":
    main()
@@ -0,0 +1,35 @@
#!/usr/bin/env python3
"""Entry router for deep-research.

Reads the caller's prompt from state. If it contains a usable research
topic, stores it as `topic` and falls through to the static `next`
(bootstrap_research). If the prompt is empty, routes to `ask_topic` so
the user can supply one interactively.

Routing (`_next`):
  - prompt present -> (no _next; static next: bootstrap_research)
  - prompt empty   -> ask_topic
"""
import json
import os


def load_state():
    # Prefer the state file when the engine provides one; otherwise fall
    # back to the inline JSON in GRAPH_STATE.
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def main():
    state = load_state()
    prompt = (state.get("initial_prompt") or "").strip()
    if prompt:
        print(json.dumps({"topic": prompt}))
    else:
        print(json.dumps({"_next": "ask_topic"}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,76 @@
#!/usr/bin/env python3
"""Reflexion gate for deep-research.

Runs after `critique` has reviewed the current research findings. If the
critique's verdict is REVISE and the reflexion budget is not spent,
loops back to `research_each_question` with the critique attached as
`research_feedback`, so the retry is informed rather than a blind
re-run. Otherwise it proceeds to `synthesize`.

Routing (`_next`):
  - verdict PASS                     -> synthesize
  - verdict REVISE, budget remaining -> research_each_question (+ research_feedback)
  - verdict REVISE, budget spent     -> synthesize

Reflexion is a best-effort quality booster, not a hard gate: once the
budget is spent the workflow proceeds anyway, and the human approval
step is the final backstop.
"""
import json
import os
import re

# Automated revision passes allowed. `research_each_question` runs at
# most MAX_REFLEXION_REVISIONS + 1 times per user pass. Bump to allow more.
MAX_REFLEXION_REVISIONS = 2


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def as_int(value, default=0):
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def parse_verdict(critique):
    """Pull PASS/REVISE from the critique's `VERDICT:` line. Defaults to
    PASS when no verdict line is found, so a malformed critique lets the
    workflow proceed instead of burning the whole revision budget."""
    match = re.search(r"VERDICT:\s*([A-Za-z]+)", critique, re.IGNORECASE)
    if not match:
        return "PASS"
    return match.group(1).upper()


def main():
    state = load_state()
    critique = state.get("critique") or ""
    verdict = parse_verdict(critique)
    attempts = as_int(state.get("research_attempts"))

    if verdict == "REVISE" and attempts < MAX_REFLEXION_REVISIONS:
        feedback = (
            "A reviewer judged the previous research pass incomplete. "
            "Address every point in the critique below:\n\n" + critique
        )
        output = {
            "_next": "research_each_question",
            "research_attempts": attempts + 1,
            "research_feedback": feedback,
        }
    else:
        output = {"_next": "synthesize"}

    print(json.dumps(output))


if __name__ == "__main__":
    main()
@@ -0,0 +1,69 @@
#!/usr/bin/env python3
"""Check that the sources cited in the research report are reachable.

Scans the final report for URLs and DOIs, probes each with a HEAD
request, and writes a `source_check` summary into state so the human
reviewer sees broken citations at the approval step.

Times out per request so a slow source cannot stall the graph.
"""
import json
import os
import re
import urllib.error
import urllib.request

DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")


def load_state():
    path = os.environ.get("GRAPH_STATE_FILE")
    if path:
        with open(path) as f:
            return json.load(f)
    return json.loads(os.environ.get("GRAPH_STATE", "{}"))


def reachable(url, timeout=5.0):
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as e:
        return 200 <= e.code < 400
    except Exception:
        return False


def main():
    state = load_state()
    report = state.get("report") or ""

    urls = sorted({u.rstrip(".,;)") for u in URL_RE.findall(report)})
    dois = sorted({d.rstrip(".,;)") for d in DOI_RE.findall(report)})

    results = []
    for url in urls:
        ok = reachable(url)
        results.append(f" {'OK' if ok else 'UNREACHABLE'} {url}")
    for doi in dois:
        url = f"https://doi.org/{doi}"
        if url in urls:
            continue
        ok = reachable(url)
        results.append(f" {'OK' if ok else 'UNREACHABLE'} DOI {doi} ({url})")

    if not results:
        summary = "No web sources were cited in the report."
    else:
        summary = (
            f"Source reachability ({len(results)} checked):\n"
            + "\n".join(results)
        )

    print(json.dumps({"source_check": summary}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,39 @@
#!/usr/bin/env bash

set -e

# @env LLM_OUTPUT=/dev/stdout The output path

# @cmd Classify the credibility tier of a web source from its URL.
# A deterministic check based on the host and top-level domain. Use it
# to weigh how much trust to place in a source before relying on it.
# @option --url! The full source URL to classify
classify_source() {
  # shellcheck disable=SC2154
  local url="$argc_url"
  local host="${url#*://}"
  host="${host%%[/?#]*}"
  host="${host##*@}"
  host="${host%%:*}"
  host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"

  local tier
  case "$host" in
    '')
      tier="UNKNOWN - no host could be parsed from the URL" ;;
    *.gov | *.gov.* | *.mil)
      tier="HIGH - government source" ;;
    *.edu | *.edu.* | *.ac.*)
      tier="HIGH - academic institution" ;;
    arxiv.org | *.arxiv.org | biorxiv.org | *.biorxiv.org | medrxiv.org | *.medrxiv.org | ssrn.com | *.ssrn.com)
      tier="PREPRINT - not yet peer reviewed, corroborate before citing" ;;
    wikipedia.org | *.wikipedia.org)
      tier="TERTIARY - encyclopedia, good for orientation not citation" ;;
    *.org | *.org.*)
      tier="MEDIUM - organization site, check for institutional bias" ;;
    *)
      tier="UNVERIFIED - general web source, corroborate before citing" ;;
  esac

  printf '%s: %s\n' "${host:-<none>}" "$tier" >> "$LLM_OUTPUT"
}
@@ -0,0 +1,46 @@
# report-writer

A tiny, focused sub-agent that turns a set of research findings into a
single coherent final report. It reads only what it is given: it does
no independent research, does not access the web, and does not invent
facts. It exists so that orchestrating agents can delegate the writing
phase to it.

## Why a separate agent?

This is an example of the **agent-as-tool** pattern in graph agents.
The `deep-research` graph agent's `synthesize` node is an `agent` node
that spawns this one (see `assets/agents/deep-research/graph.yaml`).
Separating the role has two practical benefits:

- The orchestrating agent can use a cheap model (or a high-temperature
  exploratory one) for the research phase, while letting the writing
  phase use a different (typically lower-temperature, possibly larger)
  model dedicated to coherent prose.
- The writing prompt is owned by this agent's `config.yaml` rather
  than buried inside another agent's graph. You can polish it
  independently without touching the research flow.

## Standalone use

You can also use this agent directly if you have a set of findings you
want polished:

```sh
loki -a report-writer "Topic: X. Findings: <paste findings here>"
```

The agent produces a single Markdown report following the rules in its
system prompt: an executive summary at the top, sections that group
related sub-questions, every inline citation preserved verbatim, and a
final "Open questions / disagreements" section.
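
When the findings live in a script rather than on the clipboard, the prompt can be assembled programmatically. The sketch below is illustrative only: `build_prompt`, `build_command`, and `write_report` are hypothetical helper names, the sub-heading layout of the findings is an arbitrary choice, and it assumes `loki` prints the finished report to stdout.

```python
import subprocess


def build_prompt(topic, findings):
    """Render a topic plus per-sub-question findings as one prompt string.

    The "Topic: ... Findings: ..." shape follows the shell example above;
    the Markdown sub-headings are an illustrative choice.
    """
    sections = [
        f"### {question}\n{text.strip()}" for question, text in findings.items()
    ]
    return f"Topic: {topic}\n\nFindings:\n\n" + "\n\n".join(sections)


def build_command(prompt):
    # Same invocation as the shell example: loki -a report-writer "<prompt>"
    return ["loki", "-a", "report-writer", prompt]


def write_report(topic, findings):
    # Assumption: loki writes the report to stdout and exits non-zero on failure.
    result = subprocess.run(
        build_command(build_prompt(topic, findings)),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the prompt as a single list element avoids shell quoting problems when the findings contain quotes or newlines.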

## What it will NOT do

- Search the web, fetch URLs, query an MCP server, or use any tool.
  It has no tools configured.
- Invent facts beyond what is in the findings you give it.
- Strip or rewrite citations.

These constraints are the agent's reason for existing: a writer that
the orchestrator can trust to stay in its lane.
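
The citation constraint can also be checked mechanically rather than taken on trust. A minimal sketch, assuming the caller holds both the findings and the returned report as strings: `citations` and `missing_citations` are hypothetical helper names, and the two regular expressions are the ones used by the source-reachability script in this commit.

```python
import re

# Patterns copied from the deep-research source-reachability script.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s)\]\}\"'>]+")


def citations(text):
    """Return the set of URLs and DOIs found in text, minus trailing punctuation."""
    urls = {u.rstrip(".,;)") for u in URL_RE.findall(text)}
    dois = {d.rstrip(".,;)") for d in DOI_RE.findall(text)}
    return urls | dois


def missing_citations(findings, report):
    """List every citation present in the findings but absent from the report."""
    return sorted(citations(findings) - citations(report))


findings = "Claim A (https://example.org/a). Claim B, doi 10.1000/xyz123."
report = "Summary. Claim A (https://example.org/a)."
print(missing_citations(findings, report))  # ['10.1000/xyz123']
```

An empty list means every URL and DOI survived the rewrite. It does not show that each citation is still attached to the right claim, so it complements human review and does not replace it.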
@@ -0,0 +1,34 @@
name: report-writer
description: Polishes research findings into a clear, citation-preserving final report
version: 1.0.0
temperature: 0.2

instructions: |
  You are a technical writer. You will be given:
    - a research topic
    - a set of findings, organized per sub-question, with inline
      citations next to each claim
    - a source-credibility assessment of the cited sources

  Your job is to produce a single, well-organized final report.

  Rules:
    - Use ONLY the findings provided. Do not introduce facts from
      your own memory. Do not speculate beyond what the findings
      support.
    - Preserve every inline citation. If a sentence in the findings
      had a URL or DOI, the equivalent sentence in your report must
      keep the same citation.
    - Lead with a 2-3 sentence executive summary at the top.
    - Organize the body so that related sub-questions are grouped,
      not strictly one section per question. The findings are raw
      material; the report should read as a single coherent answer
      to the original topic.
    - End with a short "Open questions / disagreements" section
      naming anything the findings flagged as unresolved or
      contested.

  Output plain Markdown. No metadata, no JSON wrapper.

conversation_starters:
  - "Polish these findings into a cited report"