feat: created new graph-based deep-research agent

2026-05-21 11:27:55 -06:00
parent 597f823bdf
commit d81d233527
13 changed files with 1037 additions and 0 deletions
@@ -0,0 +1,23 @@
+# Local knowledge corpus for deep-research
+
+The `knowledge_lookup` node in `graph.yaml` is a `rag` node that runs
+hybrid (vector + keyword) retrieval over every file in this directory.
+Drop your own notes, papers (PDFs), Markdown docs, or text files here
+and they will be indexed into a per-agent knowledge base on first run.
+
+Loki supports common file types out of the box: `.md`, `.txt`, `.pdf`,
+`.html`, and others. Subdirectories are walked recursively.
+
+A small starter file (`research-style-notes.md`) ships so the RAG
+node has something non-empty to retrieve against on a clean install.
+Replace or extend it with your own materials to bias the research
+phase toward your local context.
+
+To force the knowledge base to rebuild after you add or change files,
+delete the cached index:
+
+```sh
+rm ~/.config/loki/agents/deep-research/knowledge_lookup.yaml
+```
+
+The next run will rebuild from the current contents of this directory.
@@ -0,0 +1,49 @@
+# Research style notes
+
+These are general principles the `deep-research` agent should keep in
+mind regardless of topic. Replace this file with your own notes if you
+want to bias retrieval toward your local context.
+
+## What "good research" means here
+
+- **Every factual claim cites a source you actually retrieved.** Never
+  fabricate URLs, page titles, authors, or DOIs.
+- **Primary sources beat aggregators.** Prefer the original paper, the
+  RFC, the standards body, or the manufacturer over a blog summarizing
+  them.
+- **Corroboration matters where stakes are high.** If a single source
+  makes a strong claim, look for a second independent source before
+  taking it as established.
+- **Disagreement is information, not noise.** If two credible sources
+  disagree, report the disagreement and the reasoning on each side.
+- **Old does not mean wrong.** A 2014 RFC is still authoritative if no
+  newer one has obsoleted it; check before assuming a source is stale.
+
+## Source-tier heuristics
+
+The `vet_sources` node uses these rough tiers to weigh credibility.
+The custom tool `classify_source` (see `tools.sh`) implements this
+deterministically by hostname / TLD.
+
+- **HIGH:** government domains (`.gov`, `.mil`), academic institutions
+  (`.edu`, university subdomains), peer-reviewed journals, standards
+  bodies (IETF/RFCs, W3C, ISO, IEEE, NIST), and primary documents from
+  the entities being researched (e.g. a vendor's official spec page).
+- **PREPRINT:** arXiv, bioRxiv, medRxiv, SSRN. Useful but not yet
+  peer-reviewed; treat numeric claims with extra caution.
+- **ORGANIZATION:** established nonprofits, standards-adjacent groups,
+  industry consortia. Reliable for their stated mission but may have a
+  perspective.
+- **UNVERIFIED:** general web pages, blogs, news aggregators, social
+  media. Useful for leads but should not be the only source for a
+  factual claim.
+
+## Common pitfalls to flag in critique
+
+- A claim cited only to a PREPRINT or UNVERIFIED source on a numeric
+  or contested point.
+- A research-plan question that the findings address only obliquely.
+- "Findings" that paraphrase a single source three times rather than
+  triangulating.
+- Citation collisions where two sources are listed but turn out to
+  be the same study reported via different aggregators.