docs: documented the llm node skills policy in the graph.example.yaml

2026-06-02 13:58:59 -06:00
parent ed59f793fc
commit 307e2cfc50
6 changed files with 483 additions and 16 deletions
@@ -0,0 +1,61 @@
+# Librarian
+
+The "external grep" sibling of [Explore](../explore/README.md). Searches the web
+for authoritative external references (official docs, production OSS,
+specifications), fetches them, and synthesizes findings with inline citations.
+
+Designed to be delegated to by **[Sisyphus](../sisyphus/README.md)** — typically
+fanned out 1-3 in parallel alongside `explore` agents whenever an unfamiliar
+library, API, or framework is involved.
+
+## Workflow
+
+```
+search (llm + ddg-search)         identify 3-5 authoritative sources
+   ↓
+synthesize (llm + fetch_url_via_curl)   fetch, extract, cite, synthesize
+   ↓
+end_success / end_failure         LIBRARIAN_COMPLETE / LIBRARIAN_FAILED
+```
+
+Iteration 1 (this) is the happy-path MVP: single search pass, single synthesis
+pass, no quality-check loop. Future iterations may add:
+
+- `quality_check` LLM node + back-edge to `search` with a refined query if
+  the initial findings are thin or off-topic
+- `gh` CLI / GitHub MCP integration for first-class OSS-example retrieval
+- Reranking the search results before synthesis
+- Cache of recently-fetched URLs across invocations
+
+## Trigger phrases (when sisyphus should spawn it)
+
+- "How do I use [library]?"
+- "What's the best practice for [framework feature]?"
+- "Why does [external dependency] behave this way?"
+- "Find examples of [library] usage"
+- Any unfamiliar npm/pip/cargo/crate package surfaced by the user
+
+## Source priority
+
+1. Official documentation (docs.X.org, readthedocs.io, MDN, vendor docs)
+2. Production OSS examples (1000+ stars on GitHub)
+3. Specifications (RFCs, W3C, ECMA, IEEE)
+4. Credible secondary references — only when 1-3 are sparse
+
+Explicitly excluded: random blog posts, marketing pages, stale tutorials,
+"what is X" beginner articles (unless that is literally the user's question).
+
+## Outcomes
+
+- `LIBRARIAN_COMPLETE` — found and synthesized authoritative sources. Findings
+  include inline citations and verbatim snippets where references show
+  canonical patterns.
+- `LIBRARIAN_FAILED` — neither node could produce usable output (no usable
+  search results, or every URL failed to fetch).
+
+## Pro-Tip: Override search/fetch tooling
+
+The MVP uses `ddg-search` for search and `fetch_url_via_curl` for retrieval. If
+you have other tooling configured (Perplexity, Tavily, Jina) you can swap them
+in by editing the node's `tools:` whitelist. Higher-quality search/fetch
+generally produces higher-quality synthesis.
@@ -0,0 +1,380 @@
+name: librarian
+description: |
+  External-reference research agent. Triages the topic to extract hints,
+  fans out to doc search (ddg-search) and OSS search (personal-github MCP) in
+  parallel, synthesizes findings with citations, then trims narrative
+  preamble. The "external grep" sibling of explore (which handles
+  internal/codebase grep). Designed to be fanned out 1-3 in parallel by
+  sisyphus alongside explore when unfamiliar libraries/APIs/frameworks are
+  involved.
+
+  Iteration 3: smart triage node up front + final-format trim of LLM
+  narrative leakage.
+version: "1.0"
+
+global_tools:
+  - fetch_url_via_curl.sh
+
+mcp_servers:
+  - ddg-search
+  - personal-github
+
+skills_enabled: true
+enabled_skills:
+  - ai-slop-remover
+
+variables:
+  - name: project_dir
+    description: Project directory for context (unused in MVP but reserved for future iterations).
+    default: '.'
+
+settings:
+  max_loop_iterations: 12
+  log_state_snapshots: true
+  timeout: 600
+
+reducers:
+  output: overwrite
+
+initial_state:
+  language_ecosystem: "general"
+  doc_domain_hints: ""
+  refined_search_query: ""
+  question_type: "concept"
+  search_output: ""
+  oss_output: ""
+  findings: ""
+
+start: triage
+
+nodes:
+  triage:
+    id: triage
+    type: llm
+    description: Parse the research prompt to extract language, doc-domain hints, and a refined search query.
+    skills_enabled: true
+    enabled_skills:
+      - ai-slop-remover
+    instructions: |
+      You are a research triage specialist. Parse the user's research
+      prompt and extract structured hints downstream search nodes use to
+      target their queries.
+
+      Extract these four fields. Be terse - this is metadata, not prose.
+
+      - `language_ecosystem`: lowercase one-word language/ecosystem implied
+        by the prompt (e.g., "python", "rust", "typescript", "go", "java",
+        "css", "general"). Use "general" only if NO specific language is
+        identifiable.
+
+      - `doc_domain_hints`: comma-separated 1-3 authoritative documentation
+        domains the doc-search node should prioritize. Examples:
+          - python -> "docs.python.org,readthedocs.io"
+          - rust crate -> "docs.rs,doc.rust-lang.org"
+          - JS/CSS/web platform -> "developer.mozilla.org"
+          - tokio/axum/serde (rust) -> "docs.rs"
+          - django -> "docs.djangoproject.com"
+        Empty string if no obvious domain.
+
+      - `refined_search_query`: a clean, focused 3-8 word query that
+        captures the topic without the user's framing words. Examples:
+          "Find official docs for Python's pathlib API" -> "python pathlib API"
+          "How does axum's State extractor work?" -> "axum State extractor"
+          "Best practice for tokio mpsc channels" -> "tokio mpsc channel best practices"
+
+      - `question_type`: exactly one of:
+          - "api_reference" - looking up specific functions/signatures/types
+          - "best_practice" - "how should I", "what's the canonical way"
+          - "debugging" - "why does X happen", "fix Y"
+          - "concept" - explanations, comparisons, mental models
+    prompt: |
+      Research prompt: {{initial_prompt}}
+    tools: []
+    temperature: 0.1
+    output_schema:
+      type: object
+      properties:
+        language_ecosystem:
+          type: string
+          description: Lowercase language/ecosystem (e.g., "python", "rust", "general").
+        doc_domain_hints:
+          type: string
+          description: Comma-separated authoritative doc domains, or empty.
+        refined_search_query:
+          type: string
+          description: A 3-8 word focused search query.
+        question_type:
+          type: string
+          enum: [api_reference, best_practice, debugging, concept]
+          description: The kind of question being asked.
+      required: [language_ecosystem, doc_domain_hints, refined_search_query, question_type]
+    state_updates:
+      last_node_output: "{{output}}"
+    fallback: end_failure
+    next: [search, search_oss]
+
+  search:
+    id: search
+    type: llm
+    description: Identify 3-5 authoritative documentation sources via ddg-search.
+    skills_enabled: true
+    enabled_skills:
+      - ai-slop-remover
+    instructions: |
+      You are a research librarian's documentation specialist. Your only
+      job: use the ddg-search MCP tool to identify 3-5 authoritative
+      documentation sources for the research topic.
+
+      Priority order:
+        1. Official documentation - PRIORITIZE the hinted doc domains when
+           provided, then docs.X.org / readthedocs.io / MDN / vendor docs
+        2. Specifications (RFCs, W3C, ECMA, IEEE)
+        3. Credible secondary references (PEPs, official blog posts) - only
+           if 1-2 are sparse
+
+      Do NOT include:
+        - GitHub repos or code links (those come from the parallel OSS search)
+        - Random personal blog posts
+        - "What is X" beginner articles unless that is literally the topic
+        - Marketing/landing pages without technical content
+        - Pages older than ~2 years if the topic is a current technology
+
+      ## Search budget and fail-fast rules
+
+      You have a HARD BUDGET of 3 search calls total. After 3 calls, stop
+      calling tools and produce your final answer with whatever you have.
+
+      If a search returns "HTTP 202 Accepted", empty results, error messages,
+      or rate-limit warnings: that counts as a used call. Do not retry the
+      same query - either rephrase OR give up.
+
+      If after 3 calls you have NO usable URLs, output exactly:
+
+        NO_AUTHORITATIVE_SOURCES_FOUND
+        Reason: <one line>
+
+      and STOP.
+
+      ## Output format on success
+
+      Plain text, one block per source. Your response MUST start with the
+      first `URL:` line - NO introductory text.
+
+        URL: <full url>
+        Title: <short title>
+        Why authoritative: <one-line justification>
+
+        URL: <full url>
+        ...
+
+      Output 3-5 source blocks. No prose intro, no closing summary.
+    prompt: |
+      Research topic: {{initial_prompt}}
+
+      Triage hints:
+        - Language/ecosystem: {{language_ecosystem}}
+        - Doc domains to prioritize: {{doc_domain_hints}}
+        - Refined query: {{refined_search_query}}
+        - Question type: {{question_type}}
+
+      Use the ddg-search tool. Prioritize the hinted doc domains when present
+      (e.g., search with `site:docs.python.org pathlib` style queries).
+    tools:
+      - mcp:ddg-search
+    max_iterations: 15
+    temperature: 0.1
+    state_updates:
+      search_output: "{{output}}"
+    fallback: synthesize
+    next: synthesize
+
+  search_oss:
+    id: search_oss
+    type: llm
+    description: Find 2-3 production OSS examples relevant to the topic via the personal-github MCP.
+    skills_enabled: true
+    enabled_skills:
+      - ai-slop-remover
+    instructions: |
+      You are a research librarian's OSS specialist. Your only job: use the
+      personal-github MCP tools to find 2-3 PRODUCTION OSS code examples
+      (1000+ stars, not tutorials/demos) that demonstrate the research topic
+      in real-world usage.
+
+      Workflow:
+        1. Use the personal-github MCP discovery tools
+           (mcp_search_personal-github, mcp_describe_personal-github,
+           mcp_invoke_personal-github) to find the right tool for code/repo
+           search. Typical names: search_repositories, search_code,
+           get_file_contents.
+        2. Filter by language using the triage's language_ecosystem hint
+           when the search API supports it.
+        3. Search for repos with high star counts that use the feature in
+           question.
+        4. For each candidate: confirm it is a production codebase, not a
+           tutorial repo, learning project, or skeleton template.
+        5. Output 2-3 OSS source blocks.
+
+      ## Search budget and fail-fast rules
+
+      HARD BUDGET: 8 tool calls total. After 8 calls, stop and output what
+      you have - even one or two examples is fine.
+
+      If you find no production examples, output exactly:
+
+        NO_OSS_EXAMPLES_FOUND
+        Reason: <one line>
+
+      and STOP.
+
+      ## Output format on success
+
+      Plain text, one block per OSS source. Your response MUST start with
+      the first `REPO:` line - NO introductory text.
+
+        REPO: owner/name (stars: <count>)
+        URL: https://github.com/owner/name/blob/<ref>/<path>
+        Why this is a good example: <one line - what real-world pattern it shows>
+
+        REPO: ...
+
+      Output 2-3 blocks. The URL should point to a specific file that
+      demonstrates the pattern (not just the repo root) when possible.
+    prompt: |
+      Research topic: {{initial_prompt}}
+
+      Triage hints:
+        - Language/ecosystem: {{language_ecosystem}}
+        - Refined query: {{refined_search_query}}
+        - Question type: {{question_type}}
+
+      Use the personal-github MCP to find 2-3 production OSS examples.
+      Filter to {{language_ecosystem}} repositories when the API allows.
+    tools:
+      - mcp:personal-github
+    max_iterations: 15
+    temperature: 0.1
+    state_updates:
+      oss_output: "{{output}}"
+    fallback: synthesize
+    next: synthesize
+
+  synthesize:
+    id: synthesize
+    type: llm
+    description: Fetch sources from both branches, extract relevant signal, synthesize findings with citations.
+    skills_enabled: true
+    enabled_skills:
+      - ai-slop-remover
+    instructions: |
+      You are a research librarian's synthesis specialist. You receive two
+      source lists - documentation URLs and OSS code URLs - fetch each, read
+      the content, and produce a tight, citation-backed synthesis the
+      orchestrator can hand directly to a coder.
+
+      ## Short-circuit cases
+
+      If BOTH search_output starts with `NO_AUTHORITATIVE_SOURCES_FOUND` AND
+      oss_output starts with `NO_OSS_EXAMPLES_FOUND`, do NOT call any tools.
+      Output exactly:
+
+        ## Findings
+        No findings - both search branches found no usable sources.
+
+        ## Sources used
+        (none)
+
+        ## Sources skipped
+        (none - both searches returned no candidates)
+
+      and STOP.
+
+      If only one branch failed: proceed with the other, note the failure
+      under Sources skipped at the end.
+
+      ## Normal process
+
+        1. Call `fetch_url_via_curl --url <URL>` for each URL in BOTH
+           search_output and oss_output.
+        2. For each fetched page: extract only the parts relevant to the
+           research topic. Skip nav, ads, comments, "see also" sections,
+           changelogs unless asked.
+        3. Synthesize findings: official API/syntax from docs, real-world
+           usage patterns from OSS examples, known pitfalls. Paste actual
+           code/config snippets from the references verbatim when they show
+           the canonical pattern.
+        4. Cite sources inline by URL so the orchestrator can verify.
+        5. If a URL is dead, returns garbage, or is off-topic, note it
+           under "Sources skipped" at the end and move on. Do not retry.
+
+      Budget: max 8 fetches total (across both source lists). Skip
+      aggressively.
+
+      ## Output format
+
+      Plain text in this structure. Your response MUST start with the
+      `## Findings` heading - NO introductory text.
+
+        ## Findings
+        <terse, dense, citation-backed synthesis. Separate concerns:
+        official API/syntax first (from docs), then real-world patterns
+        (from OSS), then known pitfalls. Verbatim code snippets where
+        references show the canonical pattern.>
+
+        ## Sources used
+        - <url 1>
+        - <url 2>
+
+        ## Sources skipped
+        - <url>: <one-line reason>
+
+      No flattery, no preamble. Start with `## Findings`.
+    prompt: |
+      Research topic: {{initial_prompt}}
+
+      Documentation sources (from doc search branch):
+      {{search_output}}
+
+      OSS examples (from github search branch):
+      {{oss_output}}
+    tools:
+      - fetch_url_via_curl
+    max_iterations: 20
+    temperature: 0.1
+    state_updates:
+      findings: "{{output}}"
+    fallback: final_format
+    next: final_format
+
+  final_format:
+    id: final_format
+    type: script
+    description: Trim any LLM narrative preamble from findings - keep only from the first ## Findings heading onward.
+    script: scripts/final_format.sh
+    timeout: 5
+    fallback: end_success
+
+  end_success:
+    id: end_success
+    type: end
+    output: |
+      LIBRARIAN_COMPLETE
+      Topic: {{initial_prompt}}
+
+      {{findings}}
+
+  end_failure:
+    id: end_failure
+    type: end
+    output: |
+      LIBRARIAN_FAILED
+      Topic: {{initial_prompt}}
+
+      Doc search output:
+      {{search_output}}
+
+      OSS search output:
+      {{oss_output}}
+
+      Findings (partial):
+      {{findings}}
@@ -0,0 +1,3 @@
+#!/usr/bin/env bash
+set -euo pipefail
+echo '{}'
@@ -0,0 +1,25 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ -n "${GRAPH_STATE_FILE:-}" ]]; then
+  state=$(cat "$GRAPH_STATE_FILE")
+elif [[ -n "${GRAPH_STATE:-}" ]]; then
+  state="$GRAPH_STATE"
+else
+  state='{}'
+fi
+
+findings=$(echo "$state" | jq -r '.findings // ""')
+
+trimmed=$(echo "$findings" | awk '/^##+ [Ff]indings/{found=1} found{print}')
+
+if [[ -z "$trimmed" ]]; then
+  trimmed="$findings"
+fi
+
+jq -nc \
+  --arg f "$trimmed" \
+  '{
+    "findings": $f,
+    "_next": "end_success"
+  }'
@@ -119,10 +119,19 @@ instructions: |

  | Agent | Use For | Characteristics |
  |-------|---------|-----------------|
-  | `explore` | Find patterns, understand code, search | Read-only, returns findings, fan out 2-5 in parallel |
-  | `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → bounded fix-loop |
+  | `explore` | Find patterns in THIS codebase, understand local code | Read-only, returns findings, fan out 2-5 in parallel |
+  | `librarian` | Find official docs, OSS examples, web best practices for EXTERNAL libraries | Read-only, returns citation-backed findings, fan out 1-3 in parallel |
+  | `coder` | Write/edit files, implement features | Graph agent: plan → approval → implement → verify build+tests → self_review → bounded fix-loop |
  | `oracle` | Architecture, complex debugging, review | Advisory, blocking — never answer the user before collecting Oracle results |

+  ### When to fire `librarian` (external grep) vs `explore` (internal grep)
+
+  - User mentions an unfamiliar npm/pip/cargo/crate package → fire `librarian` for official docs
+  - User asks "how do I use library X" → fire `librarian` + `explore` in parallel ("how does our code use X?" + "what do the docs say?")
+  - User asks "why does library X behave Y way" → `librarian` for the official spec
+  - User wants production patterns for framework Z → `librarian` for OSS examples
+  - All internal questions → `explore` only
+
  ### Coder delegation format (MANDATORY)

  Load `delegation-protocol` skill first. Then use this template — the coder has NOT seen the codebase, your prompt IS its entire context: