Compare commits — 2 commits: `9bab6a0c2d` … `ff3419a714`

@@ -0,0 +1,416 @@
# Sisyphus in LangChain/LangGraph

A faithful recreation of [Loki's Sisyphus agent](../../assets/agents/sisyphus/) using [LangGraph](https://docs.langchain.com/langgraph/) — LangChain's framework for stateful, multi-agent workflows.

This project exists to help you understand LangChain/LangGraph by mapping every concept to its Loki equivalent.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                       SUPERVISOR NODE                       │
│ Intent classification → Routing decision → Command(goto=)   │
│                                                             │
│ Loki equivalent: sisyphus/config.yaml                       │
│ (agent__spawn → Command, agent__collect → graph edge)       │
└──────────┬──────────────┬──────────────┬────────────────────┘
           │              │              │
           ▼              ▼              ▼
     ┌────────────┐ ┌────────────┐ ┌────────────┐
     │  EXPLORE   │ │   ORACLE   │ │   CODER    │
     │ (research) │ │  (advise)  │ │  (build)   │
     │            │ │            │ │            │
     │ read-only  │ │ read-only  │ │ read+write │
     │   tools    │ │   tools    │ │   tools    │
     └─────┬──────┘ └─────┬──────┘ └─────┬──────┘
           │              │              │
           └──────────────┼──────────────┘
                          │
                 back to supervisor
```
## Concept Map: Loki → LangGraph

This is the key reference. Every row maps a Loki concept to its LangGraph equivalent.

### Core Architecture

| Loki Concept | LangGraph Equivalent | Where in Code |
|---|---|---|
| Agent config (config.yaml) | Node function + system prompt | `agents/explore.py`, etc. |
| Agent instructions | System prompt string | `EXPLORE_SYSTEM_PROMPT`, etc. |
| Agent tools (tools.sh) | `@tool`-decorated Python functions | `tools/filesystem.py`, `tools/project.py` |
| Agent session (chat loop) | Graph state + message list | `state.py` → `SisyphusState.messages` |
| `agent__spawn --agent X` | `Command(goto="X")` | `agents/supervisor.py` |
| `agent__collect --id` | Graph edge (implicit — workers return to supervisor) | `graph.py` → `add_edge("explore", "supervisor")` |
| `agent__check` (non-blocking) | Not needed (graph handles scheduling) | — |
| `agent__cancel` | Not needed (graph handles lifecycle) | — |
| `can_spawn_agents: true` | Node has routing logic (supervisor) | `agents/supervisor.py` |
| `max_concurrent_agents: 4` | `Send()` API for parallel fan-out | See [Parallel Execution](#parallel-execution) |
| `max_agent_depth: 3` | `recursion_limit` in config | `cli.py` → `recursion_limit: 50` |
| `summarization_threshold` | Manual truncation in supervisor | `supervisor.py` → `_summarize_outputs()` |
### Tool System

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| `tools.sh` with `@cmd` annotations | `@tool` decorator | Loki compiles bash annotations to JSON schema; LangChain generates the schema from the Python function signature + docstring |
| `@option --pattern!` (required arg) | Function parameter without default | `def search_content(pattern: str)` |
| `@option --lines` (optional arg) | Parameter with default | `def read_file(path: str, limit: int = 200)` |
| `@env LLM_OUTPUT=/dev/stdout` | Return value | LangChain tools return strings; Loki tools write to `$LLM_OUTPUT` |
| `@describe` | Docstring | The tool's docstring becomes the description the LLM sees |
| Global tools (`fs_read.sh`, etc.) | Shared tool imports | Both agents import from `tools/filesystem.py` |
| Agent-specific tools | Per-node tool binding | `llm.bind_tools(EXPLORE_TOOLS)` vs `llm.bind_tools(CODER_TOOLS)` |
| `.shared/utils.sh` | `tools/project.py` | Shared project detection utilities |
| `detect_project()` heuristic | `detect_project()` in Python | Same logic: check Cargo.toml → go.mod → package.json → etc. |
| LLM fallback for unknown projects | (omitted) | The agents themselves can reason about unknown project types |
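The signature-to-schema rows above can be made concrete with a small sketch. This is plain Python and **not** LangChain's actual implementation — it just shows the idea: a parameter without a default is required (like Loki's `@option --pattern!`), one with a default is optional, and the docstring supplies the description the LLM sees.

```python
import inspect

def tool_schema(fn):
    """Derive a minimal schema-like dict from a function, the way @tool
    derives one from the signature + docstring (illustrative sketch)."""
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": "string" if param.annotation is str else "integer"}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default → required, like `--pattern!`
    return {"description": inspect.getdoc(fn), "properties": props, "required": required}

def read_file(path: str, limit: int = 200):
    """Read a file, returning at most `limit` lines."""

schema = tool_schema(read_file)
# `path` is required (no default); `limit` is optional with a default
```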
### State & Memory

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| Agent session (conversation history) | `SisyphusState.messages` | `Annotated[list, add_messages]` — the reducer appends instead of replacing |
| `agent_session: temp` | `MemorySaver` checkpointer | Loki's temp sessions are ephemeral; `MemorySaver` is in-memory (lost on restart) |
| Per-agent isolation | Per-node system prompt + tools | In Loki, agents have separate sessions; in LangGraph they share messages but have different system prompts |
| `{{project_dir}}` variable | `SisyphusState.project_dir` | Loki interpolates variables into prompts; LangGraph stores them in state |
| `{{__tools__}}` injection | `llm.bind_tools()` | Loki injects tool descriptions into the prompt; LangChain attaches them to the API call |
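The `add_messages` reducer in the first row is the key idea: a state field annotated with a reducer is *merged*, not overwritten, when a node returns an update. A minimal stand-in (plain Python, with list concatenation playing the role of `add_messages`) shows the mechanism:

```python
from typing import Annotated, TypedDict, get_type_hints

def append(existing: list, update: list) -> list:
    # Reducer: combine old and new values instead of replacing
    return existing + update

class StateSketch(TypedDict):
    # Annotated[..., reducer] is how the framework discovers the merge function
    messages: Annotated[list, append]
    final_output: str  # un-annotated fields are simply replaced

def apply_update(state: dict, update: dict, schema) -> dict:
    """Apply a node's partial update the way a reducer-aware graph would."""
    hints = get_type_hints(schema, include_extras=True)
    out = dict(state)
    for key, value in update.items():
        meta = getattr(hints.get(key), "__metadata__", None)
        out[key] = meta[0](state.get(key, []), value) if meta else value
    return out

state = {"messages": ["hi"], "final_output": ""}
state = apply_update(state, {"messages": ["findings"], "final_output": "done"}, StateSketch)
# messages accumulates to ["hi", "findings"]; final_output is replaced with "done"
```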
### Orchestration

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| Intent classification table | `RoutingDecision` structured output | Loki does this in free text; LangGraph forces typed JSON |
| Oracle triggers ("How should I...") | Supervisor prompt + structured output | Same trigger phrases, enforced via system prompt |
| Coder delegation format | Supervisor builds HumanMessage | The structured prompt (Goal/Reference Files/Conventions/Constraints) |
| `agent__spawn` (parallel) | `Send()` API | Dynamic fan-out to multiple nodes |
| Todo system (`todo__init`, etc.) | `SisyphusState.todos` | State field with a merge reducer |
| `auto_continue: true` | Supervisor loop (iteration counter) | Supervisor re-routes until FINISH or max iterations |
| `max_auto_continues: 25` | `MAX_ITERATIONS = 15` | Safety valve to prevent infinite loops |
| `user__ask` / `user__confirm` | `interrupt()` API | Pauses graph, surfaces question to caller, resumes with answer |
| Escalation (child → parent → user) | `interrupt()` in any node | Any node can pause; the caller handles the interaction |
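The "typed JSON" point in the first row can be sketched as follows. The real `RoutingDecision` lives in `agents/supervisor.py` (and would typically be a Pydantic model passed to `llm.with_structured_output(...)`); this dataclass stand-in just shows what the type system buys you over Loki's free-text classification — an invalid agent name fails loudly instead of being silently mis-routed:

```python
from dataclasses import dataclass

ALLOWED_AGENTS = {"explore", "oracle", "coder", "FINISH"}

@dataclass
class RoutingDecision:
    intent: str            # trivial / exploration / implementation / architecture / ambiguous
    next_agent: str        # which node to route to, or FINISH
    delegation_notes: str  # context handed to the worker

    def __post_init__(self):
        # Typed routing: reject anything outside the known node set
        if self.next_agent not in ALLOWED_AGENTS:
            raise ValueError(f"invalid next_agent: {self.next_agent}")

decision = RoutingDecision("implementation", "explore", "Find endpoint patterns")
```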
### Execution Model

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| `loki --agent sisyphus` | `python -m sisyphus_langchain.cli` | CLI entry point |
| REPL mode | `cli.py` → `repl()` | Interactive loop with thread persistence |
| One-shot mode | `cli.py` → `run_query()` | Single query, print result, exit |
| Streaming output | `graph.stream()` | LangGraph supports per-node streaming |
| `inject_spawn_instructions` | (always on) | System prompts are always included |
| `inject_todo_instructions` | (always on) | Todo instructions could be added to prompts |
## How the Execution Flow Works

### 1. User sends a message

```python
graph.invoke({"messages": [HumanMessage("Add a health check endpoint")]})
```

### 2. Supervisor classifies intent

The supervisor LLM reads the message and produces a `RoutingDecision`:

```json
{
  "intent": "implementation",
  "next_agent": "explore",
  "delegation_notes": "Find existing API endpoint patterns, route structure, and health check conventions"
}
```

### 3. Supervisor routes via Command

```python
return Command(goto="explore", update={"intent": "implementation", "iteration_count": 1})
```

### 4. Explore agent runs

- Receives the full message history (including the user's request)
- Calls read-only tools (`search_content`, `search_files`, `read_file`)
- Returns its findings in messages

### 5. Control returns to supervisor

The graph edge `explore → supervisor` fires automatically.
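The hub-and-spoke control flow in steps 2–5 can be simulated without any LLM at all. This toy sketch (hypothetical routing rules, purely illustrative) shows how fixed worker→supervisor edges bounce control back to the hub until the router says FINISH, with an iteration cap as the safety valve:

```python
def run_hub_and_spoke(route, workers, max_iterations=15):
    """Toy supervisor loop: the hub picks a worker, the worker runs,
    and the fixed edge returns control to the hub until FINISH."""
    trace = []
    for _ in range(max_iterations):  # safety valve, like MAX_ITERATIONS
        nxt = route(trace)
        if nxt == "FINISH":
            break
        trace.append(nxt)
        workers[nxt]()  # worker node runs, then the edge fires back to the hub
    return trace

# Hypothetical routing policy: explore first, then coder, then finish
def route(trace):
    if "explore" not in trace:
        return "explore"
    if "coder" not in trace:
        return "coder"
    return "FINISH"

trace = run_hub_and_spoke(route, {"explore": lambda: None, "coder": lambda: None})
# trace == ["explore", "coder"]
```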
### 6. Supervisor reviews and routes again

Now it has explore's findings. It routes to coder with context:

```json
{
  "intent": "implementation",
  "next_agent": "coder",
  "delegation_notes": "Implement health check endpoint following patterns found in src/routes/"
}
```

### 7. Coder implements

- Reads explore's findings from the message history
- Writes files via the `write_file` tool
- Runs `verify_build` to check compilation

### 8. Supervisor verifies and finishes

```json
{
  "intent": "implementation",
  "next_agent": "FINISH",
  "delegation_notes": "Added /health endpoint in src/routes/health.py. Build passes."
}
```
## Key Differences from Loki

### What LangGraph does better

1. **Declarative graph** — The topology is visible and debuggable. Loki's orchestration is emergent from the LLM's tool calls.
2. **Typed state** — `SisyphusState` is a TypedDict with reducers. Loki's state is implicit in the conversation.
3. **Checkpointing** — Built-in persistence. Loki manages sessions manually.
4. **Time-travel debugging** — Inspect any checkpoint. Loki has no equivalent.
5. **Structured routing** — `RoutingDecision` forces valid JSON. Loki relies on the LLM calling the right tool.

### What Loki does better

1. **True parallelism** — `agent__spawn` runs multiple agents concurrently in separate threads. This LangGraph implementation is sequential (see [Parallel Execution](#parallel-execution) for how to add it).
2. **Agent isolation** — Each Loki agent has its own session, tools, and config. LangGraph nodes share state.
3. **Teammate messaging** — Loki agents can send messages to siblings. LangGraph nodes communicate only through shared state.
4. **Dynamic tool compilation** — Loki compiles bash/python/typescript tools at startup. LangChain tools are statically defined.
5. **Escalation protocol** — Loki's child-to-parent escalation is sophisticated. LangGraph's `interrupt()` is simpler but less structured.
6. **Task queues with dependencies** — Loki's `agent__task_create` supports dependency DAGs. LangGraph's routing here is simpler (hub-and-spoke).
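To make point 6 concrete: a dependency-DAG task queue schedules a task only once all of its prerequisites are done, which is more expressive than a hub picking one next step at a time. A minimal sketch using the standard library (hypothetical task names, just to illustrate the ordering guarantee):

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: both explore tasks can run first; coder waits on both
deps = {
    "explore_api": set(),
    "explore_db": set(),
    "coder": {"explore_api", "explore_db"},
}

# static_order() yields tasks so that every task follows its dependencies
order = list(TopologicalSorter(deps).static_order())
```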
## Running It

### Prerequisites

```bash
# Python 3.11+
python --version

# Set your API key
export OPENAI_API_KEY="sk-..."
```

### Install

```bash
cd examples/langchain-sisyphus

# With pip
pip install -e .

# Or with uv (recommended)
uv pip install -e .
```
### Usage

```bash
# Interactive REPL (like `loki --agent sisyphus`)
sisyphus

# One-shot query
sisyphus "Find all TODO comments in the codebase"

# With custom models (cost optimization)
sisyphus --explore-model gpt-4o-mini --coder-model gpt-4o "Add input validation to the API"

# Programmatic usage
python -c "
from sisyphus_langchain import build_graph
from langchain_core.messages import HumanMessage

graph = build_graph()
result = graph.invoke({
    'messages': [HumanMessage('What patterns does this codebase use?')],
    'intent': 'ambiguous',
    'next_agent': '',
    'iteration_count': 0,
    'todos': [],
    'agent_outputs': {},
    'final_output': '',
    'project_dir': '.',
}, config={'configurable': {'thread_id': 'demo'}, 'recursion_limit': 50})
print(result['final_output'])
"
```
### Using Anthropic Models

Replace `ChatOpenAI` with `ChatAnthropic` in the agent factories:

```python
from langchain_anthropic import ChatAnthropic

# In agents/oracle.py:
llm = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0.2).bind_tools(ORACLE_TOOLS)
```
## Deployment

### Option 1: Standalone Script (Simplest)

Just run the CLI directly. No infrastructure needed.

```bash
sisyphus "Add a health check endpoint"
```

### Option 2: FastAPI Server

```python
# server.py
from fastapi import FastAPI
from langserve import add_routes
from sisyphus_langchain import build_graph

app = FastAPI(title="Sisyphus API")
graph = build_graph()
add_routes(app, graph, path="/agent")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
# Call: POST http://localhost:8000/agent/invoke
```
### Option 3: LangGraph Platform (Production)

Create a `langgraph.json` at the project root:

```json
{
  "graphs": {
    "sisyphus": "./sisyphus_langchain/graph.py:build_graph"
  },
  "dependencies": ["./sisyphus_langchain"],
  "env": ".env"
}
```

Then deploy:

```bash
pip install langgraph-cli
langgraph deploy
```
This gives you:

- Durable checkpointing (PostgreSQL)
- Background runs
- Streaming API
- Zero-downtime deployments
- Built-in observability

### Option 4: Docker

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e .
CMD ["sisyphus"]
```

```bash
docker build -t sisyphus .
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY sisyphus
```
## Parallel Execution

This implementation routes sequentially for simplicity. To add Loki-style parallel agent execution, use LangGraph's `Send()` API:

```python
from langchain_core.messages import HumanMessage
from langgraph.types import Send

def route_to_workers(state):
    # Conditional-edge router: returning a list of Send objects fans out
    # to multiple explore agents in parallel
    # (like Loki's agent__spawn called multiple times)
    return [
        Send("explore", {
            **state,
            "messages": state["messages"] + [
                HumanMessage("Find existing API endpoint patterns")
            ],
        }),
        Send("explore", {
            **state,
            "messages": state["messages"] + [
                HumanMessage("Find data models and database patterns")
            ],
        }),
    ]
```

Wire this up with `add_conditional_edges("supervisor", route_to_workers)` — a plain node can't return a list of `Send` objects, but a routing function can.

This is equivalent to Loki's pattern of spawning multiple explore agents:

```
agent__spawn --agent explore --prompt "Find API patterns"
agent__spawn --agent explore --prompt "Find database patterns"
agent__collect --id <id1>
agent__collect --id <id2>
```
## Adding Human-in-the-Loop

To replicate Loki's `user__ask` / `user__confirm` tools, use LangGraph's `interrupt()`:

```python
from langgraph.types import interrupt

def supervisor_node(state):
    # Pause and ask the user (like Loki's user__ask)
    answer = interrupt({
        "question": "How should we structure the authentication?",
        "options": [
            "JWT with httpOnly cookies (Recommended)",
            "Session-based with Redis",
            "OAuth2 with external provider",
        ],
    })
    # `answer` contains the user's selection when the graph resumes
```
## Project Structure

```
examples/langchain-sisyphus/
├── pyproject.toml           # Dependencies & build config
├── README.md                # This file
└── sisyphus_langchain/
    ├── __init__.py          # Package entry point
    ├── cli.py               # CLI (REPL + one-shot mode)
    ├── graph.py             # Graph assembly (wires nodes + edges)
    ├── state.py             # Shared state schema (TypedDict)
    ├── agents/
    │   ├── __init__.py
    │   ├── supervisor.py    # Sisyphus orchestrator (intent → routing)
    │   ├── explore.py       # Read-only codebase researcher
    │   ├── oracle.py        # Architecture/debugging advisor
    │   └── coder.py         # Implementation worker
    └── tools/
        ├── __init__.py
        ├── filesystem.py    # File read/write/search/glob tools
        └── project.py       # Project detection, build, test tools
```
### File-to-Loki Mapping

| This Project | Loki Equivalent |
|---|---|
| `state.py` | Session context + todo state (implicit in Loki) |
| `graph.py` | `src/supervisor/mod.rs` (runtime orchestration) |
| `cli.py` | `src/main.rs` (CLI entry point) |
| `agents/supervisor.py` | `assets/agents/sisyphus/config.yaml` |
| `agents/explore.py` | `assets/agents/explore/config.yaml` + `tools.sh` |
| `agents/oracle.py` | `assets/agents/oracle/config.yaml` + `tools.sh` |
| `agents/coder.py` | `assets/agents/coder/config.yaml` + `tools.sh` |
| `tools/filesystem.py` | `assets/functions/tools/fs_*.sh` |
| `tools/project.py` | `assets/agents/.shared/utils.sh` + `sisyphus/tools.sh` |

## Further Reading

- [LangGraph Documentation](https://docs.langchain.com/langgraph/)
- [LangGraph Multi-Agent Tutorial](https://docs.langchain.com/langgraph/how-tos/multi-agent-systems)
- [Loki Agents Documentation](../../docs/AGENTS.md)
- [Loki Sisyphus README](../../assets/agents/sisyphus/README.md)
- [LangGraph Supervisor Library](https://github.com/langchain-ai/langgraph-supervisor-py)
@@ -0,0 +1,29 @@
[project]
name = "sisyphus-langchain"
version = "0.1.0"
description = "Loki's Sisyphus multi-agent orchestrator recreated in LangChain/LangGraph"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "langgraph>=0.3.0",
    "langchain>=0.3.0",
    "langchain-openai>=0.3.0",
    "langchain-anthropic>=0.3.0",
    "langchain-core>=0.3.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0",
    "ruff>=0.8.0",
]
server = [
    "langgraph-api>=0.1.0",
]

[project.scripts]
sisyphus = "sisyphus_langchain.cli:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

@@ -0,0 +1,5 @@
"""Sisyphus multi-agent orchestrator — a LangGraph recreation of Loki's Sisyphus agent."""

from sisyphus_langchain.graph import build_graph

__all__ = ["build_graph"]

@@ -0,0 +1 @@
"""Agent node definitions for the Sisyphus orchestrator."""
@@ -0,0 +1,145 @@
"""
Coder agent node — the implementation worker.

Loki equivalent: assets/agents/coder/config.yaml + tools.sh

In Loki, the coder is the ONLY agent that modifies files. It:
- Receives a structured prompt from sisyphus with code patterns to follow
- Writes files via the write_file tool (never pastes code in chat)
- Verifies builds after every change
- Signals CODER_COMPLETE or CODER_FAILED

In LangGraph, coder is a node with write-capable tools (read_file, write_file,
search_content, execute_command, verify_build). The supervisor formats a
structured delegation prompt (Goal / Reference Files / Code Patterns /
Conventions / Constraints) and routes to this node.

Key Loki→LangGraph mapping:
- Loki's "Coder Delegation Format" → the supervisor builds this as a
  HumanMessage before routing to the coder node.
- Loki's auto_continue (up to 15) → the supervisor can re-route to coder
  if verification fails, up to iteration_count limits.
- Loki's todo system for multi-file changes → the coder updates
  state["todos"] as it completes each file.
"""

from __future__ import annotations

from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

from sisyphus_langchain.state import SisyphusState
from sisyphus_langchain.tools.filesystem import (
    read_file,
    search_content,
    search_files,
    write_file,
)
from sisyphus_langchain.tools.project import (
    execute_command,
    run_tests,
    verify_build,
)

# ---------------------------------------------------------------------------
# System prompt — faithfully mirrors coder/config.yaml
# ---------------------------------------------------------------------------
CODER_SYSTEM_PROMPT = """\
You are a senior engineer. You write code that works on the first try.

## Your Mission

Given an implementation task:
1. Check for context provided in the conversation (patterns, conventions, reference files).
2. Fill gaps only — read files NOT already covered in context.
3. Write the code using the write_file tool (NEVER output code in chat).
4. Verify it compiles/builds using verify_build.
5. Provide a summary of what you implemented.

## Using Provided Context (IMPORTANT)

Your prompt often contains prior findings from the explore agent: file paths,
code patterns, and conventions.

**If context is provided:**
1. Use it as your primary reference. Don't re-read files already summarized.
2. Follow the code patterns shown — snippets in context ARE the style guide.
3. Read referenced files ONLY IF you need more detail (full signatures, imports).
4. If context includes a "Conventions" section, follow it exactly.

**If context is NOT provided or is too vague:**
Fall back to self-exploration: search for similar files, read 1-2 examples,
match their style.

## Writing Code

CRITICAL: Write code using the write_file tool. NEVER paste code in chat.

## Pattern Matching

Before writing ANY file:
1. Find a similar existing file.
2. Match its style: imports, naming, structure.
3. Follow the same patterns exactly.

## Verification

After writing files:
1. Run verify_build to check compilation.
2. If it fails, fix the error (minimal change).
3. Don't move on until build passes.

## Rules

1. Write code via tools — never output code to chat.
2. Follow patterns — read existing files first.
3. Verify builds — don't finish without checking.
4. Minimal fixes — if build fails, fix precisely.
5. No refactoring — only implement what's asked.
"""

# Full tool set — coder gets write access and command execution
CODER_TOOLS = [
    read_file,
    write_file,
    search_content,
    search_files,
    execute_command,
    verify_build,
    run_tests,
]


def create_coder_node(model_name: str = "gpt-4o", temperature: float = 0.1):
    """
    Factory that returns a coder node function.

    Coder needs a capable model because it writes production code. In Loki,
    coder uses the same model as the parent by default.

    Args:
        model_name: Model identifier.
        temperature: LLM temperature (Loki coder uses 0.1 for consistency).
    """
    llm = ChatOpenAI(model=model_name, temperature=temperature).bind_tools(CODER_TOOLS)

    def coder_node(state: SisyphusState) -> dict:
        """
        LangGraph node: run the coder agent.

        Reads conversation history (including the supervisor's structured
        delegation prompt), invokes the LLM with write-capable tools,
        and returns the result.
        """
        response = llm.invoke(
            [SystemMessage(content=CODER_SYSTEM_PROMPT)] + state["messages"]
        )
        return {
            "messages": [response],
            "agent_outputs": {
                **state.get("agent_outputs", {}),
                "coder": response.content,
            },
        }

    return coder_node
@@ -0,0 +1,110 @@
"""
Explore agent node — the read-only codebase researcher.

Loki equivalent: assets/agents/explore/config.yaml + tools.sh

In Loki, the explore agent is spawned via `agent__spawn --agent explore --prompt "..."`
and runs as an isolated subprocess with its own session. It ends with
"EXPLORE_COMPLETE" so the parent knows it's finished.

In LangGraph, the explore agent is a *node* in the graph. The supervisor routes
to it via `Command(goto="explore")`. It reads the latest message (the supervisor's
delegation prompt), calls the LLM with read-only tools, and writes its findings
back to the shared message list. The graph edge then returns control to the
supervisor.

Key differences from Loki:
- No isolated session — shares the graph's message list (but has its own
  system prompt and tool set, just like Loki's per-agent config).
- No "EXPLORE_COMPLETE" sentinel — the graph edge handles control flow.
- No output summarization — LangGraph's state handles context management.
"""

from __future__ import annotations

from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

from sisyphus_langchain.state import SisyphusState
from sisyphus_langchain.tools.filesystem import (
    list_directory,
    read_file,
    search_content,
    search_files,
)

# ---------------------------------------------------------------------------
# System prompt — faithfully mirrors explore/config.yaml
# ---------------------------------------------------------------------------
EXPLORE_SYSTEM_PROMPT = """\
You are a codebase explorer. Your job: search, find, report. Nothing else.

## Your Mission

Given a search task, you:
1. Search for relevant files and patterns
2. Read key files to understand structure
3. Report findings concisely

## Strategy

1. **Find first, read second** — Never read a file without knowing why.
2. **Use search_content to locate** — find exactly where things are defined.
3. **Use search_files to discover** — find files by name pattern.
4. **Read targeted sections** — use offset and limit to read only relevant lines.
5. **Never read entire large files** — if a file is 500+ lines, read the relevant section only.

## Output Format

Always end your response with a structured findings summary:

FINDINGS:
- [Key finding 1]
- [Key finding 2]
- Relevant files: [list of paths]

## Rules

1. Be fast — don't read every file, read representative ones.
2. Be focused — answer the specific question asked.
3. Be concise — report findings, not your process.
4. Never modify files — you are read-only.
5. Limit reads — max 5 file reads per exploration.
"""

# Read-only tools — mirrors explore's tool set (no write_file, no execute_command)
EXPLORE_TOOLS = [read_file, search_content, search_files, list_directory]


def create_explore_node(model_name: str = "gpt-4o-mini", temperature: float = 0.1):
    """
    Factory that returns an explore node function bound to a specific model.

    In Loki, the model is set per-agent in config.yaml. Here we parameterize it
    so you can use a cheap model for exploration (cost optimization).

    Args:
        model_name: OpenAI model identifier.
        temperature: LLM temperature (Loki explore uses 0.1).
    """
    llm = ChatOpenAI(model=model_name, temperature=temperature).bind_tools(EXPLORE_TOOLS)

    def explore_node(state: SisyphusState) -> dict:
        """
        LangGraph node: run the explore agent.

        Reads the conversation history, applies the explore system prompt,
        invokes the LLM with read-only tools, and returns the response.
        """
        response = llm.invoke(
            [SystemMessage(content=EXPLORE_SYSTEM_PROMPT)] + state["messages"]
        )
        return {
            "messages": [response],
            "agent_outputs": {
                **state.get("agent_outputs", {}),
                "explore": response.content,
            },
        }

    return explore_node
@@ -0,0 +1,124 @@
|
||||
"""
|
||||
Oracle agent node — the high-IQ architecture and debugging advisor.
|
||||
|
||||
Loki equivalent: assets/agents/oracle/config.yaml + tools.sh
|
||||
|
||||
In Loki, the oracle is a READ-ONLY advisor spawned for:
|
||||
- Architecture decisions and multi-system tradeoffs
|
||||
- Complex debugging (after 2+ failed fix attempts)
|
||||
- Code/design review
|
||||
- Risk assessment
|
||||
|
||||
It uses temperature 0.2 (slightly higher than explore/coder for more creative
|
||||
reasoning) and ends with "ORACLE_COMPLETE".
|
||||
|
||||
In LangGraph, oracle is a node that receives the full message history, reasons
|
||||
about the problem, and writes structured advice back. It has read-only tools
|
||||
only — it never modifies files.
|
||||
|
||||
Key Loki→LangGraph mapping:
|
||||
- Loki oracle triggers (the "MUST spawn oracle when..." rules in sisyphus)
|
||||
become routing conditions in the supervisor node.
|
||||
- Oracle's structured output format (Analysis/Recommendation/Reasoning/Risks)
|
||||
is enforced via the system prompt, same as in Loki.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from langchain_core.messages import SystemMessage
|
||||
from langchain_openai import ChatOpenAI
|
||||
|
||||
from sisyphus_langchain.state import SisyphusState
from sisyphus_langchain.tools.filesystem import (
    list_directory,
    read_file,
    search_content,
    search_files,
)

# ---------------------------------------------------------------------------
# System prompt — faithfully mirrors oracle/config.yaml
# ---------------------------------------------------------------------------
ORACLE_SYSTEM_PROMPT = """\
You are Oracle — a senior architect and debugger consulted for complex decisions.

## Your Role

You are READ-ONLY. You analyze, advise, and recommend. You do NOT implement.

## When You're Consulted

1. **Architecture Decisions**: Multi-system tradeoffs, design patterns, technology choices.
2. **Complex Debugging**: After 2+ failed fix attempts, when deep analysis is needed.
3. **Code Review**: Evaluating proposed designs or implementations.
4. **Risk Assessment**: Security, performance, or reliability concerns.

## Your Process

1. **Understand**: Read the relevant code; understand the full context.
2. **Analyze**: Consider multiple angles and tradeoffs.
3. **Recommend**: Provide clear, actionable advice.
4. **Justify**: Explain your reasoning.

## Output Format

Structure your response as:

## Analysis
[Your understanding of the situation]

## Recommendation
[Clear, specific advice]

## Reasoning
[Why this is the right approach]

## Risks/Considerations
[What to watch out for]

## Rules

1. Never modify files — you advise, others implement.
2. Be thorough — read all relevant context before advising.
3. Be specific — general advice isn't helpful.
4. Consider tradeoffs — there are rarely perfect solutions.
5. Stay focused — answer the specific question asked.
"""

# Read-only tools — same set as explore (oracle never writes)
ORACLE_TOOLS = [read_file, search_content, search_files, list_directory]


def create_oracle_node(model_name: str = "gpt-4o", temperature: float = 0.2):
    """
    Factory that returns an oracle node function.

    Oracle uses a more expensive model than explore because it needs deeper
    reasoning. In Loki, the model is inherited from the global config unless
    overridden in oracle/config.yaml.

    Args:
        model_name: Model identifier (use a strong reasoning model).
        temperature: LLM temperature (Loki's oracle uses 0.2).
    """
    llm = ChatOpenAI(model=model_name, temperature=temperature).bind_tools(ORACLE_TOOLS)

    def oracle_node(state: SisyphusState) -> dict:
        """
        LangGraph node: run the oracle agent.

        Reads the conversation history, applies the oracle system prompt,
        invokes the LLM, and returns structured advice.
        """
        response = llm.invoke(
            [SystemMessage(content=ORACLE_SYSTEM_PROMPT)] + state["messages"]
        )
        return {
            "messages": [response],
            "agent_outputs": {
                **state.get("agent_outputs", {}),
                "oracle": response.content,
            },
        }

    return oracle_node
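Because the Output Format headings are fixed, a caller can split oracle's reply into named sections with a few lines of stdlib code. This is a hypothetical helper (not part of this project) sketching that idea:

```python
def split_sections(text: str) -> dict[str, str]:
    """Split a '## Heading' structured markdown reply into {heading: body}."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {k: v.strip() for k, v in sections.items()}


advice = "## Analysis\nLooks fine.\n\n## Recommendation\nShip it."
parsed = split_sections(advice)
print(parsed["Recommendation"])  # → Ship it.
```

A supervisor could use this to surface only the Recommendation section to the user while logging the full analysis.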
@@ -0,0 +1,227 @@
"""
Sisyphus supervisor node — the orchestrator that classifies intent and routes.

Loki equivalent: assets/agents/sisyphus/config.yaml

This is the brain of the system. In Loki, Sisyphus is the top-level agent that:
1. Classifies every incoming request (trivial / exploration / implementation /
   architecture / ambiguous)
2. Routes to the appropriate sub-agent (explore, coder, oracle)
3. Manages the todo list for multi-step tasks
4. Verifies results and decides when the task is complete

In LangGraph, the supervisor is a node that returns `Command(goto="agent_name")`
to route control. This replaces Loki's `agent__spawn` + `agent__collect` pattern
with a declarative graph edge.

Key Loki→LangGraph mapping:
- agent__spawn --agent explore  → Command(goto="explore")
- agent__spawn --agent coder    → Command(goto="coder")
- agent__spawn --agent oracle   → Command(goto="oracle")
- agent__check / agent__collect → (implicit: graph edges return to supervisor)
- todo__init / todo__add        → state["todos"] updates
- user__ask / user__confirm     → interrupt() for human-in-the-loop

Parallel execution note:
    Loki can spawn multiple explore agents in parallel. In LangGraph, you'd use
    the Send() API for dynamic fan-out. For simplicity, this implementation uses
    sequential routing. See the README for how to add parallel fan-out.
"""

from __future__ import annotations

from typing import Literal

from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.types import Command
from pydantic import BaseModel, Field

from sisyphus_langchain.state import SisyphusState

# ---------------------------------------------------------------------------
# Maximum iterations before forcing completion (safety valve).
# Mirrors Loki's max_auto_continues (25 there; a tighter 15 here).
# ---------------------------------------------------------------------------
MAX_ITERATIONS = 15

# ---------------------------------------------------------------------------
# Structured output schema for the supervisor's routing decision.
#
# In Loki, the supervisor is an LLM that produces free text and calls tools
# like agent__spawn. In LangGraph, we use structured output to force the
# LLM into a typed routing decision — more reliable than parsing free text.
# ---------------------------------------------------------------------------
class RoutingDecision(BaseModel):
    """The supervisor's decision about what to do next."""

    intent: Literal["trivial", "exploration", "implementation", "architecture", "ambiguous"] = Field(
        description="Classified intent of the user's request."
    )
    next_agent: Literal["explore", "oracle", "coder", "FINISH"] = Field(
        description=(
            "Which agent to route to. 'explore' for research/discovery, "
            "'oracle' for architecture/design/debugging advice, "
            "'coder' for implementation, 'FINISH' if the task is complete."
        )
    )
    delegation_notes: str = Field(
        description=(
            "Brief instructions for the target agent: what to look for (explore), "
            "what to analyze (oracle), or what to implement (coder). "
            "For FINISH, summarize what was accomplished."
        )
    )


# ---------------------------------------------------------------------------
# Supervisor system prompt — faithfully mirrors sisyphus/config.yaml
# ---------------------------------------------------------------------------
SUPERVISOR_SYSTEM_PROMPT = """\
You are Sisyphus — an orchestrator that drives coding tasks to completion.

Your job: Classify → Delegate → Verify → Complete.

## Intent Classification (BEFORE every action)

| Type           | Signal                                  | Action                                     |
|----------------|-----------------------------------------|--------------------------------------------|
| trivial        | Single file, known location, typo fix   | Route to FINISH                            |
| exploration    | "Find X", "Where is Y", "List all Z"    | Route to explore                           |
| implementation | "Add feature", "Fix bug", "Write code"  | Route to coder                             |
| architecture   | See oracle triggers below               | Route to oracle                            |
| ambiguous      | Unclear scope, multiple interpretations | Route to FINISH with a clarifying question |

## Oracle Triggers (MUST route to oracle when you see these)

Route to oracle ANY time the user asks about:
- "How should I..." / "What's the best way to..." — design/approach questions
- "Why does X keep..." / "What's wrong with..." — complex debugging
- "Should I use X or Y?" — technology or pattern choices
- "How should this be structured?" — architecture
- "Review this" / "What do you think of..." — code/design review
- Tradeoff questions, multi-component questions, vague/open-ended questions

## Agent Specializations

| Agent   | Use For                                   |
|---------|-------------------------------------------|
| explore | Find patterns, understand code, search    |
| coder   | Write/edit files, implement features      |
| oracle  | Architecture decisions, complex debugging |

## Workflow Patterns

### Implementation task: explore → coder
1. Route to explore to find existing patterns and conventions.
2. Review the explore findings.
3. Route to coder with a structured prompt including the explore findings.
4. Verify the coder's output (check for CODER_COMPLETE or CODER_FAILED).

### Architecture question: explore + oracle
1. Route to explore to find the relevant code.
2. Route to oracle with the explore findings for analysis.

### Simple question: oracle directly
For pure design/architecture questions, route to oracle directly.

## Rules

1. Always classify before acting.
2. You are a coordinator, not an implementer.
3. Route to oracle for ANY design/architecture question.
4. When routing to coder, include code patterns from the explore findings.
5. Route to FINISH when the task is fully addressed.

## Current State

Iteration: {iteration_count}/{max_iterations}
Previous agent outputs: {agent_outputs}
"""


def create_supervisor_node(model_name: str = "gpt-4o", temperature: float = 0.1):
    """
    Factory that returns a supervisor node function.

    The supervisor uses a capable model for accurate routing.

    Args:
        model_name: Model identifier.
        temperature: LLM temperature (low, for consistent routing).
    """
    llm = ChatOpenAI(model=model_name, temperature=temperature).with_structured_output(
        RoutingDecision
    )

    def supervisor_node(
        state: SisyphusState,
    ) -> Command[Literal["explore", "oracle", "coder", "__end__"]]:
        """
        LangGraph node: the Sisyphus supervisor.

        Classifies the user's intent, decides which agent to route to,
        and returns a Command that directs graph execution.
        """
        iteration = state.get("iteration_count", 0)

        # Safety valve — prevent infinite loops
        if iteration >= MAX_ITERATIONS:
            return Command(
                goto="__end__",
                update={
                    "final_output": "Reached maximum iterations. Here's what was accomplished:\n"
                    + "\n".join(
                        f"- {k}: {v[:200]}" for k, v in state.get("agent_outputs", {}).items()
                    ),
                },
            )

        # Format the system prompt with the current state
        prompt = SUPERVISOR_SYSTEM_PROMPT.format(
            iteration_count=iteration,
            max_iterations=MAX_ITERATIONS,
            agent_outputs=_summarize_outputs(state.get("agent_outputs", {})),
        )

        # Invoke the LLM to get a structured routing decision
        decision: RoutingDecision = llm.invoke(
            [SystemMessage(content=prompt)] + state["messages"]
        )

        # Route to FINISH
        if decision.next_agent == "FINISH":
            return Command(
                goto="__end__",
                update={
                    "intent": decision.intent,
                    "next_agent": "FINISH",
                    "final_output": decision.delegation_notes,
                },
            )

        # Route to a worker agent
        return Command(
            goto=decision.next_agent,
            update={
                "intent": decision.intent,
                "next_agent": decision.next_agent,
                "iteration_count": iteration + 1,
            },
        )

    return supervisor_node


def _summarize_outputs(outputs: dict[str, str]) -> str:
    """Summarize agent outputs for the supervisor's context window."""
    if not outputs:
        return "(none yet)"
    parts = []
    for agent, output in outputs.items():
        # Truncate long outputs to keep the supervisor context manageable.
        # This mirrors Loki's summarization_threshold behavior.
        if len(output) > 2000:
            output = output[:2000] + "... (truncated)"
        parts.append(f"[{agent}]: {output}")
    return "\n\n".join(parts)
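The truncation behavior of `_summarize_outputs` can be checked in isolation. A self-contained copy of the same logic (with the 2000-character threshold made a parameter so the example stays short):

```python
def summarize_outputs(outputs: dict[str, str], limit: int = 2000) -> str:
    """Mirror of _summarize_outputs: truncate each agent's output to `limit` chars."""
    if not outputs:
        return "(none yet)"
    parts = []
    for agent, output in outputs.items():
        if len(output) > limit:
            output = output[:limit] + "... (truncated)"
        parts.append(f"[{agent}]: {output}")
    return "\n\n".join(parts)


print(summarize_outputs({}))  # → (none yet)
print(summarize_outputs({"explore": "x" * 30}, limit=10))
# → [explore]: xxxxxxxxxx... (truncated)
```

Short outputs pass through untouched; only outputs past the threshold get the `... (truncated)` suffix.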
@@ -0,0 +1,155 @@
"""
CLI entry point for the Sisyphus LangChain agent.

This mirrors Loki's `loki --agent sisyphus` entry point.

In Loki:
    loki --agent sisyphus
    # Starts a REPL with the sisyphus agent loaded

In this LangChain version:
    python -m sisyphus_langchain.cli
    # or: sisyphus (if installed via pip)

Usage:
    # Interactive REPL mode
    sisyphus

    # One-shot query
    sisyphus "Add a health check endpoint to the API"

    # With custom models
    sisyphus --supervisor-model gpt-4o --explore-model gpt-4o-mini "Find auth patterns"

Environment variables:
    OPENAI_API_KEY    — Required for OpenAI models
    ANTHROPIC_API_KEY — Required if using Anthropic models
"""

from __future__ import annotations

import argparse
import uuid

from langchain_core.messages import HumanMessage

from sisyphus_langchain.graph import build_graph


def run_query(graph, query: str, thread_id: str) -> str:
    """
    Run a single query through the Sisyphus graph.

    Args:
        graph: Compiled LangGraph.
        query: User's natural-language request.
        thread_id: Session identifier for checkpointing.

    Returns:
        The final output string.
    """
    result = graph.invoke(
        {
            "messages": [HumanMessage(content=query)],
            "intent": "ambiguous",
            "next_agent": "",
            "iteration_count": 0,
            "todos": [],
            "agent_outputs": {},
            "final_output": "",
            "project_dir": ".",
        },
        config={
            "configurable": {"thread_id": thread_id},
            "recursion_limit": 50,
        },
    )
    return result.get("final_output", "(no output)")


def repl(graph, thread_id: str) -> None:
    """
    Interactive REPL loop — mirrors Loki's REPL mode.

    Maintains conversation across turns via the thread_id (checkpointer).
    """
    print("Sisyphus (LangChain) — type 'quit' to exit")
    print("=" * 50)

    while True:
        try:
            query = input("\n> ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nBye.")
            break

        if not query:
            continue
        if query.lower() in ("quit", "exit", "q"):
            print("Bye.")
            break

        try:
            output = run_query(graph, query, thread_id)
            print(f"\n{output}")
        except Exception as e:
            print(f"\nError: {e}")


def main() -> None:
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Sisyphus — multi-agent coding orchestrator (LangChain edition)"
    )
    parser.add_argument(
        "query",
        nargs="?",
        help="One-shot query (omit for REPL mode)",
    )
    parser.add_argument(
        "--supervisor-model",
        default="gpt-4o",
        help="Model for the supervisor (default: gpt-4o)",
    )
    parser.add_argument(
        "--explore-model",
        default="gpt-4o-mini",
        help="Model for the explore agent (default: gpt-4o-mini)",
    )
    parser.add_argument(
        "--oracle-model",
        default="gpt-4o",
        help="Model for the oracle agent (default: gpt-4o)",
    )
    parser.add_argument(
        "--coder-model",
        default="gpt-4o",
        help="Model for the coder agent (default: gpt-4o)",
    )
    parser.add_argument(
        "--thread-id",
        default=None,
        help="Session thread ID for persistence (auto-generated if omitted)",
    )

    args = parser.parse_args()

    graph = build_graph(
        supervisor_model=args.supervisor_model,
        explore_model=args.explore_model,
        oracle_model=args.oracle_model,
        coder_model=args.coder_model,
    )

    thread_id = args.thread_id or f"sisyphus-{uuid.uuid4().hex[:8]}"

    if args.query:
        output = run_query(graph, args.query, thread_id)
        print(output)
    else:
        repl(graph, thread_id)


if __name__ == "__main__":
    main()
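The flag layout above can be exercised without any LangChain imports. A stdlib-only sketch of the same argparse setup, checking defaults and the optional positional query:

```python
import argparse

# Same shape as main()'s parser, trimmed to three flags for illustration
parser = argparse.ArgumentParser(description="Sisyphus CLI (sketch)")
parser.add_argument("query", nargs="?", help="One-shot query (omit for REPL mode)")
parser.add_argument("--supervisor-model", default="gpt-4o")
parser.add_argument("--explore-model", default="gpt-4o-mini")
parser.add_argument("--thread-id", default=None)

# One-shot invocation with an overridden explore model
args = parser.parse_args(["--explore-model", "gpt-4o-mini", "Find auth patterns"])
print(args.query)          # → Find auth patterns
print(args.explore_model)  # → gpt-4o-mini
print(args.thread_id)      # → None
```

With no positional argument, `args.query` is `None`, which is what sends `main()` into REPL mode.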
@@ -0,0 +1,115 @@
"""
Graph assembly — wires together the supervisor and worker nodes.

This is the LangGraph equivalent of Loki's runtime agent execution engine
(src/supervisor/mod.rs + src/config/request_context.rs).

In Loki, the runtime:
1. Loads the agent config (config.yaml)
2. Compiles tools (tools.sh → binary)
3. Starts a chat loop: user → LLM → tool calls → LLM → ...
4. For orchestrators with can_spawn_agents: true, the supervisor module
   manages the child agent lifecycle (spawn, check, collect, cancel).

In LangGraph, all of this is declarative:
1. Define nodes (supervisor, explore, oracle, coder)
2. Define edges (workers always return to the supervisor)
3. Compile the graph (with an optional checkpointer for persistence)
4. Invoke with an initial state

The graph topology:

    ┌─────────────────────────────────────────────┐
    │                 SUPERVISOR                  │
    │    (classifies intent, routes to workers)   │
    └─────┬──────────┬──────────┬─────────────────┘
          │          │          │
          ▼          ▼          ▼
     ┌────────┐ ┌────────┐ ┌────────┐
     │EXPLORE │ │ ORACLE │ │ CODER  │
     │(search)│ │(advise)│ │(build) │
     └───┬────┘ └───┬────┘ └───┬────┘
         │          │          │
         └──────────┼──────────┘
                    │
          (back to supervisor)

Every worker returns to the supervisor. The supervisor decides what to do next:
route to another worker, or end the graph.
"""

from __future__ import annotations

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph

from sisyphus_langchain.agents.coder import create_coder_node
from sisyphus_langchain.agents.explore import create_explore_node
from sisyphus_langchain.agents.oracle import create_oracle_node
from sisyphus_langchain.agents.supervisor import create_supervisor_node
from sisyphus_langchain.state import SisyphusState


def build_graph(
    *,
    supervisor_model: str = "gpt-4o",
    explore_model: str = "gpt-4o-mini",
    oracle_model: str = "gpt-4o",
    coder_model: str = "gpt-4o",
    use_checkpointer: bool = True,
):
    """
    Build and compile the Sisyphus LangGraph.

    This is the main entry point for creating the agent system. It wires
    together all nodes and edges, optionally adds a checkpointer for
    persistence, and returns a compiled graph ready to invoke.

    Args:
        supervisor_model: Model for the routing supervisor.
        explore_model: Model for the explore agent (can be cheaper).
        oracle_model: Model for the oracle agent (should be strong).
        coder_model: Model for the coder agent.
        use_checkpointer: Whether to add a MemorySaver for session persistence.

    Returns:
        A compiled LangGraph ready to .invoke() or .stream().

    Model cost optimization (mirrors Loki's per-agent model config):
    - supervisor: expensive (accurate routing is critical)
    - explore: cheap (just searching, not reasoning deeply)
    - oracle: expensive (deep reasoning, architecture advice)
    - coder: expensive (writing correct code matters)
    """
    # Create the graph builder with our typed state
    builder = StateGraph(SisyphusState)

    # ── Register nodes ─────────────────────────────────────────────────
    # Each node is a function that takes state and returns state updates.
    # This mirrors Loki's agent registration (agents are discovered by
    # their config.yaml in the agents/ directory).
    builder.add_node("supervisor", create_supervisor_node(supervisor_model))
    builder.add_node("explore", create_explore_node(explore_model))
    builder.add_node("oracle", create_oracle_node(oracle_model))
    builder.add_node("coder", create_coder_node(coder_model))

    # ── Define edges ───────────────────────────────────────────────────
    # Entry point: every invocation starts at the supervisor
    builder.add_edge(START, "supervisor")

    # Workers always return to the supervisor (the hub-and-spoke pattern).
    # In Loki, this is implicit: agent__collect returns output to the parent,
    # and the parent (sisyphus) decides what to do next.
    builder.add_edge("explore", "supervisor")
    builder.add_edge("oracle", "supervisor")
    builder.add_edge("coder", "supervisor")

    # The supervisor node itself uses Command(goto=...) to route, so we
    # don't need add_conditional_edges — the Command API handles dynamic
    # routing internally.

    # ── Compile ────────────────────────────────────────────────────────
    checkpointer = MemorySaver() if use_checkpointer else None
    graph = builder.compile(checkpointer=checkpointer)

    return graph
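The hub-and-spoke topology can be illustrated without LangGraph at all. A toy router (hypothetical, stdlib-only) where every worker edge leads back to the supervisor, and the supervisor consumes routing decisions until FINISH:

```python
# Toy hub-and-spoke routing: every worker edge leads back to "supervisor".
edges = {
    "explore": "supervisor",
    "oracle": "supervisor",
    "coder": "supervisor",
}


def run(decisions):
    """Walk the topology, recording each node visited."""
    trace = ["supervisor"]
    for goto in decisions:  # what each Command(goto=...) would return
        if goto == "FINISH":
            trace.append("__end__")
            break
        trace.append(goto)          # supervisor → worker
        trace.append(edges[goto])   # worker → back to supervisor
    return trace


print(run(["explore", "coder", "FINISH"]))
# → ['supervisor', 'explore', 'supervisor', 'coder', 'supervisor', '__end__']
```

The fixed worker→supervisor edges are the `add_edge` calls above; the variable supervisor→worker hops are the `Command(goto=...)` returns.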
@@ -0,0 +1,100 @@
"""
Shared state schema for the Sisyphus orchestrator graph.

In LangGraph, state is the single source of truth that flows through every node.
This is analogous to Loki's per-agent session context, but unified into one typed
dictionary that the entire graph shares.

Loki Concept Mapping:
- Loki session context        → SisyphusState (TypedDict)
- Loki todo__init / todo__add → SisyphusState.todos list
- Loki agent__spawn outputs   → SisyphusState.agent_outputs dict
- Loki intent classification  → SisyphusState.intent field
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import Annotated, Literal

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict

# ---------------------------------------------------------------------------
# Intent types — mirrors Loki's Sisyphus classification table
# ---------------------------------------------------------------------------
IntentType = Literal[
    "trivial",         # Single file, known location, typo fix → handle yourself
    "exploration",     # "Find X", "Where is Y" → spawn explore
    "implementation",  # "Add feature", "Fix bug" → spawn coder
    "architecture",    # Design questions, oracle triggers → spawn oracle
    "ambiguous",       # Unclear scope → ask the user
]

# ---------------------------------------------------------------------------
# Todo item — mirrors Loki's built-in todo system
# ---------------------------------------------------------------------------
@dataclass
class TodoItem:
    """A single task in the orchestrator's todo list."""

    id: int
    task: str
    done: bool = False


def _merge_todos(existing: list[TodoItem], new: list[TodoItem]) -> list[TodoItem]:
    """
    Reducer for the todos field.

    LangGraph requires a reducer for any state field that can be written by
    multiple nodes. This merges by id: if a todo with the same id already
    exists, the incoming version wins (which allows marking items done).
    """
    by_id = {t.id: t for t in existing}
    for t in new:
        by_id[t.id] = t
    return list(by_id.values())


# ---------------------------------------------------------------------------
# Core graph state
# ---------------------------------------------------------------------------
class SisyphusState(TypedDict):
    """
    The shared state that flows through every node in the Sisyphus graph.

    Annotated fields use *reducers* — functions that merge concurrent writes.
    Without reducers, parallel node outputs would overwrite each other.
    """

    # Conversation history — the `add_messages` reducer appends new messages
    # instead of replacing the list. This is critical: every node adds its
    # response here, and downstream nodes see the full history.
    #
    # Loki equivalent: each agent's chat session accumulates messages the same
    # way, but messages are scoped per agent. In LangGraph the shared message
    # list IS the inter-agent communication channel.
    messages: Annotated[list[BaseMessage], add_messages]

    # Classified intent for the current request
    intent: IntentType

    # Which agent the supervisor routed to last
    next_agent: str

    # Iteration counter — safety valve analogous to Loki's max_auto_continues
    iteration_count: int

    # Todo list for multi-step tracking (mirrors Loki's todo__* tools)
    todos: Annotated[list[TodoItem], _merge_todos]

    # Accumulated outputs from sub-agent nodes, keyed by agent name.
    # The supervisor reads these to decide what to do next.
    agent_outputs: dict[str, str]

    # Final synthesized answer to return to the user
    final_output: str

    # The working directory / project path (mirrors Loki's project_dir variable)
    project_dir: str
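The merge-by-id semantics of the todos reducer are worth seeing concretely. A self-contained copy of the dataclass and reducer, showing an incoming write marking one item done and adding another:

```python
from dataclasses import dataclass


@dataclass
class TodoItem:
    id: int
    task: str
    done: bool = False


def merge_todos(existing, new):
    """Merge by id; the incoming version wins, so nodes can mark items done."""
    by_id = {t.id: t for t in existing}
    for t in new:
        by_id[t.id] = t
    return list(by_id.values())


state = [TodoItem(1, "find patterns"), TodoItem(2, "write code")]
update = [TodoItem(1, "find patterns", done=True), TodoItem(3, "run tests")]
merged = merge_todos(state, update)
print(sorted((t.id, t.done) for t in merged))
# → [(1, True), (2, False), (3, False)]
```

Item 1 is replaced by its done version, item 2 survives untouched, and item 3 is appended — exactly what a graph node needs when it reports progress.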
@@ -0,0 +1 @@
"""Tool definitions for Sisyphus agents."""
@@ -0,0 +1,175 @@
"""
Filesystem tools for Sisyphus agents.

These are the LangChain equivalents of Loki's global tools:
- fs_read.sh  → read_file
- fs_grep.sh  → search_content
- fs_glob.sh  → search_files
- fs_ls.sh    → list_directory
- fs_write.sh → write_file
- fs_patch.sh → (omitted — write_file covers full rewrites)

Loki Concept Mapping:
    Loki tools are bash scripts with @cmd annotations that Loki's compiler
    turns into function-calling declarations. In LangChain, the @tool
    decorator serves the same purpose: it generates the JSON schema that
    the LLM sees, and wraps the Python function for execution.
"""

from __future__ import annotations

import fnmatch
import os
import subprocess

from langchain_core.tools import tool


@tool
def read_file(path: str, offset: int = 1, limit: int = 200) -> str:
    """Read a file's contents with an optional line range.

    Args:
        path: Path to the file (absolute or relative to cwd).
        offset: 1-based line number to start from.
        limit: Maximum number of lines to return.
    """
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return f"Error: file not found: {path}"

    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            lines = f.readlines()
    except Exception as e:
        return f"Error reading {path}: {e}"

    total = len(lines)
    start = max(0, offset - 1)
    end = min(total, start + limit)
    selected = lines[start:end]

    result = f"File: {path} (lines {start + 1}-{end} of {total})\n\n"
    for i, line in enumerate(selected, start=start + 1):
        result += f"{i}: {line}"

    if end < total:
        result += f"\n... truncated ({total} total lines)"

    return result
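The offset/limit window in `read_file` is easy to get off by one, so here is the same slice arithmetic extracted into a self-contained check:

```python
def window(total: int, offset: int, limit: int) -> tuple[int, int]:
    """Same slice arithmetic as read_file: 1-based offset, clamped to the file."""
    start = max(0, offset - 1)
    end = min(total, start + limit)
    return start, end


print(window(total=100, offset=1, limit=200))  # → (0, 100)  whole file
print(window(total=100, offset=50, limit=10))  # → (49, 59)  lines 50–59
print(window(total=100, offset=95, limit=10))  # → (94, 100) clamped at EOF
```

`lines[start:end]` with these values yields exactly `end - start` lines, and the header's `lines {start + 1}-{end}` reports them 1-based.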


@tool
def write_file(path: str, content: str) -> str:
    """Write complete contents to a file, creating parent directories as needed.

    Args:
        path: Path for the file.
        content: Complete file contents to write.
    """
    path = os.path.expanduser(path)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return f"Wrote: {path}"
    except Exception as e:
        return f"Error writing {path}: {e}"


@tool
def search_content(pattern: str, directory: str = ".", file_type: str = "") -> str:
    """Search for a text/regex pattern in files under a directory.

    Args:
        pattern: Text or regex pattern to search for.
        directory: Root directory to search in.
        file_type: Optional file extension filter (e.g. "py", "rs").
    """
    directory = os.path.expanduser(directory)
    cmd = ["grep", "-rn"]
    if file_type:
        cmd += [f"--include=*.{file_type}"]
    # -e keeps patterns that start with a dash from being read as grep flags
    cmd += ["-e", pattern, directory]

    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        lines = result.stdout.strip().splitlines()
    except Exception as e:
        return f"Error: {e}"

    # Filter noise
    noise = {"/.git/", "/node_modules/", "/target/", "/dist/", "/__pycache__/"}
    filtered = [line for line in lines if not any(n in line for n in noise)][:30]

    if not filtered:
        return "No matches found."
    return "\n".join(filtered)


@tool
def search_files(pattern: str, directory: str = ".") -> str:
    """Find files matching a glob pattern.

    Args:
        pattern: Glob pattern (e.g. '*.py', 'config*', '*test*').
        directory: Directory to search in.
    """
    directory = os.path.expanduser(directory)
    noise = {".git", "node_modules", "target", "dist", "__pycache__"}
    matches: list[str] = []

    for root, dirs, files in os.walk(directory):
        # Prune noise directories in place so os.walk never descends into them
        dirs[:] = [d for d in dirs if d not in noise]
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(root, name))
                if len(matches) >= 25:
                    break
        if len(matches) >= 25:
            break

    if not matches:
        return "No files found."
    return "\n".join(matches)
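A quick self-contained check of the glob matching used above, run against a throwaway temporary directory:

```python
import fnmatch
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a tiny fake project
    for name in ("main.py", "test_api.py", "README.md"):
        open(os.path.join(root, name), "w").close()

    matches = sorted(
        name for name in os.listdir(root) if fnmatch.fnmatch(name, "*.py")
    )

print(matches)  # → ['main.py', 'test_api.py']
```

`fnmatch` matches on the base name only, which is why `search_files` walks the tree itself instead of globbing full paths.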


@tool
def list_directory(path: str = ".", max_depth: int = 3) -> str:
    """List a directory tree structure.

    Args:
        path: Directory to list.
        max_depth: Maximum depth to recurse.
    """
    path = os.path.expanduser(path)
    if not os.path.isdir(path):
        return f"Error: not a directory: {path}"

    noise = {".git", "node_modules", "target", "dist", "__pycache__", ".venv", "venv"}
    lines: list[str] = []

    def _walk(dir_path: str, prefix: str, depth: int) -> None:
        if depth > max_depth:
            return
        try:
            entries = sorted(os.listdir(dir_path))
        except PermissionError:
            return

        dirs = [e for e in entries if os.path.isdir(os.path.join(dir_path, e)) and e not in noise]
        files = [e for e in entries if os.path.isfile(os.path.join(dir_path, e))]

        for f in files[:20]:
            lines.append(f"{prefix}{f}")
        if len(files) > 20:
            lines.append(f"{prefix}... ({len(files) - 20} more files)")

        for d in dirs:
            lines.append(f"{prefix}{d}/")
            _walk(os.path.join(dir_path, d), prefix + "  ", depth + 1)

    lines.append(f"{os.path.basename(path) or path}/")
    _walk(path, "  ", 1)
    return "\n".join(lines[:200])
@@ -0,0 +1,142 @@
"""
Project detection and build/test tools.

These mirror Loki's .shared/utils.sh detect_project() heuristic and the
sisyphus/coder tools.sh run_build / run_tests / verify_build commands.

Loki Concept Mapping:
    Loki uses a heuristic cascade: check for Cargo.toml → go.mod → package.json
    etc., then fall back to an LLM call for unknown projects. We replicate the
    heuristic portion here. The LLM fallback is omitted, since the agents
    themselves can reason about unknown project types.
"""

from __future__ import annotations

import os
import subprocess

from langchain_core.tools import tool

# ---------------------------------------------------------------------------
# Project detection (mirrors _detect_heuristic in utils.sh)
# ---------------------------------------------------------------------------
_HEURISTICS: list[tuple[str, dict[str, str]]] = [
    ("Cargo.toml", {"type": "rust", "build": "cargo build", "test": "cargo test", "check": "cargo check"}),
    ("go.mod", {"type": "go", "build": "go build ./...", "test": "go test ./...", "check": "go vet ./..."}),
    ("package.json", {"type": "nodejs", "build": "npm run build", "test": "npm test", "check": "npm run lint"}),
    ("pyproject.toml", {"type": "python", "build": "", "test": "pytest", "check": "ruff check ."}),
    ("pom.xml", {"type": "java", "build": "mvn compile", "test": "mvn test", "check": "mvn verify"}),
    ("Makefile", {"type": "make", "build": "make build", "test": "make test", "check": "make lint"}),
]


def detect_project(directory: str) -> dict[str, str]:
    """Detect the project type and return its build/test commands."""
    for marker, info in _HEURISTICS:
        if os.path.exists(os.path.join(directory, marker)):
            return info
    return {"type": "unknown", "build": "", "test": "", "check": ""}
|
||||
|
||||
@tool
|
||||
def get_project_info(directory: str = ".") -> str:
|
||||
"""Detect the project type and show structure overview.
|
||||
|
||||
Args:
|
||||
directory: Project root directory.
|
||||
"""
|
||||
directory = os.path.expanduser(directory)
|
||||
info = detect_project(directory)
|
||||
result = f"Project: {os.path.abspath(directory)}\n"
|
||||
result += f"Type: {info['type']}\n"
|
||||
result += f"Build: {info['build'] or '(none)'}\n"
|
||||
result += f"Test: {info['test'] or '(none)'}\n"
|
||||
result += f"Check: {info['check'] or '(none)'}\n"
|
||||
return result
|
||||
|
||||
|
||||
def _run_project_command(directory: str, command_key: str) -> str:
|
||||
"""Run a detected project command (build/test/check)."""
|
||||
directory = os.path.expanduser(directory)
|
||||
info = detect_project(directory)
|
||||
cmd = info.get(command_key, "")
|
||||
|
||||
if not cmd:
|
||||
return f"No {command_key} command detected for this project."
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
cwd=directory,
|
||||
timeout=300,
|
||||
)
|
||||
output = result.stdout + result.stderr
|
||||
status = "SUCCESS" if result.returncode == 0 else f"FAILED (exit {result.returncode})"
|
||||
return f"Running: {cmd}\n\n{output}\n\n{command_key.upper()}: {status}"
|
||||
except subprocess.TimeoutExpired:
|
||||
return f"{command_key.upper()}: TIMEOUT after 300s"
|
||||
except Exception as e:
|
||||
return f"{command_key.upper()}: ERROR — {e}"
|
||||
|
||||
|
||||
@tool
|
||||
def run_build(directory: str = ".") -> str:
|
||||
"""Run the project's build command.
|
||||
|
||||
Args:
|
||||
directory: Project root directory.
|
||||
"""
|
||||
return _run_project_command(directory, "build")
|
||||
|
||||
|
||||
@tool
|
||||
def run_tests(directory: str = ".") -> str:
|
||||
"""Run the project's test suite.
|
||||
|
||||
Args:
|
||||
directory: Project root directory.
|
||||
"""
|
||||
return _run_project_command(directory, "test")
|
||||
|
||||
|
||||
@tool
|
||||
def verify_build(directory: str = ".") -> str:
|
||||
"""Run the project's check/lint command to verify correctness.
|
||||
|
||||
Args:
|
||||
directory: Project root directory.
|
||||
"""
|
||||
return _run_project_command(directory, "check")
|
||||
|
||||
|
||||
@tool
|
||||
def execute_command(command: str, directory: str = ".") -> str:
|
||||
"""Execute a shell command and return its output.
|
||||
|
||||
Args:
|
||||
command: Shell command to execute.
|
||||
directory: Working directory.
|
||||
"""
|
||||
directory = os.path.expanduser(directory)
|
||||
try:
|
||||
result = subprocess.run(
|
||||
command,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
cwd=directory,
|
||||
timeout=120,
|
||||
)
|
||||
output = (result.stdout + result.stderr).strip()
|
||||
if result.returncode != 0:
|
||||
return f"Command failed (exit {result.returncode}):\n{output}"
|
||||
return output or "(no output)"
|
||||
except subprocess.TimeoutExpired:
|
||||
return "Command timed out after 120s."
|
||||
except Exception as e:
|
||||
return f"Error: {e}"
|
||||
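The `detect_project` cascade above is order-sensitive: the first marker file found wins, so a repo containing both `Cargo.toml` and a `Makefile` is classified as Rust. A minimal standalone sketch (re-declaring a subset of the heuristic table) demonstrates this against a temporary directory:

```python
import os
import tempfile

# Subset of the _HEURISTICS table from tools/project.py; order matters.
_HEURISTICS = [
    ("Cargo.toml", {"type": "rust", "build": "cargo build", "test": "cargo test", "check": "cargo check"}),
    ("Makefile", {"type": "make", "build": "make build", "test": "make test", "check": "make lint"}),
]

def detect_project(directory: str) -> dict[str, str]:
    # First marker that exists wins, mirroring Loki's cascade order.
    for marker, info in _HEURISTICS:
        if os.path.exists(os.path.join(directory, marker)):
            return info
    return {"type": "unknown", "build": "", "test": "", "check": ""}

with tempfile.TemporaryDirectory() as d:
    # Both markers present: Cargo.toml is checked first, so "rust" wins.
    open(os.path.join(d, "Cargo.toml"), "w").close()
    open(os.path.join(d, "Makefile"), "w").close()
    print(detect_project(d)["type"])  # → rust

with tempfile.TemporaryDirectory() as d:
    # No marker at all falls through to the "unknown" sentinel.
    print(detect_project(d)["type"])  # → unknown
```

This mirrors Loki's `_detect_heuristic`; the LLM fallback for unknown projects is intentionally left to the agents themselves, as the module docstring notes.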