# Sisyphus in LangChain/LangGraph

A faithful recreation of [Loki's Sisyphus agent](../../assets/agents/sisyphus/) using [LangGraph](https://docs.langchain.com/langgraph/) — LangChain's framework for stateful, multi-agent workflows.

This project exists to help you understand LangChain/LangGraph by mapping every concept to its Loki equivalent.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                       SUPERVISOR NODE                       │
│  Intent classification → Routing decision → Command(goto=)  │
│                                                             │
│  Loki equivalent: sisyphus/config.yaml                      │
│  (agent__spawn → Command, agent__collect → graph edge)      │
└──────────┬──────────────┬──────────────┬────────────────────┘
           │              │              │
           ▼              ▼              ▼
    ┌────────────┐  ┌────────────┐  ┌────────────┐
    │  EXPLORE   │  │   ORACLE   │  │   CODER    │
    │ (research) │  │  (advise)  │  │  (build)   │
    │            │  │            │  │            │
    │ read-only  │  │ read-only  │  │ read+write │
    │   tools    │  │   tools    │  │   tools    │
    └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
          │               │               │
          └───────────────┼───────────────┘
                          │
                 back to supervisor
```

## Concept Map: Loki → LangGraph

This is the key reference. Every row maps a Loki concept to its LangGraph equivalent.

### Core Architecture

| Loki Concept | LangGraph Equivalent | Where in Code |
|---|---|---|
| Agent config (config.yaml) | Node function + system prompt | `agents/explore.py`, etc. |
| Agent instructions | System prompt string | `EXPLORE_SYSTEM_PROMPT`, etc. |
| Agent tools (tools.sh) | `@tool`-decorated Python functions | `tools/filesystem.py`, `tools/project.py` |
| Agent session (chat loop) | Graph state + message list | `state.py` → `SisyphusState.messages` |
| `agent__spawn --agent X` | `Command(goto="X")` | `agents/supervisor.py` |
| `agent__collect --id` | Graph edge (implicit — workers return to supervisor) | `graph.py` → `add_edge("explore", "supervisor")` |
| `agent__check` (non-blocking) | Not needed (graph handles scheduling) | — |
| `agent__cancel` | Not needed (graph handles lifecycle) | — |
| `can_spawn_agents: true` | Node has routing logic (supervisor) | `agents/supervisor.py` |
| `max_concurrent_agents: 4` | `Send()` API for parallel fan-out | See [Parallel Execution](#parallel-execution) |
| `max_agent_depth: 3` | `recursion_limit` in config | `cli.py` → `recursion_limit: 50` |
| `summarization_threshold` | Manual truncation in supervisor | `supervisor.py` → `_summarize_outputs()` |

### Tool System

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| `tools.sh` with `@cmd` annotations | `@tool` decorator | Loki compiles bash annotations to JSON schema; LangChain generates schema from the Python function signature + docstring |
| `@option --pattern!` (required arg) | Function parameter without default | `def search_content(pattern: str)` |
| `@option --lines` (optional arg) | Parameter with default | `def read_file(path: str, limit: int = 200)` |
| `@env LLM_OUTPUT=/dev/stdout` | Return value | LangChain tools return strings; Loki tools write to `$LLM_OUTPUT` |
| `@describe` | Docstring | The tool's docstring becomes the description the LLM sees |
| Global tools (`fs_read.sh`, etc.) | Shared tool imports | Both agents import from `tools/filesystem.py` |
| Agent-specific tools | Per-node tool binding | `llm.bind_tools(EXPLORE_TOOLS)` vs `llm.bind_tools(CODER_TOOLS)` |
| `.shared/utils.sh` | `tools/project.py` | Shared project detection utilities |
| `detect_project()` heuristic | `detect_project()` in Python | Same logic: check Cargo.toml → go.mod → package.json → etc. |
| LLM fallback for unknown projects | (omitted) | The agents themselves can reason about unknown project types |

### State & Memory

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| Agent session (conversation history) | `SisyphusState.messages` | `Annotated[list, add_messages]` — the reducer appends instead of replacing |
| `agent_session: temp` | `MemorySaver` checkpointer | Loki's temp sessions are ephemeral; MemorySaver is in-memory (lost on restart) |
| Per-agent isolation | Per-node system prompt + tools | In Loki agents have separate sessions; in LangGraph they share messages but have different system prompts |
| `{{project_dir}}` variable | `SisyphusState.project_dir` | Loki interpolates variables into prompts; LangGraph stores them in state |
| `{{__tools__}}` injection | `llm.bind_tools()` | Loki injects tool descriptions into the prompt; LangChain attaches them to the API call |
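
A minimal sketch of what a schema like `state.py`'s might look like. `append_reducer` is a simplified stand-in for LangGraph's real `add_messages` (which also handles message IDs and overwrites); the field names follow the tables in this README:

```python
from typing import Annotated, TypedDict

def append_reducer(existing: list, new: list) -> list:
    # Stand-in reducer: merge by appending rather than replacing
    return existing + new

class SisyphusState(TypedDict):
    messages: Annotated[list, append_reducer]   # conversation history
    intent: str                                 # supervisor's classification
    next_agent: str                             # routing target
    iteration_count: int                        # auto-continue safety valve
    project_dir: str                            # Loki's {{project_dir}}
    todos: Annotated[list, append_reducer]      # todo items, merged
    agent_outputs: dict                         # per-agent findings
    final_output: str

# When a node returns {"messages": [new_msg]}, the reducer merges it:
merged = append_reducer(["user: hi"], ["explore: findings"])
```
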

### Orchestration

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| Intent classification table | `RoutingDecision` structured output | Loki does this in free text; LangGraph forces typed JSON |
| Oracle triggers ("How should I...") | Supervisor prompt + structured output | Same trigger phrases, enforced via system prompt |
| Coder delegation format | Supervisor builds HumanMessage | The structured prompt (Goal/Reference Files/Conventions/Constraints) |
| `agent__spawn` (parallel) | `Send()` API | Dynamic fan-out to multiple nodes |
| Todo system (`todo__init`, etc.) | `SisyphusState.todos` | State field with a merge reducer |
| `auto_continue: true` | Supervisor loop (iteration counter) | Supervisor re-routes until FINISH or max iterations |
| `max_auto_continues: 25` | `MAX_ITERATIONS = 15` | Safety valve to prevent infinite loops |
| `user__ask` / `user__confirm` | `interrupt()` API | Pauses graph, surfaces question to caller, resumes with answer |
| Escalation (child → parent → user) | `interrupt()` in any node | Any node can pause; the caller handles the interaction |
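
The structured-routing row can be sketched as a Pydantic model handed to `with_structured_output`. The exact `Literal` values are assumptions drawn from the examples later in this README, not necessarily what `supervisor.py` uses:

```python
from typing import Literal
from pydantic import BaseModel, Field

class RoutingDecision(BaseModel):
    # Field values are illustrative; the real supervisor may define others
    intent: Literal["question", "implementation", "debugging", "ambiguous"]
    next_agent: Literal["explore", "oracle", "coder", "FINISH"]
    delegation_notes: str = Field(description="Context handed to the next agent")

# The supervisor would bind this schema to the LLM, roughly:
#   router = llm.with_structured_output(RoutingDecision)
decision = RoutingDecision(
    intent="implementation",
    next_agent="explore",
    delegation_notes="Find existing API endpoint patterns",
)
```

Because the schema is typed, a malformed routing answer fails validation instead of silently derailing the loop.
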

### Execution Model

| Loki Concept | LangGraph Equivalent | Notes |
|---|---|---|
| `loki --agent sisyphus` | `python -m sisyphus_langchain.cli` | CLI entry point |
| REPL mode | `cli.py` → `repl()` | Interactive loop with thread persistence |
| One-shot mode | `cli.py` → `run_query()` | Single query, print result, exit |
| Streaming output | `graph.stream()` | LangGraph supports per-node streaming |
| `inject_spawn_instructions` | (always on) | System prompts are always included |
| `inject_todo_instructions` | (always on) | Todo instructions could be added to prompts |

## How the Execution Flow Works

### 1. User sends a message

```python
graph.invoke({"messages": [HumanMessage("Add a health check endpoint")]})
```

### 2. Supervisor classifies intent

The supervisor LLM reads the message and produces a `RoutingDecision`:

```json
{
  "intent": "implementation",
  "next_agent": "explore",
  "delegation_notes": "Find existing API endpoint patterns, route structure, and health check conventions"
}
```

### 3. Supervisor routes via Command

```python
return Command(goto="explore", update={"intent": "implementation", "iteration_count": 1})
```

### 4. Explore agent runs

- Receives the full message history (including the user's request)
- Calls read-only tools (`search_content`, `search_files`, `read_file`)
- Returns findings in messages

### 5. Control returns to supervisor

The graph edge `explore → supervisor` fires automatically.

### 6. Supervisor reviews and routes again

Now it has explore's findings. It routes to coder with context:

```json
{
  "intent": "implementation",
  "next_agent": "coder",
  "delegation_notes": "Implement health check endpoint following patterns found in src/routes/"
}
```

### 7. Coder implements

- Reads explore's findings from the message history
- Writes files via the `write_file` tool
- Runs `verify_build` to check compilation

### 8. Supervisor verifies and finishes

```json
{
  "intent": "implementation",
  "next_agent": "FINISH",
  "delegation_notes": "Added /health endpoint in src/routes/health.py. Build passes."
}
```

## Key Differences from Loki

### What LangGraph does better

1. **Declarative graph** — The topology is visible and debuggable. Loki's orchestration is emergent from the LLM's tool calls.
2. **Typed state** — `SisyphusState` is a TypedDict with reducers. Loki's state is implicit in the conversation.
3. **Checkpointing** — Built-in persistence. Loki manages sessions manually.
4. **Time-travel debugging** — Inspect any checkpoint. Loki has no equivalent.
5. **Structured routing** — `RoutingDecision` forces valid JSON. Loki relies on the LLM calling the right tool.

### What Loki does better

1. **True parallelism** — `agent__spawn` runs multiple agents concurrently in separate threads. This LangGraph implementation is sequential (see [Parallel Execution](#parallel-execution) for how to add it).
2. **Agent isolation** — Each Loki agent has its own session, tools, and config. LangGraph nodes share state.
3. **Teammate messaging** — Loki agents can send messages to siblings. LangGraph nodes communicate only through shared state.
4. **Dynamic tool compilation** — Loki compiles bash/python/typescript tools at startup. LangChain tools are statically defined.
5. **Escalation protocol** — Loki's child-to-parent escalation is sophisticated. LangGraph's `interrupt()` is simpler but less structured.
6. **Task queues with dependencies** — Loki's `agent__task_create` supports dependency DAGs. LangGraph's routing is simpler (hub-and-spoke).

## Running It

### Prerequisites

```bash
# Python 3.11+
python --version

# Set your API key
export OPENAI_API_KEY="sk-..."
```

### Install

```bash
cd examples/langchain-sisyphus

# With pip
pip install -e .

# Or with uv (recommended)
uv pip install -e .
```

### Usage

```bash
# Interactive REPL (like `loki --agent sisyphus`)
sisyphus

# One-shot query
sisyphus "Find all TODO comments in the codebase"

# With custom models (cost optimization)
sisyphus --explore-model gpt-4o-mini --coder-model gpt-4o "Add input validation to the API"

# Programmatic usage
python -c "
from sisyphus_langchain import build_graph
from langchain_core.messages import HumanMessage

graph = build_graph()
result = graph.invoke({
    'messages': [HumanMessage('What patterns does this codebase use?')],
    'intent': 'ambiguous',
    'next_agent': '',
    'iteration_count': 0,
    'todos': [],
    'agent_outputs': {},
    'final_output': '',
    'project_dir': '.',
}, config={'configurable': {'thread_id': 'demo'}, 'recursion_limit': 50})
print(result['final_output'])
"
```

### Using Anthropic Models

Replace `ChatOpenAI` with `ChatAnthropic` in the agent factories:

```python
from langchain_anthropic import ChatAnthropic

# In agents/oracle.py:
llm = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0.2).bind_tools(ORACLE_TOOLS)
```

## Deployment

### Option 1: Standalone Script (Simplest)

Just run the CLI directly. No infrastructure needed.

```bash
sisyphus "Add a health check endpoint"
```

### Option 2: FastAPI Server

```python
# server.py
from fastapi import FastAPI
from langserve import add_routes
from sisyphus_langchain import build_graph

app = FastAPI(title="Sisyphus API")
graph = build_graph()
add_routes(app, graph, path="/agent")

# Run:  uvicorn server:app --host 0.0.0.0 --port 8000
# Call: POST http://localhost:8000/agent/invoke
```

### Option 3: LangGraph Platform (Production)

Create a `langgraph.json` at the project root:

```json
{
  "graphs": {
    "sisyphus": "./sisyphus_langchain/graph.py:build_graph"
  },
  "dependencies": ["./sisyphus_langchain"],
  "env": ".env"
}
```

Then deploy:

```bash
pip install langgraph-cli
langgraph deploy
```

This gives you:

- Durable checkpointing (PostgreSQL)
- Background runs
- Streaming API
- Zero-downtime deployments
- Built-in observability

### Option 4: Docker

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e .
CMD ["sisyphus"]
```

```bash
docker build -t sisyphus .
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY sisyphus
```

## Parallel Execution

This implementation routes sequentially for simplicity. To add Loki-style parallel agent execution, use LangGraph's `Send()` API:

```python
from langchain_core.messages import HumanMessage
from langgraph.types import Send

def supervisor_node(state):
    # Fan out to multiple explore agents in parallel
    # (like Loki's agent__spawn called multiple times)
    return [
        Send("explore", {
            **state,
            "messages": state["messages"] + [
                HumanMessage("Find existing API endpoint patterns")
            ],
        }),
        Send("explore", {
            **state,
            "messages": state["messages"] + [
                HumanMessage("Find data models and database patterns")
            ],
        }),
    ]
```

This is equivalent to Loki's pattern of spawning multiple explore agents:

```
agent__spawn --agent explore --prompt "Find API patterns"
agent__spawn --agent explore --prompt "Find database patterns"
agent__collect --id <id1>
agent__collect --id <id2>
```

## Adding Human-in-the-Loop

To replicate Loki's `user__ask` / `user__confirm` tools, use LangGraph's `interrupt()`:

```python
from langgraph.types import interrupt

def supervisor_node(state):
    # Pause and ask the user (like Loki's user__ask)
    answer = interrupt({
        "question": "How should we structure the authentication?",
        "options": [
            "JWT with httpOnly cookies (Recommended)",
            "Session-based with Redis",
            "OAuth2 with external provider",
        ],
    })
    # `answer` contains the user's selection when the graph resumes
```

## Project Structure

```
examples/langchain-sisyphus/
├── pyproject.toml              # Dependencies & build config
├── README.md                   # This file
└── sisyphus_langchain/
    ├── __init__.py             # Package entry point
    ├── cli.py                  # CLI (REPL + one-shot mode)
    ├── graph.py                # Graph assembly (wires nodes + edges)
    ├── state.py                # Shared state schema (TypedDict)
    ├── agents/
    │   ├── __init__.py
    │   ├── supervisor.py       # Sisyphus orchestrator (intent → routing)
    │   ├── explore.py          # Read-only codebase researcher
    │   ├── oracle.py           # Architecture/debugging advisor
    │   └── coder.py            # Implementation worker
    └── tools/
        ├── __init__.py
        ├── filesystem.py       # File read/write/search/glob tools
        └── project.py          # Project detection, build, test tools
```

### File-to-Loki Mapping

| This Project | Loki Equivalent |
|---|---|
| `state.py` | Session context + todo state (implicit in Loki) |
| `graph.py` | `src/supervisor/mod.rs` (runtime orchestration) |
| `cli.py` | `src/main.rs` (CLI entry point) |
| `agents/supervisor.py` | `assets/agents/sisyphus/config.yaml` |
| `agents/explore.py` | `assets/agents/explore/config.yaml` + `tools.sh` |
| `agents/oracle.py` | `assets/agents/oracle/config.yaml` + `tools.sh` |
| `agents/coder.py` | `assets/agents/coder/config.yaml` + `tools.sh` |
| `tools/filesystem.py` | `assets/functions/tools/fs_*.sh` |
| `tools/project.py` | `assets/agents/.shared/utils.sh` + `sisyphus/tools.sh` |

## Further Reading

- [LangGraph Documentation](https://docs.langchain.com/langgraph/)
- [LangGraph Multi-Agent Tutorial](https://docs.langchain.com/langgraph/how-tos/multi-agent-systems)
- [Loki Agents Documentation](../../docs/AGENTS.md)
- [Loki Sisyphus README](../../assets/agents/sisyphus/README.md)
- [LangGraph Supervisor Library](https://github.com/langchain-ai/langgraph-supervisor-py)