feat: Implemented a built-in task management system to help smaller LLMs complete larger multistep tasks and minimize context drift

2026-02-09 12:49:06 -07:00
parent 8a37a88ffd
commit a935add2a7
13 changed files with 868 additions and 9 deletions
@@ -34,6 +34,7 @@ If you're looking for more example agents, refer to the [built-in agents](../ass
  - [Python-Based Agent Tools](#python-based-agent-tools)
  - [Bash-Based Agent Tools](#bash-based-agent-tools)
 - [5. Conversation Starters](#5-conversation-starters)
+- [6. Todo System & Auto-Continuation](#6-todo-system--auto-continuation)
 - [Built-In Agents](#built-in-agents)
 <!--toc:end-->

@@ -81,6 +82,11 @@ global_tools:                        # Optional list of additional global tools
  - web_search
  - fs
  - python
+# Todo System & Auto-Continuation (see "Todo System & Auto-Continuation" section below)
+auto_continue: false                 # Enable automatic continuation when incomplete todos remain
+max_auto_continues: 10               # Maximum continuation attempts before stopping
+inject_todo_instructions: true       # Inject todo tool instructions into system prompt
+continuation_prompt: null            # Custom prompt for continuations (optional)
 ```

 As mentioned previously: Agents utilize function calling to extend a model's capabilities. However, agents operate in 
@@ -421,6 +427,50 @@ conversation_starters:

 ![Example Conversation Starters](./images/agents/conversation-starters.gif)

+## 6. Todo System & Auto-Continuation
+
+Loki includes a built-in task tracking system designed to improve the reliability of agents, especially when using
+smaller language models. The Todo System helps models:
+
+- Break complex tasks into manageable steps
+- Track progress through multistep workflows
+- Automatically continue work until all tasks are complete
+
+### Quick Configuration
+
+```yaml
+# agents/my-agent/config.yaml
+auto_continue: true              # Enable auto-continuation
+max_auto_continues: 10           # Max continuation attempts
+inject_todo_instructions: true   # Include the default todo instructions in the system prompt
+```
+
+### How It Works
+
+1. When `inject_todo_instructions` is enabled, agents receive instructions on using four built-in tools:
+    - `todo__init`: Initialize a todo list with a goal
+    - `todo__add`: Add a task to the list
+    - `todo__done`: Mark a task complete
+    - `todo__list`: View current todo state
+   
+   These default instructions describe how to use Loki's Todo System. If you prefer, you can disable
+   their injection and write your own guidance for the Todo System in the agent's main `instructions`.
+
+2. When `auto_continue` is enabled and the model stops with incomplete tasks, Loki automatically sends a
+   continuation prompt with the current todo state, nudging the model to continue working.
+
+3. This continues until all tasks are done or `max_auto_continues` is reached.
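+
+As an illustration of the first step above, an agent can turn off the default instructions and describe the todo
+tools in its own words. This is a minimal sketch: the wording under `instructions` is hypothetical and should be
+adapted to your agent.
+
+```yaml
+# agents/my-agent/config.yaml
+auto_continue: true                # Keep auto-continuation enabled
+inject_todo_instructions: false    # Skip Loki's default todo instructions
+instructions: |                    # Hypothetical custom guidance; adjust to your agent
+  Before starting work, call todo__init with the overall goal.
+  Add every planned step with todo__add, then work through the steps in order.
+  Call todo__done with the task's ID as soon as each step is finished.
+```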
+
+### When to Use
+
+- Multistep tasks where the model might lose track
+- Smaller models that need more structure
+- Workflows where every step needs to be completed
+
+For complete documentation including all configuration options, tool details, and best practices, see the
+[Todo System Guide](./TODO-SYSTEM.md).
+
 ## Built-In Agents
 Loki comes packaged with some useful built-in agents:
 * `coder`: An agent to assist you with all your coding tasks
@@ -0,0 +1,234 @@
+# Todo System
+
+Loki's Todo System is a built-in task tracking feature designed to improve the reliability and effectiveness of LLM agents,
+especially smaller models. It provides structured task management that helps models:
+
+- Break complex tasks into manageable steps
+- Track progress through multistep workflows
+- Automatically continue work until all tasks are complete
+- Avoid forgetting steps or losing context
+
+![Todo System Example](./images/agents/todo-system.png)
+
+## Quick Links
+<!--toc:start-->
+- [Why Use the Todo System?](#why-use-the-todo-system)
+- [How It Works](#how-it-works)
+- [Configuration Options](#configuration-options)
+- [Available Tools](#available-tools)
+- [Auto-Continuation](#auto-continuation)
+- [Best Practices](#best-practices)
+- [Example Workflow](#example-workflow)
+- [Troubleshooting](#troubleshooting)
+<!--toc:end-->
+
+## Why Use the Todo System?
+Smaller language models often struggle with:
+- **Context drift**: Forgetting earlier steps in a multistep task
+- **Incomplete execution**: Stopping before all work is done
+- **Lack of structure**: Jumping between tasks without clear organization
+
+The Loki Todo System addresses these issues by giving the model explicit tools to plan, track, and verify task completion.
+When `auto_continue` is enabled, the system also prompts the model to continue while incomplete tasks remain, up to the
+`max_auto_continues` limit.
+
+## How It Works
+1. **Planning Phase**: The model initializes a todo list with a goal and adds individual tasks
+2. **Execution Phase**: The model works through tasks, marking each done immediately after completion
+3. **Continuation Phase**: If incomplete tasks remain, the system automatically prompts the model to continue
+4. **Completion**: When all tasks are marked done, the workflow ends naturally
+
+The todo state is preserved across the conversation (and any compressions), and injected into continuation prompts,
+keeping the model focused on remaining work.
+
+## Configuration Options
+The Todo System is configured per agent in `<loki-config-dir>/agents/<agent-name>/config.yaml`:
+
+| Setting                    | Type    | Default     | Description                                                                     |
+|----------------------------|---------|-------------|---------------------------------------------------------------------------------|
+| `auto_continue`            | boolean | `false`     | Enable automatic continuation when incomplete todos remain                      |
+| `max_auto_continues`       | integer | `10`        | Maximum number of automatic continuations before stopping                       |
+| `inject_todo_instructions` | boolean | `true`      | Inject the default todo tool usage instructions into the agent's system prompt  |
+| `continuation_prompt`      | string  | (see below) | Custom prompt used when auto-continuing                                         |
+
+### Example Configuration
+```yaml
+# agents/my-agent/config.yaml
+model: openai:gpt-4o
+auto_continue: true              # Enable auto-continuation
+max_auto_continues: 15           # Allow up to 15 automatic continuations
+inject_todo_instructions: true   # Include todo instructions in system prompt
+continuation_prompt: |           # Optional: customize the continuation prompt
+  [CONTINUE]
+  You have unfinished tasks. Proceed with the next pending item.
+  Do not explain—just execute.
+```
+
+### Default Continuation Prompt
+If `continuation_prompt` is not specified, the following default is used:
+
+```
+[SYSTEM REMINDER - TODO CONTINUATION]
+You have incomplete tasks in your todo list. Continue with the next pending item.
+Call tools immediately. Do not explain what you will do.
+```
+
+## Available Tools
+When `inject_todo_instructions` is enabled (the default), agents have access to four built-in todo management tools:
+
+### `todo__init`
+Initialize a new todo list with a goal. Clears any existing todos.
+
+**Parameters:**
+- `goal` (string, required): The overall goal to achieve when all todos are completed
+
+**Example:**
+```json
+{"goal": "Refactor the authentication module"}
+```
+
+### `todo__add`
+Add a new todo item to the list.
+
+**Parameters:**
+- `task` (string, required): Description of the todo task
+
+**Example:**
+```json
+{"task": "Extract password validation into separate function"}
+```
+
+**Returns:** The assigned task ID
+
+### `todo__done`
+Mark a todo item as done by its ID.
+
+**Parameters:**
+- `id` (integer, required): The ID of the todo item to mark as done
+
+**Example:**
+```json
+{"id": 1}
+```
+
+### `todo__list`
+Display the current todo list with status of each item.
+
+**Parameters:** None
+
+**Returns:** The full todo list with goal, progress, and item statuses
+
+## Auto-Continuation
+When `auto_continue` is enabled, Loki automatically sends a continuation prompt when all of the following are true:
+
+1. The agent's response completes (the model stops generating)
+2. There are incomplete tasks in the todo list
+3. The continuation count hasn't reached `max_auto_continues`
+4. The response isn't identical to the previous continuation response (prevents loops)
+
+### What Gets Injected
+Each continuation prompt includes:
+- The continuation prompt text (default or custom)
+- The current todo list state showing:
+  - The goal
+  - Progress (e.g., "3/5 completed")
+  - Each task with status (✓ done, ○ pending)
+
+**Example continuation context:**
+```
+[SYSTEM REMINDER - TODO CONTINUATION]
+You have incomplete tasks in your todo list. Continue with the next pending item.
+Call tools immediately. Do not explain what you will do.
+
+Goal: Refactor the authentication module
+Progress: 2/4 completed
+  ✓ 1. Extract password validation into separate function
+  ✓ 2. Add unit tests for password validation
+  ○ 3. Update login handler to use new validation
+  ○ 4. Update registration handler to use new validation
+```
+
+### Visual Feedback
+During auto-continuation, you'll see a message in your terminal:
+```
+📋 Auto-continuing (3/10): 2 incomplete todo(s) remain
+```
+
+## Best Practices
+
+### For Agent Developers
+1. **Enable for complex workflows**: Use `auto_continue: true` for agents that handle multistep tasks
+2. **Set reasonable limits**: Adjust `max_auto_continues` based on typical task complexity
+3. **Customize the prompt**: If your agent needs specific continuation behavior, set a custom `continuation_prompt`
+
+### For Model Behavior
+The injected instructions tell models to:
+
+1. **Always create a todo list before starting work**: This ensures planning happens upfront
+2. **Mark each task done immediately after finishing**: Don't batch completions
+3. **Add all planned steps before starting**: Complete planning before execution
+
+### When to Use vs. Skip
+**Use the Todo System when:**
+- Tasks have 3+ distinct steps
+- The model might lose track of progress
+- You need every step to be completed
+- Working with smaller/less capable models
+
+**Skip the Todo System when:**
+- Single-step, simple tasks
+- Interactive Q&A sessions
+- The overhead of task tracking isn't worth it
+
+## Example Workflow
+Here's how a typical todo-driven workflow looks:
+
+**User prompt:** "Add input validation to the user registration form"
+
+**Model actions:**
+```
+1. todo__init(goal="Add input validation to user registration form")
+2. todo__add(task="Analyze current registration form fields")
+3. todo__add(task="Create validation rules for email field")
+4. todo__add(task="Create validation rules for password field")
+5. todo__add(task="Implement client-side validation")
+6. todo__add(task="Add server-side validation")
+7. todo__add(task="Write tests for validation logic")
+```
+
+**Model executes first task, then:**
+```
+8. todo__done(id=1)
+9. [Proceeds with task 2...]
+10. todo__done(id=2)
+...
+```
+
+**If model stops with incomplete tasks:**
+- Loki automatically sends a continuation prompt
+- The model sees the remaining tasks and continues
+- This repeats until all tasks are done or `max_auto_continues` is reached
+
+## Troubleshooting
+
+### Model Not Using Todo Tools
+- Verify `inject_todo_instructions: true` in your agent config
+- Check that the agent is properly loaded (not just a role)
+- Some models may need explicit prompting to use the tools
+
+### Too Many Continuations
+- Lower `max_auto_continues` to a reasonable limit
+- Check if the model is creating new tasks without completing old ones
+- Ensure tasks are appropriately scoped (not too granular)
+
+### Continuation Loop
+The system detects when a model's response is identical to its previous continuation response and stops
+automatically. If you're seeing loops:
+- The model may be stuck; check if a task is impossible to complete
+- Consider adjusting the `continuation_prompt` to be more directive
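+
+As a sketch of a more directive prompt, the following asks the model to either act on the next task or say why it
+cannot. The prompt wording and the lowered limit are illustrative, not tested recommendations.
+
+```yaml
+# agents/my-agent/config.yaml
+auto_continue: true
+max_auto_continues: 5            # Illustrative lower limit while debugging a stuck agent
+continuation_prompt: |           # Hypothetical wording; tune for your model
+  [CONTINUE]
+  Work on the first pending task now by calling the tool it needs.
+  If that task cannot be completed, state the reason in one sentence and stop.
+```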
+
+---
+
+## Additional Docs
+- [Agents](./AGENTS.md) — Full agent configuration guide
+- [Function Calling](./function-calling/TOOLS.md) — How tools work in Loki
+- [Sessions](./SESSIONS.md) — How conversation state is managed
@@ -16,6 +16,10 @@ loki --info | grep functions_dir | awk '{print $2}'
  - [Enabling/Disabling Global Tools](#enablingdisabling-global-tools)
  - [Role Configuration](#role-configuration)
  - [Agent Configuration](#agent-configuration)
+- [Tool Error Handling](#tool-error-handling)
+  - [Native/Shell Tool Errors](#nativeshell-tool-errors)
+  - [MCP Tool Errors](#mcp-tool-errors)
+  - [Why This Matters](#why-this-matters)
 <!--toc:end-->

 ---
@@ -137,3 +141,47 @@ The values for `mapping_tools` are inherited from the [global configuration](#gl
 For more information about agents, refer to the [Agents](../AGENTS.md) documentation.

 For a full example configuration for an agent, see the [Agent Configuration Example](../../config.agent.example.yaml) file.
+
+---
+
+## Tool Error Handling
+When tools fail, Loki captures error information and passes it back to the model so it can diagnose issues and 
+potentially retry or adjust its approach.
+
+### Native/Shell Tool Errors
+When a shell-based tool exits with a non-zero exit code, the model receives:
+
+```json
+{
+  "tool_call_error": "Tool call 'my_tool' exited with code 1",
+  "stderr": "Error: file not found: config.json"
+}
+```
+
+The `stderr` field contains the actual error output from the tool, giving the model context about what went wrong.
+If the tool produces no stderr output, only the `tool_call_error` field is included.
+
+**Note:** Tool stdout streams to your terminal in real time so you can see progress. Only stderr is captured for
+error reporting.
+
+### MCP Tool Errors
+When an MCP (Model Context Protocol) tool invocation fails due to connection issues, timeouts, or server errors,
+the model receives:
+
+```json
+{
+  "tool_call_error": "MCP tool invocation failed: connection refused"
+}
+```
+
+This allows the model to understand that an external service failed and take appropriate action (retry, use an 
+alternative approach, or inform the user).
+
+### Why This Matters
+Without proper error propagation, models would only know that "something went wrong" without understanding *what*
+went wrong. By including stderr output and detailed error messages, models can:
+
+- Diagnose the root cause of failures
+- Suggest fixes (e.g., "the file doesn't exist, should I create it?")
+- Retry with corrected parameters
+- Fall back to alternative approaches when appropriate