feat: Implemented a built-in task management system to help smaller LLMs complete larger multistep tasks and minimize context drift

2026-02-09 12:49:06 -07:00
parent 8a37a88ffd
commit a935add2a7
13 changed files with 868 additions and 9 deletions
@@ -36,6 +36,7 @@ Coming from [AIChat](https://github.com/sigoden/aichat)? Follow the [migration g
 * [Sessions](/docs/SESSIONS.md): Manage and persist conversational contexts and settings across multiple interactions.
 * [Roles](./docs/ROLES.md): Customize model behavior for specific tasks or domains.
 * [Agents](/docs/AGENTS.md): Leverage AI agents to perform complex tasks and workflows.
+    * [Todo System](./docs/TODO-SYSTEM.md): Built-in task tracking for improved agent reliability with smaller models.
 * [Environment Variables](./docs/ENVIRONMENT-VARIABLES.md): Override and customize your Loki configuration at runtime with environment variables.
 * [Client Configurations](./docs/clients/CLIENTS.md): Configuration instructions for various LLM providers.
    * [Patching API Requests](./docs/clients/PATCHES.md): Learn how to patch API requests for advanced customization.
@@ -17,6 +17,13 @@ agent_session: null              # Set a session to use when starting the agent.
 name: <agent-name>               # Name of the agent, used in the UI and logs
 description: <description>       # Description of the agent, used in the UI
 version: 1                       # Version of the agent
+# Todo System & Auto-Continuation
+# These settings help smaller models handle multi-step tasks more reliably.
+# See docs/TODO-SYSTEM.md for detailed documentation.
+auto_continue: false             # Enable automatic continuation when incomplete todos remain
+max_auto_continues: 10           # Maximum number of automatic continuations before stopping
+inject_todo_instructions: true   # Inject the default todo tool usage instructions into the agent's system prompt
+continuation_prompt: null        # Custom prompt used when auto-continuing (optional; uses default if null)
 mcp_servers:                     # Optional list of MCP servers that the agent utilizes
  - github                       # Corresponds to the name of an MCP server in the `<loki-config-dir>/functions/mcp.json` file
 global_tools:                    # Optional list of additional global tools to enable for the agent; i.e. not tools specific to the agent
@@ -34,6 +34,7 @@ If you're looking for more example agents, refer to the [built-in agents](../ass
  - [Python-Based Agent Tools](#python-based-agent-tools)
  - [Bash-Based Agent Tools](#bash-based-agent-tools)
 - [5. Conversation Starters](#5-conversation-starters)
+- [6. Todo System & Auto-Continuation](#6-todo-system--auto-continuation)
 - [Built-In Agents](#built-in-agents)
 <!--toc:end-->

@@ -81,6 +82,11 @@ global_tools:                        # Optional list of additional global tools
  - web_search
  - fs
  - python
+# Todo System & Auto-Continuation (see "Todo System & Auto-Continuation" section below)
+auto_continue: false                 # Enable automatic continuation when incomplete todos remain
+max_auto_continues: 10               # Maximum continuation attempts before stopping
+inject_todo_instructions: true       # Inject todo tool instructions into system prompt
+continuation_prompt: null            # Custom prompt for continuations (optional)
 ```

 As mentioned previously: Agents utilize function calling to extend a model's capabilities. However, agents operate in 
@@ -421,6 +427,50 @@ conversation_starters:

 ![Example Conversation Starters](./images/agents/conversation-starters.gif)

+## 6. Todo System & Auto-Continuation
+
+Loki includes a built-in task tracking system designed to improve the reliability of agents, especially when using
+smaller language models. The Todo System helps models:
+
+- Break complex tasks into manageable steps
+- Track progress through multi-step workflows
+- Automatically continue work until all tasks are complete
+
+### Quick Configuration
+
+```yaml
+# agents/my-agent/config.yaml
+auto_continue: true              # Enable auto-continuation
+max_auto_continues: 10           # Max continuation attempts
+inject_todo_instructions: true   # Include the default todo instructions in the system prompt
+```
+
+### How It Works
+
+1. When `auto_continue` and `inject_todo_instructions` are both enabled, agents receive instructions on using four built-in tools:
+    - `todo__init`: Initialize a todo list with a goal
+    - `todo__add`: Add a task to the list
+    - `todo__done`: Mark a task complete
+    - `todo__list`: View current todo state
+   
+   These default instructions explain how to use Loki's Todo System. If you prefer, you can disable
+   their injection and describe how to use the Todo System in the agent's main `instructions` instead.
+
+2. When `auto_continue` is enabled and the model stops with incomplete tasks, Loki automatically sends a
+   continuation prompt with the current todo state, nudging the model to continue working.
+
+3. This continues until all tasks are done or `max_auto_continues` is reached.
+
+### When to Use
+
+- Multistep tasks where the model might lose track
+- Smaller models that need more structure
+- Workflows requiring guaranteed completion of all steps
+
+For complete documentation including all configuration options, tool details, and best practices, see the
+[Todo System Guide](./TODO-SYSTEM.md).
+
 ## Built-In Agents
 Loki comes packaged with some useful built-in agents:
 * `coder`: An agent to assist you with all your coding tasks
@@ -0,0 +1,234 @@
+# Todo System
+
+Loki's Todo System is a built-in task tracking feature designed to improve the reliability and effectiveness of LLM agents,
+especially smaller models. It provides structured task management that helps models:
+
+- Break complex tasks into manageable steps
+- Track progress through multistep workflows
+- Automatically continue work until all tasks are complete
+- Avoid forgetting steps or losing context
+
+![Todo System Example](./images/agents/todo-system.png)
+
+## Quick Links
+<!--toc:start-->
+- [Why Use the Todo System?](#why-use-the-todo-system)
+- [How It Works](#how-it-works)
+- [Configuration Options](#configuration-options)
+- [Available Tools](#available-tools)
+- [Auto-Continuation](#auto-continuation)
+- [Best Practices](#best-practices)
+- [Example Workflow](#example-workflow)
+- [Troubleshooting](#troubleshooting)
+<!--toc:end-->
+
+## Why Use the Todo System?
+Smaller language models often struggle with:
+- **Context drift**: Forgetting earlier steps in a multistep task
+- **Incomplete execution**: Stopping before all work is done
+- **Lack of structure**: Jumping between tasks without clear organization
+
+The Loki Todo System addresses these issues by giving the model explicit tools to plan, track, and verify task completion.
+When `auto_continue` is enabled, the system also prompts the model to continue while incomplete tasks remain, up to `max_auto_continues` times.
+
+## How It Works
+1. **Planning Phase**: The model initializes a todo list with a goal and adds individual tasks
+2. **Execution Phase**: The model works through tasks, marking each done immediately after completion
+3. **Continuation Phase**: If incomplete tasks remain, the system automatically prompts the model to continue
+4. **Completion**: When all tasks are marked done, the workflow ends naturally
+
+The todo state is preserved across the conversation (and any compressions), and injected into continuation prompts,
+keeping the model focused on remaining work.
+
+## Configuration Options
+The Todo System is configured per-agent in `<loki-config-dir>/agents/<agent-name>/config.yaml`:
+
+| Setting                    | Type    | Default     | Description                                                                     |
+|----------------------------|---------|-------------|---------------------------------------------------------------------------------|
+| `auto_continue`            | boolean | `false`     | Enable the todo tools and automatic continuation when incomplete todos remain   |
+| `max_auto_continues`       | integer | `10`        | Maximum number of automatic continuations before stopping                       |
+| `inject_todo_instructions` | boolean | `true`      | Inject the default todo tool usage instructions into the agent's system prompt  |
+| `continuation_prompt`      | string  | (see below) | Custom prompt used when auto-continuing                                         |
+
+### Example Configuration
+```yaml
+# agents/my-agent/config.yaml
+model: openai:gpt-4o
+auto_continue: true              # Enable auto-continuation
+max_auto_continues: 15           # Allow up to 15 automatic continuations
+inject_todo_instructions: true   # Include todo instructions in system prompt
+continuation_prompt: |           # Optional: customize the continuation prompt
+  [CONTINUE]
+  You have unfinished tasks. Proceed with the next pending item.
+  Do not explain—just execute.
+```
+
+### Default Continuation Prompt
+If `continuation_prompt` is not specified, the following default is used:
+
+```
+[SYSTEM REMINDER - TODO CONTINUATION]
+You have incomplete tasks in your todo list. Continue with the next pending item.
+Call tools immediately. Do not explain what you will do.
+```
+
+## Available Tools
+When `auto_continue` is enabled, agents have access to four built-in todo management tools. With `inject_todo_instructions` also enabled (the default), instructions for using them are added to the system prompt.
+
+### `todo__init`
+Initialize a new todo list with a goal. Clears any existing todos.
+
+**Parameters:**
+- `goal` (string, required): The overall goal to achieve when all todos are completed
+
+**Example:**
+```json
+{"goal": "Refactor the authentication module"}
+```
+
+### `todo__add`
+Add a new todo item to the list.
+
+**Parameters:**
+- `task` (string, required): Description of the todo task
+
+**Example:**
+```json
+{"task": "Extract password validation into separate function"}
+```
+
+**Returns:** The assigned task ID
+
+### `todo__done`
+Mark a todo item as done by its ID.
+
+**Parameters:**
+- `id` (integer, required): The ID of the todo item to mark as done
+
+**Example:**
+```json
+{"id": 1}
+```
+
+### `todo__list`
+Display the current todo list with status of each item.
+
+**Parameters:** None
+
+**Returns:** The full todo list with goal, progress, and item statuses
+
+## Auto-Continuation
+When `auto_continue` is enabled, Loki automatically sends a continuation prompt when all of the following are true:
+
+1. The agent's response completes (model stops generating)
+2. There are incomplete tasks in the todo list
+3. The continuation count hasn't exceeded `max_auto_continues`
+4. The response isn't identical to the previous continuation response (prevents loops)
+
+### What Gets Injected
+Each continuation prompt includes:
+- The continuation prompt text (default or custom)
+- The current todo list state showing:
+  - The goal
+  - Progress (e.g., "3/5 completed")
+  - Each task with status (✓ done, ○ pending)
+
+**Example continuation context:**
+```
+[SYSTEM REMINDER - TODO CONTINUATION]
+You have incomplete tasks in your todo list. Continue with the next pending item.
+Call tools immediately. Do not explain what you will do.
+
+Goal: Refactor the authentication module
+Progress: 2/4 completed
+  ✓ 1. Extract password validation into separate function
+  ✓ 2. Add unit tests for password validation
+  ○ 3. Update login handler to use new validation
+  ○ 4. Update registration handler to use new validation
+```
+
+### Visual Feedback
+During auto-continuation, you'll see a message in your terminal:
+```
+📋 Auto-continuing (3/10): 2 incomplete todo(s) remain
+```
+
+## Best Practices
+
+### For Agent Developers
+1. **Enable for complex workflows**: Use `auto_continue: true` for agents that handle multistep tasks
+2. **Set reasonable limits**: Adjust `max_auto_continues` based on typical task complexity
+3. **Customize the prompt**: If your agent needs specific continuation behavior, set a custom `continuation_prompt`
+
+### For Model Behavior
+The injected instructions tell models to:
+
+1. **Always create a todo list before starting work**: This ensures planning happens upfront
+2. **Mark each task done immediately after finishing**: Don't batch completions
+3. **Add all planned steps before starting**: Complete planning before execution
+
+### When to Use vs. Skip
+**Use the Todo System when:**
+- Tasks have 3+ distinct steps
+- The model might lose track of progress
+- You want guaranteed completion of all steps
+- Working with smaller/less capable models
+
+**Skip the Todo System when:**
+- Single-step, simple tasks
+- Interactive Q&A sessions
+- The overhead of task tracking isn't worth it
+
+## Example Workflow
+Here's how a typical todo-driven workflow looks:
+
+**User prompt:** "Add input validation to the user registration form"
+
+**Model actions:**
+```
+1. todo__init(goal="Add input validation to user registration form")
+2. todo__add(task="Analyze current registration form fields")
+3. todo__add(task="Create validation rules for email field")
+4. todo__add(task="Create validation rules for password field")
+5. todo__add(task="Implement client-side validation")
+6. todo__add(task="Add server-side validation")
+7. todo__add(task="Write tests for validation logic")
+```
+
+**Model executes first task, then:**
+```
+8. todo__done(id=1)
+9. [Proceeds with task 2...]
+10. todo__done(id=2)
+...
+```
+
+**If model stops with incomplete tasks:**
+- System automatically sends continuation prompt
+- Model sees remaining tasks and continues
+- Repeats until all tasks are done or max continuations reached
+
+## Troubleshooting
+
+### Model Not Using Todo Tools
+- Verify that both `auto_continue: true` and `inject_todo_instructions: true` are set in your agent config
+- Check that the agent is properly loaded (not just a role)
+- Some models may need explicit prompting to use the tools
+
+### Too Many Continuations
+- Lower `max_auto_continues` to a reasonable limit
+- Check if the model is creating new tasks without completing old ones
+- Ensure tasks are appropriately scoped (not too granular)
+
+### Continuation Loop
+The system detects when a model's response is identical to its previous continuation response and stops
+automatically. If you're seeing loops:
+- The model may be stuck; check if a task is impossible to complete
+- Consider adjusting the `continuation_prompt` to be more directive
+
+---
+
+## Additional Docs
+- [Agents](./AGENTS.md) — Full agent configuration guide
+- [Function Calling](./function-calling/TOOLS.md) — How tools work in Loki
+- [Sessions](./SESSIONS.md) — How conversation state is managed
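The four auto-continuation conditions documented in `docs/TODO-SYSTEM.md` can be sketched as a single predicate. This is a minimal illustration, not the code from this commit: `ContinuationState` and `should_auto_continue` are hypothetical names, and treating a count equal to `max_auto_continues` as the stopping point is an assumption, since the exact comparison is not visible in this diff.

```rust
/// Hypothetical snapshot of the values the documented conditions depend on.
#[derive(Clone, Copy)]
pub struct ContinuationState<'a> {
    pub auto_continue: bool,
    pub max_auto_continues: usize,
    pub continuation_count: usize,
    pub incomplete_todos: usize,
    pub last_response: Option<&'a str>,
}

/// Returns true when another continuation prompt should be sent after the
/// model has stopped generating with `response` as its final output.
pub fn should_auto_continue(state: &ContinuationState<'_>, response: &str) -> bool {
    // The feature is opt-in per agent.
    if !state.auto_continue {
        return false;
    }
    // Nothing left to do once every todo is marked done.
    if state.incomplete_todos == 0 {
        return false;
    }
    // Assumption: stop once the count has reached the configured maximum.
    if state.continuation_count >= state.max_auto_continues {
        return false;
    }
    // An identical response to the previous continuation suggests a loop.
    if state.last_response.is_some_and(|last| last == response) {
        return false;
    }
    true
}
```

In the commit itself, the corresponding inputs appear to come from `Agent::auto_continue_enabled`, `Agent::max_auto_continues`, `Agent::continuation_count`, `TodoList::has_incomplete`, and `Agent::is_stale_response`.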
@@ -16,6 +16,10 @@ loki --info | grep functions_dir | awk '{print $2}'
  - [Enabling/Disabling Global Tools](#enablingdisabling-global-tools)
  - [Role Configuration](#role-configuration)
  - [Agent Configuration](#agent-configuration)
+- [Tool Error Handling](#tool-error-handling)
+  - [Native/Shell Tool Errors](#nativeshell-tool-errors)
+  - [MCP Tool Errors](#mcp-tool-errors)
+  - [Why This Matters](#why-this-matters)
 <!--toc:end-->

 ---
@@ -137,3 +141,47 @@ The values for `mapping_tools` are inherited from the [global configuration](#gl
 For more information about agents, refer to the [Agents](../AGENTS.md) documentation.

 For a full example configuration for an agent, see the [Agent Configuration Example](../../config.agent.example.yaml) file.
+
+---
+
+## Tool Error Handling
+When tools fail, Loki captures error information and passes it back to the model so it can diagnose issues and 
+potentially retry or adjust its approach.
+
+### Native/Shell Tool Errors
+When a shell-based tool exits with a non-zero exit code, the model receives:
+
+```json
+{
+  "tool_call_error": "Tool call 'my_tool' exited with code 1",
+  "stderr": "Error: file not found: config.json"
+}
+```
+
+The `stderr` field contains the actual error output from the tool, giving the model context about what went wrong.
+If the tool produces no stderr output, only the `tool_call_error` field is included.
+
+**Note:** Tool stdout streams to your terminal in real time so you can see progress. Only stderr is captured for 
+error reporting.
+
+### MCP Tool Errors
+When an MCP (Model Context Protocol) tool invocation fails due to connection issues, timeouts, or server errors,
+the model receives:
+
+```json
+{
+  "tool_call_error": "MCP tool invocation failed: connection refused"
+}
+```
+
+This allows the model to understand that an external service failed and take appropriate action (retry, use an 
+alternative approach, or inform the user).
+
+### Why This Matters
+Without proper error propagation, models would only know that "something went wrong" without understanding *what*
+went wrong. By including stderr output and detailed error messages, models can:
+
+- Diagnose the root cause of failures
+- Suggest fixes (e.g., "the file doesn't exist, should I create it?")
+- Retry with corrected parameters
+- Fall back to alternative approaches when appropriate
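The rule that `stderr` is included only when the tool actually wrote to it can be sketched without any JSON library. The function below is a hypothetical illustration that models the payload as ordered key/value pairs; the commit builds the real payload with `serde_json`'s `json!` macro in `run_llm_function`.

```rust
/// Hypothetical helper: builds the fields of a shell-tool error payload.
/// `stderr` is only added when the tool produced error output.
pub fn tool_error_fields(
    tool: &str,
    exit_code: i32,
    stderr: &str,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![(
        "tool_call_error",
        format!("Tool call '{tool}' exited with code {exit_code}"),
    )];
    if !stderr.is_empty() {
        fields.push(("stderr", stderr.to_string()));
    }
    fields
}
```

Keeping the `stderr` key absent, rather than present with an empty string, means the model never has to interpret an empty error message.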
@@ -1,3 +1,4 @@
+use super::todo::TodoList;
 use super::*;

 use crate::{
@@ -14,6 +15,18 @@ use serde::{Deserialize, Serialize};
 use std::{ffi::OsStr, path::Path};

 const DEFAULT_AGENT_NAME: &str = "rag";
+const DEFAULT_TODO_INSTRUCTIONS: &str = "\
+\n## Task Tracking\n\
+You have built-in task tracking tools. Use them to track your progress:\n\
+- `todo__init`: Initialize a todo list with a goal. Call this at the start of every multi-step task.\n\
+- `todo__add`: Add individual tasks. Add all planned steps before starting work.\n\
+- `todo__done`: Mark a task done by id. Call this immediately after completing each step.\n\
+- `todo__list`: Show the current todo list.\n\
+\n\
+RULES:\n\
+- Always create a todo list before starting work.\n\
+- Mark each task done as soon as you finish it — do not batch.\n\
+- If you stop with incomplete tasks, the system will automatically prompt you to continue.";

 pub type AgentVariables = IndexMap<String, String>;

@@ -33,6 +46,9 @@ pub struct Agent {
    rag: Option<Arc<Rag>>,
    model: Model,
    vault: GlobalVault,
+    todo_list: TodoList,
+    continuation_count: usize,
+    last_continuation_response: Option<String>,
 }

 impl Agent {
@@ -188,6 +204,10 @@ impl Agent {
            None
        };

+        if agent_config.auto_continue {
+            functions.append_todo_functions();
+        }
+
        Ok(Self {
            name: name.to_string(),
            config: agent_config,
@@ -199,6 +219,9 @@ impl Agent {
            rag,
            model,
            vault: Arc::clone(&config.read().vault),
+            todo_list: TodoList::default(),
+            continuation_count: 0,
+            last_continuation_response: None,
        })
    }

@@ -309,11 +332,16 @@ impl Agent {
    }

    pub fn interpolated_instructions(&self) -> String {
-        let output = self
+        let mut output = self
            .session_dynamic_instructions
            .clone()
            .or_else(|| self.shared_dynamic_instructions.clone())
            .unwrap_or_else(|| self.config.instructions.clone());
+
+        if self.config.auto_continue && self.config.inject_todo_instructions {
+            output.push_str(DEFAULT_TODO_INSTRUCTIONS);
+        }
+
        self.interpolate_text(&output)
    }

@@ -376,6 +404,67 @@ impl Agent {
        self.session_dynamic_instructions = None;
    }

+    pub fn auto_continue_enabled(&self) -> bool {
+        self.config.auto_continue
+    }
+
+    pub fn max_auto_continues(&self) -> usize {
+        self.config.max_auto_continues
+    }
+
+    pub fn continuation_count(&self) -> usize {
+        self.continuation_count
+    }
+
+    pub fn increment_continuation(&mut self) {
+        self.continuation_count += 1;
+    }
+
+    pub fn reset_continuation(&mut self) {
+        self.continuation_count = 0;
+        self.last_continuation_response = None;
+    }
+
+    pub fn is_stale_response(&self, response: &str) -> bool {
+        self.last_continuation_response
+            .as_ref()
+            .is_some_and(|last| last == response)
+    }
+
+    pub fn set_last_continuation_response(&mut self, response: String) {
+        self.last_continuation_response = Some(response);
+    }
+
+    pub fn todo_list(&self) -> &TodoList {
+        &self.todo_list
+    }
+
+    pub fn init_todo_list(&mut self, goal: &str) {
+        self.todo_list = TodoList::new(goal);
+    }
+
+    pub fn add_todo(&mut self, task: &str) -> usize {
+        self.todo_list.add(task)
+    }
+
+    pub fn mark_todo_done(&mut self, id: usize) -> bool {
+        self.todo_list.mark_done(id)
+    }
+
+    pub fn continuation_prompt(&self) -> String {
+        self.config.continuation_prompt.clone().unwrap_or_else(|| {
+            "[SYSTEM REMINDER - TODO CONTINUATION]\n\
+                 You have incomplete tasks in your todo list. \
+                 Continue with the next pending item. \
+                 Call tools immediately. Do not explain what you will do."
+                .to_string()
+        })
+    }
+
+    pub fn compression_threshold(&self) -> Option<usize> {
+        self.config.compression_threshold
+    }
+
    pub fn is_dynamic_instructions(&self) -> bool {
        self.config.dynamic_instructions
    }
@@ -498,6 +587,14 @@ pub struct AgentConfig {
    #[serde(skip_serializing_if = "Option::is_none")]
    pub agent_session: Option<String>,
    #[serde(default)]
+    pub auto_continue: bool,
+    #[serde(default = "default_max_auto_continues")]
+    pub max_auto_continues: usize,
+    #[serde(default = "default_true")]
+    pub inject_todo_instructions: bool,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub compression_threshold: Option<usize>,
+    #[serde(default)]
    pub description: String,
    #[serde(default)]
    pub version: String,
@@ -505,6 +602,8 @@ pub struct AgentConfig {
    pub mcp_servers: Vec<String>,
    #[serde(default)]
    pub global_tools: Vec<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub continuation_prompt: Option<String>,
    #[serde(default)]
    pub instructions: String,
    #[serde(default)]
@@ -517,6 +616,14 @@ pub struct AgentConfig {
    pub documents: Vec<String>,
 }

+fn default_max_auto_continues() -> usize {
+    10
+}
+
+fn default_true() -> bool {
+    true
+}
+
 impl AgentConfig {
    pub fn load(path: &Path) -> Result<Self> {
        let contents = read_to_string(path)
@@ -3,6 +3,7 @@ mod input;
 mod macros;
 mod role;
 mod session;
+pub(crate) mod todo;

 pub use self::agent::{Agent, AgentVariables, complete_agent_variables, list_agents};
 pub use self::input::Input;
@@ -1573,8 +1574,18 @@ impl Config {
            .summary_context_prompt
            .clone()
            .unwrap_or_else(|| SUMMARY_CONTEXT_PROMPT.into());
+
+        let todo_prefix = config
+            .read()
+            .agent
+            .as_ref()
+            .map(|agent| agent.todo_list())
+            .filter(|todos| !todos.is_empty())
+            .map(|todos| format!("[ACTIVE TODO LIST]\n{}\n\n", todos.render_for_model()))
+            .unwrap_or_default();
+
        if let Some(session) = config.write().session.as_mut() {
-            session.compress(format!("{summary_context_prompt}{summary}"));
+            session.compress(format!("{todo_prefix}{summary_context_prompt}{summary}"));
        }
        config.write().discontinuous_last_message();
        Ok(())
@@ -299,6 +299,9 @@ impl Session {
        self.role_prompt = agent.interpolated_instructions();
        self.agent_variables = agent.variables().clone();
        self.agent_instructions = self.role_prompt.clone();
+        if let Some(threshold) = agent.compression_threshold() {
+            self.set_compression_threshold(Some(threshold));
+        }
    }

    pub fn agent_variables(&self) -> &AgentVariables {
@@ -0,0 +1,165 @@
+use serde::{Deserialize, Serialize};
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "lowercase")]
+pub enum TodoStatus {
+    Pending,
+    Done,
+}
+
+impl TodoStatus {
+    fn icon(&self) -> &'static str {
+        match self {
+            TodoStatus::Pending => "○",
+            TodoStatus::Done => "✓",
+        }
+    }
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct TodoItem {
+    pub id: usize,
+    #[serde(alias = "description")]
+    pub desc: String,
+    pub done: bool,
+}
+
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct TodoList {
+    #[serde(default)]
+    pub goal: String,
+    #[serde(default)]
+    pub todos: Vec<TodoItem>,
+}
+
+impl TodoList {
+    pub fn new(goal: &str) -> Self {
+        Self {
+            goal: goal.to_string(),
+            todos: Vec::new(),
+        }
+    }
+
+    pub fn add(&mut self, task: &str) -> usize {
+        let id = self.todos.iter().map(|t| t.id).max().unwrap_or(0) + 1;
+        self.todos.push(TodoItem {
+            id,
+            desc: task.to_string(),
+            done: false,
+        });
+        id
+    }
+
+    pub fn mark_done(&mut self, id: usize) -> bool {
+        if let Some(item) = self.todos.iter_mut().find(|t| t.id == id) {
+            item.done = true;
+            true
+        } else {
+            false
+        }
+    }
+
+    pub fn has_incomplete(&self) -> bool {
+        self.todos.iter().any(|item| !item.done)
+    }
+
+    pub fn is_empty(&self) -> bool {
+        self.todos.is_empty()
+    }
+
+    pub fn render_for_model(&self) -> String {
+        let mut lines = Vec::new();
+        if !self.goal.is_empty() {
+            lines.push(format!("Goal: {}", self.goal));
+        }
+        lines.push(format!(
+            "Progress: {}/{} completed",
+            self.completed_count(),
+            self.todos.len()
+        ));
+        for item in &self.todos {
+            let status = if item.done {
+                TodoStatus::Done
+            } else {
+                TodoStatus::Pending
+            };
+            lines.push(format!("  {} {}. {}", status.icon(), item.id, item.desc));
+        }
+        lines.join("\n")
+    }
+
+    pub fn incomplete_count(&self) -> usize {
+        self.todos.iter().filter(|item| !item.done).count()
+    }
+
+    pub fn completed_count(&self) -> usize {
+        self.todos.iter().filter(|item| item.done).count()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_new_and_add() {
+        let mut list = TodoList::new("Map Labs");
+        assert_eq!(list.add("Discover"), 1);
+        assert_eq!(list.add("Map columns"), 2);
+        assert_eq!(list.todos.len(), 2);
+        assert!(list.has_incomplete());
+    }
+
+    #[test]
+    fn test_mark_done() {
+        let mut list = TodoList::new("Test");
+        list.add("Task 1");
+        list.add("Task 2");
+        assert!(list.mark_done(1));
+        assert!(!list.mark_done(99));
+        assert_eq!(list.completed_count(), 1);
+        assert_eq!(list.incomplete_count(), 1);
+    }
+
+    #[test]
+    fn test_empty_list() {
+        let list = TodoList::default();
+        assert!(!list.has_incomplete());
+        assert!(list.is_empty());
+    }
+
+    #[test]
+    fn test_all_done() {
+        let mut list = TodoList::new("Test");
+        list.add("Done task");
+        list.mark_done(1);
+        assert!(!list.has_incomplete());
+    }
+
+    #[test]
+    fn test_render_for_model() {
+        let mut list = TodoList::new("Map Labs");
+        list.add("Discover");
+        list.add("Map");
+        list.mark_done(1);
+        let rendered = list.render_for_model();
+        assert!(rendered.contains("Goal: Map Labs"));
+        assert!(rendered.contains("Progress: 1/2 completed"));
+        assert!(rendered.contains("✓ 1. Discover"));
+        assert!(rendered.contains("○ 2. Map"));
+    }
+
+    #[test]
+    fn test_serialization_roundtrip() {
+        let mut list = TodoList::new("Roundtrip");
+        list.add("Step 1");
+        list.add("Step 2");
+        list.mark_done(1);
+        let json = serde_json::to_string(&list).unwrap();
+        let deserialized: TodoList = serde_json::from_str(&json).unwrap();
+        assert_eq!(deserialized.goal, "Roundtrip");
+        assert_eq!(deserialized.todos.len(), 2);
+        assert!(deserialized.todos[0].done);
+        assert!(!deserialized.todos[1].done);
+    }
+}
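The documentation describes each continuation message as the prompt text followed by the rendered todo list. The sketch below shows one way those two pieces might be joined. `Item`, `render`, and `continuation_message` are simplified, hypothetical stand-ins (no serde derives); the blank-line separator is an assumption taken from the example in `docs/TODO-SYSTEM.md`, because the assembling code is not part of the hunks shown here.

```rust
/// Simplified stand-in for the `TodoItem` type added in this commit.
pub struct Item {
    pub id: usize,
    pub desc: String,
    pub done: bool,
}

/// Renders goal, progress, and items in the same layout as `render_for_model`.
pub fn render(goal: &str, items: &[Item]) -> String {
    let completed = items.iter().filter(|item| item.done).count();
    let mut lines = Vec::new();
    if !goal.is_empty() {
        lines.push(format!("Goal: {goal}"));
    }
    lines.push(format!("Progress: {}/{} completed", completed, items.len()));
    for item in items {
        let icon = if item.done { "✓" } else { "○" };
        lines.push(format!("  {} {}. {}", icon, item.id, item.desc));
    }
    lines.join("\n")
}

/// Hypothetical helper: the prompt, a blank line, then the todo state.
pub fn continuation_message(prompt: &str, goal: &str, items: &[Item]) -> String {
    format!("{prompt}\n\n{}", render(goal, items))
}
```

Sending the full list on every continuation, rather than only the next pending item, keeps the goal and overall progress in front of the model even after earlier messages have been compressed away.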
@@ -1,3 +1,5 @@
+pub(crate) mod todo;
+
 use crate::{
    config::{Agent, Config, GlobalConfig},
    utils::*,
@@ -26,6 +28,7 @@ use std::{
    process::{Command, Stdio},
 };
 use strum_macros::AsRefStr;
+use todo::TODO_FUNCTION_PREFIX;

 #[derive(Embed)]
 #[folder = "assets/functions/"]
@@ -262,6 +265,10 @@ impl Functions {
        self.declarations.is_empty()
    }

+    pub fn append_todo_functions(&mut self) {
+        self.declarations.extend(todo::todo_function_declarations());
+    }
+
    pub fn clear_mcp_meta_functions(&mut self) {
        self.declarations.retain(|d| {
            !d.name.starts_with(MCP_INVOKE_META_FUNCTION_NAME_PREFIX)
@@ -850,7 +857,7 @@ impl ToolCall {
            _ if cmd_name.starts_with(MCP_SEARCH_META_FUNCTION_NAME_PREFIX) => {
                Self::search_mcp_tools(config, &cmd_name, &json_data).unwrap_or_else(|e| {
                    let error_msg = format!("MCP search failed: {e}");
-                    println!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
+                    eprintln!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
                    json!({"tool_call_error": error_msg})
                })
            }
@@ -859,7 +866,7 @@ impl ToolCall {
                    .await
                    .unwrap_or_else(|e| {
                        let error_msg = format!("MCP describe failed: {e}");
-                        println!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
+                        eprintln!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
                        json!({"tool_call_error": error_msg})
                    })
            }
@@ -868,10 +875,17 @@ impl ToolCall {
                    .await
                    .unwrap_or_else(|e| {
                        let error_msg = format!("MCP tool invocation failed: {e}");
-                        println!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
+                        eprintln!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
                        json!({"tool_call_error": error_msg})
                    })
            }
+            _ if cmd_name.starts_with(TODO_FUNCTION_PREFIX) => {
+                todo::handle_todo_tool(config, &cmd_name, &json_data).unwrap_or_else(|e| {
+                    let error_msg = format!("Todo tool failed: {e}");
+                    eprintln!("{}", warning_text(&format!("⚠️ {error_msg} ⚠️")));
+                    json!({"tool_call_error": error_msg})
+                })
+            }
            _ => match run_llm_function(cmd_name, cmd_args, envs, agent_name) {
                Ok(Some(contents)) => serde_json::from_str(&contents)
                    .ok()
@@ -1052,7 +1066,7 @@ pub fn run_llm_function(
            eprintln!("{stderr}");
        }
        let tool_error_message = format!("Tool call '{command_name}' exited with code {exit_code}");
-        println!("{}", warning_text(&format!("⚠️ {tool_error_message} ⚠️")));
+        eprintln!("{}", warning_text(&format!("⚠️ {tool_error_message} ⚠️")));
        let mut error_json = json!({"tool_call_error": tool_error_message});
        if !stderr.is_empty() {
            error_json["stderr"] = json!(stderr);
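The new match arm routes any tool name starting with the `todo__` prefix to `todo::handle_todo_tool`. How that handler distinguishes the four tools is not visible in this part of the diff, so the following is only a sketch of one plausible approach: strip the prefix and match on the remainder. `TodoAction` and `parse_todo_action` are hypothetical names.

```rust
pub const TODO_FUNCTION_PREFIX: &str = "todo__";

/// Hypothetical enum for the four documented todo tools.
#[derive(Debug, PartialEq, Eq)]
pub enum TodoAction {
    Init,
    Add,
    Done,
    List,
}

/// Maps a tool name such as `todo__add` to an action.
/// Returns `None` for names without the prefix or with an unknown suffix.
pub fn parse_todo_action(cmd_name: &str) -> Option<TodoAction> {
    match cmd_name.strip_prefix(TODO_FUNCTION_PREFIX)? {
        "init" => Some(TodoAction::Init),
        "add" => Some(TodoAction::Add),
        "done" => Some(TodoAction::Done),
        "list" => Some(TodoAction::List),
        _ => None,
    }
}
```

Returning `None` for an unknown suffix lets the caller report the problem through the same `tool_call_error` path used for the other built-in tools instead of panicking.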
@@ -0,0 +1,160 @@
+use super::{FunctionDeclaration, JsonSchema};
+use crate::config::GlobalConfig;
+
+use anyhow::{Result, bail};
+use indexmap::IndexMap;
+use serde_json::{Value, json};
+
+pub const TODO_FUNCTION_PREFIX: &str = "todo__";
+
+pub fn todo_function_declarations() -> Vec<FunctionDeclaration> {
+    vec![
+        FunctionDeclaration {
+            name: format!("{TODO_FUNCTION_PREFIX}init"),
+            description: "Initialize a new todo list with a goal. Clears any existing todos."
+                .to_string(),
+            parameters: JsonSchema {
+                type_value: Some("object".to_string()),
+                properties: Some(IndexMap::from([(
+                    "goal".to_string(),
+                    JsonSchema {
+                        type_value: Some("string".to_string()),
+                        description: Some(
+                            "The overall goal to achieve when all todos are completed".into(),
+                        ),
+                        ..Default::default()
+                    },
+                )])),
+                required: Some(vec!["goal".to_string()]),
+                ..Default::default()
+            },
+            agent: false,
+        },
+        FunctionDeclaration {
+            name: format!("{TODO_FUNCTION_PREFIX}add"),
+            description: "Add a new todo item to the list.".to_string(),
+            parameters: JsonSchema {
+                type_value: Some("object".to_string()),
+                properties: Some(IndexMap::from([(
+                    "task".to_string(),
+                    JsonSchema {
+                        type_value: Some("string".to_string()),
+                        description: Some("Description of the todo task".into()),
+                        ..Default::default()
+                    },
+                )])),
+                required: Some(vec!["task".to_string()]),
+                ..Default::default()
+            },
+            agent: false,
+        },
+        FunctionDeclaration {
+            name: format!("{TODO_FUNCTION_PREFIX}done"),
+            description: "Mark a todo item as done by its id.".to_string(),
+            parameters: JsonSchema {
+                type_value: Some("object".to_string()),
+                properties: Some(IndexMap::from([(
+                    "id".to_string(),
+                    JsonSchema {
+                        type_value: Some("integer".to_string()),
+                        description: Some("The id of the todo item to mark as done".into()),
+                        ..Default::default()
+                    },
+                )])),
+                required: Some(vec!["id".to_string()]),
+                ..Default::default()
+            },
+            agent: false,
+        },
+        FunctionDeclaration {
+            name: format!("{TODO_FUNCTION_PREFIX}list"),
+            description: "Display the current todo list with status of each item.".to_string(),
+            parameters: JsonSchema {
+                type_value: Some("object".to_string()),
+                ..Default::default()
+            },
+            agent: false,
+        },
+    ]
+}
+
+pub fn handle_todo_tool(config: &GlobalConfig, cmd_name: &str, args: &Value) -> Result<Value> {
+    let action = cmd_name
+        .strip_prefix(TODO_FUNCTION_PREFIX)
+        .unwrap_or(cmd_name);
+
+    match action {
+        "init" => {
+            let goal = args.get("goal").and_then(Value::as_str).unwrap_or_default();
+            let mut cfg = config.write();
+            let agent = cfg.agent.as_mut();
+            match agent {
+                Some(agent) => {
+                    agent.init_todo_list(goal);
+                    Ok(json!({"status": "ok", "message": "Initialized new todo list"}))
+                }
+                None => bail!("No active agent"),
+            }
+        }
+        "add" => {
+            let task = args.get("task").and_then(Value::as_str).unwrap_or_default();
+            if task.is_empty() {
+                return Ok(json!({"error": "task description is required"}));
+            }
+            let mut cfg = config.write();
+            let agent = cfg.agent.as_mut();
+            match agent {
+                Some(agent) => {
+                    let id = agent.add_todo(task);
+                    Ok(json!({"status": "ok", "id": id}))
+                }
+                None => bail!("No active agent"),
+            }
+        }
+        "done" => {
+            let id = args
+                .get("id")
+                .and_then(|v| {
+                    v.as_u64()
+                        .or_else(|| v.as_str().and_then(|s| s.parse().ok()))
+                })
+                .and_then(|v| usize::try_from(v).ok());
+            match id {
+                Some(id) => {
+                    let mut cfg = config.write();
+                    let agent = cfg.agent.as_mut();
+                    match agent {
+                        Some(agent) => {
+                            if agent.mark_todo_done(id) {
+                                Ok(
+                                    json!({"status": "ok", "message": format!("Marked todo {id} as done")}),
+                                )
+                            } else {
+                                Ok(json!({"error": format!("Todo {id} not found")}))
+                            }
+                        }
+                        None => bail!("No active agent"),
+                    }
+                }
+                None => Ok(json!({"error": "id is required and must be a number"})),
+            }
+        }
+        "list" => {
+            let cfg = config.read();
+            let agent = cfg.agent.as_ref();
+            match agent {
+                Some(agent) => {
+                    let list = agent.todo_list();
+                    if list.is_empty() {
+                        Ok(json!({"goal": "", "todos": []}))
+                    } else {
+                        Ok(serde_json::to_value(list)
+                            .unwrap_or_else(|_| json!({"error": "serialization failed"})))
+                    }
+                }
+                None => bail!("No active agent"),
+            }
+        }
+        _ => bail!("Unknown todo action: {action}"),
+    }
+}
@@ -826,6 +826,14 @@ pub async fn run_repl_command(
            _ => unknown_command()?,
        },
        None => {
+            if config
+                .read()
+                .agent
+                .as_ref()
+                .is_some_and(|a| a.continuation_count() > 0)
+            {
+                config.write().agent.as_mut().unwrap().reset_continuation();
+            }
            let input = Input::from_str(config, line, None);
            ask(config, abort_signal.clone(), input, true).await?;
        }
@@ -874,9 +882,60 @@ async fn ask(
        )
        .await
    } else {
-        Config::maybe_autoname_session(config.clone());
-        Config::maybe_compress_session(config.clone());
-        Ok(())
+        let should_continue = {
+            let cfg = config.read();
+            if let Some(agent) = &cfg.agent {
+                agent.auto_continue_enabled()
+                    && agent.continuation_count() < agent.max_auto_continues()
+                    && !agent.is_stale_response(&output)
+                    && agent.todo_list().has_incomplete()
+            } else {
+                false
+            }
+        };
+
+        if should_continue {
+            let full_prompt = {
+                let mut cfg = config.write();
+                let agent = cfg.agent.as_mut().expect("agent checked above");
+                agent.set_last_continuation_response(output.clone());
+                agent.increment_continuation();
+                let count = agent.continuation_count();
+                let max = agent.max_auto_continues();
+
+                let todo_state = agent.todo_list().render_for_model();
+                let remaining = agent.todo_list().incomplete_count();
+                let prompt = agent.continuation_prompt();
+
+                let color = if cfg.light_theme() {
+                    nu_ansi_term::Color::LightGray
+                } else {
+                    nu_ansi_term::Color::DarkGray
+                };
+                eprintln!(
+                    "\n📋 {}",
+                    color.italic().paint(format!(
+                        "Auto-continuing ({count}/{max}): {remaining} incomplete todo(s) remain"
+                    ))
+                );
+
+                format!("{prompt}\n\n{todo_state}")
+            };
+            let continuation_input = Input::from_str(config, &full_prompt, None);
+            ask(config, abort_signal, continuation_input, false).await
+        } else {
+            if config
+                .read()
+                .agent
+                .as_ref()
+                .is_some_and(|a| a.continuation_count() > 0)
+            {
+                config.write().agent.as_mut().unwrap().reset_continuation();
+            }
+            Config::maybe_autoname_session(config.clone());
+            Config::maybe_compress_session(config.clone());
+            Ok(())
+        }
    }
 }