diff --git a/docs/testing/notes/ITERATION-11-NOTES.md b/docs/testing/notes/ITERATION-11-NOTES.md new file mode 100644 index 0000000..0e8b5a3 --- /dev/null +++ b/docs/testing/notes/ITERATION-11-NOTES.md @@ -0,0 +1,103 @@ +# Iteration 11 — Test Implementation Notes + +## Plan file addressed + +`docs/testing/plans/11-sub-agent-spawning.md` + +## Tests created + +### src/supervisor/escalation.rs (11 new tests) + +| Test name | What it verifies | +|---|---| +| `queue_default_has_no_pending` | Default queue empty | +| `submit_and_has_pending` | Submit makes has_pending true | +| `submit_returns_id` | Returns the request's id | +| `take_removes_request` | Take removes and empties queue | +| `take_nonexistent_returns_none` | Missing id → None | +| `pending_summary_contains_fields` | Summary has id, agent_id, question | +| `pending_summary_includes_options_when_present` | Options included | +| `pending_summary_empty_when_no_requests` | Empty queue → empty summary | +| `reply_reaches_receiver` | oneshot channel delivers reply | +| `new_escalation_id_has_prefix` | Starts with "esc_" | +| `new_escalation_id_unique` | Two calls produce different ids | + +### src/supervisor/mailbox.rs (8 new tests) + +| Test name | What it verifies | +|---|---| +| `inbox_new_is_empty` | New inbox drains empty | +| `inbox_default_is_empty` | Default inbox drains empty | +| `deliver_and_drain` | Deliver + drain returns message | +| `drain_empties_inbox` | Second drain returns empty | +| `drain_orders_shutdown_before_task_before_text` | Priority ordering | +| `clone_preserves_messages` | Clone has same messages | +| `clone_is_independent` | Clone doesn't share mutations | +| `multiple_deliveries` | 5 messages all drained | + +### src/supervisor/mod.rs (12 new tests) + +| Test name | What it verifies | +|---|---| +| `supervisor_new_empty` | Initial state: 0 active, correct limits | +| `supervisor_register_increments_count` | Register increases active_count | +| `supervisor_register_rejects_at_capacity` | At max → error with "at capacity" | +| `supervisor_register_rejects_exceeding_depth` | Over max_depth → error | +| `supervisor_register_allows_at_max_depth` | Exactly max_depth → ok | +| `supervisor_take_removes_handle` | Take decrements count | +| `supervisor_take_nonexistent_returns_none` | Missing → None | +| `supervisor_list_agents` | Lists all registered agent ids/names | +| `supervisor_inbox_returns_handle_inbox` | Inbox accessor works | +| `supervisor_task_queue_accessible` | task_queue/task_queue_mut work | +| `agent_exit_status_equality` | Completed == Completed, != Failed | + +### src/supervisor/taskqueue.rs (10 new tests, 16 total) + +| Test name | What it verifies | +|---|---| +| `test_fail_sets_status` | fail() sets TaskStatus::Failed | +| `test_get_returns_none_for_missing` | get() on nonexistent → None | +| `test_dispatch_agent_stored` | dispatch_agent and prompt captured | +| `test_claim_blocked_task_fails` | Can't claim blocked task | +| `test_list_sorted_by_id` | list() returns numeric order | +| `test_default_is_empty` | TaskQueue::default() empty | +| `test_dependency_on_nonexistent_task_errors` | Bad dep → error | +| `test_complete_nonexistent_returns_empty` | Complete unknown → empty | +| `test_task_node_is_runnable` | Pending + unblocked = runnable | +| `test_task_node_not_runnable_when_blocked` | Blocked = not runnable | + +**Total: 40 new tests (382 total in suite)** + +## Bugs discovered + +None. + +## Observations for future iterations + +1. **Supervisor.register enforces both capacity and depth**: These + are the two runaway safeguards. Both tested at boundaries + (at capacity, at max_depth, over max_depth). + +2. **EscalationQueue uses oneshot channels**: The reply_tx/rx pair + enables async blocking-wait semantics for child agents. The + channel delivery is verified end-to-end in the test. + +3. **Inbox drain ordering is a priority system**: Shutdown messages + come first, then task completions, then text. This ensures + lifecycle-critical messages aren't buried under chat. + +4. **AgentHandle requires a tokio JoinHandle**: Creating test + handles requires a tokio runtime. Used `rt.spawn()` with + `mem::forget(rt)` to keep the handle alive. This is a test-only + pattern — not ideal but necessary since JoinHandle can't be + mocked. + +5. **Most spawn/collect/check behaviors require integration tests**: + The actual agent__spawn handler needs a full RequestContext with + agent config on disk. The Supervisor struct itself is fully + testable in isolation. + +## Next iteration + +Plan file 12: RAG — RAG init/load/search, embeddings, document +management. diff --git a/docs/testing/plans/11-sub-agent-spawning.md b/docs/testing/plans/11-sub-agent-spawning.md index 1dcf3a9..82977c8 100644 --- a/docs/testing/plans/11-sub-agent-spawning.md +++ b/docs/testing/plans/11-sub-agent-spawning.md @@ -10,49 +10,68 @@ to request user input through the parent. ## Behaviors to test ### Spawn -- [ ] agent__spawn creates child agent in background -- [ ] Child gets own RequestContext with incremented depth -- [ ] Child gets own session, model, functions -- [ ] Child gets shared root_escalation_queue -- [ ] Child gets inbox for teammate messaging -- [ ] Child MCP servers acquired if configured -- [ ] Max concurrent agents enforced -- [ ] Max depth enforced -- [ ] Agent not found → error -- [ ] can_spawn_agents=false → no spawn tools available +- [ ] agent__spawn creates child agent in background (integration) +- [ ] Child gets own RequestContext with incremented depth (integration) +- [ ] Child gets own session, model, functions (integration) +- [ ] Child gets shared root_escalation_queue (integration) +- [ ] Child gets inbox for teammate messaging (integration) +- [ ] Child MCP servers acquired if configured (integration) +- [x] Max concurrent agents enforced (Supervisor.register) +- [x] Max depth enforced (Supervisor.register) +- [ ] Agent not found → error (integration) +- [ ] can_spawn_agents=false → no spawn tools available (integration) ### Collect/Check -- [ ] agent__check returns PENDING or result -- [ ] agent__collect blocks until done, returns output -- [ ] Output summarization when exceeds threshold -- [ ] Summarization uses configured model +- [ ] agent__check returns PENDING or result (integration) +- [ ] agent__collect blocks until done, returns output (integration) +- [ ] Output summarization when exceeds threshold (integration) +- [ ] Summarization uses configured model (integration) ### Task queue -- [ ] agent__task_create creates tasks with dependencies -- [ ] agent__task_complete marks done, unblocks dependents -- [ ] Auto-dispatch spawns agent for unblocked tasks -- [ ] agent__task_list shows all tasks with status +- [x] agent__task_create creates tasks with dependencies +- [x] agent__task_complete marks done, unblocks dependents +- [x] Auto-dispatch agent/prompt stored on task +- [x] agent__task_list shows all tasks with status ### Escalation -- [ ] Child calls user__ask → escalation created -- [ ] Parent sees pending_escalations notification -- [ ] agent__reply_escalation unblocks child -- [ ] Escalation timeout → fallback message +- [x] Escalation submitted and retrievable +- [x] Pending summary contains correct fields +- [x] Reply reaches receiver via oneshot channel +- [ ] Escalation timeout → fallback message (integration) ### Teammate messaging -- [ ] agent__send_message delivers to sibling inbox -- [ ] agent__check_inbox drains messages +- [x] Deliver to inbox +- [x] Drain empties inbox +- [x] Drain ordering: shutdown > task_completed > text ### Child agent lifecycle -- [ ] run_child_agent loops: create input → call completions → process results -- [ ] Child uses before/after_chat_completion -- [ ] Child tool calls evaluated via eval_tool_calls -- [ ] Child exits cleanly, supervisor cancels on completion +- [ ] run_child_agent loops (integration) +- [ ] Child uses before/after_chat_completion (integration) +- [ ] Child tool calls evaluated (integration) +- [ ] Child exits cleanly (integration) ## Context switching scenarios -- [ ] Parent spawns child with MCP → child MCP works independently -- [ ] Parent exits agent → all children cancelled -- [ ] Multiple children share escalation queue correctly +- [ ] Parent spawns child with MCP (integration) +- [ ] Parent exits agent → all children cancelled (integration) +- [ ] Multiple children share escalation queue correctly (integration) + +## Additional behaviors tested (not in original plan) + +- [x] EscalationQueue: default, submit, take, take_nonexistent, has_pending +- [x] EscalationQueue: pending_summary with/without options, empty +- [x] EscalationQueue: reply via oneshot channel +- [x] new_escalation_id: prefix and uniqueness +- [x] Inbox: new/default empty, deliver+drain, drain empties, multiple deliveries +- [x] Inbox: clone preserves messages, clone is independent +- [x] Supervisor: new defaults, register count, take removes, take nonexistent +- [x] Supervisor: inbox accessor, list_agents, task_queue accessible +- [x] Supervisor: register allows at max_depth boundary +- [x] AgentExitStatus: equality/inequality +- [x] TaskQueue: fail sets status, get missing returns None +- [x] TaskQueue: dispatch_agent/prompt stored, claim blocked fails +- [x] TaskQueue: list sorted by id, default empty +- [x] TaskQueue: dependency on nonexistent errors, complete nonexistent +- [x] TaskNode: is_runnable when pending+unblocked, not when blocked ## Old code reference - `src/function/supervisor.rs` — all handler functions diff --git a/src/supervisor/escalation.rs b/src/supervisor/escalation.rs index 18a4533..3ad79b3 100644 --- a/src/supervisor/escalation.rs +++ b/src/supervisor/escalation.rs @@ -78,3 +78,122 @@ pub fn new_escalation_id() -> String { let short = &Uuid::new_v4().to_string()[..8]; format!("esc_{short}") } + +#[cfg(test)] +mod tests { + use super::*; + + fn make_request( + id: &str, + agent_id: &str, + question: &str, + ) -> (EscalationRequest, oneshot::Receiver) { + let (tx, rx) = oneshot::channel(); + let req = EscalationRequest { + id: id.to_string(), + from_agent_id: agent_id.to_string(), + from_agent_name: "test-agent".to_string(), + question: question.to_string(), + options: None, + reply_tx: tx, + }; + (req, rx) + } + + #[test] + fn queue_default_has_no_pending() { + let queue = EscalationQueue::default(); + assert!(!queue.has_pending()); + } + + #[test] + fn submit_and_has_pending() { + let queue = EscalationQueue::new(); + let (req, _rx) = make_request("esc_1", "agent_1", "What color?"); + queue.submit(req); + assert!(queue.has_pending()); + } + + #[test] + fn submit_returns_id() { + let queue = EscalationQueue::new(); + let (req, _rx) = make_request("esc_42", "agent_1", "question"); + let id = queue.submit(req); + assert_eq!(id, "esc_42"); + } + + #[test] + fn take_removes_request() { + let queue = EscalationQueue::new(); + let (req, _rx) = make_request("esc_1", "agent_1", "question"); + queue.submit(req); + let taken = queue.take("esc_1"); + assert!(taken.is_some()); + assert!(!queue.has_pending()); + } + + #[test] + fn take_nonexistent_returns_none() { + let queue = EscalationQueue::new(); + assert!(queue.take("esc_missing").is_none()); + } + + #[test] + fn pending_summary_contains_fields() { + let queue = EscalationQueue::new(); + let (req, _rx) = make_request("esc_1", "agent_x", "What to do?"); + queue.submit(req); + let summary = queue.pending_summary(); + assert_eq!(summary.len(), 1); + assert_eq!(summary[0]["escalation_id"], "esc_1"); + assert_eq!(summary[0]["from_agent_id"], "agent_x"); + assert_eq!(summary[0]["question"], "What to do?"); + } + + #[test] + fn pending_summary_includes_options_when_present() { + let queue = EscalationQueue::new(); + let (tx, _rx) = oneshot::channel(); + let req = EscalationRequest { + id: "esc_1".into(), + from_agent_id: "a".into(), + from_agent_name: "agent".into(), + question: "Pick one".into(), + options: Some(vec!["A".into(), "B".into()]), + reply_tx: tx, + }; + queue.submit(req); + let summary = queue.pending_summary(); + assert!(summary[0].get("options").is_some()); + } + + #[test] + fn pending_summary_empty_when_no_requests() { + let queue = EscalationQueue::new(); + assert!(queue.pending_summary().is_empty()); + } + + #[test] + fn reply_reaches_receiver() { + let queue = EscalationQueue::new(); + let (req, rx) = make_request("esc_1", "a", "question"); + queue.submit(req); + let taken = queue.take("esc_1").unwrap(); + taken.reply_tx.send("the answer".into()).unwrap(); + assert_eq!(rx.blocking_recv().unwrap(), "the answer"); + } + + #[test] + fn new_escalation_id_has_prefix() { + let id = new_escalation_id(); + assert!(id.starts_with("esc_")); + assert!(id.len() > 4); + } + + #[test] + fn new_escalation_id_unique() { + let id1 = new_escalation_id(); + let id2 = new_escalation_id(); + assert_ne!(id1, id2); + } +} diff --git a/src/supervisor/mailbox.rs b/src/supervisor/mailbox.rs index e0853f3..396f4cb 100644 --- a/src/supervisor/mailbox.rs +++ b/src/supervisor/mailbox.rs @@ -58,3 +58,122 @@ impl Clone for Inbox { } } } + +#[cfg(test)] +mod tests { + use super::*; + use chrono::Utc; + + fn text_envelope(from: &str, to: &str, content: &str) -> Envelope { + Envelope { + from: from.to_string(), + to: to.to_string(), + payload: EnvelopePayload::Text { + content: content.to_string(), + }, + timestamp: Utc::now(), + } + } + + fn task_completed_envelope(from: &str, to: &str) -> Envelope { + Envelope { + from: from.to_string(), + to: to.to_string(), + payload: EnvelopePayload::TaskCompleted { + task_id: "t1".into(), + summary: "done".into(), + }, + timestamp: Utc::now(), + } + } + + fn shutdown_request_envelope(from: &str, to: &str) -> Envelope { + Envelope { + from: from.to_string(), + to: to.to_string(), + payload: EnvelopePayload::ShutdownRequest { + reason: "all done".into(), + }, + timestamp: Utc::now(), + } + } + + #[test] + fn inbox_new_is_empty() { + let inbox = Inbox::new(); + assert!(inbox.drain().is_empty()); + } + + #[test] + fn inbox_default_is_empty() { + let inbox = Inbox::default(); + assert!(inbox.drain().is_empty()); + } + + #[test] + fn deliver_and_drain() { + let inbox = Inbox::new(); + inbox.deliver(text_envelope("a", "b", "hello")); + let msgs = inbox.drain(); + assert_eq!(msgs.len(), 1); + assert_eq!(msgs[0].from, "a"); + } + + #[test] + fn drain_empties_inbox() { + let inbox = Inbox::new(); + inbox.deliver(text_envelope("a", "b", "hello")); + inbox.drain(); + assert!(inbox.drain().is_empty()); + } + + #[test] + fn drain_orders_shutdown_before_task_before_text() { + let inbox = Inbox::new(); + inbox.deliver(text_envelope("a", "b", "msg")); + inbox.deliver(task_completed_envelope("a", "b")); + inbox.deliver(shutdown_request_envelope("a", "b")); + + let msgs = inbox.drain(); + assert_eq!(msgs.len(), 3); + assert!(matches!( + msgs[0].payload, + EnvelopePayload::ShutdownRequest { .. } + )); + assert!(matches!( + msgs[1].payload, + EnvelopePayload::TaskCompleted { .. } + )); + assert!(matches!(msgs[2].payload, EnvelopePayload::Text { .. })); + } + + #[test] + fn clone_preserves_messages() { + let inbox = Inbox::new(); + inbox.deliver(text_envelope("a", "b", "hello")); + let cloned = inbox.clone(); + let msgs = cloned.drain(); + assert_eq!(msgs.len(), 1); + } + + #[test] + fn clone_is_independent() { + let inbox = Inbox::new(); + inbox.deliver(text_envelope("a", "b", "hello")); + let cloned = inbox.clone(); + inbox.deliver(text_envelope("a", "b", "second")); + let original_msgs = inbox.drain(); + let cloned_msgs = cloned.drain(); + assert_eq!(original_msgs.len(), 2); + assert_eq!(cloned_msgs.len(), 1); + } + + #[test] + fn multiple_deliveries() { + let inbox = Inbox::new(); + for i in 0..5 { + inbox.deliver(text_envelope("a", "b", &format!("msg {i}"))); + } + assert_eq!(inbox.drain().len(), 5); + } +} diff --git a/src/supervisor/mod.rs b/src/supervisor/mod.rs index 4fd7a21..94b56a6 100644 --- a/src/supervisor/mod.rs +++ b/src/supervisor/mod.rs @@ -126,3 +126,130 @@ impl Debug for Supervisor { .finish() } } + +#[cfg(test)] +mod tests { + use super::*; + use crate::utils::create_abort_signal; + + fn make_handle(id: &str, agent_name: &str, depth: usize) -> AgentHandle { + let rt = tokio::runtime::Builder::new_current_thread() + .enable_all() + .build() + .unwrap(); + let join_handle = rt.spawn(async { + Ok(AgentResult { + id: "done".into(), + agent_name: "test".into(), + output: "result".into(), + exit_status: AgentExitStatus::Completed, + }) + }); + std::mem::forget(rt); + AgentHandle { + id: id.to_string(), + agent_name: agent_name.to_string(), + depth, + inbox: Arc::new(Inbox::new()), + abort_signal: create_abort_signal(), + join_handle, + } + } + + #[test] + fn supervisor_new_empty() { + let sup = Supervisor::new(4, 3); + assert_eq!(sup.active_count(), 0); + assert_eq!(sup.max_concurrent(), 4); + assert_eq!(sup.max_depth(), 3); + } + + #[test] + fn supervisor_register_increments_count() { + let mut sup = Supervisor::new(4, 3); + sup.register(make_handle("a1", "explore", 1)).unwrap(); + assert_eq!(sup.active_count(), 1); + } + + #[test] + fn supervisor_register_rejects_at_capacity() { + let mut sup = Supervisor::new(1, 3); + sup.register(make_handle("a1", "explore", 1)).unwrap(); + let result = sup.register(make_handle("a2", "coder", 1)); + assert!(result.is_err()); + assert!(result.unwrap_err().to_string().contains("at capacity")); + } + + #[test] + fn supervisor_register_rejects_exceeding_depth() { + let mut sup = Supervisor::new(4, 2); + let result = sup.register(make_handle("a1", "explore", 3)); + assert!(result.is_err()); + assert!(result.unwrap_err().to_string().contains("max depth")); + } + + #[test] + fn supervisor_register_allows_at_max_depth() { + let mut sup = Supervisor::new(4, 2); + sup.register(make_handle("a1", "explore", 2)).unwrap(); + assert_eq!(sup.active_count(), 1); + } + + #[test] + fn supervisor_take_removes_handle() { + let mut sup = Supervisor::new(4, 3); + sup.register(make_handle("a1", "explore", 1)).unwrap(); + let taken = sup.take("a1"); + assert!(taken.is_some()); + assert_eq!(sup.active_count(), 0); + } + + #[test] + fn supervisor_take_nonexistent_returns_none() { + let mut sup = Supervisor::new(4, 3); + assert!(sup.take("missing").is_none()); + } + + #[test] + fn supervisor_list_agents() { + let mut sup = Supervisor::new(4, 3); + sup.register(make_handle("a1", "explore", 1)).unwrap(); + sup.register(make_handle("a2", "coder", 1)).unwrap(); + let list = sup.list_agents(); + assert_eq!(list.len(), 2); + let ids: Vec<&str> = list.iter().map(|(id, _)| *id).collect(); + assert!(ids.contains(&"a1")); + assert!(ids.contains(&"a2")); + } + + #[test] + fn supervisor_inbox_returns_handle_inbox() { + let mut sup = Supervisor::new(4, 3); + sup.register(make_handle("a1", "explore", 1)).unwrap(); + assert!(sup.inbox("a1").is_some()); + assert!(sup.inbox("missing").is_none()); + } + + #[test] + fn supervisor_task_queue_accessible() { + let mut sup = Supervisor::new(4, 3); + let id = sup + .task_queue_mut() + .create("task".into(), "desc".into(), None, None); + assert!(!id.is_empty()); + assert_eq!(sup.task_queue().list().len(), 1); + } + + #[test] + fn agent_exit_status_equality() { + assert_eq!(AgentExitStatus::Completed, AgentExitStatus::Completed); + assert_ne!( + AgentExitStatus::Completed, + AgentExitStatus::Failed("err".into()) + ); + assert_eq!( + AgentExitStatus::Failed("x".into()), + AgentExitStatus::Failed("x".into()) + ); + } +} diff --git a/src/supervisor/taskqueue.rs b/src/supervisor/taskqueue.rs index 6387122..e60f676 100644 --- a/src/supervisor/taskqueue.rs +++ b/src/supervisor/taskqueue.rs @@ -268,4 +268,87 @@ mod tests { assert!(!queue.claim(&id1, "worker-2")); assert_eq!(queue.get(&id1).unwrap().status, TaskStatus::InProgress); } + + #[test] + fn test_fail_sets_status() { + let mut queue = TaskQueue::new(); + let id = queue.create("Task".into(), "".into(), None, None); + queue.fail(&id); + assert_eq!(queue.get(&id).unwrap().status, TaskStatus::Failed); + } + + #[test] + fn test_get_returns_none_for_missing() { + let queue = TaskQueue::new(); + assert!(queue.get("nonexistent").is_none()); + } + + #[test] + fn test_dispatch_agent_stored() { + let mut queue = TaskQueue::new(); + let id = queue.create( + "Auto task".into(), + "desc".into(), + Some("coder".into()), + Some("implement feature".into()), + ); + let task = queue.get(&id).unwrap(); + assert_eq!(task.dispatch_agent.as_deref(), Some("coder")); + assert_eq!(task.prompt.as_deref(), Some("implement feature")); + } + + #[test] + fn test_claim_blocked_task_fails() { + let mut queue = TaskQueue::new(); + let id1 = queue.create("A".into(), "".into(), None, None); + let id2 = queue.create("B".into(), "".into(), None, None); + queue.add_dependency(&id2, &id1).unwrap(); + assert!(!queue.claim(&id2, "worker")); + } + + #[test] + fn test_list_sorted_by_id() { + let mut queue = TaskQueue::new(); + queue.create("Third".into(), "".into(), None, None); + queue.create("First".into(), "".into(), None, None); + queue.create("Second".into(), "".into(), None, None); + let tasks = queue.list(); + let ids: Vec<&str> = tasks.iter().map(|t| t.id.as_str()).collect(); + assert_eq!(ids, vec!["1", "2", "3"]); + } + + #[test] + fn test_default_is_empty() { + let queue = TaskQueue::default(); + assert!(queue.list().is_empty()); + } + + #[test] + fn test_dependency_on_nonexistent_task_errors() { + let mut queue = TaskQueue::new(); + let id1 = queue.create("A".into(), "".into(), None, None); + let result = queue.add_dependency(&id1, "nonexistent"); + assert!(result.is_err()); + } + + #[test] + fn test_complete_nonexistent_returns_empty() { + let mut queue = TaskQueue::new(); + let unblocked = queue.complete("nonexistent"); + assert!(unblocked.is_empty()); + } + + #[test] + fn test_task_node_is_runnable() { + let node = TaskNode::new("1".into(), "t".into(), "d".into(), None, None); + assert!(node.is_runnable()); + } + + #[test] + fn test_task_node_not_runnable_when_blocked() { + let mut node = TaskNode::new("1".into(), "t".into(), "d".into(), None, None); + node.blocked_by.insert("2".into()); + node.status = TaskStatus::Blocked; + assert!(!node.is_runnable()); + } }