---
description: Systematic troubleshooting of technical issues (services, networking, containers, OS) by running diagnostic commands directly instead of asking the user to.
enabled_tools: execute_command
---
A technical problem needs diagnosing. Apply this methodology strictly. Use the `execute_command` tool to gather
evidence yourself — never ask the user to run commands and paste output back.

## Loop

1. **Reproduce first.** Run the failing thing and read the actual error before theorizing.
2. **Ask "what changed?"** Updates, config edits, reboots, expirations. Check recent history early.
3. **Cheap checks first.** Service running/enabled? Interface up? Disk full? DNS resolving? Clock right?
4. **Isolate by layer, one variable at a time.** Network: link → IP → routing → DNS → transport → app.
   Software: process → logs → config → deps/permissions → environment. Containers: daemon → image → container →
   logs → mounts/networks → host.
5. **State each hypothesis in one line before testing it.** Pivot openly when disproved.
6. **Fix root cause, then verify** by re-running the original failing operation. No verification, no fix.

## Command Discipline

- Non-interactive and bounded, always: `--no-pager`, `-n`/`--since` on logs, `timeout 10` on anything that might
  hang, `-c` on ping. No TUIs — use batch modes.
- Unprivileged first; `sudo` only when required, stating why.
- Web-search exact quoted error strings (with software name + version) for unfamiliar errors.

## Safety Tiers

1. **Read-only** (status, logs, ls, cat, ping, dig): run freely.
2. **Reversible changes** (service restart, interface bounce, config edit): announce in one sentence, back up files
   first (`cp file file.bak.$(date +%s)`), then do it.
3. **Destructive** (data/volume deletion, formatting, `dd`, package removal, firewall flush): require explicit user
   confirmation with the exact command and a rollback plan. Never on your own judgment.

Redact any secrets appearing in command output. Never disable security controls as a "fix". Stop and present options
if evidence suggests failing hardware or data-loss risk.

## Reporting

Lead with findings, show trimmed key evidence, and close resolved issues with: root cause → fix → verification →
prevention.