docs: Updated docs for the LLM_*_RAW_JSON variable for agents and tools

2026-06-03 14:44:56 -06:00
parent 3bbe8f4b3d
commit 30c7854b31
2 changed files with 74 additions and 66 deletions
@@ -284,36 +284,35 @@ $ ./get_current_time.sh
 Fri Oct 24 05:55:04 PM MDT 2025
 ```

-# Handling large or special-character argument values
+# Reading argument values from `LLM_TOOL_RAW_JSON`

-Coyote dispatches a tool call by converting the LLM's JSON arguments into shell `--option=<value>` flags via `jq`, then 
-`eval`-ing the result. The flag values reach your `main` function as `argc_*` variables. For most calls this works fine.
+Coyote dispatches a bash tool call by converting the LLM's JSON arguments into shell `--option=<value>` flags via `jq`, 
+then `eval`-ing the result. The flag values reach your `main` function as `argc_*` variables. For short, single-line 
+values this works fine.

-However, for **very large values, multi-line values dense with newlines, or values with many markdown table pipes (`|`),
-single quotes, em-dashes, and other shell-significant characters**, the shell-quoting round-trip can occasionally drop 
-characters or truncate the value before it reaches your `argc_*` variable. Symptoms include `argc_*` being shorter than 
-what the LLM sent, or starting mid-content.
+However, for **large multi-line values, or values dense with shell-significant characters** such as markdown table 
+pipes (`|`), single quotes, or em-dashes, the shell-quoting round-trip can occasionally drop characters or truncate the 
+value before it reaches your `argc_*` variable. Symptoms include `argc_*` being shorter than what the LLM sent, or 
+starting mid-content.

-## The escape hatch: `LLM_TOOL_RAW_JSON`
-
-Coyote exports the raw JSON envelope it received from the LLM as the `LLM_TOOL_RAW_JSON` environment variable on every 
-tool invocation. To bypass argc parsing for a specific option, re-derive its value directly from the JSON using `jq`:
+To sidestep the shell-quoting layer entirely, read the value directly from the raw JSON envelope that Coyote exports as 
+the `LLM_TOOL_RAW_JSON` environment variable:

 ```bash
-# Prefer the raw JSON when available, fall back to argc parsing if not
-if [[ -n "$LLM_TOOL_RAW_JSON" ]] && command -v jq >/dev/null 2>&1; then
+# shellcheck disable=SC2154
+main() {
    argc_contents="$(jq -r '.contents' <<< "$LLM_TOOL_RAW_JSON")"
    argc_path="$(jq -r '.path' <<< "$LLM_TOOL_RAW_JSON")"
-fi
+
+    # ... rest of your tool logic using $argc_contents and $argc_path
+}
 ```

 The `jq -r` ("raw") flag emits the original LLM-sent value unquoted (`$(...)` strips only trailing newlines), including interior newlines, quotes, em-dashes, 
-and shell-special characters, without any shell-quoting layer in between. This is the same approach Coyote's bundled 
-`fs_write`, `fs_patch`, `execute_command`, `execute_sql_code`, and `send_mail` tools use for their large-payload options.
-
-The fallback (`fall back to argc parsing if not`) is intentional: if `LLM_TOOL_RAW_JSON` is unset or `jq` isn't 
-installed, the tool still works via the standard argc path. You're adding a more reliable code path, not replacing the 
-existing one.
+and shell-special characters, without any shell-quoting layer in between. This is the pattern Coyote's bundled 
+`fs_write`, `fs_patch`, `execute_command`, `execute_sql_code`, and `send_mail` tools use for their large-payload 
+options. The argc `# @option --foo!` directives stay in your script so Coyote can build the JSON schema for the LLM 
+and validate the call, but your `main()` reads from `LLM_TOOL_RAW_JSON` instead of trusting argc's value capture.

 ## When to use this

@@ -327,13 +326,12 @@ flow handles those reliably.

 ## For agent-local tools

-If you're writing tools inside an agent's `tools.sh` (under `<config_dir>/agents/<agent>/tools.sh`), the same env var 
-is exposed as `LLM_AGENT_RAW_JSON` (the raw JSON for the agent function call). The bypass pattern is identical:
+If you're writing tools inside an agent's `tools.sh` (under `<config_dir>/agents/<agent>/tools.sh`), the same value is 
+exposed as `LLM_AGENT_RAW_JSON` (the raw JSON for the agent function call). The semantics are identical; only the 
+variable name differs:

 ```bash
-if [[ -n "$LLM_AGENT_RAW_JSON" ]] && command -v jq >/dev/null 2>&1; then
-    argc_some_field="$(jq -r '.some_field' <<< "$LLM_AGENT_RAW_JSON")"
-fi
+argc_some_field="$(jq -r '.some_field' <<< "$LLM_AGENT_RAW_JSON")"
 ```

 ---
@@ -27,62 +27,72 @@ to enable it globally. See the [Tools](Tools#enablingdisabling-global-tools) doc
 ## Environment Variables
 All tools have access to the following environment variables that provide context about the current execution environment:

-| Variable             | Description                                                                                                                                                                                                           |
-|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `LLM_OUTPUT`         | Indicates where the output of the tool should go. <br>In certain situations, this may be set to a temporary file instead of `/dev/stdout`.                                                                            |
-| `LLM_ROOT_DIR`       | The root `config_dir` directory for Coyote <br>(i.e. `dirname $(coyote --info \| grep config_file \| awk '{print $2}')`)                                                                                              |
-| `LLM_TOOL_NAME`      | The name of the tool being executed                                                                                                                                                                                   |
-| `LLM_TOOL_CACHE_DIR` | A directory specific to the tool for storing cache or temporary files                                                                                                                                                 |
-| `LLM_TOOL_RAW_JSON`  | The raw JSON envelope the LLM sent for this tool call, exactly as received. Useful as a robustness escape hatch. See [Handling large or special-character values](#handling-large-or-special-character-values) below. |
+| Variable             | Description                                                                                                                                                          |
+|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `LLM_OUTPUT`         | Indicates where the output of the tool should go. <br>In certain situations, this may be set to a temporary file instead of `/dev/stdout`.                           |
+| `LLM_ROOT_DIR`       | The root `config_dir` directory for Coyote <br>(i.e. `dirname $(coyote --info \| grep config_file \| awk '{print $2}')`)                                             |
+| `LLM_TOOL_NAME`      | The name of the tool being executed                                                                                                                                  |
+| `LLM_TOOL_CACHE_DIR` | A directory specific to the tool for storing cache or temporary files                                                                                                |
+| `LLM_TOOL_RAW_JSON`  | The raw JSON envelope the LLM sent for this tool call, exactly as received. See [Reading values via LLM_TOOL_RAW_JSON](#reading-values-via-llm_tool_raw_json) below. |

 Coyote also searches the tools directory on startup for a `.env` file. If found, all tools in `functions/tools/` will have
 the environment variables defined in the `.env` file available to them.

-## Handling large or special-character values
+## Reading values via `LLM_TOOL_RAW_JSON`

-Coyote dispatches a tool call by converting the LLM's JSON arguments into shell `--option=<value>` flags via `jq`, then 
-`eval`-ing the result. For most calls this works fine. For very large values, multi-line values with many newlines, or 
-values dense with markdown table pipes, single quotes, and other shell-significant characters, the shell-quoting 
-round-trip can occasionally drop characters or truncate the value before it reaches your tool's `argc_*` variable.
+Coyote exports the raw JSON envelope it received from the LLM as the `LLM_TOOL_RAW_JSON` environment variable on every 
+tool invocation. Tools can use this to read option values directly from the JSON rather than going through the 
+`argc_*` variables.

-If you observe corruption or truncation of a tool argument (e.g., the value reaching your tool is shorter than what the 
-LLM sent, or starts mid-content), bypass the shell parsing entirely by reading directly from `LLM_TOOL_RAW_JSON`:
+### When to use it
+
+**Bash tools**: This is the recommended pattern for any option that may carry large multi-line content, code, file 
+contents, or values dense with shell-significant characters (markdown table pipes, single quotes, em-dashes, etc.). 
+Coyote's bash dispatcher converts JSON to shell `--option=<value>` flags via `jq` and `eval`-s the result; for large or 
+special-character values, that shell-quoting round-trip can occasionally drop characters or misalign content before it 
+reaches `argc_*`. Reading from `LLM_TOOL_RAW_JSON` bypasses the shell layer entirely.

 ```bash
-# Bash: prefer the raw JSON when available, fall back to argc parsing
-if [[ -n "$LLM_TOOL_RAW_JSON" ]] && command -v jq >/dev/null 2>&1; then
+main() {
    argc_contents="$(jq -r '.contents' <<< "$LLM_TOOL_RAW_JSON")"
    argc_path="$(jq -r '.path' <<< "$LLM_TOOL_RAW_JSON")"
-fi
-```

-```python
-# Python: read os.environ["LLM_TOOL_RAW_JSON"] and parse with json.loads
-import json, os
-raw = os.environ.get("LLM_TOOL_RAW_JSON")
-if raw:
-    payload = json.loads(raw)
-    contents = payload["contents"]
-    path = payload["path"]
-```
-
-```typescript
-// TypeScript: process.env.LLM_TOOL_RAW_JSON, then JSON.parse
-const raw = process.env.LLM_TOOL_RAW_JSON;
-if (raw) {
-    const payload = JSON.parse(raw);
-    const contents = payload.contents;
-    const path = payload.path;
+    # ... rest of your tool logic using $argc_contents and $argc_path
 }
 ```

-This is the same approach Coyote's bundled `fs_write`, `fs_patch`, `execute_command`, `execute_sql_code`, and 
-`send_mail` tools use for their large-payload options. It preserves every byte of the original LLM-sent value, including 
-newlines, quotes, em-dashes, and shell-special characters. If `jq` (bash) or your language's JSON parser is available, 
-prefer this path for any option that may carry user-generated multi-line content.
+This is the pattern Coyote's bundled `fs_write`, `fs_patch`, `execute_command`, `execute_sql_code`, and `send_mail` tools 
+use for their large-payload options. The argc `# @option --foo!` directives stay in your script so Coyote can build the 
+JSON schema for the LLM, but your `main()` reads from `LLM_TOOL_RAW_JSON` instead of trusting argc's value capture.

-For agent-local tools written under `<config_dir>/agents/<agent>/tools.sh`, the same env var is exposed as 
-`LLM_AGENT_RAW_JSON` (the raw JSON payload for the agent function call).
+**Python and TypeScript tools**: Coyote's Python and TypeScript dispatchers parse the JSON envelope natively (`json.loads` 
+/ `JSON.parse`) and pass values directly to your `run()` function as native types. They don't go through shell quoting, 
+so the truncation problem that bash tools work around with `LLM_*_RAW_JSON` doesn't affect them. Declared parameters 
+arrive in your function intact without needing `LLM_TOOL_RAW_JSON`.
+
+Python and TypeScript tools may still want to read `LLM_TOOL_RAW_JSON` for other reasons:
+- Accessing fields the LLM passed that aren't declared in your `run()` signature (telemetry, optional metadata).
+- Auditing or logging the original LLM-sent JSON verbatim.
+- Debugging when a value isn't what you expected.
+
+```python
+# Python: parse the raw JSON to read fields not declared in your run() signature
+import json, os
+payload = json.loads(os.environ["LLM_TOOL_RAW_JSON"])
+extra_field = payload.get("extra_field")
+```
+
+```typescript
+// TypeScript: parse the raw JSON to read fields not declared in your run() signature
+const payload = JSON.parse(process.env.LLM_TOOL_RAW_JSON!);
+const extraField = (payload as Record<string, unknown>).extra_field;
+```
+
+### Agent-local tools
+
+For tools written under `<config_dir>/agents/<agent>/tools.sh` (or `.py` / `.ts`), the same value is exposed as 
+`LLM_AGENT_RAW_JSON`, the raw JSON payload for the agent function call. The semantics are identical; only the variable 
+name differs.

 ## Custom Bash-Based Tools
 To create a Bash-based tool, refer to the [custom bash tools documentation](Custom-Bash-Tools).
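
As a follow-up to the Python snippet under "Reading values via `LLM_TOOL_RAW_JSON`", here is a minimal sketch of a defensive reader for the raw envelope. It assumes a tool may run either as a global tool (`LLM_TOOL_RAW_JSON`) or as an agent-local tool (`LLM_AGENT_RAW_JSON`), and that an unset or malformed variable should yield an empty dict rather than an exception. The helper name `load_raw_payload` and its `env` parameter are hypothetical and not part of Coyote.

```python
import json
import os


def load_raw_payload(env=None):
    """Return the raw LLM arguments as a dict, or {} if none are usable.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    # Global tools see LLM_TOOL_RAW_JSON; agent-local tools see LLM_AGENT_RAW_JSON.
    for name in ("LLM_TOOL_RAW_JSON", "LLM_AGENT_RAW_JSON"):
        raw = env.get(name)
        if not raw:
            continue  # variable unset or empty, try the next one
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON, treat it as absent
        if isinstance(payload, dict):
            return payload
    return {}


if __name__ == "__main__":
    # Read an undeclared field; prints None when the field is not present.
    print(load_raw_payload().get("extra_field"))
```

Returning `{}` keeps the optional-field use cases (logging, metadata, debugging) from crashing the tool when the variable is missing. A tool that requires the envelope may prefer to raise instead.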