When to use this recipe
- A “fix-this-ticket” workflow where you wire an agent to your issue tracker.
- Batch refactors across many repos: the same agent task, fanned out.
- Evaluation harnesses that compare different agents on the same task.
- A scheduled background worker that picks up tasks from a queue and runs them.
Prerequisites
- A Brimble account with API access enabled.
- The SDK installed and `BRIMBLE_SANDBOX_KEY` set.
- An API key for whichever AI agent you’re running (Anthropic, OpenAI, etc.) to pass in as a sandbox env var.
- Optional but recommended: a Git provider token if the agent needs to clone private repos.
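A preflight check can catch a missing key before any sandbox is provisioned. The sketch below is a hypothetical helper, not part of the Brimble SDK: `missingEnv` and `assertEnv` are illustrative names, and `ANTHROPIC_API_KEY` is an assumption that stands in for whichever agent key you use.

```typescript
// Hypothetical preflight helper; not part of the Brimble SDK.
type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnv(required: readonly string[], env: Env): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// BRIMBLE_SANDBOX_KEY comes from the prerequisites above.
// ANTHROPIC_API_KEY is an assumption: substitute your agent's key variable.
const REQUIRED = ["BRIMBLE_SANDBOX_KEY", "ANTHROPIC_API_KEY"] as const;

// Throws with a readable message if any required variable is missing.
function assertEnv(env: Env): void {
  const missing = missingEnv(REQUIRED, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node process, calling `assertEnv(process.env)` at startup fails fast with the list of missing names.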
Recipe
Available agent templates (see the Templates list for the live catalog):
- `claude-code`, Anthropic’s Claude Code CLI
- `codex`, OpenAI Codex CLI
- `opencode`, OpenCode, open-source agent
- `droid`, Factory’s Droid agent
This recipe uses `claude-code`. Swap the template name and the invocation command for any of the others.
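Keeping the template name and its CLI binary in one lookup makes the swap a one-line change. This is a minimal sketch under stated assumptions: the template names come from the list above (the live catalog may differ), the `opencode` binary name is assumed, and `AGENT_BINARY`, `isAgentTemplate`, and `binaryFor` are hypothetical helpers rather than SDK exports. Flags are left out on purpose because they differ per agent.

```typescript
// Template names copied from the list above; check the live catalog before relying on them.
const AGENT_TEMPLATES = ["claude-code", "codex", "opencode", "droid"] as const;
type AgentTemplate = (typeof AGENT_TEMPLATES)[number];

// Binary invoked inside each template. The opencode entry is an assumption.
const AGENT_BINARY: Record<AgentTemplate, string> = {
  "claude-code": "claude",
  codex: "codex",
  opencode: "opencode",
  droid: "droid",
};

// Type guard: narrows an arbitrary string to a known template name.
function isAgentTemplate(name: string): name is AgentTemplate {
  return AGENT_TEMPLATES.some((template) => template === name);
}

// Looks up the binary for a template, rejecting unknown names early.
function binaryFor(name: string): string {
  if (!isAgentTemplate(name)) {
    throw new Error(`Unknown agent template: ${name}`);
  }
  return AGENT_BINARY[name];
}
```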
What’s happening
- Pick the agent template. `claude-code` ships with the Claude Code CLI preinstalled and configured for unattended runs. Swap to `codex`, `droid`, or `opencode` and adjust the binary name in the exec command (`codex --print ...`, `droid run ...`, etc.).
- Persistent disk on, 20 GB. The agent’s checkout, `node_modules`, model caches, and intermediate files all live on the workspace volume. A snapshot at the end captures that state for replay or audit.
- `autoDestroy` with a 3-hour ceiling. Agent runs can be long; a 30-minute hard timeout would kill the work mid-flight. Three hours is a reasonable backstop. Set `oneShot: true` instead if you want the sandbox to terminate the moment `claude --print` exits.
- Outbound network is on. The agent needs to reach the model API (`api.anthropic.com`, `api.openai.com`, etc.). If you want stricter isolation, pre-bake any tool the agent needs into the snapshot and turn `blockOutbound: true` on for the actual run.
- API key as a shell env var, not a build-time secret. Sandboxes don’t bake env vars into the template; you set them per-exec.
- Diff capture after the run. `git add -A && git diff --staged` shows everything the agent touched, including new files. Pipe it to your own review tool or PR-creation flow.
- Snapshot before destroy. Cheap insurance: if the diff looks wrong or the agent did something unexpected, you can restore the snapshot into a fresh sandbox and inspect.
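Before handing the captured diff to a review tool, it can help to reduce it to a list of touched files. The sketch below parses the text output of `git diff --staged`; `summarizeDiff` and its types are hypothetical helpers, and the parser is deliberately simple (it does not handle renames or paths that git quotes).

```typescript
type ChangeKind = "added" | "deleted" | "modified";

interface FileChange {
  path: string;
  kind: ChangeKind;
}

// Walks a unified diff and records one entry per "diff --git" header.
// A file counts as modified unless a "new file mode" or "deleted file mode"
// line follows its header.
function summarizeDiff(diff: string): FileChange[] {
  const changes: FileChange[] = [];
  let current: FileChange | undefined;

  for (const line of diff.split(/\r?\n/)) {
    const header = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
    if (header) {
      current = { path: header[2], kind: "modified" };
      changes.push(current);
    } else if (current && /^new file mode /.test(line)) {
      current.kind = "added";
    } else if (current && /^deleted file mode /.test(line)) {
      current.kind = "deleted";
    }
  }
  return changes;
}
```

Hunk content lines always start with `+`, `-`, or a space, so they cannot be mistaken for the header lines matched here.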
Variations
- Resume an interrupted run. Pass `fromSnapshot` on `createReady` to start a new sandbox from the agent’s last state. Useful when an agent hits the timeout mid-task and you want to continue.
- Reuse a checkout across runs. Create a `sandbox`-type volume, attach it on first run, and pass the same `volumeId` on subsequent runs. The repo is already cloned, dependencies are already installed.
- Fan out across many tasks. Loop `runAgent` over a task list. Each call provisions an isolated sandbox; the platform’s concurrency cap is your only ceiling.
- Different agents, same task. Swap the template (`claude-code` → `codex` → `droid`) on the same input and compare the resulting diffs.
- Background mode with logs. Replace the inline exec with two calls: one to start the agent in the background (`nohup claude ... &`), one to tail `/work/agent.log` while it runs. You’ll need `getFile` to stream the log.
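For the fan-out variation, a small worker pool keeps the number of in-flight runs under a limit you choose. This is a generic sketch, not SDK code: `fanOut` and `Outcome` are hypothetical names, and the `worker` parameter stands in for the recipe's `runAgent`, whose exact signature is assumed here.

```typescript
// Result of one task: either the worker's value or the error it threw.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `worker` over every task with at most `limit` calls in flight.
// Results keep the order of `tasks`; one failure does not stop the others.
async function fanOut<T, R>(
  tasks: readonly T[],
  limit: number,
  worker: (task: T) => Promise<R>
): Promise<Outcome<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: Outcome<R>[] = new Array(tasks.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(tasks[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const lanes = Array.from({ length: Math.min(limit, tasks.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

A call such as `fanOut(tasks, 4, runAgent)` would run four sandboxes at a time. The platform's concurrency cap still applies, so pick a `limit` at or below it.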
Next steps
- Sandboxes overview, templates and lifecycle.
- Snapshots, the snapshot lifecycle and restore semantics.
- Run untrusted code, the simpler one-shot pattern this recipe extends.