
Why Docker Sandboxes Exist
AI coding agents (like Claude Code or Gemini CLI) are becoming more autonomous. They can:
- Read your source code
- Install packages and tools
- Run shell commands and scripts
- Modify, move, or delete files
- Maintain state across many steps
This is powerful, but also risky if they run directly on your machine. Docker Sandboxes were created so that these agents can work with your real project files, but inside a controlled, isolated environment (a container) that mirrors your workspace while protecting your actual system.
Think of it as:
Give the agent a realistic copy of my development environment, but put a fence around it.
What Is a Docker Sandbox (In Simple Terms)?
A Docker Sandbox is:
- A local containerized environment created by Docker Desktop
- Specifically designed for AI agents and automation
- Linked to a workspace folder on your machine (for example, ~/my-project)
- Reusable across multiple runs, which allows the agent to maintain:
→ Installed packages
→ Temporary files
→ Other local state
When you run:
docker sandbox run <agent>
Docker will:
- Create a new container, or reuse an existing one, based on a template image.
- Mount the current directory into the container at the corresponding path.
- Inject the user’s Git username and email so that commits are correctly attributed.
- Store sensitive credentials, such as Claude’s API key, in a persistent volume rather than in the host filesystem.
- Initiate the agent within the container, enabling it to execute commands, install packages, and modify files.
This is all integrated into a user-friendly command-line interface (CLI): docker sandbox.
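The steps above are roughly what you would otherwise have to script yourself with plain docker run. The sketch below is a dry run that only prints the assembled command; the image name (sandbox-template-image) and the credential-volume mount target (/home/agent/.claude) are illustrative assumptions, not Docker's actual internals:

```shell
# Rough plain-docker equivalent of what `docker sandbox run claude` automates.
# Image name and credential mount target are assumptions for illustration.
workspace="$PWD"
cmd="docker run -it \
  --volume $workspace:$workspace --workdir $workspace \
  --volume docker-claude-sandbox-data:/home/agent/.claude \
  --env GIT_AUTHOR_NAME=$(git config user.name || echo unknown) \
  sandbox-template-image claude"
echo "$cmd"   # printed as a dry run instead of executed
```

Even this simplified version shows how much bookkeeping (mounts, volumes, identity) the sandbox CLI takes off your hands.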
Understanding How Docker Sandboxes Function Internally
1- One sandbox per workspace
The sandbox lifecycle is limited to a specific workspace directory. If you execute the following command:
cd ~/my-project
docker sandbox run claude
Docker will either create a new sandbox for ~/my-project or reuse an existing sandbox that was previously created for that folder.
Reusing the sandbox ensures that the agent operates within the same environment consistently, which includes:
- Previously installed npm/pip packages
- Generated caches
- Temporary files
This approach is distinct from a conventional ephemeral docker run container, where the container and all associated state are lost on exit unless you manage volumes yourself.
2- Workspace mounting
Docker automatically mounts your current directory into the container at the same path:
- Host: /Users/you/my-project
- Container: /Users/you/my-project
This setup enables the agent inside the container to work directly with your actual project files instead of a copy, all while maintaining an isolated operating system-level environment.
3- Persistent volumes for credentials and state
Credentials, such as API keys, and other important data should be stored in Docker volumes rather than in your workspace. For instance, Claude’s credentials are stored in a designated volume such as docker-claude-sandbox-data.
This approach allows you to:
- Keep sensitive information out of your Git repository.
- Reuse credentials across sessions without the need for re-authentication.
- Easily delete the sandbox and its associated volume if you want to reset everything.
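To act on that last point, a full reset removes both the sandbox and the credential volume. The volume name below is the one cited above for Claude; confirm yours with docker volume ls before deleting anything:

```shell
docker sandbox ls                                # find the sandbox ID
docker sandbox rm <sandbox-id>                   # remove the sandbox
docker volume rm docker-claude-sandbox-data      # remove stored credentials
```

After this, the next docker sandbox run will start from scratch and ask you to authenticate again.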
4- Container isolation
Because the agent operates within a container, it has its own filesystem view and runs as processes inside the Docker virtual machine (VM). It only has access to the directories, environment variables, and network permissions that you configure.
This level of isolation helps prevent agents from:
- Accessing sensitive system paths
- Modifying global system packages
- Escaping the defined workspace boundaries (under normal configurations)
Docker plans to transition from a “container in a VM” model to using microVM-based isolation for enhanced security and improved performance.
Basic Workflow: From Zero to Running a Sandbox
This section provides a comprehensive and practical guide for beginners.
1- Prerequisites
To use Docker Sandboxes locally, you need:
- Docker Desktop 4.50 or later
- An AI agent that supports sandbox mode; currently:
→ Claude Code
→ Gemini CLI
→ with more coming
Docker Desktop is free for individual developers and small teams, provided that they comply with Docker’s licensing terms; no paid Docker plan is necessary.
2- Step-by-step: Run your first sandboxed agent
Assume you have a project in ~/my-project.
1- Open a terminal and go to your workspace:
cd ~/my-project
2- Run the agent in a sandbox:
docker sandbox run claude
3- Upon the initial setup, you will need to authenticate the agent by either logging into Claude Code or pasting an API key. Once authentication is completed, these credentials will be securely stored in a Docker volume for future use.
4- Claude Code (or your chosen agent) starts inside the container and accesses your workspace directory.
From this point forward, using the agent (for example, through an editor integration) will execute code, install packages, and modify files inside the sandbox container, rather than directly on your host operating system.
Managing Your Sandboxes
Docker provides a set of clear command-line interface (CLI) commands designed for the inspection and management of sandboxes.
1- List existing sandboxes
docker sandbox ls
This shows sandbox IDs, their workspace path, status, and creation time.
2- Inspect a sandbox
docker sandbox inspect <sandbox-id>
This returns JSON details that include the following:
- Template image used
- Mounted paths
- Volumes
- Environment configuration
3- Remove a sandbox
docker sandbox rm <sandbox-id>
Use this when:
- You want to reset the environment
- You changed environment variables, volume mounts, or Docker socket access and want them to take effect
- You are done with this project and want to clean up
Removing a sandbox deletes the container and associated persistent state. When you run docker sandbox run <agent> again in the same directory, a fresh sandbox is created with your new configuration.
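If you reset sandboxes often, a small helper can pull the ID for a given workspace out of the listing. The column layout assumed here (ID first, workspace path second) is a guess about the docker sandbox ls format; adjust the awk fields to match what your version actually prints:

```shell
# find_sandbox_id: read `docker sandbox ls` output on stdin and print the ID
# whose workspace column matches the given path. Column positions (ID in
# column 1, workspace in column 2) are an assumption about the listing format.
find_sandbox_id() {
  awk -v ws="$1" 'NR > 1 && $2 == ws { print $1 }'
}

# Example with a captured listing (the ID is made up):
printf 'SANDBOX ID  WORKSPACE              STATUS\nabc123      /Users/you/my-project  running\n' \
  | find_sandbox_id /Users/you/my-project
# → abc123
```

In practice you would chain it: docker sandbox ls | find_sandbox_id "$PWD" | xargs docker sandbox rm.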
Advanced Configuration: Customizing the Sandbox
Docker Sandboxes also provide advanced options, mainly via CLI flags.
1- Environment variables
You can pass environment variables into the sandbox, for example, to configure language, feature flags, or API endpoints:
docker sandbox run -e NODE_ENV=development -e DEBUG=true claude
If you change environment variables later, you need to:
1- Remove the previous sandbox:
docker sandbox ls
docker sandbox rm <sandbox-id>
2- Run the agent again with new flags, so Docker creates a sandbox with the updated configuration.
2- Additional volume mounts
You may want to give the agent access to extra directories, for example:
- A shared /datasets folder
- A directory with common libraries or templates
You can use -v like standard Docker:
docker sandbox run \
-v /path/to/datasets:/datasets \
claude
Again, changes to mounted volumes require recreating the sandbox to take effect.
3- Optional Docker socket access
Sometimes you want the agent to manage containers itself (for example, testing a containerized app). For that, you can enable Docker socket access from the sandbox:
docker sandbox run --mount-docker-socket claude
This gives the agent power to:
- Build images
- Start/stop containers
- Inspect running containers
You should use this carefully because it gives the agent a lot of control over your Docker environment.
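With the socket mounted, the standard Docker CLI inside the sandbox talks to your host's daemon. For example, the agent could run commands like these (the image tag my-app-test is illustrative):

```shell
# Inside the sandbox, with --mount-docker-socket enabled, the regular
# Docker CLI operates against the host's daemon:
docker ps                          # inspect containers running on the host
docker build -t my-app-test .      # build an image from the workspace
docker run --rm my-app-test        # start (and auto-remove) a test container
```

Note that containers started this way run on the host daemon, outside the sandbox's isolation, which is exactly why this flag deserves caution.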
How Is This Different from “Just Using Docker Containers”?
You might wonder: “Why not just run a normal container with docker run and mount my project?”
You can, and many people do. But Docker Sandboxes add higher-level behavior on top of plain containers:
1- Higher-level abstraction for agents
With plain Docker, you must manage:
- Container creation
- Volume configuration
- Environment variables
- Persistent vs ephemeral state
- Credentials storage
- Lifecycle (when to stop/remove/reset)
With Sandboxes, Docker handles:
- One sandbox per workspace
- Automatic workspace mounting
- Persistent, scoped credentials
- Reuse of the environment across runs
- Simple ls, inspect, and rm commands dedicated to sandboxes
2- Designed for AI coding workflows
Sandboxes are optimized for iterative, agent-driven development, where the agent:
- Runs many small commands over time
- Installs new tools as needed
- Modifies files gradually
- Needs to keep state between sessions
Normal containers are usually:
- Started explicitly by you
- Short-lived or service-oriented (API, DB, etc.)
- Not automatically tied to a workspace or agent
Sandboxes are essentially containers with opinionated defaults and a lifecycle tailored to AI agents.
Realistic Example: An Agent That Refactors Your Code
Imagine you have a large Node.js project and you want an AI agent to:
- Convert it gradually to TypeScript
- Introduce a better folder structure
- Update imports and types
- Run tests and fix failures
This involves:
- Installing dev dependencies (TypeScript, type definitions, tools)
- Modifying many source files
- Running scripts (npm test, npm run build)
- Potentially generating helper scripts on the fly
Without a sandbox
If the agent runs directly on your host:
- It installs npm packages globally or into your project
- If a script is malicious or buggy, it can:
→ Delete important files
→ Corrupt your environment
→ Access secrets and other directories
You must trust every command.
With a Docker Sandbox
1- Start in your project directory:
cd ~/my-node-project
docker sandbox run claude
2- The agent runs inside a container that mirrors your project directory. It can safely do things like:
npm install --save-dev typescript @types/node ts-node
npx tsc --init
npm test
3- All package installs and environment tweaks live inside the sandbox. If something goes wrong:
docker sandbox rm <sandbox-id>
And you have a clean state the next time you run docker sandbox run claude.
This gives you:
- The productivity of an autonomous agent
- The safety of an isolated environment
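Because the sandbox mounts your real files (and injects your Git identity for commits), every agent edit shows up in your working tree, so ordinary Git review applies after a session. A minimal simulation in a throwaway repo, assuming git is installed:

```shell
# Simulated review flow in a throwaway git repo (git assumed to be installed).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo "console.log('hi')" > app.js
git add app.js
git -c user.email=dev@example.com -c user.name=dev commit -qm "initial"
echo "console.log('hello')" > app.js    # pretend the agent edited app.js
git status --short                       # shows the modified file: " M app.js"
git diff                                 # line-by-line review of the change
```

In your real project, running git diff before committing is the natural checkpoint after letting an agent loose on the codebase.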
As one article summarized it: Sandboxes provide “safe autonomy” for local coding agents by letting them execute commands, install packages, and modify files in an isolated workspace that mirrors your development setup.
When Should a Beginner Use Docker Sandboxes?
You should consider Docker Sandboxes if:
- You use (or plan to use) AI coding agents that:
- Run shell commands
- Edit many files
- Install tools and dependencies
- You want to experiment with scripts or automation you don’t fully trust yet
- You care about keeping your local machine and global environment clean
- You like the idea of being able to “reset everything” with a single command
If you are just starting with Docker and not yet using AI agents or automated code tools, regular containers and Docker Compose may be enough. But as soon as you introduce autonomous agents touching your codebase, Sandboxes become much more attractive.
Summary
For a beginner, here is the core idea in one sentence:
Docker Sandboxes let AI agents work on your real code in a local environment that feels like your machine, but is actually a safe container you can inspect, reuse, or delete at any time.
Key takeaways:
- They are local, container-based environments for agents and automation.
- They mirror your workspace while isolating execution from your host system.
- They keep state across sessions (packages, temp files) per workspace.
- They offer simple CLI commands: run, ls, inspect, rm.
- They are currently experimental and require Docker Desktop 4.50+.
