Docker Sandboxes: A Complete Beginner’s Guide (End-to-End)

Why Docker Sandboxes Exist

AI coding agents (like Claude Code or Gemini CLI) are becoming more autonomous. They can:

  • Read your source code
  • Install packages and tools
  • Run shell commands and scripts
  • Modify, move, or delete files
  • Keep state across many steps

This is powerful, but also risky if they run directly on your machine. Docker Sandboxes were created so that these agents can work with your real project files, but inside a controlled, isolated environment (a container) that mirrors your workspace while protecting your actual system.

Think of it as:

Give the agent a realistic copy of my development environment, but put a fence around it.

What Is a Docker Sandbox (In Simple Terms)?

A Docker Sandbox is:

  • A local containerized environment created by Docker Desktop
  • Specifically designed for AI agents and automation
  • Linked to a workspace folder on your machine (for example: ~/my-project)
  • Reusable across multiple runs, which allows the agent to maintain:

→ Installed packages

→ Temporary files

→ Other local state

When you run:

docker sandbox run <agent>

Docker will:

  1. Create a container from a template image, or reuse an existing one.
  2. Mount the current directory into the container at the same path.
  3. Inject your Git username and email so that commits are correctly attributed.
  4. Store sensitive credentials, such as Claude’s API key, in a persistent volume rather than in the host filesystem.
  5. Start the agent inside the container, where it can execute commands, install packages, and modify files.

This is all integrated into a user-friendly command-line interface (CLI): docker sandbox.

Understanding How Docker Sandboxes Function Internally

1- One sandbox per workspace

The sandbox lifecycle is tied to a specific workspace directory. If you execute the following commands:

cd ~/my-project
docker sandbox run claude

Docker either creates a new sandbox for ~/my-project or reuses the sandbox that was previously created for that folder.

Reusing the sandbox ensures that the agent operates within the same environment consistently, which includes:

  • Previously installed npm/pip packages
  • Generated caches, and
  • Temporary files

This is different from a throwaway container started with docker run --rm, where the container and its state are discarded on exit unless you set up volumes yourself.

2- Workspace mounting

Docker automatically mounts your current directory into the container at the same path:

  • Host: /Users/you/my-project
  • Container: /Users/you/my-project

This setup enables the agent inside the container to work directly with your actual project files instead of a copy, all while maintaining an isolated operating system-level environment.
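To see what same-path mounting means in plain Docker terms, here is a minimal sketch. It only prints an equivalent docker run command rather than executing it. The function name and the alpine image are illustrative assumptions, not what Docker Sandboxes uses internally.

```shell
#!/bin/sh
# Print a plain `docker run` command that mounts a workspace at the same
# absolute path inside the container. This approximates the mounting
# behavior described above; it is not the sandbox implementation.
same_path_mount_cmd() {
    workspace=${1:-$PWD}
    # -v host:container uses the same path on both sides;
    # -w makes that path the working directory inside the container.
    printf 'docker run --rm -v "%s:%s" -w "%s" alpine ls\n' \
        "$workspace" "$workspace" "$workspace"
}

same_path_mount_cmd "$HOME/my-project"
```

Running the printed command would list your project files from inside a container, at the same path they have on the host.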

3- Persistent volumes for credentials and state

Credentials, such as API keys, and other important data are stored in Docker volumes rather than in your workspace. For instance, Claude’s credentials are kept in a dedicated volume such as docker-claude-sandbox-data.

This approach allows you to:

  • Keep sensitive information out of your Git repository.
  • Reuse credentials across sessions without the need for re-authentication.
  • Easily delete the sandbox and its associated volume if you want to reset everything.
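If you are curious whether such a volume exists on your machine, standard Docker volume commands can tell you. The sketch below assumes the volume name from the example above, which may differ on your setup; show_sandbox_volume is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether a credentials volume exists and where Docker stores it.
# The default volume name is taken from the example above and is an
# assumption; pass a different name as the first argument if needed.
show_sandbox_volume() {
    volume=${1:-docker-claude-sandbox-data}
    if docker volume inspect "$volume" >/dev/null 2>&1; then
        # --format extracts single fields from the inspect output
        docker volume inspect --format '{{ .Name }} -> {{ .Mountpoint }}' "$volume"
    else
        echo "volume $volume not found"
    fi
}

# Only call the helper when the docker CLI is actually installed
if command -v docker >/dev/null 2>&1; then
    show_sandbox_volume
fi
```

Because the credentials sit in a volume and not in ~/my-project, nothing here can end up in a Git commit by accident.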

4- Container isolation

Because the agent operates within a container, it has its own filesystem view and runs as processes inside the Docker virtual machine (VM). It only has access to the directories, environment variables, and network permissions that you configure.

This level of isolation helps prevent agents from:

  • Accessing sensitive system paths
  • Modifying global system packages
  • Escaping the defined workspace boundaries (under normal configurations)

Docker plans to transition from a “container in a VM” model to using microVM-based isolation for enhanced security and improved performance.

Basic Workflow: From Zero to Running a Sandbox

This section provides a comprehensive and practical guide for beginners.

1- Prerequisites

To use Docker Sandboxes locally, you need:

  • Docker Desktop 4.50 or later
  • An AI agent that supports sandbox mode. Currently supported:

→ Claude Code

→ Gemini CLI

More agents are expected to follow.

Docker Desktop is free for individual developers and small teams, provided that they comply with Docker’s licensing terms; no paid Docker plan is necessary.

2- Step-by-step: Run your first sandboxed agent

Assume you have a project in ~/my-project.

1- Open a terminal and go to your workspace:

cd ~/my-project

2- Run the agent in a sandbox:

docker sandbox run claude

3- On the first run, you will need to authenticate the agent by either logging into Claude Code or pasting an API key. Once authentication is complete, the credentials are stored in a Docker volume for future use.

4- Claude Code (or your chosen agent) starts inside the container and accesses your workspace directory.

From this point forward, when you use the agent (for example, through an editor integration), it executes code, installs packages, and modifies files inside the sandbox container rather than directly on your host operating system.

Managing Your Sandboxes

Docker provides a set of clear command-line interface (CLI) commands designed for the inspection and management of sandboxes.

1- List existing sandboxes

docker sandbox ls

This shows sandbox IDs, their workspace path, status, and creation time.
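When you have several sandboxes, you may only care about the one for the folder you are in. A small sketch of a filter follows; find_sandbox_for_workspace is a hypothetical helper, and it deliberately does not assume any particular column layout in the ls output.

```shell
#!/bin/sh
# Show the `docker sandbox ls` row(s) that mention a given workspace path.
# grep -F matches the path literally, so this works regardless of column
# order. Note that it also matches longer paths sharing the same prefix.
find_sandbox_for_workspace() {
    workspace=${1:-$PWD}
    docker sandbox ls | grep -F -- "$workspace"
}

# Only query Docker when the CLI is installed
if command -v docker >/dev/null 2>&1; then
    find_sandbox_for_workspace || echo "no sandbox found for $PWD"
fi
```

The sandbox ID in the matching row is what the inspect and rm commands below expect.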

2- Inspect a sandbox

docker sandbox inspect <sandbox-id>

This returns JSON details that include the following:

  • Template image used
  • Mounted paths
  • Volumes
  • Environment configuration

3- Remove a sandbox

docker sandbox rm <sandbox-id>

Use this when:

  • You want to reset the environment
  • You changed environment variables, volume mounts, or Docker socket access and want them to take effect
  • You are done with this project and want to clean up

Removing a sandbox deletes the container and associated persistent state. When you run docker sandbox run <agent> again in the same directory, a fresh sandbox is created with your new configuration.

Advanced Configuration: Customizing the Sandbox

Docker Sandboxes also provide advanced options, mainly via CLI flags.

1- Environment variables

You can pass environment variables into the sandbox, for example, to configure language, feature flags, or API endpoints:

docker sandbox run -e NODE_ENV=development -e DEBUG=true claude

If you change environment variables later, you need to:

1- Remove the previous sandbox:

docker sandbox ls 
docker sandbox rm <sandbox-id>

2- Run the agent again with new flags, so Docker creates a sandbox with the updated configuration.
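Since the two steps always go together, they can be wrapped in one helper. This is only a sketch: recreate_sandbox is a hypothetical function name, and it uses nothing beyond the rm and run commands shown above.

```shell
#!/bin/sh
# Remove an existing sandbox, then start a fresh one with new flags.
# Usage: recreate_sandbox <sandbox-id> [run flags...] <agent>
recreate_sandbox() {
    sandbox_id=$1
    shift
    # Stop here if removal fails, so we never reuse the old configuration
    docker sandbox rm "$sandbox_id" || return 1
    # All remaining arguments are passed to `docker sandbox run` unchanged
    docker sandbox run "$@"
}

# Example (replace <sandbox-id> with an ID from `docker sandbox ls`):
#   recreate_sandbox <sandbox-id> -e NODE_ENV=production -e DEBUG=false claude
```

The same helper applies whenever a configuration change requires a new sandbox, not only for environment variables.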

2- Additional volume mounts

You may want to give the agent access to extra directories, for example:

  • A shared /datasets folder
  • A directory with common libraries or templates

You can use -v like standard Docker:

docker sandbox run \
-v /path/to/datasets:/datasets \
claude

Again, changes to mounted volumes require recreating the sandbox to take effect.

3- Optional Docker socket access

Sometimes you want the agent to manage containers itself (for example, testing a containerized app). For that, you can enable Docker socket access from the sandbox:

docker sandbox run --mount-docker-socket claude

This gives the agent power to:

  • Build images
  • Start/stop containers
  • Inspect running containers

You should use this carefully because it gives the agent a lot of control over your Docker environment.
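One way to make that caution a habit is to ask for confirmation before granting socket access. The wrapper below is a sketch of that idea; run_with_docker_socket is a hypothetical name, and the prompt wording is arbitrary.

```shell
#!/bin/sh
# Ask for explicit confirmation before starting a sandbox that can
# control the host's Docker environment.
# Usage: run_with_docker_socket <agent>
run_with_docker_socket() {
    printf 'Give the agent access to your Docker daemon? [y/N] '
    read -r answer
    case $answer in
        y|Y) docker sandbox run --mount-docker-socket "$@" ;;
        *)   echo "aborted"; return 1 ;;
    esac
}

# Example:
#   run_with_docker_socket claude
```

Anything other than y or Y aborts, so the safe choice is the default.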

How Is This Different from “Just Using Docker Containers”?

You might wonder: “Why not just run a normal container with docker run and mount my project?”

You can, and many people do. But Docker Sandboxes add higher-level behavior on top of plain containers:

1- Higher-level abstraction for agents

With plain Docker, you must manage:

  • Container creation
  • Volume configuration
  • Environment variables
  • Persistent vs ephemeral state
  • Credentials storage
  • Lifecycle (when to stop/remove/reset)

With Sandboxes, Docker handles:

  • One sandbox per workspace
  • Automatic workspace mounting
  • Persistent, scoped credentials
  • Reuse of the environment across runs
  • Simple ls, inspect, and rm commands dedicated to sandboxes
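To make the manual bookkeeping concrete, here is a rough sketch of a reusable per-workspace container built with plain Docker commands. The helper name, the container and volume names, the /creds mount point, and the node:20 image are all assumptions chosen for illustration; this is not how Docker Sandboxes is implemented.

```shell
#!/bin/sh
# Reuse a named container for a workspace, or create it on first use.
# Usage: plain_docker_workspace <container-name> [workspace-dir]
plain_docker_workspace() {
    name=$1
    workspace=${2:-$PWD}
    if docker container inspect "$name" >/dev/null 2>&1; then
        # Container already exists: restart it so installed packages
        # and other state inside it are still there
        docker start -ai "$name"
    else
        # First use: mount the workspace at the same path and keep
        # credentials in a separate named volume
        docker run -it --name "$name" \
            -v "$workspace:$workspace" -w "$workspace" \
            -v "${name}-creds:/creds" \
            node:20 bash
    fi
}

# Example:
#   plain_docker_workspace my-project-box ~/my-project
```

Even this short version leaves naming, cleanup, and credential handling up to you, which is the part Sandboxes take over.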

2- Designed for AI coding workflows

Sandboxes are optimized for iterative, agent-driven development, where the agent:

  • Runs many small commands over time
  • Installs new tools as needed
  • Modifies files gradually
  • Needs to keep state between sessions

Normal containers are usually:

  • Started explicitly by you
  • Short-lived or service-oriented (API, DB, etc.)
  • Not automatically tied to a workspace or agent

Sandboxes are essentially containers with opinionated defaults and a lifecycle tailored to AI agents.

Realistic Example: An Agent That Refactors Your Code

Imagine you have a large Node.js project and you want an AI agent to:

  • Convert it gradually to TypeScript
  • Introduce a better folder structure
  • Update imports and types
  • Run tests and fix failures

This involves:

  • Installing dev dependencies (TypeScript, type definitions, tools)
  • Modifying many source files
  • Running scripts (npm test, npm run build)
  • Potentially generating helper scripts on the fly

Without a sandbox

If the agent runs directly on your host:

  • It installs npm packages globally or into your project
  • If a script is malicious or buggy, it can:

→ Delete important files

→ Corrupt your environment

→ Access secrets and other directories

You must trust every command.

With a Docker Sandbox

  1. Start in your project directory:

cd ~/my-node-project
docker sandbox run claude

  2. The agent runs inside a container that mirrors your project directory.

  3. It can safely do things like:

npm install --save-dev typescript @types/node ts-node
npx tsc --init
npm test

  4. System-level package installs and environment tweaks live inside the sandbox, while changes to project files (including node_modules) land in your mounted workspace. If something goes wrong with the environment:

docker sandbox rm <sandbox-id>

You then get a clean sandbox the next time you run docker sandbox run claude.

This gives you:

  • The productivity of an autonomous agent
  • The safety of an isolated environment

As one article summarized it: Sandboxes provide “safe autonomy” for local coding agents by letting them execute commands, install packages, and modify files in an isolated workspace that mirrors your development setup.

When Should a Beginner Use Docker Sandboxes?

You should consider Docker Sandboxes if:

  • You use (or plan to use) AI coding agents that:

→ Run shell commands

→ Edit many files

→ Install tools and dependencies

  • You want to experiment with scripts or automation you don’t fully trust yet
  • You care about keeping your local machine and global environment clean
  • You like the idea of being able to “reset everything” with a single command

If you are just starting with Docker and not yet using AI agents or automated code tools, regular containers and Docker Compose may be enough. But as soon as you introduce autonomous agents touching your codebase, Sandboxes become much more attractive.

Summary

For a beginner, here is the core idea in one sentence:

Docker Sandboxes let AI agents work on your real code in a local environment that feels like your machine, but is actually a safe container you can inspect, reuse, or delete at any time.

Key takeaways:

  • They are local, container-based environments for agents and automation.
  • They mirror your workspace while isolating execution from your host system.
  • They keep state across sessions (packages, temp files) per workspace.
  • They offer simple CLI commands: run, ls, inspect, rm.
  • They are currently experimental and require Docker Desktop 4.50+.
