
Container Isolation

All agents run in isolated containers for security and consistency. Container isolation is always enabled.

How it works

When al start runs:
  1. The base image (al-agent:latest) is built from docker/Dockerfile on first run
  2. Per-agent images are built for any agent that has a custom Dockerfile
  3. Each agent run launches a fresh container with:
    • Read-only root filesystem
    • Credentials mounted read-only at /credentials/
    • Writable tmpfs at /tmp (2GB)
    • All capabilities dropped, no-new-privileges
    • PID, memory, and CPU limits
    • Non-root user (uid 1000)
    • A unique shutdown secret for the anti-exfiltration kill switch

Container runtime

Each agent run is a short-lived container that boots, runs a single LLM session, and exits. The entry point is node /app/dist/agents/container-entry.js.

Environment

The container receives everything it needs via environment variables and mounts:
| Env var | Description |
| --- | --- |
| AGENT_CONFIG | JSON-serialized agent config (model, credentials, params) plus ACTIONS.md content |
| PROMPT | The fully assembled prompt (<agent-config> + <credential-context> + trigger text) |
| TIMEOUT_SECONDS | Max runtime in seconds (default: 3600). The container self-terminates if exceeded |
| GATEWAY_URL | HTTP URL of the host gateway (local Docker only — used for credential fetch and shutdown) |
| SHUTDOWN_SECRET | Unique per-run secret for the anti-exfiltration kill switch (local Docker only) |
Credentials are injected in one of three ways depending on the runtime:
| Runtime | Strategy | How it works |
| --- | --- | --- |
| Local Docker | Volume mount | Files staged to a temp dir, mounted read-only at /credentials/<type>/<instance>/<field> |
| Cloud Run | Gateway fetch | Container fetches credentials from GATEWAY_URL/credentials/<secret> on startup |
| ECS Fargate | Env vars | Secrets Manager values injected as AL_SECRET_<type>__<instance>__<field> env vars |
The container tries each strategy in order: volume mount, env vars, gateway. The first one that has data wins.
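The fallback order can be sketched in shell (illustrative only; the real check runs in the Node entry point, and the directory argument exists purely to make the sketch testable):

```shell
# Illustrative sketch of the credential fallback order: volume mount, then
# AL_SECRET_* env vars, then gateway. The real logic lives in the Node
# entry point, not in shell.
detect_credential_source() {
  dir="${1:-/credentials}"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "volume"
  elif env | grep -q '^AL_SECRET_'; then
    echo "env"
  elif [ -n "$GATEWAY_URL" ]; then
    echo "gateway"
  else
    echo "none"
  fi
}
```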

Startup sequence

  1. Set working directory — chdir("/tmp")
  2. Start self-termination timer — kills the process with exit code 124 if TIMEOUT_SECONDS is exceeded
  3. Parse config — reads AGENT_CONFIG, extracts ACTIONS.md content
  4. Load credentials — from volume, env vars, or gateway (see table above)
  5. Inject env vars from credentials:
    • GITHUB_TOKEN / GH_TOKEN from github_token credential
    • SENTRY_AUTH_TOKEN from sentry_token credential
    • GIT_SSH_COMMAND pointing to the mounted SSH key from git_ssh credential
    • GIT_AUTHOR_NAME / GIT_AUTHOR_EMAIL / GIT_COMMITTER_NAME / GIT_COMMITTER_EMAIL from git_ssh credential
    • Git HTTPS credential helper configured if GITHUB_TOKEN is set
  6. Create pi-coding-agent session — initializes the LLM model, tools, and settings
  7. Send prompt — delivers the pre-built prompt to the session
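As an illustration of step 5, a credential field on the mounted volume maps to an env var roughly like this (a shell sketch; the real injection happens in the Node entry point, and the instance name "main" is hypothetical):

```shell
# Sketch: read one credential field from the mounted volume and export it
# as an env var. Layout follows /credentials/<type>/<instance>/<field>.
export_cred() {  # usage: export_cred <base_dir> <type> <instance> <field> <VAR>
  file="$1/$2/$3/$4"
  if [ -f "$file" ]; then
    export "$5=$(cat "$file")"
  fi
}

# e.g. export_cred /credentials github_token main token GITHUB_TOKEN
```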

Agent session

The prompt is sent to the LLM with rate-limit retry (up to 5 attempts with exponential backoff, 30s to 5min). The LLM runs autonomously — reading files, executing commands, making API calls — until it finishes or hits an error.

Unrecoverable error detection: the container watches for repeated auth/permission failures (e.g. “bad credentials”, “permission denied”, “resource not accessible by personal access token”). After 3 such errors, it aborts early rather than burning through retries.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success — agent completed its work |
| 1 | Error — missing config, credential failure, unrecoverable errors, or uncaught exception |
| 124 | Timeout — TIMEOUT_SECONDS exceeded, container self-terminated |
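A wrapper around a run can map these codes directly, for example (a sketch; the scheduler handles this internally):

```shell
# Map a container exit code to the outcomes in the table above.
explain_exit() {
  case "$1" in
    0)   echo "success: agent completed its work" ;;
    124) echo "timeout: TIMEOUT_SECONDS exceeded" ;;
    *)   echo "error: see agent logs for details" ;;
  esac
}
```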

Log protocol

The container communicates with the scheduler via structured JSON lines on stdout. This is how the scheduler tracks progress, surfaces errors in the TUI, and writes log files. Structured log lines have the format:
{"_log": true, "level": "info", "msg": "bash", "cmd": "git clone ...", "ts": 1234567890}
The _log: true field distinguishes structured logs from plain text output. The scheduler parses these and forwards them to the logger at the appropriate level.
| Field | Description |
| --- | --- |
| _log | Always true — marker for structured log lines |
| level | "debug", "info", "warn", or "error" |
| msg | Log message (e.g. "bash", "tool error", "credentials loaded from volume") |
| ts | Unix timestamp in milliseconds |
| ... | Additional fields vary by message (e.g. cmd, tool, error, result) |
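A consumer can apply the same rule to separate structured logs from plain agent output, for example with jq (a sketch of the parsing rule, not the scheduler's actual implementation):

```shell
# Keep only structured log lines (_log: true); non-JSON lines are
# silently skipped by fromjson?.
stdout_sample='{"_log": true, "level": "info", "msg": "bash", "cmd": "git clone", "ts": 1}
final plain-text answer from the LLM
{"_log": true, "level": "error", "msg": "tool error", "ts": 2}'

printf '%s\n' "$stdout_sample" | jq -R -r 'fromjson? | select(._log == true) | .msg'
```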
Key log messages emitted during a run:
| Message | Level | When |
| --- | --- | --- |
| "container starting" | info | Boot, includes agentName and modelId |
| "credentials loaded from ..." | info | After credential loading (volume, env vars, or gateway) |
| "SSH key configured for git" | info | After SSH key setup |
| "creating agent session" | info | Before LLM session creation |
| "session created, sending prompt" | info | Prompt delivery |
| "bash" | info | Every bash tool call, with cmd field |
| "tool error" | error | Failed tool call, with tool, cmd, and result fields |
| "rate limited, retrying prompt" | warn | Rate limit hit, with attempt and delayMs |
| "run completed" | info | Agent finished successfully |
| "no work to do" | info | Agent found nothing to act on |
| "container timeout reached, self-terminating" | error | Timeout exceeded |
Signal commands: The container has signal commands installed at /tmp/bin/ that write to $AL_SIGNAL_DIR:
| Command | Description |
| --- | --- |
| al-rerun | Request an immediate rerun to drain remaining backlog. Without this, the scheduler treats the run as complete and waits for the next scheduled tick. |
| al-status "<text>" | Status update shown in the TUI. Example: al-status "reviewing PR #42" |
| al-return "<value>" | Return a value to the calling agent. Used when this agent was invoked via al-call. |
| al-exit [code] | Terminate with an exit code indicating an unrecoverable error. Defaults to 15. |
Agent-to-agent calls: Agents can call other agents and retrieve structured results using shell commands injected into the container:
  • al-call <agent> — Call another agent. Pass context via stdin, get back a JSON response with a callId.
  • al-check <callId> — Check the status of a call (never blocks). Returns {"status":"pending|running|completed|error", ...}.
  • al-wait <callId> [callId...] [--timeout N] — Wait for one or more calls to complete (polls every 5s, default timeout 900s).
Example:
CALL_ID=$(echo "Review PR #42 on acme/app" | al-call reviewer | jq -r .callId)
# ... do other work ...
RESULT=$(al-wait "$CALL_ID" --timeout 600)
echo "$RESULT" | jq ".\"$CALL_ID\".returnValue"
The called agent receives an <agent-call> block with the caller name and context. To return a value, the called agent uses the al-return command:
al-return "PR looks good. Approved with minor suggestions."
Rules:
  • An agent cannot call itself (self-calls are rejected)
  • If all runners for the target agent are busy, the call is queued (up to workQueueSize limit in global config, default: 100)
  • Call chains are allowed (agent A calls B, B calls C) up to a configurable depth limit (maxCallDepth in config.toml, default: 3)
  • Called runs do not re-run — they respond to the single call
  • These commands require the gateway; they return errors if GATEWAY_URL is not set
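Where al-wait's blocking behavior is not wanted, the same result can be had by polling al-check in a loop. This is a hypothetical sketch built on the JSON shape shown above; it assumes al-check and jq are available in the container:

```shell
# Poll al-check (which never blocks) until the call completes or errors.
poll_call() {  # usage: poll_call <callId> [max_attempts]
  i=0
  while [ "$i" -lt "${2:-10}" ]; do
    status=$(al-check "$1" | jq -r .status)
    [ "$status" = "completed" ] && return 0
    [ "$status" = "error" ] && return 1
    sleep 5
    i=$((i + 1))
  done
  return 1
}
```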
Any stdout line that is not valid JSON with _log: true is treated as plain agent output (the LLM’s final text response).

Base image

The base image (al-agent:latest) is built automatically from the Action Llama package and includes the minimum needed for any agent:
| Package | Why |
| --- | --- |
| node:20-slim | Runs the container entry point and pi-coding-agent SDK |
| git | Clone repos, create branches, push commits |
| curl | API calls (Sentry, arbitrary HTTP), anti-exfiltration shutdown |
| ca-certificates | HTTPS for git, curl, npm |
| openssh-client | SSH for GIT_SSH_COMMAND — git clone/push over SSH |
The base image also copies the compiled Action Llama application (dist/) and installs its npm dependencies. The entry point is node /app/dist/agents/container-entry.js.

Project base image

The project Dockerfile (at the project root) lets you customize the base image for all agents in the project. It is created by al new and checked into git:
my-project/
  Dockerfile              <-- project base image (shared by all agents)
  config.toml
  dev/
    agent-config.toml
    ACTIONS.md
  reviewer/
    agent-config.toml
    ACTIONS.md
By default, the project Dockerfile is a bare FROM al-agent:latest with no customizations. In this state, it is skipped entirely — agents build directly on al-agent:latest with no overhead. To customize, add RUN, ENV, or other instructions:
FROM al-agent:latest

# Install tools shared by all agents
RUN apk add --no-cache python3 py3-pip github-cli

# Set shared environment variables
ENV MY_ORG=acme
When the project Dockerfile has customizations beyond the bare FROM, the build pipeline creates an intermediate image (al-project-base:latest) that all per-agent images layer on top of.

Image build order

al-agent:latest            ← Action Llama package (automatic)


al-project-base:latest     ← project Dockerfile (if customized)


al-<agent>:latest          ← per-agent Dockerfile (if present)
If the project Dockerfile is unmodified, the middle layer is skipped.

Custom agent images

Agents that need extra tools beyond what the project base provides can add a Dockerfile to their own directory:
my-project/
  Dockerfile              <-- project base (shared tools)
  dev/
    agent-config.toml
    ACTIONS.md
    Dockerfile            <-- custom image for this agent only
  reviewer/
    agent-config.toml
    ACTIONS.md
                          <-- no Dockerfile, uses project base

Extending the base image

Use FROM al-agent:latest and add what you need. The build pipeline automatically rewrites the FROM line to point at the correct base (either al-project-base:latest or the cloud registry URI). Switch to root to install packages, then back to node:
FROM al-agent:latest

USER root
RUN apk add --no-cache github-cli
USER node
This is a thin layer on top of the base — fast to build and shares most of the image.

Tip: If multiple agents need the same tool, put it in the project Dockerfile instead of duplicating it across agent Dockerfiles.

Common additions:
# GitHub CLI (for gh issue list, gh pr create, etc.)
RUN apk add --no-cache github-cli

# Python (for agents that run Python scripts)
RUN apk add --no-cache python3 py3-pip

# jq (for JSON processing in bash) — already in the base image
# RUN apk add --no-cache jq

Writing a standalone Dockerfile

If you need full control, you can write a Dockerfile from scratch. It must:
  1. Include Node.js 20+
  2. Copy the application code from the base image or install it
  3. Set ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]
  4. Use uid 1000 (USER node on node images) for compatibility with the container launcher
Example standalone Dockerfile:
FROM node:20-slim

# Install your tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl ca-certificates openssh-client gh jq python3 \
    && rm -rf /var/lib/apt/lists/*

# Copy app from the base image (avoids rebuilding from source)
COPY --from=al-agent:latest /app /app
WORKDIR /app

USER node
ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]
The key requirement is that /app/dist/agents/container-entry.js exists and can run. The entry point reads AGENT_CONFIG, PROMPT, GATEWAY_URL, and SHUTDOWN_SECRET from environment variables, and credentials from /credentials/.

Build behavior

  • The base image (al-agent:latest) is only built if it doesn’t exist yet
  • The project base image (al-project-base:latest) is rebuilt on every al start if the project Dockerfile has customizations
  • Agent images are named al-<agent-name>:latest (e.g. al-dev:latest) and are rebuilt on every al start to pick up Dockerfile changes
  • The build context is the Action Llama package root (not the project directory), so COPY paths in per-agent Dockerfiles reference the package’s dist/, package.json, etc.

Configuration

| Key | Default | Description |
| --- | --- | --- |
| local.image | "al-agent:latest" | Base Docker image name |
| local.memory | "4g" | Memory limit per container |
| local.cpus | 2 | CPU limit per container |
| local.timeout | 3600 | Max container runtime in seconds |
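In config.toml these keys would look roughly like the following (assuming a [local] table, inferred from the local. key prefix — verify against your generated config):

```toml
[local]
image = "al-agent:latest"   # base Docker image name
memory = "4g"               # memory limit per container
cpus = 2                    # CPU limit per container
timeout = 3600              # max container runtime in seconds
```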
For Cloud Run configuration, see Cloud Run docs. For ECS Fargate configuration, see ECS docs.

Container filesystem layout

| Path | Mode | Contents |
| --- | --- | --- |
| /app | read-only | Action Llama application + node_modules |
| /credentials | read-only | Mounted credential files (/<type>/<instance>/<field>) |
| /tmp | read-write (tmpfs, 2GB) | Agent working directory — repos, scratch files, SSH keys |

Troubleshooting

  • “Docker is not running” — Start Docker Desktop or the Docker daemon before running al start.
  • Base image build fails — Run docker build -t al-agent:latest -f docker/Dockerfile . from the Action Llama package directory to see the full build output.
  • Project base image build fails — Check that the project Dockerfile starts with FROM al-agent:latest and that any apk add packages are spelled correctly. The base image uses Alpine Linux.
  • Agent image build fails — Check that your agent’s Dockerfile starts with FROM al-agent:latest (the build pipeline rewrites this to the correct base) and that any package install commands are correct.
  • Container exits immediately — Check al logs <agent> for the error. Common causes: missing credentials, missing ACTIONS.md, invalid model config.