
Container Isolation

All agents run in isolated containers for security and consistency. Container isolation is always enabled.

How it works

When al start runs:
  1. The base image (al-agent:latest) is built from docker/Dockerfile on first run
  2. Per-agent images are built for any agent that has a custom Dockerfile
  3. Each agent run launches a fresh container with:
    • Read-only root filesystem
    • Credentials mounted read-only at /credentials/
    • Writable tmpfs at /tmp (2GB)
    • All capabilities dropped, no-new-privileges
    • PID, memory, and CPU limits
    • Non-root user (uid 1000)
    • A unique shutdown secret for the anti-exfiltration kill switch

Container runtime

Each agent run is a short-lived container that boots, runs a single LLM session, and exits. The entry point is node /app/dist/agents/container-entry.js.

Environment

The container receives everything it needs via environment variables and mounts:
| Env var | Description |
| --- | --- |
| AGENT_CONFIG | JSON-serialized agent config (model, credentials, params) plus ACTIONS.md content |
| PROMPT | The fully assembled prompt (<agent-config> + <credential-context> + trigger text) |
| TIMEOUT_SECONDS | Max runtime in seconds (default: 3600). The container self-terminates if exceeded |
| GATEWAY_URL | HTTP URL of the host gateway (local Docker only — used for credential fetch and shutdown) |
| SHUTDOWN_SECRET | Unique per-run secret for the anti-exfiltration kill switch (local Docker only) |
Credentials are injected in one of three ways depending on the runtime:
| Runtime | Strategy | How it works |
| --- | --- | --- |
| Local Docker | Volume mount | Files staged to a temp dir, mounted read-only at /credentials/<type>/<instance>/<field> |
| Cloud Run | Gateway fetch | Container fetches credentials from GATEWAY_URL/credentials/<secret> on startup |
| ECS Fargate | Env vars | Secrets Manager values injected as AL_SECRET_<type>__<instance>__<field> env vars |
The container tries each strategy in order: volume mount, env vars, gateway. The first one that has data wins.
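The fallback order can be sketched in shell (illustrative only; the real check runs in the Node entry point, and the directory argument exists purely to make the sketch testable):

```shell
# Illustrative sketch of the credential fallback order: volume mount, then
# AL_SECRET_* env vars, then gateway. The real logic lives in the Node
# entry point, not in shell.
detect_credential_source() {
  dir="${1:-/credentials}"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "volume"
  elif env | grep -q '^AL_SECRET_'; then
    echo "env"
  elif [ -n "$GATEWAY_URL" ]; then
    echo "gateway"
  else
    echo "none"
  fi
}
```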

Startup sequence

  1. Set working directory — chdir("/tmp")
  2. Start self-termination timer — kills the process with exit code 124 if TIMEOUT_SECONDS is exceeded
  3. Parse config — reads AGENT_CONFIG, extracts ACTIONS.md content
  4. Load credentials — from volume, env vars, or gateway (see table above)
  5. Inject env vars from credentials:
    • GITHUB_TOKEN / GH_TOKEN from github_token credential
    • SENTRY_AUTH_TOKEN from sentry_token credential
    • GIT_SSH_COMMAND pointing to the mounted SSH key from git_ssh credential
    • GIT_AUTHOR_NAME / GIT_AUTHOR_EMAIL / GIT_COMMITTER_NAME / GIT_COMMITTER_EMAIL from git_ssh credential
    • Git HTTPS credential helper configured if GITHUB_TOKEN is set
  6. Create pi-coding-agent session — initializes the LLM model, tools, and settings
  7. Send prompt — delivers the pre-built prompt to the session
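As an illustration of step 5, a credential field on the mounted volume maps to an env var roughly like this (a shell sketch; the real injection happens in the Node entry point, and the instance name "main" is hypothetical):

```shell
# Sketch: read one credential field from the mounted volume and export it
# as an env var. Layout follows /credentials/<type>/<instance>/<field>.
export_cred() {  # usage: export_cred <base_dir> <type> <instance> <field> <VAR>
  file="$1/$2/$3/$4"
  if [ -f "$file" ]; then
    export "$5=$(cat "$file")"
  fi
}

# e.g. export_cred /credentials github_token main token GITHUB_TOKEN
```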

Agent session

The prompt is sent to the LLM with rate-limit retry (up to 5 attempts with exponential backoff, 30s to 5min). The LLM runs autonomously — reading files, executing commands, making API calls — until it finishes or hits an error.

Unrecoverable error detection: the container watches for repeated auth/permission failures (e.g. “bad credentials”, “permission denied”, “resource not accessible by personal access token”). After 3 such errors, it aborts early rather than burning through retries.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success — agent completed its work |
| 1 | Error — missing config, credential failure, unrecoverable errors, or uncaught exception |
| 124 | Timeout — TIMEOUT_SECONDS exceeded, container self-terminated |
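A wrapper around a run can map these codes directly, for example (a sketch; the scheduler handles this internally):

```shell
# Map a container exit code to the outcomes in the table above.
explain_exit() {
  case "$1" in
    0)   echo "success: agent completed its work" ;;
    124) echo "timeout: TIMEOUT_SECONDS exceeded" ;;
    *)   echo "error: see agent logs for details" ;;
  esac
}
```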

Log protocol

The container communicates with the scheduler via structured JSON lines on stdout. This is how the scheduler tracks progress, surfaces errors in the TUI, and writes log files. Structured log lines have the format:
{"_log": true, "level": "info", "msg": "bash", "cmd": "git clone ...", "ts": 1234567890}
The _log: true field distinguishes structured logs from plain text output. The scheduler parses these and forwards them to the logger at the appropriate level.
| Field | Description |
| --- | --- |
| _log | Always true — marker for structured log lines |
| level | "debug", "info", "warn", or "error" |
| msg | Log message (e.g. "bash", "tool error", "credentials loaded from volume") |
| ts | Unix timestamp in milliseconds |
| ... | Additional fields vary by message (e.g. cmd, tool, error, result) |
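A consumer can apply the same rule to separate structured logs from plain agent output, for example with jq (a sketch of the parsing rule, not the scheduler's actual implementation):

```shell
# Keep only structured log lines (_log: true); non-JSON lines are
# silently skipped by fromjson?.
stdout_sample='{"_log": true, "level": "info", "msg": "bash", "cmd": "git clone", "ts": 1}
final plain-text answer from the LLM
{"_log": true, "level": "error", "msg": "tool error", "ts": 2}'

printf '%s\n' "$stdout_sample" | jq -R -r 'fromjson? | select(._log == true) | .msg'
```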
Key log messages emitted during a run:
| Message | Level | When |
| --- | --- | --- |
| "container starting" | info | Boot, includes agentName and modelId |
| "credentials loaded from ..." | info | After credential loading (volume, env vars, or gateway) |
| "SSH key configured for git" | info | After SSH key setup |
| "creating agent session" | info | Before LLM session creation |
| "session created, sending prompt" | info | Prompt delivery |
| "bash" | info | Every bash tool call, with cmd field |
| "tool error" | error | Failed tool call, with tool, cmd, and result fields |
| "rate limited, retrying prompt" | warn | Rate limit hit, with attempt and delayMs |
| "run completed" | info | Agent finished successfully |
| "no work to do" | info | Agent found nothing to act on |
| "container timeout reached, self-terminating" | error | Timeout exceeded |
Signal commands: The container has signal commands installed at /tmp/bin/ that write to $AL_SIGNAL_DIR:
| Command | Description |
| --- | --- |
| al-rerun | Request an immediate rerun to drain remaining backlog. Without this, the scheduler treats the run as complete and waits for the next scheduled tick. |
| al-status "<text>" | Status update shown in the TUI. Example: al-status "reviewing PR #42" |
| al-return "<value>" | Return a value to the calling agent. Used when this agent was invoked via al-call. |
| al-exit [code] | Terminate with an exit code indicating an unrecoverable error. Defaults to 15. |
Agent-to-agent calls: Agents can call other agents and retrieve structured results using shell commands injected into the container:
  • al-call <agent> — Call another agent. Pass context via stdin, get back a JSON response with a callId.
  • al-check <callId> — Check the status of a call (never blocks). Returns {"status":"pending|running|completed|error", ...}.
  • al-wait <callId> [callId...] [--timeout N] — Wait for one or more calls to complete (polls every 5s, default timeout 900s).
Example:
CALL_ID=$(echo "Review PR #42 on acme/app" | al-call reviewer | jq -r .callId)
# ... do other work ...
RESULT=$(al-wait "$CALL_ID" --timeout 600)
echo "$RESULT" | jq ".\"$CALL_ID\".returnValue"
The called agent receives an <agent-call> block with the caller name and context. To return a value, the called agent uses the al-return command:
al-return "PR looks good. Approved with minor suggestions."
Rules:
  • An agent cannot call itself (self-calls are rejected)
  • If all runners for the target agent are busy, the call is queued (up to workQueueSize limit in global config, default: 100)
  • Call chains are allowed (agent A calls B, B calls C) up to a configurable depth limit (maxCallDepth in config.toml, default: 3)
  • Called runs do not re-run — they respond to the single call
  • These commands require the gateway; they return errors if GATEWAY_URL is not set
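Where al-wait's blocking behavior is not wanted, the same result can be had by polling al-check in a loop. This is a hypothetical sketch built on the JSON shape shown above; it assumes al-check and jq are available in the container:

```shell
# Poll al-check (which never blocks) until the call completes or errors.
poll_call() {  # usage: poll_call <callId> [max_attempts]
  i=0
  while [ "$i" -lt "${2:-10}" ]; do
    status=$(al-check "$1" | jq -r .status)
    [ "$status" = "completed" ] && return 0
    [ "$status" = "error" ] && return 1
    sleep 5
    i=$((i + 1))
  done
  return 1
}
```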
Any stdout line that is not valid JSON with _log: true is treated as plain agent output (the LLM’s final text response).

Base image

The base image (al-agent:latest) is built automatically from the Action Llama package and includes the minimum needed for any agent:
| Package | Why |
| --- | --- |
| node:20-slim | Runs the container entry point and pi-coding-agent SDK |
| git | Clone repos, create branches, push commits |
| curl | API calls (Sentry, arbitrary HTTP), anti-exfiltration shutdown |
| ca-certificates | HTTPS for git, curl, npm |
| openssh-client | SSH for GIT_SSH_COMMAND — git clone/push over SSH |
The base image also copies the compiled Action Llama application (dist/) and installs its npm dependencies. The entry point is node /app/dist/agents/container-entry.js.

Project base image

The project Dockerfile (at the project root) lets you customize the base image for all agents in the project. It is created by al new and checked into git:
my-project/
  Dockerfile              <-- project base image (shared by all agents)
  config.toml
  dev/
    agent-config.toml
    ACTIONS.md
  reviewer/
    agent-config.toml
    ACTIONS.md
By default, the project Dockerfile is a bare FROM al-agent:latest with no customizations. In this state, it is skipped entirely — agents build directly on al-agent:latest with no overhead. To customize, add RUN, ENV, or other instructions:
FROM al-agent:latest

# Install tools shared by all agents
RUN apk add --no-cache python3 py3-pip github-cli

# Set shared environment variables
ENV MY_ORG=acme
When the project Dockerfile has customizations beyond the bare FROM, the build pipeline creates an intermediate image (al-project-base:latest) that all per-agent images layer on top of.

Image build order

al-agent:latest            ← Action Llama package (automatic)


al-project-base:latest     ← project Dockerfile (if customized)


al-<agent>:latest          ← per-agent Dockerfile (if present)
If the project Dockerfile is unmodified, the middle layer is skipped.

Custom agent images

Agents that need extra tools beyond what the project base provides can add a Dockerfile to their own directory:
my-project/
  Dockerfile              <-- project base (shared tools)
  dev/
    agent-config.toml
    ACTIONS.md
    Dockerfile            <-- custom image for this agent only
  reviewer/
    agent-config.toml
    ACTIONS.md
                          <-- no Dockerfile, uses project base

Extending the base image

Use FROM al-agent:latest and add what you need. The build pipeline automatically rewrites the FROM line to point at the correct base (either al-project-base:latest or the cloud registry URI). Switch to root to install packages, then back to node:
FROM al-agent:latest

USER root
RUN apk add --no-cache github-cli
USER node
This is a thin layer on top of the base — fast to build and shares most of the image.

Tip: If multiple agents need the same tool, put it in the project Dockerfile instead of duplicating it across agent Dockerfiles.

Common additions:
# GitHub CLI (for gh issue list, gh pr create, etc.)
RUN apk add --no-cache github-cli

# Python (for agents that run Python scripts)
RUN apk add --no-cache python3 py3-pip

# jq (for JSON processing in bash) — already in the base image
# RUN apk add --no-cache jq

Writing a standalone Dockerfile

If you need full control, you can write a Dockerfile from scratch. It must:
  1. Include Node.js 20+
  2. Copy the application code from the base image or install it
  3. Set ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]
  4. Use uid 1000 (USER node on node images) for compatibility with the container launcher
Example standalone Dockerfile:
FROM node:20-slim

# Install your tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl ca-certificates openssh-client gh jq python3 \
    && rm -rf /var/lib/apt/lists/*

# Copy app from the base image (avoids rebuilding from source)
COPY --from=al-agent:latest /app /app
WORKDIR /app

USER node
ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]
The key requirement is that /app/dist/agents/container-entry.js exists and can run. The entry point reads AGENT_CONFIG, PROMPT, GATEWAY_URL, and SHUTDOWN_SECRET from environment variables, and credentials from /credentials/.

Build behavior

  • The base image (al-agent:latest) is only built if it doesn’t exist yet
  • The project base image (al-project-base:latest) is rebuilt on every al start if the project Dockerfile has customizations
  • Agent images are named al-<agent-name>:latest (e.g. al-dev:latest) and are rebuilt on every al start to pick up Dockerfile changes
  • The build context is the Action Llama package root (not the project directory), so COPY paths in per-agent Dockerfiles reference the package’s dist/, package.json, etc.

Configuration

| Key | Default | Description |
| --- | --- | --- |
| local.image | "al-agent:latest" | Base Docker image name |
| local.memory | "4g" | Memory limit per container |
| local.cpus | 2 | CPU limit per container |
| local.timeout | 3600 | Max container runtime in seconds |
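In config.toml these keys would look roughly like the following (assuming a [local] table, inferred from the local. key prefix — verify against your generated config):

```toml
[local]
image = "al-agent:latest"   # base Docker image name
memory = "4g"               # memory limit per container
cpus = 2                    # CPU limit per container
timeout = 3600              # max container runtime in seconds
```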
For Cloud Run configuration, see Cloud Run docs. For ECS Fargate configuration, see ECS docs.

Container filesystem layout

| Path | Mode | Contents |
| --- | --- | --- |
| /app | read-only | Action Llama application + node_modules |
| /credentials | read-only | Mounted credential files (/<type>/<instance>/<field>) |
| /tmp | read-write (tmpfs, 2GB) | Agent working directory — repos, scratch files, SSH keys |

Troubleshooting

  • “Docker is not running” — Start Docker Desktop or the Docker daemon before running al start.
  • Base image build fails — Run docker build -t al-agent:latest -f docker/Dockerfile . from the Action Llama package directory to see the full build output.
  • Project base image build fails — Check that the project Dockerfile starts with FROM al-agent:latest and that any apk add packages are spelled correctly. The base image uses Alpine Linux.
  • Agent image build fails — Check that your agent’s Dockerfile starts with FROM al-agent:latest (the build pipeline rewrites this to the correct base) and that any package install commands are correct.
  • Container exits immediately — Check al logs <agent> for the error. Common causes: missing credentials, missing ACTIONS.md, invalid model config.