
Agents

An agent is a directory inside your project that contains instructions and configuration for an autonomous LLM session. Each run is self-contained: the agent wakes up (on a schedule or webhook), executes its task, and shuts down.

Structure

```
my-agent/
  agent-config.toml   # Required — credentials, model, schedule, webhooks, params
  ACTIONS.md          # Required — system prompt (the agent's instructions)
  Dockerfile          # Optional — custom Docker image for this agent
```
The directory name becomes the agent name. No registration is needed — the scheduler discovers agents by scanning for directories that contain an agent-config.toml.

agent-config.toml

Declares what the agent needs to run: which credentials to mount, which model to use, when to trigger, and any custom parameters.
```toml
credentials = ["github_token", "git_ssh"]
schedule = "*/5 * * * *"

[model]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"

[[webhooks]]
source = "my-github"
events = ["issues"]
actions = ["labeled"]
labels = ["agent"]

[params]
repos = ["acme/app"]
triggerLabel = "agent"
```
Key points:
  • credentials — list of credential refs ("type:instance") the agent needs at runtime. These are mounted into the container and injected as environment variables. See Credentials.
  • schedule and/or webhooks — at least one trigger is required. Agents can have both.
  • [model] — optional. If omitted, the agent inherits the default model from the project’s config.toml. See Models.
  • [params] — optional key-value pairs injected into the agent’s prompt as an <agent-config> JSON block. Use these for repo names, label names, org identifiers, or anything else your ACTIONS.md references.
See agent-config.toml Reference for the full field reference.
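
For example, the `[params]` table above would reach the model as a block like this (the exact JSON formatting may vary):

```
<agent-config>
{"repos": ["acme/app"], "triggerLabel": "agent"}
</agent-config>
```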

ACTIONS.md

The system prompt that defines the agent’s behavior. This is the most important file — it tells the LLM what to do, step by step. Write it as direct instructions to the model:
```markdown
# My Agent

You are an automation agent. Your job is to ...

Your configuration is in the `<agent-config>` block at the start of your prompt.

`GITHUB_TOKEN` is already set in your environment. Use `gh` CLI and `git` directly.

## Workflow

1. **Check for work** — ...
2. **Do the work** — ...
3. **Report results** — ...

## Rules

- If you did work and there may be more, run `al-rerun`
- ...
```

How it’s used at runtime

The ACTIONS.md is set as the LLM’s system prompt. The scheduler then sends a user-message prompt assembled from several blocks:
  1. <agent-config> — JSON of the [params] table from agent-config.toml
  2. <credential-context> — describes which environment variables and tools are available (e.g. GITHUB_TOKEN, git, gh, SSH config)
  3. Trigger context (one of):
    • Scheduled run: “You are running on a schedule. Check for new work and act on anything you find.”
    • Manual run: “You have been triggered manually. Check for new work and act on anything you find.”
    • Webhook: <webhook-trigger> block with the full event payload (source, event, action, repo, etc.)
    • Agent call: <agent-call> block with the caller agent name and context
Your ACTIONS.md should reference <agent-config> for parameter values and handle both scheduled and webhook triggers if the agent uses both.
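
One way to structure that in ACTIONS.md is an explicit branch (a sketch, not required wording):

```markdown
## Triggers

- If the prompt contains a `<webhook-trigger>` block, act only on that event.
- Otherwise (a scheduled or manual run), scan every repo in `<agent-config>`
  for new work.
```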

Language skills

Before the ACTIONS.md runs, the agent receives a preamble that teaches it a set of language skills — shorthand operations the actions can reference naturally. The preamble explains the underlying mechanics (curl commands, env vars) so agent authors never need to think about them. Skills currently taught to agents:
| Category | Skills | Description |
| --- | --- | --- |
| Signals | `al-rerun`, `al-status`, `al-return`, `al-exit` | Shell commands for signaling the scheduler. See Signals. |
| Calls | `al-call`, `al-check`, `al-wait` | Agent-to-agent calls with return values. See Agent calls. |
| Locks | `LOCK(...)`, `UNLOCK(...)`, `HEARTBEAT(...)` | Resource locking for parallel coordination. See Resource locks. |
| Credentials | `GITHUB_TOKEN`, `gh`, `git`, etc. | Credential access and tool usage. See Credentials. |
Agent authors write the shorthand naturally (e.g. LOCK("github issue acme/app#42")). The agent learns what it means from the preamble — no need to document curl commands or API endpoints in your actions.

Signals

The agent uses shell commands to signal the scheduler:
| Command | Effect |
| --- | --- |
| `al-rerun` | Tells the scheduler the agent did work and wants to be re-run immediately to drain remaining backlog. |
| `al-status "<text>"` | Status update shown in the TUI (e.g. `al-status "reviewing PR #42"`). |
| `al-return "<value>"` | Returns a value to the calling agent when invoked via `al-call`. |
| `al-exit [code]` | Terminates the agent with an exit code indicating an unrecoverable error. |
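
The pattern in practice looks like the sketch below. The `al-*` commands are provided by the scheduler at runtime; the stub functions here are hypothetical stand-ins so the sketch runs outside a real agent container.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real signal commands, which only exist
# inside an agent container. In a real run, drop these definitions.
al-status() { echo "[status] $1"; }
al-rerun()  { echo "[signal] rerun"; }

al-status "scanning for open issues"

# Pretend we processed one item and the backlog may not be empty yet.
backlog_may_remain=1
if [ "$backlog_may_remain" -eq 1 ]; then
  al-rerun   # ask the scheduler to run this agent again immediately
fi
```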

Agent calls

Agents running in Docker mode can call other agents and retrieve their results using shell commands:
  • al-call <agent> — Call another agent. Pass context via stdin. Returns {"ok":true,"callId":"..."}.
  • al-check <callId> — Non-blocking status check. Returns {"status":"pending|running|completed|error", ...}.
  • al-wait <callId> [...] [--timeout N] — Wait for calls to complete (default timeout: 900s).
Calls are non-blocking: fire multiple calls, continue working, then collect results with al-wait. If the target agent’s runners are all busy, the call is queued until one frees up. Self-calls are rejected; call depth is bounded by maxCallDepth.
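
The fire-then-collect pattern can be sketched as follows. `al-call` and `al-wait` are the real commands described above; the stubs here (with a canned response) are hypothetical stand-ins so the sketch runs standalone.

```shell
#!/usr/bin/env bash
# Stand-ins for the real commands; in a real run these come from the container.
al-call() { cat >/dev/null; echo '{"ok":true,"callId":"call-001"}'; }
al-wait() { echo "waiting on: $1"; }

# Fire a call, passing context via stdin, and capture the callId.
resp=$(echo 'review PR acme/app#17' | al-call reviewer-agent)
call_id=${resp#*\"callId\":\"}   # strip everything up to the callId value
call_id=${call_id%%\"*}          # strip the trailing quote and brace

# ...continue other work here, then block until the call completes...
al-wait "$call_id" --timeout 300
```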

Runtime lifecycle

Each agent run is an isolated, short-lived container. Here’s what happens from trigger to exit:
  1. Trigger fires — a cron tick, webhook event, manual al run, or al-call from another agent.
  2. Container launches — a fresh container starts with credentials and config passed via environment variables and volume mounts.
  3. Credentials are loaded — the entry point reads credential files from /credentials/<type>/<instance>/<field> (local Docker and Cloud Run) or from AL_SECRET_* environment variables (ECS). Key credentials are injected as env vars the LLM can use directly: GITHUB_TOKEN, GH_TOKEN, SENTRY_AUTH_TOKEN, GIT_SSH_COMMAND, git author identity, etc.
  4. LLM session starts — the model is initialized and receives two inputs:
    • System prompt: the contents of ACTIONS.md
    • User prompt: <agent-config> (params JSON) + <credential-context> (available env vars, tools, and security policy) + trigger context (schedule, webhook payload, or agent call)
  5. Agent runs autonomously — the LLM executes tools (bash, file I/O, API calls) until it finishes or hits an error. Rate-limited API calls are retried automatically (up to 5 attempts with exponential backoff).
  6. Error detection — the container watches for repeated auth/permission failures (e.g. “bad credentials”, “permission denied”). After 3 such errors, it aborts early.
  7. Signals are processed — the agent uses al-rerun, al-status, al-return, and al-exit commands to write signal files. The scheduler reads them after the session ends.
  8. Container exits — exit code 0 (success), 1 (error), or 124 (timeout). Any held locks are released automatically. The scheduler logs the result and the container is removed.
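
The credential layout from step 3 can be sketched as below. The type, instance, and field names are hypothetical examples; real paths depend on the credential refs your agent declares.

```shell
#!/usr/bin/env bash
# Build a throwaway copy of the layout so the sketch runs outside a container.
root=$(mktemp -d)                      # stand-in for /credentials
mkdir -p "$root/github_token/default"
printf 'ghp_example' > "$root/github_token/default/token"

# The entry point reads /credentials/<type>/<instance>/<field> and exports
# the value as an environment variable the LLM can use directly:
export GITHUB_TOKEN="$(cat "$root/github_token/default/token")"
echo "GITHUB_TOKEN set: ${GITHUB_TOKEN:+yes}"
```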

Timeout

Each container has a self-termination timer controlled by local.timeout in config.toml (default: 3600 seconds / 1 hour). If the timer fires, the process exits with code 124. This is a hard kill — there is no graceful shutdown.
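
Assuming `local.timeout` lives under a `[local]` table (the natural reading of the dotted key), a `config.toml` override might look like:

```toml
# config.toml: raise the hard per-run timeout to 2 hours (value in seconds)
[local]
timeout = 7200
```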

Reruns

When a scheduled agent runs al-rerun, the scheduler immediately re-runs it. This continues until the agent completes without al-rerun (no more work), hits an error, or reaches the maxReruns limit (default: 10, configurable in config.toml). This lets an agent drain its work queue without waiting for the next cron tick. Webhook-triggered and agent-called runs do not re-run — they respond to a single event. See Docker docs for the full container reference including the startup sequence, log protocol, filesystem layout, and exit codes.

Resource locks

When you set scale > 1 on an agent, multiple instances run concurrently. Without coordination, two instances might pick up the same GitHub issue, review the same PR, or deploy the same service at the same time. Resource locks prevent this. Locks are managed by the scheduler and available to all agents running in Docker mode. Each lock is identified by a resource key — for example, LOCK("github issue acme/app#42").

How it works

  1. Before working on a shared resource, the agent calls LOCK("resource key").
  2. If the lock is free, the agent gets it and proceeds.
  3. If another instance already holds the lock, the agent gets back the holder’s name and skips that resource.
  4. When done, the agent calls UNLOCK("resource key").
The agent learns the lock API from a preamble injected before the actions run. Agent authors just write the shorthand — no need to think about HTTP endpoints or authentication.

Operations

| Operation | Description |
| --- | --- |
| `LOCK(resourceKey)` | Acquire an exclusive lock on a resource. Fails if another instance holds it. |
| `UNLOCK(resourceKey)` | Release a lock. Only the holder can release. |
| `HEARTBEAT(resourceKey)` | Reset the TTL on a held lock. Use during long-running work to prevent expiry. |

One lock at a time

Each agent instance can hold at most one lock. This keeps the model simple — the agent locks a resource, does the work, unlocks, then moves to the next item. If it tries to acquire a second lock without releasing the first, the request is rejected with a clear error message.

Timeout (TTL)

Locks expire automatically after 30 minutes by default. This prevents deadlocks if an agent crashes or hangs without releasing its lock. The timeout is configurable via gateway.lockTimeout in config.toml (value in seconds). For work that takes longer than the timeout, use HEARTBEAT to extend the TTL. Each heartbeat resets the clock to another full TTL period. If the agent forgets to heartbeat and the lock expires, another instance can claim it.

Authentication

Each container gets a unique per-run secret (the same one used for the shutdown API). Lock requests are authenticated with this secret, so only the container that acquired a lock can release or heartbeat it. There is no way for one agent instance to release another’s lock — it must wait for the TTL to expire.

Auto-release on exit

When a container exits — whether it finishes successfully, hits an error, or times out — all of its locks are released automatically by the scheduler. You don’t need to worry about cleanup in error paths.

Example agents

An excerpt from a lock-aware ACTIONS.md workflow:

```markdown
## Workflow

1. List open issues labeled "agent" in repos from `<agent-config>`
2. For each issue:
   - LOCK("github issue owner/repo#123")
   - If the lock fails, skip this issue — another instance is handling it
   - Clone the repo, create a branch, implement the fix
   - Open a PR and link it to the issue
   - UNLOCK("github issue owner/repo#123")
3. If you completed work and there may be more issues, run `al-rerun`
```

Resource key conventions

Use descriptive, unique keys:
| Resource key | Example |
| --- | --- |
| `github issue owner/repo#number` | `LOCK("github issue acme/app#42")` |
| `github pr owner/repo#number` | `LOCK("github pr acme/app#17")` |
| `deploy service-name` | `LOCK("deploy api-prod")` |

Configuration

| Setting | Location | Default | Description |
| --- | --- | --- | --- |
| `gateway.lockTimeout` | `config.toml` | `1800` (30 min) | Default TTL for locks in seconds |

Dockerfile (optional)

The base Docker image includes Node.js, git, curl, and openssh. If your agent needs additional tools, add a Dockerfile to the agent directory:
```dockerfile
FROM al-agent:latest
USER root
RUN apk add --no-cache github-cli jq python3
USER node
```
Agents without a Dockerfile use the base image directly. See Docker docs for the full container reference including the base image contents, filesystem layout, and how to write standalone Dockerfiles.

Examples

| Agent | Description |
| --- | --- |
| Dev Agent | Picks up GitHub issues and implements changes |
| Reviewer Agent | Reviews and merges open pull requests |
| DevOps Agent | Monitors CI failures and Sentry errors, files issues |

See also