Agents
An agent is a directory inside your project that contains instructions and configuration for an autonomous LLM session. Each run is self-contained: the agent wakes up (on a schedule or webhook), executes its task, and shuts down.

Structure
An agent directory contains ACTIONS.md and agent-config.toml (plus an optional Dockerfile).
agent-config.toml
Declares what the agent needs to run: which credentials to mount, which model to use, when to trigger, and any custom parameters.
- `credentials` — list of credential refs (`"type:instance"`) the agent needs at runtime. These are mounted into the container and injected as environment variables. See Credentials.
- `schedule` and/or `webhooks` — at least one trigger is required. Agents can have both.
- `[model]` — optional. If omitted, the agent inherits the default model from the project's `config.toml`. See Models.
- `[params]` — optional key-value pairs injected into the agent's prompt as an `<agent-config>` JSON block. Use these for repo names, label names, org identifiers, or anything else your ACTIONS.md references.
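A minimal sketch, assuming the fields above (the cron expression, credential instance, and param values are illustrative):

```toml
credentials = ["github:acme"]
schedule = "*/15 * * * *"

[params]
repo = "acme/app"
label = "needs-triage"
```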
ACTIONS.md
The system prompt that defines the agent’s behavior. This is the most important file — it tells the LLM what to do, step by step.
Write it as direct instructions to the model.
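For example, a hypothetical issue-triage agent's ACTIONS.md might begin like this (the repo and label names are invented):

```markdown
You are a triage agent for the acme/app repository.

On each run:
1. Use `gh` to list open issues labeled "needs-triage".
2. For each issue, post a short comment summarizing the problem and suggesting a priority.
3. If you processed any issues, run `al-rerun` so the scheduler runs you again.
```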
How it’s used at runtime
The ACTIONS.md is set as the LLM's system prompt. The scheduler then sends a user-message prompt assembled from several blocks:

- `<agent-config>` — JSON of the `[params]` table from `agent-config.toml`
- `<credential-context>` — describes which environment variables and tools are available (e.g. `GITHUB_TOKEN`, `git`, `gh`, SSH config)
- Trigger context (one of):
  - Scheduled run: "You are running on a schedule. Check for new work and act on anything you find."
  - Manual run: "You have been triggered manually. Check for new work and act on anything you find."
  - Webhook: `<webhook-trigger>` block with the full event payload (source, event, action, repo, etc.)
  - Agent call: `<agent-call>` block with the caller agent name and context
Your ACTIONS.md should read parameter values from `<agent-config>` and handle both scheduled and webhook triggers if the agent uses both.
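Assembled, the user prompt for a webhook-triggered run might look roughly like this (the payload values are invented):

```
<agent-config>
{"repo": "acme/app", "label": "needs-triage"}
</agent-config>

<credential-context>
GITHUB_TOKEN is set; git and gh are configured.
</credential-context>

<webhook-trigger>
{"source": "github", "event": "issues", "action": "opened", "repo": "acme/app"}
</webhook-trigger>
```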
Language skills
Before the ACTIONS.md runs, the agent receives a preamble that teaches it a set of language skills — shorthand operations the actions can reference naturally. The preamble explains the underlying mechanics (curl commands, env vars) so agent authors never need to think about them. Skills currently taught to agents:

| Category | Skills | Description |
|---|---|---|
| Signals | `al-rerun`, `al-status`, `al-return`, `al-exit` | Shell commands for signaling the scheduler. See Signals. |
| Calls | `al-call`, `al-check`, `al-wait` | Agent-to-agent calls with return values. See Agent calls. |
| Locks | `LOCK(...)`, `UNLOCK(...)`, `HEARTBEAT(...)` | Resource locking for parallel coordination. See Resource locks. |
| Credentials | `GITHUB_TOKEN`, `gh`, `git`, etc. | Credential access and tool usage. See Credentials. |
In your ACTIONS.md, you reference a skill naturally, e.g. `LOCK("github issue acme/app#42")`. The agent learns what it means from the preamble — no need to document curl commands or API endpoints in your actions.
Signals
The agent uses shell commands to signal the scheduler:

| Command | Effect |
|---|---|
| `al-rerun` | Tells the scheduler the agent did work and wants to be re-run immediately to drain remaining backlog. |
| `al-status "<text>"` | Status update shown in the TUI (e.g. `al-status "reviewing PR #42"`). |
| `al-return "<value>"` | Returns a value to the calling agent when invoked via `al-call`. |
| `al-exit [code]` | Terminates the agent with an exit code indicating an unrecoverable error. |
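A typical end-of-run sequence might look like this (the PR number is illustrative; these commands exist only inside the agent container):

```shell
al-status "reviewing PR #42"   # progress shown in the TUI
# ... review and merge the PR ...
al-status "merged PR #42"
al-rerun                       # backlog remains: ask to be re-run immediately
```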
Agent calls
Agents running in Docker mode can call other agents and retrieve their results using shell commands:

- `al-call <agent>` — Call another agent. Pass context via stdin. Returns `{"ok":true,"callId":"..."}`.
- `al-check <callId>` — Non-blocking status check. Returns `{"status":"pending|running|completed|error", ...}`.
- `al-wait <callId> [...] [--timeout N]` — Wait for calls to complete (default timeout: 900s).
Return values are delivered through `al-wait`. If the target agent's runners are all busy, the call is queued until one frees up. Self-calls are rejected; call depth is bounded by `maxCallDepth`.
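Putting the three commands together, a caller's session might look like this (the `reviewer` agent name and the `jq` parsing are illustrative; these commands exist only inside the agent container):

```shell
# Call the reviewer agent, passing context on stdin; capture the call ID
call_id=$(echo "Please review acme/app#17" | al-call reviewer | jq -r .callId)

# Optionally poll without blocking
al-check "$call_id"

# Block until the call completes, with a 10-minute timeout
al-wait "$call_id" --timeout 600
```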
Runtime lifecycle
Each agent run is an isolated, short-lived container. Here's what happens from trigger to exit:

- Trigger fires — a cron tick, webhook event, manual `al run`, or `al-call` from another agent.
- Container launches — a fresh container starts with credentials and config passed via environment variables and volume mounts.
- Credentials are loaded — the entry point reads credential files from `/credentials/<type>/<instance>/<field>` (local Docker and Cloud Run) or from `AL_SECRET_*` environment variables (ECS). Key credentials are injected as env vars the LLM can use directly: `GITHUB_TOKEN`, `GH_TOKEN`, `SENTRY_AUTH_TOKEN`, `GIT_SSH_COMMAND`, git author identity, etc.
- LLM session starts — the model is initialized and receives two inputs:
  - System prompt: the contents of ACTIONS.md
  - User prompt: `<agent-config>` (params JSON) + `<credential-context>` (available env vars, tools, and security policy) + trigger context (schedule, webhook payload, or agent call)
- Agent runs autonomously — the LLM executes tools (bash, file I/O, API calls) until it finishes or hits an error. Rate-limited API calls are retried automatically (up to 5 attempts with exponential backoff).
- Error detection — the container watches for repeated auth/permission failures (e.g. "bad credentials", "permission denied"). After 3 such errors, it aborts early.
- Signals are processed — the agent uses `al-rerun`, `al-status`, `al-return`, and `al-exit` commands to write signal files. The scheduler reads them after the session ends.
- Container exits — exit code 0 (success), 1 (error), or 124 (timeout). Any held locks are released automatically. The scheduler logs the result and the container is removed.
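For example, with the credential file layout described above, an agent could read a field directly (the `github`/`acme`/`token` names are hypothetical; the path pattern is the one above):

```shell
# Pattern: /credentials/<type>/<instance>/<field>
cat /credentials/github/acme/token
```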
Timeout
Each container has a self-termination timer controlled by `local.timeout` in `config.toml` (default: 3600 seconds / 1 hour). If the timer fires, the process exits with code 124. This is a hard kill — there is no graceful shutdown.
Reruns
When a scheduled agent runs `al-rerun`, the scheduler immediately re-runs it. This continues until the agent completes without `al-rerun` (no more work), hits an error, or reaches the `maxReruns` limit (default: 10, configurable in `config.toml`). This lets an agent drain its work queue without waiting for the next cron tick.
Webhook-triggered and agent-called runs do not re-run — they respond to a single event.
See Docker docs for the full container reference including the startup sequence, log protocol, filesystem layout, and exit codes.
Resource locks
When you set `scale > 1` on an agent, multiple instances run concurrently. Without coordination, two instances might pick up the same GitHub issue, review the same PR, or deploy the same service at the same time. Resource locks prevent this.
Locks are managed by the scheduler and available to all agents running in Docker mode. Each lock is identified by a resource key — for example, `LOCK("github issue acme/app#42")`.
How it works
- Before working on a shared resource, the agent calls `LOCK("resource key")`.
- If the lock is free, the agent gets it and proceeds.
- If another instance already holds the lock, the agent gets back the holder's name and skips that resource.
- When done, the agent calls `UNLOCK("resource key")`.
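In an agent's ACTIONS.md, this pattern might read like this (the repo and issue reference are illustrative):

```markdown
For each open issue:
1. LOCK("github issue acme/app#<number>"). If another instance already holds it, skip to the next issue.
2. Implement and push the fix.
3. UNLOCK("github issue acme/app#<number>").
```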
Operations
| Operation | Description |
|---|---|
| `LOCK(resourceKey)` | Acquire an exclusive lock on a resource. Fails if another instance holds it. |
| `UNLOCK(resourceKey)` | Release a lock. Only the holder can release. |
| `HEARTBEAT(resourceKey)` | Reset the TTL on a held lock. Use during long-running work to prevent expiry. |
One lock at a time
Each agent instance can hold at most one lock. This keeps the model simple — the agent locks a resource, does the work, unlocks, then moves to the next item. If it tries to acquire a second lock without releasing the first, the request is rejected with a clear error message.

Timeout (TTL)
Locks expire automatically after 30 minutes by default. This prevents deadlocks if an agent crashes or hangs without releasing its lock. The timeout is configurable via `gateway.lockTimeout` in `config.toml` (value in seconds).
For work that takes longer than the timeout, use `HEARTBEAT` to extend the TTL. Each heartbeat resets the clock to another full TTL period. If the agent forgets to heartbeat and the lock expires, another instance can claim it.
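For a long deploy, the heartbeat pattern might read like this in ACTIONS.md (the service name is illustrative):

```markdown
1. LOCK("deploy api-prod").
2. While the deploy runs, call HEARTBEAT("deploy api-prod") every few minutes so the lock does not expire.
3. After the deploy finishes, UNLOCK("deploy api-prod").
```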
Authentication
Each container gets a unique per-run secret (the same one used for the shutdown API). Lock requests are authenticated with this secret, so only the container that acquired a lock can release or heartbeat it. There is no way for one agent instance to release another's lock — it must wait for the TTL to expire.

Auto-release on exit
When a container exits — whether it finishes successfully, hits an error, or times out — all of its locks are released automatically by the scheduler. You don't need to worry about cleanup in error paths.
Resource key conventions
Use descriptive, unique keys:

| Resource key | Example |
|---|---|
| `github issue owner/repo#number` | `LOCK("github issue acme/app#42")` |
| `github pr owner/repo#number` | `LOCK("github pr acme/app#17")` |
| `deploy service-name` | `LOCK("deploy api-prod")` |
Configuration
| Setting | Location | Default | Description |
|---|---|---|---|
| `gateway.lockTimeout` | `config.toml` | 1800 (30 min) | Default TTL for locks, in seconds |
Dockerfile (optional)
The base Docker image includes Node.js, git, curl, and openssh. If your agent needs additional tools, add a Dockerfile to the agent directory.
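As a sketch, a Dockerfile that adds `jq` and `ripgrep` might look like this (the base image name is a placeholder; extend whatever base image your project uses):

```dockerfile
# Placeholder: substitute your project's actual agent base image
FROM agent-base:latest

# Add extra CLI tools on top of the bundled Node.js, git, curl, and openssh
RUN apt-get update \
    && apt-get install -y --no-install-recommends jq ripgrep \
    && rm -rf /var/lib/apt/lists/*
```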
Examples
| Agent | Description |
|---|---|
| Dev Agent | Picks up GitHub issues and implements changes |
| Reviewer Agent | Reviews and merges open pull requests |
| DevOps Agent | Monitors CI failures and Sentry errors, files issues |
See also
- Creating Agents — step-by-step setup guide
- agent-config.toml Reference — all config fields
- Models — supported LLM providers and model IDs
- Credentials — credential types and storage
- Webhooks — webhook setup and filter fields
- Docker — container isolation and custom images