An agent is a directory inside your project that contains instructions and configuration for an autonomous LLM session. Each run is self-contained: the agent wakes up (on a schedule or webhook), executes its task, and shuts down.

Skills vs Agents

A skill is a portable artifact — a SKILL.md file (and optionally a Dockerfile) that defines what an agent does. Skills can be shared, published, and installed from git repos. An agent is a skill instantiated in your project with local runtime configuration. When you run al add to install a skill, it becomes an agent with its own config.toml for project-specific settings like credentials, schedule, and model.

Agent Structure

An agent is a directory with at least two files:
agents/<name>/
├── SKILL.md        # Portable metadata + instructions (the skill)
├── config.toml     # Project-local runtime config
└── Dockerfile      # Optional — custom container image
  • SKILL.md contains portable metadata (name, description, license, compatibility) in its YAML frontmatter, and the agent’s instructions in its markdown body.
  • config.toml contains project-specific runtime configuration: credentials, models, schedule, webhooks, hooks, params, scale, and timeout.
  • Dockerfile is optional; it defines custom container dependencies, and may be provided by the skill author or customized per project.
The directory name becomes the agent name. No registration is needed — the scheduler discovers agents by scanning for directories that contain a SKILL.md.
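As a concrete sketch, a minimal config.toml might look like the fragment below. The keys follow the fields listed above (schedule, params, timeout), but the specific values, the cron syntax, and the repo parameter are illustrative assumptions, not a canonical example:

```toml
# agents/triage/config.toml: illustrative sketch, values are assumptions
schedule = "*/15 * * * *"   # hypothetical cron expression
timeout  = 600              # seconds before hard kill (exit code 124)

[params]                    # surfaced to the agent as <agent-config>
repo = "acme/app"           # hypothetical project-specific parameter
```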

How Context is Assembled

At runtime, the agent’s LLM session receives two inputs:

System prompt

The markdown body of SKILL.md, prepended with a preamble that teaches the agent its language skills.

User prompt

Assembled from several blocks:
  1. <agent-config> — JSON of the params field from config.toml
  2. <credential-context> — describes which environment variables and tools are available (e.g. GITHUB_TOKEN, git, gh, SSH config)
  3. <environment> — filesystem constraints and working directory info
  4. Trigger context (one of):
    • Scheduled run: “You are running on a schedule. Check for new work and act on anything you find.”
    • Manual run: “You have been triggered manually. Check for new work and act on anything you find.”
    • Webhook: <webhook-trigger> block with the full event payload (source, event, action, repo, etc.)
    • Subagent call: <skill-subagent> block with the caller agent name and context
Your SKILL.md instructions should reference <agent-config> for parameter values and handle both scheduled and webhook triggers if the agent uses both.
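Put together, the user prompt for a webhook-triggered run looks roughly like the sketch below. The block names match the list above; the payload fields and values are illustrative:

```xml
<agent-config>{"repo": "acme/app"}</agent-config>
<credential-context>GITHUB_TOKEN is set; git and gh are available.</credential-context>
<environment>Working directory: /tmp. Root filesystem is read-only.</environment>
<webhook-trigger>{"source": "github", "event": "issues", "action": "opened", "repo": "acme/app"}</webhook-trigger>
```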

Language Skills

Before the SKILL.md instructions run, the agent receives a preamble that teaches it a set of language skills — shorthand operations the skill can reference naturally. The preamble explains the underlying mechanics (curl commands, env vars) so agent authors never need to think about them.
| Category | Skills | Description |
| --- | --- | --- |
| Signals | al-rerun, al-status, al-return, al-exit | Shell commands for signaling the scheduler |
| Calls | al-subagent, al-subagent-check, al-subagent-wait | Agent-to-agent calls with return values |
| Locks | LOCK(...), UNLOCK(...), HEARTBEAT(...) | Resource locking for parallel coordination |
| Credentials | GITHUB_TOKEN, gh, git, etc. | Credential access and tool usage |
Agent authors write the shorthand naturally (e.g. LOCK("github issue acme/app#42")). The agent learns what it means from the preamble. See Agent Commands for the complete command reference.
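For instance, a SKILL.md body might use the shorthand directly. The task below is hypothetical; only the shorthand names come from the table above:

```markdown
For each open issue that needs triage:
1. LOCK("github issue acme/app#<n>"); skip the issue if the lock is held.
2. Label the issue, then UNLOCK it.
3. If more issues remain than fit in this run, run al-rerun before exiting.
```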

Runtime Lifecycle

Each agent run is an isolated, short-lived process. By default agents run in Docker containers, but agents can also be configured to run on the host machine under a separate OS user (see Runtime). Here’s the full sequence from trigger to exit:
  1. Trigger fires — a cron tick, webhook event, manual al run, or al-subagent from another agent.
  2. Work is queued — if all runners for the target agent are busy, the trigger is placed in a SQLite-backed work queue until a runner becomes available.
  3. Process launches — a fresh container (or host-user process) starts with credentials and config passed via environment variables and volume mounts (or temp directories).
  4. Credentials are loaded — the entry point reads credential files from the credentials path (/credentials/ in containers, or the AL_CREDENTIALS_PATH temp directory in host-user mode). Key credentials are injected as env vars the LLM can use directly: GITHUB_TOKEN, GH_TOKEN, SENTRY_AUTH_TOKEN, GIT_SSH_COMMAND, git author identity, etc.
  5. Hooks run — if hooks.pre steps are defined in config.toml, they execute sequentially (clone repos, fetch data, run shell commands) to stage context before the LLM starts. See Dynamic Context.
  6. LLM session starts — the model receives the SKILL.md instructions as system prompt and the assembled user prompt.
  7. Agent runs autonomously — the LLM executes tools (bash, file I/O, API calls) until it finishes or hits an error. Rate-limited API calls are retried automatically (up to 5 attempts with exponential backoff).
  8. Error detection — the runtime watches for repeated auth/permission failures (e.g. “bad credentials”, “permission denied”). After 3 such errors, it aborts early.
  9. Signals are processed — the agent uses al-rerun, al-status, al-return, and al-exit commands to write signal files. The scheduler reads them after the session ends.
  10. Process exits — exit code 0 (success), 1 (error), or 124 (timeout). Any held locks are released automatically. The scheduler logs the result and the container is removed (or the working directory is cleaned up in host-user mode).
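The retry policy in step 7 can be sketched as a small shell function: up to 5 attempts with exponential backoff (1s, 2s, 4s, ...). This is illustrative only, not the runtime's actual implementation:

```shell
# Retry a command up to 5 times, doubling the delay between attempts.
retry() {
  max=5 delay=1 attempt=1
  while ! "$@"; do
    [ "$attempt" -ge "$max" ] && return 1   # give up after 5 attempts
    sleep "$delay"
    delay=$((delay * 2))                    # 1s, 2s, 4s, 8s
    attempt=$((attempt + 1))
  done
}
```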

Timeout

Each agent process has a self-termination timer controlled by timeout in the agent’s config.toml (falls back to local.timeout in project config.toml, then 900 seconds). If the timer fires, the process exits with code 124. This is a hard kill — there is no graceful shutdown. See Agent Config — Timeout for configuration.
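The fallback chain resolves in that fixed order, which can be sketched as follows (the function and argument names are illustrative, not a real API):

```shell
# Resolve the effective timeout: agent config.toml timeout first,
# then the project's local.timeout, then the 900-second default.
effective_timeout() {
  agent_timeout="$1"    # timeout from the agent's config.toml ("" if unset)
  project_timeout="$2"  # local.timeout from project config.toml ("" if unset)
  if [ -n "$agent_timeout" ]; then
    echo "$agent_timeout"
  elif [ -n "$project_timeout" ]; then
    echo "$project_timeout"
  else
    echo 900
  fi
}
```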

Reruns

When a scheduled agent runs al-rerun, the scheduler immediately re-runs it. This continues until the agent completes without al-rerun (no more work), hits an error, or reaches the maxReruns limit (default: 10, configurable in config.toml). This lets an agent drain its work queue without waiting for the next cron tick. Webhook-triggered and agent-called runs do not re-run — they respond to a single event.
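The scheduler's loop amounts to the sketch below, where run_agent is a hypothetical stand-in that prints "rerun" when the agent signaled al-rerun and anything else otherwise:

```shell
# Re-run the agent while it signals al-rerun, stopping at maxReruns.
# Prints the total number of runs. Illustrative sketch only.
drain() {
  max_reruns="${1:-10}" runs=0
  while [ "$runs" -lt "$max_reruns" ]; do
    runs=$((runs + 1))
    [ "$(run_agent)" = "rerun" ] || break   # no more work: stop
  done
  echo "$runs"
}
```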

Work Queue

When a trigger fires (webhook event or agent call) but all runner instances for the target agent are busy, the event is placed in a work queue instead of being dropped. Items are dequeued and executed as runners become available. The queue is backed by SQLite (.al/work-queue.db), so pending items survive scheduler restarts. Each agent has its own queue. If the queue is full, the oldest items are dropped. You can see queue depth per agent in al stat output (the queue column).
| Setting | Location | Default | Description |
| --- | --- | --- | --- |
| workQueueSize | config.toml | 100 | Maximum queued work items per agent |
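The overflow behavior (drop the oldest items once the queue exceeds workQueueSize) can be sketched with a file-backed FIFO. The real queue is SQLite-backed; the path and size here are illustrative:

```shell
# Toy per-agent work queue: enqueue drops the oldest entries once the
# queue exceeds WORK_QUEUE_SIZE; dequeue pops from the front.
QUEUE="${TMPDIR:-/tmp}/al-queue-demo.txt"
WORK_QUEUE_SIZE=3
: > "$QUEUE"

enqueue() {
  echo "$1" >> "$QUEUE"
  if [ "$(wc -l < "$QUEUE" | tr -d ' ')" -gt "$WORK_QUEUE_SIZE" ]; then
    # Keep only the newest WORK_QUEUE_SIZE entries.
    tail -n "$WORK_QUEUE_SIZE" "$QUEUE" > "$QUEUE.tmp" && mv "$QUEUE.tmp" "$QUEUE"
  fi
}

dequeue() {
  head -n 1 "$QUEUE"
  tail -n +2 "$QUEUE" > "$QUEUE.tmp" && mv "$QUEUE.tmp" "$QUEUE"
}
```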

Container Filesystem

When running in the default container runtime:
| Path | Mode | Contents |
| --- | --- | --- |
| /app | read-only | Action Llama application + node_modules |
| /credentials | read-only | Mounted credential files (/<type>/<instance>/<field>) |
| /tmp | read-write (tmpfs, 2GB) | Agent working directory — repos, scratch files, SSH keys |
| /workspace | read-write (2GB) | Persistent workspace |
| /home/node | read-write (64MB) | Home directory |
The root filesystem is read-only. All agent work should happen in /tmp.

Host-User Filesystem

When running in host-user mode, the agent runs directly on the host:
| Path | Contents |
| --- | --- |
| /tmp/al-runs/<instance-id>/ | Working directory (chowned to agent user) |
| AL_CREDENTIALS_PATH (temp dir) | Staged credential files (/<type>/<instance>/<field>) |
The agent has access to the host filesystem but runs as a separate OS user, so it cannot access other users’ files or credentials.

See Also