The scheduler is the heart of Action Llama. It discovers agents, fires cron triggers, dispatches webhook events, and manages runner pools.
Architecture
- Scheduler: discovers agents by scanning for directories that contain `SKILL.md`. Registers cron jobs and webhook triggers. Manages a pool of runners per agent (configurable via `scale`) and a durable work queue for buffering events when runners are busy.
- Gateway: HTTP server that receives webhooks from external services (GitHub, Sentry, Linear, Mintlify), serves the web dashboard, handles resource locking, and processes agent signals.
- Agent Processes: each agent run is an isolated process, either a Docker container or a host-user process (via `sudo -u`), with its own credentials and environment variables. Processes are ephemeral: they start, do their work, and are cleaned up.
Agent Discovery
The scheduler scans the project directory for subdirectories containing a `SKILL.md` file. Each discovered directory becomes an agent, and the directory name is the agent name.
No registration step is needed. Add a new agent directory, restart the scheduler, and it picks it up automatically.
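The scan described above can be sketched in a few lines of Python. This is an illustration only, not Action Llama's actual implementation; the function name `discover_agents` is hypothetical.

```python
from pathlib import Path


def discover_agents(project_dir: str) -> dict[str, Path]:
    """Map agent name -> directory for each subdirectory that holds a SKILL.md."""
    agents = {}
    for child in sorted(Path(project_dir).iterdir()):
        # Only direct subdirectories count; the directory name is the agent name.
        if child.is_dir() and (child / "SKILL.md").is_file():
            agents[child.name] = child
    return agents
```

Because the result is computed from the directory layout alone, adding an agent is just a matter of creating a new directory with a `SKILL.md` and scanning again.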
Cron Scheduling
Agents with a `schedule` field in their `SKILL.md` frontmatter are registered as cron jobs. When a cron tick fires:
- If a runner is available, the agent starts immediately
- If all runners are busy, the scheduled run is skipped with a warning (cron runs are not queued)
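The skip-when-busy rule can be illustrated with a small sketch. The names `RunnerPool` and `on_cron_tick` are hypothetical and stand in for whatever the scheduler uses internally.

```python
import logging

log = logging.getLogger("scheduler")


class RunnerPool:
    """Tracks how many of an agent's runners are currently in use."""

    def __init__(self, scale: int = 1):
        self.scale = scale
        self.busy = 0

    def try_acquire(self) -> bool:
        if self.busy < self.scale:
            self.busy += 1
            return True
        return False

    def release(self) -> None:
        self.busy = max(0, self.busy - 1)


def on_cron_tick(agent: str, pool: RunnerPool) -> str:
    """Start the agent if a runner is free; otherwise skip the tick without queueing."""
    if pool.try_acquire():
        return "started"
    log.warning("cron tick for %s skipped: all runners busy", agent)
    return "skipped"
```

A skipped tick is simply lost; the next run happens on the following tick that finds a free runner.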
Webhook Dispatch
When the gateway receives a webhook:
- Signature verification: the payload is verified against the credential secret for that webhook source
- Event parsing: the raw payload is parsed into a `WebhookContext` (source, event, action, repo, etc.)
- Filter matching: the context is matched against each agent’s webhook trigger filters
- Runner dispatch: if a runner is available, the agent starts. If all runners are busy, the event is queued in the work queue.
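As a rough sketch of the first three steps, the code below verifies a signature and matches a context against filters. It makes several assumptions: the signature is a GitHub-style `sha256=<hex>` HMAC header (other sources use different schemes), filters are plain dictionaries of required field values, and the function names are hypothetical. Only the `WebhookContext` field names come from the description above.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class WebhookContext:
    source: str
    event: str
    action: Optional[str] = None
    repo: Optional[str] = None


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a 'sha256=<hex>' HMAC header (assumed GitHub-style scheme)."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(expected, signature_header)


def matches(ctx: WebhookContext, trigger_filter: dict) -> bool:
    """A filter matches when every key it sets equals the context's value."""
    return all(getattr(ctx, key, None) == value for key, value in trigger_filter.items())


def matching_agents(ctx: WebhookContext, triggers: dict[str, list[dict]]) -> list[str]:
    """Return agents that have at least one trigger filter matching the context."""
    return [agent for agent, filters in triggers.items()
            if any(matches(ctx, f) for f in filters)]
```

Each agent returned by `matching_agents` would then go through runner dispatch: start immediately if a runner is free, otherwise land in the work queue.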
Runner Pools
Each agent has its own pool of runners. The pool size is controlled by the `scale` field in `SKILL.md` frontmatter (default: 1).
- `scale = 1`: only one instance can run at a time (default)
- `scale = N`: up to N instances can run concurrently
- `scale = 0`: agent is disabled (no runners, no cron, no webhooks)
The `scale` field in `config.toml` sets a cap on total concurrent runners across all agents.
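The interaction between the per-agent limit and the project-wide cap can be sketched as follows. The `Pools` class and its method names are hypothetical; the sketch assumes a run may start only when both limits allow it.

```python
from typing import Optional


class Pools:
    """Per-agent runner limits plus an optional project-wide cap."""

    def __init__(self, agent_scale: dict[str, int], project_scale: Optional[int] = None):
        self.agent_scale = agent_scale
        self.project_scale = project_scale  # None means no project-wide cap
        self.running = {name: 0 for name in agent_scale}

    def enabled(self, agent: str) -> bool:
        # A scale of 0 disables the agent entirely.
        return self.agent_scale[agent] > 0

    def try_start(self, agent: str) -> bool:
        if self.running[agent] >= self.agent_scale[agent]:
            return False  # per-agent limit reached (always the case when scale is 0)
        if self.project_scale is not None and sum(self.running.values()) >= self.project_scale:
            return False  # project-wide cap reached
        self.running[agent] += 1
        return True

    def finish(self, agent: str) -> None:
        self.running[agent] -= 1
```

With `Pools({"a": 2, "b": 1}, project_scale=2)`, starting one run of each agent uses up the project-wide cap, so a second run of `a` has to wait even though its own pool still has room.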
Work Queue
When a webhook event or agent call arrives but all runners are busy, the event is placed in a work queue. Items are dequeued and executed as runners become available.
- Backed by SQLite (`.al/work-queue.db`), so it survives scheduler restarts
- Per-agent queues
- Configurable size: `workQueueSize` (default: 100)
- When full, oldest items are dropped
- Queue depth visible in `al stat` output
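A minimal sketch of a per-agent, drop-oldest queue on SQLite is shown below. The table layout, class name, and method names are assumptions made for illustration; the real schema of `.al/work-queue.db` is not described here.

```python
import sqlite3


class WorkQueue:
    """Per-agent FIFO in SQLite; drops the oldest item when an agent's queue is full."""

    def __init__(self, path: str = ":memory:", max_size: int = 100):
        self.db = sqlite3.connect(path)
        self.max_size = max_size
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, agent TEXT NOT NULL, payload TEXT NOT NULL)"
        )

    def depth(self, agent: str) -> int:
        row = self.db.execute("SELECT COUNT(*) FROM items WHERE agent = ?", (agent,)).fetchone()
        return row[0]

    def enqueue(self, agent: str, payload: str) -> None:
        with self.db:  # commit the insert and any trimming together
            self.db.execute("INSERT INTO items (agent, payload) VALUES (?, ?)", (agent, payload))
            overflow = self.depth(agent) - self.max_size
            if overflow > 0:
                # The lowest ids are the oldest rows for this agent.
                self.db.execute(
                    "DELETE FROM items WHERE id IN ("
                    "SELECT id FROM items WHERE agent = ? ORDER BY id LIMIT ?)",
                    (agent, overflow),
                )

    def dequeue(self, agent: str):
        """Remove and return the oldest payload for the agent, or None if empty."""
        with self.db:
            row = self.db.execute(
                "SELECT id, payload FROM items WHERE agent = ? ORDER BY id LIMIT 1", (agent,)
            ).fetchone()
            if row is None:
                return None
            self.db.execute("DELETE FROM items WHERE id = ?", (row[0],))
            return row[1]
```

Passing a file path instead of `:memory:` keeps queued items on disk, which is what lets a queue of this kind survive a restart.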
Reruns
The scheduler automatically handles reruns for scheduled agents. When a scheduled run completes successfully, the scheduler can immediately start a new run. This continues until:
- The agent completes with no remaining work
- The agent hits an error
- The `maxReruns` limit is reached (default: 10)
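The rerun loop can be sketched as below. Two points are assumptions: how an agent signals that work remains (modeled here as a hypothetical `run_once` callback returning `"more_work"`, `"done"`, or `"error"`), and that the initial run does not count toward `maxReruns`.

```python
from typing import Callable


def run_with_reruns(run_once: Callable[[], str], max_reruns: int = 10) -> int:
    """Run an agent, re-running while it reports more work. Returns the total run count."""
    runs = 0
    reruns = 0
    while True:
        result = run_once()
        runs += 1
        if result != "more_work":
            break  # finished with nothing left to do, or hit an error
        if reruns >= max_reruns:
            break  # rerun budget exhausted
        reruns += 1
    return runs
```

Under these assumptions, an agent that always reports more work runs `max_reruns + 1` times in total before the scheduler waits for the next cron tick.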
Agent Calls
Agents can call other agents via the `call_agent` tool. The scheduler routes the call to the target agent’s runner pool:
- If a runner is available, the called agent starts immediately
- If all runners are busy, the call is queued in the work queue
- Self-calls are rejected
- Call depth is bounded by `maxCallDepth` (default: 3) to prevent infinite loops
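The two guard rules (no self-calls, bounded depth) might look like this in code. The function and exception names are hypothetical, and the depth convention (0 for a run started by cron or a webhook, plus one per nested call) is an assumption.

```python
class CallRejected(Exception):
    """Raised when an agent-to-agent call violates a guard rule."""


def check_call(caller: str, target: str, depth: int, max_call_depth: int = 3) -> int:
    """Validate a call and return the depth at which the called agent would run.

    `depth` is the caller's position in the call chain.
    """
    if caller == target:
        raise CallRejected("self-calls are rejected")
    if depth + 1 > max_call_depth:
        raise CallRejected(f"call depth {depth + 1} exceeds maxCallDepth={max_call_depth}")
    return depth + 1
```

Passing the returned depth along with each call is what lets the bound hold across a chain such as A calls B calls C, even though no single agent sees the whole chain.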
Graceful Shutdown
When the scheduler receives a stop signal (`al stop` or SIGTERM):
- No new runs are started
- All pending work queues are cleared
- In-flight runs continue until they finish
- Once all runs complete, the process exits
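The shutdown sequence can be modeled as a small state machine. This sketch leaves out the actual signal handling and process management; the `ShutdownState` class and its method names are hypothetical.

```python
class ShutdownState:
    """Tracks in-flight runs and pending queues across a graceful stop."""

    def __init__(self):
        self.stopping = False
        self.in_flight = set()
        self.queues = {}  # agent name -> list of pending items

    def request_stop(self) -> None:
        """Called on a stop signal: refuse new runs and clear pending work."""
        self.stopping = True
        for pending in self.queues.values():
            pending.clear()

    def try_start(self, run_id: str) -> bool:
        if self.stopping:
            return False  # no new runs once a stop has been requested
        self.in_flight.add(run_id)
        return True

    def finish(self, run_id: str) -> bool:
        """Mark a run complete. Returns True when the process may exit."""
        self.in_flight.discard(run_id)
        return self.stopping and not self.in_flight
```

The process exits only when `finish` returns `True`, that is, after a stop was requested and the last in-flight run has completed.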
Configuration
| Setting | Location | Default | Description |
|---|---|---|---|
| `maxReruns` | `config.toml` | 10 | Max consecutive reruns per agent |
| `maxCallDepth` | `config.toml` | 3 | Max depth for agent call chains |
| `workQueueSize` | `config.toml` | 100 | Max queued items per agent |
| `scale` | `config.toml` | (unlimited) | Project-wide max concurrent runners |
| `defaultWaitTimeout` | `config.toml` | 1800 | Default timeout for `wait_for_trigger` (seconds) |
| `scale` | `SKILL.md` frontmatter | 1 | Per-agent concurrent runner limit |
| `gateway.port` | `config.toml` | 8080 | Gateway HTTP port |
Troubleshooting
Agent not running on schedule
- Verify the cron expression in `SKILL.md` frontmatter is valid
- Check if the agent or scheduler is paused: `al stat`
- Resume if paused: `al resume` (scheduler) or `al resume <agent>`
Agent keeps re-running
The scheduler automatically re-runs scheduled agents up to `maxReruns` times (default: 10). If an agent is re-running more than expected, check whether it is completing successfully each time: the scheduler keeps re-running it as long as the run succeeds.
Agent timing out
The default timeout is 900 seconds (15 minutes). Increase it in the project’s `config.toml` or per agent in the agent’s `config.toml`: