Architecture
- Scheduler: discovers agents by scanning for directories that contain a SKILL.md file, registers cron jobs and webhook triggers, and manages a pool of runners per agent (configurable via scale) plus a durable work queue that buffers events when runners are busy.
- Gateway: an HTTP server that receives webhooks from external services (GitHub, Sentry, Linear, Mintlify), serves the web dashboard, handles resource locking, and processes agent signals.
- Agent Processes: each agent run is an isolated process, either a Docker container or a host-user process (via sudo -u), with its own credentials and environment variables. Processes are ephemeral: they start, do their work, and are cleaned up.
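As a rough illustration of the two isolation modes, the sketch below builds the command line for one ephemeral run. The helper build_launch_argv, the image name, and the host user are hypothetical. The real scheduler may pass credentials differently, for example through an env file rather than argv, where values would be visible to other users via ps.

```python
from __future__ import annotations


def build_launch_argv(mode: str, env: dict[str, str], command: list[str],
                      image: str = "", host_user: str = "") -> list[str]:
    """Build argv for one ephemeral agent run (hypothetical helper).

    mode is "docker" (container) or "host" (process run as another user).
    """
    if mode == "docker":
        # --rm removes the container once the run exits.
        argv = ["docker", "run", "--rm"]
        for key, value in sorted(env.items()):
            argv += ["-e", f"{key}={value}"]  # per-run environment variable
        return argv + [image, *command]
    if mode == "host":
        # sudo -u switches user; `env` then sets the run's variables.
        assignments = [f"{key}={value}" for key, value in sorted(env.items())]
        return ["sudo", "-u", host_user, "env", *assignments, *command]
    raise ValueError(f"unknown mode: {mode!r}")
```

In both modes the process ends when the run finishes. With --rm, Docker also deletes the container, which matches the ephemeral lifecycle described above.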
Agent Discovery
The scheduler scans the project directory for subdirectories containing a SKILL.md file. Each discovered directory becomes an agent, and the directory name is the agent name.
No registration step is needed: add a new agent directory and restart the scheduler, and the new agent is picked up automatically.
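The discovery rule can be expressed as a minimal Python sketch. It assumes that only immediate subdirectories of the project directory are scanned, since the text does not say whether the scan is recursive. The name discover_agents is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def discover_agents(project_dir: str | Path) -> dict[str, Path]:
    """Return {agent name: directory} for each subdirectory holding a SKILL.md."""
    agents: dict[str, Path] = {}
    for child in sorted(Path(project_dir).iterdir()):
        # A directory counts as an agent only if it contains SKILL.md.
        if child.is_dir() and (child / "SKILL.md").is_file():
            agents[child.name] = child  # the directory name is the agent name
    return agents
```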
Cron Scheduling
Agents with a schedule field in their SKILL.md frontmatter are registered as cron jobs. When a cron tick fires:
- If a runner is available, the agent starts immediately
- If all runners are busy, the scheduled run is skipped with a warning (cron runs are not queued)
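The skip-not-queue rule for cron ticks comes down to a single check against the agent's pool. The following is a minimal sketch, with AgentRunners and on_cron_tick as hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentRunners:
    scale: int     # pool size from the agent's SKILL.md frontmatter
    busy: int = 0  # runners currently executing a run


def on_cron_tick(agent: str, runners: AgentRunners, log: list[str]) -> bool:
    """Start a run if a runner is free; otherwise skip. Cron runs are never queued."""
    if runners.busy < runners.scale:
        runners.busy += 1
        log.append(f"started {agent}")
        return True
    log.append(f"warning: skipped scheduled run of {agent}: all runners busy")
    return False
```

One plausible reason for skipping rather than queueing is that a slow run then cannot build up a backlog of stale ticks. The next tick simply tries again.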
Webhook Dispatch
When the gateway receives a webhook:
- Signature verification: the payload is verified against the credential secret for that webhook source.
- Event parsing: the raw payload is parsed into a WebhookContext (source, event, action, repo, etc.).
- Filter matching: the context is matched against each agent’s webhook trigger filters.
- Runner dispatch: for each matching agent, if a runner is available, the agent starts; if all of its runners are busy, the event is queued in the work queue.
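The four steps can be sketched end to end in Python. This sketch assumes a GitHub-style signature of the form sha256=<hex HMAC-SHA256 of the body>; other sources such as Sentry or Linear sign requests differently. WebhookContext here carries only a few fields. The handle_webhook function, the trigger-dict format, and the free-runner counts are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass
class WebhookContext:
    source: str
    event: str
    action: str | None
    repo: str | None


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    # GitHub-style "sha256=<hexdigest>"; compare_digest avoids timing leaks.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())


def parse_event(source: str, event: str, body: bytes) -> WebhookContext:
    payload = json.loads(body)
    repo = (payload.get("repository") or {}).get("full_name")
    return WebhookContext(source, event, payload.get("action"), repo)


def matches(ctx: WebhookContext, trigger: dict) -> bool:
    # A trigger matches when every field it names equals the context's value.
    return all(getattr(ctx, key, None) == value for key, value in trigger.items())


def handle_webhook(secret: bytes, source: str, event: str, body: bytes,
                   signature: str, triggers: dict[str, list[dict]],
                   free_runners: dict[str, int],
                   queue: list[tuple[str, WebhookContext]]) -> list[str]:
    """Verify, parse, match, dispatch. Returns the agents started right away."""
    if not verify_signature(secret, body, signature):
        raise PermissionError("webhook signature mismatch")
    ctx = parse_event(source, event, body)
    started = []
    for agent, agent_triggers in triggers.items():
        if not any(matches(ctx, t) for t in agent_triggers):
            continue
        if free_runners.get(agent, 0) > 0:
            free_runners[agent] -= 1    # runner available: start now
            started.append(agent)
        else:
            queue.append((agent, ctx))  # all runners busy: queue the event
    return started
```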
Runner Pools
Each agent has its own pool of runners. The pool size is controlled by the scale field in SKILL.md frontmatter (default: 1).
- scale = 1: only one instance can run at a time (default)
- scale = N: up to N instances can run concurrently
- scale = 0: the agent is disabled (no runners, no cron, no webhooks)
A separate scale field in config.toml caps the total number of concurrent runners across all agents.
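The interaction between the per-agent scale and the project-wide cap can be sketched as a small counter class. RunnerPools, try_acquire, and release are hypothetical names. A real scheduler would also need locking if runs start from several threads.

```python
from __future__ import annotations


class RunnerPools:
    """Per-agent runner pools plus an optional project-wide cap (sketch)."""

    def __init__(self, scales: dict[str, int], project_cap: int | None = None):
        self.scales = scales            # per-agent scale from SKILL.md
        self.project_cap = project_cap  # scale in config.toml; None = unlimited
        self.busy = {agent: 0 for agent in scales}

    def try_acquire(self, agent: str) -> bool:
        scale = self.scales.get(agent, 1)  # frontmatter default is 1
        if scale == 0:
            return False  # scale = 0 disables the agent entirely
        if self.project_cap is not None and sum(self.busy.values()) >= self.project_cap:
            return False  # project-wide cap reached
        if self.busy.get(agent, 0) >= scale:
            return False  # this agent's pool is full
        self.busy[agent] = self.busy.get(agent, 0) + 1
        return True

    def release(self, agent: str) -> None:
        self.busy[agent] -= 1
```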
Work Queue
When a webhook event or agent call arrives while all runners are busy, it is placed in a work queue. Items are dequeued and executed as runners become available.
- Backed by SQLite (.al/work-queue.db), so it survives scheduler restarts
- Per-agent queues
- Configurable size: workQueueSize (default: 100)
- When full, the oldest items are dropped
- Queue depth is visible in al stat output
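Here is a sketch of a per-agent, size-capped queue built on Python's built-in sqlite3 module. The table layout and the WorkQueue class are assumptions, not the scheduler's actual schema. Passing ".al/work-queue.db" as the path (with the .al directory present) would persist items across restarts, while ":memory:" is convenient for testing.

```python
from __future__ import annotations

import json
import sqlite3


class WorkQueue:
    """Per-agent FIFO queues in one SQLite table; oldest items drop when full."""

    def __init__(self, path: str, max_size: int = 100) -> None:
        self.db = sqlite3.connect(path)
        self.max_size = max_size  # plays the role of workQueueSize
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS work ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "agent TEXT NOT NULL, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, agent: str, item: dict) -> None:
        with self.db:  # one transaction: insert, then trim this agent's queue
            self.db.execute("INSERT INTO work (agent, payload) VALUES (?, ?)",
                            (agent, json.dumps(item)))
            self.db.execute(
                "DELETE FROM work WHERE agent = ? AND id NOT IN "
                "(SELECT id FROM work WHERE agent = ? ORDER BY id DESC LIMIT ?)",
                (agent, agent, self.max_size))

    def dequeue(self, agent: str) -> dict | None:
        with self.db:
            row = self.db.execute(
                "SELECT id, payload FROM work WHERE agent = ? ORDER BY id LIMIT 1",
                (agent,)).fetchone()
            if row is None:
                return None
            self.db.execute("DELETE FROM work WHERE id = ?", (row[0],))
        return json.loads(row[1])

    def depth(self, agent: str) -> int:
        """Number of queued items for one agent (the kind of figure al stat shows)."""
        (count,) = self.db.execute(
            "SELECT COUNT(*) FROM work WHERE agent = ?", (agent,)).fetchone()
        return count
```

Trimming in the same transaction as the insert keeps each agent's queue at or below max_size even if the process dies between the two statements.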
Reruns
When a scheduled agent calls al-rerun, the scheduler immediately starts a new run. This continues until:
- The agent completes without calling al-rerun (no more work)
- The agent hits an error
- The maxReruns limit is reached (default: 10)
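The loop condition can be sketched in Python. Here run_once is a hypothetical stand-in for one agent run that returns True when the agent called al-rerun. The text does not say whether the initial run counts toward maxReruns, so this sketch assumes it does not, allowing up to 1 + max_reruns runs in total.

```python
from __future__ import annotations

from typing import Callable


def run_with_reruns(run_once: Callable[[], bool], max_reruns: int = 10) -> int:
    """Run an agent, rerunning while it asks for more; return the rerun count.

    run_once() returns True if the run called al-rerun and raises on error.
    """
    reruns = 0
    while True:
        try:
            wants_rerun = run_once()
        except Exception:
            return reruns  # the agent hit an error: stop the chain
        if not wants_rerun:
            return reruns  # completed without al-rerun: no more work
        if reruns >= max_reruns:
            return reruns  # maxReruns reached
        reruns += 1        # start the next run immediately
```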
Agent Calls
Agents can call other agents via al-subagent. The scheduler routes the call to the target agent’s runner pool:
- If a runner is available, the called agent starts immediately
- If all runners are busy, the call is queued in the work queue
- Self-calls are rejected
- Call depth is bounded by maxCallDepth (default: 3) to prevent infinite loops
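The following is a minimal routing sketch. The Call record, route_call, and the depth convention are assumptions for illustration. Under that convention, a call made by a top-level run has depth 1, so with the default of 3 a chain A → B → C → D is allowed and one more hop is not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Call:
    caller: str
    target: str
    depth: int  # 1 for a call made directly by a top-level run


class CallRejected(Exception):
    pass


def route_call(call: Call, free_runners: dict[str, int], queue: list[Call],
               max_call_depth: int = 3) -> str:
    """Route an al-subagent call to the target's pool: start, queue, or reject."""
    if call.target == call.caller:
        raise CallRejected("self-calls are rejected")
    if call.depth > max_call_depth:
        raise CallRejected(f"depth {call.depth} exceeds maxCallDepth={max_call_depth}")
    if free_runners.get(call.target, 0) > 0:
        free_runners[call.target] -= 1  # runner free: start immediately
        return "started"
    queue.append(call)                  # all runners busy: queue the call
    return "queued"
```

Only direct self-calls are rejected here; an indirect cycle such as A → B → A is stopped by the depth bound instead.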
Graceful Shutdown
When the scheduler receives a stop signal (al stop or SIGTERM):
- No new runs are started
- All pending work queues are cleared
- In-flight runs continue until they finish
- Once all runs complete, the process exits
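The shutdown sequence maps naturally onto a condition variable. In this sketch, the Scheduler class and its methods are hypothetical, and runner limits are left out. Wiring stop() to SIGTERM and exiting the process after it returns are also omitted. The optional timeout is an addition for testing; the text describes waiting until in-flight runs finish, which corresponds to the default of None.

```python
from __future__ import annotations

import threading


class Scheduler:
    """Tracks in-flight runs and pending work for a graceful stop (sketch)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._stopping = False
        self._in_flight = 0
        self.queues: dict[str, list] = {}  # per-agent pending work

    def try_start(self, agent: str) -> bool:
        with self._cond:
            if self._stopping:
                return False  # no new runs once a stop signal has arrived
            self._in_flight += 1
            return True

    def finish(self, agent: str) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()  # wake stop() so it can re-check

    def stop(self, timeout: float | None = None) -> bool:
        """Refuse new runs, clear queues, wait for in-flight runs to finish.

        Returns True once nothing is running (False if the timeout expires).
        """
        with self._cond:
            self._stopping = True
            self.queues.clear()
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```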
Configuration
| Setting | Location | Default | Description |
|---|---|---|---|
| maxReruns | config.toml | 10 | Max consecutive reruns per agent |
| maxCallDepth | config.toml | 3 | Max depth for agent call chains |
| workQueueSize | config.toml | 100 | Max queued items per agent |
| scale | config.toml | (unlimited) | Project-wide max concurrent runners |
| scale | SKILL.md frontmatter | 1 | Per-agent concurrent runner limit |
| gateway.port | config.toml | 8080 | Gateway HTTP port |
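The config.toml defaults in the table can be captured in a small Python dataclass. This assumes that gateway.port means a port key inside a [gateway] table, which is how TOML dotted keys are defined. It also assumes the input dict comes from a TOML parser such as the standard library's tomllib (Python 3.11+). ProjectConfig and from_toml_dict are hypothetical names. The per-agent scale lives in SKILL.md frontmatter and is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectConfig:
    max_reruns: int = 10        # maxReruns
    max_call_depth: int = 3     # maxCallDepth
    work_queue_size: int = 100  # workQueueSize (per agent)
    scale: int | None = None    # project-wide runner cap; None = unlimited
    gateway_port: int = 8080    # gateway.port

    @classmethod
    def from_toml_dict(cls, data: dict) -> ProjectConfig:
        """Build from a parsed config.toml, falling back to the defaults above."""
        d = cls()
        gateway = data.get("gateway", {})
        return cls(
            max_reruns=data.get("maxReruns", d.max_reruns),
            max_call_depth=data.get("maxCallDepth", d.max_call_depth),
            work_queue_size=data.get("workQueueSize", d.work_queue_size),
            scale=data.get("scale", d.scale),
            gateway_port=gateway.get("port", d.gateway_port),
        )
```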
Troubleshooting
Agent not running on schedule
- Verify that the cron expression in the SKILL.md frontmatter is valid
- Check whether the agent or the scheduler is paused: al stat
- Resume if paused: al resume (scheduler) or al resume <agent>
Agent keeps re-running
An agent that calls al-rerun will re-run immediately, up to maxReruns (default: 10). If it’s re-running more than expected, check the agent’s SKILL.md: it may be calling al-rerun even when there’s no remaining work.
Agent timing out
The default timeout is 900 seconds (15 minutes). Increase it in the project’s config.toml, or per agent in the agent’s own config.toml: