The scheduler is the heart of Action Llama. It discovers agents, fires cron triggers, dispatches webhook events, and manages runner pools.

Architecture

┌──────────────────────────────────────────────┐
│                 Scheduler                    │
│  Discovers agents, fires cron triggers,      │
│  manages runner pool and work queue          │
├──────────────────────────────────────────────┤
│                  Gateway                     │
│  HTTP server: webhooks, resource locks,      │
│  dashboard, agent signals, control API       │
├───────────┬───────────┬──────────────────────┤
│ Container │ Container │   Host-User Process  │
│  Agent A  │  Agent A  │       Agent B        │
│ (run 1)   │ (run 2)   │      (run 1)         │
└───────────┴───────────┴──────────────────────┘
  • Scheduler — discovers agents by scanning for directories with SKILL.md. Registers cron jobs and webhook triggers. Manages a pool of runners per agent (configurable via scale) and a durable work queue for buffering events when runners are busy.
  • Gateway — HTTP server that receives webhooks from external services (GitHub, Sentry, Linear, Mintlify), serves the web dashboard, handles resource locking, and processes agent signals.
  • Agent Processes — each agent run is an isolated process, either a Docker container or a host-user process (via sudo -u), with its own credentials and environment variables. Processes are ephemeral: they start, do their work, and are cleaned up.

Agent Discovery

The scheduler scans the project directory for subdirectories containing a SKILL.md file. Each discovered directory becomes an agent. The directory name is the agent name. No registration step is needed. Add a new agent directory, restart the scheduler, and it picks it up automatically.
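As a hedged sketch of the discovery rule (this is an illustrative model, not the scheduler's actual implementation), any subdirectory containing a SKILL.md becomes an agent named after the directory:

```python
from pathlib import Path

def discover_agents(project_dir):
    """Return agent names: subdirectories that contain a SKILL.md file."""
    return sorted(
        child.name
        for child in Path(project_dir).iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )
```

A directory without a SKILL.md is simply ignored, which is why no registration step is needed.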

Cron Scheduling

Agents with a schedule field in their SKILL.md frontmatter are registered as cron jobs. When a cron tick fires:
  • If a runner is available, the agent starts immediately
  • If all runners are busy, the scheduled run is skipped with a warning (cron runs are not queued)
This means cron is best-effort — if an agent is still running from the previous tick, the new tick is dropped.
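A schedule field in SKILL.md frontmatter might look like this (the exact frontmatter shape and the cron expression shown are illustrative assumptions):

```
---
schedule: "*/15 * * * *"   # every 15 minutes
---
```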

Webhook Dispatch

When the gateway receives a webhook:
  1. Signature verification — the payload is verified against the credential secret for that webhook source
  2. Event parsing — the raw payload is parsed into a WebhookContext (source, event, action, repo, etc.)
  3. Filter matching — the context is matched against each agent’s webhook trigger filters
  4. Runner dispatch — if a runner is available, the agent starts. If all runners are busy, the event is queued in the work queue.
Unlike cron, webhook events are queued (not dropped) when runners are busy.
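The contrast between the two trigger types can be sketched as follows (a minimal model; the names and return values are illustrative, not the scheduler's API):

```python
from collections import deque

def dispatch(event_kind, runners_busy, queue):
    """Model the dispatch decision: webhook events are queued when
    runners are busy, while cron ticks are dropped (best-effort)."""
    if not runners_busy:
        return "started"
    if event_kind == "webhook":
        queue.append(event_kind)  # buffered in the work queue
        return "queued"
    return "dropped"  # cron tick skipped with a warning

q = deque()
dispatch("webhook", runners_busy=False, queue=q)  # "started"
dispatch("webhook", runners_busy=True, queue=q)   # "queued"
dispatch("cron", runners_busy=True, queue=q)      # "dropped"
```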

Runner Pools

Each agent has its own pool of runners. The pool size is controlled by the scale field in SKILL.md frontmatter (default: 1).
  • scale = 1 — only one instance can run at a time (default)
  • scale = N — up to N instances can run concurrently
  • scale = 0 — agent is disabled (no runners, no cron, no webhooks)
The project-wide scale field in config.toml sets a cap on total concurrent runners across all agents.
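For example (the frontmatter syntax is a sketch of the fields described above, following the same two-file layout the configuration examples below use):

```
# agents/<name>/SKILL.md frontmatter — per-agent pool size
scale: 3        # up to 3 concurrent runs of this agent

# config.toml — project-wide cap across all agents
scale = 6
```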

Work Queue

When a webhook event or agent call arrives but all runners are busy, the event is placed in a work queue. Items are dequeued and executed as runners become available.
  • Backed by SQLite (.al/work-queue.db) — survives scheduler restarts
  • Per-agent queues
  • Configurable size: workQueueSize (default: 100)
  • When full, oldest items are dropped
  • Queue depth visible in al stat output
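The drop-oldest overflow policy can be modeled in a few lines (in-memory only; the real queue is SQLite-backed and survives restarts, this deque mirrors just the eviction behavior):

```python
from collections import deque

work_queue = deque(maxlen=3)   # stand-in for workQueueSize = 3

for event in ["e1", "e2", "e3", "e4"]:
    work_queue.append(event)   # appending to a full deque evicts the oldest

print(list(work_queue))        # ['e2', 'e3', 'e4']; e1 was dropped
```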

Reruns

When a scheduled agent calls al-rerun, the scheduler immediately starts a new run. This continues until:
  • The agent completes without calling al-rerun (no more work)
  • The agent hits an error
  • The maxReruns limit is reached (default: 10)
This lets agents drain their work queue efficiently without waiting for the next cron tick. Only scheduled runs can rerun; webhook and call runs cannot.
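The rerun loop amounts to the following (a hedged sketch: run_agent and its "rerun"/"done" return convention are stand-ins, not the real agent interface):

```python
def drain(run_agent, max_reruns=10):
    """One scheduled run plus up to max_reruns immediate reruns."""
    total_runs = 0
    wants_rerun = True
    while wants_rerun and total_runs <= max_reruns:
        wants_rerun = (run_agent() == "rerun")  # did it call al-rerun?
        total_runs += 1
    return total_runs
```

An agent that stops requesting reruns ends the loop naturally; one that always requests them is cut off at the cap.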

Agent Calls

Agents can call other agents via al-subagent. The scheduler routes the call to the target agent’s runner pool:
  • If a runner is available, the called agent starts immediately
  • If all runners are busy, the call is queued in the work queue
  • Self-calls are rejected
  • Call depth is bounded by maxCallDepth (default: 3) to prevent infinite loops
See Subagents for a guide on agent-to-agent workflows.
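The routing checks above can be sketched like this (names and exceptions are illustrative, not the scheduler's actual code):

```python
MAX_CALL_DEPTH = 3  # mirrors the maxCallDepth default

def route_call(caller, target, depth):
    """Validate an agent call and return the depth the child runs at."""
    if caller == target:
        raise ValueError("self-calls are rejected")
    if depth >= MAX_CALL_DEPTH:
        raise RuntimeError("call depth limit reached (possible loop)")
    return depth + 1

# A chain a -> b -> c runs at depths 0, 1, 2; one more hop is refused.
```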

Graceful Shutdown

When the scheduler receives a stop signal (al stop or SIGTERM):
  1. No new runs are started
  2. All pending work queues are cleared
  3. In-flight runs continue until they finish
  4. Once all runs complete, the process exits
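As a toy model of that ordering (the real scheduler coordinates OS processes; this only illustrates the sequence, and step 1, refusing new runs, is assumed to be the caller's responsibility):

```python
from collections import deque

def shutdown(work_queue, in_flight):
    """Drop pending work, let in-flight runs finish, then return."""
    work_queue.clear()                       # 2. pending queues cleared
    results = [run() for run in in_flight]   # 3. in-flight runs finish
    in_flight.clear()
    return results                           # 4. process can now exit
```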

Configuration

| Setting | Location | Default | Description |
| --- | --- | --- | --- |
| maxReruns | config.toml | 10 | Max consecutive reruns per agent |
| maxCallDepth | config.toml | 3 | Max depth for agent call chains |
| workQueueSize | config.toml | 100 | Max queued items per agent |
| scale | config.toml | (unlimited) | Project-wide max concurrent runners |
| scale | SKILL.md frontmatter | 1 | Per-agent concurrent runner limit |
| gateway.port | config.toml | 8080 | Gateway HTTP port |
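Assuming standard TOML, the config.toml settings above might be written together like this (the grouping is illustrative; `gateway.port` is shown as a dotted key):

```toml
# config.toml — sketch of the settings above, shown at their defaults
maxReruns = 10        # max consecutive reruns per agent
maxCallDepth = 3      # max depth for agent call chains
workQueueSize = 100   # max queued items per agent
gateway.port = 8080   # gateway HTTP port
```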

Troubleshooting

Agent not running on schedule

  • Verify the cron expression in SKILL.md frontmatter is valid
  • Check if the agent or scheduler is paused: al stat
  • Resume if paused: al resume (scheduler) or al resume <agent>

Agent keeps re-running

An agent that calls al-rerun will re-run immediately, up to maxReruns (default: 10). If it’s re-running more than expected, check the agent’s SKILL.md — it may be calling al-rerun even when there’s no remaining work.
```toml
# config.toml — lower the limit if needed
maxReruns = 5
```

Agent timing out

Default timeout is 900 seconds (15 minutes). Increase it in the project’s config.toml or per-agent in the agent’s config.toml:
```toml
# config.toml — project-wide default
[local]
timeout = 3600    # 1 hour
```

```toml
# agents/<name>/config.toml — per-agent override
timeout = 7200    # 2 hours
```