Scheduler - Action Llama

The scheduler is the heart of Action Llama. It discovers agents, fires cron triggers, dispatches webhook events, and manages runner pools.

Architecture

┌──────────────────────────────────────────────┐
│                 Scheduler                    │
│  Discovers agents, fires cron triggers,      │
│  manages runner pool and work queue          │
├──────────────────────────────────────────────┤
│                  Gateway                     │
│  HTTP server: webhooks, resource locks,      │
│  dashboard, agent signals, control API       │
├───────────┬───────────┬──────────────────────┤
│ Container │ Container │   Host-User Process  │
│  Agent A  │  Agent A  │       Agent B        │
│ (run 1)   │ (run 2)   │      (run 1)         │
└───────────┴───────────┴──────────────────────┘

Scheduler: discovers agents by scanning for directories with SKILL.md, registers cron jobs and webhook triggers, and manages a pool of runners per agent (configurable via scale) plus a durable work queue that buffers events when runners are busy.
Gateway: HTTP server that receives webhooks from external services (GitHub, Sentry, Linear, Mintlify), serves the web dashboard, handles resource locking, and processes agent signals.
Agent Processes: each agent run is an isolated process, either a Docker container or a host-user process (via sudo -u), with its own credentials and environment variables. Processes are ephemeral: they start, do their work, and are cleaned up.

Agent Discovery

The scheduler scans the project directory for subdirectories containing a SKILL.md file. Each discovered directory becomes an agent, and the directory name is the agent name. No registration step is needed: add a new agent directory, restart the scheduler, and the new agent is picked up on startup.
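The discovery rule can be sketched in a few lines of Python. This is an illustration, not the scheduler's actual code: the function name discover_agents is hypothetical, and the sketch assumes a single-level scan of whatever directory the caller passes in.

```python
from pathlib import Path


def discover_agents(project_dir: str) -> list[str]:
    """Return the names of immediate subdirectories that contain a SKILL.md file.

    Assumption: only one directory level is scanned. The directory name is
    used as the agent name, as described above.
    """
    root = Path(project_dir)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )
```

A directory without SKILL.md is simply ignored, which is why adding or removing an agent needs no registration step.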

Cron Scheduling

Agents with a schedule field in their SKILL.md frontmatter are registered as cron jobs. When a cron tick fires:

If a runner is available, the agent starts immediately
If all runners are busy, the scheduled run is skipped with a warning (cron runs are not queued)

This means cron is best-effort: if every runner for the agent is still busy with earlier runs when a tick fires, that tick is dropped and the agent waits for the next one.
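A minimal sketch of this skip-instead-of-queue behavior, assuming a simple counter-based pool. The names RunnerPool and on_cron_tick are hypothetical and only illustrate the rule described above.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("scheduler-sketch")


@dataclass
class RunnerPool:
    scale: int      # pool size, taken from the agent's `scale` field
    busy: int = 0   # runners currently executing a run

    def try_acquire(self) -> bool:
        """Claim a runner if one is free."""
        if self.busy < self.scale:
            self.busy += 1
            return True
        return False

    def release(self) -> None:
        """Return a runner to the pool when its run ends."""
        self.busy = max(0, self.busy - 1)


def on_cron_tick(agent: str, pool: RunnerPool) -> bool:
    """Start a run if a runner is free; otherwise skip the tick (never queue it)."""
    if pool.try_acquire():
        return True
    log.warning("cron tick for %s skipped: all %d runners busy", agent, pool.scale)
    return False
```

With scale = 1, a second tick that fires while the first run is still active returns False and is not retried until the next tick.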

Webhook Dispatch

When the gateway receives a webhook:

Signature verification — the payload is verified against the credential secret for that webhook source
Event parsing — the raw payload is parsed into a WebhookContext (source, event, action, repo, etc.)
Filter matching — the context is matched against each agent’s webhook trigger filters
Runner dispatch — if a runner is available, the agent starts. If all runners are busy, the event is queued in the work queue.

Unlike cron, webhook events are queued (not dropped) when runners are busy.
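The four steps above can be sketched as follows. Several details are assumptions made for illustration: the signature is taken to be a hex-encoded HMAC-SHA256 of the raw body (real providers differ in scheme and header format), the payload is treated as an already-normalized context, filters are plain equality checks, and the helper names are hypothetical.

```python
import hashlib
import hmac
import json
from collections import deque


def new_agent(filters: dict, scale: int = 1) -> dict:
    """Hypothetical in-memory record for one agent's trigger filters and pool."""
    return {"filters": filters, "scale": scale, "busy": 0, "queue": deque()}


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Assumed scheme: hex HMAC-SHA256 over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def matches(context: dict, filters: dict) -> bool:
    """True when every filter key equals the corresponding context value."""
    return all(context.get(key) == value for key, value in filters.items())


def dispatch(body: bytes, signature_hex: str, secret: bytes, agents: dict) -> list[str]:
    """Verify, parse, match, then start or queue. Returns one outcome per matching agent."""
    if not verify_signature(secret, body, signature_hex):
        return []                      # reject unverified payloads
    context = json.loads(body)         # stand-in for building a WebhookContext
    outcomes = []
    for name, agent in agents.items():
        if not matches(context, agent["filters"]):
            continue
        if agent["busy"] < agent["scale"]:
            agent["busy"] += 1
            outcomes.append(f"{name}:started")
        else:
            agent["queue"].append(context)   # queued, not dropped
            outcomes.append(f"{name}:queued")
    return outcomes
```

The contrast with cron is in the else branch: a busy pool leads to an append on the agent's queue instead of a skipped run.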

Runner Pools

Each agent has its own pool of runners. The pool size is controlled by the scale field in SKILL.md frontmatter (default: 1).

scale = 1 — only one instance can run at a time (default)
scale = N — up to N instances can run concurrently
scale = 0 — agent is disabled (no runners, no cron, no webhooks)

The project-wide scale field in config.toml sets a cap on total concurrent runners across all agents.
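The interaction between per-agent scale and the project-wide cap might look like the sketch below. The class name PoolManager and the exact order of the two checks are assumptions, since the text above only states that both limits exist.

```python
from typing import Optional


class PoolManager:
    """Tracks running instances per agent and enforces both limits."""

    def __init__(self, agent_scales: dict, project_scale: Optional[int] = None):
        self.scales = agent_scales            # agent name -> per-agent `scale`
        self.project_scale = project_scale    # None models "unlimited"
        self.running = {name: 0 for name in agent_scales}

    def try_start(self, agent: str) -> bool:
        """Claim a runner only if both the agent's pool and the project cap allow it."""
        if self.running[agent] >= self.scales[agent]:
            return False   # pool exhausted; always the case when scale = 0
        if self.project_scale is not None and sum(self.running.values()) >= self.project_scale:
            return False   # project-wide cap reached
        self.running[agent] += 1
        return True

    def finish(self, agent: str) -> None:
        """Release a runner when a run ends."""
        self.running[agent] = max(0, self.running[agent] - 1)
```

In this sketch an agent with scale = 0 can never start, which matches the "disabled" behavior, and an agent with free runners can still be refused while the project as a whole is at its cap.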

Work Queue

When a webhook event or agent call arrives but all runners are busy, the event is placed in a work queue. Items are dequeued and executed as runners become available.

Backed by SQLite (.al/work-queue.db) — survives scheduler restarts
Per-agent queues
Configurable size: workQueueSize (default: 100)
When full, oldest items are dropped
Queue depth visible in al stat output
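A bounded, per-agent, SQLite-backed queue with drop-oldest behavior can be sketched with the standard sqlite3 module. The table schema, class name, and method names are hypothetical; the real layout of .al/work-queue.db is not described here.

```python
import json
import sqlite3


class WorkQueue:
    """Per-agent FIFO stored in SQLite; the oldest items are dropped when full."""

    def __init__(self, path: str = ":memory:", max_size: int = 100):
        self.db = sqlite3.connect(path)   # pass a file path to survive restarts
        self.max_size = max_size          # plays the role of workQueueSize
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "agent TEXT NOT NULL, payload TEXT NOT NULL)"
        )

    def depth(self, agent: str) -> int:
        row = self.db.execute(
            "SELECT COUNT(*) FROM items WHERE agent = ?", (agent,)
        ).fetchone()
        return row[0]

    def enqueue(self, agent: str, payload: dict) -> None:
        with self.db:  # one transaction: insert, then trim overflow
            self.db.execute(
                "INSERT INTO items (agent, payload) VALUES (?, ?)",
                (agent, json.dumps(payload)),
            )
            overflow = self.depth(agent) - self.max_size
            if overflow > 0:
                self.db.execute(
                    "DELETE FROM items WHERE id IN ("
                    "SELECT id FROM items WHERE agent = ? ORDER BY id LIMIT ?)",
                    (agent, overflow),
                )

    def dequeue(self, agent: str):
        """Pop the oldest item for an agent, or return None if its queue is empty."""
        with self.db:
            row = self.db.execute(
                "SELECT id, payload FROM items WHERE agent = ? ORDER BY id LIMIT 1",
                (agent,),
            ).fetchone()
            if row is None:
                return None
            self.db.execute("DELETE FROM items WHERE id = ?", (row[0],))
            return json.loads(row[1])
```

Because each agent's rows are counted separately, one busy agent filling its queue does not evict items that belong to another agent.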

Reruns

The scheduler automatically handles reruns for scheduled agents. When a scheduled run completes successfully, the scheduler can immediately start a new run. This continues until:

The agent completes with no remaining work
The agent hits an error
The maxReruns limit is reached (default: 10)

This lets agents drain their own backlog of pending work efficiently without waiting for the next cron tick. Only scheduled runs can rerun; webhook and call runs do not.
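The rerun loop and its three stop conditions can be sketched as below. How an agent reports remaining work is not specified here, so the string outcomes "more_work", "done", and "error" are hypothetical, as is the assumption that the initial scheduled run does not count toward maxReruns.

```python
from typing import Callable


def run_with_reruns(run_once: Callable[[], str], max_reruns: int = 10) -> int:
    """Execute a scheduled run, then rerun while the agent reports more work.

    Returns the number of reruns performed (the initial run is not counted).
    `run_once` returns a hypothetical outcome string:
    "more_work", "done", or "error".
    """
    outcome = run_once()   # the run triggered by the cron tick
    reruns = 0
    # Stop on "done", on "error", or once the rerun limit is reached.
    while outcome == "more_work" and reruns < max_reruns:
        outcome = run_once()
        reruns += 1
    return reruns
```

Under these assumptions an agent that always reports more work executes max_reruns + 1 times in a row and then waits for the next cron tick.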

Agent Calls

Agents can call other agents via the call_agent tool. The scheduler routes the call to the target agent’s runner pool:

If a runner is available, the called agent starts immediately
If all runners are busy, the call is queued in the work queue
Self-calls are rejected
Call depth is bounded by maxCallDepth (default: 3) to prevent infinite loops

See Subagents for a guide on agent-to-agent workflows.
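The routing rules for agent calls can be sketched as a single function. The names route_call and CallRejected are hypothetical, and the depth convention (a top-level run has depth 0, each call adds 1, and a call is rejected when it would exceed maxCallDepth) is an assumption.

```python
from collections import deque


class CallRejected(Exception):
    """Raised when a call violates the self-call or depth rule."""


def new_pool(scale: int = 1) -> dict:
    """Hypothetical in-memory record for a target agent's pool and queue."""
    return {"scale": scale, "busy": 0, "queue": deque()}


def route_call(caller: str, target: str, caller_depth: int,
               agents: dict, max_call_depth: int = 3) -> str:
    """Route a call to the target's runner pool. Returns "started" or "queued"."""
    if caller == target:
        raise CallRejected("self-calls are rejected")
    new_depth = caller_depth + 1
    if new_depth > max_call_depth:
        raise CallRejected(f"call depth {new_depth} exceeds maxCallDepth={max_call_depth}")
    pool = agents[target]
    if pool["busy"] < pool["scale"]:
        pool["busy"] += 1
        return "started"
    # All runners busy: buffer the call until one becomes free.
    pool["queue"].append({"caller": caller, "depth": new_depth})
    return "queued"
```

Carrying the depth along with each queued call is what lets the limit hold across a chain such as A calls B calls C.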

Graceful Shutdown

When the scheduler receives a stop signal (al stop or SIGTERM):

No new runs are started
All pending work queues are cleared
In-flight runs continue until they finish
Once all runs complete, the process exits
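The shutdown sequence above amounts to a small state machine. The sketch below models it in memory; the class and method names are hypothetical, and a real scheduler would trigger request_stop from a SIGTERM handler or the al stop command.

```python
class SchedulerState:
    """In-memory model of the stop sequence: refuse new runs, clear queues, drain, exit."""

    def __init__(self):
        self.accepting = True    # False once a stop has been requested
        self.queues = {}         # agent name -> list of pending items
        self.in_flight = set()   # ids of runs currently executing
        self.exited = False

    def start_run(self, run_id: str) -> bool:
        """Start a run unless a stop has been requested."""
        if not self.accepting:
            return False
        self.in_flight.add(run_id)
        return True

    def request_stop(self) -> None:
        """Stop accepting runs and clear pending work; in-flight runs keep going."""
        self.accepting = False
        for pending in self.queues.values():
            pending.clear()
        self._maybe_exit()

    def finish_run(self, run_id: str) -> None:
        self.in_flight.discard(run_id)
        self._maybe_exit()

    def _maybe_exit(self) -> None:
        # Exit only after a stop was requested and the last run has finished.
        if not self.accepting and not self.in_flight:
            self.exited = True
```

Note that queued items are discarded at stop time in this model, so only work that had already started is guaranteed to complete.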

Configuration

Setting	Location	Default	Description
`maxReruns`	`config.toml`	`10`	Max consecutive reruns per agent
`maxCallDepth`	`config.toml`	`3`	Max depth for agent call chains
`workQueueSize`	`config.toml`	`100`	Max queued items per agent
`scale`	`config.toml`	(unlimited)	Project-wide max concurrent runners
`defaultWaitTimeout`	`config.toml`	`1800`	Default timeout for `wait_for_trigger` (seconds)
`scale`	`SKILL.md` frontmatter	`1`	Per-agent concurrent runner limit
`gateway.port`	`config.toml`	`8080`	Gateway HTTP port

Troubleshooting

Agent not running on schedule

Verify the cron expression in SKILL.md frontmatter is valid
Check if the agent or scheduler is paused: al stat
Resume if paused: al resume (scheduler) or al resume <agent>
Check the agent's scale and the scheduler log: scale = 0 disables cron for the agent, and a tick is skipped with a warning when all of its runners are busy

Agent keeps re-running

The scheduler automatically re-runs scheduled agents up to maxReruns times (default: 10). If an agent is re-running more than expected, check how each run ends: the scheduler keeps re-running it while each run succeeds and still has work remaining, and stops on an error, on a run with no remaining work, or at the maxReruns limit.

# config.toml — lower the limit if needed
maxReruns = 5

Agent timing out

Default timeout is 900 seconds (15 minutes). Increase it in the project’s config.toml or per-agent in the agent’s config.toml:

# config.toml — project-wide default
[local]
timeout = 3600    # 1 hour

# agents/<name>/config.toml — per-agent override
timeout = 7200    # 2 hours
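The lookup order implied by the two snippets above can be sketched as a small function. It assumes the per-agent value takes precedence over the project-wide [local] value, which in turn overrides the 900-second default; effective_timeout is a hypothetical helper operating on already-parsed config dictionaries.

```python
def effective_timeout(project_cfg: dict, agent_cfg: dict, default: int = 900) -> int:
    """Resolve the timeout in seconds: agent override, then project [local], then default."""
    if "timeout" in agent_cfg:
        return agent_cfg["timeout"]                         # agents/<name>/config.toml
    return project_cfg.get("local", {}).get("timeout", default)  # config.toml [local]
```

With the values shown above, an agent with its own override runs for up to 7200 seconds, while every other agent in the project gets 3600.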
