By default, each agent runs one instance at a time. This guide shows how to scale up and use resource locks to prevent duplicate work.

The Problem

With scale = 1, a single agent instance handles all work sequentially. If 5 GitHub issues arrive via webhook while the agent is working on one, those 5 events queue up and wait. For high-volume workloads, this creates a bottleneck.

Increase Scale

In the agent’s config.toml:
# agents/dev/config.toml
scale = 3    # Run up to 3 instances concurrently
Now when 5 issues arrive, up to 3 are processed simultaneously. The remaining 2 wait in the work queue.

Add Locking

With multiple instances running, two of them might pick up the same issue. Add a lock/skip/work/unlock pattern to your SKILL.md:
## Workflow

1. List open issues labeled "agent" in repos from `<agent-config>`
2. For each issue:
   - rlock "github://owner/repo/issues/123"
   - If the lock fails, skip this issue — another instance is handling it
   - Clone the repo, create a branch, implement the fix
   - Open a PR and link it to the issue
   - runlock "github://owner/repo/issues/123"
3. If you completed work and there may be more issues, run `al-rerun`
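The workflow above can be sketched as a self-contained simulation. The in-memory `locks` dict and the `rlock`/`runlock` functions here are stand-ins for the real commands, and appending to `completed` stands in for the clone/branch/fix/PR steps:

```python
# Illustrative sketch of the lock/skip/work/unlock pattern.
# `locks` is an in-memory stand-in for the real lock service.

locks = {}  # resource URI -> instance id currently holding the lock

def rlock(resource, instance):
    """Try to acquire the lock; mirrors the ok/holder response shape."""
    if resource in locks:
        return {"ok": False, "holder": locks[resource]}
    locks[resource] = instance
    return {"ok": True}

def runlock(resource, instance):
    """Release the lock if this instance holds it."""
    if locks.get(resource) == instance:
        del locks[resource]

def process_issues(issues, instance):
    completed = []
    for issue in issues:
        resource = f"github://owner/repo/issues/{issue}"
        if not rlock(resource, instance)["ok"]:
            continue  # another instance is handling it -- skip
        completed.append(issue)  # clone, branch, implement, open PR...
        runlock(resource, instance)
    return completed
```

If two instances run this loop over the same issue list, each issue is completed exactly once: whichever instance acquires the lock first does the work, and the other skips.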

How lock commands work

When the agent runs rlock "github://owner/repo/issues/123":
  • Lock acquired: {"ok": true} — proceed with work
  • Already held: {"ok": false, "holder": "dev-abc123", ...} — skip this resource
When done: runlock "github://owner/repo/issues/123" releases the lock. If the agent crashes or times out, locks are auto-released.
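In a script, the JSON responses above can be checked before proceeding. A minimal sketch, using the two response shapes documented here:

```python
import json

def should_proceed(rlock_output: str) -> bool:
    """Return True when rlock reports the lock was acquired."""
    return json.loads(rlock_output).get("ok", False)

# The two responses documented above:
assert should_proceed('{"ok": true}') is True
assert should_proceed('{"ok": false, "holder": "dev-abc123"}') is False
```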

Monitor with al stat

Check queue depth and running instances:
al stat
al stat -E production
The queue column shows how many events are waiting. If it’s consistently high, consider increasing scale.

Resource Considerations

Each parallel instance:
  • Uses a separate Docker container
  • Consumes memory (local.memory per container, default 4GB)
  • Consumes CPU (local.cpus per container, default 2)
  • Makes independent LLM API calls (watch your rate limits and quota)
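With the documented defaults, the footprint scales linearly with instance count. A quick back-of-envelope check, using scale = 3 from this guide:

```python
# Back-of-envelope resource math for scale = 3 with the
# documented defaults (4 GB memory, 2 CPUs per container).
scale = 3
memory_gb_per_container = 4   # local.memory default
cpus_per_container = 2        # local.cpus default

total_memory_gb = scale * memory_gb_per_container   # 12 GB
total_cpus = scale * cpus_per_container             # 6 CPUs
print(total_memory_gb, total_cpus)
```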

Tune work queue size

If events arrive faster than agents can process them, the queue buffers them:
# config.toml
workQueueSize = 200    # default: 100 per agent
When the queue is full, the oldest items are dropped.
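The drop-oldest behavior can be modeled with a bounded deque. This is illustrative only; the real queue is internal to the runtime:

```python
from collections import deque

# A bounded queue that evicts the oldest item when full,
# mirroring the documented drop-oldest behavior.
queue = deque(maxlen=3)  # stand-in for workQueueSize = 3
for event in ["e1", "e2", "e3", "e4", "e5"]:
    queue.append(event)  # when full, the oldest item is evicted

print(list(queue))  # e1 and e2 were dropped
```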

Default agent scale

Set the default scale for all agents that don’t have an explicit scale in their config.toml:
# config.toml
defaultAgentScale = 3    # each agent gets 3 runners unless overridden
Without this setting, agents default to 1 runner each.

Project-wide scale cap

Limit total concurrent runners across all agents:
# config.toml
scale = 10    # max 10 runners total across all agents
If defaultAgentScale * agentCount exceeds scale, agents are throttled at startup and a warning is shown.
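The startup check described above amounts to the following sketch; the exact throttling behavior is up to the runtime, and the agent count here is a made-up example:

```python
# Sketch of the startup capacity check described above.
default_agent_scale = 3
agent_count = 4              # hypothetical: four agents in the project
project_scale_cap = 10       # the project-wide `scale` setting

requested = default_agent_scale * agent_count  # 12 runners requested
if requested > project_scale_cap:
    # agents are throttled at startup and a warning is shown
    print(f"warning: {requested} runners requested, "
          f"capped at {project_scale_cap}")
```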

Example Configuration

Agent runtime config in agents/dev/config.toml:
credentials = ["github_token", "git_ssh"]
schedule = "*/5 * * * *"
models = ["sonnet"]
scale = 3

[[webhooks]]
source = "my-github"
events = ["issues"]
actions = ["labeled"]
labels = ["agent"]

[params]
repos = ["acme/app", "acme/api"]
triggerLabel = "agent"
