Action Llama supports 8 LLM providers. Define named models in config.toml under [models.<name>], then reference them by name in each agent’s config.toml. Agents list models in priority order — the first is the primary, the rest are fallbacks tried automatically on rate limits.
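A minimal sketch of that split (the model name `sonnet` and the agent path are illustrative):

```toml
# config.toml (project root): define a named model
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
authType = "api_key"
```

An agent then selects it in its own config.toml, e.g. `models = ["sonnet"]`, adding further names after it as fallbacks.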

[models.<name>] Fields

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (see table below) |
| model | string | Yes | Model ID |
| authType | string | Yes | `"api_key"`, `"oauth_token"`, or `"pi_auth"` |
| thinkingLevel | string | No | Reasoning budget (Anthropic only) |

Providers

Anthropic

Claude models with optional extended thinking.
```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"
```
| Model | Description |
|---|---|
| claude-opus-4-20250514 | Most capable, best for complex multi-step tasks |
| claude-sonnet-4-20250514 | Balanced performance and cost (recommended) |
| claude-haiku-3-5-20241022 | Fastest and cheapest |

Credential: `anthropic_key` (field: `token`)

Auth types:
| authType | Token format | Description |
|---|---|---|
| api_key | sk-ant-api-... | Standard Anthropic API key |
| oauth_token | sk-ant-oat-... | OAuth token from `claude setup-token` |
| pi_auth | (none) | Uses existing pi auth credentials (~/.pi/agent/auth.json). No credential file needed. |
Note: pi_auth is not supported in Docker mode. Switch to api_key or oauth_token for containerized runs.

Thinking level: Anthropic is the only provider that supports thinkingLevel. Valid values:
| Level | Description |
|---|---|
| off | No extended thinking |
| minimal | Minimal reasoning |
| low | Light reasoning |
| medium | Balanced (recommended) |
| high | Deep reasoning |
| xhigh | Maximum reasoning budget |
If omitted, thinking is not explicitly configured. For other providers, thinkingLevel is ignored.
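For instance, an entry combining the OAuth token flow with a deeper thinking budget (the model name `opus` is illustrative; provider, model ID, and field values are taken from the tables above):

```toml
[models.opus]
provider = "anthropic"
model = "claude-opus-4-20250514"
authType = "oauth_token"   # sk-ant-oat-... token from `claude setup-token`
thinkingLevel = "high"     # deep reasoning for complex multi-step tasks
```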

OpenAI

```toml
[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"
```
| Model | Description |
|---|---|
| gpt-4o | Flagship multimodal model (recommended) |
| gpt-4o-mini | Smaller, faster, cheaper |
| gpt-4-turbo | Previous generation |
| o1-preview | Reasoning model |
| o1-mini | Smaller reasoning model |

Credential: `openai_key` (field: `token`)

Groq

```toml
[models.groq-llama]
provider = "groq"
model = "llama-3.3-70b-versatile"
authType = "api_key"
```
| Model | Description |
|---|---|
| llama-3.3-70b-versatile | Llama 3.3 70B on Groq inference |

Groq runs open-source models at high speed. Check Groq’s docs for the full list of available model IDs.

Credential: `groq_key` (field: `token`)

Google Gemini

```toml
[models.gemini]
provider = "google"
model = "gemini-2.0-flash-exp"
authType = "api_key"
```
| Model | Description |
|---|---|
| gemini-2.0-flash-exp | Fast experimental model |

Check Google AI Studio for the full list of available model IDs.

Credential: `google_key` (field: `token`)

xAI

```toml
[models.grok]
provider = "xai"
model = "grok-beta"
authType = "api_key"
```
| Model | Description |
|---|---|
| grok-beta | Grok beta |

Credential: `xai_key` (field: `token`)

Mistral

```toml
[models.mistral]
provider = "mistral"
model = "mistral-large-2411"
authType = "api_key"
```
| Model | Description |
|---|---|
| mistral-large-2411 | Mistral Large (November 2024) |

Check Mistral’s docs for the full list of available model IDs.

Credential: `mistral_key` (field: `token`)

OpenRouter

OpenRouter provides access to models from many providers through a single API.
```toml
[models.or-sonnet]
provider = "openrouter"
model = "anthropic/claude-3.5-sonnet"
authType = "api_key"
```
Model IDs use the provider/model format. See OpenRouter’s model list for all available models.

Credential: `openrouter_key` (field: `token`)

Custom

For any provider not listed above. The model ID and API routing are handled by the underlying pi.dev agent harness.
```toml
[models.my-model]
provider = "custom"
model = "your-model-name"
authType = "api_key"
```
Credential: `custom_key` (field: `token`)

Mixing Models

Each agent can use a different model. Define all models in the project’s config.toml, then reference them by name in each agent’s config.toml:
```
config.toml                 → [models.sonnet], [models.gpt4o], [models.groq-llama]
agents/dev/config.toml      → models = ["sonnet"]
agents/reviewer/config.toml → models = ["gpt4o"]
agents/devops/config.toml   → models = ["groq-llama"]
```
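The project-level config.toml backing the layout above could look like this (provider/model pairs taken from the provider sections):

```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
authType = "api_key"

[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"

[models.groq-llama]
provider = "groq"
model = "llama-3.3-70b-versatile"
authType = "api_key"
```

Each agent needs only the credential file for the provider behind its own model (here `anthropic_key`, `openai_key`, and `groq_key` respectively).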

Model Fallback

Agents can list multiple models to create a fallback chain. When the primary model is rate-limited or unavailable, Action Llama automatically tries the next model in the list:
```toml
# agents/<name>/config.toml
models = ["sonnet", "haiku", "gpt4o"]
```
Fallback switching is instant: there is no delay when moving to the next model. Exponential backoff only kicks in after all models in the chain have been exhausted.

A circuit breaker tracks model availability in memory. When a model returns a rate limit or overload error, it is marked unavailable for 60 seconds. After the cooldown expires, the model is retried.

Credential Setup

Each provider requires a corresponding credential in ~/.action-llama/credentials/. Run al doctor to configure them interactively.

LLM credentials are loaded automatically based on the models referenced in the agent’s models list in config.toml; they do not need to be listed in the agent’s credentials array. The credentials array is for runtime credentials the agent uses during execution (GitHub tokens, SSH keys, etc.). See Credentials for the full credential reference.
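The on-disk format of a credential file is whatever `al doctor` writes; assuming a JSON file keyed by the documented field name, a sketch of an `anthropic_key` credential might look like:

```json
{
  "token": "sk-ant-api-..."
}
```

This is an assumption for illustration only; run `al doctor` rather than creating credential files by hand.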