Action Llama supports 8 LLM providers. Define named models in config.toml under [models.<name>], then reference them by name in each agent’s config.toml. Agents list models in priority order — the first is the primary, the rest are fallbacks tried automatically on rate limits.
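A minimal sketch of that split (the model name `sonnet` and the agent path are illustrative):

```toml
# config.toml (project root): define a named model
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
authType = "api_key"
```

An agent then selects it in its own config.toml, e.g. `models = ["sonnet"]`, adding further names after it as fallbacks.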

[models.<name>] Fields

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | Provider name (see table below) |
| model | string | Yes | Model ID |
| authType | string | Yes | `"api_key"`, `"oauth_token"`, or `"pi_auth"` |
| thinkingLevel | string | No | Reasoning budget (Anthropic only) |

Providers

Anthropic

Claude models with optional extended thinking.
```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"
```
| Model | Description |
|---|---|
| claude-opus-4-20250514 | Most capable, best for complex multi-step tasks |
| claude-sonnet-4-20250514 | Balanced performance and cost (recommended) |
| claude-haiku-3-5-20241022 | Fastest and cheapest |

Credential: `anthropic_key` (field: `token`)

Auth types:
| authType | Token format | Description |
|---|---|---|
| api_key | sk-ant-api-... | Standard Anthropic API key |
| oauth_token | sk-ant-oat-... | OAuth token from `claude setup-token` |
| pi_auth | (none) | Uses existing pi auth credentials (~/.pi/agent/auth.json). No credential file needed. |
Note: pi_auth is not supported in Docker mode. Switch to api_key or oauth_token for containerized runs.

Thinking level: Anthropic is the only provider that supports thinkingLevel. Valid values:
| Level | Description |
|---|---|
| off | No extended thinking |
| minimal | Minimal reasoning |
| low | Light reasoning |
| medium | Balanced (recommended) |
| high | Deep reasoning |
| xhigh | Maximum reasoning budget |
If omitted, thinking is not explicitly configured. For other providers, thinkingLevel is ignored.
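For instance, an entry combining the OAuth token flow with a deeper thinking budget (the model name `opus` is illustrative; provider, model ID, and field values are taken from the tables above):

```toml
[models.opus]
provider = "anthropic"
model = "claude-opus-4-20250514"
authType = "oauth_token"   # sk-ant-oat-... token from `claude setup-token`
thinkingLevel = "high"     # deep reasoning for complex multi-step tasks
```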

OpenAI

```toml
[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"
```
| Model | Description |
|---|---|
| gpt-4o | Flagship multimodal model (recommended) |
| gpt-4o-mini | Smaller, faster, cheaper |
| gpt-4-turbo | Previous generation |
| o1-preview | Reasoning model |
| o1-mini | Smaller reasoning model |

Credential: `openai_key` (field: `token`)

Groq

```toml
[models.groq-llama]
provider = "groq"
model = "llama-3.3-70b-versatile"
authType = "api_key"
```
| Model | Description |
|---|---|
| llama-3.3-70b-versatile | Llama 3.3 70B on Groq inference |

Groq runs open-source models at high speed. Check Groq’s docs for the full list of available model IDs.

Credential: `groq_key` (field: `token`)

Google Gemini

```toml
[models.gemini]
provider = "google"
model = "gemini-2.0-flash-exp"
authType = "api_key"
```
| Model | Description |
|---|---|
| gemini-2.0-flash-exp | Fast experimental model |

Check Google AI Studio for the full list of available model IDs.

Credential: `google_key` (field: `token`)

xAI

```toml
[models.grok]
provider = "xai"
model = "grok-beta"
authType = "api_key"
```
| Model | Description |
|---|---|
| grok-beta | Grok beta |

Credential: `xai_key` (field: `token`)

Mistral

```toml
[models.mistral]
provider = "mistral"
model = "mistral-large-2411"
authType = "api_key"
```
| Model | Description |
|---|---|
| mistral-large-2411 | Mistral Large (November 2024) |

Check Mistral’s docs for the full list of available model IDs.

Credential: `mistral_key` (field: `token`)

OpenRouter

OpenRouter provides access to models from many providers through a single API.
```toml
[models.or-sonnet]
provider = "openrouter"
model = "anthropic/claude-3.5-sonnet"
authType = "api_key"
```
Model IDs use the provider/model format. See OpenRouter’s model list for all available models.

Credential: `openrouter_key` (field: `token`)

Custom

For any provider not listed above. The model ID and API routing are handled by the underlying pi.dev agent harness.
```toml
[models.my-model]
provider = "custom"
model = "your-model-name"
authType = "api_key"
```
Credential: `custom_key` (field: `token`)

Mixing Models

Each agent can use a different model. Define all models in the project’s config.toml, then reference them by name in each agent’s config.toml:
```
config.toml                 → [models.sonnet], [models.gpt4o], [models.groq-llama]
agents/dev/config.toml      → models = ["sonnet"]
agents/reviewer/config.toml → models = ["gpt4o"]
agents/devops/config.toml   → models = ["groq-llama"]
```
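The project-level config.toml backing the layout above could look like this (provider/model pairs taken from the provider sections):

```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
authType = "api_key"

[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"

[models.groq-llama]
provider = "groq"
model = "llama-3.3-70b-versatile"
authType = "api_key"
```

Each agent needs only the credential file for the provider behind its own model (here `anthropic_key`, `openai_key`, and `groq_key` respectively).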

Model Fallback

Agents can list multiple models to create a fallback chain. When the primary model is rate-limited or unavailable, Action Llama automatically tries the next model in the list:
```toml
# agents/<name>/config.toml
models = ["sonnet", "haiku", "gpt4o"]
```
Fallback switching is instant: there is no delay when moving to the next model. Exponential backoff only kicks in after all models in the chain have been exhausted.

A circuit breaker tracks model availability in memory. When a model returns a rate limit or overload error, it is marked unavailable for 60 seconds. After the cooldown expires, the model is retried.

Credential Setup

Each provider requires a corresponding credential in ~/.action-llama/credentials/. Run al doctor to configure them interactively.

LLM credentials are loaded automatically based on the models referenced in the agent’s models list in config.toml; they do not need to be listed in the agent’s credentials array. The credentials array is for runtime credentials the agent uses during execution (GitHub tokens, SSH keys, etc.). See Credentials for the full credential reference.
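The on-disk format of a credential file is whatever `al doctor` writes; assuming a JSON file keyed by the documented field name, a sketch of an `anthropic_key` credential might look like:

```json
{
  "token": "sk-ant-api-..."
}
```

This is an assumption for illustration only; run `al doctor` rather than creating credential files by hand.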