Reasoning & Thinking

Some models think before they answer - running an internal reasoning pass that isn't part of the final response. This section covers how Bodega One surfaces that capability, how to control it per-message or globally, and what to expect from the UI.

How reasoning works in Bodega One

When reasoning is active, the model produces a chain-of-thought before writing its response. Bodega collects those thinking tokens in a separate stream and shows them in a collapsible Reasoning block above the response - the response text itself is always clean.

Four things control whether and how much reasoning happens for a given message, resolved in this order (highest priority first):

Per-message composer pill - overrides everything else for that one send
Claude Fast Mode - suppresses Claude reasoning globally (only affects Claude models; per-message pill still wins)
Per-model override - set in the expanded model card in Settings → Models
Global default - saved in Settings → Models as llm.reasoning_effort

If none of those are set, reasoning is off.
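The resolution order above can be sketched as a small function. This is an illustrative sketch, not Bodega's actual code: the function name, parameter names, and the use of `None` for "not set" are assumptions, and it covers only the four standard levels. It treats a pill value of Off as "defer to the defaults", as described under the per-message pill.

```python
from typing import Optional

# Pill values that count as an explicit per-message choice.
ACTIVE_LEVELS = {"low", "medium", "high"}


def resolve_effort(
    pill: str = "off",
    fast_mode: bool = False,
    is_claude: bool = False,
    per_model: Optional[str] = None,
    global_default: Optional[str] = None,
) -> str:
    """Return the effort level that would apply to one message (hypothetical helper)."""
    # 1. An explicit pill choice beats everything else for this send.
    if pill in ACTIVE_LEVELS:
        return pill
    # 2. Claude Fast Mode suppresses reasoning, but only on Claude models.
    if fast_mode and is_claude:
        return "off"
    # 3. A per-model override beats the global default.
    if per_model is not None:
        return per_model
    # 4. Fall back to the global default, and to "off" if nothing is set.
    if global_default is not None:
        return global_default
    return "off"
```

For example, `resolve_effort(pill="high", fast_mode=True, is_claude=True)` yields `"high"` because the pill wins, while the same call without the pill yields `"off"`.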

Not all models support reasoning. The composer pill appears only for models that do. Non-reasoning models silently ignore the effort value - setting a global default won't cause errors.

Which models support reasoning

The composer pill and model cards show three states depending on the selected model:

State	What you see	Models
Segmented (interactive)	Off / Low / Medium / High dropdown	Claude 3.7 Sonnet and later (claude-3.7+, claude-opus-4-x, claude-sonnet-4-x, claude-haiku-4-x); OpenAI o1, o3, o4-mini, GPT-5 (not gpt-5-chat); Gemini 2.5 Pro / Flash and later; local Ollama qwen3, glm-4, deepseek variants
Auto chip (non-interactive)	Static "Auto" label	Always-on reasoners: deepseek-reasoner, deepseek-r1 (cloud), grok, kimi-thinking, qwq / qwen-qwq
Hidden	Nothing	All other models (GPT-4o, Claude 3.5 and earlier, most base Ollama models)

Extended tiers (beta.29). Some models expose more than four levels, and each model's menu shows only the ones it accepts: GPT-5 adds Minimal below Low (answer fast, barely think), and Claude 4.6+ adaptive models add Extra High and Max above High. Set a tier, then switch to a model that doesn't have it, and Bodega clamps to that model's nearest level before the request leaves - a Max set on Claude becomes High on GPT-5, so the API never rejects it.
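The clamping behaviour can be pictured as a nearest-neighbour lookup over an ordered list of tiers. The sketch below is a guess at the shape of that logic, not the shipped implementation: the tier identifiers, the two example menus, and the rule that a non-Off request is never clamped down to Off are all assumptions made for illustration.

```python
from typing import Sequence

# Assumed ordering of every tier, weakest to strongest (identifiers are hypothetical).
TIER_ORDER = ["off", "minimal", "low", "medium", "high", "extra_high", "max"]

# Hypothetical per-model menus, following the tiers named in the text.
GPT5_TIERS = ["off", "minimal", "low", "medium", "high"]
CLAUDE_ADAPTIVE_TIERS = ["off", "low", "medium", "high", "extra_high", "max"]


def clamp_tier(requested: str, supported: Sequence[str]) -> str:
    """Map a requested tier onto the closest tier the target model accepts."""
    if requested in supported:
        return requested
    # Assumption: clamping never silently disables reasoning, so "off" is excluded.
    candidates = [t for t in supported if t != "off"]
    if not candidates:
        return "off"
    want = TIER_ORDER.index(requested)
    # Pick the supported tier whose position is closest to the requested one.
    return min(candidates, key=lambda t: abs(TIER_ORDER.index(t) - want))
```

With these menus, `clamp_tier("max", GPT5_TIERS)` gives `"high"`, matching the Max-to-High example in the text, and `clamp_tier("minimal", CLAUDE_ADAPTIVE_TIERS)` gives `"low"`.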

Always-on reasoners manage their own reasoning depth - you can't tune it. For local Ollama models, the reasoning control appears when Bodega detects thinking support from the model's metadata; if it doesn't appear for a model you expect to support it, make sure the model is fully pulled and reconnect Ollama.

Per-message reasoning pill

The composer has a row of controls below the text input. The reasoning pill is one of them - it looks like a brain icon with the current effort label.

Type your message in the composer (Chat mode or Code mode / Agent panel).
Click the brain icon button in the control row below the input.
Select Off, Low, Medium, or High from the dropdown. Press Escape or click outside to close without changing.
Send the message. The pill selection is sent with that message and never changes your saved defaults.

What "Off" means here: it means "use my default setting", not "force reasoning off". If your global default is Medium, selecting Off in the pill still sends Medium. To guarantee no reasoning on a message when your default is anything other than Off, first set the global default to Off in Settings → Models → My Models and save, then send the message.

The selection persists across messages within a session until you change it, and resets to Off when you switch sessions or reload.

Global and per-model defaults

To set a default that applies to all messages without touching the composer pill each time:

Global default:

Go to Settings → Models → My Models.
Adjust the reasoning effort setting in the LLM settings form.
Click Save Model Settings.

The value is stored as llm.reasoning_effort (off / low / medium / high, default off) and applies to every reasoning-capable model when no per-message or per-model override is set.

Per-model override:

Go to Settings → Models → My Models.
Click the chevron on a model card to expand it.
Find the Reasoning Effort segmented control (Off / Low / Med / High).
Select the level you want for that model specifically.
Save.

The per-model value (llm.model_overrides[modelName].reasoningEffort) wins over the global default for that model only. Non-reasoning models ignore both values.
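As a rough picture of how the two stored values relate, the sketch below reads them from a nested dictionary. Only the key paths `llm.reasoning_effort` and `llm.model_overrides[modelName].reasoningEffort` come from this page; the dictionary layout, the model name, and the helper function are assumptions for illustration.

```python
from typing import Any, Dict

# Hypothetical settings snapshot; the model name is a placeholder.
settings: Dict[str, Any] = {
    "llm": {
        "reasoning_effort": "low",  # global default
        "model_overrides": {
            "example-reasoning-model": {"reasoningEffort": "high"},
        },
    }
}


def default_effort(settings: Dict[str, Any], model_name: str) -> str:
    """Return the saved default for a model: per-model override first, then global."""
    llm = settings.get("llm", {})
    override = llm.get("model_overrides", {}).get(model_name, {})
    # A missing override falls through to the global value, then to "off".
    return override.get("reasoningEffort") or llm.get("reasoning_effort", "off")
```

Here `default_effort(settings, "example-reasoning-model")` returns `"high"`, while any model without an override returns the global `"low"`.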

Claude Fast Mode

Claude Fast Mode suppresses extended thinking on Claude models to get faster replies - same model, no quality downgrade, just no reasoning pre-pass. It sits between the per-message pill and the per-model/global defaults in the resolution chain.

Click the Fast toggle in the message composer - it appears next to the reasoning control for Claude models.

To override it for one message, set the composer pill to Low, Medium, or High for that send - the pill always wins.

Fast Mode has no effect on OpenAI, Gemini, or local models. It exists specifically for Claude because Anthropic's adaptive and manual thinking architecture makes the skip-for-speed trade-off meaningful there.

The Reasoning disclosure block

When a model reasons during a response, a Reasoning block appears above the response text.

While streaming: shows Reasoning... Ns with a live elapsed-seconds counter and a pulsing dot.
After completion: shows Reasoning (N words) with a toggle button.
Collapsed by default. Click the brain icon + label to expand.
The reasoning chain renders in a fixed-height (500px) scrollable monospaced block. Chains longer than 2,000 characters show a character count at the bottom.

Reasoning content is held in client memory and is not persisted to the database. If you reload, the reasoning disclosure for past messages will be empty - the response text is preserved, but the thinking chain is not.

Token metrics and thinking token count

After a response completes, a metrics row appears below the message:

↳ N tokens (M thinking) • X tok/s

The thinking token count is included only when the model produced reasoning tokens. The metrics row appears after the response is fully done - not during streaming.

Note: thinking tokens are billed as output tokens on most providers, so higher effort levels increase cost even though the thinking text is not part of the response.
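To make the metrics row format concrete, here is a small formatting sketch. It is an assumption-laden illustration: the function name is made up, the rate is computed as total tokens divided by elapsed seconds, and one decimal place is an arbitrary choice, since the page does not say how the app computes or rounds the value.

```python
def format_metrics(total_tokens: int, thinking_tokens: int, elapsed_s: float) -> str:
    """Build a metrics row in the "N tokens (M thinking) • X tok/s" shape."""
    # Assumption: throughput is total tokens over wall-clock seconds.
    rate = total_tokens / elapsed_s if elapsed_s > 0 else 0.0
    # The thinking count is shown only when the model produced reasoning tokens.
    thinking = f" ({thinking_tokens} thinking)" if thinking_tokens > 0 else ""
    return f"↳ {total_tokens} tokens{thinking} • {rate:.1f} tok/s"
```

A response with no reasoning tokens simply omits the parenthesised part.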

Local Ollama reasoning models

For local Ollama models with native thinking support (qwen3, glm-4, deepseek variants), Bodega adds think: true to the Ollama API request automatically - you don't configure this. The composer pill still appears in segmented mode, but every level other than Off sends the same think: true flag, so the model itself decides how deeply to reason.

For llama.cpp-served models that embed <think>...</think> tags directly in the content stream, Bodega's ThinkTokenStripper handles the tag-stripping before the response reaches the UI - the response text you see is always clean.
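Stripping tags from a stream is trickier than from a finished string, because a tag can be split across two chunks. The sketch below shows one way to handle that. It is not the source of ThinkTokenStripper, whose internals this page does not describe; the class name, methods, and buffering strategy are assumptions.

```python
class ThinkTagSplitter:
    """Separate <think>...</think> content from visible text in a chunked stream."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self) -> None:
        self._buf = ""          # text not yet classified (may hold a partial tag)
        self._in_think = False  # True while inside a <think> block
        self.thinking = []      # collected reasoning fragments

    def feed(self, chunk: str) -> str:
        """Consume one chunk and return the visible text it releases."""
        self._buf += chunk
        visible = []
        while self._buf:
            tag = self.CLOSE if self._in_think else self.OPEN
            idx = self._buf.find(tag)
            if idx != -1:
                # Full tag found: emit what precedes it, then flip state.
                self._emit(self._buf[:idx], visible)
                self._buf = self._buf[idx + len(tag):]
                self._in_think = not self._in_think
                continue
            # No full tag: hold back any suffix that could be the start of one.
            keep = self._partial_suffix(self._buf, tag)
            cut = len(self._buf) - keep
            self._emit(self._buf[:cut], visible)
            self._buf = self._buf[cut:]
            break
        return "".join(visible)

    def flush(self) -> str:
        """Release whatever is still buffered at end of stream."""
        rest, self._buf = self._buf, ""
        if self._in_think:
            if rest:
                self.thinking.append(rest)
            return ""
        return rest

    def _emit(self, text: str, visible: list) -> None:
        if text:
            (self.thinking if self._in_think else visible).append(text)

    @staticmethod
    def _partial_suffix(buf: str, tag: str) -> int:
        # Length of the longest buffer suffix that is a proper prefix of the tag.
        for n in range(min(len(buf), len(tag) - 1), 0, -1):
            if tag.startswith(buf[-n:]):
                return n
        return 0
```

Feeding the chunks `"<thi"`, `"nk>plan"`, `" it</th"`, `"ink>Hello"`, `" world"` yields the visible text `Hello world` and leaves `plan it` in `thinking`, even though both tags arrive split across chunks.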

If the reasoning control doesn't appear for a local model you expect to support it, ensure the model is fully pulled (ollama pull <model>) and that Ollama is connected.

How effort maps to provider API calls

You set Off / Low / Medium / High. The backend translates that into provider-specific parameters:

Provider	Low	Medium	High	Notes
Anthropic 4.6+ (adaptive)	`effort: 'low'`	`effort: 'medium'`	`effort: 'high'`	`thinking: {type: 'adaptive'}`, temperature removed
Anthropic 3.7–4.5 (manual)	`budget_tokens: 2048`	`budget_tokens: 4096`	`budget_tokens: 8192`	`thinking: {type: 'enabled'}`, temperature removed
OpenAI o-series / GPT-5	`reasoning_effort: 'low'`	`reasoning_effort: 'medium'`	`reasoning_effort: 'high'`	Swaps `max_tokens` → `max_completion_tokens`, temperature removed
Gemini 2.5+	`thinking_budget: 1024`	`thinking_budget: 4096`	`thinking_budget: -1` (dynamic)	Via `extra_body.google.thinking_config`
Local Ollama	`think: true`	`think: true`	`think: true`	Same flag regardless of effort level; model controls depth
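The table can be read as a single translation step applied to an outgoing request. The sketch below encodes the table's values in that form. The family labels, the function name, and the flat placement of `effort` on the request are assumptions, and the table does not say what Off sends, so the sketch leaves the request unchanged in that case.

```python
from typing import Any, Dict

# Budgets copied from the table above.
ANTHROPIC_MANUAL_BUDGET = {"low": 2048, "medium": 4096, "high": 8192}
GEMINI_BUDGET = {"low": 1024, "medium": 4096, "high": -1}  # -1 means dynamic


def apply_reasoning(request: Dict[str, Any], family: str, effort: str) -> Dict[str, Any]:
    """Return a copy of the request with provider-specific reasoning fields added."""
    out = dict(request)  # shallow copy so the caller's dict is untouched
    if effort == "off":
        return out  # assumption: Off adds no reasoning parameters
    if family == "anthropic_adaptive":
        out["thinking"] = {"type": "adaptive"}
        out["effort"] = effort
        out.pop("temperature", None)
    elif family == "anthropic_manual":
        out["thinking"] = {"type": "enabled",
                           "budget_tokens": ANTHROPIC_MANUAL_BUDGET[effort]}
        out.pop("temperature", None)
    elif family == "openai_reasoning":
        out["reasoning_effort"] = effort
        if "max_tokens" in out:
            out["max_completion_tokens"] = out.pop("max_tokens")
        out.pop("temperature", None)
    elif family == "gemini":
        out["extra_body"] = {"google": {"thinking_config": {
            "thinking_budget": GEMINI_BUDGET[effort]}}}
    elif family == "ollama":
        out["think"] = True  # same flag for every level
    else:
        raise ValueError(f"unknown provider family: {family}")
    return out
```

Keeping the budgets in lookup tables, rather than scattering literals through the branches, makes it easy to check the code against the table row by row.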

Claude Opus 4.7 and 4.8 reject temperature entirely - even with reasoning off. Bodega handles this automatically: it strips temperature for those models up front, and it also retries without temperature when a request fails with a 400 error. That retry means future Claude releases that drop temperature keep working even before Bodega's internal model list is updated.
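The retry-on-400 idea can be sketched in a few lines. Everything here is hypothetical: `BadRequest` stands in for an HTTP 400 response, `send` stands in for whatever function performs the API call, and a real implementation would likely inspect the error message before deciding that temperature is the cause.

```python
from typing import Any, Callable, Dict


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response from a provider (hypothetical)."""


def send_with_temperature_fallback(
    send: Callable[[Dict[str, Any]], Any],
    request: Dict[str, Any],
) -> Any:
    """Send a request; on a 400, retry once with temperature removed."""
    try:
        return send(request)
    except BadRequest:
        if "temperature" not in request:
            raise  # nothing to strip, so the 400 has another cause
        retry = {k: v for k, v in request.items() if k != "temperature"}
        return send(retry)
```

The single retry keeps the failure mode bounded: if the second attempt also fails, the error propagates to the caller unchanged.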

Upgrading from the old Extended Thinking toggle

Before beta.25, there was a boolean Extended Thinking toggle and a token budget field. Both are gone.

On first launch after upgrading, Bodega migrates your settings automatically:

If you had extended thinking on, your reasoning effort is now set to Medium.
The old token budget key is deleted - the budget is now determined internally by provider and effort level.

No action needed. If you want to adjust the level, go to Settings → Models → My Models and expand a model card, or change the global default and save.
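For readers curious what such a migration looks like, here is a minimal sketch. The old key names `extended_thinking` and `thinking_budget_tokens` are invented for illustration, since this page does not name them, and leaving the effort untouched when the old toggle was off is also an assumption.

```python
from typing import Any, Dict


def migrate_reasoning_settings(old: Dict[str, Any]) -> Dict[str, Any]:
    """Convert pre-beta.25 style settings to the effort-based setting (hypothetical keys)."""
    new = dict(old)
    # Hypothetical name for the old boolean toggle; removed after reading.
    was_on = new.pop("extended_thinking", False)
    # Hypothetical name for the old token budget field; simply dropped.
    new.pop("thinking_budget_tokens", None)
    if was_on:
        new["reasoning_effort"] = "medium"
    return new
```

Because both old keys are removed on the first pass, running the function again on its own output changes nothing.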

This page mirrors the in-app docs hub for app version 1.0.0-beta.32.1.