Skip to main content

Start here

Hardware & System Requirements

Bodega One runs entirely on your hardware - no cloud required. This section covers what Bodega detects at startup, how it uses that data to recommend models and cap context, and what hardware gets you which results.

Minimum - the app runs, CPU-only inference with small models:

OS Windows 10+, macOS 12+, Ubuntu 22.04+
RAM 8 GB
Storage 10 GB free
GPU None required (CPU-only mode works with 1–4B models)
Internet Not required

Recommended - comfortable agentic use with mid-size models:

RAM 16–32 GB
GPU NVIDIA with 8+ GB VRAM (RTX 3060 or better; RTX 5070+ ideal)
Storage 50 GB free

If you have no discrete GPU, Bodega detects CPU-only mode and caps context based on system RAM. You'll see CPU mode - context capped for sustainable inference in the AI panels. 1–4B models are usable; anything larger will be slow.

What Bodega detects at startup

On every launch, Bodega probes your GPU VRAM using the systeminformation library. The result feeds three things:

  • Model recommendations - the Discover tab's Fits My GPU filter and first-run model cards
  • Context window ceiling - the max tokens Bodega will send to a local model (written to hardware.local_context_ceiling)
  • Cloud Boost escalation - whether your hardware tier triggers automatic escalation on reasoning-heavy tasks

The detection runs once at startup. VRAM capacity is cached for the session. Free VRAM refreshes every 3 seconds so that a model already loaded is correctly accounted for.

Apple Silicon: Bodega uses total system RAM minus 3 GB as the effective VRAM pool. This matches how unified memory actually works - your GPU and CPU share the same physical pool. The onboarding badge shows your chip name and memory figure.

No GPU: vramGB is reported as 0. Bodega sets isCpuOnly = true and switches to RAM-based context ceilings.

Multi-GPU: Bodega uses the largest single GPU's VRAM. Ollama loads models onto one GPU, not split across cards.

Your System card

A live hardware readout appears at the top of the Hardware section in the Help panel.

  1. Open the Help panel: gear icon (top right) → Help
  2. Select Hardware in the left rail
  3. The Your System card at the top shows: CPU model, total RAM, GPU model, VRAM, and a one-line capability summary

The summary uses these thresholds:

VRAM Summary
40 GB+ Can run 70B models comfortably
20–40 GB Can run 32B models comfortably
14–20 GB Can run 14B models comfortably
7–14 GB Can run 7–8B models comfortably
Under 7 GB Limited VRAM - 7B models with short context
No GPU CPU inference only (7–8B)

The card fetches from the backend on mount. If the backend isn't running when you open it, the card won't appear - no error is shown.

Two tier systems (and why they differ)

Bodega uses two separate hardware classification systems. They serve different purposes and have different thresholds - do not confuse them.

Discover tab tiers (used for model recommendations and the Fits My GPU filter):

Tier VRAM
tiny ≤ 4 GB
small 4–8 GB
medium 8–16 GB
large 16–24 GB
xl 24+ GB

Routing tiers (used by the agentic loop for model selection and Cloud Boost escalation decisions):

Tier VRAM Largest model at Q4_K_M
minimal < 6 GB up to 3B
budget 6–10 GB up to 7B
mid 10–18 GB up to 13B
high 18–32 GB up to 30B
prosumer 32+ GB 70B+

The routing tiers also expose two capability flags: canRunSmartTier (13B+ accessible, requires 10 GB) and canRunCodeTier (code-specialized 7B models, requires 6 GB).

VRAM sizing: the Q4_K_M rule of thumb

For Q4_K_M quantization - the default in Bodega's managed llama.cpp mode - model weight size in VRAM is roughly:

VRAM (GB) ≈ parameters (B) × 0.55 + KV cache + ~1 GB overhead

Actual sizes from Bodega's catalog (Q4_K_M):

Model size VRAM (weights only)
1.5B 1.1 GB
3B 1.9 GB
4B 2.5 GB
7B 4.7 GB
8B 5.2 GB
14B 9.0 GB
24B (MoE) 14.3 GB
30B MoE 18.6 GB
32B 19.9 GB
70B 42.5 GB

KV cache is the hidden cost. An 8B model's KV cache grows from about 1.2 GB at 8K tokens to ~20 GB at 128K tokens. Context length is where VRAM gets consumed faster than the model weights table suggests. Bodega's context ceilings are set conservatively to prevent thrashing during agentic runs.

Context window ceilings

After VRAM detection, Bodega computes a context ceiling and writes it to hardware.local_context_ceiling. This caps how many tokens the agentic loop sends to local models.

GPU path (VRAM-based):

Effective VRAM Context ceiling
≤ 4 GB 8,192 tokens
4–8 GB 16,384 tokens
8–24 GB 32,768 tokens
24+ GB 65,536 tokens

CPU-only path (RAM-based, no discrete GPU):

System RAM Context ceiling
≤ 8 GB 4,096 tokens
8–16 GB 8,192 tokens
16–32 GB 16,384 tokens
32+ GB 32,768 tokens

The ceiling calculation uses max(free VRAM, total VRAM − 4 GB) rather than free VRAM alone. This prevents a model that's already loaded from causing the probe to misclassify your card. For example: a 32 GB GPU with a 26B model resident still probes as a 28 GB card for ceiling purposes, not a 4 GB card.

To raise the ceiling manually: Settings → (search) context_window_cap → set llm.context_window_cap to a positive integer. Set it to 0 to return to automatic. The budget bar in each AI panel reflects the active ceiling.

Reducing KV cache memory with Ollama

If you're using Ollama as your local provider and running into VRAM pressure at longer contexts, you can cut KV cache size by roughly 50% by setting an environment variable before launching Ollama:

export OLLAMA_KV_CACHE_TYPE=q8_0

On Windows, set this in System Properties → Environment Variables before starting Ollama.

This is an Ollama setting, not a Bodega setting - Bodega has no UI control for it. It also has no effect in llama.cpp managed mode (where Bodega controls the server directly).

GPU value picks

LLM inference is memory-bandwidth-limited, not compute-limited. VRAM capacity and bandwidth matter more than GPU core count.

Pick GPU VRAM Notes
Budget entry Intel Arc B580 12 GB $249; solid for 7–8B models
Best value used RTX 3090 24 GB $800–950; handles 32B at ~112 tok/s
Best value new RTX 5070 12 GB GDDR7 $550–750; fast bandwidth
No compromises RTX 5090 32 GB GDDR7 $2,000+; 70B models comfortably

AMD: The RX 7900 XTX has 24 GB VRAM but ROCm software support on Windows is inconsistent. On Linux, AMD gets the ROCm llama.cpp binary. On Windows, AMD gets the Vulkan build (HIP Radeon support is experimental).

NVIDIA on Linux: Bodega ships a Vulkan llama.cpp binary for Linux NVIDIA - not CUDA. CUDA binaries for Linux are not included in Bodega's managed installation.

Apple Silicon

On M-series Macs, the GPU and CPU share the same memory pool. Bodega detects Apple Silicon via process.platform === 'darwin' && process.arch === 'arm64' and uses total RAM − 3 GB as the effective VRAM figure.

Approximate memory bandwidth by chip (from Apple's published specs):

Chip Bandwidth
M4 (base, 16–32 GB) ~120 GB/s
M4 Pro (24–48 GB) ~273 GB/s
M4 Max (64–128 GB) ~546 GB/s

Higher bandwidth means faster token generation - bandwidth scales matter more than raw VRAM here.

When you install llama.cpp through Bodega on macOS, it installs the macos-arm64 binary, which uses Metal acceleration. No configuration required.

If you're using Ollama on Apple Silicon, Ollama handles MLX acceleration automatically. MLX runs roughly 20–30% faster than llama.cpp for inference on Apple Silicon.

Free memory note: macOS manages unified memory dynamically (compression, paging). The free memory reading Bodega sees can fluctuate. The probe uses os.freemem() with a 1 GB kernel reserve as a proxy.

llama.cpp binary selection

When you install llama.cpp through Bodega (first-run onboarding or Settings → Providers → llama.cpp → Install), Bodega picks the right binary for your system automatically:

Platform Condition Binary
Windows NVIDIA Driver reports CUDA ≥ 13.3 CUDA 13
Windows NVIDIA Driver reports CUDA ≥ 12.4 CUDA 12
Windows NVIDIA No parseable CUDA version Vulkan
Windows AMD Any Vulkan
Linux NVIDIA Any Vulkan
Linux AMD ROCm installed ROCm
Linux AMD ROCm absent CPU fallback
macOS arm64 Any Metal (macos-arm64)
CPU-only fallback Any CPU

Bodega reads the CUDA version directly from nvidia-smi output (not inferred from driver version numbers). If nvidia-smi times out or is absent, it falls back to driver major version, then to Vulkan or CPU. The install is never blocked by a missing nvidia-smi.

Air-gap mode: the binary install endpoint returns 403. Use llamacpp.binary_path_override in Settings to point to a manually downloaded binary.

Cloud Boost as a no-GPU path

Cloud Boost lets you configure a cloud provider (OpenAI, Anthropic, and others) as a secondary fallback. On low-end hardware, it activates automatically for tasks that need it.

To configure:

  1. Go to Settings → Cloud Boost
  2. Enable the toggle
  3. Choose a cloud provider
  4. Enter your API key
  5. Set optional daily and monthly budget limits

Auto-escalation triggers - both conditions must be true for trigger B:

  • (A) The local model's QEL verification score falls below 50 for two consecutive iterations in the same session, OR
  • (B) Hardware tier is minimal (< 6 GB VRAM) or budget (6–10 GB VRAM) AND the current task is classified as reasoning-heavy

Simple conversational messages on minimal hardware do not trigger auto-escalation. The task has to be a planning or reasoning task.

You can also toggle Cloud Boost manually in the chat input area.

Cost tracking: gear icon → Usage Dashboard

Air-gap mode: Cloud Boost is fully blocked when air-gap is enabled (Settings → Privacy & Safety → Air-Gap).

Free VRAM detection limits

Bodega can only read free VRAM on systems where the GPU driver reports the memoryFree field via systeminformation. This works on most Windows NVIDIA and Linux AMD (ROCm) systems.

On Apple Silicon and some Windows AMD configurations, free VRAM is not reported separately - Bodega uses total VRAM as the effective figure.

When free VRAM isn't available, model fit scoring uses total VRAM. This means the Discover tab may show models as fitting even when another model is already resident. The requiresEviction indicator on a model card means it fits in your total VRAM but would require the current model to be unloaded first.

Keyboard shortcuts

KeysAction
Gear icon → Help → HardwareOpen Hardware section with Your System card

This page mirrors the in-app docs hub for app version 1.0.0-beta.26.1. Found something unclear or out of date? Tell us on Discord. New here? Download the free beta and follow along.