
How to run DeepSeek locally with Bodega One

Bodega One · 7 min read
Quick answer

To run DeepSeek locally with Bodega One: pull the model in Ollama (ollama pull deepseek-r1:14b), then in Bodega One go to Settings → Providers → Ollama. DeepSeek-R1 14B runs well on 12GB+ VRAM. See all 15+ supported providers.

DeepSeek attracted attention in early 2025 with models that matched frontier performance at a fraction of the training cost. DeepSeek-R1 in particular, a reasoning model trained with reinforcement learning, showed that the compute gap between US and Chinese AI labs was smaller than most had assumed.

For developers running local AI, DeepSeek's models are compelling for a specific reason: they are open-weight, quantization-friendly, and capable on coding and reasoning tasks. Here's how to run them with Bodega One.

Which DeepSeek model should you use?

DeepSeek has released several model families. For coding work:

  • DeepSeek-R1: A reasoning model. Slower (it “thinks” before answering), but noticeably stronger on complex tasks: debugging, architecture decisions, multi-step code generation. Available in 1.5B, 7B, 8B, 14B, 32B, and 70B parameter sizes.
  • DeepSeek-V3: A general-purpose model. Faster than R1, strong on code. V3 ships only as the full 671B-parameter MoE, which runs well only on multi-GPU setups; on consumer hardware, the distilled R1 models (7B, 8B, 14B) are the practical choice.
  • DeepSeek-Coder-V2: An earlier coding-specific model. Still solid, but DeepSeek-R1 and V3 have largely superseded it for general coding tasks.

Which size to run by VRAM

  • 8GB VRAM: DeepSeek-R1 7B or 8B (Q4_K_M). Functional, good for everyday tasks.
  • 12GB VRAM: DeepSeek-R1 14B (Q4_K_M). Noticeably stronger reasoning.
  • 16-24GB VRAM: DeepSeek-R1 32B (Q4_K_M, ~20GB, at the top of this range), strong on complex code.
  • 48GB+ VRAM: DeepSeek-R1 70B, approaches frontier performance locally.
  • Apple Silicon 16GB: DeepSeek-R1 7B or 8B MLX, good balance of speed and quality.

For a full hardware reference, see the GPU guide for local AI.
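If your GPU isn't on the list above, you can ballpark the requirement yourself. This is a rough sketch, not a measurement: it assumes Q4_K_M averages about 4.8 bits per weight and pads ~20% for KV cache and runtime overhead, so treat the numbers as a sizing guide only.

```python
# Rough VRAM estimate for a quantized model. Rule-of-thumb numbers:
# Q4_K_M averages roughly 4.8 bits per weight; pad ~20% for the KV
# cache and runtime overhead.

def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.8,
                     overhead: float = 1.2) -> float:
    """Approximate VRAM (GB) needed to run the quantized model."""
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

for size in (7, 14, 32, 70):
    print(f"deepseek-r1:{size}b -> ~{estimate_vram_gb(size)} GB")
```

The estimates line up with the tiers above: ~10GB for the 14B (fits in 12GB), low-20s for the 32B.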

Option 1: Run via Ollama (recommended)

Ollama has DeepSeek-R1 in its model library. Pull a specific size:

  • ollama pull deepseek-r1:7b (7B parameter model)
  • ollama pull deepseek-r1:14b (14B parameter model)
  • ollama pull deepseek-r1:32b (32B parameter model, needs ~20GB+ VRAM)

Ollama handles quantization automatically. The default pull gives you Q4_K_M, which is a good balance of quality and size.

Once the model is pulled and Ollama is running, connect Bodega One: Settings → Providers → Ollama. The model will appear in the model selector.
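Before pointing Bodega One at Ollama, it can be worth confirming the pulled model actually responds. A minimal sketch using Ollama's standard `/api/chat` REST endpoint, assuming the default address `http://localhost:11434`:

```python
# Minimal smoke test against a local Ollama server. Assumes Ollama's
# default address; /api/chat is part of Ollama's standard REST API.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/chat endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete JSON reply instead of chunks
    }

def ask(model: str, prompt: str) -> str:
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    print(ask("deepseek-r1:14b", "Write a one-line Python hello world."))
```

If this returns an answer, Bodega One's Ollama provider will see the same model.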

Option 2: Run via LM Studio

LM Studio's model browser includes DeepSeek-R1 in multiple sizes. Search for “DeepSeek-R1” in the model browser, pick your size, and download. Load it and start the local server. Then connect Bodega One to LM Studio at http://localhost:1234/v1.
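A quick way to confirm LM Studio's server is up and serving the model you loaded is its OpenAI-compatible `/v1/models` endpoint. A sketch, assuming the default address `http://localhost:1234/v1`:

```python
# Check which models LM Studio's local server is exposing, via its
# OpenAI-compatible /v1/models endpoint (default: localhost:1234).
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"

def model_ids(models_json: dict) -> list:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [m["id"] for m in models_json.get("data", [])]

def list_served_models(base_url: str = BASE_URL) -> list:
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.loads(resp.read()))

if __name__ == "__main__":
    print(list_served_models())
```

The id printed here is the model name Bodega One will show in its selector.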

For the full LM Studio setup guide, see LM Studio + Bodega One setup.

A note on thinking tokens

DeepSeek-R1 is a reasoning model. It generates a chain of thought before giving its final answer. This shows up in responses as a <think>...</think> block before the actual answer. Some Ollama versions strip this automatically; others pass it through.

For the agentic coding loop in Bodega One, this usually isn't a problem. The agent extracts the final answer from the response. But if you see thinking tokens in unexpected places in the UI, it's worth checking whether your Ollama version handles the R1 reasoning format correctly.
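If you are wiring R1 into your own tooling rather than Bodega One, a tolerant way to handle this is to split off the `<think>...</think>` block when it is present and pass the text through unchanged otherwise. A minimal sketch:

```python
# Separate DeepSeek-R1's chain of thought from its final answer.
# If no <think> block is present (some Ollama versions strip it),
# the response passes through untouched.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(response: str) -> tuple:
    """Return (thinking, answer); thinking is '' if no think block."""
    match = THINK_RE.search(response)
    if not match:
        return "", response.strip()
    answer = THINK_RE.sub("", response, count=1)
    return match.group(1).strip(), answer.strip()

raw = "<think>User wants a greeting.</think>\nprint('hello')"
thinking, answer = split_thinking(raw)
print(answer)  # -> print('hello')
```

Keeping the thinking text around (rather than discarding it) is handy for debugging why the model chose a particular answer.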

Performance expectations

DeepSeek-R1 14B is a strong all-round model for coding. On a 12GB VRAM machine with Ollama, expect token generation in the 15-30 tokens/second range depending on GPU. That's fast enough for interactive chat and agent loops without feeling slow.

For comparison: Qwen2.5-Coder-32B at the same quality level requires ~22GB VRAM. If you have less than 16GB VRAM and want strong coding performance, DeepSeek-R1 14B is worth trying first.
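Rather than trusting ballpark numbers, you can measure your own throughput. Ollama's non-streaming replies include timing metadata: `eval_count` is the number of generated tokens and `eval_duration` is in nanoseconds, both standard fields in `/api/chat` and `/api/generate` responses.

```python
# Generation throughput from Ollama response metadata.
# eval_count: generated tokens; eval_duration: nanoseconds.

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Tokens/second for the generation phase of an Ollama reply."""
    return round(eval_count / (eval_duration_ns / 1e9), 1)

# Example figures: 420 tokens in 21 seconds -> 20.0 tok/s, squarely
# in the 15-30 tok/s range quoted above for a 12GB GPU.
print(tokens_per_second(420, 21_000_000_000))  # -> 20.0
```

If your measured rate falls well below that range, the model is likely spilling out of VRAM into system RAM; try a smaller size or tighter quantization.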

See the full BYOLLM provider list for all local and cloud options supported in Bodega One. If you want to try cloud DeepSeek (via API) for comparison, the custom provider preset supports any OpenAI-compatible endpoint.

Ready to own your tools?

Beta opens May 2026. Complete 14 days and earn a $30 promo code.