local-first · google · BYOLLM · hardware · open-source · model-release

Gemma 4: Google's first Apache 2.0 open model is also its best

Bodega One · 7 min read
Quick answer

Google released Gemma 4 on April 2 under Apache 2.0. Four model sizes. The 31B ranks #3 among all open models globally and runs on a single RTX 4090 (~19 GB at Q4_K_M). If you avoided Gemma because of the old license terms, that objection is gone. Full Ollama support: ollama run gemma4.

Gemma 3 was a decent model with a frustrating license. Google's own Gemma terms required a separate review, added friction for commercial use, and made teams skip it in favor of Apache 2.0 models like Qwen or Llama. Gemma 4 changes that. Apache 2.0 across all four model sizes. No usage policy review, no separate agreement, no carve-outs. Just a standard open-source license.

The license change alone would be significant. The benchmark numbers make it a bigger story.

What Gemma 4 actually ships

Four model sizes, two architectures. The E2B and E4B come from Google's Gemma-3n architecture. "E" stands for effective parameters. The E4B has roughly 11B total parameters but runs inference like a 4B model, using per-layer embeddings and alternating attention (sliding-window local layers plus full-context global layers) to access a larger knowledge base without the compute cost. It's an unusual architecture, and the practical result is a 5 GB model that outperforms most 7B models from 2024.
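The alternating-attention idea is easy to see in code. A minimal sketch of how such a layout might interleave layers; the 5:1 local-to-global ratio is an assumption borrowed from earlier Gemma generations, not a confirmed Gemma 4 value:

```python
def attention_layout(n_layers: int, local_per_global: int = 5) -> list[str]:
    """Return the attention type per layer: sliding-window 'local' layers
    interleaved with full-context 'global' layers.
    The 5:1 ratio is an assumption, not a confirmed Gemma 4 spec."""
    layout = []
    for i in range(n_layers):
        # Every (local_per_global + 1)-th layer attends over the full context.
        if (i + 1) % (local_per_global + 1) == 0:
            layout.append("global")
        else:
            layout.append("local")
    return layout

layout = attention_layout(30)
print(layout.count("local"), layout.count("global"))  # 25 local, 5 global
```

Most layers only pay for a small attention window; the occasional global layer is what keeps the full 128K context reachable.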

The 26B is a Mixture-of-Experts model: 26B total parameters, 4B active per inference step. The 31B is a standard dense model. All four support text, image, and audio input with text output. Multimodal is built into the architecture, not added on after.

  • E2B: mobile and edge deployment. Not covered here.
  • E4B: ~5 GB VRAM at Q4_K_M. 128K context. Default Ollama pull.
  • 26B MoE: ~14 GB VRAM at Q4_K_M. 256K context. #6 open model on Arena AI.
  • 31B Dense: ~19 GB VRAM at Q4_K_M. 256K context. #3 open model on Arena AI.
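The MoE entry in that list comes down to simple arithmetic: all 26B parameters must sit in VRAM, but each token only activates 4B of them. A rough sketch, assuming the common ~2-FLOPs-per-active-parameter rule of thumb for a forward pass:

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token
    (a standard rule of thumb, not a measured Gemma 4 figure)."""
    return 2 * active_params

dense_31b = forward_flops_per_token(31e9)  # dense: all 31B parameters fire
moe_26b = forward_flops_per_token(4e9)     # MoE: only 4B of 26B fire per token

# Memory scales with total parameters; per-token compute with active ones.
print(f"dense/MoE compute ratio: {dense_31b / moe_26b:.2f}x")  # 7.75x
```

That asymmetry is why the 26B MoE fits a 14 GB card yet generates tokens at roughly 4B-model speed.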

The benchmark numbers

The AIME 2026 score is the headline. Gemma 3 27B scored 20.8%. Gemma 4 31B scores 89.2%. That's not a small improvement. It's the largest single-generation reasoning leap in open-source model history. For context: Qwen3.5-27B, the previous top single-GPU model for coding, scores 48.7% on AIME. Gemma 4 31B nearly doubles it.

LiveCodeBench tells a similar story. Gemma 3's 27B scored 29.1%. The 31B scores 80.0%. The 26B MoE sits at approximately 70%. The E4B comes in at 52%, competitive with other models in the 5 GB tier, and ahead of most that ran at this VRAM level a year ago.

On the Arena AI text leaderboard as of April 2026: the 31B ranks #3 among all open models. The 26B MoE ranks #6. Google's own description: "outcompetes models 20x its size." Independent testing confirms it.

GPQA Diamond (scientific reasoning): 84.3%. Multi-needle retrieval over long context: 66.4%, up from Gemma 3's 13.5%. The long-context improvements matter practically. 256K context is only useful if the model can actually reason over the full window, and the retrieval scores suggest it can.

What the Apache 2.0 change actually means

The old Gemma license was Google's own terms. Commercial use was technically allowed but came with restrictions: prohibited use cases defined by reference to other documents, usage policies that could change, and enough ambiguity that legal teams at many organizations flagged it. The practical result: teams building products defaulted to Apache 2.0 models and skipped Gemma entirely.

Apache 2.0 is unambiguous. Use the model, modify it, redistribute it, build products with it. Attribute Google. That's it. No usage policy review needed, no separate agreement, no questions about whether a specific application triggers a restricted-use clause.

For local-first deployments this matters less. You're running the model on your hardware, no cloud inference, no one auditing your use case. But for any team shipping a product that embeds a model, the license change removes a real obstacle.

Running it locally

All three consumer-grade sizes have full Ollama support as of release day. Pick based on your VRAM:

  • E4B (~5 GB): ollama run gemma4. Default pull. RTX 3050, MacBook M1, any 6 GB+ GPU.
  • 26B MoE (~14 GB): ollama run gemma4:26b. RTX 3080 10 GB is borderline; RTX 3080 12 GB or RTX 4070 and up is comfortable.
  • 31B Dense (~19 GB): ollama run gemma4:31b. RTX 3090, RTX 4090, or Mac M2 Max 32 GB+. The one to run if your hardware supports it.

Ollama pulls Q4_K_M by default. If you want higher precision or a specific quantization, pull from Hugging Face and run via ollama create. BF16 (full precision) requires 62 GB+ for the 31B. That's multi-GPU or server territory.
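These VRAM figures follow from bytes-per-parameter arithmetic. A minimal estimator, assuming Q4_K_M averages roughly 4.85 bits per weight (an approximation; real GGUF file sizes vary by architecture) and ignoring KV-cache and runtime overhead:

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Model weight size in decimal GB for a given quantization level."""
    return params * bits_per_param / 8 / 1e9

# 31B dense at Q4_K_M (~4.85 bits/weight, an approximation) vs BF16 (16 bits).
print(f"Q4_K_M: {weight_gb(31e9, 4.85):.1f} GB")  # ~18.8 GB, the ~19 GB figure
print(f"BF16:   {weight_gb(31e9, 16):.1f} GB")    # 62.0 GB
```

Budget a few extra GB on top of the weights for context: the KV cache grows with both context length and batch size.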

LM Studio also supports Gemma 4 through the model browser. The GPU guide covers hardware expectations by tier if you're choosing hardware around these numbers.

Who should upgrade from Gemma 3

If you're running Gemma 3 12B today, the E4B is the obvious upgrade. Similar VRAM footprint, better benchmarks, cleaner license. Drop-in replacement on Ollama.

If you're on Gemma 3 27B, the 26B MoE is the equivalent slot in Gemma 4 with better numbers and slightly lower VRAM. The 31B is worth considering if your GPU can handle 19 GB. The performance gap between the 26B and 31B is larger than the gap between Gemma 3's 12B and 27B.

If you skipped Gemma entirely because of the license: now is the time to test it. The AIME and LCB scores for the 31B are the strongest argument for a single-GPU coding model in this VRAM range.

Using Gemma 4 with Bodega One

BYOLLM means the model choice is yours. Connect Bodega One to Ollama with Gemma 4 pulled. Any of the three sizes work as a provider preset. Local inference, no cloud dependency, nothing leaves your machine.
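Wiring this up yourself is one HTTP call against Ollama's documented local API. A minimal sketch of a non-streaming request to the /api/chat endpoint; the gemma4:31b tag comes from the release above, and localhost:11434 assumes a stock Ollama install:

```python
import json
from urllib import request

def ollama_chat_payload(prompt: str, model: str = "gemma4:31b") -> dict:
    """Build a non-streaming chat request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }

payload = ollama_chat_payload("Refactor this function to be pure.")

# Uncomment to send against a running Ollama instance:
# req = request.Request(
#     "http://localhost:11434/api/chat",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(request.urlopen(req).read())["message"]["content"])
```

Swapping sizes is just a model-tag change: gemma4 for the E4B, gemma4:26b for the MoE.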

The 31B is the strongest single-GPU option right now for agentic coding tasks based on LCB and AIME scores. If your GPU can handle 19 GB at Q4_K_M, test it against your current pick. Use the VRAM calculator to verify your setup before pulling.

Waitlist is open at bodegaone.ai. Beta starts May 2026. Full launch July 6, 2026.

Ready to own your tools?

Beta opens May 2026. Complete 14 days and earn a $30 promo code.