Bodega One Code v1.0.0-beta.18.1 -- the full story
This was supposed to be a quick two-bug hotfix this morning. By midnight it was 26 rounds, 84 files, ~3,800 lines, and a complete user-experience overhaul of the Featherless integration. Here's what happened and why each piece matters.
The original two-bug hotfix
1. Self-hosted providers were losing their custom Base URL. If you ran llama.cpp, LM Studio, vLLM, or any OpenAI-compatible local server on a non-default URL (or a different port), Bodega would let you type and save that URL -- but the moment you navigated away from the Models settings page, the value reverted to localhost. Discord users hit this within hours of beta.18 shipping.
Root cause: the URL was being read from a generic llm.openai_base_url setting that every preset shared, but only the active preset's "Test Connection" button wrote to it. Switching presets in the menu wiped the previously saved URL.
Fix: a new lookupBaseUrlForPreset(presetId) helper generalizes the qwen/kimi region-override pattern so ANY OpenAI-compat preset can override its base URL via llm.<presetId>_base_url. The legacy single-key fallbacks are preserved so pre-Phase-2 setups don't break.
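Here is a minimal sketch of that lookup order. It assumes settings are a flat string map. The default-URL table, the `LEGACY_KEYS` list (including which presets still consult `llm.openai_base_url`) and the LM Studio default are illustrative assumptions, not Bodega's actual tables.

```typescript
// Hypothetical sketch: resolve a preset's base URL, per-preset key first.
type SettingsMap = Record<string, string | undefined>;

// Illustrative defaults; only the Featherless URL comes from the post.
const PRESET_DEFAULT_URLS: Record<string, string> = {
  featherless: "https://api.featherless.ai/v1",
  lmstudio: "http://localhost:1234/v1",
};

// Assumed legacy single-key fallbacks for pre-Phase-2 setups.
const LEGACY_KEYS: Record<string, string[]> = {
  custom: ["llm.openai_base_url"],
};

function lookupBaseUrlForPreset(presetId: string, settings: SettingsMap): string | undefined {
  // 1. Per-preset override, e.g. llm.lmstudio_base_url. Saving one preset's
  //    URL never touches another preset's key.
  const own = settings[`llm.${presetId}_base_url`]?.trim();
  if (own) return own;

  // 2. Legacy single-key fallbacks, so older configs keep working.
  for (const key of LEGACY_KEYS[presetId] ?? []) {
    const legacy = settings[key]?.trim();
    if (legacy) return legacy;
  }

  // 3. Built-in default for the preset (undefined if none is known).
  return PRESET_DEFAULT_URLS[presetId];
}

// lookupBaseUrlForPreset("lmstudio", { "llm.lmstudio_base_url": "http://192.168.1.20:1234/v1" })
//   -> "http://192.168.1.20:1234/v1"
// lookupBaseUrlForPreset("featherless", {}) -> "https://api.featherless.ai/v1"
```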
2. No native preset for Featherless AI. Featherless's onboarding page was explicitly branded "for Bodega users" with rc_* API keys, but Bodega's preset list didn't include them. Users had to set Featherless up as a Custom OpenAI provider with a typed-in base URL -- and once they did, they hit bug #1.
Fix: Featherless added as a first-class preset with proper Bearer auth, https://api.featherless.ai/v1 default URL, the right setupTip pointing to featherless.ai/account, and dedicated llm.featherless_api_key storage. Wired into cloud-key validation, the V2 onboarding picker, the Settings → Cloud API Keys section, and the Cloud Boost provider picker.
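As a rough illustration, a first-class preset could be a plain object that carries the values listed above. The `CloudPreset` interface and its field names are hypothetical. Only the URL, the storage key, the setup-tip target and Bearer auth come from the post.

```typescript
// Hypothetical preset shape; field names are illustrative, not Bodega's schema.
interface CloudPreset {
  id: string;
  label: string;
  defaultBaseUrl: string;
  apiKeySetting: string; // settings key the API key is stored under
  setupTip: string;
  authHeaders(apiKey: string): Record<string, string>;
}

const FEATHERLESS_PRESET: CloudPreset = {
  id: "featherless",
  label: "Featherless AI",
  defaultBaseUrl: "https://api.featherless.ai/v1",
  apiKeySetting: "llm.featherless_api_key",
  setupTip: "Create an API key at featherless.ai/account",
  // OpenAI-compatible Bearer auth: the rc_* key goes in the Authorization header.
  authHeaders: (apiKey) => ({ Authorization: `Bearer ${apiKey}` }),
};
```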
Then live testing happened
I tested with my own Featherless key. The first thing it did was freeze.
3. The 6,700-model freeze. Featherless's /v1/models endpoint returns every HuggingFace-mirrored model they host -- 16,275 entries on a free trial, more on paid plans. Our OpenAIProvider.listModels was trying to ingest all of them, ship per-model profiles for each in the /llm/health response, and let the model picker render a 16k-entry datalist. Result: Windows socket pool drained, /llm/health payload hit ~1MB, model picker would auto-select 000ADI/Qwen2.5-...-Gensyn-Swarm-grazing_locust (the alphabetically-first random fine-tune), and the renderer locked up.
Three layers of fix:
- capListedModels at the provider boundary caps the response at 500. When upstream exceeds the cap, models from a curated allowlist of foundation-model orgs (meta-llama, deepseek-ai, Qwen, mistralai, google, microsoft, NousResearch, HuggingFaceH4, CohereForAI, allenai, tiiuae, 01-ai, WizardLMTeam, moonshotai) are always retained; the rest fill remaining slots alphabetically.
- /llm/health slimmed: when the model count exceeds 50, the response ships only model names. Per-model profiles are lazy-fetched via /api/models/:name/info on demand. Net JSON shrinks from ~1MB to ~10KB.
- pickDefaultModelForPreset learned to prefer curated foundation models. CURATED_CLOUD_MODELS.featherless seeds 10 hand-picked entries (DeepSeek-V3-0324, Qwen 2.5-Coder-32B, Llama 3.1-70B-Instruct, etc.) so first-run users land on a real model.
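Here is a sketch of the first layer, the provider-boundary cap. It uses the 500 cap and the org allowlist named above. The function body is an illustration under assumptions, not Bodega's implementation. In particular, this version puts allowlisted models first and then still enforces the hard cap. That is one reading of "always retained" that keeps the response bounded.

```typescript
// Illustrative capListedModels; the cap and org list mirror the post.
const MODEL_CAP = 500;
const FOUNDATION_ORGS = new Set([
  "meta-llama", "deepseek-ai", "Qwen", "mistralai", "google", "microsoft",
  "NousResearch", "HuggingFaceH4", "CohereForAI", "allenai", "tiiuae",
  "01-ai", "WizardLMTeam", "moonshotai",
]);

function orgOf(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? "" : modelId.slice(0, slash);
}

function capListedModels(ids: string[], cap = MODEL_CAP): string[] {
  if (ids.length <= cap) return ids;
  // Default sort compares UTF-16 code units, so digits and capitals come first.
  const sorted = [...ids].sort();
  const foundation = sorted.filter((id) => FOUNDATION_ORGS.has(orgOf(id)));
  const rest = sorted.filter((id) => !FOUNDATION_ORGS.has(orgOf(id)));
  // Allowlisted orgs go first so they survive the cut; the rest fill remaining slots.
  return foundation.concat(rest).slice(0, cap);
}
```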
4. The 60fps re-render loop. After a Featherless reconfigure, the renderer would suddenly hit 49% sustained CPU. A profiler audit traced it to a feedback loop: /llm/health → setModelProfiles({...spread}) → every Zustand subscriber re-renders → one of those re-renders triggers another /llm/health → loop.
Fix: removed the lazy-fetch from useLLMHealthCheck. Empty-dict guards added to setModelProfiles and setRecommendedSettings so an empty {} (which the slim path sends) doesn't trigger spurious re-renders. CPU dropped from 49% to 1.3%.
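The empty-dict guard works because a Zustand-style store skips notifying subscribers when an update hands back the same state object (it compares with `Object.is`). Here is a minimal sketch of the guard as a pure updater. The `ModelState` and `Profiles` shapes are hypothetical.

```typescript
// Hypothetical state shape; only the guard logic is the point here.
type Profiles = Record<string, { contextLength: number }>;
interface ModelState {
  modelProfiles: Profiles;
}

function applyModelProfiles(state: ModelState, incoming: Profiles): ModelState {
  // The slim /llm/health path sends {}: treat it as "nothing new" and return
  // the same object, so an Object.is check sees no change and nobody re-renders.
  if (Object.keys(incoming).length === 0) return state;
  // Real data: produce a new object so subscribers do update.
  return { ...state, modelProfiles: { ...state.modelProfiles, ...incoming } };
}
```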
5. The whole-settings selector cascade (BUG-DM-15). Nine components were subscribing to the entire settings object via useStore((s) => s.settings). Any change to any of the 50+ settings keys re-rendered all of them. Refactored to per-key selectors so each component only re-renders when the keys it actually reads change. ~10x reduction in re-renders during typical settings churn.
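The difference between the two selector styles can be shown with a tiny stand-in store. This is not Zustand itself. It just mimics `useStore(selector)` by re-running each selector on every change and counting a "render" only when the selected value changes under `Object.is`. All names are hypothetical.

```typescript
// Minimal stand-in for a selector-based store (not the real Zustand API).
type SettingsState = { settings: Record<string, unknown> };

function createStore(initial: SettingsState) {
  let state = initial;
  const listeners = new Set<() => void>();
  return {
    setSetting(key: string, value: unknown) {
      // Immutable update: `settings` gets a new identity on every write.
      state = { settings: { ...state.settings, [key]: value } };
      listeners.forEach((listener) => listener());
    },
    // Returns a counter of how often this subscriber would re-render.
    subscribeSelector<T>(selector: (s: SettingsState) => T) {
      let prev = selector(state);
      const counter = { renders: 0 };
      listeners.add(() => {
        const next = selector(state);
        if (!Object.is(prev, next)) {
          prev = next;
          counter.renders++;
        }
      });
      return counter;
    },
  };
}

// const store = createStore({ settings: { theme: "dark", "llm.preset": "featherless" } });
// const whole = store.subscribeSelector((s) => s.settings);         // old pattern
// const themeOnly = store.subscribeSelector((s) => s.settings.theme); // per-key
// store.setSetting("llm.preset", "lmstudio"); // whole re-renders, themeOnly does not
```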
6. The SQLite race conditions. Two distinct races:
- SettingsService.setMany was firing concurrent BEGIN statements during onboarding, hitting "cannot start a transaction within a transaction." Fix: serialized via internal queue.
- Cross-service: SettingsService and MessageService could both try BEGIN at the same time. Fix: BEGIN-retry wrapper that catches the race and retries with backoff.
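Both fixes can be sketched against a hypothetical async database handle (`Db`). The sketch is not tied to a specific SQLite binding. The error-message patterns, attempt count and backoff delays are illustrative assumptions. `SerialQueue` covers the in-service race. `beginWithRetry` covers the cross-service one.

```typescript
// Hypothetical async DB handle; any SQLite binding could sit behind it.
interface Db {
  exec(sql: string): Promise<void>;
}

// Fix 1: serialize one service's transactions through a promise chain.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain alive even if this task rejects.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fix 2: retry BEGIN with exponential backoff when another service got there first.
async function beginWithRetry(db: Db, attempts = 5, baseMs = 10): Promise<void> {
  for (let i = 0; ; i++) {
    try {
      await db.exec("BEGIN");
      return;
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      const isRace = /within a transaction|database is locked|SQLITE_BUSY/i.test(msg);
      if (!isRace || i >= attempts - 1) throw err;
      await sleep(baseMs * 2 ** i); // 10, 20, 40, ... ms
    }
  }
}

// Both layers together: queued within a service, retried across services.
function transaction<T>(db: Db, queue: SerialQueue, body: () => Promise<T>): Promise<T> {
  return queue.run(async () => {
    await beginWithRetry(db);
    try {
      const out = await body();
      await db.exec("COMMIT");
      return out;
    } catch (e) {
      await db.exec("ROLLBACK");
      throw e;
    }
  });
}
```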
7. Eight Featherless models were OAuth-gated. Live API testing revealed that 8 of the 10 originally-curated Featherless models returned 403 with model_gated_needs_oauth -- they require HuggingFace account-linking that Bodega can't do. meta-llama/* (every Llama model) and google/* (every Gemma model) were affected.
Fix: rewrote the curated list with verified-working IDs only (DeepSeek-V3-0324, Qwen 2.5-72B-Instruct, Qwen 2.5-Coder-32B, Kimi-K2, Hermes-3-Llama-3.1-70B, etc.) and added an OAUTH_REQUIRED_HF_ORGS filter at the boundary so meta-llama and google models never reach the dropdown at all.
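Here is a sketch of the boundary filter. It assumes model IDs follow the HuggingFace `org/name` form. Filtering on the org segment rather than on a substring matters here: "NousResearch/Hermes-3-Llama-3.1-70B" contains "Llama" but is not gated. The helper name is hypothetical. Running it before the 500-entry cap would also keep gated models from using up slots.

```typescript
// Orgs Featherless reported as model_gated_needs_oauth, per the post.
const OAUTH_REQUIRED_HF_ORGS = new Set(["meta-llama", "google"]);

function dropOAuthGated(modelIds: string[]): string[] {
  return modelIds.filter((id) => {
    // Compare only the org segment, case-insensitively (an assumption).
    const org = id.split("/")[0].toLowerCase();
    return !OAUTH_REQUIRED_HF_ORGS.has(org);
  });
}
```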
Then the silent-fail bug surfaced
This is the one that took the longest to crack. After onboarding completed, the user would press Enter on the prefilled "What can you help me with?" message, the composer would clear, and absolutely nothing else would happen. No error banner, no chat session in the sidebar, no response. The send was reaching the backend, the session was being created, but the user saw silence.
Twenty rounds of progressively-deeper retry mechanics shipped throughout the day. Each round helped a little. None of them actually fixed it.
The root cause turned out to be embarrassingly simple: ChatErrorBanner only existed in the active-chat layout, never in the empty-state ChatGreeting. When the first send failed (because Featherless's cold-start blocked the backend's event loop while parsing the 500-model list), the error fired correctly -- but it had nowhere to render. The user saw nothing because there was no UI surface to put the error on.
Fix: the banner now renders in both empty + active states. The error has somewhere to go. The user sees a clear "Request timed out -- the model may still be loading. Try again in a moment." with a Retry button instead of mysterious silence.
Sub-fixes that landed alongside:
- Code mode's ErrorBanner was rendering errors raw ("signal timed out") because it diverged from chat mode's ChatErrorBanner which used formatErrorMessage. Wired through.
- Express keepAliveTimeout bumped from Node's 5s default to 65s. The 5s default caused Chromium's keep-alive socket pool to try reusing FIN-acked sockets during the post-onboarding settle, silently hanging follow-up requests.
- Iteration-cap warning footer ("Reached the iteration limit. Response may be incomplete.") was appearing on pure-text conversational answers in code mode. Now only appended when the model actually used tools.
- Toast on first retry ("Connection slow -- reconnecting...") so the previously-silent 15s window gives visible feedback.
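A shared formatter is what lets both banners show the same friendly text. Below is a minimal sketch of what `formatErrorMessage` might do for the timeout case. Only the user-facing timeout string comes from the post. The matching rules and the generic fallback text are assumptions.

```typescript
const TIMEOUT_MSG = "Request timed out -- the model may still be loading. Try again in a moment.";

// Hypothetical sketch; Bodega's real formatter likely handles more cases.
function formatErrorMessage(err: unknown): string {
  const name = err instanceof Error ? err.name : "";
  const message = err instanceof Error ? err.message : String(err);
  // AbortSignal.timeout() rejects with an error named "TimeoutError"; the raw
  // text code mode was showing was "signal timed out".
  if (name === "TimeoutError" || /timed out/i.test(message)) return TIMEOUT_MSG;
  return message || "Something went wrong.";
}
```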
Then the cold-start UX problem
Even with the banner visible, Featherless's serverless cold-start was 30 seconds to 5 minutes on a busy night. Users would see "Request timed out", click Retry, see "Request timed out" again, click Retry again. The Retry was working but Featherless wasn't responding fast enough to feel like the app worked.
The proper architectural fix is to move LLM calls off the backend's main event loop into a worker thread (planned for beta.19, ~3-5 days). For tonight, two layers of mitigation:
Layer 1 -- Pre-warm on onboarding. The moment the user finishes cloud onboarding (5-10 seconds before they actually press Enter), fire a 1-token completion request to the chosen model. Featherless spins up its GPU during the welcome-screen seconds. By the time the user types and hits Enter, the model is warm.
New backend endpoint: POST /api/llm/warmup -- fire-and-forget, returns 202 immediately, runs the warmup in the background. New frontend hook useModelWarmup watches activeRoutedModel in the store and re-fires warmup whenever the user picks a different model from the dropdown or pins a new role-model in Settings. 60s same-model dedup so we don't spam Featherless.
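Here is a sketch of the 60-second same-model dedup. The clock is injectable so the window can be tested. The class name and API are hypothetical. In a fire-and-forget endpoint, a check like this would run after replying 202 and before sending the 1-token completion.

```typescript
// Hypothetical dedup helper for warmup requests.
class WarmupDeduper {
  private lastFired = new Map<string, number>();
  private windowMs: number;
  private now: () => number;

  constructor(windowMs = 60_000, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.now = now;
  }

  /** True if a warmup for `model` should be sent now (and records it). */
  shouldFire(model: string): boolean {
    const t = this.now();
    const prev = this.lastFired.get(model);
    if (prev !== undefined && t - prev < this.windowMs) return false;
    this.lastFired.set(model, t);
    return true;
  }
}
```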
Layer 2 -- Backend health cache + renderer health-poll pause. /llm/health now caches its response for 5 seconds with in-flight dedup. Coalesces the burst of health calls during onboarding (Providers tab + FIM + Embeddings + the main poll all fire on mount) plus the 30s steady-state poll. Cache key includes preset so reconfigure invalidates implicitly.
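Here is a sketch of a 5-second TTL cache with in-flight dedup, keyed by preset so that a reconfigure misses the cache naturally. `Health` and the `fetchHealth` callback are hypothetical stand-ins for the real probe. Failed probes are deliberately not cached.

```typescript
type Health = { connected: boolean; models: string[] };

class HealthCache {
  private entries = new Map<string, { value: Health; at: number }>();
  private inflight = new Map<string, Promise<Health>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs = 5_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(preset: string, fetchHealth: () => Promise<Health>): Promise<Health> {
    // Fresh cached answer for this preset: serve it.
    const hit = this.entries.get(preset);
    if (hit && this.now() - hit.at < this.ttlMs) return Promise.resolve(hit.value);

    // A probe is already running: coalesce onto it instead of starting another.
    const pending = this.inflight.get(preset);
    if (pending) return pending;

    const p = fetchHealth()
      .then((value) => {
        this.entries.set(preset, { value, at: this.now() });
        return value;
      })
      .finally(() => this.inflight.delete(preset));
    this.inflight.set(preset, p);
    return p;
  }
}
```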
useLLMHealthCheck skips the 30s poll while a chat or agent stream is active. Eliminates the "Cannot reach Featherless" yellow banner flickering mid-chat that previously fired every 30s when the backend's event loop was busy awaiting Featherless's response.
Round 26 -- The persistent warming banner. Even with Layer 1 + 2, the cold-start window was still invisible. Users saw nothing happen, didn't know if the app was broken or just waiting. The transient Cannot reach Featherless banner flickering on/off REINFORCED the broken perception.
A new persistent banner sits between TopBar and the mode layout: "• Warming up DeepSeek-V3-0324 -- first send may take 30-90 seconds while the model loads on the provider's GPU." It stays up from the moment the warmup fires until either /llm/health returns connected or a chat completion succeeds. The composer stays enabled. The user knows the truth.
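The banner's lifecycle fits a small reducer. The event names here are hypothetical and the transitions follow the description above. Returning the existing object when there is nothing to clear reuses the identity trick from the re-render fixes.

```typescript
type BannerState = { kind: "hidden" } | { kind: "warming"; model: string };
type BannerEvent =
  | { type: "warmupFired"; model: string }
  | { type: "healthConnected" }
  | { type: "completionSucceeded" };

function reduceWarmingBanner(state: BannerState, event: BannerEvent): BannerState {
  if (event.type === "warmupFired") {
    // Shown as soon as the warmup request goes out; the composer stays enabled.
    return { kind: "warming", model: event.model };
  }
  // Health connected or a completion succeeded: the model is answering.
  if (state.kind === "hidden") return state; // nothing to clear; keep identity
  return { kind: "hidden" };
}

function bannerText(state: BannerState): string | null {
  return state.kind === "warming"
    ? `Warming up ${state.model} -- first send may take 30-90 seconds while the model loads on the provider's GPU.`
    : null;
}
```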
Then the security audit
A Sentinel agent was dispatched in parallel with the live testing to audit the day's changes. It found three HIGH-severity findings, all in the SSRF guard added for BUG-DM-18:
- IPv6-mapped IPv4 (::ffff:127.0.0.1) wasn't being matched. Node's URL constructor returns [::ffff:7f00:1] for that host, which the original isPrivateHost regex didn't catch.
- IPv6 ULA (fc00::/7) and link-local (fe80::/10) ranges weren't checked at all.
- Trailing-dot FQDN form (localhost.) bypassed the exact-string match.
Fixed all three, and also added a length cap on the warmup endpoint, which had been logging unbounded user-controlled strings via pino.
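Here is a sketch of a host check that closes all three bypasses. It takes a `URL.hostname` value, which wraps IPv6 literals in brackets and serializes `::ffff:127.0.0.1` as `[::ffff:7f00:1]`. It is a hypothetical stand-in for isPrivateHost, not the shipped code. It also assumes the compressed IPv6 form that `URL` produces, and it skips DNS resolution, so a public name that resolves to a private IP needs a separate check.

```typescript
function parseIPv4(s: string): number[] | null {
  const parts = s.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

function ipv4IsPrivate([a, b]: number[]): boolean {
  return (
    a === 0 || a === 10 || a === 127 ||   // 0/8, 10/8, loopback 127/8
    (a === 169 && b === 254) ||           // link-local 169.254/16
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16/12
    (a === 192 && b === 168)              // 192.168/16
  );
}

function isPrivateHost(rawHost: string): boolean {
  // Strip IPv6 brackets and trailing dots, so "localhost." matches "localhost".
  const host = rawHost.toLowerCase().replace(/^\[/, "").replace(/\]$/, "").replace(/\.+$/, "");
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const v4 = parseIPv4(host);
  if (v4) return ipv4IsPrivate(v4);
  if (!host.includes(":")) return false; // ordinary DNS name
  if (host === "::" || host === "::1") return true;

  // IPv4-mapped IPv6, dotted (::ffff:127.0.0.1) or hex (::ffff:7f00:1) form.
  const mapped = host.match(/^::ffff:(.+)$/);
  if (mapped) {
    const dotted = parseIPv4(mapped[1]);
    if (dotted) return ipv4IsPrivate(dotted);
    const hex = mapped[1].match(/^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/);
    if (!hex) return true; // unrecognized mapped form: fail closed
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return ipv4IsPrivate([hi >> 8, hi & 0xff, lo >> 8, lo & 0xff]);
  }

  const first = host.startsWith("::") ? 0 : parseInt(host.split(":")[0], 16);
  if ((first & 0xfe00) === 0xfc00) return true; // ULA fc00::/7
  if ((first & 0xffc0) === 0xfe80) return true; // link-local fe80::/10
  return false;
}
```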
The same Sentinel pass also verified that the fixes for BUG-DM-16 (the prefix-match boundary in pickDefaultModelForPreset, which was matching Qwen2.5-7B against Qwen2.5-7B-Vision-Instruct instead of Qwen2.5-7B-Instruct-FP8) and BUG-DM-17 (a length cap on the /api/models/:name/info path param) close their respective holes.
30 new netGuards unit tests + 9 new cloud-key-validate integration tests cover the bypasses.
Then the Models tab UI polish
Live testing surfaced two cosmetic issues:
The "Search models..." input at the top of the Models tab was rendering on all three sub-tabs (Discover, My Models, Providers). Useful on Discover (you're browsing a catalog). Redundant on My Models (the eight inline role pickers ARE search inputs with shared autocomplete). Useless on Providers (at most a dozen presets). Now only renders on Discover.
When you picked a model from the Default role picker (the only <input list>-based picker in My Models), the input box turned white -- Chromium applies a :-webkit-autofill background highlight on inputs that get a value via native datalist autocomplete, and our dark theme hadn't overridden it. Fixed with the standard inset box-shadow CSS trick that overrides the autofill background.
What didn't make it
- Worker-thread refactor for LLM calls -- the proper architectural fix for the cold-start UX issue. Moves LLMService and providers into a worker_threads Worker so the Express main thread stays responsive during LLM calls. Eliminates the entire class of "backend looks dead during chat" bugs. ~3-5 days, planned for beta.19.
- File splits -- LLMService.ts (863L), useFirstRunMachine.ts (663L), ProviderCard.tsx (465L) are all over their respective limits. Beta.19.
- Warmup-debounce -- useModelWarmup currently fires twice during onboarding because of a transient state during the reconfigure cycle. Wastes one Featherless request per onboard. Trivial fix (~10 lines), beta.19.
- Optimistic user-message-shows-immediately in empty state -- currently the typed text disappears the moment Send is pressed, before the new chat session UI renders. Should show in the chat area immediately. Filed for beta.19.
Why it took so long
The root-cause fix for the silent-fail bug (banner-in-empty-state) was a 30-line change. It took ~10 hours to get there because the symptom looked like a network/race issue -- POST timing out, optimistic message reconciling wrong, abort signal firing, fetch never reaching the backend. Twenty rounds of retry mechanics, keep-alive tweaks, in-flight dedup, and abort handling each helped a little but didn't fix it. The actual cause was upstream of all that: there was no UI surface to render the error on.
Web research finally suggested the right angle: "what if the error IS firing and we just can't see it?" Once I traced render trees instead of network paths, the fix landed within 30 minutes.
The lesson, if there is one: when twenty rounds of patches each seem to almost work but never quite do, the failure probably isn't where you think it is. Stop patching and trace the actual symptom path from the bottom up.
That's beta.18.1. Beta.19 starts tomorrow with the worker-thread refactor.