Bundle 5.42 MB → 1.55 MB, Featherless cold-start, IDE chat leak fix
Going to start with the perf wins because they're the biggest user-visible change of the cycle.
Main renderer bundle: 5.42 MB to 1.55 MB. A 71 percent reduction.
Two paired fixes, neither would have worked alone.
First fix was a tsconfig miss. The base tsconfig sets module to commonjs because the Electron main process needs CJS. The webpack-specific override was missing, so TypeScript was rewriting every import() call into a require() before webpack ever saw it, and webpack had nothing left to split on. Every React.lazy() in the codebase was a lie. The Monaco editor, the file tree, the terminal, the diff review, all the lazy-loaded panels were getting eagerly bundled into the main chunk. Setting "module": "esnext" under compilerOptions in tsconfig.webpack.json preserved the import() calls. Bundle dropped to 2.11 MB.
Second fix was Sentry. The crash reporter was being eagerly imported even when telemetry was disabled or air-gap mode was on. About 1.4 MB of Sentry code in the main bundle that almost never ran. Wrapped it in requestIdleCallback and a dynamic import so it loads after first paint, in its own async chunk. Bundle dropped to 1.55 MB.
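A minimal sketch of the deferred-load pattern, not the actual Bodega code. Every name here (scheduleCrashReporter, TelemetryFlags, the injected load and schedule functions) is hypothetical, and the loader and scheduler are passed in so the logic can run outside a browser.

```typescript
// Hypothetical shapes: the real crash reporter module and flag names may differ.
type CrashReporter = { init: () => void };
type Scheduler = (task: () => void) => void;

interface TelemetryFlags {
  telemetryEnabled: boolean;
  airGapMode: boolean;
}

// Returns true when a load was scheduled, false when it was skipped entirely.
function scheduleCrashReporter(
  flags: TelemetryFlags,
  load: () => Promise<CrashReporter>,
  schedule: Scheduler,
): boolean {
  if (!flags.telemetryEnabled || flags.airGapMode) {
    return false; // never fetch the chunk when it could not be used anyway
  }
  schedule(() => {
    load()
      .then((reporter) => reporter.init())
      .catch(() => {
        // Crash reporting is best-effort: a failed chunk load must not break the app.
      });
  });
  return true;
}
```

In the renderer, schedule would be something like (task) => requestIdleCallback(task) and load something like () => import("./crashReporter"), which is what lets webpack put the reporter in its own async chunk.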
Also added perf instrumentation to the backend (endpoint timings, breadcrumb tracing, baseline metrics) so the next round of optimizations has data to work from.
Featherless cold-start. Big multi-phase feature this cycle.
Featherless is serverless inference. Their models live in a hibernated state when nobody's using them, and cold-starting a 70B model can take 30 to 60 seconds. Before this cycle, you'd send a message into a cold model, hit the timeout, see a red error, and have no idea what just happened.
Phase 1: stretched the warmup timeout to 30 minutes and added an elapsed-time counter to the warming UI. At least you know it's working.
Phase 2: full coordinator. State machine that tracks queued, requesting, loading, ready, and verified states per model. Dedicated SSE channel at /api/featherless/warmup/progress streams stage transitions and 10-second loading-progress ticks. Persistent WarmingBanner at the top of the chat that shows current stage and dismisses cleanly. SQLite persistence so the state survives restart. Send button is disabled while warming so you can't accidentally fire into a cold model.
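The per-model stage tracking can be sketched roughly like this. The stage names come from the list above; the strictly linear ordering, the WarmupTracker class, and the example model name are assumptions, and the real coordinator presumably also handles failures, SSE, and SQLite persistence, which this leaves out.

```typescript
type WarmupStage = "queued" | "requesting" | "loading" | "ready" | "verified";

// Assumed ordering: each stage has at most one legal successor.
const NEXT: Record<WarmupStage, WarmupStage | null> = {
  queued: "requesting",
  requesting: "loading",
  loading: "ready",
  ready: "verified",
  verified: null,
};

// Hypothetical in-memory tracker keyed by model name.
class WarmupTracker {
  private stages = new Map<string, WarmupStage>();

  enqueue(model: string): void {
    this.stages.set(model, "queued");
  }

  stageOf(model: string): WarmupStage | undefined {
    return this.stages.get(model);
  }

  // Moves a model forward one step; rejects skipped, backward, or unknown-model transitions.
  advance(model: string, to: WarmupStage): boolean {
    const current = this.stages.get(model);
    if (current === undefined || NEXT[current] !== to) {
      return false;
    }
    this.stages.set(model, to);
    return true;
  }
}
```

Keeping the legal transitions in one table means a stray event (say, a late "loading" tick after "ready") gets rejected instead of silently rewinding the banner.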
Layer 1 warmup. When you change the active model (Settings, Model Roles picker), Bodega fires a 1-token request in the background. By the time you actually send your first real message, the model is hot.
Round 26 verified-warm. The banner doesn't just go away when the warmup probe succeeds. It stays until the first real chat completion goes through cleanly, because Featherless can lie about ready state (the probe at 1-token context can pass while a realistic 4K-context request still cold-starts).
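A rough sketch of the verified-warm rule, under two assumptions that are not spelled out above: that the send button re-enables at the ready stage (otherwise the verifying message could never be sent), and that only a successful real completion promotes ready to verified. The function names are hypothetical.

```typescript
type Stage = "queued" | "requesting" | "loading" | "ready" | "verified";

interface WarmupUi {
  bannerVisible: boolean;
  sendEnabled: boolean;
}

// The banner outlives the probe: it only clears once the model is verified.
function warmupUi(stage: Stage): WarmupUi {
  return {
    bannerVisible: stage !== "verified",
    sendEnabled: stage === "ready" || stage === "verified",
  };
}

// A passing probe gets the model to "ready"; only a clean real completion verifies it.
function onChatCompletion(stage: Stage, succeeded: boolean): Stage {
  return stage === "ready" && succeeded ? "verified" : stage;
}
```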
Warmup persistence queries now scoped by user_id per Sentinel audit.
The IDE-chat-leaking-into-chat-mode bug. The one that's been around for weeks.
Sending a message in code mode would also write it to the chat session, and both panels would render the same conversation. Identical content in both modes. Joe screenshotted it across multiple test instances. Defied static analysis for hours.
Both panels share the same useChat hook under the hood. One line in useChatSend used nullish coalescing as a fallback: const currentSessionId = sessionId ?? activeSessionId.
For chat mode's useChat instance: sessionId IS activeSessionId. The ?? never fires. Fine.
For the IDE Agent panel's useChat with no code session yet: sessionId is null, so the ?? returns activeSessionId. That's the chat session's id. So the code-mode send wrote to the chat session in the database. The backend WebSocket broadcast then rendered it in chat. Meanwhile the optimistic addMessage rendered it in code. Same conversation in both panels.
Took console.warn instrumentation on the slice setters with stack traces to pin it down. Once we saw setLocalMessages firing with sessionType=code and chat content, the call chain pointed straight at the ?? fallback.
Fix is gating the fallback to sessionType chat only. Also added a defense-in-depth check on the WebSocket handler: it now renders a broadcast in chat only if sid equals activeSessionId, is present in state.sessions, and is not in state.codeSessions. Even if some future code path sets activeSessionId to a code session id, the type check structurally blocks the leak.
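Both halves of the fix, sketched under assumptions: resolveSessionId and shouldRenderInChat are hypothetical names, and state.sessions and state.codeSessions are modeled as sets of ids, which may not match the real store shape.

```typescript
type SessionType = "chat" | "code";

// Only chat mode may fall back to the active chat session.
function resolveSessionId(
  sessionType: SessionType,
  sessionId: string | null,
  activeSessionId: string | null,
): string | null {
  return sessionType === "chat" ? (sessionId ?? activeSessionId) : sessionId;
}

// Assumed store shape: id sets for each session kind.
interface SessionState {
  activeSessionId: string | null;
  sessions: ReadonlySet<string>;
  codeSessions: ReadonlySet<string>;
}

// Defense-in-depth guard for the chat panel's WebSocket handler:
// all three conditions must hold before a broadcast renders in chat.
function shouldRenderInChat(state: SessionState, sid: string): boolean {
  return (
    sid === state.activeSessionId &&
    state.sessions.has(sid) &&
    !state.codeSessions.has(sid)
  );
}
```

With the old unconditional fallback, a code-mode send with no code session resolved to the chat session's id. Here it resolves to null, so the caller has to create a code session instead of borrowing the chat one.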
Provider switching cleanup. 5 backend services were doing legacy single-OpenAI lookups.
The /llm/running-models poll was reading llm.openai_base_url for every cloud preset. If you used OpenAI then switched to Featherless or DeepSeek, it kept hammering api.openai.com every 3 seconds with no key or the wrong key. Now routes through the per-preset lookup helpers like the chat path already does. Same fix in four other backend paths: embedding, STT, test-connection, and the deferred chat-stream reconfigure.
Stale role models on preset switch. We were only clearing 4 of the 11 role keys, so research, debug, and advisor panels could carry stale model names across a switch. Now clears all 11. Plus pre-clears the available-model list and refetches health right after the flip, so role pickers repopulate within 200ms instead of waiting for the next 30s health tick.
Settings panel was zeroing out role model defaults on save. An earlier optimization had cut the settings prop down to a 3-key subset to reduce re-renders, but the hydration code was reading the subset as if it were the full settings. populateFrom now reads the full snapshot inside the effect.
Featherless WarmingBanner was flashing for every cloud preset switch (qwen, deepseek, etc.). Now only fires for Featherless, which is the only preset with actual cold-start cost.
STOP READING FILES nudge. The agent was misclassifying "what are the contents of source-of-truth folder?" as a simple knowledge question because "what are" and "folder" weren't in the exploration intent classifier. DeepSeek looped on the contradiction for 440 seconds before producing anything. Enriched both VERBS and TARGETS lists. Regression tests locked in the prompts that hit this.
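A toy version of a verb-plus-target intent check, to show why the prompt slipped through. Only "what are" and "folder" come from the notes above; every other list entry is a placeholder, isExplorationIntent is a hypothetical name, and plain substring matching is a simplification of whatever the real classifier does.

```typescript
// Placeholder entries, except "what are" and "folder" which this cycle added.
const VERBS: readonly string[] = ["list", "show", "read", "open", "what are"];
const TARGETS: readonly string[] = ["file", "folder", "directory", "contents"];

// A prompt counts as exploration only if it names both an action and a filesystem target.
function isExplorationIntent(prompt: string): boolean {
  const text = prompt.toLowerCase();
  const hasVerb = VERBS.some((verb) => text.includes(verb));
  const hasTarget = TARGETS.some((target) => text.includes(target));
  return hasVerb && hasTarget;
}
```

Requiring both lists to match keeps plain knowledge questions out of the exploration path, which is also why a gap in either list is enough to misclassify a prompt.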
DeepSeek raw function_calls XML. The Anthropic-style plural "function_calls" form was leaking as visible text in chat mode because the stripper only knew about the singular "function_call". Fixed, along with the bare-word and empty-block variants.
Qwen "/think" directive at the start of every response. Qwen via DashScope echoes its thinking-mode prefix at the start of every code-mode message. The stream stripper now silently eats it. Mid-stream "/think" is still treated as content so prose like "the /think directive" survives.
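One way to strip a leading directive from a chunked stream, as a sketch rather than the actual stripper. LeadingDirectiveStripper is a hypothetical name. It buffers only until it can tell whether the stream starts with "/think", and the whitespace check after the prefix is an added assumption so that a word like "/thinking" is left alone.

```typescript
const PREFIX = "/think";

class LeadingDirectiveStripper {
  private pending = "";
  private decided = false;

  // Feed one streamed chunk; returns the text that is safe to display.
  push(chunk: string): string {
    if (this.decided) {
      return chunk; // past the start of the stream: everything is content
    }
    this.pending += chunk;
    const probe = this.pending.trimStart();
    if (probe.length <= PREFIX.length && PREFIX.startsWith(probe)) {
      return ""; // could still turn out to be the directive; keep buffering
    }
    this.decided = true;
    const isDirective =
      probe.startsWith(PREFIX) && /\s/.test(probe.charAt(PREFIX.length));
    const out = isDirective
      ? probe.slice(PREFIX.length).trimStart()
      : this.pending;
    this.pending = "";
    return out;
  }

  // Call at end of stream to release anything still buffered.
  flush(): string {
    const out =
      !this.decided && this.pending.trim() === PREFIX ? "" : this.pending;
    this.pending = "";
    this.decided = true;
    return out;
  }
}
```

Because the decision is made once, a "/think" that shows up later in the response passes through untouched, even when the directive itself arrives split across chunks.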
Featherless DeepSeek-V3 emits python tool-call fences. The model writes its tool calls as python code blocks instead of using OpenAI native tool_calls. The stripper now removes them so you don't see broken python. It doesn't make the tool actually execute, though. That's a deeper fix for tomorrow.
Smaller stuff:
- Chat input was treating the "What can you help me with?" prefill as real input, so clicking in the middle and typing concatenated your text onto it. Now it auto-selects the prefill so your first keystroke replaces it.
- Retry button on error banners was passing the React SyntheticEvent from the click to the retry handler, which read it as a model name. Now wrapped so the event never reaches the handler.
- API key field looked unfilled when a key was actually saved. Now shows "saved, paste to replace" and a green check hint.
Code quality and refactors. We have a hard line limit on file sizes (700L services, 400L components, 300L route handlers). 6 files crossed limits this cycle and got split:
- OpenAIProvider: 843L to 670L. Extracted model-cap and message-converter.
- LLMService: 736L to 699L. Extracted preset-lookup helpers.
- routes/llm.ts: 452L to 126L. Extracted health, warmup, and test-connection into sibling files.
- ProviderCard: split out useProviderBaseUrl and useProviderApiKey hooks.
- MyModelsTab: 445L to 395L. Extracted ModelRow and MultiModelVramWarning.
- GuidedTourOverlay: 405L to 326L. Extracted tourTooltipPosition.
configPath.ts also got a shared candidate builder extracted, to handle 4-level-deep service paths after install services moved into subdirs.
Security audits this cycle:
- 4xx body.error XSS audit. Added a convention test that no error message from any 4xx body field is rendered as innerHTML.
- MCP OAuth 2.1 audit. Documented current state, gap analysis, and effort estimate for full compliance.
- Sentinel LOW-2 cleanup. Replaced remaining String(err) patterns with the proper err instanceof Error check, matching the convention used everywhere else.
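For reference, the err instanceof Error convention looks roughly like this. errorMessage is a hypothetical helper name, and falling back to String() for non-Error throwables is an assumption about how the codebase treats them.

```typescript
// Narrow the unknown catch value before reading .message.
function errorMessage(err: unknown): string {
  if (err instanceof Error) {
    return err.message;
  }
  // Assumed fallback for thrown strings, numbers, and other non-Error values.
  return String(err);
}
```

The difference shows up on real Error objects: String(new Error("boom")) yields "Error: boom", while the narrowed path yields just "boom".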
Three quality gaps deferred to tomorrow, with notes:
- Qwen via DashScope doesn't invoke tools at all. The model claims folders don't exist instead of reading them. Backend log shows tools are declared and sent in the request, but DashScope's OpenAI-compat endpoint might be silently dropping them or wanting a different shape. Needs a captured network payload.
- Featherless DeepSeek-V3 tool execution. Today's fix stops the broken python output from leaking to the user, but the tool itself still doesn't run. Either force the prompt template to push XML format, or add a parser for python-style calls.
- file_system.read pagination. Tool result is capped at 16KB to prevent context blowout, so an 85KB file like our CHANGELOG can't be fully read. Adding offset and length params to the tool schema.
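A sketch of what offset and length could look like for the pagination item. readSlice, ReadResult, and nextOffset are hypothetical names, and the cap is counted in string characters here rather than bytes to keep the example short.

```typescript
const MAX_CHUNK_CHARS = 16 * 1024; // stands in for the 16KB result cap

interface ReadResult {
  content: string;
  nextOffset: number | null; // null once the end of the file is reached
  totalLength: number;
}

// Returns one page of the file, never more than the cap, plus where to resume.
function readSlice(
  text: string,
  offset: number = 0,
  length: number = MAX_CHUNK_CHARS,
): ReadResult {
  const start = Math.min(Math.max(0, offset), text.length);
  const size = Math.max(1, Math.min(length, MAX_CHUNK_CHARS)); // always make progress
  const end = Math.min(start + size, text.length);
  return {
    content: text.slice(start, end),
    nextOffset: end < text.length ? end : null,
    totalLength: text.length,
  };
}
```

Returning nextOffset alongside the content lets the agent keep calling until it gets null, so an 85KB file comes back in six capped pages instead of one truncated one.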
Pagination first tomorrow since it's the smallest scope. Good night.