
Build Log

The messy, honest version of development.

Not polished release notes. Real decisions, wrong turns, and why things took longer than expected. Updated as we build.

Build log entries

  1. Loop write guard, approval card fix, E2E Round 2

    Two things were driving me crazy about the agentic loop. One: the agent would write a file, re-verify, decide it wasn't done, and write the file again. And again. Same content, same path, different iteration. Two: approval cards would appear mid-stream and you'd never see them because they rendered outside the scroll container.

    Both fixed. The repeat-write guard now tracks writes per file path across the loop -- after 3 writes to the same file in a single session, it injects a system message, marks the deliverable satisfied, and breaks the cycle. Approval cards moved inside the scroll container so they actually travel with the content. E2E Round 2 ran 31 tests. Found 11 bugs across todo_write registration, model routing, panel scroll, web search iteration caps, and VRAM warning noise. All 11 fixed and committed.
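The repeat-write guard boils down to a per-path counter. A minimal sketch of the idea, with hypothetical names (`WriteGuard`, `MAX_WRITES_PER_FILE` are illustrative, not the actual API):

```typescript
// Hypothetical sketch of the repeat-write guard: count writes per file path
// across the loop, and trip after 3 writes to the same file in a session.
const MAX_WRITES_PER_FILE = 3;

class WriteGuard {
  private writeCounts = new Map<string, number>();

  // Record a write; returns true when the guard trips and the loop
  // should inject a system message and mark the deliverable satisfied.
  recordWrite(filePath: string): boolean {
    const count = (this.writeCounts.get(filePath) ?? 0) + 1;
    this.writeCounts.set(filePath, count);
    return count >= MAX_WRITES_PER_FILE;
  }
}

const guard = new WriteGuard();
guard.recordWrite("src/app.ts"); // 1st write: loop continues
guard.recordWrite("src/app.ts"); // 2nd write: loop continues
if (guard.recordWrite("src/app.ts")) {
  // 3rd write to the same path: break the cycle
}
```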

  2. Chat → Runtime → Loop → QEL

    Shipped the Runtime Layer today. This one is more architectural than visible, but it matters.

    The problem: before each agentic loop, ~150 lines of conditionals were scattered across the chat orchestrator. Is this session in a panel? What iteration budget applies? Does this model support tools? What happens after 3 consecutive failures? These questions were answered in different places with inconsistent logic.

    RuntimeLayer.ts consolidates all of that into a single typed LoopPolicy that gets produced before the loop starts. The classify() call looks at the request, the model's capability profile, the panel context, and the session failure history -- then produces a LoopPolicy with a single executionLane value.

    Four execution lanes:

    • advisory -- bypasses the loop entirely, single LLM call, no tools. Fast. For panels that just need a quick answer.
    • guided -- up to 8 iterations, limited tool set. For supervised agent work.
    • restricted -- panel-constrained tool allowlist. Research panel only gets research tools.
    • full -- complete tool access, computed iteration budget. Normal code mode.

    The capability detection piece is new: CapabilityProfile reads the model's known abilities (tool calling tier: native/xml/weak/none; structured output; reasoning) and can downgrade the lane automatically if the model can't support what was requested. No more sending tool calls to a model that'll ignore them.

    Dynamic failure tracking: if a session sees 3 consecutive tool failures, the lane downgrades automatically for the rest of the session. The model gets fewer chances to break things.
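Putting the lanes, the capability downgrade, and the failure downgrade together, the classification looks roughly like this. A hedged sketch only: the type and function names are assumptions rather than the actual RuntimeLayer.ts API, and the restricted/full iteration budgets are example values:

```typescript
// Illustrative lane classification: one typed LoopPolicy produced before
// the loop starts, from request shape, capability profile, panel context,
// and session failure history.
type ExecutionLane = "advisory" | "guided" | "restricted" | "full";
type ToolTier = "native" | "xml" | "weak" | "none";

interface LoopPolicy {
  executionLane: ExecutionLane;
  maxIterations: number;
  toolAllowlist: string[] | null; // null = complete tool access
}

interface ClassifyInput {
  needsTools: boolean;             // does the request call for agentic work?
  modelToolTier: ToolTier;         // from the model's capability profile
  panelTools: string[] | null;     // panel-constrained allowlist, if any
  consecutiveToolFailures: number; // session failure history
}

function classify(input: ClassifyInput): LoopPolicy {
  // advisory: bypass the loop entirely -- single LLM call, no tools.
  if (!input.needsTools || input.modelToolTier === "none") {
    return { executionLane: "advisory", maxIterations: 1, toolAllowlist: [] };
  }
  // Downgrade: weak tool calling, or 3 consecutive tool failures this session.
  if (input.modelToolTier === "weak" || input.consecutiveToolFailures >= 3) {
    return { executionLane: "guided", maxIterations: 8, toolAllowlist: null };
  }
  // restricted: a panel only gets its own tools.
  if (input.panelTools) {
    return { executionLane: "restricted", maxIterations: 8, toolAllowlist: input.panelTools };
  }
  // full: complete tool access; budget shown as a constant for brevity.
  return { executionLane: "full", maxIterations: 16, toolAllowlist: null };
}
```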

  3. Mar 24-26 -- Phase 9A through 9E

    Shipped the full memory system this week. Five phases in three days. This is the one I'm most proud of so far.

    The problem: every agentic loop iteration starts from scratch. The model has no memory of which files you've been editing together, what patterns you prefer, what errors you've hit before. Every session is day one.

    Phase 9 changes that. Here's what we built:

    • 9A -- HeuristicExtractor wired into the post-loop processor. After every agentic iteration, it extracts facts from what the agent observed and stores them in SQLite. Compression ratio confirmed at 5x+ on real sessions.
    • 9B -- FileAffinityTracker (tracks which files you co-edit, how often, how recently) + ImportGraphExtractor (static import graph for JS/TS/Python/Rust/Go). The context assembler uses both to inject the right files into the next session without you having to specify them.
    • 9C -- LLMObserver -- a second-pass LLM call that extracts implicit facts from assistant turns. Things the heuristic extractor misses. Runs async post-loop on hardware that can afford it, falls back to heuristic-only on low VRAM.
    • 9D -- Memory time decay. Observations have configurable half-lives by type. Stale memories fade instead of polluting context forever. BM25 relevance scoring added alongside recency decay.
    • 9E -- Evaluation harness. 25 scenarios covering injection, retrieval, dedup, decay, and cross-session recall. Memory metrics API exposed for debugging.

    Total: 8 new service files, 2 new tools (CreateDocument, DeepResearch), memory pipeline fully wired end to end. This is what makes Bodega feel like it knows you over time.
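The 9D decay scheme is the easiest piece to show in miniature: each observation type gets a half-life, and a recency factor multiplies the relevance score so stale memories fade instead of polluting context. The names and example half-lives below are hypothetical, not Bodega's actual values:

```typescript
// Hedged sketch of memory time decay: exponential decay with a
// configurable half-life per observation type.
const HALF_LIFE_HOURS: Record<string, number> = {
  file_edit: 72,   // hypothetical: file activity fades over ~3 days
  preference: 720, // hypothetical: preferences persist for ~30 days
};

const DEFAULT_HALF_LIFE = 168; // hypothetical fallback: one week

// Combine a relevance score (e.g. from BM25) with recency decay.
function decayedScore(relevance: number, type: string, ageHours: number): number {
  const halfLife = HALF_LIFE_HOURS[type] ?? DEFAULT_HALF_LIFE;
  return relevance * Math.pow(0.5, ageHours / halfLife);
}

// A 72-hour-old file_edit observation keeps exactly half its score.
decayedScore(1.0, "file_edit", 72); // 0.5
```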

  4. Mar 23 -- 30 bugs, one session

    Ran what we're calling Operation Fumigate last Sunday. The goal: clear every known bug before the next beta tag. Final count: 30 bugs fixed in one session.

    It was deliberately parallel. Stood up 4 squads, each with a defined scope and a dedicated branch. No overlap, no conflicts.

    • Squad 1 hit the code editor and FIM (fill-in-middle): 9 bugs. Monaco diff decoration bugs, inline fix streaming edge cases, FIM fence stripping failures.
    • Squad 2 took terminal and the Problems panel: 7 bugs. Terminal duplicate input handlers, xterm focus tracking using the wrong event, OSC 133 command block edge cases.
    • Squad 3 handled streaming and session infrastructure: 8 bugs. Double SSE events, streaming interrupted on panel navigation, session data leaks, permission mode enforcement in chat mode.
    • Squad 4 closed out settings, memory, and project management: 6 bugs. Settings not persisting across restarts, memory rate limit bypasses, orphaned settings keys.

    All 4 squads merged to dev by end of day. Doc sweep ran afterward -- all counts, changelogs, and references updated to match. Tagged beta.6 that same evening.

    The thing that made this work: clear blast radius per squad, no shared files, every fix against a real test case. 30 bugs with no regressions.

  5. Mar 17-18 -- Brain MCP + 15-agent team

    This is the part that doesn't look like normal solo indie dev.

    I've been building with an AI agent team. Not AI-assisted -- an actual team of 15 specialized agents coordinated through a shared memory system called Bodega Brain. Each agent has a defined role, its own identity file, and stays in its lane.

    The roster: Co-Dev (lead), Architect (structural health), Engineer (implementation), Fixer (bugs), Sentinel (security scanning), Scout (competitive intel), Strategist (product direction), QA Engineer, Doc Guardian, Performance Profiler, Integration Tester, Release Manager, Reviewer, UX Auditor, Writer.

    Each one runs on its own git branch. Co-Dev reviews their work, creates PRs, merges after CI passes. I have final say on anything touching main. It's a proper dev workflow, just with agents instead of contractors.

    The Brain is how they coordinate -- a shared system with messaging, task queues, workspace claiming, decision logging, and a live dashboard. When two agents might conflict on the same files, they claim workspaces and check for conflicts before starting.
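The workspace-claiming handshake can be sketched as a conflict check over claimed paths. The Brain's real API isn't shown here, so `claimWorkspace` and its in-memory store are assumptions:

```typescript
// Illustrative workspace claiming: an agent claims a set of files, and the
// claim is rejected if any path overlaps another agent's active claim.
interface Claim {
  agent: string;
  paths: string[];
}

const activeClaims: Claim[] = [];

function claimWorkspace(agent: string, paths: string[]): boolean {
  const conflict = activeClaims.some(
    (claim) =>
      claim.agent !== agent &&
      claim.paths.some((p) => paths.includes(p))
  );
  if (conflict) return false; // another agent holds one of these files
  activeClaims.push({ agent, paths });
  return true;
}
```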

    This session: 8 PRs reviewed, 5 merged to dev (LSP integration, unified model hub, god-file splits, security hardening, test coverage). The acceleration this enables is real. Phase 0-3 of the V2 overhaul shipped in 48 hours.

  6. QEL ships

    Spent the last few days hardening what I'm calling the QEL -- Quality Enforcement Layer. This was the biggest early architecture decision and it's worth explaining why it exists.

    Most AI coding assistants work like this: you ask a question, the model responds, done. There's no verification that what was produced actually matches what was asked. No check that the code compiles. No detection of stubs. The model hallucinates a solution and calls it a day.

    QEL changes that. Every agentic loop iteration runs three passes: contract extraction (what did the user actually ask for?), completion verification (did the response satisfy it?), and a mode firewall that prevents the wrong class of task from sneaking through. There's a test suite with a letter-grade output system -- the agent has to get an A or B before the response goes out.
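The letter-grade gate at the end of those passes reduces to a simple threshold check. A sketch under stated assumptions: the function names and the grading cutoffs below are illustrative, not QEL's actual scale:

```typescript
// Hedged sketch of the QEL grade gate: convert a pass rate over
// verification checks into a letter grade, and only ship A or B.
type Grade = "A" | "B" | "C" | "D" | "F";

function toGrade(passRate: number): Grade {
  if (passRate >= 0.9) return "A"; // cutoffs are hypothetical
  if (passRate >= 0.8) return "B";
  if (passRate >= 0.7) return "C";
  if (passRate >= 0.6) return "D";
  return "F";
}

// The response only goes out when the agent earns an A or B.
function gateResponse(checksPassed: number, checksTotal: number): boolean {
  const grade = toGrade(checksPassed / checksTotal);
  return grade === "A" || grade === "B";
}
```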

    The architecture underneath is Express + SQLite for the backend, with a streaming pipeline that pushes Server-Sent Events to the frontend in real time. 15 defined SSE event types covering everything from tool calls to plan approvals to QEL verification results.
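On the wire, each of those SSE events is a named `event:` line plus a `data:` line carrying the JSON payload. The event names below are examples (the post says there are 15 defined types), and `toFrame` is an illustrative helper, not the actual pipeline code:

```typescript
// Minimal sketch of serializing a typed event into the
// text/event-stream wire format used by Server-Sent Events.
type SSEEvent =
  | { type: "tool_call"; name: string }
  | { type: "plan_approval"; planId: string }
  | { type: "qel_result"; grade: string };

// One SSE frame: a named event line, a data line, and a blank line.
function toFrame(event: SSEEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

toFrame({ type: "qel_result", grade: "A" });
// 'event: qel_result\ndata: {"type":"qel_result","grade":"A"}\n\n'
```

On the frontend, an `EventSource` listener subscribed to `qel_result` would receive exactly that `data:` payload.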

    The other decision I made early: no god files. I've worked on enough codebases that became unmaintainable from one class doing everything. Bodega has hard line limits: 700 lines for service files, 400 for React components. When something hits the limit, it splits. This decision has already paid off four times.

    Current state: QEL shipping, 630 tests passing, agentic loop running on Ollama and OpenAI-compatible providers.

  7. Initial commit day

    Started building Bodega One. Here's what it is and why I'm building it.

    It's a local-first AI desktop IDE. Two modes: Chat Mode for general AI conversation, Code Mode for agentic software development. Runs entirely on your machine. No cloud dependency unless you want one.

    I got tired of tools that route everything through someone else's servers. Not because I have something to hide -- because I don't want to depend on a company's uptime, rate limits, or pricing decisions to do my work. Your code, your hardware, your data.

    The tech stack: Electron 40 for the desktop shell, React 19 + TypeScript on the frontend, Express + SQLite on the backend. It supports Ollama out of the box, with OpenAI-compatible endpoints as a fallback for when you need a heavier model.

    The thing I kept noticing with other AI coding tools is that they're mostly fancy autocomplete with a chat window bolted on. What I wanted was something that could actually reason about what it's doing -- extract requirements from what you ask, verify its own output, and refuse to ship half-finished work. That's the Quality Enforcement Layer. More on that later.

    First commit dropped today. It's rough but it runs. The bones are there.

    Building this in public. Wins, bugs, architecture decisions -- all of it.

For polished release notes, see the Changelog · Join Discord

Follow the build.

Join the waitlist and get notified when the beta opens. First 200 users. Complete 14 days of the beta and we'll send you a $30 promo code before the full launch.