Memory & Knowledge

Bodega One keeps two persistent context stores across sessions: Agent Memory (key-value facts the agent saves and recalls automatically) and the Knowledge Base (web pages and text the agent searches on demand). A third, separate system - Codebase Embeddings - indexes your source files for code-aware context and has its own configuration path under Settings → Models.

How Agent Memory works

The agent can save facts as named key-value pairs and recall them in future sessions. Memory is stored in SQLite and reloaded on startup, so it survives app restarts.

Three scopes:

  • shared - available across all sessions (the default)
  • long_term - tied to a specific agent or session
  • project - tied to a specific folder path

On every turn, the agent searches memory using a fallback chain:

  1. Embedding cosine similarity (if rag.embedding_enabled is on)
  2. Keyword extraction → FTS5 BM25 ranking
  3. Full cleaned message → FTS5 / LIKE substring search
  4. Self-referential fallback for phrases like "what do you remember about me" (searches all scopes for keys prefixed with user)
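
The fallback chain above can be pictured as an ordered list of search strategies, where the first stage that returns anything wins. The following is a minimal sketch, not Bodega One's implementation: every function name here is hypothetical, and it assumes that a stage "falls through" simply by returning no results.

```python
from typing import Callable, List

# Hypothetical type: a strategy takes the user's message and returns matching entries.
Strategy = Callable[[str], List[str]]

MAX_INJECTED = 10  # up to 10 entries are injected per turn


def search_with_fallback(message: str, strategies: List[Strategy],
                         limit: int = MAX_INJECTED) -> List[str]:
    """Try each strategy in order and return the first non-empty result."""
    for strategy in strategies:
        results = strategy(message)
        if results:
            return results[:limit]
    return []


# Stand-in strategies for illustration only (assumed behavior, not real APIs).
def embedding_search(message: str) -> List[str]:
    return []  # pretend rag.embedding_enabled is off


def keyword_search(message: str) -> List[str]:
    store = {"language": "user:preferred_language = Python"}
    return [entry for term, entry in store.items() if term in message.lower()]


def substring_search(message: str) -> List[str]:
    return []


chain = [embedding_search, keyword_search, substring_search]
hits = search_with_fallback("What language do I prefer?", chain)
```

Here the embedding stage returns nothing, so the keyword stage supplies the result and the substring stage never runs.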

Up to 10 relevant entries are injected into the agent's dynamic context message before each LLM call. The total memory budget is capped at 10% of the active model's context window.

File affinity: keys matching heuristic:file_affinity:* are separated out and injected as a FREQUENTLY ACCESSED FILES block, capped at 20% of the memory budget.
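
As a worked example of the two caps, the arithmetic below assumes the budget is measured in tokens and that fractions are rounded down; neither detail is stated above, and the function name is hypothetical.

```python
def memory_budgets(context_window: int) -> dict:
    """Compute the memory budget and the file-affinity cap for a context window."""
    memory_budget = context_window * 10 // 100      # 10% of the context window
    file_affinity_cap = memory_budget * 20 // 100   # 20% of the memory budget
    return {"memory_budget": memory_budget, "file_affinity_cap": file_affinity_cap}


budgets = memory_budgets(32000)
# budgets == {"memory_budget": 3200, "file_affinity_cap": 640}
```

For a 32,000-unit context window, memory can use at most 3,200 units, of which at most 640 go to the FREQUENTLY ACCESSED FILES block.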

Stale entries: each memory entry has a decayed confidence score. When it drops below 0.3, the injected text includes [stale - may be outdated]. This happens silently - there is no UI indicator, but the agent knows to treat that entry cautiously.
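
The stale flag amounts to a threshold check at injection time. This sketch assumes a `key: value` line format and that the marker is appended at the end of the line; only the 0.3 threshold and the marker text come from the description above.

```python
STALE_THRESHOLD = 0.3
STALE_MARKER = "[stale - may be outdated]"


def render_entry(key: str, value: str, confidence: float) -> str:
    """Format one memory entry for injection (line format is an assumption)."""
    line = f"{key}: {value}"
    if confidence < STALE_THRESHOLD:  # strictly below 0.3 counts as stale
        line += f" {STALE_MARKER}"
    return line


fresh = render_entry("preferred_editor", "vim", 0.9)
stale = render_entry("preferred_editor", "vim", 0.2)
```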

Using memory conversationally

Memory is fully automatic - you don't need to manage it. The agent saves facts it observes during conversation and recalls them on future turns.

To save something explicitly:

  • "Remember that my preferred language is Python."
  • "Save this for later: I'm targeting Node 22."

To recall what the agent knows:

  • "What do you remember about me?"
  • "Tell me everything you know about me."
  • "Do you know my preferred editor?"

To remove a memory:

  • "Forget that."
  • "Forget my preferred language."

Note: the agent cannot delete memories via a tool call. To delete entries, use Settings → Memory → Browse Entries and click the X next to the entry.

Rate limit: the agent is limited to 5 memory writes per session (3 during the agentic loop, 2 during post-processing). Overwriting an existing key does not count against this limit.
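
One way to picture the write limit is a per-session counter with a separate allowance for each phase. The class and phase names below are hypothetical; the sketch only mirrors the numbers given above (3 + 2 writes, overwrites free).

```python
class MemoryWriteLimiter:
    """Per-session write budget: 3 in the agentic loop, 2 in post-processing."""

    LIMITS = {"agentic_loop": 3, "post_processing": 2}  # hypothetical phase names

    def __init__(self, existing_keys=()):
        self.used = {phase: 0 for phase in self.LIMITS}
        self.known_keys = set(existing_keys)

    def try_write(self, phase: str, key: str) -> bool:
        """Return True if the write is allowed, False if the budget is spent."""
        if key in self.known_keys:
            return True  # overwriting an existing key is free
        if self.used[phase] >= self.LIMITS[phase]:
            return False
        self.used[phase] += 1
        self.known_keys.add(key)
        return True


limiter = MemoryWriteLimiter(existing_keys={"preferred_language"})
loop_results = [limiter.try_write("agentic_loop", k) for k in ("a", "b", "c", "d")]
overwrite_ok = limiter.try_write("agentic_loop", "preferred_language")
```

The fourth new key in the loop is rejected, while the overwrite of `preferred_language` still succeeds.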

Managing memory in Settings

Go to Settings → Memory.

  • Persistence - toggle Persist Memory to control whether long-term memories survive app restarts. When off, all memory is lost when the app closes.
  • Capacity - set the maximum number of short-term (in-session) and long-term (cross-session) entries the agent holds. When the database reaches capacity, the oldest 5% of entries (by last-update time) are evicted automatically.
  • Add Memory - type a key and value to store a memory directly, without the agent's involvement.
  • Browse Entries - search and delete individual entries. The list updates immediately when you delete an entry.

Click Save Memory Settings after changing the persistence toggle or capacity sliders.

Memory capacity and eviction

| Setting | Key | Default |
| --- | --- | --- |
| Master on/off | memory.persistence_enabled | on |
| Short-term capacity (in-session, RAM only) | memory.max_short_term | 1000 |
| Long-term capacity (cross-session, SQLite) | memory.max_long_term | 10000 |

Short-term memory is never written to disk. Long-term memory persists across restarts when memory.persistence_enabled is on.

When capacity is reached, the oldest 5% of entries by updated_at are evicted. Stale entries (decayed confidence below 0.3) are still recalled but flagged in the agent's context - they are not auto-deleted.
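
The eviction rule can be sketched against an in-memory SQLite database. The table layout here is an assumption made for illustration (only the `updated_at` column is named above), and the rounding of the 5% figure is also assumed.

```python
import sqlite3


def evict_oldest(conn: sqlite3.Connection, max_entries: int, percent: int = 5) -> int:
    """When the table is at capacity, delete the oldest `percent`% by updated_at."""
    count = conn.execute("SELECT COUNT(*) FROM memory_store").fetchone()[0]
    if count < max_entries:
        return 0
    to_evict = max(1, count * percent // 100)  # rounding down is an assumption
    conn.execute(
        "DELETE FROM memory_store WHERE key IN ("
        "  SELECT key FROM memory_store ORDER BY updated_at ASC LIMIT ?)",
        (to_evict,),
    )
    conn.commit()
    return to_evict


# Demo with a hypothetical schema and 100 entries at a capacity of 100.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory_store (key TEXT PRIMARY KEY, value TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO memory_store VALUES (?, ?, ?)",
    [(f"k{i}", "v", i) for i in range(100)],
)
evicted = evict_oldest(conn, max_entries=100)
remaining = conn.execute("SELECT COUNT(*) FROM memory_store").fetchone()[0]
```

With 100 entries at capacity, the five with the smallest `updated_at` values are removed and 95 remain.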

How the Knowledge Base works

The Knowledge Base stores reference content - web pages or free-form text - that the agent searches when answering questions. Unlike memory, it is not injected automatically. The agent searches it on demand via the query_knowledge tool when the conversation warrants it.

URL entries: the server fetches the page, strips HTML and script tags, and keeps up to 100,000 characters of text.

Text entries: stored as-is.

Chunking: long content is split at sentence boundaries into 1,500-character segments, each overlapping the previous one by 200 characters. Chunks under 50 characters are dropped. Each chunk gets its own row in the database (baseurl#chunk-1, baseurl#chunk-2, etc.). The Settings UI hides chunk rows - only the base card is shown.
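
A rough sketch of sentence-aware chunking with overlap follows. It is an approximation under stated assumptions, not the actual chunker: sentences are detected with a simple regex, the overlap is made of whole trailing sentences totalling at most 200 characters, and a single sentence longer than 1,500 characters is kept whole.

```python
import re

CHUNK_SIZE = 1500
OVERLAP = 200
MIN_CHUNK = 50


def chunk_text(text: str, base_id: str):
    """Split text into (row_id, chunk) pairs along sentence boundaries."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > CHUNK_SIZE:
            chunks.append(" ".join(current))
            # Carry trailing sentences (at most OVERLAP characters) into the next chunk.
            carried = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > OVERLAP:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    kept = [c for c in chunks if len(c) >= MIN_CHUNK]  # drop tiny chunks
    return [(f"{base_id}#chunk-{i}", c) for i, c in enumerate(kept, start=1)]


sample = " ".join(
    f"Sentence number {i} covers topic {i} in some detail." for i in range(100)
)
rows = chunk_text(sample, "https://example.com/doc")
```

Each returned pair carries the row identifier in the `baseurl#chunk-N` form described above.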

Search: the agent uses a hybrid of cosine similarity (if embeddings are enabled) and keyword term-overlap scoring, blended by rag.hybrid_search_weight (default 0.5, where 0 = keyword-only and 1 = embedding-only). FTS5 with Porter stemming is the text fallback. Results are capped at 20 per call.
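
The blend can be illustrated as a weighted sum of the two scores. The linear formula and the term-overlap definition below are assumptions chosen to match the stated endpoints (0 = keyword-only, 1 = embedding-only); the function names are hypothetical.

```python
def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query terms that also appear in the text (assumed scoring)."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


def hybrid_score(embedding_score: float, keyword_score: float,
                 weight: float = 0.5) -> float:
    """Blend the scores; `weight` plays the role of rag.hybrid_search_weight."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * embedding_score + (1.0 - weight) * keyword_score


def rank(candidates, weight: float = 0.5, limit: int = 20):
    """candidates: (entry_id, embedding_score, keyword_score); returns top ids."""
    ordered = sorted(
        candidates, key=lambda c: hybrid_score(c[1], c[2], weight), reverse=True
    )
    return [entry_id for entry_id, _, _ in ordered[:limit]]


candidates = [("semantic-match", 0.9, 0.1), ("keyword-match", 0.2, 0.8)]
embedding_first = rank(candidates, weight=1.0)
keyword_first = rank(candidates, weight=0.0)
```

Moving the weight from 1.0 to 0.0 flips which of the two candidates ranks first.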

Deletion: deleting a card removes the base entry and all its chunk siblings.

Adding content to the Knowledge Base

Three entry points:

1. Settings → Knowledge → Add Knowledge

  1. Choose URL or Text mode.
  2. Paste the URL or text content.
  3. Optionally add a title and comma-separated tags.
  4. Press Add or use Ctrl+Enter.

2. Chat mode - URL quick-add

  1. Click the + button in the chat input.
  2. Select Add knowledge from URL.
  3. Paste the URL and confirm. (Text cards are not available from this modal.)

3. Code mode - pin a Research response

In the Research panel, click the pin button ("Pin last response to knowledge base") to save the agent's last response as a text card.

To delete a card: go to Settings → Knowledge, find the entry, and click the X button.

To filter by tag: click any tag in the Settings → Knowledge list. Tags are comma-separated and clickable.

Important limits:

  • The Settings → Knowledge UI loads a maximum of 200 entries and filters client-side. Large knowledge bases may not be fully browsable without the search box.
  • URL fetching is blocked entirely when general.air_gap is enabled.
  • Private IP ranges (127.x, 10.x, 192.168.x, 172.16–31.x, 169.254.x) and non-HTTP protocols are blocked by SSRF protection regardless of air-gap setting.
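
The URL restrictions above can be approximated with Python's standard `ipaddress` and `urllib.parse` modules. This is a simplified sketch: it assumes HTTPS counts as an allowed HTTP protocol, and it only inspects IP literals. A real guard would also resolve hostnames and check every resolved address. The function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

# The private ranges listed above, written as CIDR blocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",
        "10.0.0.0/8",
        "192.168.0.0/16",
        "172.16.0.0/12",   # 172.16.x through 172.31.x
        "169.254.0.0/16",
    )
]


def is_url_allowed(url: str, air_gap: bool = False) -> bool:
    """Return False for URLs the fetcher should refuse."""
    if air_gap:
        return False  # general.air_gap blocks all URL fetching
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if not host:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch lets hostnames through; a real check
        # must resolve the name first and test the resulting addresses.
        return True
    return not any(address in network for network in BLOCKED_NETWORKS)
```

For example, `is_url_allowed("http://192.168.1.10/")` is rejected, while `is_url_allowed("https://example.com/docs")` passes unless air-gap mode is on.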

Prompting the agent to use the Knowledge Base

The agent decides when to call query_knowledge based on context - it does not search the Knowledge Base on every turn. If you want it to check what you've saved, say it explicitly:

  • "Search your knowledge base for [topic]."
  • "What do you know about [topic] from your knowledge base?"

There is no public search API for the Knowledge Base - it is agent-only. The search box in Settings → Knowledge is a client-side filter over the loaded list, not the same hybrid search the agent runs.

Semantic search for Memory and Knowledge (RAG embeddings)

When rag.embedding_enabled is on, both Agent Memory and the Knowledge Base use vector embeddings for semantic similarity search. This lets the agent find relevant entries even when the exact words don't match.

How it works:

  • Embeddings are generated by EmbeddingService, which routes to Ollama (POST /api/embed) or OpenAI (/embeddings) based on your llm.provider setting - not embeddings.provider.
  • The embedding model is set by rag.embedding_model (default: nomic-embed-text for Ollama, text-embedding-ada-002 for OpenAI).
  • Vectors are stored as Float32Array blobs in SQLite. Cosine similarity is computed in-memory at search time.
  • EmbeddingService caches up to 500 vectors in memory (FIFO eviction) to avoid redundant API calls.
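
To make the storage and caching points concrete, here is a small sketch of 32-bit float blobs, in-memory cosine similarity, and a FIFO cache. It uses only the Python standard library; the helper and class names are hypothetical and the cache keying is an assumption.

```python
import math
from array import array
from collections import OrderedDict


def to_blob(vector) -> bytes:
    """Pack a vector as 32-bit floats, similar in spirit to a Float32Array blob."""
    return array("f", vector).tobytes()


def from_blob(blob: bytes) -> array:
    values = array("f")
    values.frombytes(blob)
    return values


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FifoCache:
    """Evicts the oldest inserted entry first; reads do not refresh an entry."""

    def __init__(self, capacity: int = 500):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        return self.items.get(key)

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # drop the first-inserted entry
        self.items[key] = value


stored = to_blob([1.0, 0.0])
same_direction = cosine_similarity(from_blob(stored), [1.0, 0.0])
orthogonal = cosine_similarity(from_blob(stored), [0.0, 1.0])
```

Unlike an LRU cache, a FIFO cache evicts by insertion order, so a frequently read vector can still be dropped once enough newer ones arrive.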

This system is entirely separate from Codebase Embeddings. They do not share settings, providers, models, or storage. If rag.embedding_enabled is off, the system falls back to FTS5 keyword search and LIKE substring matching.

Enabling RAG embeddings

  1. Go to Settings → Memory → Knowledge & RAG.
  2. Toggle Enable embeddings for memory/knowledge search.
  3. Select an embedding model.
    • For Ollama: pull the model first - e.g., run ollama pull nomic-embed-text in a terminal.
    • For OpenAI-compatible providers: an API key is required.
  4. Once enabled, semantic search runs automatically for both memory recall and knowledge base queries.

Relevant settings:

| Setting | Key | Notes |
| --- | --- | --- |
| Embeddings on/off | rag.embedding_enabled | Master switch |
| Embedding model | rag.embedding_model | e.g. nomic-embed-text |
| Search blend weight | rag.hybrid_search_weight | 0 = keyword-only, 1 = embedding-only, default 0.5 |
| Backend routing | llm.provider | Determines Ollama vs OpenAI - NOT embeddings.provider |

Codebase Embeddings - source file index

Codebase Embeddings is a separate embedding system that indexes your project's source files so the agent can inject relevant code snippets into context when you send a message with question intent in Code mode.

Indexed file types: TypeScript, JavaScript, Python, Go, Rust, Java, C, and Markdown.

How it works:

  • The index is stored per-project and built incrementally - only modified files are re-indexed on subsequent builds.
  • EmbeddingProvider (backed by OllamaEmbeddingProvider or OpenAIEmbeddingProvider) handles API calls with a 60-second cold-start timeout on the first request and a 10-second timeout for subsequent requests.
  • Batch size is 16 files per request.
  • Model mismatch matters: nomic-embed-text produces 768-dimensional vectors, mxbai-embed-large produces 1024, all-minilm produces 384, text-embedding-3-small produces 1536. Vectors built with one model cannot be searched with another.
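
The incremental build and batching behavior might look like the sketch below. How Bodega One detects modified files is not stated here, so comparing modification times is an assumption, and all function names are hypothetical.

```python
BATCH_SIZE = 16
COLD_START_TIMEOUT_S = 60
WARM_TIMEOUT_S = 10


def files_to_reindex(current_mtimes: dict, indexed_mtimes: dict) -> list:
    """Return paths that are new or changed since the last build (mtime-based guess)."""
    return sorted(
        path for path, mtime in current_mtimes.items()
        if indexed_mtimes.get(path) != mtime
    )


def make_batches(paths: list, size: int = BATCH_SIZE) -> list:
    """Group paths into requests of at most `size` files each."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def request_timeout(request_number: int) -> int:
    """First request gets the cold-start timeout, later ones the shorter one."""
    return COLD_START_TIMEOUT_S if request_number == 0 else WARM_TIMEOUT_S


# Demo: 40 files on disk, 5 of them unchanged since the last build.
on_disk = {f"src/file_{i:02d}.py": 200 for i in range(40)}
indexed = {f"src/file_{i:02d}.py": 200 for i in range(5)}
pending = files_to_reindex(on_disk, indexed)
batch_sizes = [len(batch) for batch in make_batches(pending)]
```

The 35 changed files end up in three requests of 16, 16, and 3 files.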

This system has no relation to rag.embedding_enabled or rag.embedding_model - it uses the embeddings.* settings namespace, separate providers, and separate storage tables.

Configuring Codebase Embeddings

Go to Settings → Models and scroll to the Codebase Embeddings section at the bottom.

  1. Select a provider: Ollama, llama.cpp, OpenAI, or Off.
  2. Select or type an embedding model.
    • Ollama default: qwen3-embedding:4b (pulled automatically on first use).
    • Common alternatives: nomic-embed-text, mxbai-embed-large, all-minilm.
    • OpenAI: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
  3. Optionally enable Auto-index - the index builds in the background 60 seconds after a project opens.
  4. Click Build Index to index immediately.

Model mismatch warning: if the model shown in the "built with" badge differs from your current model setting, an amber banner appears with a Rebuild now button. Rebuild before using semantic context - mismatched vector dimensions produce garbage results.
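
The mismatch check behind the banner can be sketched as a comparison of model names, with the known vector dimensions used to explain why a rebuild is needed. The function name and message wording are hypothetical; the dimensions are those listed for each model earlier on this page.

```python
MODEL_DIMENSIONS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
    "text-embedding-3-small": 1536,
}


def check_index(built_with_model: str, current_model: str) -> str:
    """Return "ok" if the index matches the current model, else a rebuild reason."""
    if built_with_model == current_model:
        return "ok"
    old = MODEL_DIMENSIONS.get(built_with_model)
    new = MODEL_DIMENSIONS.get(current_model)
    if old is not None and new is not None and old != new:
        return (
            f"rebuild required: index has {old}-dimensional vectors, "
            f"{current_model} produces {new}"
        )
    return "rebuild required: index was built with a different model"


status = check_index("nomic-embed-text", "mxbai-embed-large")
```

Comparing model names rather than dimensions alone matters because two models with the same vector size still produce incompatible embeddings.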

Air-gap restriction: under general.air_gap, the Ollama URL must resolve to loopback. Cloud OpenAI codebase embeddings are blocked entirely.

| Setting | Key |
| --- | --- |
| Provider | embeddings.provider |
| Model | embeddings.model |
| Auto-index on project open | embeddings.auto_index |
| Ollama base URL | embeddings.ollama_url |
| OpenAI API key | embeddings.openai_api_key |

Two embedding systems - do not confuse them

Bodega One has two completely separate embedding systems:

| | Memory & Knowledge RAG | Codebase Embeddings |
| --- | --- | --- |
| Purpose | Semantic search for agent memory and knowledge base | Semantic search over your source files |
| Settings namespace | rag.* | embeddings.* |
| Backend routing | llm.provider (Ollama or OpenAI) | embeddings.provider |
| Storage | memory_store / knowledge_store SQLite tables | Separate index table per project |
| Toggle | rag.embedding_enabled | embeddings.provider = off |
| Config location | Settings → Memory → Knowledge & RAG | Settings → Models → Codebase Embeddings |

Changing one system's model or provider has no effect on the other. They do not share configuration, API calls, or stored vectors.

Keyboard shortcuts

| Keys | Action |
| --- | --- |
| Ctrl+Enter | Submit the Add Knowledge form (Settings → Knowledge) |

This page mirrors the in-app docs hub for app version 1.0.0-beta.26.1.