ADR-0012: Local-First Offline RAG Pipeline
Status: Accepted
Date: 2026-02-23
Decision Makers: Brandon Fox
Context
The vindicta-platform requires a Retrieval-Augmented Generation (RAG) subsystem (the vindicta-oracle) capable of storing and searching the complex, hierarchical rulesets of Warhammer 40k.
A traditional GenAI application would typically deploy an external cloud-hosted Vector Database (Pinecone, Weaviate) and rely on commercial embedding models (OpenAI text-embedding-3-large). However, the Vindicta Platform architecture enforces a strict Local-First MVP Constraint for all underlying primitives (as defined in the Zero-Order Axioms surrounding deterministic play and cost efficiency).
Decision
The RAG pipeline MUST be implemented using purely local, embedded, and "cost-free" infrastructure during the platform's MVP and Gen-Zero phases.
- Storage: The pipeline will use ChromaDB running in `PersistentClient` mode (embedded directly into the local filesystem), backed by SQLite for metadata persistence.
- Embeddings: The pipeline will exclusively use the `ollama` unified Python client calling locally hosted embedding models (e.g., `nomic-embed-text` or `llama3.2`).
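The storage and embedding choices above could be wired together roughly as follows. This is a minimal sketch, not the vindicta-oracle implementation: the helper names (`ollama_embedder`, `open_collection`, `index_rules`, `search`), the storage path, and the collection name are assumptions made for illustration. It relies on `chromadb.PersistentClient` and `ollama.embed` as exposed by recent releases of those packages, and assumes an Ollama server is running locally with the chosen model already pulled.

```python
from typing import Callable, Sequence

# Hypothetical alias: any callable that maps texts to embedding vectors.
Embedder = Callable[[Sequence[str]], list[list[float]]]


def ollama_embedder(model: str = "nomic-embed-text") -> Embedder:
    """Return an embedder backed by a locally running Ollama server."""
    import ollama  # imported lazily so this module loads without Ollama installed

    def embed(texts: Sequence[str]) -> list[list[float]]:
        response = ollama.embed(model=model, input=list(texts))
        return [list(vector) for vector in response["embeddings"]]

    return embed


def open_collection(path: str = "./.oracle/chroma", name: str = "rules"):
    """Open (or create) an embedded, on-disk Chroma collection.

    The path and collection name are placeholders, not project settings.
    """
    import chromadb  # imported lazily for the same reason as above

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name=name)


def index_rules(collection, embed: Embedder, rules: dict[str, str]) -> None:
    """Store rule texts keyed by id, with embeddings computed locally."""
    ids = list(rules)
    texts = [rules[rule_id] for rule_id in ids]
    # Passing embeddings explicitly means Chroma does not compute its own.
    collection.add(ids=ids, documents=texts, embeddings=embed(texts))


def search(collection, embed: Embedder, question: str, k: int = 3) -> list[str]:
    """Return the k stored rule texts closest to the question."""
    result = collection.query(query_embeddings=embed([question]), n_results=k)
    return result["documents"][0]  # results for the first (only) query
```

With both services available, a call such as `search(open_collection(), ollama_embedder(), "some rules question")` would run entirely on the local machine. Because `index_rules` and `search` only receive a collection and an embedder as arguments, they can also be exercised offline with stand-ins.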
Consequences
Positive
- Zero Operational Cost: The RAG subsystem incurs zero API charges for computing embeddings or storing vectors, aligning with the Economy-Engine's stringent cost limits.
- Offline Capability: Developers and agents can run full test suites, integration layers, and queries entirely offline, removing network latency bottlenecks during rapid iterations.
- Privacy and Sovereignty: Proprietary wargaming lists, transcripts, and potential future private strategies are kept strictly local, upholding data sovereignty for players.
Negative
- Resource Constraint: Local embedding generation and ChromaDB lookups consume significant local RAM/CPU, meaning development machines (and eventually production agents) require sufficient hardware to host the models.
- Scaling Complexity: An embedded local SQLite/Chroma setup is not horizontally scalable. As the platform moves beyond the MVP into concurrent multi-tenant usage, an externalized service will be required.
Neutral
- Dependency Injection Protocols (e.g., `EmbeddingProvider` and `VectorStore`) MUST be used at every boundary with the embedding model and the vector store, so that the core logic can swap from the local Ollama/Chroma implementation to cloud equivalents without rewriting the subsystem.
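One way to picture this seam is with `typing.Protocol`. The sketch below is illustrative only: the ADR names the two protocols but not their methods, so the signatures (`embed`, `add`, `query`), the `Oracle` class, and the two in-memory stand-ins are all assumptions. The stand-ins use a toy keyword-count embedding and brute-force cosine similarity, which is enough for offline tests but is not how Ollama or ChromaDB work internally.

```python
import math
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    # Hypothetical signature; the ADR only names the protocol.
    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    # Hypothetical signatures; the ADR only names the protocol.
    def add(self, ids: Sequence[str], texts: Sequence[str],
            vectors: Sequence[Sequence[float]]) -> None: ...

    def query(self, vector: Sequence[float], k: int) -> list[str]: ...


class KeywordCountEmbedder:
    """Toy deterministic embedder: one dimension per vocabulary word."""

    def __init__(self, vocabulary: Sequence[str]) -> None:
        self.vocabulary = [word.lower() for word in vocabulary]

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(text.lower().count(word)) for word in self.vocabulary]
                for text in texts]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force store that ranks rows by cosine similarity."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, list[float]]] = []

    def add(self, ids, texts, vectors) -> None:
        self._rows.extend((i, t, list(v)) for i, t, v in zip(ids, texts, vectors))

    def query(self, vector, k: int) -> list[str]:
        ranked = sorted(self._rows, key=lambda row: _cosine(vector, row[2]),
                        reverse=True)
        return [text for _, text, _ in ranked[:k]]


class Oracle:
    """Core logic that depends only on the two protocols."""

    def __init__(self, embedder: EmbeddingProvider, store: VectorStore) -> None:
        self._embedder = embedder
        self._store = store

    def ingest(self, rules: dict[str, str]) -> None:
        ids = list(rules)
        texts = [rules[rule_id] for rule_id in ids]
        self._store.add(ids, texts, self._embedder.embed(texts))

    def ask(self, question: str, k: int = 1) -> list[str]:
        [vector] = self._embedder.embed([question])
        return self._store.query(vector, k)
```

Under these assumptions, `Oracle(KeywordCountEmbedder([...]), InMemoryVectorStore())` runs in unit tests with no services at all. An Ollama-backed provider, a Chroma-backed store, or a later cloud equivalent would be passed to the same constructor without changes to `Oracle`.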
Alternatives Considered
- OpenAI / Pinecone: Rejected due to high ongoing costs and violation of the local-first MVP mandate.
- PostgreSQL + pgvector: Rejected. Although open-source, it requires a dedicated database daemon running in the background (or a heavy Docker Compose setup), which violates the MVP's simple install-and-run requirement that embedded ChromaDB satisfies.