System Overview
contextdb is a layered system with pluggable storage backends, an auto-embedding pipeline, conflict detection, background workers, and parallel retrieval paths.
Component diagram
Layer responsibilities
Client layer (pkg/client)
- DB: connection handle, analogous to sql.DB
- NamespaceHandle: scoped read/write operations
- Four modes: embedded (in-process), standard (Postgres), remote (gRPC), scaled (Qdrant + Redis)
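The DB/NamespaceHandle split resembles how sql.DB works: one handle is shared, and callers do their work through narrower objects. The in-memory sketch below is hypothetical and shows only the scoping idea. Open, Namespace, Put and Get are names invented for illustration, not the pkg/client API, and it does not model the four modes or their backends.

```go
package main

import "sync"

// scopedKey pairs a namespace with a key, so identical keys in different
// namespaces never collide.
type scopedKey struct{ ns, key string }

// DB is a hypothetical shared handle: created once, safe for concurrent use,
// and the owner of the underlying storage (here, a plain map).
type DB struct {
	mu   sync.RWMutex
	data map[scopedKey]string
}

// NamespaceHandle confines every read and write to a single namespace.
type NamespaceHandle struct {
	db *DB
	ns string
}

// Open returns an empty in-memory DB (illustration only, no real backend).
func Open() *DB {
	return &DB{data: make(map[scopedKey]string)}
}

// Namespace hands out a scoped handle; handles are cheap and share the DB.
func (db *DB) Namespace(ns string) *NamespaceHandle {
	return &NamespaceHandle{db: db, ns: ns}
}

// Put writes key=value inside the handle's namespace.
func (h *NamespaceHandle) Put(key, value string) {
	h.db.mu.Lock()
	defer h.db.mu.Unlock()
	h.db.data[scopedKey{h.ns, key}] = value
}

// Get reads key from the handle's namespace only.
func (h *NamespaceHandle) Get(key string) (string, bool) {
	h.db.mu.RLock()
	defer h.db.mu.RUnlock()
	v, ok := h.db.data[scopedKey{h.ns, key}]
	return v, ok
}
```

Two handles from the same DB share the storage and the lock but cannot see each other's keys.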
Server layer (internal/server)
- gRPC server on :7700 with JSON codec (no protobuf codegen required)
- REST server on :7701 with Go 1.22+ routing patterns
- Multi-tenant isolation via X-Tenant-ID header or Bearer token prefix
- RBAC middleware with tenant:permissions:secret token format
- Observe server on :7702 with Prometheus metrics, pprof, health check, and admin UI
Embedding (internal/embedding)
- Auto-embeds text when Options.Embedder is configured
- Providers: OpenAI-compatible, local HTTP sidecar
- LRU cache with SHA256 text hashing avoids redundant API calls
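The embedding cache can be sketched with container/list plus a map keyed by the SHA-256 digest of the input text. EmbedFunc, CachedEmbedder and the fixed capacity are assumptions made for this illustration. The provider is called outside the lock, so a slow API request does not block cache hits. The trade-off is that two concurrent misses on the same text may both call the provider.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"sync"
)

// EmbedFunc stands in for a provider call (OpenAI-compatible API or sidecar).
type EmbedFunc func(text string) ([]float32, error)

type cacheEntry struct {
	key [sha256.Size]byte
	vec []float32
}

// CachedEmbedder wraps an EmbedFunc with a fixed-size LRU cache keyed by
// the SHA-256 digest of the text.
type CachedEmbedder struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[[sha256.Size]byte]*list.Element
	embed    EmbedFunc
}

func NewCachedEmbedder(capacity int, embed EmbedFunc) *CachedEmbedder {
	return &CachedEmbedder{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[[sha256.Size]byte]*list.Element),
		embed:    embed,
	}
}

// Embed returns a cached vector when the text was seen recently, otherwise
// calls the provider and caches the result, evicting the LRU entry if full.
func (c *CachedEmbedder) Embed(text string) ([]float32, error) {
	key := sha256.Sum256([]byte(text))
	c.mu.Lock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		vec := el.Value.(*cacheEntry).vec
		c.mu.Unlock()
		return vec, nil
	}
	c.mu.Unlock()

	vec, err := c.embed(text) // cache miss: call the provider without the lock
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok { // another goroutine filled it meanwhile
		c.order.MoveToFront(el)
		return el.Value.(*cacheEntry).vec, nil
	}
	c.items[key] = c.order.PushFront(&cacheEntry{key: key, vec: vec})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).key)
	}
	return vec, nil
}
```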
Write path (internal/ingest)
- Source resolution and credibility lookup
- Admission gate: credibility floor, near-duplicate detection, novelty threshold
- Conflict detection: identifies contradictions, creates "contradicts" edges
- Credibility learning: Bayesian updates adjust source trust over time
- Graph upsert + vector indexing + event logging
Read path (internal/retrieval)
- Concurrent fan-out: vector search + graph traversal + session context
- Fusion: deduplicate and merge results from all paths
- Scoring: composite score with caller-supplied weights
- Reranking: optional LLM cross-encoder reranking after fusion
- Label filtering: push-down filter on node labels
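The fan-out, fusion and scoring steps can be sketched as follows. The sketch assumes each path returns hits with a path-local relevance in [0, 1]. Retriever, Hit, Weights and Retrieve are hypothetical names. Fusion here sums weight × score per node ID, which is one plausible composite score rather than contextdb's actual formula. The label is passed into each path so it can be applied at the store (push-down). LLM reranking is not modelled.

```go
package main

import (
	"sort"
	"sync"
)

// Hit is one candidate returned by a single retrieval path.
type Hit struct {
	ID    string
	Score float64 // path-local relevance, assumed to lie in [0, 1]
}

// Retriever is one path (vector search, graph traversal, session context).
// The label filter is handed to the path so the store can apply it (push-down).
type Retriever func(query, label string) []Hit

// Weights maps a path name to its caller-supplied weight.
type Weights map[string]float64

// Result is a fused, scored node.
type Result struct {
	ID    string
	Score float64
}

// Retrieve queries all paths concurrently, fuses hits that share a node ID
// into one result, and ranks by the weighted sum of per-path scores.
func Retrieve(query, label string, paths map[string]Retriever, w Weights) []Result {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		fused = make(map[string]float64)
	)
	for name, path := range paths {
		wg.Add(1)
		go func(name string, path Retriever) {
			defer wg.Done()
			hits := path(query, label) // runs in parallel with the other paths
			mu.Lock()
			defer mu.Unlock()
			for _, h := range hits {
				fused[h.ID] += w[name] * h.Score // deduplicate by ID while merging
			}
		}(name, path)
	}
	wg.Wait()

	out := make([]Result, 0, len(fused))
	for id, s := range fused {
		out = append(out, Result{ID: id, Score: s})
	}
	// Highest score first; ties broken by ID for a deterministic order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	return out
}
```

With this scoring, a node found independently by several paths accumulates score from each one, so agreement across paths lifts it in the ranking.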
Background workers (internal/compact)
- RAPTOR compaction: hierarchical summarisation
- Memory consolidation: episodic → semantic promotion via LLM
- Active recall: spaced-repetition utility boosting
Snapshot/restore (internal/snapshot)
- NDJSON export and import per namespace
- Supports full namespace dump or seed-based BFS subgraph export
Store interfaces (internal/store)
- GraphStore: node/edge CRUD, versioning, walk
- VectorIndex: ANN search, index, delete
- KVStore: key-value with TTL (caching, sessions)
- EventLog: append-only temporal event stream
Backends
- Memory: in-process maps and slices, zero dependencies
- BadgerDB + HNSW: embedded persistent storage, single binary
- Postgres + pgvector: production-grade with recursive CTE graph traversal
- Qdrant: dedicated vector index for scaled mode
- Redis: KV store and event log for scaled mode
Project layout
contextdb/
├── cmd/contextdb/ # server entrypoint
├── internal/
│ ├── core/ # domain types: Node, Edge, Source, ScoreParams
│ ├── store/ # store interfaces
│ │ ├── memory/ # in-process backend
│ │ ├── badger/ # BadgerDB + HNSW backend
│ │ ├── postgres/ # Postgres + pgvector backend
│ │ ├── qdrant/ # Qdrant vector backend
│ │ ├── redis/ # Redis KV + EventLog backend
│ │ └── remote/ # gRPC remote store client
│ ├── embedding/ # auto-embedding pipeline
│ ├── extract/ # LLM entity/relation extraction
│ ├── ingest/ # write path: admission, conflict detection, credibility
│ ├── compact/ # RAPTOR compaction, consolidation, active recall
│ ├── retrieval/ # read path: fusion, scoring, reranking
│ ├── server/ # gRPC + REST + RBAC + auth
│ ├── admin/ # admin dashboard UI
│ ├── snapshot/ # NDJSON export/import
│ ├── namespace/ # mode presets and config
│ └── observe/ # metrics, pprof, health
├── pkg/client/ # Go SDK
├── sdk/
│ ├── python/ # Python SDK
│ └── typescript/ # TypeScript SDK
├── bench/ # benchmarks and evaluation
│ ├── longmemeval/ # LongMemEval benchmark harness
│ ├── mteb/ # MTEB retrieval quality
│ └── adversarial/ # poisoning and temporal consistency
└── deploy/helm/contextdb/ # Helm chart for Kubernetes