
Source Credibility

contextdb tracks the trustworthiness of information sources and uses it to gate admission and weight retrieval.

How it works

[Interactive demo: a new source starts at Beta(1, 1) with 50% credibility; Validate and Refute events shift the distribution, and writes are admitted while credibility stays above 0.05.]

Sources

Every piece of data has a source. Sources are automatically created on first write:

go
// First write from "user:bob" creates a source with credibility 0.5
ns.Write(ctx, client.WriteRequest{
    Content:  "Go is fast",
    SourceID: "user:bob",
    Vector:   embedding,
})

Source fields:

Field             Description
ExternalID        Your identifier (Discord ID, agent name, URL)
CredibilityScore  Base credibility in [0, 1]; starts at 0.5
Labels            Override labels: "moderator", "admin", "troll", "flagged"
ClaimsAsserted    Total claims from this source
ClaimsValidated   Claims later confirmed
ClaimsRefuted     Claims later contradicted

Label overrides

Labels override the numeric score entirely:

go
// Full trust: credibility always 1.0
ns.LabelSource(ctx, "moderator:alice", []string{"moderator"})

// Blocked: credibility always 0.05, all writes rejected
ns.LabelSource(ctx, "user:spammer", []string{"troll"})

Label      Effective credibility
moderator  1.0
admin      1.0
flagged    0.05
troll      0.05

The admission gate

Three rules run in order on every write:

Rule 1: Credibility floor

Sources with effective credibility <= 0.05 are always rejected. This stops troll floods at the gate.

Rule 2: Near-duplicate detection

If an existing node has cosine similarity >= 0.95 to the candidate, the write is rejected as a duplicate.

Rule 3: Novelty threshold

The combined score credibility * novelty must exceed the namespace's admission threshold:

Namespace      Threshold  Effect
belief_system  0.15       Low bar; credibility gates retrieval instead
general        0.25       Balanced
agent_memory   0.35       Stricter; avoids low-value episodes
procedural     0.40       Only well-established procedures admitted
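Taken together, the three rules can be sketched as a single check. The function and parameter names below are illustrative, not contextdb's actual API:

```go
package main

import "fmt"

// admit sketches the admission gate: the three rules run in order,
// and the first one that trips rejects the write.
func admit(credibility, maxSimilarity, novelty, threshold float64) (bool, string) {
	// Rule 1: credibility floor — blocked/troll sources never get in.
	if credibility <= 0.05 {
		return false, "rejected: credibility floor"
	}
	// Rule 2: near-duplicate detection against existing nodes.
	if maxSimilarity >= 0.95 {
		return false, "rejected: near-duplicate"
	}
	// Rule 3: combined credibility * novelty must clear the namespace threshold.
	if credibility*novelty <= threshold {
		return false, "rejected: below admission threshold"
	}
	return true, "admitted"
}

func main() {
	// A neutral source (0.5) writing moderately novel content (0.6)
	// into the general namespace (threshold 0.25): 0.5 * 0.6 = 0.30 > 0.25.
	ok, reason := admit(0.5, 0.80, 0.6, 0.25)
	fmt.Println(ok, reason)
}
```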

Bayesian credibility learning

Sources use a Beta distribution to model credibility:

  • Alpha = 1 + validated claims (evidence for trustworthiness)
  • Beta = 1 + refuted claims (evidence against)
  • Credibility = Alpha / (Alpha + Beta)

New sources start at Beta(1,1) — a uniform prior meaning "we know nothing." Each validation or refutation shifts the distribution. The more observations, the more confident the estimate.

This is mathematically principled: 1000 validated claims from a source that then gets one wrong doesn't crash its credibility to zero. The Beta distribution naturally handles this — it becomes a small dip in a well-established track record.
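The update rule is small enough to compute by hand; here it is as a sketch (the helper is illustrative, not contextdb's actual API):

```go
package main

import "fmt"

// credibility computes the Beta-distribution mean used for source trust:
// alpha = 1 + validated claims, beta = 1 + refuted claims.
func credibility(validated, refuted int) float64 {
	alpha := float64(1 + validated)
	beta := float64(1 + refuted)
	return alpha / (alpha + beta)
}

func main() {
	fmt.Printf("%.3f\n", credibility(0, 0))    // new source: uniform prior, 0.500
	fmt.Printf("%.3f\n", credibility(1000, 1)) // one miss in a long track record: 0.998
}
```

Note how the 1000-claim source barely moves after a single refutation: 1001 / 1003 ≈ 0.998, the "small dip" described above.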

How this compares

Most systems use static trust scores (0-100 set by an admin) or binary allow/deny lists. contextdb's Bayesian approach means credibility is learned from evidence — no manual tuning required. A new source starts neutral and earns or loses trust based on how its claims hold up over time.

Domain-scoped credibility

A source can be credible in one domain and unreliable in another. A standup_notes source, for example, may be highly credible for project status but less so for technical correctness.

go
// Source credibility varies by domain
cred := source.DomainCredibility("project-status")  // 0.92
cred = source.DomainCredibility("security")          // 0.45
cred = source.DomainCredibility("")                   // 0.68 (global fallback)

Domain credibility is tracked per (source, domain) pair. When no domain-specific data exists, the global credibility is used as a fallback.

Confidence propagation

When a write is admitted, the node's confidence is:

node.confidence = source_credibility * confidence_multiplier

This means a moderator's claims (credibility 1.0) carry full confidence, while unknown sources (credibility 0.5) are automatically discounted.
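Putting label overrides and propagation together, a minimal sketch (function names and the multiplier value are assumptions, not contextdb's actual API):

```go
package main

import "fmt"

// effectiveCredibility applies label overrides before falling back to
// the numeric score, mirroring the label table above.
func effectiveCredibility(score float64, labels []string) float64 {
	for _, l := range labels {
		switch l {
		case "moderator", "admin":
			return 1.0
		case "troll", "flagged":
			return 0.05
		}
	}
	return score
}

func main() {
	multiplier := 1.0 // namespace confidence multiplier (assumed value)

	// node.confidence = source_credibility * confidence_multiplier
	fmt.Println(effectiveCredibility(0.73, []string{"moderator"}) * multiplier) // full confidence
	fmt.Println(effectiveCredibility(0.5, nil) * multiplier)                    // unknown source, discounted
}
```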

Released under the MIT License.