
Behavioral contracts

What the engine guarantees: the behaviors you can build against.

The engine makes eight guarantees about how it behaves under your code. Each is testable, audited, and load-bearing for the SDK built on top of it. A regression on any of them is release-blocking.

The implementation behind these guarantees evolves as the engine improves. The guarantees themselves are stable. The surface you build against doesn't shift underneath you.


1. Retries and no-ops are safe

Replaying the same (idempotencyKey, scope) on add, after, or observe produces the same state. Empty inputs (after(''), an empty BeliefDelta) leave state unchanged.

What this means for you: at-least-once delivery from queues, webhooks, or flaky networks doesn't double-count evidence. Defensive patterns ("call after() every turn even if nothing happened") are zero-cost and zero-risk.
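
To make the retry guarantee concrete, here is a minimal sketch of idempotent writes in a toy in-memory store. It is not the engine's implementation: `ToyStore`, `applyOnce`, and the `evidence` field are hypothetical names used only for illustration.

```typescript
// Toy model of idempotent writes keyed on (idempotencyKey, scope).
// Hypothetical names throughout; this is not the SDK's API.
type Write = { idempotencyKey: string; scope: string; evidence: number };

class ToyStore {
  private seen = new Set<string>();
  total = 0;

  applyOnce(w: Write): void {
    // Serialize the pair as a JSON tuple so that distinct pairs can never
    // collide the way naive string concatenation could.
    const id = JSON.stringify([w.idempotencyKey, w.scope]);
    if (this.seen.has(id)) return; // replay: state is left unchanged
    this.seen.add(id);
    this.total += w.evidence;
  }
}

const store = new ToyStore();
const write: Write = { idempotencyKey: "evt-1", scope: "ws-a", evidence: 3 };

store.applyOnce(write);
store.applyOnce(write); // redelivered by a queue or webhook retry: ignored
store.applyOnce({ ...write, scope: "ws-b" }); // same key, different scope: counted
```

In this sketch the redelivered write leaves `store.total` untouched, while the same key under a different scope is treated as a separate write.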


2. Fusion is order-independent

Combining the same set of contributions in any order produces the same result.

What this means for you: retrying a failed after() after a peer's write doesn't change the outcome. Multi-agent pipelines have no hidden ordering bug class.
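
One way to picture order independence is a fusion operator that is commutative and associative. The sketch below assumes each contribution adds integer pseudo-counts for and against a claim. That is an illustrative choice, not the engine's actual fusion rule, and `Contribution`, `Fused`, and `fuse` are hypothetical names.

```typescript
// Hypothetical model: fusion as a sum of integer pseudo-counts.
// Integer addition is commutative and associative, so order cannot matter.
type Contribution = { agent: string; support: number; oppose: number };
type Fused = { support: number; oppose: number };

function fuse(contributions: readonly Contribution[]): Fused {
  return contributions.reduce<Fused>(
    (acc, c) => ({
      support: acc.support + c.support,
      oppose: acc.oppose + c.oppose,
    }),
    { support: 0, oppose: 0 },
  );
}

const contributions: Contribution[] = [
  { agent: "scout", support: 4, oppose: 1 },
  { agent: "analyst", support: 2, oppose: 3 },
  { agent: "critic", support: 0, oppose: 2 },
];

const inOrder = fuse(contributions);
const reversed = fuse([...contributions].reverse()); // same set, opposite order
```

Both calls produce the same fused counts. A property test that shuffles the input and compares results is a natural way to check this kind of guarantee against any fusion operator.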


3. Trust knobs behave predictably

Lowering an agent's or source's trust attenuates its contributions proportionally without affecting any other agent or source. Locked overrides stay where you set them; the engine's learning never drifts them.

What this means for you: beliefs.trust.set({ kind: 'agent', id: 'unreliable-scout' }, { confidence: 0.1, strength: 50 }) reduces that scout's pull at fusion time without surprising side effects elsewhere.
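
The isolation property can be sketched with a toy weighting function. It assumes trust acts as a per-agent multiplier in [0, 1] with a default of 1. The engine's real trust model (including how `confidence` and `strength` interact) is not specified here, and `effectiveWeights` is a hypothetical helper.

```typescript
// Hypothetical model: trust is a per-agent multiplier on contribution weight.
type Contribution = { agent: string; weight: number };

function effectiveWeights(
  contributions: readonly Contribution[],
  trust: Readonly<Record<string, number>>,
): Record<string, number> {
  const out: Record<string, number> = {};
  for (const c of contributions) {
    const multiplier = trust[c.agent] ?? 1; // assumed default: full trust
    out[c.agent] = (out[c.agent] ?? 0) + c.weight * multiplier;
  }
  return out;
}

const contributions: Contribution[] = [
  { agent: "unreliable-scout", weight: 8 },
  { agent: "analyst", weight: 6 },
];

const before = effectiveWeights(contributions, {});
const after = effectiveWeights(contributions, { "unreliable-scout": 0.25 });
```

Lowering the scout's multiplier scales only the scout's entry, and the analyst's effective weight is identical in `before` and `after`.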


4. Older evidence carries less weight

Evidence is downweighted by a freshness factor as time passes, scaled to the workspace's configured decay rate. Stale claims surface for re-verification rather than silently dominating fresh ones.

What this means for you: the system creates pressure to refresh. Old analyses lose their grip without being deleted, and when two pieces of evidence are otherwise equal, the newer one carries more weight.
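
As an illustration of a freshness factor, the sketch below assumes exponential decay with a configurable half-life. The actual decay curve is an implementation detail of the engine, and `freshness`, `needsReverification`, and the threshold value are hypothetical.

```typescript
// Hypothetical model: exponential decay with a half-life.
// Weight is 1 at age 0 and halves every `halfLifeMs`.
function freshness(ageMs: number, halfLifeMs: number): number {
  if (halfLifeMs <= 0) throw new RangeError("halfLifeMs must be positive");
  return Math.pow(0.5, Math.max(0, ageMs) / halfLifeMs);
}

// Assumed rule: evidence below the threshold is flagged for re-verification
// rather than dropped.
function needsReverification(ageMs: number, halfLifeMs: number, threshold = 0.3): boolean {
  return freshness(ageMs, halfLifeMs) < threshold;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const halfLife = 7 * DAY_MS; // illustrative workspace setting

const fresh = freshness(0, halfLife);
const oneHalfLife = freshness(7 * DAY_MS, halfLife);
const twoHalfLives = freshness(14 * DAY_MS, halfLife);
const stale = needsReverification(14 * DAY_MS, halfLife);
```

Under these assumptions the evidence is never removed: its weight shrinks toward zero and, past the threshold, it is flagged so that a caller can schedule a refresh.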


5. Confidence labels are calibrated

When the SDK reports confidence: 'high', the claims carrying that label resolve true at roughly the rate the label implies. Calibration is enforced in CI; regressions don't ship.

What this means for you: the labels are honest. You can route on 'high' / 'medium' / 'low' without building your own calibration layer on top.
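
If you want to spot-check calibration against your own resolved outcomes, a sketch like the following computes the empirical hit rate per label. The `Resolved` record shape and `hitRate` are hypothetical, and the target rate each label implies is not stated here, so the sketch only measures and does not judge.

```typescript
// Hypothetical record of a claim whose truth was later resolved.
type Label = "high" | "medium" | "low";
type Resolved = { label: Label; outcome: boolean };

// Fraction of claims with the given label that resolved true.
// Returns undefined when there is no data for that label.
function hitRate(events: readonly Resolved[], label: Label): number | undefined {
  const matching = events.filter((e) => e.label === label);
  if (matching.length === 0) return undefined;
  const hits = matching.filter((e) => e.outcome).length;
  return hits / matching.length;
}

const history: Resolved[] = [
  { label: "high", outcome: true },
  { label: "high", outcome: true },
  { label: "high", outcome: true },
  { label: "high", outcome: false },
  { label: "low", outcome: false },
  { label: "low", outcome: true },
];

const highRate = hitRate(history, "high");
const lowRate = hitRate(history, "low");
const mediumRate = hitRate(history, "medium");
```

A real check needs far more than a handful of events per label before the rates mean anything.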


6. Supersession is a clean cut

When belief B explicitly supersedes belief A, A leaves the active candidate set. read() and list() no longer return A; trace() still surfaces it for audit.

What this means for you: an agent updating its position on a claim doesn't leave the prior position competing for attention. Audit history is preserved separately from current state.
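
The split between current state and audit history can be sketched with a toy append-only log. `ToyBeliefLog` is hypothetical. Its `list` and `trace` methods borrow the names used above for readability, but their signatures are assumptions and not the SDK's.

```typescript
// Toy model: superseded beliefs are marked, never deleted.
type Belief = { id: string; claim: string; supersededBy?: string };

class ToyBeliefLog {
  private beliefs: Belief[] = [];

  add(id: string, claim: string, supersedes?: string): void {
    if (supersedes !== undefined) {
      const prior = this.beliefs.find((b) => b.id === supersedes);
      if (prior) prior.supersededBy = id; // cut the prior out of the active set
    }
    this.beliefs.push({ id, claim });
  }

  // Active candidate set: only beliefs that nothing has superseded.
  list(): Belief[] {
    return this.beliefs.filter((b) => b.supersededBy === undefined);
  }

  // Full audit history, including superseded beliefs.
  trace(): Belief[] {
    return [...this.beliefs];
  }
}

const log = new ToyBeliefLog();
log.add("A", "Churn is driven by pricing");
log.add("B", "Churn is driven by onboarding friction", "A");

const activeIds = log.list().map((b) => b.id);
const auditIds = log.trace().map((b) => b.id);
```

Here the active view contains only B, while the audit view still contains both A and B.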


7. Belief shapes don't contaminate each other

Beliefs of different shapes (binary, categorical, numeric) compose safely. Adding a categorical claim doesn't perturb a binary one.

What this means for you: multi-modal world models are safe. Your numeric measurements aren't at risk from a new yes/no claim landing in the same workspace.


8. Confidence and evidence count are tracked separately

A claim at 70% with 100 supporting observations is a different signal from a claim at 70% with 2 observations. The SDK exposes both. See Clarity for the two-channel model.

What this means for you: you can distinguish "we haven't investigated yet" from "we've investigated extensively and the answer is genuinely close." They demand opposite next actions.
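
A small routing sketch shows why the two channels matter. The `TwoChannel` shape, the `nextAction` helper, and both thresholds are hypothetical; the SDK's actual field names may differ.

```typescript
// Hypothetical two-channel reading: how sure, and on how much evidence.
type TwoChannel = { confidence: number; evidenceCount: number };
type NextAction = "gather-evidence" | "treat-as-close-call" | "act-on-belief";

function nextAction(
  belief: TwoChannel,
  minEvidence = 20, // assumed cutoff for "investigated enough"
  decisiveMargin = 0.25, // assumed distance from 0.5 that counts as decisive
): NextAction {
  // Thin evidence: the confidence number is not yet informative.
  if (belief.evidenceCount < minEvidence) return "gather-evidence";
  // Plenty of evidence: the confidence number can be taken at face value.
  return Math.abs(belief.confidence - 0.5) < decisiveMargin
    ? "treat-as-close-call"
    : "act-on-belief";
}

const thin = nextAction({ confidence: 0.7, evidenceCount: 2 });
const thick = nextAction({ confidence: 0.7, evidenceCount: 100 });
const decisive = nextAction({ confidence: 0.95, evidenceCount: 100 });
```

With these illustrative thresholds, the same 70% reading routes to more investigation when it rests on 2 observations and to a close-call decision when it rests on 100.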


What's deliberately not promised

  • Extraction model choice. The model behind after() and observe() may change between releases. Only the shape of the resulting BeliefDelta is contracted.
  • Absolute confidence numbers across version bumps. Calibration shifts when models swap; what is contracted is a bound on calibration quality, not the exact numbers.
  • Cost or token usage. Telemetry is intentionally not part of the public SDK contract.
  • Implementation details. How fusion combines contributions, how decay scales evidence, how confidence is computed: these evolve as the engine improves. Build against the guarantees, not the implementation.

If you find a case where SDK behavior appears to violate one of these contracts, file it as a P0.
