Eight guarantees about how the engine behaves under your code. Each is testable, audited, and load-bearing for the SDK above. A regression on any of these is release-blocking.
The implementation behind these guarantees evolves as the engine improves. The guarantees themselves are stable. The surface you build against doesn't shift underneath you.
1. Retries and no-ops are safe
Replaying the same (idempotencyKey, scope) on add, after, or observe produces the same state. Empty inputs (after(''), an empty BeliefDelta) leave state unchanged.
What this means for you: at-least-once delivery from queues, webhooks, or flaky networks doesn't double-count evidence. Defensive patterns ("call after() every turn even if nothing happened") are zero-cost and zero-risk.
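To make the retry behavior concrete, here is a toy in-memory model of deduplication on (idempotencyKey, scope). It is a sketch of the property, not the engine's implementation: `ToyEvidenceLog`, `Write`, and their fields are hypothetical names invented for this example.

```typescript
// Toy model of idempotent writes. Every name here is hypothetical and is
// NOT part of the SDK; it only illustrates the guarantee described above.
type Write = { idempotencyKey: string; scope: string; text: string };

class ToyEvidenceLog {
  private seen = new Set<string>();
  private entries: string[] = [];

  apply(write: Write): void {
    // Empty input leaves state unchanged.
    if (write.text === "") return;
    // A replayed (idempotencyKey, scope) pair is ignored.
    const dedupeKey = JSON.stringify([write.scope, write.idempotencyKey]);
    if (this.seen.has(dedupeKey)) return;
    this.seen.add(dedupeKey);
    this.entries.push(write.text);
  }

  count(): number {
    return this.entries.length;
  }
}

const log = new ToyEvidenceLog();
const delivery: Write = { idempotencyKey: "evt-1", scope: "ws-a", text: "Churn rose in Q3" };
log.apply(delivery);
log.apply(delivery); // redelivered by an at-least-once queue
log.apply({ idempotencyKey: "evt-2", scope: "ws-a", text: "" }); // defensive empty call
```

In this model the redelivery and the empty call both leave the log at a single entry, which is the behavior a caller can rely on when wiring the SDK to a queue or webhook.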
2. Fusion is order-independent
Combining the same set of contributions in any order produces the same result.
What this means for you: retrying a failed after() after a peer's write doesn't change the outcome. Multi-agent pipelines have no hidden ordering bug class.
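One way to see why order independence is achievable: if the combine step is commutative and associative, arrival order cannot matter. The sketch below assumes a deliberately simple rule (summing integer support and refute counts, then taking a smoothed share). That rule, and the names `Contribution` and `fuse`, are assumptions for illustration only; how the engine actually fuses contributions is not part of the contract.

```typescript
// Toy fusion rule, chosen only because integer addition is commutative and
// associative. This is NOT the engine's fusion algorithm.
type Contribution = { agentId: string; supports: number; refutes: number };
type Fused = { supports: number; refutes: number; confidence: number };

function fuse(contributions: Contribution[]): Fused {
  let supports = 0;
  let refutes = 0;
  for (const c of contributions) {
    supports += c.supports;
    refutes += c.refutes;
  }
  // Smoothed share of supporting observations (add-one smoothing).
  const confidence = (supports + 1) / (supports + refutes + 2);
  return { supports, refutes, confidence };
}

// Made-up contributions for the example.
const contributions: Contribution[] = [
  { agentId: "scout", supports: 3, refutes: 1 },
  { agentId: "analyst", supports: 5, refutes: 0 },
  { agentId: "reviewer", supports: 0, refutes: 2 },
];

const forward = fuse(contributions);
const backward = fuse(contributions.slice().reverse());
```

Here `forward` and `backward` are identical because the totals do not depend on the order in which contributions are visited.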
3. Trust knobs behave predictably
Lowering an agent's or source's trust attenuates its contributions proportionally without affecting any other agent or source. Locked overrides stay where you set them; the engine's learning never drifts them.
What this means for you: beliefs.trust.set({ kind: 'agent', id: 'unreliable-scout' }, { confidence: 0.1, strength: 50 }) reduces that scout's pull at fusion time without surprising side effects elsewhere.
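The "proportional, with no side effects" property can be pictured with a toy weighting step. The sketch assumes trust is a single multiplier in [0, 1] per agent, which is a simplification: the real trust.set call above takes a confidence and a strength, and their exact semantics are not modeled here. `WeightedClaim` and `attenuate` are hypothetical names.

```typescript
// Toy model: each contribution's weight is scaled by its own agent's trust.
// The single-multiplier trust value is an assumption for illustration and
// does not mirror the SDK's { confidence, strength } trust setting.
type WeightedClaim = { agentId: string; weight: number };

function attenuate(claims: WeightedClaim[], trust: Map<string, number>): WeightedClaim[] {
  return claims.map((c) => {
    const t = trust.get(c.agentId);
    // Agents without an explicit trust entry keep full weight.
    return { agentId: c.agentId, weight: c.weight * (t === undefined ? 1 : t) };
  });
}

// Made-up weights for the example.
const claims: WeightedClaim[] = [
  { agentId: "unreliable-scout", weight: 4 },
  { agentId: "analyst", weight: 3 },
];

const trust = new Map<string, number>();
trust.set("unreliable-scout", 0.5);

const attenuated = attenuate(claims, trust);
```

In this model, halving the scout's trust halves the scout's weight and leaves the analyst's weight untouched.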
4. Older evidence carries less weight
Evidence is downweighted by a freshness factor as time passes, scaled to the workspace's configured decay rate. Stale claims surface for re-verification rather than silently dominating fresh ones.
What this means for you: the system creates pressure to refresh. Old analyses lose their grip without being deleted, and new evidence wins on equal footing.
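As an illustration of a freshness factor, the sketch below uses exponential decay with a half-life. The exponential form, the 30-day half-life, and the `freshness` function are all assumptions picked for the example; the document only promises that older evidence is downweighted according to the workspace's decay rate, not which curve is used.

```typescript
// Toy freshness factor: exponential decay with a configurable half-life.
// The decay curve is an ASSUMPTION for illustration; the engine's actual
// scaling is an implementation detail and may differ.
function freshness(ageMs: number, halfLifeMs: number): number {
  if (halfLifeMs <= 0) throw new RangeError("halfLifeMs must be positive");
  // Evidence dated in the future is treated as fully fresh.
  const age = Math.max(0, ageMs);
  return Math.pow(0.5, age / halfLifeMs);
}

const DAY_MS = 24 * 60 * 60 * 1000;
const halfLifeMs = 30 * DAY_MS; // hypothetical workspace setting

const fresh = freshness(0, halfLifeMs);
const oneHalfLife = freshness(30 * DAY_MS, halfLifeMs);
const twoHalfLives = freshness(60 * DAY_MS, halfLifeMs);
```

Under these assumptions evidence keeps full weight when new, about half after one half-life, and about a quarter after two, so old analyses fade without being deleted.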
5. Confidence labels are calibrated
When the SDK reports confidence: 'high', claims carrying that label resolve true at roughly the rate the label implies. Calibration is enforced in CI; regressions don't ship.
What this means for you: the labels are honest. You can route on 'high' / 'medium' / 'low' without building your own calibration layer on top.
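If you want to spot-check calibration against your own resolved outcomes, a minimal sketch is to group outcomes by label and compare hit rates. This is not the CI check mentioned above; `Resolved`, `hitRateByLabel`, and the sample data are hypothetical.

```typescript
// Toy calibration spot-check: observed hit rate per confidence label.
// Names and sample data are made up for illustration.
type Label = "high" | "medium" | "low";
type Resolved = { label: Label; cameTrue: boolean };

function hitRateByLabel(resolved: Resolved[]): Map<Label, number> {
  const totals = new Map<Label, { hits: number; n: number }>();
  for (const r of resolved) {
    const t = totals.get(r.label) || { hits: 0, n: 0 };
    t.n += 1;
    if (r.cameTrue) t.hits += 1;
    totals.set(r.label, t);
  }
  const rates = new Map<Label, number>();
  totals.forEach((t, label) => rates.set(label, t.hits / t.n));
  return rates;
}

// Made-up outcomes, far too few for a real calibration estimate.
const outcomes: Resolved[] = [
  { label: "high", cameTrue: true },
  { label: "high", cameTrue: true },
  { label: "high", cameTrue: true },
  { label: "high", cameTrue: false },
  { label: "low", cameTrue: false },
  { label: "low", cameTrue: true },
];

const rates = hitRateByLabel(outcomes);
```

A well-calibrated set of labels would show hit rates that rise from 'low' to 'high' once enough outcomes have resolved; a handful of samples like the ones above only demonstrates the bookkeeping.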
6. Supersession is a clean cut
When belief B explicitly supersedes belief A, A leaves the active candidate set. read() and list() no longer return A; trace() still surfaces it for audit.
What this means for you: an agent updating its position on a claim doesn't leave the prior position competing for attention. Audit history is preserved separately from current state.
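The split between current state and audit history can be modeled in a few lines. The sketch below is a toy store, not the SDK: `ToyBeliefStore`, its `add` signature, and the `supersededBy` field are hypothetical, and only the method names list() and trace() echo the ones mentioned above.

```typescript
// Toy model of supersession. All types and signatures are hypothetical;
// only the list()/trace() split mirrors the guarantee described above.
type ToyBelief = { id: string; claim: string; supersededBy?: string };

class ToyBeliefStore {
  private beliefs: ToyBelief[] = [];

  add(id: string, claim: string, supersedes?: string): void {
    if (supersedes !== undefined) {
      for (const b of this.beliefs) {
        // Mark the prior belief as superseded; it is never deleted.
        if (b.id === supersedes) b.supersededBy = id;
      }
    }
    this.beliefs.push({ id, claim });
  }

  // Active candidate set: superseded beliefs are excluded.
  list(): ToyBelief[] {
    return this.beliefs.filter((b) => b.supersededBy === undefined);
  }

  // Audit view: everything, including superseded beliefs.
  trace(): ToyBelief[] {
    return this.beliefs.slice();
  }
}

const store = new ToyBeliefStore();
store.add("A", "Launch slips to Q4");
store.add("B", "Launch holds for Q3", "A"); // B explicitly supersedes A

const active = store.list().map((b) => b.id);
const audit = store.trace().map((b) => b.id);
```

After B supersedes A, the active view contains only B while the audit view still contains both, so the prior position stops competing without being lost.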
7. Belief shapes don't contaminate each other
Beliefs of different shapes (binary, categorical, numeric) compose safely. Adding a categorical claim doesn't perturb a binary one.
What this means for you: multi-modal world models are safe. Your numeric measurements aren't at risk from a new yes/no claim landing in the same workspace.
8. Confidence and evidence count are tracked separately
A claim at 70% with 100 supporting observations is a different signal from a claim at 70% with 2 observations. The SDK exposes both. See Clarity for the two-channel model.
What this means for you: you can distinguish "we haven't investigated yet" from "we've investigated extensively and the answer is genuinely close." They demand opposite next actions.
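A sketch of routing on both channels, assuming you have a confidence value and an evidence count in hand. The `Signal` shape, the `nextAction` function, and both thresholds are hypothetical choices for this example, not SDK fields or recommended defaults.

```typescript
// Toy router over the two channels. Field names and thresholds are
// assumptions for illustration, not part of the SDK.
type Signal = { confidence: number; evidenceCount: number };
type NextAction = "gather-evidence" | "accept-close-call" | "act";

function nextAction(s: Signal, minEvidence = 20, closeBand = 0.15): NextAction {
  // Thin evidence: the number is not yet trustworthy, so investigate first.
  if (s.evidenceCount < minEvidence) return "gather-evidence";
  // Plenty of evidence but near 50/50: more digging is unlikely to help.
  if (Math.abs(s.confidence - 0.5) <= closeBand) return "accept-close-call";
  // Plenty of evidence and a clear lean.
  return "act";
}

const thin = nextAction({ confidence: 0.55, evidenceCount: 2 });
const contested = nextAction({ confidence: 0.55, evidenceCount: 100 });
const settled = nextAction({ confidence: 0.9, evidenceCount: 100 });
```

The same confidence value leads to opposite decisions depending on how much evidence sits behind it, which is why a single blended number would lose information.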
What's deliberately not promised
- Extraction model choice. The model behind after() and observe() may change between releases. Only the shape of the resulting BeliefDelta is contracted.
- Absolute confidence numbers across version bumps. Calibration shifts when models swap; the calibration quality is bounded, not the exact numbers.
- Cost or token usage. Telemetry is intentionally not part of the public SDK contract.
- Implementation details. How fusion combines contributions, how decay scales evidence, how confidence is computed: these evolve as the engine improves. Build against the guarantees, not the implementation.
If you find a case where SDK behavior appears to violate one of these contracts, file it as a P0.