
Build a Research Agent

A 30-minute guided build. Learn the model by writing one. End with a working agent that knows what it does not know.

This tutorial teaches the model by building one thing end-to-end: a research agent that investigates a question, accumulates evidence, detects when sources disagree, and stops when it has enough to act.

You will not call an LLM during this tutorial. Every "agent output" is a literal string so the tutorial runs deterministically with no API key beyond the Beliefs key. At the end you'll see how to swap in Claude, GPT, or any other model.

You'll learn one concept per section. Each builds on the previous. Copy the code as you go — the final section assembles everything into one runnable file.

What you need

  • Node 18+
  • A Beliefs API key (request access if you don't have one yet)
  • ~30 minutes

Numbers in this tutorial will vary

The engine uses LLM extraction, so the specific claims it pulls from a given input — and therefore the clarity score, change counts, and belief texts — are non-deterministic. The expected outputs below are illustrative; your runs will show the same shape (clarity rising, contradictions detected, gaps closing) but different exact values.

What you're building

By the end, you'll have a research agent that:

  1. Takes a research question
  2. Reads what it currently believes about the question
  3. Investigates the highest-priority gap
  4. Detects when new evidence contradicts what it already knew
  5. Stops when its clarity score crosses a threshold
  6. Reports what it found — and what it deliberately doesn't know

┌────────────────────────────────────────────────────────────┐
│  Turn 1   clarity 0.18  ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░  │
│  Turn 2   clarity 0.34  ██████████░░░░░░░░░░░░░░░░░░░░░░  │
│  Turn 3   clarity 0.51  ███████████████░░░░░░░░░░░░░░░░░  │
│  Turn 4   clarity 0.68  ████████████████████░░░░░░░░░░░░  │
│  Turn 5   clarity 0.74  ██████████████████████░░░░░░░░░░  │  → STOP
└────────────────────────────────────────────────────────────┘

The clarity score is what makes this different from a turn-counter or a token budget. The agent stops when it has learned enough — not when it has talked enough.


1. Setup

Install

mkdir research-agent && cd research-agent
npm init -y
npm pkg set type=module
npm install beliefs tsx typescript
npm install -D @types/node

The npm pkg set type=module line marks the package as ESM so top-level await works in your script. Without it, you'd need to wrap each section in async function main() { ... } — a small annoyance, but worth setting up once.
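
If you do skip that step, the same calls still work once the awaits live inside an entry function. A minimal sketch of that wrapper (mirroring the client setup from "First call" below):

```ts
// Only needed if you don't set "type": "module": wrap awaits in an async entry point.
import Beliefs from 'beliefs'

async function main() {
  const beliefs = new Beliefs({
    apiKey: process.env.BELIEFS_KEY!,
    namespace: `research-${Date.now()}`,
  })
  const world = await beliefs.read()
  console.log('clarity:', world.clarity)
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})
```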

Configure TypeScript

Create tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "Bundler",
    "esModuleInterop": true,
    "strict": true
  }
}

Set your key

export BELIEFS_KEY=bel_live_xxx

First call

Create agent.ts:

import Beliefs from 'beliefs'

const beliefs = new Beliefs({
  apiKey: process.env.BELIEFS_KEY!,
  namespace: `research-${Date.now()}`,
  writeScope: 'space',
})

const world = await beliefs.read()
console.log('beliefs:', world.beliefs.length)
console.log('clarity:', world.clarity)

Run it:

npx tsx agent.ts

You should see something like:

beliefs: 0
clarity: 0.25

What just happened. You created a fresh belief state in a unique namespace. It has zero beliefs — clarity defaults to a low baseline because there's nothing to be clear about yet. Every call from here will operate against this same namespace.

Why a unique namespace

namespace: `research-${Date.now()}` gives every tutorial run a fresh slate. In production, pick stable namespaces (one per project, customer, or session).
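
In a production service that might look more like the following sketch — the identifiers are hypothetical placeholders for whatever your application already keys state by:

```ts
import Beliefs from 'beliefs'

// Hypothetical stable identifiers from your own application.
const customerId = 'acme'
const projectId = 'market-research'

// Same client, but with a namespace that persists across runs.
const beliefs = new Beliefs({
  apiKey: process.env.BELIEFS_KEY!,
  namespace: `${customerId}-${projectId}`,
  writeScope: 'space',
})
```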

→ Concept: World — the read-only view of everything your agent currently believes.


2. Tell the agent what to research

A research agent needs a goal. Goals are first-class in beliefs — not just a string in your prompt.

Add this to agent.ts:

await beliefs.add('Determine the size of the AI developer tools market', {
  type: 'goal',
})

const world = await beliefs.read()
console.log('goals:', world.goals)

Output:

goals: [ 'Determine the size of the AI developer tools market' ]

What just happened. add() with type: 'goal' registers normative intent — what the agent is trying to do, distinct from claims (what the agent thinks is true). The clarity score now considers whether the goal has been resolved.

→ Concept: Intent — the normative layer: goals, decisions, constraints. What the agent wants, not what it knows.


3. Seed what you already know

Often you start an investigation with priors — things you've heard, suspect, or have weak evidence for. Beliefs lets you assert these explicitly with a confidence score.

await beliefs.add('AI dev tools market is around $4B', {
  confidence: 0.6,
  type: 'assumption',
})

await beliefs.add('GitHub Copilot has the largest market share', {
  confidence: 0.7,
  type: 'assumption',
})

await beliefs.add('Missing breakdown by enterprise vs individual developers', {
  type: 'gap',
})

const state = await beliefs.read()
console.log('beliefs:', state.beliefs.length)
console.log('gaps:   ', state.gaps.length)
console.log('clarity:', state.clarity.toFixed(2))

Output (clarity will vary):

beliefs: 2
gaps:    1
clarity: 0.3-0.4

What just happened. Three calls, three different shapes:

  • type: 'assumption' — a stated belief without supporting evidence yet. Confidence = 0.6 means "I lean this way, but haven't done the work."
  • type: 'gap' — something the agent has flagged as unknown. Gaps are first-class — they reduce clarity until filled.

Notice clarity went up — but only slightly. That's because stated confidence is not the same as evidence. The system tracks both:

  • Decision resolution — how confident you are in the answer (0.6 here)
  • Knowledge certainty — how much evidence backs the answer (zero here, you just stated it)

A claim stated at 0.95 with zero evidence is in a different epistemic category than a claim at 0.65 with 40 supporting observations. The clarity score reflects this.
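
To make the distinction concrete, here is a contrived comparison — the claim texts are invented for illustration, and the exact clarity movement will depend on your run:

```ts
// A confident statement with no evidence behind it: high stated confidence,
// zero knowledge certainty. Expect clarity to move only a little.
await beliefs.add('The market will double by 2026', {
  confidence: 0.95,
  type: 'assumption',
})

// An observation fed through after() gives the engine evidence to extract,
// which is what moves knowledge certainty (and therefore clarity) further.
await beliefs.after(
  'Three independent analyst reports project 20-30% annual growth through 2026.',
)
```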

→ Concept: Beliefs — what types exist, how confidence works, why the two-channel model matters.


4. The core loop

Now the loop that defines a beliefs-aware agent: read context, act, feed observation.

For this tutorial, "act" returns a literal string — what your agent might output if you'd run a real LLM. In Section 9 you'll swap it for a real model.

async function fakeAgent(_systemPrompt: string, _userMessage: string): Promise<string> {
  // Pretend an LLM ran. Return a realistic agent output.
  return `Based on a Gartner 2024 report, the AI developer tools market
is valued at $4.2B. The top three players (GitHub Copilot, Cursor, and
Tabnine) account for approximately 65% of the market. Enterprise adoption
is currently around 40% of total spend, with individual developers making
up the remainder. The market is growing at roughly 25% year over year.`
}

const userMessage = 'Research the AI developer tools market'

// 1. Read what the agent currently believes
const context = await beliefs.before(userMessage)

// 2. Run the agent
const output = await fakeAgent(context.prompt, userMessage)

// 3. Feed the observation
const delta = await beliefs.after(output)

console.log('changes:  ', delta.changes.length)
console.log('clarity:  ', delta.clarity.toFixed(2))
console.log('readiness:', delta.readiness)

Output (your numbers will differ — extraction is non-deterministic):

changes:   ~5
clarity:   ~0.5
readiness: medium

What just happened. Three calls did real work:

  • before(message) — returned a BeliefContext with prompt (a serialized summary of the current belief state, ready to inject into a system prompt), plus the agent's beliefs, goals, gaps, clarity, and recommended next moves.
  • fakeAgent(...) — produced output. In production this is your LLM call; the system prompt is context.prompt.
  • after(output) — extracted beliefs from the agent's text, detected if any conflicted with what was already there, and updated the world state. delta.changes is the list of what changed; delta.readiness is a coarse 'low' | 'medium' | 'high' label derived from clarity.

Clarity jumped because the agent now has evidenced claims (Gartner cited as a source) instead of bare assumptions.
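
When you swap in a real model later, a common pattern is to keep your own instructions on top and inject context.prompt beneath them. A sketch, reusing the context from the snippet above (the instruction wording is only an example):

```ts
// Sketch: build a system prompt from your own instructions plus the belief summary.
const systemPrompt = [
  'You are a research agent. Investigate the question,',
  'cite sources where you can, and say plainly what you still do not know.',
  '',
  context.prompt, // serialized beliefs, goals, gaps, and suggested moves
].join('\n')
```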

→ Concept: The loop — patterns for single-turn, multi-turn, streaming, and tool-aware agent loops.


5. Watching clarity rise

A research agent should run more than one turn. Let's loop until clarity is high enough.

async function fakeAgentTurn2(_prompt: string, _focus: string): Promise<string> {
  return `Looking deeper into enterprise adoption: among Fortune 500 companies,
72% have at least piloted an AI coding assistant, but only 31% have rolled it
out company-wide. The biggest blockers cited are security review (mentioned by
58% of CIOs surveyed), licensing complexity (44%), and uncertainty about ROI
(37%). Adoption is highest in technology and financial services, lowest in
healthcare and government.`
}

async function fakeAgentTurn3(_prompt: string, _focus: string): Promise<string> {
  return `On individual developer adoption: of approximately 28 million
professional developers worldwide, 9.2 million have used an AI coding
assistant at least monthly in 2024 — about 33% penetration. Among
those, 4.1 million pay personally for a tool (the rest use free tiers
or employer-provided licenses). Average individual spend is ~$15/month
across paid users.`
}

const turns = [fakeAgentTurn2, fakeAgentTurn3]

for (let i = 0; i < turns.length; i++) {
  const ctx = await beliefs.before(userMessage)

  // If clarity is already high, stop early
  if (ctx.clarity > 0.7) {
    console.log(`\nclarity ${ctx.clarity.toFixed(2)} — stopping`)
    break
  }

  // Use the highest-value move as the focus for this turn
  const focus = ctx.moves[0]?.target ?? userMessage
  const output = await turns[i](ctx.prompt, focus)
  const d = await beliefs.after(output)

  console.log(
    `turn ${i + 2}: clarity ${d.clarity.toFixed(2)}, ` +
    `+${d.changes.length} changes, readiness ${d.readiness}`,
  )
}

Output (your specific clarity values will differ — typically reaches 'high' readiness within a few turns):

turn 2: clarity ~0.6, +N changes, readiness medium
turn 3: clarity ~0.75, +M changes, readiness high

What just happened. The loop reads clarity and stops when it's high enough. Each turn:

  • Uses ctx.moves[0].target as the focus — the engine suggests the highest-value gap to investigate next.
  • Calls the "agent" with ctx.prompt — a serialized summary of current state, so the agent acts with awareness of what's already known.
  • Feeds the output back via after() — extraction, conflict detection, and clarity recompute happen automatically.

After a few turns, clarity typically crosses your 'high' threshold. The agent has enough to act.
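
If you would rather not hard-code a numeric threshold, the same loop body can route on the coarse readiness label instead. A sketch of that variant, assuming the output variable from the loop above:

```ts
// Sketch: route on delta.readiness ('low' | 'medium' | 'high') instead of a clarity cutoff.
const d = await beliefs.after(output)

if (d.readiness === 'high') {
  // Enough evidence: break out and write the report.
} else if (d.readiness === 'medium') {
  // Keep going, focused on the engine's next suggested move.
} else {
  // 'low': consider asking the user to narrow the question before spending more turns.
}
```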

→ Concept: Clarity — what the score actually measures, the two-channel model, and how to use it for routing decisions.

→ Concept: Moves — how the engine ranks next-best actions by expected information gain.


6. Contradictions

What happens if a tool returns evidence that disagrees with what the agent already believes?

const conflictingTool = `Tool result from market_research_db:
{
  "source": "IDC Q4 2024 AI DevTools Tracker",
  "finding": "Global AI developer tools market is $6.8B, not $4.2B as
    earlier estimates suggested. The discrepancy is because earlier
    figures excluded embedded AI features in mainstream IDEs (VS Code
    Copilot, JetBrains AI Assistant). When those are included, the
    market is 60% larger than Gartner's narrower scope.",
  "methodology": "Bottom-up survey of 2,400 enterprises across 18 countries"
}`

const delta = await beliefs.after(conflictingTool, {
  tool: 'market_research_db',
  source: 'IDC Q4 2024 Tracker',
})

const world = await beliefs.read()
console.log('contradictions:', world.contradictions.length)
for (const c of world.contradictions) {
  console.log(' -', c)
}

Output (the contradiction summary string is engine-formatted and will vary):

contradictions: 1
 - <engine-formatted summary of the conflicting beliefs>

What just happened. The engine extracted a new belief from the tool result ("market is $6.8B") and recognized it directly conflicts with an existing belief ("market is around $4B"). Both are kept — nothing is silently overwritten. The contradiction surfaces in world.contradictions (a string[]) and reduces the clarity score until you resolve it.

The two beliefs aren't equally weighted, though. The new claim has:

  • A tool source (market_research_db) tagged via { tool, source }
  • A concrete methodology cited in the text

The original was type: 'assumption' with no evidence. When the system fuses them, the evidenced claim dominates — but the original is preserved in the trace so you can see how the agent's view shifted.
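
You can see that weighting directly by reading the state back and printing the competing market-size claims with their fused confidences — the filter below is just a quick illustration, not part of the API:

```ts
// Illustrative: list the claims mentioning the market and their current confidences.
const snapshot = await beliefs.read()
const marketClaims = snapshot.beliefs.filter((b) => /market/i.test(b.text))
for (const b of marketClaims) {
  console.log(`[${b.confidence.toFixed(2)}] ${b.text}`)
}
```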

→ Concept: World — how world.contradictions and world.edges surface conflicts and supersedence.


7. Resolving — and following moves

Now use the engine's recommended next move to direct what to investigate.

const ctx = await beliefs.before(userMessage)

console.log('top 3 moves the engine suggests:')
for (const m of ctx.moves.slice(0, 3)) {
  console.log(` - [${m.action}] ${m.target}`)
  console.log(`   reason: ${m.reason}`)
}

// Investigate the top move
const topMove = ctx.moves[0]
if (topMove) {
  const investigation = `Resolving the market-size question: I cross-checked
the IDC figure against Forrester and McKinsey reports. Forrester pegs the
"AI-augmented dev tools" market at $7.1B for 2024 — closer to IDC than
Gartner. The discrepancy is methodology: Gartner's $4.2B excludes embedded
AI features in IDEs, while IDC and Forrester include them. The $6.8-7.1B
range is the broader market; $4.2B is the narrow "AI-native" tools market.`

  await beliefs.after(investigation, { source: 'Forrester + McKinsey cross-check' })
}

const final = await beliefs.read()
console.log('\nfinal clarity:', final.clarity.toFixed(2))
console.log('contradictions:', final.contradictions.length)

Output (move actions, targets, and reasons are engine-generated and will vary):

top 3 moves the engine suggests:
 - [<action>] <target>
   reason: <engine's reasoning>
 - ...

final clarity: ~0.8
contradictions: 0

Common action values: clarify, gather_evidence, resolve_uncertainty, compare_paths, validate. The target points at the belief or gap the move would act on; reason is the engine's plain-English explanation.

What just happened. ctx.moves is a ranked list of recommended next actions. The engine derived these from the current state — it knows which gaps are open, which beliefs are weakly evidenced, and which contradictions need clarifying. You don't have to plan the next step yourself; you can just route on moves[0].
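
If you want the agent to behave differently per action type, a simple dispatch over moves[0] is enough. A sketch using the action values listed above and the ctx from the snippet — the branch bodies are placeholders for your own tools:

```ts
// Sketch: dispatch on the engine's top suggested move.
const next = ctx.moves[0]
if (next) {
  switch (next.action) {
    case 'gather_evidence':
      // Run a search or retrieval tool focused on next.target.
      break
    case 'clarify':
    case 'resolve_uncertainty':
      // Ask the model (or a human) a targeted question about next.target.
      break
    case 'compare_paths':
    case 'validate':
      // Cross-check existing claims against an independent source.
      break
    default:
      // Fall back to a generic research turn.
      break
  }
}
```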

After feeding the cross-check, the engine sees the methodology distinction, supersedes the old "around $4B" assumption, and the contradiction typically resolves. Clarity climbs.

→ Concept: Moves — Q-value ranking, executor types, and how to use moves for autonomous routing.


8. Trace — what changed and why

Every transition is recorded. Look at the audit trail.

const entries = await beliefs.trace()

console.log(`total transitions: ${entries.length}\n`)
console.log('most recent 5:')
for (const e of entries.slice(0, 5)) {
  const conf = e.confidence
    ? ` (${e.confidence.before?.toFixed(2) ?? '?'} → ${e.confidence.after?.toFixed(2) ?? '?'})`
    : ''
  console.log(` - ${e.action}${conf} | ${e.reason ?? '—'}`)
}

Output (abbreviated; specific reasons and confidence shifts will vary):

total transitions: <N>

most recent 5:
 - updated (0.X → 0.Y) | <engine reason>
 - resolved | <engine reason>
 - created | <engine reason>
 - ...

Each TraceEntry carries action ('created' | 'updated' | 'removed' | 'resolved'), optional beliefId, optional confidence shift { before, after }, optional agent, optional source, timestamp, and optional reason.

What just happened. Every belief mutation — created, updated, removed, resolved — landed in the ledger with the reason and the confidence shift. You can replay the agent's reasoning at any point. In production this is what you show on a "why did the agent decide X?" debug page.
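
A quick starting point for such a page is a summary of the ledger by action type. A sketch, reusing the entries array from the snippet above:

```ts
// Sketch: count ledger entries per action for a debug overview.
const byAction = new Map<string, number>()
for (const e of entries) {
  byAction.set(e.action, (byAction.get(e.action) ?? 0) + 1)
}
console.log(Object.fromEntries(byAction))
// e.g. { created: ..., updated: ..., resolved: ... }
```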

→ Concept: Ledger — what's recorded, replay semantics, and how to query the trail.


9. The complete agent

Here's everything assembled into one file. Save as agent.ts and run.

import Beliefs from 'beliefs'

// ─── Setup ─────────────────────────────────────────────────────────

const beliefs = new Beliefs({
  apiKey: process.env.BELIEFS_KEY!,
  namespace: `research-${Date.now()}`,
  writeScope: 'space',
})

const userMessage = 'Research the AI developer tools market'

// ─── Stub agent ────────────────────────────────────────────────────
// Each turn returns a realistic agent output. In production, replace
// the body of `runAgent` with a call to your LLM (see Section 10).

const turnOutputs = [
  `Based on a Gartner 2024 report, the AI developer tools market is valued
at $4.2B. The top three players (GitHub Copilot, Cursor, and Tabnine)
account for approximately 65% of the market. Enterprise adoption is
currently around 40% of total spend, with individual developers making up
the remainder. Growing at roughly 25% YoY.`,

  `Looking deeper into enterprise adoption: among Fortune 500 companies,
72% have at least piloted an AI coding assistant, but only 31% have rolled
it out company-wide. Biggest blockers: security review (58%), licensing
complexity (44%), and uncertainty about ROI (37%). Highest in technology
and financial services, lowest in healthcare and government.`,

  `On individual developer adoption: of approximately 28 million professional
developers worldwide, 9.2 million have used an AI coding assistant at least
monthly in 2024 — about 33% penetration. Of those, 4.1 million pay personally
for a tool. Average individual spend is ~$15/month across paid users.`,
]

const conflictingTool = `Tool result from market_research_db:
{ "source": "IDC Q4 2024 AI DevTools Tracker",
  "finding": "Global AI developer tools market is $6.8B, not $4.2B. Earlier
    estimates excluded embedded AI features in mainstream IDEs.",
  "methodology": "Bottom-up survey of 2,400 enterprises across 18 countries" }`

const reconciliation = `Cross-checked IDC against Forrester and McKinsey.
Forrester: $7.1B for 2024, closer to IDC. The discrepancy is methodology —
Gartner's $4.2B excludes embedded IDE features; IDC and Forrester include
them. The $6.8-7.1B range is the broader market; $4.2B is the narrow
"AI-native" tools market.`

async function runAgent(_systemPrompt: string, turn: number): Promise<string> {
  return turnOutputs[turn] ?? ''
}

// ─── Goal + priors ─────────────────────────────────────────────────

await beliefs.add(userMessage, { type: 'goal' })
await beliefs.add('AI dev tools market is around $4B', {
  confidence: 0.6,
  type: 'assumption',
})
await beliefs.add('GitHub Copilot has the largest market share', {
  confidence: 0.7,
  type: 'assumption',
})
await beliefs.add('Missing breakdown by enterprise vs individual developers', {
  type: 'gap',
})

// ─── Research loop ─────────────────────────────────────────────────

const TARGET_CLARITY = 0.7
const MAX_TURNS = 5

for (let turn = 0; turn < MAX_TURNS; turn++) {
  const ctx = await beliefs.before(userMessage)
  console.log(
    `turn ${turn + 1}: clarity ${ctx.clarity.toFixed(2)}, ` +
    `${ctx.beliefs.length} beliefs, ${ctx.gaps.length} gaps`,
  )

  if (ctx.clarity >= TARGET_CLARITY) {
    console.log(`  → clarity hit ${TARGET_CLARITY}, stopping`)
    break
  }

  const output = await runAgent(ctx.prompt, turn)
  if (!output) break
  await beliefs.after(output)
}

// ─── Conflicting evidence + reconciliation ─────────────────────────

console.log('\nfeeding conflicting tool result...')
await beliefs.after(conflictingTool, {
  tool: 'market_research_db',
  source: 'IDC Q4 2024 Tracker',
})

const afterConflict = await beliefs.read()
console.log(`  contradictions: ${afterConflict.contradictions.length}`)

console.log('\ncross-checking and reconciling...')
await beliefs.after(reconciliation, { source: 'Forrester + McKinsey' })

// ─── Report ────────────────────────────────────────────────────────

const final = await beliefs.read()

console.log('\n── Final state ──')
console.log(`clarity:        ${final.clarity.toFixed(2)}`)
console.log(`beliefs:        ${final.beliefs.length}`)
console.log(`gaps remaining: ${final.gaps.length}`)
console.log(`contradictions: ${final.contradictions.length}`)

console.log('\n── Top beliefs ──')
const top = [...final.beliefs]
  .sort((a, b) => b.confidence - a.confidence)
  .slice(0, 5)
for (const b of top) {
  console.log(` [${b.confidence.toFixed(2)}] ${b.text}`)
}

console.log('\n── What we still don\'t know ──')
for (const gap of final.gaps) console.log(` - ${gap}`)

Run it:

npx tsx agent.ts

Expected output (specific numbers and extracted texts will vary across runs):

turn 1: clarity ~0.3, 2 beliefs, 1 gaps
turn 2: clarity ~0.5, ~6 beliefs, 1 gaps
turn 3: clarity ~0.65, ~9 beliefs, 1 gaps
turn 4: clarity ~0.75, ~12 beliefs, 0 gaps
  → clarity hit 0.7, stopping
  (or: loop exits when stub outputs are exhausted)

feeding conflicting tool result...
  contradictions: 1

cross-checking and reconciling...

── Final state ──
clarity:        ~0.8
beliefs:        ~14
gaps remaining: 0
contradictions: 0

── Top beliefs ──
 [~0.9] <engine-extracted claim about broader market>
 [~0.85] <engine-extracted claim about adoption>
 [~0.8] <engine-extracted claim about market share>
 ...

── What we still don't know ──
(empty when the agent has filled its declared gaps)

That's a complete agent. It investigated, recognized when its assumptions were wrong, reconciled competing sources, and stopped when it had enough to act — all without you writing any tracking code.


10. Swap in a real LLM

Replace runAgent with a call to your model of choice. The rest of the file stays identical.

Anthropic

import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic()

async function runAgent(systemPrompt: string, turn: number): Promise<string> {
  const focus = turn === 0
    ? userMessage
    : `Investigate further: ${(await beliefs.before(userMessage)).moves[0]?.target ?? userMessage}`

  const msg = await client.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: systemPrompt,
    messages: [{ role: 'user', content: focus }],
  })
  return msg.content
    .filter((b): b is { type: 'text'; text: string } => b.type === 'text')
    .map((b) => b.text)
    .join('')
}

Vercel AI SDK

import { generateText } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'

async function runAgent(systemPrompt: string, _turn: number): Promise<string> {
  const { text } = await generateText({
    model: anthropic('claude-sonnet-4-20250514'),
    system: systemPrompt,
    prompt: userMessage,
  })
  return text
}

OpenAI

import OpenAI from 'openai'
const openai = new OpenAI()

async function runAgent(systemPrompt: string, _turn: number): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userMessage },
    ],
  })
  return completion.choices[0]?.message?.content ?? ''
}

The belief layer is provider-agnostic. Anything that takes a system prompt and returns text plugs in here.
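
Put differently, the only contract the loop cares about is the shape of runAgent. Spelled out as a type (derived from the signatures above, just to make the seam explicit):

```ts
// The entire provider integration reduces to this signature.
type AgentFn = (systemPrompt: string, turn: number) => Promise<string>
```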

→ Reference: Hack Guide — full integration patterns for every major framework.


What you learned

You touched eight concepts in one build:

| Concept         | Where it appeared                          | Reference     |
|-----------------|--------------------------------------------|---------------|
| World state     | beliefs.read() returning the full picture  | World         |
| Goals           | add(text, { type: 'goal' })                | Intent        |
| Beliefs + types | add with type: 'assumption', 'gap'         | Beliefs       |
| The loop        | before → act → after                       | Loop Patterns |
| Clarity         | Stopping condition                         | Clarity       |
| Moves           | Ranked next actions                        | Moves         |
| Contradictions  | Auto-detected via after()                  | World         |
| Trace           | Audit trail of every transition            | Ledger        |

Where to go next

Multi-agent

Multiple agents contributing to one shared belief state. Trust-weighted fusion, conflict resolution.


Streaming

SSE streams for live belief updates and per-extraction events.


Patterns

Common integration patterns: tool-aware loops, clarity-driven routing, partial-extraction.


You now have the model. The rest of the docs are reference for the parts you haven't needed yet.
