@takk/agenticstash - v1.0.0 - Apache-2.0

Replay the bug you cannot reproduce.

An agent breaks in production and you cannot reproduce it locally: the model sampled differently, a tool returned new data, the clock moved. Agentic Stash records every source of non-determinism a run touched and serves it back on replay, so an irreproducible failure becomes step-through-debuggable. Then fork it, diff it, and seal it.

86 tests passing
88% line coverage
0 runtime deps
SLSA provenance

What it is

One library, two answers.

In a few words

You wrap each non-deterministic call in your agent (a model output, a tool or MCP response, the clock, a random draw) with one function. While recording, Agentic Stash runs the call for real and saves what it returned. To debug, you load that recording and run the same code again: every wrapped call is served the recorded value instead of going live, so the run reproduces exactly and you can step through it. The same code path records and replays; you change nothing but how you construct the stash.

Technically

A content-addressed tape of recorded events keyed by (channel, key, ordinal), captured through a single intercept seam and replayed by substitution. An input hash per call detects divergence; a one-pass collect mode reports every departure (input-mismatch, extra-call, missing-call). Fork truncates or overrides a decision; diff aligns two tapes by identity; a SHA-256 hash chain seals a recording; a redaction hook strips secrets before storage. Zero runtime dependencies, node-free core (the seal uses Web Crypto).
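To make "replayed by substitution" concrete, here is a minimal sketch of the idea. It is a toy and not the library's implementation: createToyStash and its event shape are assumptions made up for illustration, and it is synchronous where the real intercept is awaited.

```javascript
// Illustrative only: a toy tape, not the @takk/agenticstash implementation.
function createToyStash(recording) {
  const replaying = recording !== undefined;
  const events = replaying ? recording.events : [];
  const ordinals = new Map(); // "channel/key" -> next ordinal

  function intercept(channel, key, fn, options = {}) {
    const id = `${channel}/${key}`;
    const ordinal = ordinals.get(id) ?? 0;
    ordinals.set(id, ordinal + 1);

    if (replaying) {
      const event = events.find(
        (e) => e.channel === channel && e.key === key && e.ordinal === ordinal,
      );
      if (!event) throw new Error(`no recorded event for ${id} #${ordinal}`);
      return event.value; // substitution: fn is never called
    }

    const value = fn(); // recording: run the call for real
    events.push({ seq: events.length, channel, key, ordinal, input: options.input, value });
    return value;
  }

  return { intercept, save: () => ({ events: events.map((e) => ({ ...e })) }) };
}

// The same code path is used for recording and for replay.
let liveCalls = 0;
function plan(stash, prompt) {
  return stash.intercept('llm', 'plan', () => {
    liveCalls += 1;
    return `plan for: ${prompt} (sample ${Math.random()})`;
  }, { input: prompt });
}

const rec = createToyStash();
const recorded = plan(rec, 'summarize the incident');
const tape = rec.save();

const replay = createToyStash(tape);
const replayed = plan(replay, 'summarize the incident');
console.log(replayed === recorded, liveCalls); // true 1
```

The random sample inside the callback would differ on every live run, yet the replayed value is identical because the callback never executes during replay.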

Before and after

The same production bug, two timelines.

Take a real scene: a customer reports a critical bug. The logs show the input, but the agent sampled a different plan and called a tool that has since changed, so you cannot reproduce it locally.

Without Agentic Stash

Forensic guesswork for two days

  1. Day 1: The bug report lands. The run is gone; only the input survives in the logs.
  2. Day 1: You re-run the agent locally; it samples a different path and works fine.
  3. Days 1 to 2: Five engineers in a room try to reproduce it, and fail.
  4. Day 2: The tool the run hit has new data; the original response is unrecoverable.
  5. Day 2: You ship a hot-fix based on a guess, with no way to confirm it.
  6. Week 5: The guess was wrong; the bug returns.
  7. The root cause was never actually seen.

With Agentic Stash

Replay the exact run, fix with proof

  1. Minute 1: The failing run was recorded in production; you load its tape locally.
  2. Minute 2: loadStash(tape) replays it deterministically: same model outputs, same tool responses, same clock.
  3. Minute 3: You step through and see the exact decision that broke.
  4. Minute 5: You change the code and replay; report() shows the first divergence, the call your fix changed.
  5. Minute 8: fork() forces the corrected decision and replays the branch to confirm the fix.
  6. You ship a fix you watched work, sealed and verifiable.

The "With Agentic Stash" timeline is the literal workflow the loadStash, report, and fork API supports: a recorded run replays deterministically by substitution, and the divergence report points at the first call your change altered.

Install

Five minutes from install to first replay.

1. Add the package

pnpm add @takk/agenticstash
npm install @takk/agenticstash
yarn add @takk/agenticstash
bun add @takk/agenticstash

2. Optional: bridge a sibling package

The core has zero runtime dependencies. Skip this unless you want the optional @takk/alkaline bridge or another sibling package.

pnpm add @takk/alkaline # optional: bridge sibling packages

3. Quickstart: record, then replay

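import { createStash, loadStash } from '@takk/agenticstash';

// Placeholder for your own model client; swap in a real call.
async function callModel(prompt) {
  return `plan for: ${prompt}`;
}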

// Wrap each non-deterministic call once; this code records and replays.
async function plan(stash, prompt) {
  return stash.intercept('llm', 'plan', () => callModel(prompt), { input: prompt });
}

// Record a run in production.
const rec = createStash({ id: 'run-1' });
const answer = await plan(rec, 'summarize the incident');
const tape = rec.save();

// Replay it locally, deterministically, with no live model call.
const replay = loadStash(tape);
const same = await plan(replay, 'summarize the incident');

4. Quickstart: the CLI

# Summarize a recording, diff two runs, fork a decision.
npx @takk/agenticstash inspect run.json
npx @takk/agenticstash diff before.json after.json
npx @takk/agenticstash fork run.json --at 4 --out branch.json

# Seal a recording, then verify it (exit 1 if tampered).
npx @takk/agenticstash seal run.json --out run.seal.json
npx @takk/agenticstash verify run.json run.seal.json

Features

Nine capabilities, every one tied to a measurable outcome.

Multi-source recording

One intercept seam captures every source of non-determinism: model outputs, tool and MCP responses, the clock, randomness, any wrapped call. The recorder is transparent: it returns the value unchanged.

Instrument a run once and the entire tape of what made it irreproducible is captured, in memory, ready to persist.
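The clock is a good example of a source that is easy to forget. The sketch below shows one way a recorded clock could work: live reads are appended to a list, and replay serves them back in order. createToyClock is a hypothetical helper for illustration and is not the library's createDeterministicClock.

```javascript
// Hypothetical helper: records clock reads, then replays them in order.
function createToyClock(recordedReads) {
  const replaying = Array.isArray(recordedReads);
  const reads = replaying ? recordedReads : [];
  let cursor = 0;
  return {
    now() {
      if (replaying) {
        if (cursor >= reads.length) throw new Error('clock read past end of recording');
        return reads[cursor++]; // serve the recorded read instead of the real time
      }
      const t = Date.now();
      reads.push(t);
      return t;
    },
    save: () => [...reads],
  };
}

// Code under test reads the clock twice; it never touches Date.now() directly.
function elapsedLabel(clock) {
  const start = clock.now();
  const end = clock.now();
  return `started ${start}, took ${end - start} ms`;
}

const liveClock = createToyClock();
const liveLabel = elapsedLabel(liveClock);

const replayClock = createToyClock(liveClock.save());
const replayLabel = elapsedLabel(replayClock);
console.log(liveLabel === replayLabel); // true
```

The design point is that code under test asks the injected clock for the time, so the recording decides what "now" means on replay.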

Deterministic replay

Load a recording and run the same code: each wrapped call is served its recorded value, and recorded errors re-throw at the original site. Determinism is by substitution.

A production failure you could not reproduce becomes a local, step-through-debuggable run, with no live model or tool call.
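Re-throwing recorded errors matters because error-handling branches are part of the run. A minimal sketch, assuming each event stores an outcome of either 'return' or 'throw' (the field names mirror the recording example in this document, but createOutcomeTape itself is hypothetical):

```javascript
// Toy recorder/replayer that stores an outcome per event (not the library's code).
function createOutcomeTape(events) {
  const replaying = events !== undefined;
  const tape = replaying ? events : [];
  let cursor = 0;
  return {
    tape,
    call(key, fn) {
      if (replaying) {
        const event = tape[cursor++];
        if (!event || event.key !== key) throw new Error(`nothing recorded for ${key}`);
        if (event.outcome === 'throw') throw new Error(event.message); // re-throw at the call site
        return event.value;
      }
      try {
        const value = fn();
        tape.push({ key, outcome: 'return', value });
        return value;
      } catch (err) {
        tape.push({ key, outcome: 'throw', message: String(err.message) });
        throw err; // transparent: the caller still sees the failure
      }
    },
  };
}

let attempts = 0;
function lookup(stash) {
  try {
    return stash.call('search', () => {
      attempts += 1;
      throw new Error('upstream 503');
    });
  } catch (err) {
    return `fallback after: ${err.message}`;
  }
}

const recorder = createOutcomeTape();
const live = lookup(recorder);
const replayer = createOutcomeTape(recorder.tape);
const replayedResult = lookup(replayer);
console.log(live, '|', replayedResult);
// fallback after: upstream 503 | fallback after: upstream 503
```

The failing callback runs once, during recording. On replay the fallback branch is exercised again without touching the upstream service.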

Divergence report

Replay new code against an old recording and collect every divergence in one pass (input-mismatch, extra-call, missing-call) instead of throwing on the first.
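A rough sketch of what one-pass collection could look like. The three kinds come from the text above; everything else is an assumption for illustration: createCollectingReplayer is hypothetical, JSON.stringify stands in for the input hash, and falling back to the live call on an extra call is this sketch's choice, not documented library behavior.

```javascript
// Toy replayer in "collect" mode (hypothetical shapes, modeled on the report in this document).
function createCollectingReplayer(events) {
  const ordinals = new Map();
  const seen = new Set();
  const divergences = [];

  function intercept(channel, key, fn, { input } = {}) {
    const id = `${channel}/${key}`;
    const ordinal = ordinals.get(id) ?? 0;
    ordinals.set(id, ordinal + 1);
    const event = events.find(
      (e) => e.channel === channel && e.key === key && e.ordinal === ordinal,
    );
    if (!event) {
      divergences.push({ kind: 'extra-call', channel, key, ordinal });
      return fn(); // nothing to substitute, so this sketch falls back to the live call
    }
    seen.add(event.seq);
    if (JSON.stringify(event.input) !== JSON.stringify(input)) {
      divergences.push({ kind: 'input-mismatch', channel, key, ordinal, seq: event.seq });
    }
    return event.value;
  }

  function report() {
    // Recorded events that the new code never asked for.
    const missing = events
      .filter((e) => !seen.has(e.seq))
      .map((e) => ({ kind: 'missing-call', channel: e.channel, key: e.key, ordinal: e.ordinal, seq: e.seq }));
    const all = [...divergences, ...missing];
    return { matched: all.length === 0, consumed: seen.size, firstDivergence: all[0] ?? null, divergences: all };
  }

  return { intercept, report };
}

const tapeEvents = [
  { seq: 0, channel: 'llm', key: 'plan', ordinal: 0, input: 'summarize', value: 'step A' },
  { seq: 1, channel: 'tool', key: 'search', ordinal: 0, input: 'logs', value: ['hit'] },
];
const collector = createCollectingReplayer(tapeEvents);
collector.intercept('llm', 'plan', () => 'live', { input: 'summarize briefly' }); // input changed
collector.intercept('tool', 'fetch', () => 'live page', { input: 'url' });        // never recorded
const result = collector.report();
console.log(result.divergences.map((d) => d.kind).join(', '));
// input-mismatch, extra-call, missing-call
```

In this sketch firstDivergence is the first one detected, with missing calls appended at the end; a real report may order them differently.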

"Find where the new code diverged" becomes a single call: the report points at the first call your change altered.

Fork engine

Keep every event before a point, optionally override that decision with a new value, drop the tail. Replay serves the shared prefix, applies the change, then runs live.

Explore "what if the planner had chosen B here?" without re-running the expensive, non-deterministic prefix.
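The truncate-and-override step can be pictured on a plain event list. This is a sketch under assumed shapes: forkEvents and the seq-based fork point are illustrative and do not describe the signature of the library's fork.

```javascript
// Toy fork over a plain event list (hypothetical shapes, not the library's fork()).
function forkEvents(events, at, override) {
  // Keep every event strictly before the fork point; the tail is dropped.
  const prefix = events.filter((e) => e.seq < at).map((e) => ({ ...e }));
  if (override === undefined) return prefix;
  const target = events.find((e) => e.seq === at);
  if (!target) throw new Error(`no event at seq ${at}`);
  // Pin the decision at the fork point to the new value.
  return [...prefix, { ...target, value: override }];
}

const original = [
  { seq: 0, channel: 'llm', key: 'plan', value: 'gather logs' },
  { seq: 1, channel: 'tool', key: 'search', value: ['log-17'] },
  { seq: 2, channel: 'llm', key: 'decide', value: 'choose A' },
  { seq: 3, channel: 'tool', key: 'apply', value: 'applied A' },
];

const truncated = forkEvents(original, 2);
const branch = forkEvents(original, 2, 'choose B');
console.log(truncated.length, branch.length, branch[2].value); // 2 3 choose B
```

Replaying the branch would serve the two shared events, then the overridden decision, and every call after that would have nothing recorded and so would run live.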

Diff engine

Compare two recordings aligned by (channel, key, ordinal) identity, not raw sequence, and report added, removed, and changed events plus the first divergence.

See exactly which call changed between a passing run and a failing one: an inserted call no longer shifts everything after it.
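Why identity alignment helps is easiest to see in code. A minimal sketch, assuming events carry channel, key, ordinal, and value; diffByIdentity is a hypothetical helper and not the library's diffRecordings.

```javascript
// Toy diff keyed by (channel, key, ordinal) rather than by position in the tape.
function diffByIdentity(before, after) {
  const idOf = (e) => `${e.channel}/${e.key}#${e.ordinal}`;
  const beforeById = new Map(before.map((e) => [idOf(e), e]));
  const afterById = new Map(after.map((e) => [idOf(e), e]));
  const added = [];
  const removed = [];
  const changed = [];
  for (const [id, e] of afterById) {
    const old = beforeById.get(id);
    if (!old) added.push(id);
    else if (JSON.stringify(old.value) !== JSON.stringify(e.value)) changed.push(id);
  }
  for (const id of beforeById.keys()) {
    if (!afterById.has(id)) removed.push(id);
  }
  return { added, removed, changed };
}

const passing = [
  { channel: 'llm', key: 'plan', ordinal: 0, value: 'step A' },
  { channel: 'tool', key: 'search', ordinal: 0, value: ['hit'] },
];
const failing = [
  { channel: 'llm', key: 'plan', ordinal: 0, value: 'step A' },
  { channel: 'tool', key: 'lookup', ordinal: 0, value: 'cache miss' }, // inserted call
  { channel: 'tool', key: 'search', ordinal: 0, value: [] },           // same identity, new value
];
const delta = diffByIdentity(passing, failing);
console.log(JSON.stringify(delta));
// {"added":["tool/lookup#0"],"removed":[],"changed":["tool/search#0"]}
```

A position-by-position comparison would have paired the inserted lookup with the old search and flagged both trailing slots; keyed alignment reports one addition and one genuine change.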

Tamper-evident seal

A SHA-256 hash chain over the recording, via the Web Crypto API. verifyRecording detects any change to an event, value, order, or id. An integrity seal, not a signature.

Show that a recording has not been altered since it was sealed, the kind of tamper-evidence that supports EU AI Act Article 12 record-keeping.
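For readers who want to see what a hash chain over events means, here is a sketch using the standard Web Crypto digest call. The chain layout (an id link followed by one link per event) and the names sealEvents and verifyEvents are assumptions for illustration; the library's actual seal format is whatever SPEC.md defines. The sketch also hashes plain JSON.stringify output, so it is sensitive to key order.

```javascript
// Hash a string with SHA-256 through Web Crypto and return lowercase hex.
async function sha256Hex(text) {
  const bytes = new TextEncoder().encode(text);
  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Each link hashes the previous link together with the next event, so changing
// any event, the order, or the id changes the final link.
async function sealEvents(id, events, hash = sha256Hex) {
  let link = await hash(`id:${id}`);
  for (const event of events) {
    link = await hash(`${link}\n${JSON.stringify(event)}`);
  }
  return { algorithm: 'SHA-256', count: events.length, head: link };
}

async function verifyEvents(id, events, seal, hash = sha256Hex) {
  const fresh = await sealEvents(id, events, hash);
  return fresh.count === seal.count && fresh.head === seal.head;
}

const sealedEvents = [
  { seq: 0, channel: 'llm', key: 'plan', value: 'step A' },
  { seq: 1, channel: 'tool', key: 'search', value: ['hit'] },
];

// Only run the demo where Web Crypto is available as a global.
if (globalThis.crypto?.subtle) {
  sealEvents('run-1', sealedEvents).then(async (seal) => {
    const tampered = sealedEvents.map((e) => (e.seq === 1 ? { ...e, value: ['edited'] } : e));
    console.log(await verifyEvents('run-1', sealedEvents, seal)); // true
    console.log(await verifyEvents('run-1', tampered, seal));     // false
  });
}
```

Anyone holding the events can recompute the chain, which is why this shows integrity only. It says nothing about who produced the recording.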

Redaction

A record-time hook transforms each value and input before storage; return a masked value or the DROP sentinel for a metadata-only event.

Keep secrets and PII out of recordings entirely, so a sealed audit tape passes security review.
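A sketch of how a record-time hook with a drop sentinel might behave. The hook signature, the local DROP symbol, and the redacted flag are all assumptions for illustration; check the library's types for the real shapes.

```javascript
const DROP = Symbol('drop'); // local stand-in for the library's DROP sentinel

// Toy record step: the hook decides what is stored, the caller still gets the real value.
function recordWithRedaction(events, redact, channel, key, value) {
  const stored = redact({ channel, key, value });
  if (stored === DROP) {
    events.push({ channel, key, redacted: true }); // metadata only, no payload
  } else {
    events.push({ channel, key, value: stored });
  }
  return value;
}

// Hypothetical policy: drop credentials entirely, mask API-key-looking strings elsewhere.
const maskSecrets = ({ key, value }) => {
  if (key === 'credentials') return DROP;
  if (typeof value === 'string') return value.replace(/sk-[A-Za-z0-9]+/g, 'sk-***');
  return value;
};

const redactedEvents = [];
recordWithRedaction(redactedEvents, maskSecrets, 'tool', 'credentials', { token: 'sk-abc123' });
const liveValue = recordWithRedaction(redactedEvents, maskSecrets, 'llm', 'plan', 'use key sk-abc123 to call');

console.log(liveValue);                 // use key sk-abc123 to call
console.log(redactedEvents[1].value);   // use key sk-*** to call
console.log('value' in redactedEvents[0]); // false
```

Because the secret never reaches the event list, no later step can leak it. The cost is that a dropped value cannot be served back on replay.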

Framework-agnostic

Interceptors wrap any function (a model call, a tool, an HTTP fetch); the MCP bridge records tool calls duck-typed, importing no SDK. Nothing couples you to a framework.

Works with whatever agent stack you run today, and the one you migrate to tomorrow.

Zero-dep, edge, provenance

Zero required runtime dependencies; the node-free core runs in Cloudflare Workers, Vercel Edge, Deno, Bun, and the browser. Every release ships with SLSA provenance.

Record and replay wherever your agent runs, and verify the tarball you installed was built from the source commit you trust.

Recorded sources

Every source of non-determinism, one seam.

Each source flows through a named channel. Capture it once; replay serves the recorded value back on the same channel and key.

| Source | How you capture it | What replay serves |
| --- | --- | --- |
| Model output (llm) | stash.intercept('llm', key, () => callModel(...), { input }) | The recorded completion, byte for byte. |
| Tool or function (tool) | wrap(stash, 'tool', name, fn) | The recorded result, with a divergence flagged if the input changed. |
| MCP tool call (mcp) | interceptMcpClient(stash, client) | The recorded tool response, no live network call. |
| Clock (clock) | createDeterministicClock(stash) | The recorded timeline of reads, in order. |
| Randomness (random) | createSeededRandom(stash, seed) | The same sequence, from the recorded seed. |
| Anything else (custom) | stash.intercept('<channel>', key, fn, { input }) | The recorded value for that channel and key. |

Replay matches by per-key call order; for concurrent same-key calls use distinct keys, or wait for the input-matched mode planned for 1.1.
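The ordering caveat is easier to see with a toy tape that pairs calls by key and ordinal. This sketch simulates reordering by calling in a different sequence rather than using real concurrency, and createOrdinalTape is a hypothetical stand-in, not library code.

```javascript
// Toy tape that pairs calls by (key, ordinal), as the sentence above describes.
function createOrdinalTape(recorded) {
  const replaying = recorded !== undefined;
  const events = replaying ? recorded : [];
  const ordinals = new Map();
  return {
    events,
    intercept(key, fn) {
      const ordinal = ordinals.get(key) ?? 0;
      ordinals.set(key, ordinal + 1);
      if (replaying) {
        const hit = events.find((e) => e.key === key && e.ordinal === ordinal);
        if (!hit) throw new Error(`no recorded event for ${key} #${ordinal}`);
        return hit.value;
      }
      const value = fn();
      events.push({ key, ordinal, value });
      return value;
    },
  };
}

function run(stash, keyFor, order) {
  const out = {};
  for (const id of order) out[id] = stash.intercept(keyFor(id), () => `body of ${id}`);
  return out;
}

// Shared key: replay pairs by arrival order, so a reordered run swaps the values.
const sharedKey = () => 'fetch';
const recShared = createOrdinalTape();
run(recShared, sharedKey, ['a', 'b']);
const swapped = run(createOrdinalTape(recShared.events), sharedKey, ['b', 'a']);
console.log(JSON.stringify(swapped)); // {"b":"body of a","a":"body of b"}

// Distinct keys: each call has its own ordinal sequence, so order no longer matters.
const perIdKey = (id) => `fetch:${id}`;
const recDistinct = createOrdinalTape();
run(recDistinct, perIdKey, ['a', 'b']);
const stable = run(createOrdinalTape(recDistinct.events), perIdKey, ['b', 'a']);
console.log(JSON.stringify(stable)); // {"b":"body of b","a":"body of a"}
```

Deriving the key from something stable about the call, such as the document id here, is what makes a Promise.all fan-out safe to replay under per-key ordering.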

Entry points

Ten subpaths: import only what you need.

| Entry point | Subpath export | What it gives you |
| --- | --- | --- |
| Core | @takk/agenticstash | createStash, loadStash, every function, types, errors. |
| Record | @takk/agenticstash/record | createRecorder, the DROP redaction sentinel. |
| Replay | @takk/agenticstash/replay | createReplayer with divergence collection. |
| Storage | @takk/agenticstash/storage | BlobStore, encodeRecording, decodeRecording, recordingStats. |
| Fork | @takk/agenticstash/fork | fork. |
| Diff | @takk/agenticstash/diff | diffRecordings. |
| Interceptors | @takk/agenticstash/interceptors | deterministic clock, seeded random, wrap, wrapSync. |
| MCP | @takk/agenticstash/mcp | interceptMcpClient, recordMcpTool, no SDK import. |
| Seal | @takk/agenticstash/seal | sealRecording, verifyRecording. |
| Edge | @takk/agenticstash/edge | The full core under a worker condition. |

Command line

Inspect, diff, fork, seal, and verify from the terminal.

The CLI operates on recording JSON files (the output of stash.save()), so an engineer or an auditor can work a recording with no code. Exit codes follow the sysexits convention.

Inspect a recording and diff two runs

npx @takk/agenticstash inspect run.json
npx @takk/agenticstash diff before.json after.json

# diff prints the first divergence: llm/plan #2, for example

Fork a decision point

npx @takk/agenticstash fork run.json --at 4 --out branch.json

Seal and verify (exit 1 if tampered)

npx @takk/agenticstash seal run.json --out run.seal.json
npx @takk/agenticstash verify run.json run.seal.json

# verify exits 0 when the recording matches the seal, 1 when it was modified

Typed results

Every result is a plain, typed object.

No event bus, no agent, no vendor. A recording is JSON you can store and inspect; a divergence report is a typed object you can branch on with full TypeScript narrowing.

const recording = stash.recording();
// {
//   version: 1,
//   id: 'run-1',
//   events: [
//     { seq: 0, channel: 'llm', key: 'plan', ordinal: 0,
//       outcome: 'return', ref: 'a1b2c3d4e5f6a7', inputRef: 'd4e5f6a7b8c9d0' },
//     { seq: 1, channel: 'mcp', key: 'search', ordinal: 0,
//       outcome: 'return', ref: '0f1e2d3c4b5a69' }
//   ],
//   blobs: { 'a1b2c3d4e5f6a7': { plan: 'step A' }, ... },
//   meta: {}
// }

A divergence report

const report = replay.report();
// {
//   matched: false,
//   consumed: 7,
//   firstDivergence: { kind: 'input-mismatch', channel: 'llm', key: 'plan', ordinal: 2, seq: 5 },
//   divergences: [
//     { kind: 'input-mismatch', channel: 'llm', key: 'plan', ordinal: 2, seq: 5, ... },
//     { kind: 'extra-call', channel: 'tool', key: 'search', ... }
//   ]
// }

Errors are just as structured: every refusal is an AgenticStashError carrying a stable code (ERR_DIVERGENCE, ERR_RECORDING_EXHAUSTED, ERR_INVALID_RECORDING, ERR_INVALID_INPUT, ERR_NOT_FOUND). Branch on the code, never the message.
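Branching on the code might look like the sketch below. ToyStashError is a local stand-in so the example runs on its own; in real code you would check against the AgenticStashError class the package exports. The one-line glosses for each code are guesses from the names, not documented meanings.

```javascript
// Local stand-in for illustration; the real error class is exported by the library.
class ToyStashError extends Error {
  constructor(code, message) {
    super(message);
    this.name = 'ToyStashError';
    this.code = code;
  }
}

// Decide what to do from the stable code, never from the human-readable message.
function describeFailure(err) {
  if (!(err instanceof ToyStashError)) return 'unexpected error: rethrow';
  switch (err.code) {
    case 'ERR_DIVERGENCE':
      return 'replay diverged from the recording';
    case 'ERR_RECORDING_EXHAUSTED':
      return 'replay asked for more than was recorded';
    case 'ERR_INVALID_RECORDING':
      return 'recording could not be read';
    case 'ERR_INVALID_INPUT':
      return 'bad argument';
    case 'ERR_NOT_FOUND':
      return 'requested item is missing';
    default:
      return `unknown code ${err.code}`;
  }
}

console.log(describeFailure(new ToyStashError('ERR_DIVERGENCE', 'wording may change between releases')));
// replay diverged from the recording
console.log(describeFailure(new TypeError('boom')));
// unexpected error: rethrow
```

Messages are free to change between releases; the codes are part of the stability surface, so a switch on the code keeps working.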

Compare

Agentic Stash vs the alternatives.

The other tools in this space solve adjacent problems well. The contrast clarifies where Agentic Stash sits.

| Capability | Agentic Stash | Forkline | LangGraph TT | Hosted platforms | Hand-rolled |
| --- | --- | --- | --- | --- | --- |
| Distribution | npm library | library | framework feature | SaaS / SDK | your repo |
| How it runs | re-executes your code | in-process | reconstructs state | dashboards | varies |
| Framework-coupled | no | no | yes | no | varies |
| Tamper-evident seal | yes, SHA-256 | n/a | no | varies | no |
| Redaction hook | yes | varies | varies | managed | varies |
| Runtime dependencies | 0 | varies | framework | n/a (hosted) | varies |
| Edge-ready (node-free) | yes | varies | no | via fetch | varies |
| License | Apache-2.0 | open | MIT | commercial / open | your call |

The honest summary: the deterministic-replay category is real and not empty. Pick a hosted platform for dashboards and team analytics, or LangGraph time-travel if you live in LangGraph. Pick Agentic Stash when you want the recording itself, a zero-dependency primitive that re-executes your own code, in-process and on the edge, with a tamper-evident seal and redaction.

What it does not do

The honest limits of v1.0.0.

Determinism here is by substitution, not by magic. Agentic Stash replays the values it recorded; it does not make the model deterministic. These are the boundaries, stated plainly, with the workaround or the roadmap item for each.

| Limit | What it means | Workaround / roadmap |
| --- | --- | --- |
| Determinism by substitution | Replay reproduces recorded values; it does not make the model deterministic. A code path that calls a key you never recorded has nothing to substitute. | Record the path first, or use onDivergence: 'collect' to see exactly what changed. |
| Deterministic call order | Replay pairs events by channel, key, and ordinal. Concurrent same-key calls resolved in a different order than recorded will mismatch. | Roadmap 1.1: input-matched mode pairs by input hash for Promise.all fan-out. |
| Redaction trades replay fidelity | A redacted value is gone for good. The run reaches that point and reports a divergence or a missing value, by design. | Roadmap 1.1: dual-tape export keeps a full local tape and a redacted audit tape. |
| The seal proves integrity, not identity | The SHA-256 hash chain shows a recording was not altered after sealing. It does not prove who produced it. It is a seal, not a signature. | Roadmap 1.1: optional Ed25519 signing on top of the seal for non-repudiation. |
| Capture is opt-in | Agentic Stash records what you route through intercept or wrap. Sources you never instrument stay non-deterministic. | Wrap every non-deterministic seam: model calls, tools, clock, random, network. |
| Streams are captured as resolved values | A token stream is recorded as its final resolved value, not as the incremental chunk sequence. | Roadmap 1.1: streaming capture records and replays the chunk sequence. |
| Not a dashboard or profiler | It produces recordings, divergence reports, and a CLI. It is not hosted analytics or a metrics surface. | Pipe the typed results into your own observability; a replay UI is a horizon item, not the library. |

Quality and validation

The receipts behind v1.0.0.

Tests & coverage

86 tests passing across 14 suites under Vitest 4. Coverage: 88.62% lines, 87.96% statements, 89.83% functions, 76.94% branches. Run pnpm test on a fresh clone to reproduce.

Type safety

TypeScript 6 in maximum strict mode (exactOptionalPropertyTypes, useUnknownInCatchVariables, noUncheckedIndexedAccess, noImplicitOverride, noImplicitReturns). Zero errors under tsc --noEmit.

Lint, format, types

Biome 2 clean across src and tests. publint clean and attw clean. The exports map is dual ESM + CJS with separate .d.ts and .d.cts across all ten entry points.

Zero dependencies and size

Zero runtime dependencies. The core entry point is about 4.1 kB brotli, enforced by size-limit in CI. The core is node-free, so it runs in Node, edge runtimes, and the browser without a shim.

CLI smoke

A 10-assertion smoke test spawns the built agenticstash binary from dist against fixture recordings and asserts inspect, diff, fork, seal, and verify behavior plus their exit codes.

Supply chain

Committed pnpm lockfile, a files allowlist that ships only dist and the core docs, and SLSA provenance attestation on every published version. Verify with npm view @takk/agenticstash@1.0.0 --json | jq .dist.attestations.

Roadmap

What is shipped, what is next, what is later.

Now (1.0)

Shipped in v1.0.0

  • Record and replay across every seam
  • Divergence report in a single pass
  • Fork a run, diff two runs
  • Tamper-evident SHA-256 seal
  • Redaction hook with a metadata-only drop
  • Ten entry points plus the CLI
  • Dual ESM + CJS, SLSA provenance

Next (1.1)

Targeted for 1.1

  • Input-matched replay for concurrent fan-out
  • OpenAI and Anthropic SDK adapters
  • Dual-tape redacted, sealed audit export
  • Streaming capture for token streams
  • Ed25519 signing on top of the seal

Later

On the horizon

  • Replay UI for step-by-step state inspection
  • Redaction policy presets (PII, JSON-path)
  • VS Code and Cursor time-travel extensions
  • Compression beyond content-addressed dedup

FAQ

Common questions.

Is Agentic Stash production-ready at 1.0.0?

Yes, for what it claims. 86 tests across 14 suites pass under Vitest 4 with 88.62% line coverage; TypeScript 6 maximum strict mode is clean; Biome 2 lint is clean; publint and attw are clean. The core has zero runtime dependencies and is about 4.1 kB brotli. Every published release carries SLSA provenance produced by GitHub Actions. Read the honest limits section above before you adopt it: determinism here is by substitution, not magic.

Does this make my agent deterministic?

No, and we will not pretend otherwise. The model still samples live when you run normally. Agentic Stash records every non-deterministic value you route through it during a run, then, during replay, substitutes the recorded value instead of calling out again. The run becomes reproducible because the answers are fixed, not because the model became deterministic.

How is this different from a mock or a VCR cassette?

A cassette records HTTP. Agentic Stash records every seam you wrap: model outputs, tool and MCP responses, the clock, randomness, and any custom channel such as environment reads, in one content-addressed tape that deduplicates repeated payloads. On top of that it gives you a divergence report, fork to explore an alternate decision, diff between two runs, a tamper-evident seal, and a redaction hook. A mock has none of that.

Why not LangGraph time-travel or a hosted replay platform?

The deterministic-replay category is real and we respect it. LangGraph time-travel is excellent if you live in LangGraph; hosted platforms give you dashboards and team analytics. Agentic Stash is the opposite trade: a zero-dependency library you pnpm add, framework-agnostic, that re-executes your own code in-process and on the edge. You own the recording file.

Does this work in Cloudflare Workers, Vercel Edge, Bun, Deno, or the browser?

Yes. The core and the ./edge entry point are node-free and run on any modern JavaScript runtime. The tamper-evident seal uses Web Crypto SHA-256, which is available in all of them. The ./storage file backend is the only Node-specific piece; bring your own store on the edge.

What happens on replay if the code calls something that was not recorded?

That is a divergence, and it is the whole point. By default replay throws an AgenticStashError with code ERR_DIVERGENCE or ERR_RECORDING_EXHAUSTED. Set onDivergence: 'collect' to walk the entire run in one pass and get a DivergenceReport listing every input mismatch, extra call, and missing call.

How do I record without capturing secrets or PII?

Pass a redact hook. It runs before anything is written, so you can rewrite a value or return the DROP sentinel to keep only metadata. Redaction is one-way and trades replay fidelity: a dropped value cannot be replayed. A dual-tape export that keeps a full local tape and a redacted audit tape is on the 1.1 roadmap.

Is the seal a digital signature?

No. sealRecording builds a SHA-256 hash chain over the events and verifyRecording proves the recording was not altered after sealing. It proves integrity, not identity: it tells you the tape is intact, not who produced it. Optional Ed25519 signing for non-repudiation is on the 1.1 roadmap.

How big is it, and what does it depend on?

Zero runtime dependencies. The core entry point is about 4.1 kB brotli, enforced by size-limit in CI. Content hashing uses a small pure-JS function for blob dedup, and the seal uses the platform Web Crypto API, so nothing is pulled in at install time.
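As an illustration of content-addressed dedup with a small pure-JS hash, the sketch below uses 32-bit FNV-1a over the JSON text. FNV-1a is an assumption chosen because it is short; the source does not say which function the library uses, and a 32-bit hash can collide, so a real store would want a wider hash or an equality check.

```javascript
// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Toy blob store: the key is derived from the content, so equal payloads share one entry.
function createBlobStore() {
  const blobs = new Map();
  return {
    put(value) {
      const json = JSON.stringify(value);
      const ref = fnv1a(json);
      if (!blobs.has(ref)) blobs.set(ref, json);
      return ref;
    },
    get: (ref) => JSON.parse(blobs.get(ref)),
    size: () => blobs.size,
  };
}

const store = createBlobStore();
const refA = store.put({ plan: 'step A' });
const refB = store.put({ plan: 'step A' }); // identical payload, same ref
const refC = store.put({ plan: 'step B' });
console.log(refA === refB, refA !== refC, store.size()); // true true 2
```

Events then only need to carry the short ref, which is how a tape with many repeated tool responses can stay small.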

Does it support streaming responses?

In 1.0 a stream is captured as its final resolved value, not as the incremental chunk sequence. That is enough to replay the outcome of a streamed call, but it does not reproduce the timing of individual tokens. First-class streaming capture is on the 1.1 roadmap.

How do I verify a published version's provenance?

Every release is published with npm publish --provenance. Check the attestations with npm view @takk/agenticstash@<version> --json | jq .dist.attestations. The attestation links the tarball you installed to the GitHub Actions workflow that built it from a specific source commit.

What is the policy on breaking changes?

Strict SemVer 2.0.0, starting from 1.0.0. The binding stability surface, including the recording format and the error codes, is documented in SPEC.md. Major bumps require a deprecation cycle; security fixes follow the disclosure flow in SECURITY.md.

Author

Built and maintained by David C Cavalcante.

David C Cavalcante

Founder, Takk Innovate Studio

Product Engineer, AI Engineer, ML Engineer, LLM Engineer, LLM Architect, AI Researcher. Builder of the @takk family of NPM packages for Massive Intelligence (IM) and non-human entity (NHE) infrastructure.

Agentic Stash is one package in a planned portfolio of NPM libraries targeting Massive Intelligence (IM) infrastructure for 2026 to 2030. It sits in the agent stack alongside the rest of the @takk family. Adjacent research by the author covers the systemic intelligence frameworks MAIC (the universe), HIM (the model), and NHE (the agent). It is published independently of this codebase, with research notes on PhilPapers and PhilArchive linked from the repository README.

If a replay turned a two-day production mystery into a fifteen-minute fix, the most useful thing you can do is open a GitHub issue with the run that broke. The releasing runbook, the threat model, and the contributor agreement all live in the repository.