@takk/alkaline - v1.0.0 - Apache-2.0

Your agent survives the crash.

Your agent runs for six hours collecting data. The server restarts for a deploy. With most setups, everything is lost. Alkaline records every step as it happens, so when the process comes back the workflow replays its history and resumes exactly where it stopped. No step runs twice. No state is lost. No sidecar service to run.

69 tests passing
88% coverage
0 runtime deps
SLSA provenance

What it is

One library, two answers.

In a few words

You define a workflow as a plain function and run it. Alkaline records every step as it happens, so if the process dies, the next run replays the recorded history and continues from the last completed step. Nothing is lost, nothing runs twice. It persists to a battery cell you choose (in memory, a SQLite file, or your own Postgres or Redis), and it guards against the loops and runaway token bills that break long-running agents. Your code keeps its shape: you call ctx.step() and the kernel makes it durable.

Technically

An in-process, event-sourced durable execution kernel. Every effect routed through the context is recorded and replayed deterministically by cursor memoization, with divergence detection. Opt-in step retries with exponential backoff, native graph cycle detection via depth headers, an enforced token budget, signals, queries, pause, resume, child workflows, and continueAsNew. A swappable, rechargeable StateStore with four cells, plus a Hermes-Agent-style multi-agent task board. Zero runtime dependencies.

Before and after

The same incident, two timelines.

Take a real production scene: your research agent has been running for two hours, twelve durable steps deep, when the server restarts for a routine deploy.

Without Alkaline

Lost work, a six-hour run from zero

  1. 09:14:01 A deploy restarts the worker mid-run.
  2. 09:14:01 The in-memory state of the workflow is gone with the process.
  3. 09:14:02 On boot, the agent has no record of the twelve steps it finished.
  4. 09:14:02 It starts the six-hour job again from step one.
  5. 09:14:03 Every external call it already made runs a second time.
  6. 09:14:03 The token bill doubles; duplicate writes corrupt downstream data.
  7. 10:40:00 An engineer reconstructs progress by hand.

With Alkaline

Replay, the agent resumes mid-run

  1. 09:14:01 A deploy restarts the worker mid-run.
  2. 09:14:01 Alkaline had recorded each of the twelve steps to its cell as they completed.
  3. 09:14:02 On boot, runtime.start with the same id replays the history.
  4. 09:14:02 Steps one through twelve return their recorded results without re-running.
  5. 09:14:02 The workflow continues from step thirteen, the first one it had not finished.
  6. 09:14:02 No external call repeats. The token budget is intact.
  7. The user saw a brief pause. The on-call was never paged.

The "With Alkaline" timeline is the behavior of the durable runtime's suspend-and-resume path, exercised by the test suite across all four store cells on Node 20, 22, and 24.

Install

Five minutes from install to first durable workflow.

1. Add the package

pnpm add @takk/alkaline
npm install @takk/alkaline
yarn add @takk/alkaline
bun add @takk/alkaline

2. Pick a durable cell

The default is an in-memory cell, which is discarded on exit. For durability across restarts, choose a durable cell. SQLite needs no install: it uses the built-in node:sqlite on Node 22.5+. Postgres and Redis take a client you already have, because Alkaline bundles no driver.

import { createSqliteStore } from '@takk/alkaline/sqlite';
import { createPostgresStore } from '@takk/alkaline/postgres';

// SQLite: a single file, no driver to install.
const sqlite = createSqliteStore({ path: './agent.alkaline' });

// Postgres or Redis: inject the client you already use.
// `pool` is your existing Postgres client, created elsewhere in your app.
const pg = createPostgresStore({ client: pool });

3. Define and run a durable workflow

import { createRuntime, defineWorkflow } from '@takk/alkaline';

// searchTheWeb and callTheModel stand in for your own functions.
const research = defineWorkflow({
  name: 'research',
  async handler(ctx, input) {
    // Steps are memoized: on replay the recorded result returns, never re-runs.
    const sources = await ctx.step('search', () => searchTheWeb(input.topic));
    // Retry is opt-in and per step: at-least-once execution, exactly-once effect.
    return await ctx.step('summarize', () => callTheModel(sources), {
      retry: { maxAttempts: 3, backoffMs: 500 },
    });
  },
});

const runtime = createRuntime();
const run = await runtime.start('research', { topic: 'durable execution' });
const summary = await run.result();

4. Suspend on a signal, resume after a crash

import { createRuntime, defineWorkflow } from '@takk/alkaline';
import { createSqliteStore } from '@takk/alkaline/sqlite';

const approval = defineWorkflow({
  name: 'approval',
  async handler(ctx) {
    const decision = await ctx.waitForSignal('decision'); // suspends durably
    return decision;
  },
});

// Persist to a durable cell so a restart resumes from history.
const runtime = createRuntime({ store: createSqliteStore({ path: './agent.alkaline' }) });
const run = await runtime.start('approval', null); // status: suspended

// ... the process restarts, a human approves hours later ...
await runtime.signal(run.id, 'decision', 'approved'); // resumes, completes

Features

Nine capabilities, every one tied to a measurable outcome.

Deterministic replay

Every effect routed through the context is recorded; on replay each step returns its stored result without re-running. Divergence detection catches a workflow that drifts from its history.

A crash mid-run resumes exactly where it stopped, with no step repeated and no state lost.
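To make the mechanism concrete, here is a minimal, self-contained sketch of cursor memoization with divergence detection. It is not Alkaline's source: ReplayContext, HistoryEvent, and replayDemo are hypothetical names, and the history is a plain array rather than a store cell. The only thing borrowed from the package is the ERR_DETERMINISM code.

```typescript
// Illustrative sketch only: hypothetical names, not Alkaline's implementation.
type HistoryEvent = { name: string; result: unknown };

class ReplayContext {
  private cursor = 0;
  private readonly history: HistoryEvent[];

  constructor(history: HistoryEvent[]) {
    this.history = history;
  }

  async step<T>(name: string, fn: () => Promise<T> | T): Promise<T> {
    const recorded = this.history[this.cursor];
    if (recorded !== undefined) {
      // Replaying: the step at this position must be the one that was recorded.
      if (recorded.name !== name) {
        throw Object.assign(
          new Error(`expected step "${recorded.name}" at position ${this.cursor}, got "${name}"`),
          { code: 'ERR_DETERMINISM' },
        );
      }
      this.cursor += 1;
      return recorded.result as T; // stored result; fn is never called
    }
    // Past the end of history: run the effect, record it, then return.
    const result = await fn();
    this.history.push({ name, result });
    this.cursor += 1;
    return result;
  }
}

// Run one handler twice against the same history, as a restart would.
async function replayDemo(): Promise<{ calls: number; first: string; second: string }> {
  const history: HistoryEvent[] = [];
  let calls = 0;
  const handler = async (ctx: ReplayContext): Promise<string> => {
    const sources = await ctx.step('search', () => { calls += 1; return 'sources'; });
    return ctx.step('summarize', () => { calls += 1; return `summary of ${sources}`; });
  };
  const first = await handler(new ReplayContext(history));
  const second = await handler(new ReplayContext(history)); // replay: no effect runs
  return { calls, first, second };
}
```

In this sketch the second pass returns the same summary while the effect counter stays at two, and a handler that asks for a different step name at a recorded position is rejected instead of silently re-running.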

Opt-in retries

A per-step, per-workflow, or per-runtime retry policy with exponential backoff and a retryable predicate. The default is a single attempt, so a non-idempotent step never repeats by surprise.

A transient tool-call failure recovers on its own; the workflow never notices.
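A retry policy of this shape can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the kernel's code: it assumes the delay doubles on each attempt starting from backoffMs, and withRetry, RetryPolicy, and the injected sleep function are hypothetical names. Alkaline's actual schedule may differ.

```typescript
// Illustrative sketch only: assumes delay = backoffMs * 2^(attempt - 1).
type RetryPolicy = {
  maxAttempts: number;
  backoffMs: number;
  retryable?: (error: unknown) => boolean;
};

async function withRetry<T>(
  fn: () => Promise<T>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let attempt = 1;
  for (;;) {
    try {
      return await fn();
    } catch (error) {
      const canRetry = policy.retryable ? policy.retryable(error) : true;
      // Give up when attempts are exhausted or the predicate rejects the error.
      if (attempt >= policy.maxAttempts || !canRetry) throw error;
      await sleep(policy.backoffMs * 2 ** (attempt - 1));
      attempt += 1;
    }
  }
}

// A step that fails twice, then succeeds. The fake sleep records delays instead of waiting.
async function retryDemo(): Promise<{ value: string; delays: number[] }> {
  const delays: number[] = [];
  let calls = 0;
  const flaky = async (): Promise<string> => {
    calls += 1;
    if (calls < 3) throw new Error('transient');
    return 'ok';
  };
  const value = await withRetry(flaky, { maxAttempts: 3, backoffMs: 500 }, async (ms) => {
    delays.push(ms);
  });
  return { value, delays };
}
```

With maxAttempts set to 1 the loop throws on the first failure, which mirrors the single-attempt default described above.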

Cycle detection

Native graph cycle detection via depth headers. A workflow that re-enters its own ancestry, or a child chain past the configured depth, fails fast with a clear error.

An agent that would loop forever stops at the first cycle, before the token bill climbs.
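The idea behind depth headers can be shown with a small sketch: each child call carries its ancestry and depth, and both are checked before the child starts. CallHeader and childHeader are hypothetical names, and tracking ancestry by workflow name is an assumption made for illustration; only the two error codes come from the package.

```typescript
// Illustrative sketch only: hypothetical header shape, not Alkaline's wire format.
type CallHeader = { ancestry: string[]; depth: number };

function childHeader(parent: CallHeader, workflow: string, maxDepth: number): CallHeader {
  // Re-entering any ancestor would form a cycle.
  if (parent.ancestry.includes(workflow)) {
    throw Object.assign(
      new Error(`"${workflow}" is already in the call chain: ${parent.ancestry.join(' > ')}`),
      { code: 'ERR_CYCLE_DETECTED' },
    );
  }
  // A chain may only grow to the configured depth.
  const depth = parent.depth + 1;
  if (depth > maxDepth) {
    throw Object.assign(new Error(`child depth ${depth} passes maxDepth ${maxDepth}`), {
      code: 'ERR_DEPTH_EXCEEDED',
    });
  }
  return { ancestry: [...parent.ancestry, workflow], depth };
}

// planner > researcher is fine; researcher calling planner again is refused.
const rootHeader: CallHeader = { ancestry: ['planner'], depth: 0 };
const researcherHeader = childHeader(rootHeader, 'researcher', 2);
```

Because the check happens before the child runs, the refusal costs no tokens.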

Token budget

Declare a budget and charge against it with ctx.spend; the execution halts the moment a charge would pass the declared limit.

A runaway agent halts instead of draining a balance; the cost ceiling is enforced, not hoped for.
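An enforced budget comes down to checking a charge before applying it. The sketch below is a standalone illustration under that reading: TokenBudget is a hypothetical class, not the object behind ctx.spend, and it treats a charge that lands exactly on the limit as allowed because only a charge that would pass the limit is refused.

```typescript
// Illustrative sketch only: hypothetical class, not Alkaline's budget implementation.
class TokenBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Apply a charge and return the remaining balance, or refuse it untouched.
  spend(tokens: number): number {
    if (this.spent + tokens > this.limit) {
      throw Object.assign(
        new Error(`charge of ${tokens} would pass the ${this.limit}-token budget`),
        { code: 'ERR_BUDGET_EXCEEDED' },
      );
    }
    this.spent += tokens;
    return this.limit - this.spent;
  }

  get total(): number {
    return this.spent;
  }
}

const budget = new TokenBudget(1000);
const afterFirstCall = budget.spend(600); // 400 tokens remain
```

A refused charge leaves the running total unchanged, so the recorded spend never exceeds the declared ceiling.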

Signals and lifecycle

waitForSignal, signal, query, pause, resume, and cancel drive a live or suspended execution from the outside.

Human-in-the-loop approval and pauses that last minutes or months are first-class, not bolted on.

continueAsNew

End and restart the same execution id with fresh input and an empty history, carrying a compact summary forward instead of replaying thousands of past events.

A long-horizon agent runs for days without its history growing without bound.

Swappable battery cells

A StateStore contract with four cells: in-memory, node:sqlite, and dependency-injected Postgres and Redis. Hot-swap or migrate at runtime.

Start in memory, recharge into a durable file or your own database, with no change to the workflow.

Multi-agent task board

A durable task board with explicit states, heartbeats, lease-based zombie reclaim, a cycle-checked dependency graph, and an event log: the npm equivalent of the Hermes Agent Kanban.

Many agents coordinate on shared work without a fragile in-process swarm.
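Lease-based zombie reclaim is easiest to see in code. The following is a minimal in-memory sketch, not the @takk/alkaline/board API: Task, claim, heartbeat, and reclaimZombies are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```typescript
// Illustrative sketch only: hypothetical names, not the board entry point's API.
type Task = {
  id: string;
  state: 'ready' | 'claimed' | 'done';
  owner: string | null;
  leaseExpiresAt: number; // epoch ms; meaningful only while claimed
};

// An agent claims a ready task and receives a lease it must keep renewing.
function claim(task: Task, owner: string, now: number, leaseMs: number): boolean {
  if (task.state !== 'ready') return false;
  task.state = 'claimed';
  task.owner = owner;
  task.leaseExpiresAt = now + leaseMs;
  return true;
}

// A heartbeat from the current owner extends the lease.
function heartbeat(task: Task, owner: string, now: number, leaseMs: number): boolean {
  if (task.state !== 'claimed' || task.owner !== owner) return false;
  task.leaseExpiresAt = now + leaseMs;
  return true;
}

// A claimed task whose lease has lapsed belongs to a dead agent:
// return it to the ready pool so another agent can pick it up.
function reclaimZombies(tasks: Task[], now: number): string[] {
  const reclaimed: string[] = [];
  for (const task of tasks) {
    if (task.state === 'claimed' && task.leaseExpiresAt <= now) {
      task.state = 'ready';
      task.owner = null;
      reclaimed.push(task.id);
    }
  }
  return reclaimed;
}
```

An agent that crashes simply stops sending heartbeats; no other agent has to detect the crash directly, because the lapsed lease is the signal.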

SLSA provenance

Every version is published with npm publish --provenance through GitHub Actions OIDC. The lockfile is committed, and zero runtime dependencies keep the runtime supply-chain surface to the package itself.

Verify in one command that the tarball you installed was built from the source commit you trust.

The battery

Four state-store cells, swap or recharge at runtime.

Pass a cell to createRuntime({ store }), or hot-swap with runtime.swapStore. All four are zero-dependency.

| Cell | How it persists | When to use it |
| --- | --- | --- |
| createMemoryStore | In-process maps, discarded on exit. | The zero-config default. Tests, demos, or a host that recharges into a durable cell on shutdown. |
| createSqliteStore | A single file via the built-in node:sqlite (Node 22.5+). | A single embedded agent or one self-hosted server. No server, no driver to install. |
| createPostgresStore | Parameterized SQL through the client you inject. | You already run Postgres and want durable state in it; Alkaline bundles no driver. |
| createRedisStore | Keys, lists, and hashes through the client you inject. | You already run Redis and want fast durable state in it; Alkaline bundles no driver. |

A custom cell is any object implementing the StateStore contract; migrateStore moves a workflow set between cells. Distributed multi-writer leasing is on the roadmap.

Entry points

Nine entry points, one for each job.

| Entry | Import | Use it for |
| --- | --- | --- |
| core | @takk/alkaline | createRuntime, defineWorkflow, the memory cell, clocks, the full type contract. |
| store | @takk/alkaline/store | The StateStore contract and migrateStore for custom or migrated cells. |
| sqlite | @takk/alkaline/sqlite | The node:sqlite durable cell, a single file (Node 22.5+). |
| postgres | @takk/alkaline/postgres | A Postgres cell over the client you inject. |
| redis | @takk/alkaline/redis | A Redis cell over the client you inject. |
| board | @takk/alkaline/board | The durable multi-agent task board. |
| replay | @takk/alkaline/replay | Inspect, format, and diff a past execution's durable trace. |
| mcp | @takk/alkaline/mcp | Wrap a Model Context Protocol tool call as a durable, budgeted step. |
| edge | @takk/alkaline/edge | The edge-safe surface, no Node built-in, the SQLite cell omitted. |

CLI

Inspect a durable store from the terminal.

The alkaline binary reads a SQLite store and prints what is inside it: every execution, and the full durable trace of any one. No library import required.

List the executions in a store

npx @takk/alkaline list --db ./agent.alkaline

Show one execution and its durable trace

npx @takk/alkaline show exe_... --db ./agent.alkaline

Or read the trace as JSON

npx @takk/alkaline show exe_... --db ./agent.alkaline --json | jq

Observability

Five lifecycle events. No OpenTelemetry runtime dependency.

Pass an onEvent listener to createRuntime. Every event is a typed object; the union RuntimeEvent is exported so you can branch on event.kind with full narrowing in TypeScript.

const runtime = createRuntime({
  store,
  onEvent(event) {
    switch (event.kind) {
      case 'started': return log.info({ run: event.execution, workflow: event.workflow });
      case 'signal': return log.info({ run: event.execution, signal: event.name });
      case 'suspended': return log.info({ run: event.execution, waiting: event.waiting });
      case 'completed': return metrics.increment('alkaline.completed');
      case 'failed': return alerts.notify(`${event.execution}: ${event.failure.message}`);
    }
  },
});

Meter every token charge

const runtime = createRuntime({
  store,
  meter: {
    record(execution, tokens, label) {
      metrics.histogram('alkaline.tokens', tokens, { execution, label });
    },
  },
});

Inspect an execution on demand

const record = await runtime.getExecution(run.id);
// {
//   id: 'exe_7Hq...',
//   workflow: 'research',
//   status: 'suspended', // running | suspended | completed | failed | paused | cancelled
//   attempt: 2, // replay passes from the top
//   spent: 1840, // tokens charged so far
//   cursorLength: 12, // recorded history events, the replay cursor
//   waiting: 'approval', // the signal it is parked on
//   updatedAt: 1748449183210
// }

The listener runs in process and must not throw. Alkaline makes zero network calls of its own, so nothing leaves your process unless you wire it to leave. The durable FailureInfo keeps the message, name, and code, but deliberately drops the stack trace so a thrown error cannot smuggle host paths into your history.
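The stack-dropping behavior can be pictured as a small conversion applied before an error is written to history. This is a sketch under assumptions, not the library's code: toFailureInfo is a hypothetical helper, and the field set (message, name, optional code) is taken from the description above.

```typescript
// Illustrative sketch only: hypothetical helper, not Alkaline's internal conversion.
type FailureInfo = { message: string; name: string; code?: string };

function toFailureInfo(error: unknown): FailureInfo {
  if (error instanceof Error) {
    const code = (error as Error & { code?: unknown }).code;
    // Copy only the safe fields; error.stack is deliberately left behind.
    return typeof code === 'string'
      ? { name: error.name, message: error.message, code }
      : { name: error.name, message: error.message };
  }
  // Non-Error throws (strings, numbers) still become a well-formed record.
  return { name: 'Error', message: String(error) };
}

const failure = toFailureInfo(
  Object.assign(new TypeError('model call failed'), { code: 'E_UPSTREAM' }),
);
```

Copying an explicit allow-list of fields, rather than deleting stack from the error, means any other host-specific property attached to the error is dropped too.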

Compare

Alkaline vs the alternatives.

Durable execution is a real category with strong tools. The contrast clarifies where Alkaline sits, and where it does not.

| Capability | Alkaline | DBOS | Temporal | Inngest | Hand-rolled |
| --- | --- | --- | --- | --- | --- |
| Distribution | npm library | npm library | server + SDK | SaaS + dev server | your repo |
| Runtime hop | in-process | in-process | external cluster | external service | in-process |
| Runtime dependencies | 0 | Postgres driver | many | SDK + service | varies |
| Pluggable durable store | memory / SQLite / PG / Redis | Postgres only | server-managed | managed | varies |
| Runs with no server | yes | needs Postgres | no | no | yes |
| Deterministic replay | yes, with divergence detection | checkpointed | yes | step-memoized | rarely |
| Agent-native primitives | budget, cycles, task board | no | no | no | no |
| License | Apache-2.0 | MIT | MIT | Apache-2.0 | your call |

The honest summary: pick Temporal if you want the category-defining engine and can run a cluster. Pick DBOS if you want an embeddable library and already standardize on Postgres. Pick Inngest or Vercel WDK if you want a managed service with a dashboard. Pick Alkaline when you embed durable execution inside an agent or NHE and want zero runtime dependencies, a cell you can swap, and the agent-native primitives (token budget, cycle detection, a task board) in the box.

Error codes

Every guardrail raises one typed error.

Each refusal is an AlkalineError carrying a stable code. Branch on the code, never on the message. Nine codes cover everything the kernel can refuse; anything else is an error you threw, preserved verbatim through the durable history.

| Code | Raised when | Guards against |
| --- | --- | --- |
| ERR_INVALID_INPUT | a workflow, signal, or option is malformed | bad input reaching the engine |
| ERR_NOT_FOUND | an execution, task, or record id does not exist | silent no-ops on missing state |
| ERR_DETERMINISM | a replay diverges from recorded history | non-deterministic handlers corrupting state |
| ERR_CYCLE_DETECTED | a child call or board link would form a cycle | runaway recursion between workflows |
| ERR_DEPTH_EXCEEDED | a child-workflow chain passes maxDepth | unbounded child fan-out |
| ERR_BUDGET_EXCEEDED | an execution spends past its token budget | cost blowouts inside an agent loop |
| ERR_CONFLICT | a duplicate execution id or concurrent-update clash | two writers racing the same record |
| ERR_STORE | a state-store cell raises an error | storage faults leaking as untyped errors |
| ERR_CLOSED | the runtime or a cell is used after close | use-after-close bugs |
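Branching on the code rather than the message might look like the sketch below. It is written structurally so it runs without the package: codeOf and actionFor are hypothetical helpers, the assumption that the code is exposed as a string property named code is ours, and the mapping from code to action is one possible policy, not a recommendation from the library.

```typescript
// Illustrative sketch only: assumes the error exposes its code as a `code` string property.
function codeOf(error: unknown): string | undefined {
  if (typeof error === 'object' && error !== null && 'code' in error) {
    const code = (error as { code: unknown }).code;
    return typeof code === 'string' ? code : undefined;
  }
  return undefined;
}

type Action = 'fix-handler' | 'stop-agent' | 'retry-later' | 'rethrow';

function actionFor(error: unknown): Action {
  switch (codeOf(error)) {
    case 'ERR_DETERMINISM':
      return 'fix-handler'; // the code drifted from recorded history
    case 'ERR_BUDGET_EXCEEDED':
    case 'ERR_CYCLE_DETECTED':
    case 'ERR_DEPTH_EXCEEDED':
      return 'stop-agent'; // a guardrail fired; retrying would repeat the refusal
    case 'ERR_CONFLICT':
    case 'ERR_STORE':
      return 'retry-later'; // possibly transient contention or storage trouble
    default:
      return 'rethrow'; // your own error, or a code this policy does not handle
  }
}

const action = actionFor(Object.assign(new Error('limit reached'), { code: 'ERR_BUDGET_EXCEEDED' }));
```

An error whose message merely mentions a code string falls through to the default branch, which is the point of matching on the code field.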
Quality and validation

The receipts behind v1.0.0.

Tests & coverage

69 tests passing across 13 suites under Vitest 4. Coverage: 88.3% lines, 88.3% statements, 92.8% functions, 72.9% branches. Run pnpm test on a fresh clone to reproduce.

Type safety

TypeScript 6 in maximum strict mode (exactOptionalPropertyTypes, useUnknownInCatchVariables, noUncheckedIndexedAccess, noImplicitOverride, noImplicitReturns). Zero errors under tsc --noEmit.

Lint & format

Biome 2 clean across src, tests, and examples. publint clean. Exports map is dual ESM + CJS with separate .d.ts and .d.cts per subpath.

Crash and resume

Validated end-to-end: a workflow suspended on a signal, the runtime dropped, then rebuilt from a SQLite cell and resumed from history to completion, with the durable trace byte-identical across the restart.

CLI smoke

Functional CLI smoke test that spawns the published binary against a real SQLite store and asserts list and show print every execution and its durable trace, with help and version intact and exit code 0.

Supply chain

Committed pnpm lockfile, supply-chain policy with minimum release age, SLSA provenance attestation on every published version. Verify with npm view @takk/alkaline@1.0.0 --json | jq .dist.attestations.

Roadmap

What is shipped, what is next, what is later.

Now (1.0)

Shipped in v1.0.0

  • Event-sourced deterministic replay
  • Opt-in retries and continueAsNew
  • Four cells: memory, SQLite, Postgres, Redis
  • Cycle detection and token budget
  • Multi-agent durable task board
  • Nine entry points and the alkaline CLI
  • Dual ESM + CJS, SLSA provenance

Next (1.1)

Targeted for 1.1

  • Parallel step groups (ctx.all)
  • Built-in cron and interval triggers
  • Encrypted-at-rest option for cells
  • IndexedDB cell for the browser
  • Replay timeline inspector

Later

On the horizon

  • Distributed lease for multi-process workers
  • First-class OpenTelemetry exporter (opt-in)
  • Adaptive scheduling learned from history
  • A managed control plane, optional, never required

FAQ

Common questions.

Is Alkaline production-ready?

Yes. Version 1.0.0 ships with 69 tests passing across 13 suites, 88% statement coverage, green on Node 20, 22, and 24, TypeScript 6 maximum strict mode, Biome 2 lint clean, publint and are-the-types-wrong clean across all entry points, and SLSA provenance on every release.

Why not just use Temporal, DBOS, or the Vercel Workflow DevKit?

Temporal is distributed durable execution that needs a cluster or Temporal Cloud. DBOS is general-purpose, Postgres-centric durable execution. Vercel, Inngest, and Cloudflare tie you to a platform. Alkaline is the agent-native, embeddable, zero-dependency option: you import it, it runs single-writer in your process, and it ships cycle detection, a token budget, and a multi-agent board built in.

Does Alkaline require runtime dependencies?

No. Every shipped bundle has zero runtime dependencies. The SQLite cell uses the built-in node:sqlite; the Postgres and Redis cells take a client you inject, so Alkaline bundles no database driver.

How does Alkaline guarantee a workflow resumes correctly after a crash?

It is event-sourced: every effect routed through the context is recorded, and on replay each recorded step returns its stored result without re-running, so the successful outcome is exactly-once. A step with a retry policy executes at least once. Keep non-determinism inside ctx.step, ctx.now, ctx.random, and ctx.uuid and replay is faithful, with divergence detection if the code drifts from history.
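Why the clock has to go through the context is easier to see with a sketch. RecordedClock below is a hypothetical stand-in for the idea behind ctx.now, not the kernel's code: the first run reads the real clock and logs the value, and a replay returns the logged value instead of reading the clock again.

```typescript
// Illustrative sketch only: hypothetical class showing a replay-safe time source.
class RecordedClock {
  private cursor = 0;
  private readonly log: number[];
  private readonly read: () => number;

  constructor(log: number[], read: () => number) {
    this.log = log;
    this.read = read;
  }

  now(): number {
    if (this.cursor < this.log.length) {
      // Replaying: hand back what the original run saw.
      const recorded = this.log[this.cursor] as number;
      this.cursor += 1;
      return recorded;
    }
    // First time through: read the real clock and record the value.
    const value = this.read();
    this.log.push(value);
    this.cursor += 1;
    return value;
  }
}

// A deadline computed from the clock, on a first run, a replay, and an unrecorded read.
function deadlineDemo(): { first: number; replayed: number; unrecorded: number } {
  let wallClock = 1000;
  const tick = (): number => (wallClock += 5000); // time moves on between reads
  const log: number[] = [];
  const first = new RecordedClock(log, tick).now() + 60000;
  const replayed = new RecordedClock(log, tick).now() + 60000;
  const unrecorded = tick() + 60000; // what a bare clock read would give on replay
  return { first, replayed, unrecorded };
}
```

The replayed deadline matches the original, while the unrecorded read produces a different value, which is the kind of drift divergence detection exists to catch.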

How does Alkaline stop an agent from looping forever?

Two guardrails are built in. Native cycle detection via depth headers fails a workflow that re-enters its own ancestry or passes the depth limit. An enforced token budget halts the execution the moment a charge would pass the declared limit, closing off the runaway-cost and denial-of-service vector the kernel exists to prevent.

Which durable cell should I use?

The memory cell for tests and ephemeral runs. The SQLite cell for single-process durability in one file. The Postgres or Redis cell, over a client you inject, when separate processes or services need to read the same durable state. Swap between them with runtime.swapStore and migrateStore without rewriting a workflow.

Can I run Alkaline across multiple processes against one store?

Alkaline is single-writer per execution in 1.0: one runtime advances a given execution at a time. Separate processes can own different executions over a shared Postgres or Redis cell. A distributed lease that lets many workers safely contend for the same execution is on the roadmap, deliberately held back until it is correct rather than shipped half-built.

Does it run in Cloudflare Workers, Vercel Edge, Bun, or Deno?

The core and the @takk/alkaline/edge surface run on any modern JavaScript runtime with no Node built-in required; the SQLite cell is omitted there. The SQLite cell itself needs node:sqlite on Node 22.5+. The Postgres and Redis cells run wherever the client you inject runs.

How do I verify a published version's provenance?

Every release is published with npm publish --provenance. Check the attestations with npm view @takk/alkaline@<version> --json | jq .dist.attestations. The attestation links the tarball you installed to the GitHub Actions workflow that built it from a specific source commit.

What is the policy on breaking changes?

Strict SemVer 2.0.0, starting from 1.0.0. The binding stability surface is documented in SPEC.md. Major bumps require a deprecation cycle; security fixes follow the disclosure flow in SECURITY.md.

Author

Built and maintained by David C Cavalcante.

David C Cavalcante

Founder, Takk Innovate Studio

Product Engineer, AI Engineer, ML Engineer, LLM Engineer, LLM Architect, AI Researcher. Builder of the @takk family of npm packages for Massive Intelligence (IM) infrastructure.

Alkaline is part of a planned portfolio of npm libraries targeting Massive Intelligence (IM) infrastructure for 2026 to 2030. Adjacent research by the author covers systemic intelligence frameworks (MAIC, HIM, NHE) published independently of this codebase, with research notes on PhilPapers and PhilArchive linked from the repository README.

If Alkaline saved an agent run this quarter, the most useful thing you can do is open a GitHub issue when you find an edge case the test suite missed. The runbook for releases, the threat model, and the contributor agreement all live in the repository.