Which HTTP status codes trigger automatic failover?

Transient classes are retried with backoff and then failed over: 429 rate-limited (honours Retry-After), 500/502/503/504 and any other 5xx server error (including Anthropic 529), 408/425 timeout, and network errors with no HTTP status. Terminal classes are not retried, so modelchain fails over immediately or throws: 401/403 unauthorized, and 400 or any other 4xx bad-request. When the whole pool is exhausted or circuit-open, modelchain throws AllModelsExhaustedError with the router snapshot attached.

Does modelchain work with the Vercel AI SDK, Edge runtimes, and the browser?

Yes. toVercelAILanguageModel(router) returns a LanguageModelV2-compatible object usable with generateText, streamText, and tool calling. The core, /edge, and /web entries use only Web Fetch and Web Streams with no node:* built-ins, so modelchain runs on Cloudflare Workers, Vercel Edge, Deno, Bun, and in the browser via a key resolver.

@takk/modelchain - v1.0.0 - Apache-2.0

Route every prompt to the right model.

Name: @takk/modelchain
Author: David C Cavalcante

You hold keys for OpenAI, Anthropic, and Gemini. Which one should answer this request? The cheapest that clears your quality bar? The fastest? The one that has not started failing yet? With modelchain you declare a pool of models once; it picks the best one per request by cost, latency, and observed quality, streams the tokens, normalises tool calls, scores the answer, and feeds that score into the next decision.


182 tests passing

5.6 KB core, brotli

0 runtime deps

7 routing strategies

What it is

One library, two answers.

In a few words

You give modelchain a list of models for one or more providers (OpenAI, Anthropic, Gemini, or any OpenAI-compatible HTTP endpoint), each with its cost and a key resolver. It hands each request to the model your strategy selects, retries with backoff on a transient failure, fails over to the next eligible model, and scores the response so the next decision is better-informed. You call router.complete({ prompt }) for a full answer, or iterate router.stream({ prompt }) for tokens as they arrive. The same package runs in Node, on the Edge, and in the browser.

Technically

modelchain is a routing layer that wraps a model registry behind a per-request loop: select, pre-flight budget check, dispatch, classify the error, retry or fail over, score. It ships seven pluggable routing strategies, six built-in scorers, a per-model circuit breaker with a closed -> open -> half-open state machine, full-jitter exponential backoff that honours upstream Retry-After, a hard budget guard, native streaming over Web Streams, normalised tool calling, a Vercel AI SDK adapter, and thirteen telemetry events. State persistence is pluggable.
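The breaker state machine fits in a few lines. This is a minimal sketch, not modelchain's implementation: the class name `CircuitBreakerSketch` and its methods are hypothetical, and it assumes the threshold-3 / 30-second-cooldown defaults quoted in the features list and that one successful probe closes a half-open circuit.

```typescript
// Minimal per-model circuit breaker: closed -> open -> half-open -> closed.
// Hypothetical sketch; defaults assume threshold 3 and a 30 s cooldown.
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreakerSketch {
  private state: BreakerState = 'closed';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 30_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // State at time `now` (ms); an open breaker becomes half-open once the cooldown elapses.
  stateAt(now: number): BreakerState {
    if (this.state === 'open' && now - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  // Closed and half-open breakers let a request through; an open breaker skips the model.
  canAttempt(now: number): boolean {
    return this.stateAt(now) !== 'open';
  }

  // Any success, including the half-open probe, closes the circuit and resets the counter.
  recordSuccess(): void {
    this.state = 'closed';
    this.consecutiveFailures = 0;
  }

  // A failed half-open probe reopens at once; otherwise the breaker trips at the threshold.
  recordFailure(now: number): void {
    const current = this.stateAt(now);
    this.consecutiveFailures += 1;
    if (current === 'half-open' || this.consecutiveFailures >= this.threshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}
```

Passing `now` in explicitly, instead of reading the clock, keeps every transition deterministic and easy to test.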

Static vs measured

The same pool, two routing philosophies.

Take a typical scenario: on launch day you wire three models into your app and hard-code "always call the cheap one, fall back to the expensive one." Three months later the providers have shipped new models, prices have moved, and one endpoint has quietly gotten slower.

Static routing that decays

Hard-coded rules, silently wrong by Q3

Day 0: You hand-write "if cheap fails, use expensive" and ship it.
Day 30: The "cheap" model starts truncating long answers, but the rule has no idea; it only checks for HTTP errors.
Day 45: A new, cheaper, better model launches. Nobody edits the rule.
Day 60: The "expensive" fallback's p50 latency doubles. It is still hard-coded as the fallback.
Day 75: Costs drift up and quality drifts down; no signal connects the two.
Day 90: An engineer rediscovers the rule, re-benchmarks by hand, edits the if, and redeploys.
The pool never learned anything. The next drift starts the cycle again.

Measured routing that adapts

Scored every call, adjusts on its own

Day 0: You declare the pool with strategy: 'cost-then-quality' and the latency and token-budget scorers.
Every call: modelchain selects the cheapest model that clears the quality floor and emits model.selected with the reason.
Every response: the scorers normalise quality to [0, 1] and score.recorded fires; a truncated answer lowers that model's quality score.
Adding a model: you append one line to the pool. Cold-start models are explored first to gather data.
Latency drifts: the EWMA health and latency-first signals notice, and the router stops favouring the slow endpoint.
A model degrades: repeated failures trip its circuit; circuit.open fires and the router skips the model until it recovers.
The golden routing suite locks every decision as a SemVer contract, so upgrades never silently change behaviour.

The "measured routing" decisions are the literal output locked by the golden routing suite (tests/golden/routing.test.ts) in the modelchain repository, validated front to back against the real Google Gemini REST API.

Install

Four steps from `install` to first route.

1. Add the package

pnpm add @takk/modelchain

npm install @takk/modelchain

yarn add @takk/modelchain

bun add @takk/modelchain

2. Install only the optional peers you use

modelchain calls every provider's REST API directly via Web Fetch and imports no SDK at runtime. The peers are optional and exist for richer types in your editor. The HTTP provider needs nothing.

pnpm add openai                # types for openaiModel(...)
pnpm add @anthropic-ai/sdk     # types only
pnpm add @google/genai         # types only
pnpm add ai @ai-sdk/provider   # for the Vercel AI SDK adapter

3. Declare a pool and route a prompt

import { createModelchain } from '@takk/modelchain';
import { anthropicModel, geminiModel, openaiModel } from '@takk/modelchain/providers';

const router = createModelchain({
  models: [
    openaiModel('gpt-4o-mini', {
      cost: { costPer1kInput: 0.00015, costPer1kOutput: 0.00060 },
      keys: process.env.OPENAI_API_KEY ?? '',
    }),
    anthropicModel('claude-3-5-haiku-latest', {
      cost: { costPer1kInput: 0.00080, costPer1kOutput: 0.00400 },
      keys: process.env.ANTHROPIC_API_KEY ?? '',
    }),
    geminiModel('gemini-2.0-flash', {
      cost: { costPer1kInput: 0.00010, costPer1kOutput: 0.00040 },
      keys: process.env.GEMINI_API_KEY ?? '',
    }),
  ],
  strategy: 'cost-then-quality',
  scoring: { built: ['latency', 'token-budget'] },
  budget: { perRequestUsd: 0.02, dailyUsd: 5 },
  telemetry: { enabled: true },
});

const response = await router.complete({ prompt: 'Summarise X in 3 bullets.', maxTokens: 200 });
console.log(response.text, response.finishReason, response.usage);

4. Or run it on the Edge with any OpenAI-compatible endpoint

import { createModelchain } from '@takk/modelchain/edge';
import { httpModel } from '@takk/modelchain/providers';

export default {
  async fetch(req: Request, env: { GROQ_API_KEY: string }): Promise<Response> {
    const { prompt } = (await req.json()) as { prompt: string };
    const router = createModelchain({
      models: [
        httpModel('llama-3.1-70b', {
          baseUrl: 'https://api.groq.com/openai/v1',
          cost: { costPer1kInput: 0.00059, costPer1kOutput: 0.00079 },
          keys: env.GROQ_API_KEY,
        }),
      ],
      strategy: 'cost-first',
      budget: { perRequestUsd: 0.005 },
    });
    const response = await router.complete({ prompt });
    return Response.json({ text: response.text });
  },
};

Features

Twelve capabilities, each tied to a measurable outcome.

Declarative pool + 7 strategies

Declare models once; route by cost-then-quality (default), cost-first, quality-first, latency-first, weighted, round-robin, or sequential-fallback. Custom logic via the RoutingStrategy interface.

Stop hand-picking a model per call site; change routing policy by editing one string, not every request.
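A custom strategy only has to choose one candidate per request. The sketch below approximates the default cost-then-quality policy through a hypothetical `RoutingStrategySketch` interface whose `select(candidates, request)` mirrors the method named in this document; the candidate fields, the blended-price formula, and the 0.7 quality floor are assumptions, not modelchain's exported types or defaults.

```typescript
// Hypothetical shapes modelled on the snapshot fields; the real RoutingStrategy types may differ.
interface CandidateSketch {
  id: string;
  cost: { costPer1kInput: number; costPer1kOutput: number };
  avgQualityScore: number | null; // null until a scorer has reported on this model
}

interface RouteRequestSketch {
  prompt: string;
}

interface RoutingStrategySketch {
  select(candidates: readonly CandidateSketch[], request: RouteRequestSketch): CandidateSketch | undefined;
}

// Blended price: the average of input and output price per 1k tokens.
const blendedPrice = (c: CandidateSketch): number =>
  (c.cost.costPer1kInput + c.cost.costPer1kOutput) / 2;

class CheapestAboveFloor implements RoutingStrategySketch {
  private readonly floor: number;

  constructor(floor = 0.7) {
    this.floor = floor;
  }

  select(candidates: readonly CandidateSketch[], _request: RouteRequestSketch): CandidateSketch | undefined {
    // 1. Explore: a model with no quality data yet goes first so it can gather some.
    const unseen = candidates.find((c) => c.avgQualityScore === null);
    if (unseen) return unseen;

    // 2. Exploit: the cheapest model whose observed quality clears the floor.
    const qualified = candidates
      .filter((c) => (c.avgQualityScore ?? 0) >= this.floor)
      .sort((a, b) => blendedPrice(a) - blendedPrice(b));
    if (qualified.length > 0) return qualified[0];

    // 3. Nothing clears the floor: fall back to the best observed quality.
    return [...candidates].sort((a, b) => (b.avgQualityScore ?? 0) - (a.avgQualityScore ?? 0))[0];
  }
}
```

Exploring unseen models first is what keeps a newly appended model from being starved of traffic before it has any quality data.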

Pluggable scorers

Six built-in scorers (latency, token-budget, length-bound, regex-match, exact-match, schema-valid) normalise every response to [0, 1] and feed the next routing decision. Plug LLM-as-judge via ScoringStrategy.

Routing reflects what actually came back, not a static guess that decays as providers ship new models.

Native streaming

router.stream({...}) returns an AsyncIterable<CompletionChunk> normalised across OpenAI deltas, Anthropic content_block_delta, and Gemini streamGenerateContent, over Web Streams. Always ends with exactly one finish chunk.

Write the streaming loop once; the same code consumes every provider with a reliable terminal chunk for cost reconciliation.

Native tool calling

Declare tools: ToolDefinition[] once; modelchain translates to OpenAI tools, Anthropic input_schema, and Gemini functionDeclarations, then parses the result back into a normalised ToolCall[]. Works on complete() and stream().

One tool schema, every provider; no per-provider function-calling shim to maintain.

Vercel AI SDK adapter

toVercelAILanguageModel(router) returns a LanguageModelV2-compatible object, typed structurally with no compile-time dependency on @ai-sdk/provider. Works with generateText, streamText, and tool calling.

Drop measured routing into an existing Vercel AI SDK app without rewriting a single call site.

Hard budget guard

Per-request, per-task, and daily (UTC) ceilings. A request that would breach a ceiling throws BudgetExceededError before the network call is made.

No surprise invoice; a runaway loop hits a hard ceiling instead of the provider's billing meter.
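A pre-flight guard only needs a worst-case cost estimate and a running daily total. In this sketch the estimate assumes the model spends its full `maxTokens`, the daily window resets at UTC midnight, and `BudgetGuardSketch` / `BudgetExceededErrorSketch` are hypothetical stand-ins for modelchain's own classes; the per-task ceiling is omitted for brevity.

```typescript
// Hypothetical pre-flight budget guard; modelchain's own BudgetExceededError and limits may differ.
interface PriceSketch {
  costPer1kInput: number;
  costPer1kOutput: number;
}

class BudgetExceededErrorSketch extends Error {
  readonly scope: 'per-request' | 'daily';
  constructor(scope: 'per-request' | 'daily', message: string) {
    super(message);
    this.name = 'BudgetExceededError';
    this.scope = scope;
  }
}

// Worst case: the model spends every output token that maxTokens allows.
function estimateMaxCostUsd(price: PriceSketch, inputTokens: number, maxTokens: number): number {
  return (inputTokens / 1000) * price.costPer1kInput + (maxTokens / 1000) * price.costPer1kOutput;
}

class BudgetGuardSketch {
  private spentTodayUsd = 0;
  private utcDay = '';
  private readonly perRequestUsd: number;
  private readonly dailyUsd: number;

  constructor(perRequestUsd: number, dailyUsd: number) {
    this.perRequestUsd = perRequestUsd;
    this.dailyUsd = dailyUsd;
  }

  // Reset the running total when the UTC calendar day changes.
  private rollDay(now: Date): void {
    const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    if (day !== this.utcDay) {
      this.utcDay = day;
      this.spentTodayUsd = 0;
    }
  }

  // Run before dispatch: throws instead of letting an over-budget request reach the network.
  check(estimateUsd: number, now: Date = new Date()): void {
    this.rollDay(now);
    if (estimateUsd > this.perRequestUsd) {
      throw new BudgetExceededErrorSketch('per-request', `estimate $${estimateUsd} exceeds per-request $${this.perRequestUsd}`);
    }
    if (this.spentTodayUsd + estimateUsd > this.dailyUsd) {
      throw new BudgetExceededErrorSketch('daily', `estimate $${estimateUsd} would exceed daily $${this.dailyUsd}`);
    }
  }

  // Run after the response: add the actual cost computed from reported usage.
  record(actualUsd: number, now: Date = new Date()): void {
    this.rollDay(now);
    this.spentTodayUsd += actualUsd;
  }
}
```

With the gpt-4o-mini prices from the install example, 1,000 input tokens and `maxTokens: 200` give a worst case of 1 × 0.00015 + 0.2 × 0.00060 = $0.00027, well inside a $0.02 per-request ceiling.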

Circuit breaker + retry + failover

Per-model breaker (closed -> open -> half-open, threshold 3, cooldown 30s), full-jitter exponential backoff (max 3, base 250ms) honouring Retry-After, and automatic failover to the next eligible model.

A single upstream outage produces zero user-facing failures as long as one model in the pool is healthy.
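Full jitter draws each retry delay uniformly from zero up to an exponentially growing ceiling, which keeps many clients from retrying in lockstep. The sketch below assumes the 250 ms base and three retries quoted above, adds a hypothetical 10 s cap, and lets a parsed Retry-After value override the jittered delay; `parseRetryAfterMs`, `backoffDelayMs`, and `withRetries` are illustrative names, not modelchain exports.

```typescript
// Hypothetical retry helpers; base 250 ms and 3 retries follow the text, the 10 s cap is an assumption.

// Retry-After is either delta-seconds ("2") or an HTTP-date; returns milliseconds, or undefined.
function parseRetryAfterMs(header: string | null, nowMs: number): number | undefined {
  if (header === null) return undefined;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

interface BackoffOptionsSketch {
  baseMs?: number;
  capMs?: number;
  retryAfterMs?: number | undefined;
  random?: () => number;
}

// Full jitter: a uniform random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, opts: BackoffOptionsSketch = {}): number {
  const { baseMs = 250, capMs = 10_000, retryAfterMs, random = Math.random } = opts;
  if (retryAfterMs !== undefined) return retryAfterMs; // the upstream's explicit hint wins
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Retries transient failures with backoff; rethrows terminal errors or the last transient one.
async function withRetries<T>(
  call: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isTransient(err) || attempt >= maxRetries) throw err; // the caller then fails over
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

When `withRetries` rethrows, the caller moves on to the next eligible model, which is where failover takes over from retry.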

EWMA health

Each model carries an exponentially-weighted health score that decays on failure and recovers on success, folding into the routing decision alongside cost and latency.

The pool quietly favours models that are actually working right now, with no curated whitelist to maintain.
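An exponentially weighted moving average needs just one number of state per model. This sketch assumes success counts as 1, failure as 0, a smoothing factor of 0.2, and a starting score of 1; modelchain's actual constants, and how it weighs health against cost and latency, are not specified here.

```typescript
// EWMA health: h <- alpha * outcome + (1 - alpha) * h, with outcome 1 on success and 0 on failure.
// Hypothetical constants: alpha = 0.2 and a starting score of 1.
class EwmaHealthSketch {
  private value: number;
  private readonly alpha: number;

  constructor(alpha = 0.2, initial = 1) {
    this.alpha = alpha;
    this.value = initial;
  }

  get score(): number {
    return this.value;
  }

  // Folds one outcome into the running score and returns the new value.
  record(success: boolean): number {
    const outcome = success ? 1 : 0;
    this.value = this.alpha * outcome + (1 - this.alpha) * this.value;
    return this.value;
  }
}
```

With alpha at 0.2, two consecutive failures take a healthy model from 1 to 0.64, and each later success recovers a fifth of the remaining gap (0.64 to 0.712 after one success).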

Telemetry

Thirteen in-process events (request.start/success/fail, model.selected, stream.start/finish, circuit.open/half-open/closed, model.degraded, budget.exhausted, score.recorded, all.exhausted), zero OpenTelemetry dependency.

Drop events into the logger or metrics pipeline you already run; no new agent, no new vendor, and no event ever leaks a key or a prompt.

Six entry points, multi-runtime

Core, /providers, /web, /edge, /ai-sdk, and /cli, each tree-shakeable, with no node:* imports in the universal core. Runs on Node, Cloudflare Workers, Vercel Edge, Deno, Bun, and the browser.

Write once, ship to every runtime your stack touches without forking the integration.

Zero runtime deps

The core has no required runtime dependencies and weighs 5.6 KB brotli; providers 4.1 KB; the AI SDK adapter 1.2 KB. Every provider calls REST directly via Web Fetch.

A tiny, auditable dependency surface you can read end to end, with nothing to keep patched but modelchain itself.

SLSA provenance

Every published version is released with npm publish --provenance through GitHub Actions OIDC, which attaches a signed SLSA provenance attestation. Lockfile committed; attw clean for all six entry points across all four resolution modes.

Verify with one command (npm audit signatures) that the tarball you installed was built from the source commit you trust.

Routing strategies

Seven built-in strategies, all opt-in by name.

Pass strategy: '<name>' in the config, or implement RoutingStrategy with a select(candidates, request) method for custom logic.

Strategy	How it picks	When to use it
`cost-then-quality`	Cheapest model that meets a configurable quality floor; cold-start models are explored first to gather data.	Default. You want the cheapest answer that is still good enough.
`cost-first`	Cheapest model by averaged input/output price, ignoring quality.	Cost is the only thing that matters and any model in the pool is acceptable.
`quality-first`	Highest observed quality, falling back to health score on cold start.	Quality matters more than cost; spend to get the best answer.
`latency-first`	Lowest observed average latency; unseen models are tried first.	Interactive paths where time-to-first-byte dominates.
`weighted`	Random pick proportional to each model's declared `weight` (negative weights clamp to 0).	Canary rollouts; send a fixed share of traffic to a new model.
`round-robin`	Cyclic across eligible models in registration order.	Even distribution across models with similar quotas.
`sequential-fallback`	Always the first eligible model in declaration order; the rest are pure fallbacks.	A primary model with a deterministic fallback chain behind it.

The decisions in tests/golden/routing.test.ts lock every strategy's behaviour as part of the v1.0.0 SemVer contract; a change to any of them is a major version bump.
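The `weighted` strategy is a roulette-wheel pick. In this sketch, `pickWeighted` is a hypothetical helper that clamps negative weights to 0 as the table describes, returns `undefined` when no model has positive weight (an assumption about edge-case handling), and takes an injectable random source so that a 90/10 canary split can be tested deterministically.

```typescript
// Hypothetical weighted pick: a roulette wheel over clamped weights.
interface WeightedModelSketch {
  id: string;
  weight: number;
}

function pickWeighted(
  models: readonly WeightedModelSketch[],
  random: () => number = Math.random,
): WeightedModelSketch | undefined {
  const weights = models.map((m) => Math.max(0, m.weight)); // negative weights clamp to 0
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (total === 0) return undefined; // no model has positive weight

  // Walk the cumulative weights until the random threshold falls inside one model's slice.
  let threshold = random() * total;
  for (let i = 0; i < models.length; i++) {
    threshold -= weights[i] ?? 0;
    if (threshold < 0) return models[i];
  }
  // Floating-point rounding can leave a tiny remainder; fall back to the last positive-weight model.
  for (let i = models.length - 1; i >= 0; i--) {
    if ((weights[i] ?? 0) > 0) return models[i];
  }
  return undefined;
}
```

A zero-weight slice is never selected by the main loop, so clamping a weight to 0 is enough to take a model out of rotation without removing it from the pool.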

Scorers

Six ways to measure a response, then feed it back.

Scorers run after every successful response, normalise their judgment to [0, 1], feed into per-model quality state, and emit a score.recorded telemetry event. Pass them by name via scoring.built, or plug your own through scoring.custom.

Built-in	What it measures	Use it when
`latency`	Time-to-first-byte against a target in milliseconds.	Latency is a quality dimension you want routing to react to.
`token-budget`	Efficient use of `maxTokens`; flags truncation.	You want to penalise models that hit the token ceiling and cut answers short.
`length-bound`	Character-length sanity check on the output.	Responses should fall inside a known size band.
`regex-match`	Structural check via a supplied regular expression.	The answer must match a known shape or contain a required token.
`exact-match`	Comparison against `request.metadata.expected`.	You have a reference answer and want exactness scored.
`schema-valid`	Top-level JSON schema: required keys plus primitive types.	You expect structured JSON and want malformed output to lower the score.

A custom ScoringStrategy (an LLM-as-judge, an external eval service) plugs in through scoring.custom and participates in the same feedback loop.
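A scorer only has to map a response to a number in [0, 1]. The sketch assumes a hypothetical `ScoringStrategySketch` shape with a `name` and a `score(response, request)` method, plus a hypothetical `metadata.requiredTerms` field on the request. It also shows how an LLM-as-judge rating on a 1 to 10 scale could be rescaled before it enters the feedback loop.

```typescript
// Hypothetical scorer shapes; modelchain's ScoringStrategy interface may differ.
interface ScoredResponseSketch {
  text: string;
}

interface ScoredRequestSketch {
  prompt: string;
  metadata?: { requiredTerms?: string[] };
}

interface ScoringStrategySketch {
  readonly name: string;
  score(response: ScoredResponseSketch, request: ScoredRequestSketch): number | Promise<number>;
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Fraction of required terms that appear in the answer (case-insensitive).
const requiredTermsScorer: ScoringStrategySketch = {
  name: 'required-terms',
  score(response, request) {
    const terms = request.metadata?.requiredTerms ?? [];
    if (terms.length === 0) return 1; // nothing required, nothing to penalise
    const text = response.text.toLowerCase();
    const hits = terms.filter((t) => text.includes(t.toLowerCase())).length;
    return clamp01(hits / terms.length);
  },
};

// An LLM-as-judge typically rates on its own scale (here 1 to 10); rescale it into [0, 1].
const normaliseRating = (rating: number, min = 1, max = 10): number =>
  clamp01((rating - min) / (max - min));
```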

Streaming

Tokens as they arrive, normalised across providers.

Each provider emits its own wire format: OpenAI SSE deltas, Anthropic content_block_delta events, and Gemini streamGenerateContent JSON chunks. modelchain normalises all three into a single CompletionChunk discriminated union over the Web Streams API. The stream always ends with exactly one finish chunk carrying usage and finishReason, so consumers can rely on it for budget reconciliation.

for await (const chunk of router.stream({ prompt: 'Tell me a story.' })) {
  if (chunk.type === 'text-delta') process.stdout.write(chunk.delta);
  if (chunk.type === 'tool-call-delta') console.log('\nTool call:', chunk.toolCall);
  if (chunk.type === 'finish') console.log('\nDone:', chunk.finishReason, chunk.usage);
}

An AbortSignal passed on the request halts an in-flight stream within one chunk of the next provider yield; it never deadlocks.
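A consumer that relies on the single-finish-chunk guarantee can treat a missing finish, or anything arriving after it, as a protocol error. This sketch uses a hypothetical `CompletionChunkSketch` union modelled on the three chunk types in the loop above (the `usage` field names are assumptions), and checks the AbortSignal as each chunk arrives, so an aborted request stops within one chunk.

```typescript
// Hypothetical chunk union modelled on the text-delta / tool-call-delta / finish chunks above.
type CompletionChunkSketch =
  | { type: 'text-delta'; delta: string }
  | { type: 'tool-call-delta'; toolCall: unknown }
  | { type: 'finish'; finishReason: string; usage: { inputTokens: number; outputTokens: number } };

type FinishChunkSketch = Extract<CompletionChunkSketch, { type: 'finish' }>;

interface CollectedStreamSketch {
  text: string;
  toolCallDeltas: unknown[];
  finish: FinishChunkSketch;
}

// Drains a normalised stream, enforcing exactly one trailing finish chunk and honouring an AbortSignal.
async function collectStream(
  stream: AsyncIterable<CompletionChunkSketch>,
  signal?: AbortSignal,
): Promise<CollectedStreamSketch> {
  let text = '';
  const toolCallDeltas: unknown[] = [];
  let finish: FinishChunkSketch | undefined;

  for await (const chunk of stream) {
    signal?.throwIfAborted(); // leaving the loop also calls return() on the source iterator
    if (finish) throw new Error('protocol violation: chunk after finish');
    switch (chunk.type) {
      case 'text-delta':
        text += chunk.delta;
        break;
      case 'tool-call-delta':
        toolCallDeltas.push(chunk.toolCall);
        break;
      case 'finish':
        finish = chunk;
        break;
    }
  }
  if (!finish) throw new Error('protocol violation: stream ended without a finish chunk');
  return { text, toolCallDeltas, finish };
}
```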

Tool calling

One tool shape across every provider.

Declare a tool once as a normalised ToolDefinition. modelchain translates it to each provider's native shape (OpenAI tools with function entries, Anthropic tools with input_schema, Gemini functionDeclarations) and parses the response (tool_calls, tool_use blocks, functionCall parts) back into a normalised ToolCall[]. Available on both complete() and stream().

const result = await router.complete({
  prompt: 'What is the weather in Tokyo?',
  tools: [
    {
      name: 'get_weather',
      description: 'Get the current weather in a city.',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string', description: 'City name' } },
        required: ['city'],
      },
    },
  ],
});
// result.toolCalls -> [{ id: 'call_1', name: 'get_weather', arguments: { city: 'Tokyo' } }]
// result.finishReason -> 'tool-calls'
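Translating one tool definition into each provider's native shape is mostly renaming. The sketch below follows the wire formats named above (OpenAI `function` entries, Anthropic `input_schema`, Gemini `functionDeclarations`) and the reverse direction for parsing calls back. The `...Sketch` types and the synthesised Gemini call id are hypothetical, and a production translator may also need to strip JSON Schema keywords that Gemini does not accept.

```typescript
// Hypothetical normalised types; field names follow the ToolCall example above.
interface ToolDefinitionSketch {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // a JSON Schema object
}

interface ToolCallSketch {
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}

// OpenAI Chat Completions: tools: [{ type: 'function', function: { name, description, parameters } }]
const toOpenAITool = (t: ToolDefinitionSketch) => ({
  type: 'function' as const,
  function: { name: t.name, description: t.description, parameters: t.parameters },
});

// Anthropic Messages: tools: [{ name, description, input_schema }]
const toAnthropicTool = (t: ToolDefinitionSketch) => ({
  name: t.name,
  description: t.description,
  input_schema: t.parameters,
});

// Gemini: tools: [{ functionDeclarations: [{ name, description, parameters }] }]
const toGeminiTools = (tools: readonly ToolDefinitionSketch[]) => [
  {
    functionDeclarations: tools.map((t) => ({
      name: t.name,
      description: t.description,
      parameters: t.parameters,
    })),
  },
];

// OpenAI tool_calls carry arguments as a JSON-encoded string.
const fromOpenAI = (c: { id: string; function: { name: string; arguments: string } }): ToolCallSketch => ({
  id: c.id,
  name: c.function.name,
  arguments: JSON.parse(c.function.arguments) as Record<string, unknown>,
});

// Anthropic tool_use content blocks carry an already-parsed input object.
const fromAnthropic = (b: { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> }): ToolCallSketch => ({
  id: b.id,
  name: b.name,
  arguments: b.input,
});

// Gemini functionCall parts carry name and args; this sketch synthesises an id from the part index.
const fromGemini = (p: { functionCall: { name: string; args?: Record<string, unknown> } }, index: number): ToolCallSketch => ({
  id: `call_${index}`,
  name: p.functionCall.name,
  arguments: p.functionCall.args ?? {},
});
```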

Vercel AI SDK

Drop modelchain into any Vercel AI SDK app.

toVercelAILanguageModel(router) implements the LanguageModelV2 contract structurally, with no compile-time dependency on @ai-sdk/provider. Its doGenerate and doStream methods emit the full V2 content and stream lifecycle, so it works with generateText, streamText, and tool-using flows.

import { generateText, streamText } from 'ai';
import { toVercelAILanguageModel } from '@takk/modelchain/ai-sdk';
import { createModelchain } from '@takk/modelchain';
import { openaiModel } from '@takk/modelchain/providers';

const router = createModelchain({
  models: [
    openaiModel('gpt-4o-mini', {
      cost: { costPer1kInput: 0.00015, costPer1kOutput: 0.00060 },
      keys: process.env.OPENAI_API_KEY ?? '',
    }),
  ],
});

const model = toVercelAILanguageModel(router);

// One-shot generation.
const { text } = await generateText({ model, prompt: 'Hello.' });

// Streaming: streamText returns immediately; iterate its textStream.
const result = streamText({ model, prompt: 'Tell me a story.' });
for await (const delta of result.textStream) process.stdout.write(delta);

Providers

Four factories cover everything.

Import from @takk/modelchain/providers. Every factory calls the provider's REST API directly via Web Fetch; no SDK is loaded at runtime.

Provider	Factory	Use it when
OpenAI	`openaiModel(...)`	Any OpenAI chat-completions endpoint. Also Groq, Together, DeepSeek, OpenRouter, Fireworks, and Mistral La Plateforme — same shape, pass `baseUrl`.
Anthropic	`anthropicModel(...)`	The Claude family via the Messages API. Treats `529 Overloaded` as transient.
Gemini	`geminiModel(...)`	Google's Generative Language REST API.
HTTP (generic)	`httpModel(...)`	Any other JSON endpoint via `buildRequest` + `parseResponse` (and an optional `parseStream`) callbacks.
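For the generic provider, the two callbacks are the whole integration: one builds the upstream HTTP request, the other maps the JSON body back. The sketch below is hypothetical throughout. The function names echo `buildRequest` and `parseResponse`, but the argument shapes, the `/generate` path, the default of 256 tokens, and the `{ input, max_tokens }` / `{ output, usage }` wire format are invented for illustration and are not modelchain's signatures.

```typescript
// Entirely hypothetical callback shapes for a generic JSON endpoint; not modelchain's signatures.
interface HttpCallRequestSketch {
  prompt: string;
  maxTokens?: number;
}

interface HttpCallResultSketch {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// buildRequest: turn a normalised request into fetch() arguments for the upstream.
function buildRequestSketch(
  req: HttpCallRequestSketch,
  baseUrl: string,
  apiKey: string,
): { url: string; init: RequestInit } {
  return {
    url: `${baseUrl}/generate`, // hypothetical path
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ input: req.prompt, max_tokens: req.maxTokens ?? 256 }),
    },
  };
}

// parseResponse: map the upstream JSON body back into a normalised result, rejecting unknown shapes.
function parseResponseSketch(json: unknown): HttpCallResultSketch {
  if (typeof json !== 'object' || json === null) throw new Error('upstream body is not a JSON object');
  const body = json as {
    output?: unknown;
    usage?: { prompt_tokens?: number; completion_tokens?: number };
  };
  if (typeof body.output !== 'string') throw new Error('unexpected upstream response shape');
  return {
    text: body.output,
    inputTokens: body.usage?.prompt_tokens ?? 0,
    outputTokens: body.usage?.completion_tokens ?? 0,
  };
}
```

Failing loudly on an unexpected shape matters here: a thrown error can be classified and failed over, whereas a silently empty answer would be scored as a bad response.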

CLI proxy mode

Same code path, no library import required.

When you cannot or do not want to embed modelchain in your code, run it as a local HTTP proxy. The proxy loads a config file, dispatches through the same router the library exposes, and returns the upstream response unchanged. It also ships a benchmark subcommand.

Start the proxy

# author modelchain.config.js, then:
npx @takk/modelchain start --port 8788

Send requests through the proxy

curl -X POST http://localhost:8788/complete \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Hi in 5 words."}'

curl -X POST http://localhost:8788/stream \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"Tell a 50-word story."}'

Inspect live router state, or benchmark a pool

# from another terminal while the proxy is running
curl http://localhost:8788/__modelchain_inspect | jq

# benchmark the pool over N requests
npx @takk/modelchain bench --requests 10 --prompt 'Summarise: AI is text in, text out.'

Observability

Thirteen events. No OpenTelemetry runtime dependency.

Subscribe with router.on((event) => ...); the returned function unsubscribes. Every event is a typed object and the union TelemetryEvent is exported, so you branch on event.type with full narrowing in TypeScript. No event ever contains an API key, a prompt, or a response body.

// log, metrics, alerts, and dashboards stand in for your own logging, metrics, and alerting clients.
const unsubscribe = router.on((event) => {
  switch (event.type) {
    case 'model.selected': log.info({ modelId: event.modelId, reason: event.reason }); break;
    case 'request.success': metrics.histogram('modelchain.latency', event.latencyMs); break;
    case 'request.fail': log.warn({ modelId: event.modelId, classification: event.classification }); break;
    case 'stream.start': log.info({ modelId: event.modelId }); break;
    case 'stream.finish': metrics.add('modelchain.cost', event.costUsd); break;
    case 'circuit.open': alerts.notify(`Model ${event.modelId} circuit OPEN`); break;
    case 'budget.exhausted': alerts.notify(`Budget exhausted: ${event.scope}`); break;
    case 'score.recorded': dashboards.record(event.modelId, event.scorer, event.score); break;
    case 'all.exhausted': alerts.notify(`Pool exhausted: ${event.reason}`); break;
  }
});

// Later, for example on shutdown:
unsubscribe();

Router snapshot on demand

const snapshot = router.inspect();
// {
//   strategy: 'cost-then-quality',
//   totalRequests: 1284,
//   totalStreams: 96,
//   totalFailures: 3,
//   totalCostUsd: 0.428117,
//   budget: { perRequestUsd: 0.02, dailyUsd: 5, spentTodayUsd: 0.43, remainingTodayUsd: 4.57 },
//   models: [
//     { id: 'gpt-4o-mini', providerName: 'http',
//       circuitState: 'closed', healthScore: 0.98,
//       inFlight: 0, successCount: 642, failureCount: 1,
//       consecutiveFailures: 0, cooldownUntil: 0,
//       lastUsedAt: 1748449183210, avgLatencyMs: 612,
//       avgQualityScore: 0.86, totalCostUsd: 0.21 },
//     ...
//   ]
// }

The snapshot carries only aggregated operational metadata. The raw API key value never appears in a snapshot, never appears in a telemetry event, and never reaches the optional file state backend.

Compare

modelchain vs the alternatives.

The other tools in this space solve adjacent problems well. The contrast clarifies where modelchain sits.

Capability	modelchain	LangChain	LiteLLM	Portkey	Hand-rolled
Distribution	npm library	npm / pip framework	pip + sidecar	SaaS / hosted	your repo
Runtime hop	in-process	in-process	network hop	network hop	in-process
Surface area	one router	large framework	gateway	gateway	varies
Runtime dependencies	0	many	many	n/a (hosted)	varies
Measured routing feedback	yes, scorers	no	partial	partial	rarely
Embeddable in Edge runtimes	yes, no node:*	partial	no	via fetch only	varies
Native streaming + tool calling	yes, normalised	yes	yes	yes	rarely
Vercel AI SDK adapter	yes	no	no	via SDK	no
SaaS lock-in	none	none	none	yes	none
SLSA provenance on releases	yes	partial	partial	n/a	no
License	Apache-2.0	MIT	MIT	commercial	your call

The honest summary: pick LangChain if you want a full framework with chains, agents, and loaders. Pick LiteLLM or Portkey if you genuinely want a gateway you operate or buy. Pick modelchain if you want one measurable router embedded in any JavaScript runtime rather than a service you run.

Transient vs terminal

The exact failure classes modelchain fails over on.

The classifier is deterministic and documented. Transient classes are retried with backoff and then failed over; terminal classes are not retried, so modelchain fails over immediately or throws.

Signal	Class	modelchain response
HTTP 429 Too Many Requests	`rate-limited` (honours `Retry-After`)	retry + fail over
HTTP 500 / 502 / 503 / 504 and any 5xx	`server-error` (includes Anthropic 529)	retry + fail over
HTTP 408 Request Timeout / 425 Too Early	`timeout`	retry + fail over
Connection failure, no HTTP status (DNS, reset, abort, `fetch failed`)	`network`	retry + fail over
HTTP 401 / 403	`unauthorized` (terminal)	fail over immediately, no retry
HTTP 400 and other 4xx	`bad-request` (terminal)	fail over immediately, no retry
Per-model circuit tripped (3 consecutive failures)	circuit breaker	skip model for the 30s cooldown
A per-request / per-task / daily ceiling would be breached	budget guard (pre-flight)	throw `BudgetExceededError` before the network call
Every eligible model failed terminally or is circuit-open	orchestrator	throw `AllModelsExhaustedError` with the router snapshot

Every error extends ModelchainError (carrying a stable code); ProviderError additionally carries classification and an optional status. All carry native Error.cause.
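Because the classifier is deterministic, the table above reduces to a short function. This sketch mirrors its rows; `classifyStatus` and `isTransientClass` are illustrative names rather than modelchain exports, and mapping statuses the table does not mention to `bad-request` is an assumption.

```typescript
// Hypothetical status classifier mirroring the transient vs terminal table.
type ClassificationSketch =
  | 'rate-limited'
  | 'server-error'
  | 'timeout'
  | 'network'
  | 'unauthorized'
  | 'bad-request';

function classifyStatus(status: number | undefined): ClassificationSketch {
  if (status === undefined) return 'network'; // DNS failure, reset, abort, "fetch failed"
  if (status === 429) return 'rate-limited';
  if (status === 408 || status === 425) return 'timeout';
  if (status >= 500 && status <= 599) return 'server-error'; // includes Anthropic 529 Overloaded
  if (status === 401 || status === 403) return 'unauthorized';
  return 'bad-request'; // 400, every other 4xx, and anything unexpected in this sketch
}

// Transient classes are retried and failed over; terminal ones fail over immediately.
const isTransientClass = (c: ClassificationSketch): boolean =>
  c === 'rate-limited' || c === 'server-error' || c === 'timeout' || c === 'network';
```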

Quality and validation

The receipts behind v1.0.0.

Tests & coverage

182 tests across 12 suites, all passing under Vitest 4. Coverage: 76.04% lines, 75.37% statements, 79.90% functions, 59.77% branches. Run pnpm test on a fresh clone to reproduce.

Golden routing contract

The golden suite (tests/golden/routing.test.ts) locks every strategy's decision as part of the SemVer contract. Any change to a routing decision affecting an unchanged consumer is a major version bump.

Type safety

TypeScript 6 in maximum strict mode (strict, noUncheckedIndexedAccess, exactOptionalPropertyTypes, useUnknownInCatchVariables). Zero errors under tsc --noEmit.

Packaging

Biome 2 lint clean. publint clean. attw clean for every entry point: all six entries across all four resolution modes (node10, node16 CJS, node16 ESM, bundler).

Bundle weight

Brotli-compressed: core 5.6 KB, providers 4.1 KB, Vercel AI SDK adapter 1.2 KB. Zero required runtime dependencies. Engines: Node >= 20, CI on Node 20 / 22 / 24.

Live validation & provenance

Validated front to back against the real Gemini REST API: routing, HTTP, normalisation, scoring, and error classification run end to end. Every release is published with --provenance (SLSA attestation by GitHub Actions OIDC).

Roadmap

What is shipped, what is next, what is later.

Now (1.0)

Shipped in v1.0.0

Seven strategies, six scorers, four providers
Native streaming + native tool calling
Vercel AI SDK adapter
Budget guard, circuit breaker, retry, failover
Memory and file state backends
Thirteen telemetry events
CLI proxy + inspect + bench
Six entry points, dual ESM + CJS, SLSA provenance

Next (1.1)

Targeted for 1.1

Redis / KV state backends
First-class OpenTelemetry exporter
Dedicated per-provider adapters (Groq, Together, DeepSeek, OpenRouter, Mistral, Fireworks)
@takk/modelchain-pricing registry

Before 1.1, v1.0.1 will push coverage to 85% and add a live happy-path test.

Later (1.2)

On the horizon

Image and audio multimodal inputs
Vision-language model adapters

FAQ

Common questions.

Is modelchain production-ready at 1.0.0?

Yes. 182 tests across 12 suites pass under Vitest 4 with 76% line coverage; TypeScript 6 maximum strict mode is clean; Biome 2 lint is clean; publint is clean; attw is clean for all six entry points across all four resolution modes. Every published release carries SLSA provenance produced by GitHub Actions. The library has been validated front to back against the real Gemini REST API: the routing, HTTP, normalisation, scoring, and error-classification pipeline runs end to end.

Why not just use LangChain?

LangChain is a broad framework; modelchain does one thing: measurable routing. There is no chain abstraction, no agent runtime, and no document loaders to learn. You call createModelchain({ models }).complete({ prompt }) and you are done. The mental model is Prisma, not LangChain.

Why not just run a gateway like LiteLLM or Portkey?

Those are services you run as a sidecar or buy as SaaS. modelchain is a library you pnpm add — embeddable in any Node, Bun, Deno, Edge, or browser process, with no extra container, no extra hop, and no vendor lock-in. When you do want a proxy, npx @takk/modelchain start runs the same code path inside a local HTTP server.

Does this work in Cloudflare Workers, Vercel Edge, Bun, Deno, or the browser?

Yes. The core, the /edge entry, and the /web entry use only Web Fetch and Web Streams, with no node:* built-ins. Provider factories call the REST endpoints directly, so they run anywhere fetch exists. In the browser, pass a key resolver that fetches a short-lived token from your server instead of embedding a raw key.

Does modelchain require runtime dependencies?

No. The core has zero runtime dependencies. Provider SDKs (openai, @anthropic-ai/sdk, @google/genai) and the Vercel adapter peers (ai, @ai-sdk/provider) are optional peer dependencies, present only for richer types in your editor. modelchain calls every provider's public REST API directly via Web Fetch and imports no SDK at runtime.

How does modelchain handle my API keys?

Keys are produced by a key resolver you supply (a string, or a sync/async function) and are sent only to the upstream you configured. No telemetry event and no router.inspect() snapshot ever contains an API key, a prompt, or a response body. The optional FileStateBackend persists only aggregated per-model metadata and cost counters; the raw key value never reaches disk.
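A key resolver that accepts either a string or a function can be normalised with one `typeof` check. The sketch below also shows a browser-style resolver that caches a short-lived token until shortly before it expires. `resolveKey`, `cachedTokenResolver`, the `{ token, expiresAt }` response shape, and the 5-second refresh margin are all hypothetical.

```typescript
// A key source is a plain string or a sync/async function that returns one (hypothetical type).
type KeySourceSketch = string | (() => string | Promise<string>);

async function resolveKey(source: KeySourceSketch): Promise<string> {
  return typeof source === 'function' ? await source() : source;
}

// Browser-style resolver: reuse a short-lived token until skewMs before it expires, then refetch.
function cachedTokenResolver(
  fetchToken: () => Promise<{ token: string; expiresAt: number }>,
  now: () => number = Date.now,
  skewMs = 5_000,
): () => Promise<string> {
  let cached: { token: string; expiresAt: number } | undefined;
  return async () => {
    if (!cached || now() >= cached.expiresAt - skewMs) {
      cached = await fetchToken();
    }
    return cached.token;
  };
}
```

In a browser you might pass `cachedTokenResolver(async () => (await fetch('/api/llm-token')).json())`, where `/api/llm-token` is a hypothetical endpoint on your own server that mints the short-lived token.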

What happens when every model in the pool is exhausted at the same time?

modelchain throws an AllModelsExhaustedError with the full router snapshot attached (per-model counts, circuit states, cooldown timestamps). Catch it at the boundary of your application and decide what to surface to the user. The library never silently drops the request.

How does it differ from a static router?

A static router applies fixed rules that decay as providers ship new models and as observed latency and quality drift. modelchain measures every response with pluggable scorers and feeds the score back into the next routing decision, so the pool adapts on its own.

Where does the router state live?

In-process memory by default, discarded on router.close(). For persistence, pass a StateBackend; the shipped Node-only FileStateBackend writes an aggregated snapshot (per-model metadata plus spentTodayUsd and the UTC day) to disk. Redis and KV backends are on the 1.1 roadmap.

Does it work with the Vercel AI SDK?

Yes. toVercelAILanguageModel(router) returns a LanguageModelV2-compatible object usable with generateText, streamText, and tool calling. The adapter is typed structurally, with no compile-time dependency on @ai-sdk/provider.

Can I plug in my own routing strategy or scorer?

Yes. Implement RoutingStrategy with a select(candidates, request) method and pass the instance as strategy, or implement ScoringStrategy and pass it through scoring.custom. An LLM-as-judge or an external eval service is a few lines of code.

What is the policy on breaking changes?

Strict SemVer 2.0.0, starting from 1.0.0. The binding stability surface is documented in SPEC.md section 5; the golden routing suite is the SemVer guard for decision semantics. Major bumps require a deprecation cycle; security fixes follow the disclosure flow in SECURITY.md.

Author

Built and maintained by David C Cavalcante.

David C Cavalcante

Founder, Takk Innovate Studio

Product Engineer, AI Engineer, ML Engineer, LLM Engineer, LLM Architect, AI Researcher. Builder of the @takk family of NPM packages for AI-native infrastructure.

modelchain is part of a planned portfolio of NPM libraries targeting AI-native infrastructure for 2026 to 2030. Adjacent research by the author covers systemic intelligence frameworks (MAIC, HIM, NHE) published independently of this codebase, with research notes on PhilPapers and PhilArchive linked from the repository README.

If modelchain saved you a model-selection rewrite this quarter, the most useful thing you can do is open a GitHub issue when you find an edge case the test suite missed. The runbook for releases, the threat model, and the contributor agreement all live in the repository.
