Declarative pool + 7 strategies
Declare models once; route by cost-then-quality (default), cost-first,
quality-first, latency-first, weighted, round-robin, or
sequential-fallback. Custom logic via the RoutingStrategy interface.
Stop hand-picking a model per call site; change routing policy by editing one string, not
every request.
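Here is a minimal sketch of two of the strategies. It assumes each model is described by a cost and a quality score, and it reads cost-then-quality as "cheapest first, with higher quality breaking ties". ModelInfo, costThenQuality, and roundRobin are hypothetical names used only for illustration. They are not modelchain's RoutingStrategy interface.

```typescript
// Hypothetical model descriptor; field names are illustrative only.
interface ModelInfo {
  id: string;
  costPer1kTokens: number; // USD; lower is cheaper
  quality: number; // 0..1; higher is better
}

// Assumed reading of cost-then-quality: cheapest first, quality breaks ties.
function costThenQuality(models: readonly ModelInfo[]): ModelInfo[] {
  return [...models].sort(
    (a, b) => a.costPer1kTokens - b.costPer1kTokens || b.quality - a.quality,
  );
}

// Returns a picker that cycles through the pool in declaration order.
function roundRobin(models: readonly ModelInfo[]): () => ModelInfo {
  if (models.length === 0) throw new Error("empty model pool");
  let next = 0;
  return () => models[next++ % models.length];
}

const pool: ModelInfo[] = [
  { id: "large", costPer1kTokens: 0.01, quality: 0.95 },
  { id: "small-a", costPer1kTokens: 0.001, quality: 0.7 },
  { id: "small-b", costPer1kTokens: 0.001, quality: 0.8 },
];

const ranked = costThenQuality(pool).map((m) => m.id);
// ranked: ["small-b", "small-a", "large"]

const pick = roundRobin(pool);
const picks = [pick().id, pick().id, pick().id, pick().id];
// picks: ["large", "small-a", "small-b", "large"]
```

Because the policy is just a function over the pool, switching from one strategy to another changes a single call rather than every request.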
Pluggable scorers
Six built-in scorers (latency, token-budget, length-bound,
regex-match, exact-match, schema-valid) rate every response on a normalised
[0, 1] scale that feeds the next routing decision. Plug in an LLM-as-judge through the ScoringStrategy interface.
Routing reflects what actually came back, not a static guess that decays as providers
ship new models.
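The sketch below shows how several scorers can share one [0, 1] scale. The Scorer type, the linear latency curve, and averaging as the combination rule are all assumptions made for illustration. modelchain's built-in scorers and ScoringStrategy may work differently.

```typescript
// Hypothetical scorer shape; not modelchain's ScoringStrategy.
type Scorer = (response: { text: string; latencyMs: number }) => number;

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Latency: 1 at or below targetMs, falling linearly to 0 at maxMs (assumed curve).
function latencyScorer(targetMs: number, maxMs: number): Scorer {
  return ({ latencyMs }) => clamp01((maxMs - latencyMs) / (maxMs - targetMs));
}

// Regex match: 1 if the pattern occurs anywhere, else 0.
// String.prototype.search ignores the regex's lastIndex, so a /g flag is harmless.
function regexScorer(pattern: RegExp): Scorer {
  return ({ text }) => (text.search(pattern) !== -1 ? 1 : 0);
}

// Averaging keeps the combined score inside [0, 1].
function combine(scorers: Scorer[]): Scorer {
  if (scorers.length === 0) throw new Error("no scorers");
  return (r) => scorers.reduce((sum, s) => sum + s(r), 0) / scorers.length;
}

const score = combine([latencyScorer(500, 2000), regexScorer(/\bJSON\b/)]);
const s = score({ text: "Here is the JSON you asked for", latencyMs: 1250 });
// latency scores 0.5, regex scores 1, so s is 0.75
```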
Native streaming
router.stream({...}) returns an AsyncIterable<CompletionChunk>, built on Web Streams, that
normalises OpenAI deltas, Anthropic content_block_delta events, and Gemini
streamGenerateContent responses into one chunk type. Every stream ends with exactly one finish
chunk.
Write the streaming loop once; the same code consumes every provider with a reliable
terminal chunk for cost reconciliation.
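This sketch shows a consumer loop that works the same way for any provider. The Chunk union is a hypothetical stand-in for CompletionChunk, and fakeStream() stands in for router.stream({...}). Text accumulates from delta chunks, and usage is read exactly once, from the single finish chunk.

```typescript
// Hypothetical chunk shape; modelchain's CompletionChunk fields may differ.
type Chunk =
  | { type: "delta"; text: string }
  | { type: "finish"; inputTokens: number; outputTokens: number };

// Stand-in for router.stream({...}): whatever the provider, the chunks look alike.
async function* fakeStream(): AsyncIterable<Chunk> {
  yield { type: "delta", text: "Hello, " };
  yield { type: "delta", text: "world" };
  yield { type: "finish", inputTokens: 12, outputTokens: 3 };
}

// Written once: collect text from deltas, take usage from the one finish chunk.
async function collect(stream: AsyncIterable<Chunk>) {
  let text = "";
  let usage: { inputTokens: number; outputTokens: number } | undefined;
  for await (const chunk of stream) {
    if (chunk.type === "delta") {
      text += chunk.text;
    } else {
      usage = { inputTokens: chunk.inputTokens, outputTokens: chunk.outputTokens };
    }
  }
  if (!usage) throw new Error("stream ended without a finish chunk");
  return { text, usage };
}
```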
Native tool calling
Declare tools: ToolDefinition[] once; modelchain translates them into OpenAI tools,
Anthropic input_schema, and Gemini functionDeclarations, then parses each provider's reply
back into a normalised ToolCall[]. Works with both complete() and stream().
One tool schema, every provider; no per-provider function-calling shim to maintain.
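Here is a sketch of the translation in both directions. It assumes a hypothetical ToolDefinition made of a name, a description, and a JSON Schema parameters object. The three request shapes follow each provider's documented REST format: OpenAI function tools, Anthropic input_schema, and Gemini functionDeclarations. The helper names and the ToolCall type are illustrative. Gemini accepts only a subset of JSON Schema, and this sketch does not check for that.

```typescript
// Hypothetical shared definition; modelchain's ToolDefinition may differ.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema object
}

// Hypothetical normalised call.
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

// OpenAI: each tool is wrapped as { type: "function", function: {...} }.
function toOpenAI(tools: ToolDefinition[]) {
  return tools.map((t) => ({
    type: "function" as const,
    function: { name: t.name, description: t.description, parameters: t.parameters },
  }));
}

// Anthropic: the schema goes under input_schema.
function toAnthropic(tools: ToolDefinition[]) {
  return tools.map((t) => ({ name: t.name, description: t.description, input_schema: t.parameters }));
}

// Gemini: all declarations sit in one functionDeclarations array.
function toGemini(tools: ToolDefinition[]) {
  return [{
    functionDeclarations: tools.map((t) => ({
      name: t.name, description: t.description, parameters: t.parameters,
    })),
  }];
}

// OpenAI returns arguments as a JSON string...
function fromOpenAI(calls: { id: string; function: { name: string; arguments: string } }[]): ToolCall[] {
  return calls.map((c) => ({ id: c.id, name: c.function.name, args: JSON.parse(c.function.arguments) }));
}

// ...while Anthropic returns an already-parsed object in tool_use content blocks.
function fromAnthropic(content: { type: string; id?: string; name?: string; input?: unknown }[]): ToolCall[] {
  return content
    .filter((b) => b.type === "tool_use")
    .map((b) => ({ id: b.id ?? "", name: b.name ?? "", args: b.input }));
}

const weather: ToolDefinition = {
  name: "get_weather",
  description: "Look up the current weather for a city",
  parameters: { type: "object", properties: { city: { type: "string" } }, required: ["city"] },
};
const openaiTools = toOpenAI([weather]);
const anthropicTools = toAnthropic([weather]);
const geminiTools = toGemini([weather]);
const calls = fromOpenAI([{ id: "call_1", function: { name: "get_weather", arguments: '{"city":"Oslo"}' } }]);
```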
Vercel AI SDK adapter
toVercelAILanguageModel(router) returns a LanguageModelV2-compatible object,
typed structurally with no compile-time dependency on @ai-sdk/provider. Works with
generateText, streamText, and tool calling.
Drop measured routing into an existing Vercel AI SDK app without rewriting a single call
site.
Hard budget guard
Per-request, per-task, and daily (UTC) ceilings. A breach throws BudgetExceededError
before the network call ever happens.
No surprise invoice; a runaway loop hits a hard ceiling instead of the provider's billing
meter.
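Here is a sketch of a pre-flight guard. It assumes the caller can estimate a request's cost before sending it and records the actual cost afterwards. BudgetGuard, its Limits fields, and the scope field on the error are hypothetical. Only the name BudgetExceededError comes from modelchain.

```typescript
type Scope = "request" | "task" | "daily";

// Hypothetical error shape; modelchain's BudgetExceededError may carry different fields.
class BudgetExceededError extends Error {
  scope: Scope;
  constructor(scope: Scope, message: string) {
    super(message);
    this.name = "BudgetExceededError";
    this.scope = scope;
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

interface Limits {
  perRequestUsd: number;
  perTaskUsd: number;
  dailyUsd: number;
}

class BudgetGuard {
  private limits: Limits;
  private taskSpend = new Map<string, number>();
  private day = ""; // current UTC day as YYYY-MM-DD
  private daySpend = 0;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Reset the daily total when the UTC calendar day changes.
  private rollover(now: Date): void {
    const today = now.toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.daySpend = 0;
    }
  }

  // Call with the estimated cost BEFORE sending; throwing here means no network call is made.
  check(taskId: string, estimatedUsd: number, now = new Date()): void {
    this.rollover(now);
    const taskTotal = this.taskSpend.get(taskId) ?? 0;
    if (estimatedUsd > this.limits.perRequestUsd) {
      throw new BudgetExceededError("request", `estimate $${estimatedUsd} exceeds per-request limit`);
    }
    if (taskTotal + estimatedUsd > this.limits.perTaskUsd) {
      throw new BudgetExceededError("task", `task ${taskId} would exceed $${this.limits.perTaskUsd}`);
    }
    if (this.daySpend + estimatedUsd > this.limits.dailyUsd) {
      throw new BudgetExceededError("daily", `daily limit $${this.limits.dailyUsd} would be exceeded`);
    }
  }

  // Call AFTER the response with the actual cost.
  record(taskId: string, actualUsd: number, now = new Date()): void {
    this.rollover(now);
    this.taskSpend.set(taskId, (this.taskSpend.get(taskId) ?? 0) + actualUsd);
    this.daySpend += actualUsd;
  }
}

const guard = new BudgetGuard({ perRequestUsd: 0.5, perTaskUsd: 1, dailyUsd: 5 });
let blockedScope = "";
for (let i = 0; i < 3; i++) {
  try {
    guard.check("task-1", 0.4);
    guard.record("task-1", 0.4); // pretend each call cost exactly its estimate
  } catch (e) {
    if (e instanceof BudgetExceededError) blockedScope = e.scope;
    else throw e;
  }
}
// The third call is refused before sending: 0.8 + 0.4 exceeds the 1 USD task ceiling.
```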
Circuit breaker + retry + failover
Per-model breaker (closed -> open -> half-open, threshold 3, cooldown 30s),
full-jitter exponential backoff (max 3, base 250ms) honouring Retry-After, and automatic
failover to the next eligible model.
A single upstream outage produces zero user-facing failures as long as one model in the
pool is healthy.
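Here is a sketch of both mechanisms, using the numbers quoted above: the breaker opens after 3 failures and cools down for 30 s, and backoff starts from a 250 ms base. The class and function names are hypothetical. The breaker is simplified: it lets every request through while half-open, whereas many breakers admit only a single probe. Times are passed in explicitly so the state machine is easy to test.

```typescript
type State = "closed" | "open" | "half-open";

// Hypothetical per-model breaker: closed -> open -> half-open -> closed.
class CircuitBreaker {
  state: State = "closed";
  private failures = 0;
  private openedAt = 0;
  private threshold: number;
  private cooldownMs: number;

  constructor(threshold = 3, cooldownMs = 30_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // True if a request may be sent to this model at time `now` (ms).
  canRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // cooldown elapsed: allow a probe
    }
    return this.state !== "open";
  }

  onSuccess(): void {
    this.failures = 0; // failures therefore count consecutively
    this.state = "closed";
  }

  onFailure(now: number): void {
    this.failures += 1;
    // A failed probe, or reaching the threshold, (re)opens the breaker.
    if (this.state === "half-open" || this.failures >= this.threshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}

// Full jitter: wait a uniform random time in [0, base * 2^attempt),
// unless the server sent Retry-After, which takes precedence.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs = 250,
  random: () => number = Math.random,
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000; // delay-seconds form
    const at = Date.parse(retryAfter); // HTTP-date form
    if (!Number.isNaN(at)) return Math.max(0, at - Date.now());
  }
  return Math.floor(random() * baseMs * 2 ** attempt);
}

const breaker = new CircuitBreaker();
for (const t of [0, 100, 200]) breaker.onFailure(t); // third failure opens it at t = 200
const blockedDuringCooldown = !breaker.canRequest(10_000);
const probeAllowed = breaker.canRequest(30_200); // 30 s after opening: half-open
breaker.onSuccess(); // probe succeeded: closed again
```

When a breaker is open or retries run out, the router would move on to the next eligible model. That failover step is not shown here.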
EWMA health
Each model carries an exponentially weighted moving average (EWMA) health score that drops on failure and recovers on success,
feeding into the routing decision alongside cost and latency.
The pool quietly favours models that are actually working right now, with no curated
whitelist to maintain.
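Here is a sketch of the update rule. It assumes a success counts as 1, a failure counts as 0, and health is smoothed by a factor alpha: h ← alpha · outcome + (1 - alpha) · h. EwmaHealth and the default alpha of 0.2 are hypothetical choices. This sketch does not show modelchain's actual decay rate or how it weights health against cost and latency.

```typescript
// Hypothetical EWMA health tracker; alpha and the 0/1 encoding are assumptions.
class EwmaHealth {
  value: number;
  private alpha: number;

  constructor(alpha = 0.2, initial = 1) {
    this.alpha = alpha;
    this.value = initial; // new models start fully healthy
  }

  // h_new = alpha * outcome + (1 - alpha) * h_old
  record(success: boolean): number {
    const outcome = success ? 1 : 0;
    this.value = this.alpha * outcome + (1 - this.alpha) * this.value;
    return this.value;
  }
}

const health = new EwmaHealth(0.5);
health.record(false); // 0.5
health.record(false); // 0.25
health.record(true); // 0.625
```

A larger alpha reacts faster to recent outcomes. A smaller one remembers more history and ignores isolated blips.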
Telemetry
Thirteen in-process events (request.start/success/fail, model.selected,
stream.start/finish, circuit.open/half-open/closed, model.degraded,
budget.exhausted, score.recorded, all.exhausted), with no
OpenTelemetry dependency.
Drop events into the logger or metrics pipeline you already run; no new agent, no new
vendor, and no event payload ever carries an API key or prompt text.
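Here is a sketch of an in-process emitter typed over the thirteen event names listed above. The Telemetry class, its on/emit methods, and the payload fields are hypothetical. The payload deliberately carries only identifiers and numbers, in keeping with the no-keys, no-prompts rule.

```typescript
// The thirteen event names, as a string-literal union.
type EventName =
  | "request.start" | "request.success" | "request.fail"
  | "model.selected"
  | "stream.start" | "stream.finish"
  | "circuit.open" | "circuit.half-open" | "circuit.closed"
  | "model.degraded"
  | "budget.exhausted"
  | "score.recorded"
  | "all.exhausted";

// Hypothetical payload: identifiers and numbers only, never keys or prompt text.
interface TelemetryEvent {
  name: EventName;
  modelId?: string;
  latencyMs?: number;
  at: number; // epoch ms
}

type Listener = (event: TelemetryEvent) => void;

// A tiny in-process emitter with no OpenTelemetry dependency.
class Telemetry {
  private listeners: Listener[] = [];

  // Returns an unsubscribe function.
  on(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(event: TelemetryEvent): void {
    for (const l of this.listeners) l(event);
  }
}

const telemetry = new Telemetry();
const seen: string[] = [];
const off = telemetry.on((e) => seen.push(`${e.name}:${e.modelId ?? "-"}`));
telemetry.emit({ name: "model.selected", modelId: "small-b", at: Date.now() });
off();
telemetry.emit({ name: "request.fail", at: Date.now() }); // not recorded: listener removed
```

In practice the listener would forward each event to an existing logger or metrics client instead of an array.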
Six entry points, multi-runtime
Core, /providers, /web, /edge, /ai-sdk, and
/cli, each tree-shakeable, with no node:* imports in the universal core. Runs on
Node.js, Cloudflare Workers, Vercel Edge, Deno, Bun, and the browser.
Write once, ship to every runtime your stack touches without forking the integration.
Zero runtime deps
The core has no required runtime dependencies and weighs 5.6 KB brotli-compressed; the providers entry point is 4.1 KB and the AI SDK
adapter 1.2 KB. Every provider calls its REST API directly through the standard Fetch API.
A tiny, auditable dependency surface you can read end to end, with nothing to keep patched
but modelchain itself.
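This sketch shows what calling a provider over REST with nothing but the Fetch API looks like. It builds, but does not send, an OpenAI Chat Completions request using the standard Request class. The endpoint, headers, and body follow OpenAI's public API. The helper name is hypothetical, and the sketch does not show how modelchain's providers are structured internally.

```typescript
// Hypothetical helper: builds an OpenAI Chat Completions request with Web Fetch only.
// Request is a global in Node 18+, Deno, Bun, Cloudflare Workers, and browsers.
function openAIChatRequest(apiKey: string, model: string, prompt: string): Request {
  return new Request("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
}

const req = openAIChatRequest("sk-test", "gpt-4o-mini", "Hello");
// To send it: const res = await fetch(req); const data = await res.json();
```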
SLSA provenance
Every version is published with npm publish --provenance from GitHub Actions via OIDC, so each tarball carries a signed provenance attestation.
Lockfile committed; attw (Are the Types Wrong?) reports clean for all six entry points across all four module-resolution modes.
Verify in one command that the tarball you installed was built from the source commit you
trust.