@takk/keymesh - v1.0.0 - Apache-2.0

Stop losing API requests to rate limits.

Author: David C Cavalcante

Your application calls OpenAI, Anthropic, or Gemini and one key hits 429. The request fails. The retry hits the same key. The user waits. Your on-call gets paged. keymesh rotates a pool of keys, opens a circuit on the failing one, and retries on the next eligible key with backoff. The user never sees the 429.

145 tests passing

93% line coverage

0 runtime deps

SLSA provenance

What it is

One library, two answers.

In plain English

You give keymesh a list of API keys for the same service (OpenAI, Anthropic, Gemini, or any HTTP endpoint). It hands each request to a key from the pool. When a key fails or rate-limits, keymesh moves the next request to another key. After a few consecutive failures, it stops using that key for a while. When the cooldown expires, it tries the key again. Your code does not change shape: you keep calling client.chat.completions.create() exactly as before.

Technically

A generic orchestration layer that wraps any provider client through a deep proxy mirroring the wrapped SDK shape, attaches eight telemetry events, and runs a per-request loop of pick, dispatch, classify error, retry or rotate. Pluggable selection strategies, per-key circuit breaker with closed -> open -> half-open state machine, AWS full-jitter exponential backoff with total time budget, respect for upstream Retry-After. State persistence is pluggable too.

Before and after

The same incident, two timelines.

Take a real production scene: your job sends 240 requests per minute to OpenAI, you hold three keys provisioned across three projects, and one of them hits its per-minute quota.

Without keymesh

Manual rotation, fire-fighting on call

09:14:01 Key #1 returns 429 Too Many Requests.
09:14:01 Your retry, after exponential backoff, hits Key #1 again with the same payload.
09:14:03 Second 429. Third 429. Fourth.
09:14:08 User-facing request fails with Request timeout.
09:14:09 Alerting fires. On-call investigates.
09:23:00 Engineer manually disables Key #1, restarts the job with Key #2.
09:31:00 Key #1's quota window resets, but nobody remembers to re-enable it.

With keymesh

Silent rotation, circuit recovers itself

09:14:01 Key #1 returns 429. keymesh classifies it as transient.
09:14:01 The same request retries on Key #2. Success in under 200ms.
09:14:01 key.rotated event fires. Your logger records from=#1 to=#2.
09:14:04 After three consecutive failures on Key #1, the per-key circuit opens for 30 seconds.
09:14:34 Cooldown elapses. Next eligible pick lands on Key #1 in half-open.
09:14:34 The trial request succeeds. circuit.closed fires. Key #1 is back in the pool.
The user never saw a delay. The on-call was never paged.

The "With keymesh" timeline follows the event sequence recorded by the live integration test in the keymesh repository, which runs against the real Google Gemini API with seven keys.

Install

Five minutes from `install` to first rotation.

1. Add the package

pnpm add @takk/keymesh

npm install @takk/keymesh

yarn add @takk/keymesh

bun add @takk/keymesh

2. Install the SDK you actually use

Adapters use optional peer dependencies. Skip this if you only need the generic HTTP adapter.

pnpm add openai             # for @takk/keymesh/openai
pnpm add @anthropic-ai/sdk  # for @takk/keymesh/anthropic
pnpm add @google/genai      # for @takk/keymesh/gemini

3. Quickstart with the OpenAI SDK

import { createKeymesh } from '@takk/keymesh';
import { openaiAdapter } from '@takk/keymesh/openai';

const client = createKeymesh({
  provider: openaiAdapter,
  keys: process.env.OPENAI_API_KEYS?.split(',') ?? [],
  strategy: 'least-used',
  circuitBreaker: { threshold: 3, cooldownMs: 30_000 },
  retry: { max: 5, baseMs: 200, jitter: true },
  telemetry: { enabled: true },
});

// Drop-in: identical to the underlying SDK shape.
const response = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Hello.' }],
});

4. Quickstart with any HTTP endpoint

import { createKeymesh } from '@takk/keymesh';
import { httpAdapter } from '@takk/keymesh/http';

const tavily = createKeymesh({
  provider: httpAdapter({
    baseUrl: 'https://api.tavily.com',
    authHeader: (key) => ({ Authorization: `Bearer ${key}` }),
  }),
  keys: process.env.TAVILY_API_KEYS?.split(',') ?? [],
  strategy: 'round-robin',
});

const result = await tavily.post('/search', { query: 'AI infrastructure 2026' });

Features

Nine capabilities, every one tied to a measurable outcome.

Key rotation

Pluggable selection across a pool, with four built-in strategies and a typed SelectorStrategy interface for custom logic.

Spread the same workload across N keys and lift the effective rate-limit ceiling by roughly a factor of N, provided each key carries its own quota.

Automatic failover

Transient errors (408, 425, 429, 500, 502, 503, 504, plus Anthropic 529 and network resets) trigger an instant rotation to the next eligible key.

User-facing failures caused by a single failing key drop to near zero as long as at least one other key in the pool is healthy.

Per-key circuit breaker

Three-state machine (closed, open, half-open) with configurable threshold and cooldown; opens after consecutive failures, recovers on first half-open success.

A failing key stops eating retry budget after a handful of attempts and rejoins the pool on its own once it recovers.
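
To make the three-state machine concrete, here is a minimal sketch of a per-key breaker. It is not keymesh's internal code: the class and method names are hypothetical, and only `threshold` and `cooldownMs` come from the config shown in the quickstart. It also assumes that a failed half-open trial reopens the circuit immediately, which the text above does not state.

```typescript
// Hypothetical sketch of a per-key circuit breaker; not keymesh's actual implementation.
type CircuitState = 'closed' | 'open' | 'half-open';

interface BreakerOptions {
  threshold: number;  // consecutive failures before the circuit opens
  cooldownMs: number; // how long an open circuit keeps the key out of the pool
}

class KeyCircuit {
  private state: CircuitState = 'closed';
  private consecutiveFailures = 0;
  private cooldownUntil = 0;
  private readonly opts: BreakerOptions;

  constructor(opts: BreakerOptions) {
    this.opts = opts;
  }

  /** Current state; an open circuit becomes half-open once the cooldown has elapsed. */
  stateAt(now: number): CircuitState {
    if (this.state === 'open' && now >= this.cooldownUntil) {
      this.state = 'half-open';
    }
    return this.state;
  }

  /** A key may be picked unless its circuit is still open. */
  isEligible(now: number): boolean {
    return this.stateAt(now) !== 'open';
  }

  /** Any success closes the circuit and resets the failure streak. */
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = 'closed';
  }

  /** Opens on reaching the threshold, or (assumption) on a failed half-open trial. */
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    const trialFailed = this.state === 'half-open';
    if (trialFailed || this.consecutiveFailures >= this.opts.threshold) {
      this.state = 'open';
      this.cooldownUntil = now + this.opts.cooldownMs;
    }
  }
}
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the state machine deterministic and easy to test.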

Smart retry

AWS full-jitter exponential backoff with a total time budget, honouring upstream Retry-After in both numeric-seconds and HTTP-date form.

Predictable tail latency under load instead of synchronised retry storms hammering the same upstream window.
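
The retry timing described above can be sketched in a few functions. This is an illustration under stated assumptions, not the library's code: `capMs`, `budgetMs`, and all function names are hypothetical, and only `baseMs` appears in the quickstart config. Full jitter here means a delay drawn uniformly between zero and the capped exponential ceiling.

```typescript
// Hypothetical sketch of full-jitter backoff with Retry-After and a total time budget.

/** Uniform delay in [0, min(capMs, baseMs * 2^attempt)). */
function fullJitterDelay(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

/** Parses Retry-After as numeric seconds or an HTTP-date; returns milliseconds or null. */
function parseRetryAfter(header: string | null, now: number): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

/** Picks the next delay, or null when waiting would exceed the total budget. */
function nextDelay(
  attempt: number,
  elapsedMs: number,
  budgetMs: number,
  retryAfterMs: number | null,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number | null {
  // An upstream hint, when present, takes precedence over the computed backoff.
  const delay = retryAfterMs ?? fullJitterDelay(attempt, baseMs, capMs, random);
  return elapsedMs + delay > budgetMs ? null : delay;
}
```

Injecting `random` makes the jitter reproducible in tests while defaulting to `Math.random` in normal use.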

Health scoring

Each key carries a 0-to-100 health score that decays on failure with a configurable half-life and recovers on success.

The pool quietly favours keys that are actually working, without you maintaining a curated whitelist.
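
One possible reading of the scoring rule, sketched below, is that each failure adds a penalty which then fades exponentially with the configured half-life, while a success removes part of the remaining penalty. The exact formula keymesh uses is not given here, so every name and constant in this sketch (`hit`, `relief`, `HealthState`) is an assumption for illustration.

```typescript
// Hypothetical sketch of a 0-to-100 health score with half-life decay.
interface HealthState {
  penalty: number;   // 0 (healthy) to 100 (worst)
  updatedAt: number; // epoch milliseconds of the last update
}

/** Penalty remaining at `now`, halving every `halfLifeMs`. */
function decayedPenalty(state: HealthState, now: number, halfLifeMs: number): number {
  const elapsed = Math.max(0, now - state.updatedAt);
  return state.penalty * 0.5 ** (elapsed / halfLifeMs);
}

/** A failure adds `hit` points of penalty on top of whatever has not yet decayed. */
function recordFailure(state: HealthState, now: number, halfLifeMs: number, hit = 25): HealthState {
  const penalty = Math.min(100, decayedPenalty(state, now, halfLifeMs) + hit);
  return { penalty, updatedAt: now };
}

/** A success removes `relief` points of penalty immediately. */
function recordSuccess(state: HealthState, now: number, halfLifeMs: number, relief = 10): HealthState {
  const penalty = Math.max(0, decayedPenalty(state, now, halfLifeMs) - relief);
  return { penalty, updatedAt: now };
}

/** Score in [0, 100]; higher is healthier. */
function healthScore(state: HealthState, now: number, halfLifeMs: number): number {
  return 100 - decayedPenalty(state, now, halfLifeMs);
}
```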

Telemetry

Eight in-process events (request.start/success/fail, key.rotated, circuit.open/closed/half-open, all.exhausted), zero OpenTelemetry dependency.

Drop events into the logger or metrics pipeline you already run; no new agent, no new vendor.

Auth-failure cooldown

A key that returns 401 Unauthorized is cooled for 24 hours on the assumption that the credential itself is invalid.

A revoked or rotated key never burns retry budget; you fix it on your schedule, not the request loop's.

Pluggable state

Memory backend by default; opt-in file backend persists only the hashed id and operational counters. Raw key value never reaches disk.

Counter durability across restarts without trading away credential hygiene.

SLSA provenance

Every published version signed with npm publish --provenance through GitHub Actions OIDC. Lockfile committed; supply-chain policy enforces minimum release age on new dependency versions.

Verify in one command that the tarball you installed was built from the source commit you trust.

Selection strategies

Four built-in strategies, all opt-in by name.

Pass strategy: '<name>' in the config, or implement SelectorStrategy for custom logic.

Strategy	How it picks	When to use it
`round-robin`	Cycles through eligible keys in registration order.	Default baseline. Even distribution across keys with similar quotas.
`weighted`	Pick probability proportional to the `weight` declared on each `KeyConfig`.	Some keys have higher quotas than others (paid tier alongside free tier).
`least-used`	Lowest in-flight count; ties broken by total usage then last-used timestamp.	Keys share a quota window. Maximises throughput without overloading any one key.
`sequential-then-rotate`	Always tries the first eligible key in registration order; rotates only when that key becomes ineligible.	One preferred key plus several fallbacks. Stay on the preferred key until it fails.

A cost-aware strategy is planned for 1.1; it requires per-key cost-tier metadata.
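
As an illustration of the least-used rule in the table above, the sketch below orders eligible keys by in-flight count, then total usage, then last-used timestamp. The `KeyStats` shape and function names are hypothetical, and preferring the oldest `lastUsedAt` on a full tie is an assumption.

```typescript
// Hypothetical sketch of least-used selection; not keymesh's actual implementation.
interface KeyStats {
  id: string;
  inFlight: number;   // requests currently outstanding on this key
  totalUsed: number;  // lifetime request count
  lastUsedAt: number; // epoch milliseconds
}

/** Negative when `a` should be preferred over `b`. */
function compareLeastUsed(a: KeyStats, b: KeyStats): number {
  return (
    a.inFlight - b.inFlight ||
    a.totalUsed - b.totalUsed ||
    a.lastUsedAt - b.lastUsedAt // assumption: the least recently used key wins a full tie
  );
}

/** Returns the preferred key, or undefined when no key is eligible. */
function pickLeastUsed(eligible: readonly KeyStats[]): KeyStats | undefined {
  let best: KeyStats | undefined;
  for (const key of eligible) {
    if (best === undefined || compareLeastUsed(key, best) < 0) {
      best = key;
    }
  }
  return best;
}
```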

Adapters

Drop-in for the SDK you already use.

Adapter	Subpath export	Peer dependency	Use it when
OpenAI	`@takk/keymesh/openai`	`openai >= 4`	You import `openai` directly and want rotation without touching call sites.
Anthropic	`@takk/keymesh/anthropic`	`@anthropic-ai/sdk >= 0.30`	Same, for the official Anthropic SDK. Recognises HTTP 529 Overloaded as transient.
Gemini	`@takk/keymesh/gemini`	`@google/genai >= 0.7`	Same, for Google's generative AI SDK.
HTTP (generic)	`@takk/keymesh/http`	none	Any REST endpoint (Tavily, Serper, Stripe, GitHub API, internal microservice).

CLI proxy mode

Same code path, no library import required.

When you cannot or do not want to embed keymesh in your code, run it as a local HTTP proxy. The proxy strips inbound auth headers, dispatches through the same orchestrator the library exposes, and returns the upstream response unchanged.

Start the proxy

# With keys in the environment
OPENAI_API_KEYS=key1,key2,key3 npx @takk/keymesh start \
  --port 8787 \
  --adapter openai \
  --strategy round-robin

Send requests through the proxy

curl http://localhost:8787/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4.1","messages":[{"role":"user","content":"Hi"}]}'

Inspect live pool state

# from another terminal while the proxy is running
curl http://localhost:8787/__keymesh_inspect | jq

# or via the CLI, against a persisted state file
npx @takk/keymesh inspect --state-file .keymesh-state.jsonl

Observability

Eight events. No OpenTelemetry runtime dependency.

Subscribe directly on the wrapped client. Every event is a typed object; the union TelemetryEvent is exported so you can branch on event.type with full narrowing in TypeScript.

client.on('request.start', (e) => log.info({ keyId: e.keyId, path: e.path }));
client.on('request.success', (e) => metrics.histogram('keymesh.latency', e.elapsedMs));
client.on('request.fail', (e) => log.warn({ keyId: e.keyId, status: e.status, error: e.error }));
client.on('key.rotated', (e) => log.info({ from: e.from, to: e.to, reason: e.reason }));
client.on('circuit.open', (e) =>
  alerts.notify(`Key ${e.keyId} circuit OPEN until ${new Date(e.cooldownUntil).toISOString()}`));
client.on('circuit.closed', (e) => log.info({ keyId: e.keyId }));
client.on('circuit.half-open', (e) => log.info({ keyId: e.keyId }));
client.on('all.exhausted', (e) => alerts.notify(`Pool exhausted: ${e.reason}`));
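
Branching on `event.type` with narrowing might look like the sketch below. The union here is a hand-written stand-in covering four of the eight events, not the exported `TelemetryEvent` type; the field names follow the handlers above, while the field types are assumptions.

```typescript
// Hypothetical stand-in for part of the exported TelemetryEvent union.
type TelemetryEventSketch =
  | { type: 'request.fail'; keyId: string; status?: number; error: unknown }
  | { type: 'key.rotated'; from: string; to: string; reason: string }
  | { type: 'circuit.open'; keyId: string; cooldownUntil: number }
  | { type: 'all.exhausted'; reason: string };

/** Formats one event as a log line; each case sees only that variant's fields. */
function formatEvent(e: TelemetryEventSketch): string {
  switch (e.type) {
    case 'request.fail':
      return `key ${e.keyId} failed with status ${e.status ?? 'n/a'}`;
    case 'key.rotated':
      return `rotated ${e.from} -> ${e.to} (${e.reason})`;
    case 'circuit.open':
      return `circuit open for ${e.keyId} until ${new Date(e.cooldownUntil).toISOString()}`;
    case 'all.exhausted':
      return `pool exhausted: ${e.reason}`;
  }
}
```

Because the switch is exhaustive over the union, adding a new variant without a matching case becomes a compile-time error rather than a silent gap in logging.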

Pool snapshot on demand

const snapshot = client.inspect();
// {
//   strategy: 'least-used',
//   totalRequests: 1284,
//   totalFailures: 3,
//   keys: [
//     { id: 'a1b2c3d4', label: 'key-a1b2c3d4',
//       circuitState: 'closed', healthScore: 100,
//       inFlight: 0, successCount: 642, failureCount: 1,
//       consecutiveFailures: 0, cooldownUntil: 0,
//       lastUsedAt: 1748449183210 },
//     ...
//   ]
// }

The id is the first 8 hex characters of the key's SHA-256 hash. The raw key value never appears in a snapshot, never appears in a telemetry event, never reaches the optional file state backend.
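
Deriving such an id takes a few lines with Node's built-in `crypto` module. This is a sketch of the idea rather than keymesh's code; the function names are hypothetical, and the `key-<id>` label format is inferred from the sample snapshot above.

```typescript
import { createHash } from 'node:crypto';

/** First 8 hex characters of the SHA-256 hash of the raw key. */
function keyId(rawKey: string): string {
  return createHash('sha256').update(rawKey, 'utf8').digest('hex').slice(0, 8);
}

/** Label in the `key-<id>` form seen in the sample snapshot (assumed default). */
function keyLabel(rawKey: string): string {
  return `key-${keyId(rawKey)}`;
}
```

Eight hex characters carry 32 bits, which is enough to tell pool members apart in logs while revealing nothing usable about the credential itself.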

Compare

keymesh vs the alternatives.

The other tools in this space solve adjacent problems well. The contrast clarifies where keymesh sits.

Capability	keymesh	LiteLLM	Bifrost	Portkey	Hand-rolled
Distribution	npm library	pip + sidecar	go binary	SaaS / hosted	your repo
Runtime hop	in-process	network hop	network hop	network hop	in-process
Runtime dependencies	0	many	many	n/a (hosted)	varies
TypeScript-first API	yes, strict	partial	no	client typings	varies
Embeddable in Edge runtimes	core + HTTP yes	no	no	via fetch only	varies
Per-key circuit breaker	yes	partial	yes	yes	rarely
SaaS lock-in	none	none	none	yes	none
SLSA provenance on releases	yes	partial	partial	n/a	no
License	Apache-2.0	MIT	Apache-2.0	commercial	your call

The honest summary: pick LiteLLM if your stack is Python and a sidecar is fine. Pick Bifrost or Portkey if you genuinely want a gateway. Pick keymesh if you write Node.js or TypeScript and prefer a library you embed over a service you operate.

What counts as transient

The exact failure classes keymesh rotates on.

The classifier is deterministic and documented; if a code or condition is not listed here, keymesh treats it as a programmer error or upstream-permanent failure and propagates it untouched.

Signal	Where	keymesh response
HTTP 408 Request Timeout	any adapter	retry + rotate
HTTP 425 Too Early	any adapter	retry + rotate
HTTP 429 Too Many Requests	any adapter, honours `Retry-After`	retry + rotate
HTTP 500 / 502 / 503 / 504	any adapter	retry + rotate
HTTP 529 Overloaded	Anthropic-specific	retry + rotate
HTTP 401 Unauthorized	any adapter	cool the key for 24h
`ECONNRESET` / `ETIMEDOUT` / `ECONNREFUSED` / `EAI_AGAIN` / `fetch failed`	HTTP adapter	retry + rotate
Every other 4xx	any adapter	propagate to caller untouched
Pool exhausted (every eligible key tried)	orchestrator	throw `AllKeysExhaustedError`
Retry budget exceeded	orchestrator	throw `TotalBudgetExceededError`
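
The table above maps naturally onto a small pure function. The sketch below is an approximation of that mapping, not the library's classifier: the `Failure` shape, the verdict names, and the use of a `provider` field to scope 529 to Anthropic are all assumptions.

```typescript
// Hypothetical sketch of the failure classification table; not keymesh's actual code.
type Verdict = 'retry-and-rotate' | 'cool-key-24h' | 'propagate';

interface Failure {
  status?: number;   // HTTP status, when a response was received
  code?: string;     // Node-style error code, e.g. 'ECONNRESET'
  message?: string;  // error message, e.g. 'fetch failed'
  provider?: string; // assumed adapter name, e.g. 'anthropic'
}

const TRANSIENT_STATUS = new Set<number>([408, 425, 429, 500, 502, 503, 504]);
const TRANSIENT_NETWORK = new Set<string>(['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED', 'EAI_AGAIN']);

function classify(f: Failure): Verdict {
  if (f.status !== undefined) {
    if (TRANSIENT_STATUS.has(f.status)) return 'retry-and-rotate';
    if (f.status === 529 && f.provider === 'anthropic') return 'retry-and-rotate';
    if (f.status === 401) return 'cool-key-24h';
    return 'propagate'; // anything not listed goes back to the caller untouched
  }
  if (f.code !== undefined && TRANSIENT_NETWORK.has(f.code)) return 'retry-and-rotate';
  if (f.message !== undefined && f.message.includes('fetch failed')) return 'retry-and-rotate';
  return 'propagate';
}
```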

Quality and validation

The receipts behind v1.0.0.

Tests & coverage

145 tests passing across 21 suites under Vitest 4. Coverage: 93% lines, 91.6% statements, 89.1% functions, 79.4% branches. Run pnpm test on a fresh clone to reproduce.

Type safety

TypeScript 6 in maximum strict mode (exactOptionalPropertyTypes, useUnknownInCatchVariables, noUncheckedIndexedAccess, noImplicitOverride, noImplicitReturns). Zero errors under tsc --noEmit.

Lint & format

Biome 2 clean across src, tests, and examples. publint clean. Exports map is dual ESM + CJS with separate .d.ts and .d.cts per subpath.

Live validation

Validated against the real Gemini API: 429 quota exhaustion triggers rotation; the per-key circuit completes the full closed -> open -> half-open -> closed lifecycle on real upstream traffic.

CLI smoke

Functional CLI smoke test that spawns the published binary against a fake upstream and asserts end-to-end rotation, headers stripped, response body unchanged.

Supply chain

Committed pnpm lockfile, supply-chain policy with minimum release age, SLSA provenance attestation on every published version. Verify with npm view @takk/keymesh@1.0.0 --json | jq .dist.attestations.

Roadmap

What is shipped, what is next, what is later.

Now (1.0)

Shipped in v1.0.0

Four strategies, four adapters
Circuit breaker + smart retry
Memory and file state backends
Eight telemetry events
CLI proxy mode + inspect command
Dual ESM + CJS, full TypeScript types
SLSA provenance on every release

Next (1.1)

Targeted for 1.1

cost-aware selection strategy
Redis, SQLite, and Postgres state backends
Streaming-response failover for SDK adapters
Encrypted at-rest option for the file backend
Per-adapter Edge-runtime compatibility audit

Later

On the horizon

Federated pool across processes via shared backend
Adaptive strategy learned from telemetry
First-class OpenTelemetry exporter (opt-in)
Browser-targeted bundle for IndexedDB state

FAQ

Common questions.

Is keymesh production-ready at 1.0.0?

Yes. 145 tests across 21 suites pass under Vitest 4 with 93% line coverage; TypeScript 6 maximum strict mode is clean; Biome 2 lint is clean; publint is clean. Every published release carries SLSA provenance produced by GitHub Actions. The library has been validated live against the Gemini API: real 429 quota exhaustion triggers rotation, and the per-key circuit breaker completes the full closed -> open -> half-open -> closed recovery lifecycle on real upstream traffic.

Why not just use LiteLLM?

LiteLLM is excellent but Python-first, gateway-style, and requires you to run a sidecar service. keymesh is TypeScript-native, embeddable in any Node, Bun, or Deno process, and runs as either a library or a CLI proxy from the same code path.

Why not just run a proxy gateway like Bifrost or Portkey?

Same answer in different words. Those are services; keymesh is a library you pnpm add. No extra container, no extra hop, no SaaS lock-in. When you do want a proxy, npx @takk/keymesh start runs the same code path inside a local HTTP server.

Does this work in Cloudflare Workers, Vercel Edge, Bun, or Deno?

The core package and the http adapter run on any modern JavaScript runtime with fetch. The OpenAI, Anthropic, and Gemini adapters inherit the runtime compatibility of their respective official SDKs. Edge-optimised adapters are tracked for a later release.

How does keymesh handle my API keys?

Keys are held in process memory for the lifetime of the client and never transmitted anywhere except to the upstream you configured. In every telemetry event and in client.inspect(), a key is identified only by the first 8 hex characters of its SHA-256 hash. The optional file state backend persists only the hashed id and operational counters; the raw key value never reaches disk.

What happens when every key in the pool is exhausted at the same time?

keymesh throws an AllKeysExhaustedError with the full pool state attached (counts, circuit states, cooldown timestamps). Catch it at the boundary of your application and decide what to surface to the user. The library never silently drops the request.

Where does the pool state live?

In-process memory by default. For multi-process or restart-survival use cases, opt in to the file backend by setting state: 'file' in the config. Redis, SQLite, and Postgres backends are planned for 1.1.

Will you support cost-aware routing?

Yes, planned for 1.1. It requires per-key cost-tier metadata and a small economic model; that earns its own release rather than being shoe-horned into 1.0.

What about streaming responses from the SDKs?

Streaming is not wrapped by the rotation layer in 1.0; the wrapper hands streaming calls through to the underlying SDK unchanged. First-class streaming support is on the 1.1 roadmap.

How do I verify a published version's provenance?

Every release is published with npm publish --provenance. Check the attestations with npm view @takk/keymesh@<version> --json | jq .dist.attestations. The attestation links the tarball you installed to the GitHub Actions workflow that built it from a specific source commit.

Can I plug in my own selection strategy?

Yes. Implement SelectorStrategy with a pick(eligible, all) method and pass an instance as strategy in the config. Anything from "follow a custom priority queue" to "consult an external service" is a few lines of code.
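
A custom strategy could look roughly like the sketch below, which prefers the healthiest eligible key. The interface is declared locally as a stand-in: only the `pick(eligible, all)` method name comes from the answer above, while the `KeyView` shape (borrowing `healthScore` and `inFlight` from the snapshot fields) and the return type are assumptions. Check the exported `SelectorStrategy` type before relying on it.

```typescript
// Hypothetical stand-ins for the library's types; field names borrowed from client.inspect().
interface KeyView {
  id: string;
  healthScore: number;
  inFlight: number;
}

interface SelectorStrategySketch {
  pick(eligible: readonly KeyView[], all: readonly KeyView[]): KeyView | undefined;
}

/** Prefers the highest health score; ties go to the key with fewer in-flight requests. */
class HealthiestFirst implements SelectorStrategySketch {
  pick(eligible: readonly KeyView[], _all: readonly KeyView[]): KeyView | undefined {
    let best: KeyView | undefined;
    for (const key of eligible) {
      const better =
        best === undefined ||
        key.healthScore > best.healthScore ||
        (key.healthScore === best.healthScore && key.inFlight < best.inFlight);
      if (better) best = key;
    }
    return best;
  }
}
```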

What is the policy on breaking changes?

Strict SemVer 2.0.0, starting from 1.0.0. The binding stability surface is documented in SPEC.md section 5. Major bumps require a deprecation cycle; security fixes follow the disclosure flow in SECURITY.md.

Author

Built and maintained by David C Cavalcante.

David C Cavalcante

Founder, Takk Innovate Studio

Product Engineer, AI Engineer, ML Engineer, LLM Engineer, LLM Architect, AI Researcher. Builder of the @takk family of NPM packages for AI-native infrastructure.

keymesh is the first published package in a planned portfolio of NPM libraries targeting AI-native infrastructure for 2026 to 2030. Adjacent research by the author covers systemic intelligence frameworks (MAIC, HIM, NHE) published independently of this codebase, with research notes on PhilPapers and PhilArchive linked from the repository README.

If keymesh saved you an on-call page this quarter, the most useful thing you can do is open a GitHub issue when you find an edge case the test suite missed. The runbook for releases, the threat model, and the contributor agreement all live in the repository.
