@takk/keymesh - v1.0.0 - Apache-2.0

Stop losing API requests to rate limits.

Your application calls OpenAI, Anthropic, or Gemini, and one key hits a 429. The request fails. The retry hits the same key. The user waits. Your on-call gets paged. keymesh rotates through a pool of keys, opens a circuit on the failing one, and retries on the next eligible key with backoff. The user never sees the 429.

145 tests passing
93% line coverage
0 runtime deps
SLSA provenance

What it is

One library, two answers.

In plain English

You give keymesh a list of API keys for the same service (OpenAI, Anthropic, Gemini, or any HTTP endpoint). It hands each request to a key from the pool. When a key fails or rate-limits, keymesh retries the request on another key. After a few consecutive failures, it stops using that key for a while. When the cooldown expires, it tries the key again. Your code does not change shape: you keep calling client.chat.completions.create() exactly as before.

Technically

A generic orchestration layer that wraps any provider client in a deep proxy mirroring the wrapped SDK's shape, emits eight telemetry events, and runs a per-request loop: pick a key, dispatch, classify the error, then retry or rotate. Around that loop sit pluggable selection strategies, a per-key circuit breaker with a closed -> open -> half-open state machine, AWS full-jitter exponential backoff with a total time budget, and respect for upstream Retry-After headers. State persistence is pluggable too.
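The deep-proxy idea fits in a few lines. The sketch below is hypothetical, not keymesh's source: deepProxy, callPath, and the fake SDK are names invented for illustration. The proxy records the property path a caller walks (chat, completions, create) and, when the leaf is called, hands the path and arguments to a single hook that can pick a key and replay the call on a concrete SDK instance.

```typescript
// Hypothetical sketch: a deep proxy that mirrors any nested SDK shape and
// routes every leaf call (e.g. client.chat.completions.create) to one hook.
type Invoke = (path: readonly string[], args: unknown[]) => unknown;

function deepProxy(invoke: Invoke, path: readonly string[] = []): any {
  // A function target lets the same proxy be both property-accessed and called.
  const target = function () {};
  return new Proxy(target, {
    get(_target, prop) {
      // Returning undefined for `then` keeps `await proxy` from treating it as a promise.
      if (typeof prop === 'symbol' || prop === 'then') return undefined;
      return deepProxy(invoke, [...path, prop]);
    },
    apply(_target, _thisArg, args: unknown[]) {
      return invoke(path, args);
    },
  });
}

// Resolve a recorded path such as ['chat', 'completions', 'create'] against a
// concrete SDK instance and call the method with its parent object as `this`.
function callPath(root: unknown, path: readonly string[], args: unknown[]): unknown {
  let parent: any = undefined;
  let node: any = root;
  for (const segment of path) {
    parent = node;
    node = node?.[segment];
  }
  if (typeof node !== 'function') throw new TypeError(`${path.join('.')} is not a function`);
  return node.apply(parent, args);
}

// Usage with a fake SDK; makeFakeSdk stands in for something like `new OpenAI({ apiKey })`.
const makeFakeSdk = (apiKey: string) => ({
  chat: { completions: { create: (body: { model: string }) => `${apiKey}:${body.model}` } },
});

const keys = ['key-1', 'key-2'];
let turn = 0;
const client = deepProxy((path, args) => {
  const key = keys[turn++ % keys.length]!; // naive round-robin pick
  return callPath(makeFakeSdk(key), path, args);
});

const first = client.chat.completions.create({ model: 'gpt-4.1' }); // 'key-1:gpt-4.1'
const second = client.chat.completions.create({ model: 'gpt-4.1' }); // 'key-2:gpt-4.1'
```

Building a fresh SDK instance per call keeps the example short; a real wrapper would more likely cache one client per key and run the retry-and-rotate loop inside the hook.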

Before and after

The same incident, two timelines.

Consider a real production scenario: your job sends 240 requests per minute to OpenAI, you hold three keys provisioned across three projects, and one of them hits its per-minute quota.

Without keymesh

Manual rotation, fire-fighting at 3 AM

  1. 09:14:01 Key #1 returns 429 Too Many Requests.
  2. 09:14:01 Your retry logic backs off exponentially and resends the same payload to Key #1.
  3. 09:14:03 Second 429. Third 429. Fourth.
  4. 09:14:08 User-facing request fails with Request timeout.
  5. 09:14:09 Alerting fires. On-call investigates.
  6. 09:23:00 Engineer manually disables Key #1 and restarts the job with Key #2.
  7. 09:31:00 Key #1's quota window resets, but nobody remembers to re-enable it.

With keymesh

Silent rotation, circuit recovers itself

  1. 09:14:01 Key #1 returns 429. keymesh classifies it as transient.
  2. 09:14:01 The same request retries on Key #2. Success in under 200ms.
  3. 09:14:01 key.rotated event fires. Your logger records from=#1 to=#2.
  4. 09:14:04 After three consecutive failures on Key #1, the per-key circuit opens for 30 seconds.
  5. 09:14:34 Cooldown elapses. Next eligible pick lands on Key #1 in half-open.
  6. 09:14:34 The trial request succeeds. circuit.closed fires. Key #1 is back in the pool.
  7. The user never saw a delay. The on-call was never paged.

The "With keymesh" timeline reproduces the event sequence (rotation, circuit open, half-open trial, circuit closed) recorded by the live integration test in the keymesh repository, which runs against the real Google Gemini API with seven keys.

Install

Five minutes from install to first rotation.

1. Add the package

pnpm add @takk/keymesh
npm install @takk/keymesh
yarn add @takk/keymesh
bun add @takk/keymesh

2. Install the SDK you actually use

Adapters use optional peer dependencies. Skip this if you only need the generic HTTP adapter.

pnpm add openai             # for @takk/keymesh/openai
pnpm add @anthropic-ai/sdk  # for @takk/keymesh/anthropic
pnpm add @google/genai      # for @takk/keymesh/gemini

3. Quickstart with the OpenAI SDK

import { createKeymesh } from '@takk/keymesh';
import { openaiAdapter } from '@takk/keymesh/openai';

const client = createKeymesh({
  provider: openaiAdapter,
  keys: process.env.OPENAI_API_KEYS?.split(',') ?? [],
  strategy: 'least-used',
  circuitBreaker: { threshold: 3, cooldownMs: 30_000 },
  retry: { max: 5, baseMs: 200, jitter: true },
  telemetry: { enabled: true },
});

// Drop-in: identical to the underlying SDK shape.
const response = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Hello.' }],
});

4. Quickstart with any HTTP endpoint

import { createKeymesh } from '@takk/keymesh';
import { httpAdapter } from '@takk/keymesh/http';

const tavily = createKeymesh({
  provider: httpAdapter({
    baseUrl: 'https://api.tavily.com',
    authHeader: (key) => ({ Authorization: `Bearer ${key}` }),
  }),
  keys: process.env.TAVILY_API_KEYS?.split(',') ?? [],
  strategy: 'round-robin',
});

const result = await tavily.post('/search', { query: 'AI infrastructure 2026' });

Features

Nine capabilities, every one tied to a measurable outcome.

Key rotation

Pluggable selection across a pool, with four built-in strategies and a typed SelectorStrategy interface for custom logic.

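Spread the same workload across N keys and, provided each key draws on its own quota (for example, OpenAI keys in separate projects, since OpenAI applies rate limits per organization and project), lift the effective rate-limit ceiling by roughly a factor of N.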

Automatic failover

Transient errors (408, 425, 429, 500, 502, 503, 504, plus Anthropic 529 and network resets) trigger an instant rotation to the next eligible key.

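When failures are confined to some of your keys (rate-limited, overloaded, or flaky ones), user-facing failures drop to zero as long as at least one key in the pool is healthy and the retry budget allows reaching it.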
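Stripped of backoff, circuit breaking, and health scoring, failover reduces to a loop that tries keys in turn and moves on only when the error is transient. This is a hypothetical sketch: UpstreamError, PoolExhaustedError, and withFailover are illustrative names (keymesh's own error is AllKeysExhaustedError), the status set is taken from the transient table further down, and each key is tried at most once.

```typescript
// Hypothetical error carrying the upstream HTTP status.
class UpstreamError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`upstream returned ${status}`);
    this.status = status;
  }
}

// Hypothetical stand-in for AllKeysExhaustedError, with the attempts attached.
class PoolExhaustedError extends Error {
  readonly attempts: { key: string; status: number }[];
  constructor(attempts: { key: string; status: number }[]) {
    super(`all ${attempts.length} keys failed`);
    this.attempts = attempts;
  }
}

const TRANSIENT = new Set([408, 425, 429, 500, 502, 503, 504, 529]);

// Try each key once, in order; rotate on transient errors, propagate the rest.
async function withFailover<T>(
  keys: readonly string[],
  dispatch: (key: string) => Promise<T>,
): Promise<T> {
  const attempts: { key: string; status: number }[] = [];
  for (const key of keys) {
    try {
      return await dispatch(key);
    } catch (err: unknown) {
      // Non-transient failures (e.g. a 400) go straight back to the caller.
      if (!(err instanceof UpstreamError) || !TRANSIENT.has(err.status)) throw err;
      attempts.push({ key, status: err.status });
    }
  }
  throw new PoolExhaustedError(attempts);
}
```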

Per-key circuit breaker

Three-state machine (closed, open, half-open) with configurable threshold and cooldown; opens after consecutive failures, recovers on first half-open success.

A failing key stops eating retry budget after a handful of attempts and rejoins the pool on its own once it recovers.
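A minimal sketch of the three-state machine, assuming the threshold and cooldownMs semantics of the circuitBreaker option in the quickstart. The CircuitBreaker class and its method names are hypothetical, and time is passed in explicitly so the 09:14 timeline can be replayed deterministically. This sketch lets any number of requests through while half-open; a production breaker might admit only a single trial.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Hypothetical per-key breaker: opens after `threshold` consecutive failures,
// lets requests through again (half-open) once `cooldownMs` has elapsed,
// and closes on the first success.
class CircuitBreaker {
  private state: CircuitState = 'closed';
  private consecutiveFailures = 0;
  private openUntil = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // Is the key eligible at time `now`? An expired open circuit moves to half-open.
  canAttempt(now: number): boolean {
    if (this.state === 'open' && now >= this.openUntil) this.state = 'half-open';
    return this.state !== 'open';
  }

  onSuccess(): void {
    this.state = 'closed';
    this.consecutiveFailures = 0;
  }

  onFailure(now: number): void {
    this.consecutiveFailures += 1;
    // A failed half-open trial re-opens immediately; otherwise wait for the threshold.
    if (this.state === 'half-open' || this.consecutiveFailures >= this.threshold) {
      this.state = 'open';
      this.openUntil = now + this.cooldownMs;
    }
  }

  get current(): CircuitState {
    return this.state;
  }
}
```

Replaying the timeline with new CircuitBreaker(3, 30_000) and 09:14:01 as time zero: failures at 0, 1 000, and 3 000 ms open the circuit until 33 000 ms (09:14:34), when canAttempt returns true in half-open and a success closes it again.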

Smart retry

AWS full-jitter exponential backoff with a total time budget, honouring upstream Retry-After in both numeric-seconds and HTTP-date form.

Predictable tail latency under load instead of synchronised retry storms hammering the same upstream window.
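AWS full jitter draws each delay uniformly between zero and an exponentially growing ceiling, so concurrent clients spread out instead of retrying in lockstep. The sketch below is illustrative: the function names, the 10-second cap, and the policy of giving up when the next delay would overrun the remaining budget are assumptions; only the 200 ms base comes from the quickstart config. Retry-After is parsed in both forms HTTP allows: delay-seconds or an HTTP-date.

```typescript
// AWS "full jitter": delay = random(0, min(capMs, baseMs * 2 ** attempt)).
function fullJitterDelay(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry-After holds either delay-seconds ("120") or an HTTP-date
// ("Wed, 21 Oct 2015 07:28:00 GMT"); returns milliseconds to wait.
function parseRetryAfterMs(header: string | null, now: number = Date.now()): number | undefined {
  if (header === null) return undefined;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const at = Date.parse(value);
  return Number.isNaN(at) ? undefined : Math.max(0, at - now);
}

// Hypothetical policy: prefer the server's hint, otherwise full jitter
// (base 200 ms, cap 10 s); undefined means the delay would overrun the budget.
function nextDelayMs(
  attempt: number,
  retryAfter: string | null,
  remainingBudgetMs: number,
  random: () => number = Math.random,
): number | undefined {
  const delay = parseRetryAfterMs(retryAfter) ?? fullJitterDelay(attempt, 200, 10_000, random);
  return delay <= remainingBudgetMs ? delay : undefined;
}
```

Injecting random keeps the jitter testable; in production the default Math.random is what spreads retries apart.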

Health scoring

Each key carries a 0-to-100 health score that decays on failure with a configurable half-life and recovers on success.

The pool quietly favours keys that are actually working, without you maintaining a curated whitelist.
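The exact scoring formula is not spelled out here, so the following is a hypothetical model that matches the description: each failure adds a penalty that decays exponentially with a configurable half-life, a success halves the remaining penalty, and the score is 100 minus the penalty. The class, the default failure weight of 25, and the success rule are all assumptions.

```typescript
// Hypothetical health model on a 0-to-100 scale.
class HealthScore {
  private penalty = 0;
  private updatedAt: number;
  private readonly halfLifeMs: number;

  constructor(halfLifeMs: number, now: number) {
    this.halfLifeMs = halfLifeMs;
    this.updatedAt = now;
  }

  // Exponential decay: the penalty halves every `halfLifeMs` milliseconds.
  private decay(now: number): void {
    const elapsed = Math.max(0, now - this.updatedAt);
    this.penalty *= 2 ** (-elapsed / this.halfLifeMs);
    this.updatedAt = now;
  }

  recordFailure(now: number, weight = 25): void {
    this.decay(now);
    this.penalty = Math.min(100, this.penalty + weight);
  }

  recordSuccess(now: number): void {
    this.decay(now);
    this.penalty /= 2;
  }

  score(now: number): number {
    this.decay(now);
    return Math.round(100 - this.penalty);
  }
}
```

With a 60-second half-life, a failure weighted 40 drops the score to 60, one half-life later it reads 80, and a success at that point lifts it to 90.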

Telemetry

Eight in-process events (request.start/success/fail, key.rotated, circuit.open/closed/half-open, all.exhausted), zero OpenTelemetry dependency.

Drop events into the logger or metrics pipeline you already run; no new agent, no new vendor.

Auth-failure cooldown

A key that returns 401 Unauthorized is cooled for 24 hours on the assumption that the credential itself is invalid.

A revoked or rotated key stops burning retry budget after its first 401; you fix it on your schedule, not the request loop's.

Pluggable state

Memory backend by default; the opt-in file backend persists only the hashed id and operational counters. The raw key value never reaches disk.

Counter durability across restarts without trading away credential hygiene.

SLSA provenance

Every published version is signed with npm publish --provenance through GitHub Actions OIDC. The lockfile is committed, and a supply-chain policy enforces a minimum release age on new dependency versions.

Verify in one command that the tarball you installed was built from the source commit you trust.

Selection strategies

Four built-in strategies, all opt-in by name.

Pass strategy: '<name>' in the config, or implement SelectorStrategy for custom logic.

Strategy | How it picks | When to use it
round-robin | Cycles through eligible keys in registration order. | Default baseline. Even distribution across keys with similar quotas.
weighted | Pick probability proportional to the weight declared on each KeyConfig. | Some keys have higher quotas than others (paid tier alongside free tier).
least-used | Lowest in-flight count; ties broken by total usage, then last-used timestamp. | Keys share a quota window. Maximises throughput without overloading any one key.
sequential-then-rotate | Always tries the first eligible key in registration order; rotates only when that key is ineligible. | One paid key plus several free fallbacks. Burn the cheap one until it fails.

A cost-aware strategy is planned for 1.1; it requires per-key cost-tier metadata.
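Each built-in strategy reduces to a small selection function over the eligible keys. The following is a hypothetical sketch, not keymesh's source: KeyState and its field names are assumptions, least-used assumes the least recently used key wins the final tie-break, and random is injectable so the weighted pick is testable.

```typescript
// Hypothetical per-key view; field names are assumptions for this sketch.
interface KeyState {
  id: string;
  weight: number; // read by 'weighted'
  inFlight: number; // read by 'least-used'
  totalUsage: number;
  lastUsedAt: number; // epoch ms
}

// Every picker assumes `eligible` is in registration order; empty pools throw.
function first<T>(items: readonly T[]): T {
  const item = items[0];
  if (item === undefined) throw new Error('no eligible keys');
  return item;
}

// round-robin: cycle through eligible keys in registration order.
function makeRoundRobin(): (eligible: readonly KeyState[]) => KeyState {
  let cursor = 0;
  return (eligible) => {
    const picked = first(eligible.slice(cursor % eligible.length));
    cursor += 1;
    return picked;
  };
}

// weighted: pick probability proportional to each key's weight.
function pickWeighted(eligible: readonly KeyState[], random: () => number = Math.random): KeyState {
  const total = eligible.reduce((sum, k) => sum + k.weight, 0);
  let r = random() * total;
  for (const key of eligible) {
    r -= key.weight;
    if (r < 0) return key;
  }
  return first(eligible.slice(-1)); // floating-point edge case: fall back to the last key
}

// least-used: fewest in-flight, then lowest total usage, then least recently used.
function pickLeastUsed(eligible: readonly KeyState[]): KeyState {
  const sorted = [...eligible].sort(
    (a, b) => a.inFlight - b.inFlight || a.totalUsage - b.totalUsage || a.lastUsedAt - b.lastUsedAt,
  );
  return first(sorted);
}

// sequential-then-rotate: always the first eligible key; a cooling or
// circuit-open key is simply absent from `eligible`, so the next one wins.
function pickSequential(eligible: readonly KeyState[]): KeyState {
  return first(eligible);
}
```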

Adapters

Drop-in for the SDK you already use.

Adapter | Subpath export | Peer dependency | Use it when
OpenAI | @takk/keymesh/openai | openai >= 4 | You import openai directly and want rotation without touching call sites.
Anthropic | @takk/keymesh/anthropic | @anthropic-ai/sdk >= 0.30 | Same, for the official Anthropic SDK. Recognises HTTP 529 Overloaded as transient.
Gemini | @takk/keymesh/gemini | @google/genai >= 0.7 | Same, for Google's generative AI SDK.
HTTP (generic) | @takk/keymesh/http | none | Any REST endpoint (Tavily, Serper, Stripe, GitHub API, internal microservice).

CLI proxy mode

Same code path, no library import required.

When you cannot or do not want to embed keymesh in your code, run it as a local HTTP proxy. The proxy strips inbound auth headers, dispatches through the same orchestrator the library exposes, and returns the upstream response unchanged.
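One plausible shape for the header handling, as a hypothetical sketch: the list of stripped header names (Authorization for OpenAI, x-api-key for Anthropic, x-goog-api-key for Gemini) is an assumption, not keymesh's documented list. It uses the standard Headers class available in Node 18+ and other fetch-capable runtimes, whose lookups are case-insensitive.

```typescript
// Hypothetical list of inbound credential headers to drop before forwarding.
const CREDENTIAL_HEADERS = ['authorization', 'x-api-key', 'x-goog-api-key'] as const;

// Remove whatever credentials the caller sent, then inject the pool key's header(s).
function rewriteHeaders(
  inbound: Record<string, string>,
  authHeader: Record<string, string>, // e.g. the result of an adapter's authHeader(key)
) {
  const headers = new Headers(inbound);
  for (const name of CREDENTIAL_HEADERS) headers.delete(name);
  for (const [name, value] of Object.entries(authHeader)) headers.set(name, value);
  return headers;
}
```

Non-credential headers such as Content-Type pass through unchanged, which is what lets the curl example send no Authorization header at all.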

Start the proxy

# With keys in the environment
OPENAI_API_KEYS=key1,key2,key3 npx @takk/keymesh start \
  --port 8787 \
  --adapter openai \
  --strategy round-robin

Send requests through the proxy

curl http://localhost:8787/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4.1","messages":[{"role":"user","content":"Hi"}]}'

Inspect live pool state

# from another terminal while the proxy is running
curl http://localhost:8787/__keymesh_inspect | jq

# or via the CLI, against a persisted state file
npx @takk/keymesh inspect --state-file .keymesh-state.jsonl

Observability

Eight events. No OpenTelemetry runtime dependency.

Subscribe directly on the wrapped client. Every event is a typed object; the union TelemetryEvent is exported so you can branch on event.type with full narrowing in TypeScript.

// log, metrics, and alerts stand in for your own logger, metrics client, and alerting hook.
client.on('request.start', (e) => log.info({ keyId: e.keyId, path: e.path }));
client.on('request.success', (e) => metrics.histogram('keymesh.latency', e.elapsedMs));
client.on('request.fail', (e) => log.warn({ keyId: e.keyId, status: e.status, error: e.error }));
client.on('key.rotated', (e) => log.info({ from: e.from, to: e.to, reason: e.reason }));
client.on('circuit.open', (e) =>
  alerts.notify(`Key ${e.keyId} circuit OPEN until ${new Date(e.cooldownUntil).toISOString()}`),
);
client.on('circuit.closed', (e) => log.info({ event: 'circuit.closed', keyId: e.keyId }));
client.on('circuit.half-open', (e) => log.info({ event: 'circuit.half-open', keyId: e.keyId }));
client.on('all.exhausted', (e) => alerts.notify(`Pool exhausted: ${e.reason}`));
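Branching on event.type with narrowing looks like the sketch below. It is hypothetical: the local TelemetryEvent union is inferred from the fields the listeners above read, and the real exported type may name or type those fields differently. The never-typed default makes the compiler flag any event type the switch forgets to handle.

```typescript
// Hypothetical event shapes, inferred from the listener fields above.
type TelemetryEvent =
  | { type: 'request.start'; keyId: string; path: string }
  | { type: 'request.success'; keyId: string; elapsedMs: number }
  | { type: 'request.fail'; keyId: string; status?: number; error: string }
  | { type: 'key.rotated'; from: string; to: string; reason: string }
  | { type: 'circuit.open'; keyId: string; cooldownUntil: number }
  | { type: 'circuit.closed'; keyId: string }
  | { type: 'circuit.half-open'; keyId: string }
  | { type: 'all.exhausted'; reason: string };

// Each case sees only the fields of its own variant.
function describe(event: TelemetryEvent): string {
  switch (event.type) {
    case 'request.start':
      return `start ${event.keyId} ${event.path}`;
    case 'request.success':
      return `ok ${event.keyId} in ${event.elapsedMs}ms`;
    case 'request.fail':
      return `fail ${event.keyId} status=${event.status ?? 'none'}: ${event.error}`;
    case 'key.rotated':
      return `rotated ${event.from} -> ${event.to} (${event.reason})`;
    case 'circuit.open':
      return `circuit open ${event.keyId} until ${new Date(event.cooldownUntil).toISOString()}`;
    case 'circuit.closed':
    case 'circuit.half-open':
      return `${event.type} ${event.keyId}`;
    case 'all.exhausted':
      return `pool exhausted: ${event.reason}`;
    default: {
      // Compile-time exhaustiveness check.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```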

Pool snapshot on demand

const snapshot = client.inspect();
// {
//   strategy: 'least-used',
//   totalRequests: 1284,
//   totalFailures: 3,
//   keys: [
//     { id: 'a1b2c3d4', label: 'key-a1b2c3d4',
//       circuitState: 'closed', healthScore: 100,
//       inFlight: 0, successCount: 642, failureCount: 1,
//       consecutiveFailures: 0, cooldownUntil: 0,
//       lastUsedAt: 1748449183210 },
//     ...
//   ]
// }

The id is the first 8 hex characters of the key's SHA-256 hash. The raw key value never appears in a snapshot, never appears in a telemetry event, never reaches the optional file state backend.
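The id scheme above takes one call to Node's built-in crypto module. The keyId function follows the documented scheme; the persisted record shape and toJsonlLine are hypothetical, modelled on the snapshot fields, to show that only the derived id and counters need to be written.

```typescript
import { createHash } from 'node:crypto';

// Documented scheme: first 8 hex characters of SHA-256(raw key).
function keyId(rawKey: string): string {
  return createHash('sha256').update(rawKey, 'utf8').digest('hex').slice(0, 8);
}

// Hypothetical persisted record: hashed id plus operational counters only.
interface PersistedKeyRecord {
  id: string;
  label: string;
  successCount: number;
  failureCount: number;
  cooldownUntil: number;
}

// Serialise one record per line (JSONL); the raw key is used only to derive the id.
function toJsonlLine(rawKey: string, counters: Omit<PersistedKeyRecord, 'id' | 'label'>): string {
  const id = keyId(rawKey);
  const record: PersistedKeyRecord = { id, label: `key-${id}`, ...counters };
  return JSON.stringify(record) + '\n';
}
```

Because API keys are long random strings, a truncated hash identifies them without being practically reversible. The trade-off is that an 8-hex-character (32-bit) prefix can in principle collide across very large pools.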

Compare

keymesh vs the alternatives.

The other tools in this space solve adjacent problems well. The contrast clarifies where keymesh sits.

Capability | keymesh | LiteLLM | Bifrost | Portkey | Hand-rolled
Distribution | npm library | pip + sidecar | Go binary | SaaS / hosted | your repo
Runtime hop | in-process | network hop | network hop | network hop | in-process
Runtime dependencies | 0 | many | many | n/a (hosted) | varies
TypeScript-first API | yes, strict | partial | no | client typings | varies
Embeddable in Edge runtimes | yes (core + HTTP adapter) | no | no | via fetch only | varies
Per-key circuit breaker | yes | partial | yes | yes | rarely
SaaS lock-in | none | none | none | yes | none
SLSA provenance on releases | yes | partial | partial | n/a | no
License | Apache-2.0 | MIT | Apache-2.0 | commercial | your call

The honest summary: pick LiteLLM if your stack is Python and a sidecar is fine. Pick Bifrost or Portkey if you genuinely want a gateway. Pick keymesh if you write Node.js or TypeScript and prefer a library you embed over a service you operate.

What counts as transient

The exact failure classes keymesh rotates on.

The classifier is deterministic and documented; if a code or condition is not listed here, keymesh treats it as a programmer error or upstream-permanent failure and propagates it untouched.

Signal | Where | keymesh response
HTTP 408 Request Timeout | any adapter | retry + rotate
HTTP 425 Too Early | any adapter | retry + rotate
HTTP 429 Too Many Requests | any adapter, honours Retry-After | retry + rotate
HTTP 500 / 502 / 503 / 504 | any adapter | retry + rotate
HTTP 529 Overloaded | Anthropic-specific | retry + rotate
HTTP 401 Unauthorized | any adapter | cool the key for 24h
ECONNRESET / ETIMEDOUT / ECONNREFUSED / EAI_AGAIN / fetch failed | HTTP adapter | retry + rotate
Every other 4xx | any adapter | propagate to caller untouched
Pool exhausted (every eligible key tried) | orchestrator | throw AllKeysExhaustedError
Retry budget exceeded | orchestrator | throw TotalBudgetExceededError
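The per-failure rows of the table translate directly into a pure function. This is a hypothetical sketch, not keymesh's classifier: the Decision type, the failure shape, and the adapter names are assumptions, and the orchestrator-level rows (pool exhausted, budget exceeded) are left out because they depend on loop state rather than on a single failure.

```typescript
// Hypothetical classifier mirroring the per-failure rows of the table above.
type Decision =
  | { action: 'retry-rotate' }
  | { action: 'cool-key'; cooldownMs: number }
  | { action: 'propagate' };

type Adapter = 'openai' | 'anthropic' | 'gemini' | 'http';

const TRANSIENT_STATUS = new Set([408, 425, 429, 500, 502, 503, 504]);
const TRANSIENT_NET = new Set(['ECONNRESET', 'ETIMEDOUT', 'ECONNREFUSED', 'EAI_AGAIN']);
const DAY_MS = 24 * 60 * 60 * 1000;

function classify(failure: { status?: number; code?: string; message?: string }, adapter: Adapter): Decision {
  const { status, code, message } = failure;
  if (status !== undefined) {
    if (TRANSIENT_STATUS.has(status)) return { action: 'retry-rotate' };
    if (status === 529 && adapter === 'anthropic') return { action: 'retry-rotate' };
    if (status === 401) return { action: 'cool-key', cooldownMs: DAY_MS };
    return { action: 'propagate' }; // every other status, including other 4xx
  }
  // Network-level failures carry no HTTP status; 'fetch failed' is the message Node's fetch uses.
  const networkTransient = (code !== undefined && TRANSIENT_NET.has(code)) || message === 'fetch failed';
  if (adapter === 'http' && networkTransient) return { action: 'retry-rotate' };
  return { action: 'propagate' };
}
```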

Quality and validation

The receipts behind v1.0.0.

Tests & coverage

145 tests passing across 21 suites under Vitest 4. Coverage: 93% lines, 91.6% statements, 89.1% functions, 79.4% branches. Run pnpm test on a fresh clone to reproduce.

Type safety

TypeScript 6 in maximum strict mode (exactOptionalPropertyTypes, useUnknownInCatchVariables, noUncheckedIndexedAccess, noImplicitOverride, noImplicitReturns). Zero errors under tsc --noEmit.

Lint & format

Biome 2 clean across src, tests, and examples. publint clean. The exports map is dual ESM + CJS, with separate .d.ts and .d.cts files per subpath.

Live validation

Validated against the real Gemini API: 429 quota exhaustion triggers rotation; the per-key circuit completes the full closed -> open -> half-open -> closed lifecycle on real upstream traffic.

CLI smoke

A functional CLI smoke test spawns the published binary against a fake upstream and asserts end-to-end rotation, stripped inbound auth headers, and an unchanged response body.

Supply chain

Committed pnpm lockfile, supply-chain policy with minimum release age, SLSA provenance attestation on every published version. Verify with npm view @takk/keymesh@1.0.0 --json | jq .dist.attestations.

Roadmap

What is shipped, what is next, what is later.

Now (1.0)

Shipped in v1.0.0

  • Four strategies, four adapters
  • Circuit breaker + smart retry
  • Memory and file state backends
  • Eight telemetry events
  • CLI proxy mode + inspect command
  • Dual ESM + CJS, full TypeScript types
  • SLSA provenance on every release

Next (1.1)

Targeted for 1.1

  • cost-aware selection strategy
  • Redis, SQLite, and Postgres state backends
  • Streaming-response failover for SDK adapters
  • Encrypted at-rest option for the file backend
  • Per-adapter Edge-runtime compatibility audit

Later

On the horizon

  • Federated pool across processes via shared backend
  • Adaptive strategy learned from telemetry
  • First-class OpenTelemetry exporter (opt-in)
  • Browser-targeted bundle for IndexedDB state

FAQ

Common questions.

Is keymesh production-ready at 1.0.0?

Yes. 145 tests across 21 suites pass under Vitest 4 with 93% line coverage; TypeScript 6 maximum strict mode is clean; Biome 2 lint is clean; publint is clean. Every published release carries SLSA provenance produced by GitHub Actions. The library has been validated live against the Gemini API: real 429 quota exhaustion triggers rotation, and the per-key circuit breaker completes the full closed -> open -> half-open -> closed recovery lifecycle on real upstream traffic.

Why not just use LiteLLM?

LiteLLM is excellent, but it is Python-first and gateway-style: from a Node.js or TypeScript codebase you use it by running its proxy as a sidecar service. keymesh is TypeScript-native, embeddable in any Node, Bun, or Deno process, and runs as either a library or a CLI proxy from the same code path.

Why not just run a proxy gateway like Bifrost or Portkey?

Same answer in different words. Those are services; keymesh is a library you pnpm add. No extra container, no extra hop, no SaaS lock-in. When you do want a proxy, npx @takk/keymesh start runs the same code path inside a local HTTP server.

Does this work in Cloudflare Workers, Vercel Edge, Bun, or Deno?

The core package and the http adapter run on any modern JavaScript runtime with fetch. The OpenAI, Anthropic, and Gemini adapters inherit the runtime compatibility of their respective official SDKs. Edge-optimised adapters are tracked for a later release.

How does keymesh handle my API keys?

Keys are held in process memory for the lifetime of the client and never transmitted anywhere except to the upstream you configured. In every telemetry event and in client.inspect(), a key is identified only by the first 8 hex characters of its SHA-256 hash. The optional file state backend persists only the hashed id and operational counters; the raw key value never reaches disk.

What happens when every key in the pool is exhausted at the same time?

keymesh throws an AllKeysExhaustedError with the full pool state attached (counts, circuit states, cooldown timestamps). Catch it at the boundary of your application and decide what to surface to the user. The library never silently drops the request.

Where does the pool state live?

In-process memory by default. For multi-process or restart-survival use cases, opt in to the file backend by setting state: 'file' in the config. Redis, SQLite, and Postgres backends are planned for 1.1.

Will you support cost-aware routing?

Yes, planned for 1.1. It requires per-key cost-tier metadata and a small economic model; that earns its own release rather than being shoe-horned into 1.0.

What about streaming responses from the SDKs?

Streaming is not wrapped by the rotation layer in 1.0; the wrapper hands streaming calls through to the underlying SDK unchanged. First-class streaming support is on the 1.1 roadmap.

How do I verify a published version's provenance?

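Every release is published with npm publish --provenance. Check the attestations with npm view @takk/keymesh@<version> --json | jq .dist.attestations. The attestation links the tarball you installed to the GitHub Actions workflow that built it from a specific source commit. To verify the registry signatures and provenance attestations cryptographically, run npm audit signatures in a project that has the package installed.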

Can I plug in my own selection strategy?

Yes. Implement SelectorStrategy with a pick(eligible, all) method and pass an instance as strategy in the config. Anything from "follow a custom priority queue" to "consult an external service" fits inside that one method.
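A custom priority strategy might look like the sketch below. It is hypothetical: KeyView and SelectorStrategyLike are local stand-ins that mirror only the documented pick(eligible, all) shape, so check the exported SelectorStrategy type for the real field names before adapting it.

```typescript
// Hypothetical stand-ins for the library's key and strategy types.
interface KeyView {
  id: string;
  label: string;
  healthScore: number;
}

interface SelectorStrategyLike {
  pick(eligible: readonly KeyView[], all: readonly KeyView[]): KeyView;
}

// Prefer keys in a fixed label order; unlisted keys rank last, healthiest first.
class PriorityStrategy implements SelectorStrategyLike {
  private readonly order: readonly string[];

  constructor(order: readonly string[]) {
    this.order = order;
  }

  pick(eligible: readonly KeyView[], _all: readonly KeyView[]): KeyView {
    const rank = (key: KeyView): number => {
      const index = this.order.indexOf(key.label);
      return index === -1 ? Number.MAX_SAFE_INTEGER : index;
    };
    const [best] = [...eligible].sort((a, b) => rank(a) - rank(b) || b.healthScore - a.healthScore);
    if (best === undefined) throw new Error('no eligible keys');
    return best;
  }
}
```

Under those assumptions it would be passed as strategy: new PriorityStrategy(['paid-key']) in the createKeymesh config; once the preferred key drops out of the eligible list, the healthiest remaining key takes over.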

What is the policy on breaking changes?

Strict SemVer 2.0.0, starting from 1.0.0. The binding stability surface is documented in SPEC.md section 5. Major bumps require a deprecation cycle; security fixes follow the disclosure flow in SECURITY.md.

Author

Built and maintained by David C Cavalcante.

David C Cavalcante

Founder, Takk Innovate Studio

Product Engineer, AI Engineer, ML Engineer, LLM Engineer, LLM Architect, AI Researcher. Builder of the @takk family of NPM packages for AI-native infrastructure.

keymesh is the first published package in a planned portfolio of NPM libraries targeting AI-native infrastructure for 2026 to 2030. Adjacent research by the author covers systemic intelligence frameworks (MAIC, HIM, NHE) published independently of this codebase, with research notes on PhilPapers and PhilArchive linked from the repository README.

If keymesh saved you an on-call this quarter, the most useful thing you can do is open a GitHub issue when you find an edge case the test suite missed. The runbook for releases, the threat model, and the contributor agreement all live in the repository.