by Maciej Dzierżek · published Apr 24, 2026 · updated May 15, 2026 · 20 min read · Expert

Rate limits in practice

Plan your capacity

| Tier | Burst (per 10 s) | Requests/day | Good for |
|---|---|---|---|
| Free | 10 | 500 | Prototypes, hobby apps, low-traffic blogs |
| Indie | 30 | 10,000 | Indie devs, side projects, small SaaS |
| Pro | 200 | 100,000 | Production apps, games, high-traffic sites |
| Enterprise | custom | custom | Custom rate limits, custom SLA, private datasets |

Current pricing lives at /pricing. All keys on your account share one counter: qb_pk_* and qb_sk_* are scopes for browser and server use, not separate quotas. REST requests and the MCP methods tools/call, resources/read, and prompts/get all draw from the same bucket.

Read the headers

Every response, even 200 OK, carries IETF RateLimit-* headers:

```http
HTTP/1.1 200 OK
RateLimit-Limit: 10
RateLimit-Remaining: 7
RateLimit-Reset: 6
RateLimit-Policy: 10;w=10
```

Parse these headers once per response, log them, and alert when headroom runs low.
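A minimal sketch of that parse-log-alert step, assuming only the three header names shown above. `parseRateLimit` and `checkHeadroom` are hypothetical helpers, the 0.2 threshold is the one from the monitoring checklist, and the `headers` argument can be anything with a `get()` method, such as the `Headers` object on a `fetch` response.

```typescript
interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetSeconds: number;
}

// Anything with get(), e.g. the Headers object on a fetch Response.
type HeaderSource = { get(name: string): string | null };

// Parse the RateLimit-* headers once; null if any is missing or non-numeric.
function parseRateLimit(headers: HeaderSource): RateLimitInfo | null {
  const read = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null || raw.trim() === '') return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  const limit = read('RateLimit-Limit');
  const remaining = read('RateLimit-Remaining');
  const resetSeconds = read('RateLimit-Reset');
  if (limit === null || remaining === null || resetSeconds === null) return null;
  return { limit, remaining, resetSeconds };
}

// Log every reading; warn when under 20% of the window is left. Returns true on alert.
function checkHeadroom(info: RateLimitInfo, logger: Pick<Console, 'info' | 'warn'> = console): boolean {
  const msg = `[quizbase] ${info.remaining}/${info.limit} left, window resets in ${info.resetSeconds}s`;
  const low = info.limit > 0 && info.remaining / info.limit < 0.2;
  if (low) logger.warn(msg);
  else logger.info(msg);
  return low;
}
```

Call `parseRateLimit(res.headers)` on every response and pass the result to `checkHeadroom` when it is not null; the warn path is the one to wire into your alerting.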

Idempotency

All GET endpoints are idempotent by definition, so you can retry them safely. The server de-duplicates requests by X-Request-Id, but only for observability; nothing is stored twice even if you retry without an id.

Keep X-Request-Id from your first attempt in logs so you can trace retries.
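A sketch of reusing one id across retries, assuming QuizBase accepts a client-supplied X-Request-Id header (the text implies you can send one, but the exact contract is not spelled out here). `tracedGet` and the injected `doFetch` are hypothetical, and backoff is deliberately left out so the example stays focused on the id; use a real backoff policy before retrying in production.

```typescript
import { randomUUID } from 'node:crypto';

type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// One id per logical request: generated before the first attempt and sent on
// every retry, so your logs and the server's group all attempts together.
async function tracedGet(url: string, key: string, doFetch: Fetcher, maxAttempts = 3) {
  const requestId = randomUUID();
  const headers = { 'X-API-Key': key, 'X-Request-Id': requestId };
  for (let attempt = 1; ; attempt++) {
    const res = await doFetch(url, { headers });
    if (res.ok || attempt >= maxAttempts) return { res, requestId };
    // Backoff omitted for brevity; sleep here before retrying in real code.
    console.warn(`[quizbase] ${requestId} attempt ${attempt} got ${res.status}`);
  }
}
```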

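Backoff with jitter

```typescript
const MAX_DELAY_MS = 60_000; // cap from the FAQ: ~60s
const MAX_5XX_RETRIES = 3;   // 5xx: smaller baseline, give up sooner

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const jitter = (ms: number) => ms * (0.8 + Math.random() * 0.4); // ±20%

// Retry-After is either delay-seconds or an HTTP-date; returns ms, or null if unusable
function retryAfterMs(value: string | null): number | null {
  if (value === null) return null;
  if (/^\d+$/.test(value.trim())) return Number(value) * 1000;
  const at = Date.parse(value);
  return Number.isNaN(at) ? null : Math.max(0, at - Date.now());
}

async function quizbaseFetch<T>(
  url: string,
  key: string,
  opts: { maxAttempts?: number; logger?: Console } = {}
): Promise<T> {
  const { maxAttempts = 5, logger = console } = opts;
  let prevDelay = 0;
  let serverErrors = 0;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch(url, { headers: { 'X-API-Key': key } });
    if (res.ok) return (await res.json()) as T;

    const body = await res.text(); // read the body so the connection can be reused
    // Permanent 4xx (bad key, not found, ...): retrying won't help
    if (res.status < 500 && res.status !== 429) {
      throw new Error(`${res.status} ${res.statusText}: ${body}`);
    }

    let delay: number;
    if (res.status === 429) {
      // Honor Retry-After first, double on repeated 429s, never retry earlier than asked
      const floor = retryAfterMs(res.headers.get('Retry-After')) ?? MAX_DELAY_MS;
      // Server wants more than the cap (e.g. daily budget spent): surface it instead of sleeping
      if (floor > MAX_DELAY_MS) throw new Error(`429, retry after ${floor}ms: ${body}`);
      delay = Math.max(floor, jitter(Math.min(MAX_DELAY_MS, Math.max(floor, prevDelay * 2))));
    } else {
      // 5xx: start at 1s, give up after MAX_5XX_RETRIES retries
      if (++serverErrors > MAX_5XX_RETRIES) throw new Error(`${res.status} ${res.statusText}: ${body}`);
      delay = jitter(Math.min(MAX_DELAY_MS, 1000 * 2 ** (serverErrors - 1)));
    }
    prevDelay = delay;

    if (attempt === maxAttempts) break; // no point sleeping before giving up
    logger.warn(`[quizbase] attempt ${attempt} got ${res.status}, sleeping ${Math.round(delay)}ms`);
    await sleep(delay);
  }

  throw new Error(`[quizbase] giving up after ${maxAttempts} attempts: ${url}`);
}
```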

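Pattern: cache in front

For apps that don’t need fresh data on every request, a local cache absorbs repeat reads so they never reach QuizBase or count against your quota:

```typescript
// Simple in-memory TTL cache for question lists (Node server), in front of quizbaseFetch.
// Stores the in-flight promise so concurrent misses for one URL share a single upstream call.
const cache = new Map<string, { body: Promise<unknown>; expires: number }>();

function cachedQuizbase<T>(url: string, key: string, ttlMs = 300_000): Promise<T> {
  const cached = cache.get(url);
  if (cached && cached.expires > Date.now()) return cached.body as Promise<T>;

  const body = quizbaseFetch<T>(url, key);
  cache.set(url, { body, expires: Date.now() + ttlMs });
  // Don't cache failures: evict so the next call tries again
  body.catch(() => {
    if (cache.get(url)?.body === body) cache.delete(url);
  });
  return body;
}
```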

For distributed caches, use Redis with SETEX and the URL as the key.
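A sketch of the Redis variant, assuming a client that exposes `get` and `setEx` the way node-redis v4 does (ioredis spells the method `setex`). `KeyValueStore`, `redisCachedFetch`, and the `quizbase:` key prefix are illustrative names; the fetcher is injected so you can pass `quizbaseFetch` from the backoff section.

```typescript
// The slice of a Redis client this needs; node-redis v4 exposes get() and setEx().
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  setEx(key: string, seconds: number, value: string): Promise<unknown>;
}

async function redisCachedFetch<T>(
  store: KeyValueStore,
  url: string,
  fetcher: (url: string) => Promise<T>, // e.g. (u) => quizbaseFetch<T>(u, key)
  ttlSeconds = 300,
): Promise<T> {
  const cacheKey = `quizbase:${url}`; // URL as the key, namespaced
  const hit = await store.get(cacheKey);
  if (hit !== null) return JSON.parse(hit) as T;

  const body = await fetcher(url);
  // SETEX writes the value and its TTL in one command
  await store.setEx(cacheKey, ttlSeconds, JSON.stringify(body));
  return body;
}
```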

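Pattern: queue + worker

For large syncs (e.g. mirroring the Polish catalog):

1. Put the cursor URL on a queue (BullMQ, SQS, Redis stream).
2. A worker takes one URL, fetches it, persists the result, and pushes _links.next back onto the queue.
3. The worker respects rate limits: if RateLimit-Remaining drops below 5, it sleeps for RateLimit-Reset seconds.
4. Dead-letter a URL after 5 failed attempts.

Because the worker paces itself on the response headers, it can drain a large catalog over hours without tripping the burst limit. Total volume is still bounded by your tier’s daily budget (100,000 requests/day on Pro), so size long syncs to fit it or spread them over several days.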
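The four steps can be sketched as a single worker loop. This is an illustration under assumptions, not a QuizBase client: an in-memory array stands in for BullMQ/SQS/a Redis stream, `persist` and `deadLetter` are hypothetical hooks, the cursor link is assumed to be HAL-style (`_links.next.href`), and `sleep` is injectable so the loop can be tested without real waits.

```typescript
// Shape we rely on; a fetch() Response satisfies it.
interface PageResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}

interface Job { url: string; attempts: number }

interface WorkerDeps {
  fetchPage: (url: string) => Promise<PageResponse>; // e.g. u => fetch(u, { headers: { 'X-API-Key': key } })
  persist: (body: unknown) => Promise<void>;          // hypothetical: write the page to your store
  deadLetter: (job: Job, reason: string) => void;     // hypothetical: park the job for inspection
  sleep?: (ms: number) => Promise<void>;
}

const MAX_ATTEMPTS = 5;  // dead-letter after 5 failed attempts
const LOW_WATERMARK = 5; // pause when RateLimit-Remaining drops below this

async function drain(queue: Job[], deps: WorkerDeps): Promise<void> {
  const sleep = deps.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));

  while (queue.length > 0) {
    const job = queue.shift()!;
    let res: PageResponse | undefined;
    try {
      res = await deps.fetchPage(job.url);
    } catch {
      // network error: counts as a failed attempt
    }

    if (res?.ok) {
      const body = (await res.json()) as { _links?: { next?: { href?: string } } };
      await deps.persist(body);
      const next = body._links?.next?.href;
      if (next) queue.push({ url: next, attempts: 0 });
    } else {
      const permanent = res !== undefined && res.status < 500 && res.status !== 429;
      if (permanent || job.attempts + 1 >= MAX_ATTEMPTS) {
        deps.deadLetter(job, res ? `HTTP ${res.status}` : 'network error');
      } else {
        queue.push({ url: job.url, attempts: job.attempts + 1 });
        await sleep(1000 * 2 ** job.attempts); // simple backoff before the retry comes round
      }
    }

    // Pace on the headers: when the window is nearly spent, wait for it to reset
    const remaining = Number(res?.headers.get('RateLimit-Remaining') ?? Infinity);
    if (remaining < LOW_WATERMARK) {
      await sleep(Number(res?.headers.get('RateLimit-Reset') ?? 0) * 1000);
    }
  }
}
```

A permanent 4xx is dead-lettered immediately rather than retried five times, in line with the advice not to retry client errors other than 429.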

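Pattern: SWR on the client

Browser apps (using qb_pk_* keys) should use stale-while-revalidate caching, so repeat renders reuse data instead of refetching:

```typescript
import { useQuery } from '@tanstack/react-query'; // TanStack Query v5; swr's useSWR works similarly

// Errors carry the HTTP status as the message; 4xx other than 429 won't succeed on retry
const isClientError = (err: unknown) =>
  err instanceof Error && /^4\d\d$/.test(err.message) && err.message !== '429';
// /api/round is your own backend route; the ?lang= parameter is illustrative
export const useRandomRound = (lang: string) => useQuery({
  queryKey: ['quiz', 'random', lang],
  queryFn: () => fetch(`/api/round?lang=${encodeURIComponent(lang)}`).then((r) => {
    if (!r.ok) throw new Error(String(r.status)); // fetch resolves on 4xx/5xx, so throw
    return r.json();
  }),
  staleTime: 5 * 60 * 1000, // 5 min: serve cached data without refetching
  gcTime: 30 * 60 * 1000,   // 30 min: evict unused entries from memory
  retry: (count, err) => count < 3 && !isClientError(err),
});
```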

Monitoring checklist

✅ Log X-Request-Id on every non-2xx response — makes support tickets trivial
✅ Alert when RateLimit-Remaining / RateLimit-Limit < 0.2 — lets you upgrade proactively
✅ Track Retry-After histogram — helps decide if you need to scale horizontally or upgrade tier
✅ Measure your own request rate: your app’s count should match what our headers report
❌ Don’t poll faster after 429 — you’ll only make it worse
❌ Don’t swallow 401/403 as retryable — they indicate real config issues
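For the Retry-After histogram, a small in-process counter is enough to start with. The bucket edges below are arbitrary choices and `RetryAfterHistogram` is a hypothetical helper; in production you would more likely feed the value to your metrics library’s own histogram type.

```typescript
// Upper bucket edges in seconds; one extra bucket catches everything above the last edge.
const EDGES = [1, 5, 10, 30, 60];

class RetryAfterHistogram {
  private counts: number[] = new Array(EDGES.length + 1).fill(0);

  // Record the Retry-After seconds from a 429; skips missing or non-numeric values
  // (an HTTP-date Retry-After would need converting to seconds first).
  record(retryAfter: string | null): void {
    if (retryAfter === null) return;
    const seconds = Number(retryAfter);
    if (!Number.isFinite(seconds)) return;
    const i = EDGES.findIndex((edge) => seconds <= edge);
    const bucket = i === -1 ? EDGES.length : i;
    this.counts[bucket] = (this.counts[bucket] ?? 0) + 1;
  }

  // Counts keyed by bucket label: '<=1s', '<=5s', ..., '>60s'.
  snapshot(): Record<string, number> {
    const out: Record<string, number> = {};
    EDGES.forEach((edge, i) => {
      out[`<=${edge}s`] = this.counts[i] ?? 0;
    });
    out[`>${EDGES[EDGES.length - 1]}s`] = this.counts[EDGES.length] ?? 0;
    return out;
  }
}
```

Mostly short waits suggest bursty traffic you can smooth on the client; long ones point at the daily budget and a tier upgrade.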

FAQ

What does a 429 response mean exactly?

You exceeded your rate-limit window. QuizBase returns RFC 9457 Problem Details with status: 429, title: "Too Many Requests", plus IETF rate-limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) and a Retry-After header (seconds until the next window). Honor Retry-After first — it’s the most accurate signal.

How do I implement exponential backoff with QuizBase?

Start at the value in Retry-After (if present). On subsequent 429s, double the wait: delay = max(retryAfter, prevDelay * 2) capped at ~60s. Add ±20% jitter (delay * (0.8 + Math.random() * 0.4)) to prevent thundering-herd on tier-shared clients. Stop retrying after 5 attempts and surface the error to the caller.
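The formula from this answer as small functions, a sketch rather than QuizBase’s own client code. `baseDelay`, `withJitter`, and `scheduleFor` are hypothetical names, and doubling the un-jittered value (rather than the jittered one) is an assumption the answer leaves open.

```typescript
const CAP_MS = 60_000;

// Delay before jitter: start at Retry-After, double on each further 429, cap at ~60 s.
function baseDelay(retryAfterMs: number, prevDelayMs: number): number {
  return Math.min(CAP_MS, Math.max(retryAfterMs, prevDelayMs * 2));
}

// ±20% jitter: multiply by a factor in [0.8, 1.2).
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return delayMs * (0.8 + random() * 0.4);
}

// Un-jittered waits when every attempt gets a 429 with the same Retry-After.
// 5 attempts means 4 waits between them.
function scheduleFor(retryAfterMs: number, attempts = 5): number[] {
  const waits: number[] = [];
  let prev = 0;
  for (let i = 1; i < attempts; i++) {
    prev = baseDelay(retryAfterMs, prev); // double the un-jittered value
    waits.push(prev);
  }
  return waits;
}
```

`scheduleFor(10_000)` gives 10 s, 20 s, 40 s, then 60 s (80 s capped); each actual sleep is `withJitter` of those values. Jitter can land up to 20% below Retry-After, so wrap it as `Math.max(retryAfterMs, withJitter(d))` if you want to honor the header strictly.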

Do retries count against my quota?

Yes. Every request that reaches the server counts, including the ones that get 429’d back. On the free tier the burst limit (10 requests per 10 seconds) uses a rolling window, so backing off lets earlier requests age out and restores your burst budget. The daily budget (500 requests/day on free) is what eventually caps you.

Can I use multiple API keys to bypass rate limits?

No. Rate limits are per account, not per key. All keys you create share the same window. This is documented in /docs/authentication — multiple keys exist for app-level secret rotation, not capacity multiplication.

What’s the difference between burst limit and daily limit?

Burst is 10 requests per 10 seconds on the free tier, measured over a rolling 10-second window; it protects against bot scrapers. Daily is 500 requests per rolling 24-hour window on free; it caps total volume from things like bulk-fetch jobs. Both rise on paid tiers, though not in a fixed ratio (Indie allows 30 per 10 s and 10,000 per day, Pro 200 per 10 s and 100,000 per day).
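A client-side pacer that mirrors a rolling window keeps you under the burst limit instead of discovering it through 429s. This is a sketch of a sliding-window log, assuming the server counts requests over a true rolling window as described above; `RollingWindowPacer` is a hypothetical helper, and the clock is injectable so it can be tested.

```typescript
// Sliding-window log: at most `limit` requests in any `windowMs` span.
class RollingWindowPacer {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private sent: number[] = []; // send times still inside the window, oldest first

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Milliseconds until another request fits in the window (0 = send now).
  waitTime(): number {
    const t = this.now();
    while (this.sent.length > 0 && this.sent[0]! <= t - this.windowMs) this.sent.shift();
    if (this.sent.length < this.limit) return 0;
    return this.sent[0]! + this.windowMs - t;
  }

  // Wait for a free slot, then record the request; call right before each API call.
  async acquire(): Promise<void> {
    for (let wait = this.waitTime(); wait > 0; wait = this.waitTime()) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
    this.sent.push(this.now());
  }
}
```

On the free tier that would be `new RollingWindowPacer(10, 10_000)` for the burst limit and `new RollingWindowPacer(500, 24 * 60 * 60 * 1000)` for the daily one, awaiting `acquire()` on both before each request. Limits are per account, so several processes sharing one account need to split the budget or keep the log in a shared store.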

Should I retry on 500 or 503 errors?

Yes, but with capped exponential backoff. 5xx errors are server-side and usually transient (deploy, brief DB hiccup). Use the same backoff pattern as 429 but starting at a smaller baseline (1s) and giving up faster (3 retries). Don’t retry on 4xx (client error) except 429.

What about idempotency for POST requests?

QuizBase REST is mostly GET-only. The only POST endpoint is [/api/v1/report](/docs/api/report), which is idempotent on (questionId, category): multiple identical reports collapse to one server-side, so you can retry safely.

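How do I plan capacity before hitting rate limits?

Estimate average QPS as your daily request count divided by ~22 hours (79,200 seconds; using 22 instead of 24 leaves a safety margin), then multiply by about 1.5 for spikes to get peak QPS. Because the burst limit is counted per 10 seconds, compare peak QPS × 10 with your tier’s burst limit, and your daily count with its daily limit. If either is close, batch where possible (cursor pagination instead of N single fetches) or [upgrade your tier](/pricing). The usage endpoint documented at [/docs/api/usage](/docs/api/usage) reports current-day consumption you can graph.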
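The estimate as a small function, with the tier limits copied from the capacity table. `planCapacity` is a hypothetical helper and it assumes the spike factor applies uniformly; traffic that arrives in bursts (a whole class starting a quiz at once) needs its own per-10-second estimate.

```typescript
// Limits from the capacity table; burst is counted per 10-second window.
const TIERS = [
  { name: 'Free', burstPer10s: 10, perDay: 500 },
  { name: 'Indie', burstPer10s: 30, perDay: 10_000 },
  { name: 'Pro', burstPer10s: 200, perDay: 100_000 },
];

function planCapacity(dailyRequests: number, spikeFactor = 1.5) {
  const avgQps = dailyRequests / (22 * 3600); // spread over ~22 h, not 24, for margin
  const peakQps = avgQps * spikeFactor;       // spikes push the peak above the average
  const peakPer10s = peakQps * 10;            // same unit as the burst limit
  const fits = TIERS.find((t) => dailyRequests <= t.perDay && peakPer10s <= t.burstPer10s);
  return { avgQps, peakQps, peakPer10s, tier: fits ? fits.name : 'Enterprise' };
}
```

With 8,000 requests a day, for example, the average is about 0.10 QPS and the peak about 0.15 QPS, roughly 1.5 requests per 10 seconds. That is far below every burst limit, so it is the 500-per-day free budget that pushes you to Indie.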

Does the MCP server share the same rate limit?

Yes. One tools/call counts as one REST request for rate-limit purposes. Calling quizbase_random 100 times via MCP uses the same 100 requests of budget as fetching /api/v1/questions/random 100 times via REST. There is no “agent discount”. See the [MCP server docs](/docs/sdks/mcp-server).

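What if I’m building an embed (lead-gen widget, classroom page) where the key is visible?

Visible publishable keys (qb_pk_*) can be revoked from your dashboard if abused. Because each visitor’s browser calls QuizBase directly, with no shared cache in between, a widget also hits free-tier limits sooner than server-side calls would. For high-volume embeds, proxy through a small /api/quizbase endpoint on your domain that adds your secret key server-side: your visitors share your quota, but the key isn’t exposed. Patterns: [quiz lead-gen widget](/docs/guides/quiz-lead-gen-widget), [Moodle/Canvas embed](/docs/guides/moodle-canvas-lms-module).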
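The proxy’s core job is small: accept only the paths your widget needs and attach the secret key on the server. A framework-neutral sketch of that mapping step, with hypothetical names (`toUpstream`, `ALLOWED_PATHS`) and the QuizBase API origin passed in from your config, since it is not given here. Wire it into whatever server handles /api/quizbase, and put the in-memory cache from the cache pattern in front of the upstream call so visitors share cached answers.

```typescript
interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
}

const PROXY_PREFIX = '/api/quizbase';
// Read endpoints the widget may reach; everything else is refused.
const ALLOWED_PATHS = new Set(['/api/v1/questions/random']);

// Map "/api/quizbase/<quizbase path>?<query>" onto the upstream request,
// adding the secret key on the server. Returns null for anything not allowed.
function toUpstream(incomingUrl: string, origin: string, secretKey: string): UpstreamRequest | null {
  // The base only lets relative URLs parse; URL parsing also resolves ".." segments.
  const parsed = new URL(incomingUrl, 'http://proxy.invalid');
  if (!parsed.pathname.startsWith(PROXY_PREFIX + '/')) return null;

  const path = parsed.pathname.slice(PROXY_PREFIX.length);
  if (!ALLOWED_PATHS.has(path)) return null;

  return {
    url: new URL(path + parsed.search, origin).toString(),
    headers: { 'X-API-Key': secretKey }, // qb_sk_* key, never sent to the browser
  };
}
```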
