Open trivia data — sources, attribution, and our compliance dump

QuizBase aggregates eleven open datasets and adds multilingual translations, classification, and curated topics on top. Below: each source with attribution, plus our public dump for CC-BY-SA share-alike compliance.

Eleven datasets, openly licensed

Six are CC-BY-SA (translations and refinements republished). Five are CC-BY or MIT (our enrichment stays under our license).

opentdb

Open Trivia Database (PixelTail Games)

CC-BY-SA-4.0
Records (EN)
5,149
Records (PL)
5,146

What we add:

  • ✓ Polish translations
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

opentriviaqa

OpenTriviaQA (uberspot)

CC-BY-SA-4.0
Records (EN)
48,862
Records (PL)
48,853

What we add:

  • ✓ Polish translations
  • ✓ English text refinements
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

arc

AI2 Reasoning Challenge (ARC), Allen Institute for AI

CC-BY-SA-4.0
Records (EN)
7,787
Records (PL)
7,787

What we add:

  • ✓ Polish translations
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

kqa-pro

KQA Pro (Cao et al. 2022)

CC-BY-SA-4.0
Records (EN)
95,735
Records (PL)
95,692

What we add:

  • ✓ Polish translations
  • ✓ English text refinements
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

nq-open

Natural Questions Open (Lee et al. 2019, Google Research)

CC-BY-SA-3.0
Records (EN)
86,714
Records (PL)
86,656

What we add:

  • ✓ Polish translations
  • ✓ Quizifications (Q&A → multiple-choice / boolean)
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

mkqa

MKQA: Multilingual Knowledge Questions and Answers (Apple ML Research)

CC-BY-SA-3.0
Records (EN)
6,263
Records (PL)
6,758

What we add:

  • ✓ Quizifications (Q&A → multiple-choice / boolean)
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

entityq

EntityQuestions (Sciavolino et al. 2021, Princeton NLP)

MIT
Records (EN)
216,575
Records (PL)
216,479

What we add:

  • ✓ Polish translations
  • ✓ English text refinements
  • ✓ Quizifications (Q&A → multiple-choice / boolean)
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

mintaka

Mintaka (Amazon Science)

CC-BY-4.0
Records (EN)
154,325
Records (PL)
154,312

What we add:

  • ✓ Polish translations
  • ✓ English text refinements
  • ✓ Quizifications (Q&A → multiple-choice / boolean)
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

creak

CREAK (Onoe et al. 2021)

MIT
Records (EN)
12,047
Records (PL)
12,047

What we add:

  • ✓ Polish translations
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

qasc

QASC (Khot et al. 2020, Allen Institute for AI)

CC-BY-4.0
Records (EN)
9,060
Records (PL)
9,060

What we add:

  • ✓ Polish translations
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

webq

WebQuestions (Berant et al. 2013, Stanford NLP)

CC-BY-4.0
Records (EN)
5,017
Records (PL)
5,015

What we add:

  • ✓ Polish translations
  • ✓ English text refinements
  • ✓ Quizifications (Q&A → multiple-choice / boolean)
  • ✓ Classification (categories, subcategories, tags)
  • ✓ Curated topics + multilingual labels

Attribution rules

Eleven datasets, four open licenses, four sets of obligations. Here is what you owe upstream when you use QuizBase records — at a glance, per license. Every API record carries everything you need to comply; this section explains how to assemble it.

License        | Sources                             | Credit author | Link to license | Indicate modifications | Share-alike on derivatives
CC-BY-SA-4.0   | opentdb, opentriviaqa, arc, kqa-pro | ✓             | ✓               | ✓                      | ✓
CC-BY-SA-3.0   | nq-open, mkqa                       | ✓             | ✓               | ✓                      | ✓
CC-BY-4.0      | mintaka, qasc, webq                 | ✓             | ✓               | ✓                      | —
MIT            | entityq, creak                      | ✓             | ✓               | —                      | —
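If you want to branch on these obligations in code, the table above can be mirrored as a small lookup. This is a sketch, not part of the QuizBase API: the `OBLIGATIONS` dict and `must_share_alike` helper are our own names, and the flag values restate the standard obligations of each license family.

```python
# Per-license obligations, mirroring the table above.
# Keys are the `license` strings returned by the API.
OBLIGATIONS = {
    "CC-BY-SA-4.0": {"credit": True, "link_license": True, "indicate_modifications": True, "share_alike": True},
    "CC-BY-SA-3.0": {"credit": True, "link_license": True, "indicate_modifications": True, "share_alike": True},
    "CC-BY-4.0":    {"credit": True, "link_license": True, "indicate_modifications": True, "share_alike": False},
    "MIT":          {"credit": True, "link_license": True, "indicate_modifications": False, "share_alike": False},
}

def must_share_alike(license_id: str) -> bool:
    """True if derivatives of this record must be published under the same license."""
    return OBLIGATIONS[license_id]["share_alike"]
```

For example, `must_share_alike("CC-BY-SA-4.0")` is `True`, while records from the MIT-licensed sources carry no share-alike obligation.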

Every API record is self-describing

The `attribution` object on every question contains the author, the license string + version + canonical URL, the upstream record id, the upstream link (when available), the list of our modifications, and a last-modified timestamp. You assemble your attribution string from these fields — no need to memorize per-source rules or look up CC docs.

{
  "attribution": {
    "author": "OpenTriviaQA contributors",
    "source": "opentriviaqa",
    "license": "CC-BY-SA-4.0",
    "licenseVersion": "4.0",
    "licenseUrl": "https://creativecommons.org/licenses/by-sa/4.0/",
    "sourceId": "otq:12345",
    "url": "https://github.com/uberspot/OpenTriviaQA",
    "modifications": ["translated_pl", "refined_text"],
    "lastModified": "2026-04-24T10:00:00Z"
  }
}
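As a sketch of how you might assemble an attribution string from these fields: the keys below match the sample payload above, but the `attribution_line` function and the exact wording of the credit are our own choices, not a required format.

```python
def attribution_line(attr: dict) -> str:
    """Build a human-readable credit line from a QuizBase `attribution` object."""
    parts = [
        f'Question by {attr["author"]}',  # credit the author
        f'licensed under {attr["license"]} ({attr["licenseUrl"]})',  # name and link the license
    ]
    if attr.get("url"):
        parts.insert(1, f'via {attr["url"]}')  # link upstream when available
    if attr.get("modifications"):
        # CC licenses ask you to indicate modifications
        parts.append("modifications: " + ", ".join(attr["modifications"]))
    return "; ".join(parts)

attr = {
    "author": "OpenTriviaQA contributors",
    "license": "CC-BY-SA-4.0",
    "licenseUrl": "https://creativecommons.org/licenses/by-sa/4.0/",
    "url": "https://github.com/uberspot/OpenTriviaQA",
    "modifications": ["translated_pl", "refined_text"],
}
print(attribution_line(attr))
```

A record with an empty `modifications` array simply drops the trailing clause.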

Modifications enum

The `modifications` array tells you exactly what we changed. Use it to satisfy "indicate modifications" obligations. Empty array = the upstream record is verbatim.

translated_<lang>
We translated the question text + answers from English into the target language using our calibrated LLM pipeline. Per-language tag — `translated_pl`, `translated_de`, etc.
refined_text
We rewrote the original English text for clarity, grammar, and trivia tone. The original meaning and answer are preserved.
quizified
We converted an upstream Q&A pair into a quiz item — refined the question text and either generated three multiple-choice distractors or normalized to a True/False boolean. Source attribution preserved; the quiz form is ours.

This is a developer-friendly summary, not legal advice. For commercial redistribution at scale, especially of CC-BY-SA records, consult a lawyer familiar with Creative Commons and your jurisdiction. Questions about a specific record or attribution string? Email [email protected].

BY-SA enrichment dump

CC-BY-SA share-alike requires us to publish our enrichment for the six BY-SA-licensed sources. Here it is, under CC-BY-SA-4.0.

Latest release
2026-05-04
Asset files
9
Version
v2026-05-04

Manually regenerated when content changes substantially. Originals are at upstream sources (linked above) — the dump only contains our derivative work.

What's only in the API

Beyond what we publish in the compliance dump.

  • Classification across 24 categories, ~140k subcategories, and ~627k tags.
  • Curated topic system — 2,184 topics with 6,176 aliases.
  • Multilingual display labels for every subcategory and tag.
  • Generated distractors for non-BY-SA sources (entityq, mintaka, qasc, webq).
  • Polish translations for non-BY-SA sources.
  • Text refinements for non-BY-SA sources.
  • Difficulty calibration (planned, post-launch).
  • Embeddings + semantic search (planned).

Credits

QuizBase stands on a lot of open-source work. Here is what we use and who inspired us.

Libraries

Framework

  • SvelteKit · MIT · Application framework
  • Svelte · MIT · UI runtime (runes)
  • Vite · MIT · Bundler / dev server

UI

Database & ORM

Auth & Billing

  • Better Auth · MIT · Authentication, sessions, API keys
  • Stripe · MIT (SDK) · Payments, subscriptions

Dev tooling

Inspirations

Products that shaped how we think about API design and pricing.

  • Stripe: API design, key prefixes, error-format conventions; the gold standard for developer documentation.
  • Upstash: pricing tiers, free-forever dev experience, transparent overage policy.
  • Clerk: dev/prod key split (`test_*` vs `live_*`), seamless onboarding.
  • OpenAI: rate-limit response format with a concrete next step (limit, current, retry-after, upgrade link).
  • Wikimedia Enterprise: a commercial API on top of CC-BY-SA content done right, from attribution and share-alike compliance to pricing.
  • OpenStreetMap: diff-based licensing model and the precedent for sustainably hosted open geographic data.

Spotted a problem?

Wrong translation, factual error, or missing attribution? Use our review form.

Report a problem →

Spotted incorrect attribution? License questions? Translation issues? Removal request? Email [email protected].

Want to contribute or report a dump issue? See the dump repository.