What is tier-0 regulatory data?

Tier-0 is the primary, official record straight from the body that issued it: regulators, official journals, agencies, and standard-setters such as EUR-Lex, ECHA, the FDA, and EFRAG. It is authoritative because it is the source itself, not a summary or a web mention of it.

Why does the data tier matter for AI compliance answers?

An AI answer is only as defensible as the data it stands on. General models learned mostly from the open web (tier-2), which is where stale and imprecise regulatory information lives. Answering from tier-0 data is what produces citable, defensible answers.

What makes a tier-0 data layer trustworthy beyond the source?

The raw documents must be parsed into one schema, deduplicated and versioned, dated and status-stamped (draft versus in force), validated, and kept with their provenance and link, so every answer traces back to an official document.

How is tier-0 data different from what ChatGPT uses by default?

By default a general model answers from its training data, which is largely open-web (tier-2) and frozen at a cut-off date. Connecting a tier-0 layer lets the same model answer from live, official, validated records instead.

What 'Tier-0' Regulatory Data Means for AI (and Why Provenance Wins)

When an AI assistant gives you a regulatory answer, the question that matters is not "does it sound right?" It is "where did this come from, and can I defend it?" That question is answered entirely by the data underneath. Not all regulatory data is equal, and the difference has a name: tier-0.

This article explains what tier-0 regulatory data means, how it compares to the data most AI answers are built on, what turns raw official documents into something an assistant can safely cite, and why provenance, not fluency, is what makes an answer defensible.

The tiers of regulatory data

Think of regulatory information in three tiers, by distance from the source.

Tier-0: the primary, official record

This is the regulation itself, straight from the body that issued it: regulators, official journals, agencies, and standard-setters. Examples include EUR-Lex, ECHA, the FDA, and EFRAG. Tier-0 is authoritative by definition, because it is the source, not a description of it.

Tier-1: aggregators and professional summaries

Newsletters, law-firm briefings, and database vendors that collect and summarize tier-0. Useful, but one step removed: timing lags, interpretation creeps in, and you inherit someone else's editorial choices.

Tier-2: the open web

Blogs, forums, news articles, and general web pages that mention regulation. This is the bulk of what a general-purpose model absorbed during training. It is uneven, often outdated, and frequently wrong about specifics.

Why AI trained on tier-2 fails for compliance

A general-purpose model learned the world mostly from tier-2: the open web. That is fine for general knowledge and dangerous for regulation, where the open web is exactly where stale and imprecise information lives. The model absorbs the gist and loses the specifics, which, as we cover in why AI hallucinates on regulatory questions, is how confident, wrong answers are born.

You cannot fix tier-2 training data with a better prompt. The only fix is to give the model tier-0 data to answer from at the moment of the question.

An AI answer is only as defensible as the tier of the data it stands on. Tier-0 in, citable answer out. Tier-2 in, plausible guess out.

What turns tier-0 documents into a trustworthy layer

Raw official documents are authoritative but messy: hundreds of publishers, dozens of formats, duplicates, amendments, and no common structure. Tier-0 sourcing is necessary but not sufficient. A trustworthy data layer also has to process that material so an assistant can use it safely. That means:

Parse and normalize. HTML, XML, and PDF from hundreds of sources turned into one canonical schema.
Deduplicate and version. Stable identifiers, so a republished or amended text is recognized as a new version, not a new fact.
Date and stamp status. Every record carries when it changed and whether it is a draft, in consultation, adopted, or in force.
Validate. Integrity checks before anything is surfaced, so what reaches the assistant always traces back to an official document.
Preserve provenance. The source and the link travel with the record, all the way to the answer.
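
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's actual pipeline: the `Record` fields, the status names, the `ingest` and `validate` helpers, and the `regulator.example` URL are all hypothetical. It assumes that a byte-for-byte republication with the same status is skipped as a duplicate, while changed text or a changed status under the same stable identifier becomes a new version.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import date

# Hypothetical status vocabulary, taken from the four states named in the text.
ALLOWED_STATUSES = {"draft", "in_consultation", "adopted", "in_force"}


@dataclass(frozen=True)
class Record:
    stable_id: str    # same identifier across every version of one document
    version: int
    status: str
    changed_on: date
    source_name: str  # provenance: who published it
    source_url: str   # provenance: where to find it
    text: str


def content_hash(text: str) -> str:
    """Hash the text with whitespace collapsed, so reflowed copies match."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def validate(record: Record) -> list[str]:
    """Return a list of problems; an empty list means the record may be surfaced."""
    problems = []
    if record.status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {record.status!r}")
    if not record.source_name:
        problems.append("missing source name")
    if not record.source_url.startswith("https://"):
        problems.append("missing or non-HTTPS source link")
    if not record.text.strip():
        problems.append("empty text")
    return problems


def ingest(store: dict[str, list[Record]], *, stable_id: str, status: str,
           changed_on: date, source_name: str, source_url: str,
           text: str) -> Record | None:
    """Add a document to the store; return None if it is a duplicate."""
    versions = store.get(stable_id, [])
    latest = versions[-1] if versions else None
    if (latest is not None and latest.status == status
            and content_hash(latest.text) == content_hash(text)):
        return None  # republished copy: same fact, nothing new to record
    record = Record(stable_id, len(versions) + 1, status, changed_on,
                    source_name, source_url, text)
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    store[stable_id] = versions + [record]
    return record
```

Keying the store on a stable identifier is what lets an amendment attach to the existing record rather than appear as an unrelated document, and refusing to store a record that fails validation keeps anything without a source and link from reaching the assistant.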

This is the difference between "we scraped some regulators" and a layer an assistant can cite. Obsidian applies this pipeline across 4,600+ official sources in 50+ jurisdictions, so the data is not just tier-0; it is tier-0 made usable.

Give your AI tier-0 data, not open-web guesses

Connect Obsidian to Claude, ChatGPT, or Cursor and every answer comes back with its official source, date, and status. Free tier, two-minute setup.


Why provenance equals defensibility

In compliance, the value of an answer is not just whether it is correct but whether you can show why it is correct. An answer with provenance (a named official source, a date, and a status) is something you can put in front of your board, your auditor, or your regulator. An answer without provenance is a claim you have to go and verify yourself, which means the assistant saved you nothing.

Provenance is also what makes an answer auditable over time. When a rule changes, a versioned, dated record shows what was true when you acted on it. That is not a nice-to-have in regulated work. It is the record.
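
As a rough illustration of what "what was true when you acted on it" means in practice, a point-in-time lookup over a versioned history might look like the sketch below. The `Version` type, the `as_of` helper, the dates, and the `regulator.example` URLs are hypothetical, and the sketch assumes each version takes effect on its `changed_on` date and stays current until the next one.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    changed_on: date  # date this version became the current one
    status: str
    source_url: str


def as_of(versions: list[Version], when: date) -> Version | None:
    """Return the version that was current on `when`, or None if none existed yet."""
    ordered = sorted(versions, key=lambda v: v.changed_on)
    dates = [v.changed_on for v in ordered]
    # bisect_right counts the versions dated on or before `when`.
    index = bisect_right(dates, when)
    return ordered[index - 1] if index else None


# Hypothetical history of a single rule.
history = [
    Version(date(2023, 3, 1), "draft", "https://regulator.example/rule/v1"),
    Version(date(2024, 1, 1), "adopted", "https://regulator.example/rule/v2"),
    Version(date(2024, 7, 1), "in_force", "https://regulator.example/rule/v3"),
]

decision_date = date(2024, 3, 15)
current = as_of(history, decision_date)
print(current.status, current.source_url)
```

Because earlier versions are kept rather than overwritten, the same query still returns the same record after the rule changes again, which is what makes a past decision reviewable.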

How to get tier-0 data into your AI

You do not need to rebuild this yourself. The practical path is to connect your assistant to a maintained tier-0 layer over the Model Context Protocol, rather than building and feeding your own pipeline. For the build-versus-connect trade-off, see RAG vs MCP for regulatory AI. For the bigger picture, see agentic regulatory intelligence.

The takeaway

The quality of an AI regulatory answer is decided before the model says a word, by the tier of the data underneath. Tier-0, processed and provenance-stamped, is what turns a confident guess into a defensible answer. When you evaluate any AI for regulatory work, do not ask how smart the model is. Ask what tier its data is.