What is the best GPT model for regulatory questions?

In our benchmark, GPT-5.4-nano connected to the Obsidian data layer scored 98.6 out of 100, the highest of the three GPT models tested, at roughly one twenty-fourth of GPT-5.5's per-token price. On regulatory questions the data layer mattered more than model size: every GPT model reached 96 or above once connected, up from an average of 33 alone.

Does ChatGPT hallucinate on regulatory questions?

When the GPT models answered alone, 47 percent of the factual claims in their answers were grounded in a verifiable source. Connected to the Obsidian data layer, 100 percent of claims were grounded across all three models, and none contradicted the official text.

How do I make ChatGPT cite official regulatory sources?

Connect ChatGPT to a verified tier-0 regulatory data layer such as Obsidian. In our benchmark, every GPT answer then came back with the correct official instrument, its reference, the legal status, and a direct link to the source, often the official PDF.

The Best Regulatory AI Companion for ChatGPT, Benchmarked

Ask ChatGPT a regulatory question and the answer comes back fast and self-assured. Then you check it. The regulation number does not exist. The deadline it quotes passed two years ago. The "requirement" was a proposal that never became law. After enough of these, the verdict feels obvious: ChatGPT is not ready for regulatory work.

It is the wrong verdict. The GPT models people use every day are perfectly capable of regulatory reasoning. What fails them is not intelligence, it is reach. A general model answers from a frozen snapshot of the open web, with no way to open the actual text of a regulation or to know whether it is in force today. Give it that text, and it stops guessing.

That text is what Obsidian supplies: a verified, tier-0 regulatory data layer ChatGPT can query, returning the official wording, the current legal status, and a direct link to the source. We ran three GPT models, GPT-5.4-nano, GPT-5.4-mini, and GPT-5.5, through real, current regulatory questions, each answered twice. Once alone. Once connected to Obsidian.

33 → 98: average regulatory accuracy of the GPT models, alone vs. connected (out of 100)

100%: connected GPT answers that came back with a direct link to the official source

47% → 100%: share of a GPT answer's factual claims grounded in the official source

ChatGPT is inaccurate for regulatory work

Alone, the three GPT models averaged 33 out of 100. That is the experience behind the complaint, and on those terms the complaint is fair. Connected to Obsidian, the same three averaged 98. Nothing about the models changed between those two numbers. Only the data in front of them did.

Figure: Regulatory accuracy against price for three GPT models, alone and connected to the Obsidian data layer. Connected to Obsidian (the wider coins), every GPT model converges near the top.

Figure: The same comparison against average response time.

The standout is the smallest one. GPT-5.4-nano, at $0.46 per million tokens and about two seconds an answer, scores 12.4 alone and 98.6 connected, the highest of the three, at roughly one twenty-fourth of GPT-5.5's per-token price. On regulation, the data layer outweighs the model size. You are not buying accuracy from a bigger GPT. You are handing the data to a small one.
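The price gap can be checked from the per-token prices and per-question costs in the results table. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Prices per 1M tokens and cost per question, copied from the results table.
price_per_million = {"gpt-5.4-nano": 0.46, "gpt-5.5": 11.25}
cost_per_question = {"gpt-5.4-nano": 0.008, "gpt-5.5": 0.024}

token_price_ratio = price_per_million["gpt-5.5"] / price_per_million["gpt-5.4-nano"]
question_cost_ratio = cost_per_question["gpt-5.5"] / cost_per_question["gpt-5.4-nano"]

print(f"per-token price ratio: {token_price_ratio:.1f}x")      # 24.5x
print(f"per-question cost ratio: {question_cost_ratio:.1f}x")  # 3.0x
```

The gap is widest on list price per token. Measured per question in this pilot, GPT-5.5 cost about three times as much as GPT-5.4-nano.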

ChatGPT cannot point you to the official source

Ask a GPT model alone for the instrument behind a rule and you get a plausible-looking citation you then have to go and verify yourself, assuming it exists. Connected to Obsidian, every GPT answer came back with the official source attached, cited correctly on every single question across all three models: the instrument, its exact reference, the jurisdiction, the legal status, and a direct link to the official document, often the source PDF itself.

An answer with the tier-0 source attached is one you can forward to an auditor without re-checking it. That is the difference between a draft a model imagined and an obligation you can act on.

ChatGPT hallucinates

We broke every answer into its individual factual claims and checked each against the official source. Alone, 47% of a GPT answer's claims held up. Connected to Obsidian, 100% of them did, across all three models, and none contradicted the official text. A database built for the model to read does not make it incapable of error, but here it removed the dangerous kind entirely: the confident statement with nothing behind it.
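The grounded-claims share can be sketched as a simple tally over per-claim verdicts. The verdict labels, the function name, and the example claims below are hypothetical, invented for illustration; they are not the benchmark's data or its actual judging pipeline.

```python
from collections import Counter

# Hypothetical verdicts for the atomic claims of one answer:
#   "grounded"     - traces back to the official source
#   "ungrounded"   - nothing in the source supports it
#   "contradicted" - conflicts with the official text
def grounding_summary(verdicts):
    """Return the grounded and contradicted shares of an answer, in percent."""
    if not verdicts:
        raise ValueError("an answer must contain at least one claim")
    counts = Counter(verdicts)
    total = len(verdicts)
    return {
        "grounded_pct": 100 * counts["grounded"] / total,
        "contradicted_pct": 100 * counts["contradicted"] / total,
    }

example = ["grounded", "grounded", "ungrounded", "contradicted"]
summary = grounding_summary(example)
print(summary)  # {'grounded_pct': 50.0, 'contradicted_pct': 25.0}
```

Tracking contradicted claims separately from merely ungrounded ones keeps the two failure modes distinct: a claim with nothing behind it, and a claim that conflicts with the official text.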

The full data, for the purists

Three GPT models, both conditions. "Alone" is the model with no data layer; "with Obsidian" is the same model connected. Accuracy is a 0 to 100 score from a blind judge against human-verified ground truth. "Grounded claims" is the share of the answer's atomic factual claims that trace back to the official source, alone versus with Obsidian.

#	Model	Tier	Accuracy alone	Accuracy with Obsidian	Lift (points)	Cites source	Status correct	Grounded claims (alone → with Obsidian)	Latency	Speed	Price per 1M tokens	Cost per question
1	gpt-5.4-nano	light	12.4	98.6	+86.2	100%	100%	30% → 100%	2.0s	107 tok/s	$0.46	$0.008
2	gpt-5.4-mini	mid	68.8	98.0	+29.2	100%	100%	57% → 100%	1.9s	96 tok/s	$0.70	$0.008
3	gpt-5.5	advanced	19.0	96.0	+77.0	100%	100%	50% → 100%	5.1s	58 tok/s	$11.25	$0.024
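The lift column and the headline averages can be recomputed from the accuracy figures in the table. This is a small sketch; the variable names are illustrative.

```python
# (model, accuracy alone, accuracy with Obsidian), copied from the table.
rows = [
    ("gpt-5.4-nano", 12.4, 98.6),
    ("gpt-5.4-mini", 68.8, 98.0),
    ("gpt-5.5", 19.0, 96.0),
]

# Lift is the difference in accuracy points between the two conditions.
lifts = {model: round(connected - alone, 1) for model, alone, connected in rows}
avg_alone = sum(alone for _, alone, _ in rows) / len(rows)
avg_connected = sum(connected for _, _, connected in rows) / len(rows)

print(lifts)  # {'gpt-5.4-nano': 86.2, 'gpt-5.4-mini': 29.2, 'gpt-5.5': 77.0}
print(f"{avg_alone:.1f} -> {avg_connected:.1f}")  # 33.4 -> 97.5
```

Rounded to whole points, these averages give the 33 and 98 quoted in the headline figures.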

The questions are five current ESG cases, each in a different jurisdiction (Belgium, France, Italy, the Netherlands, the United States), answered by each model alone and with Obsidian, then scored by a blind judge against human-verified ground truth tied to the official source. This first run is a five-question pilot. A full-scale benchmark across more questions and domains is in progress and will be published the same way, with every number reproducible.
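The size of the pilot follows from the design described above. This is a sketch assuming one question per jurisdiction, with each question identified here by its jurisdiction; the list names are illustrative.

```python
from itertools import product

models = ["gpt-5.4-nano", "gpt-5.4-mini", "gpt-5.5"]
jurisdictions = ["Belgium", "France", "Italy", "Netherlands", "United States"]
conditions = ["alone", "with Obsidian"]

# One scored answer per (model, question, condition) combination.
runs = list(product(models, jurisdictions, conditions))
print(len(runs))  # 30
print(runs[0])    # ('gpt-5.4-nano', 'Belgium', 'alone')
```

Each accuracy figure in the table therefore rests on five scored answers per model and condition.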

Make ChatGPT the model in row one

Connect Obsidian to ChatGPT and every regulatory answer comes back with its official source, date, and legal status. Free tier, two-minute setup.

What this means if you run regulatory work through ChatGPT

You do not need a bigger model, and you do not need to settle for guesses. The GPT you already use, given verified regulatory data, answers with the precision of a specialist and the receipts of an auditor, often for less than a cent a question. The background is here too: why AI hallucinates on regulatory questions, what tier-0 regulatory data is, and the idea of agentic regulatory intelligence. To test it on your own questions, connect the Obsidian regulatory data layer. The full cross-provider results are in the regulatory AI benchmark.
