What is the best Claude model for regulatory questions?

In our benchmark, every Claude model improved sharply with the Obsidian data layer. Claude Sonnet 4.6 connected to Obsidian scored 99 out of 100, the highest. Even Claude Haiku, the lightest, reached 90.2, up from 11.4 alone. The deciding factor was the data the model could reach, not the size of the Claude model.

Does Claude hallucinate on regulatory questions?

Answering alone, only 30 percent of a Claude answer's factual claims were grounded in a verifiable source. Connected to the Obsidian data layer, 84 percent were grounded and none contradicted the official text. Every connected answer also carried a direct link to the official source, so each citation can be checked against a real document.

How do I make Claude cite official regulatory sources?

Connect Claude to a verified tier-0 regulatory data layer such as Obsidian through the Model Context Protocol. In our benchmark, every Claude answer then cited the correct official instrument, with its reference and a direct link to the source document. The legal status was correct on every answer for Sonnet 4.6 and Opus 4.8, and on 80 percent of answers for Haiku 4.5.

The Best Regulatory AI Companion for Claude, Benchmarked

Claude is one of the strongest reasoners you can put in front of a problem. Ask it a regulatory question, though, and the fluent, confident answer often does not survive a fact-check: a regulation number that does not exist, a deadline long past, a "requirement" that was never adopted. The natural conclusion is that even a model this capable cannot be trusted on regulation.

It is the wrong conclusion, and the data shows why. Claude's reasoning was never the problem. Its reach was. A general model answers from a frozen snapshot of the open web, with no way to open the actual text of a regulation or to know whether it is in force today. Give Claude that text, and it stops guessing.

That text is what Obsidian supplies: a verified, tier-0 regulatory data layer Claude can query, returning the official wording, the current legal status, and a direct link to the source. We ran all three Claude models, Sonnet 4.6, Opus 4.8, and Haiku 4.5, through real, current regulatory questions, each answered twice. Once alone. Once connected to Obsidian.

47 → 96

Average regulatory accuracy, the Claude models alone vs. connected (out of 100)

100%

Connected Claude answers that came back with a direct link to the official source

30% → 84%

Share of a Claude answer's factual claims grounded in the official source

Claude is inaccurate for regulatory work

Alone, the three Claude models averaged 47 out of 100. Better than most of the field, and still not a number you would file. Connected to Obsidian, the same three averaged 96, and Sonnet 4.6 reached 99. Nothing about the model changed between those two figures. Only the data in front of it did.

Figure: Regulatory accuracy against price for the three Claude models, alone and connected to the Obsidian data layer. Connected to Obsidian (the wider coins), every Claude model converges near the top.

Figure: Regulatory accuracy against average response time for the three Claude models, alone and connected to Obsidian.

Haiku, the lightest and cheapest Claude, jumps from 11 to 90 once connected and lands within eight points of Opus. You do not need the largest Claude to be accurate on regulation. You need to hand it the data.

Claude cannot point you to the official source

Ask Claude alone for the instrument behind a rule and you get a plausible-looking citation that you then have to verify yourself, assuming it exists. Connected to Obsidian, every Claude answer came back with the official source attached, cited correctly on every question across all three models: the instrument, its exact reference, the jurisdiction, and a direct link to the official document, often the source PDF itself. Each answer also stated the legal status, which was correct on every question for Sonnet 4.6 and Opus 4.8 and on 80% of questions for Haiku 4.5.

An answer with the tier-0 source attached is one you can forward to an auditor without re-checking it. That is the difference between a draft Claude imagined and an obligation you can act on.

Claude hallucinates

We broke every answer into its individual factual claims and checked each against the official source. Alone, 30% of Claude's claims held up. Connected to Obsidian, 84%, and not one contradicted the official text. Claude writes rich, elaborated answers and adds context beyond the strict source, which is why that figure is not higher, but the dangerous kind of error, the confident statement with nothing behind it, is gone.
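
To make the claim-level check concrete, here is a minimal Python sketch of how a grounded-claims share could be computed once each claim has been labeled. The `Claim` class, the `grounding_summary` function, and the three verdict labels are assumptions made for illustration, not the benchmark's actual tooling, and the sample claims are placeholders rather than benchmark data.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    # Hypothetical labels: "grounded" (traces to the official source),
    # "unsupported" (extra context the source neither backs nor refutes),
    # "contradicted" (conflicts with the official text).
    verdict: str


def grounding_summary(claims):
    """Return the grounded share and the number of contradicted claims."""
    total = len(claims)
    if total == 0:
        return {"grounded_share": 0.0, "contradicted": 0}
    grounded = sum(1 for c in claims if c.verdict == "grounded")
    contradicted = sum(1 for c in claims if c.verdict == "contradicted")
    return {"grounded_share": grounded / total, "contradicted": contradicted}


# Placeholder claims, invented purely to exercise the function.
sample = [
    Claim("placeholder claim 1", "grounded"),
    Claim("placeholder claim 2", "grounded"),
    Claim("placeholder claim 3", "grounded"),
    Claim("placeholder claim 4", "unsupported"),
]

summary = grounding_summary(sample)
print(f"grounded: {summary['grounded_share']:.0%}, contradicted: {summary['contradicted']}")
```

Keeping "unsupported" separate from "contradicted" is what lets an answer fall short of 100% grounded while still containing no claim that conflicts with the official text.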

The full data, for the purists

Three Claude models, both conditions. "Alone" is the model with no data layer; "with Obsidian" is the same model connected. Accuracy is a 0 to 100 score from a blind judge against human-verified ground truth. "Grounded claims" is the share of the answer's atomic factual claims that trace back to the official source, alone versus with Obsidian.

#	Model	Tier	Acc. alone	Acc. + Obsidian	Lift	Cites source	Status correct	Grounded claims (alone → +Obs)	Latency	Speed	Price /1M	Cost / question
1	sonnet-4.6	mid	63.4	99.0	+35.6	100%	100%	29% → 75%	7.6s	56 tok/s	$6.00	$0.020
2	opus-4.8	advanced	67.4	98.0	+30.6	100%	100%	34% → 85%	6.5s	75 tok/s	$10.00	$0.033
3	haiku-4.5	light	11.4	90.2	+78.8	100%	80%	26% → 90%	4.3s	88 tok/s	$2.00	$0.022
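
The Lift column and the headline averages can be rechecked from the rows above with a few lines of Python. This is a minimal sketch: the dictionary layout and the helper names `lift` and `mean` are illustrative assumptions, and the numbers are copied from the table.

```python
# Accuracy figures copied from the table: (alone, with Obsidian).
scores = {
    "sonnet-4.6": (63.4, 99.0),
    "opus-4.8": (67.4, 98.0),
    "haiku-4.5": (11.4, 90.2),
}


def lift(alone, connected):
    """Point gain from connecting the data layer, rounded to one decimal."""
    return round(connected - alone, 1)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


lifts = {model: lift(alone, connected) for model, (alone, connected) in scores.items()}
avg_alone = mean(alone for alone, _ in scores.values())
avg_connected = mean(connected for _, connected in scores.values())

for model, value in lifts.items():
    print(f"{model}: +{value}")
print(f"average alone: {avg_alone:.1f}")
print(f"average with Obsidian: {avg_connected:.1f}")
```

The two averages come out at 47.4 and 95.7, which round to the 47 and 96 quoted in the headline figures.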

The questions are five current ESG cases, each in a different jurisdiction (Belgium, France, Italy, the Netherlands, the United States). Each model answered every question alone and with Obsidian, and a blind judge scored the answers against human-verified ground truth tied to the official source. This first run is a five-question pilot, so each question moves a model's percentage by 20 points. A full-scale benchmark across more questions and domains is in progress and will be published the same way, with every number reproducible.

Make Claude the model in row one

Connect Obsidian to Claude and every regulatory answer comes back with its official source, date, and legal status. Free tier, two-minute setup.


What this means if you run regulatory work through Claude

You do not need a different model, and you do not need to settle for guesses. The Claude you already use, given verified regulatory data, answers with the precision of a specialist and the receipts of an auditor. The background is here too: why AI hallucinates on regulatory questions, what tier-0 regulatory data is, and the idea of agentic regulatory intelligence. To test it on your own questions, connect the Obsidian regulatory data layer. The full cross-provider results are in the regulatory AI benchmark.
