Claude is one of the strongest reasoners you can put in front of a problem. Ask it a regulatory question, though, and the fluent, confident answer often does not survive a fact-check: a regulation number that does not exist, a deadline long past, a "requirement" that was never adopted. The natural conclusion is that even a model this capable cannot be trusted on regulation.
It is the wrong conclusion, and the data shows why. Claude's reasoning was never the problem. Its reach was. A general model answers from a frozen snapshot of the open web, with no way to open the actual text of a regulation or to know whether it is in force today. Give Claude that text, and it stops guessing.
That text is what Obsidian supplies: a verified, tier-0 regulatory data layer Claude can query, returning the official wording, the current legal status, and a direct link to the source. We ran all three Claude models, Sonnet 4.6, Opus 4.8, and Haiku 4.5, through real, current regulatory questions, each answered twice. Once alone. Once connected to Obsidian.
Claude is inaccurate for regulatory work
Alone, the three Claude models averaged 47 out of 100. Better than most of the field, and still not a number you would file. Connected to Obsidian, the same three averaged 96, and Sonnet 4.6 reached 99. Nothing about the model changed between those two figures. Only the data in front of it did.
Haiku, the lightest and cheapest Claude, jumps from 11 to 90 once connected and lands within eight points of Opus. You do not need the largest Claude to be accurate on regulation. You need to hand it the data.
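The headline averages and the per-model lift follow directly from the published per-model scores. A minimal Python sketch of that arithmetic, using only the scores reported in this article (the names `SCORES` and `summarize` are illustrative and not part of any Obsidian tooling):

```python
from statistics import mean

# Accuracy scores (0 to 100) as published in this article.
# Each entry maps a model to (score alone, score with Obsidian).
SCORES = {
    "sonnet-4.6": (63.4, 99.0),
    "opus-4.8": (67.4, 98.0),
    "haiku-4.5": (11.4, 90.2),
}

def summarize(scores):
    """Return mean accuracy alone, mean accuracy connected, and per-model lift."""
    alone = mean(a for a, _ in scores.values())
    connected = mean(c for _, c in scores.values())
    lift = {model: round(c - a, 1) for model, (a, c) in scores.items()}
    return alone, connected, lift

alone, connected, lift = summarize(SCORES)
print(f"alone: {alone:.1f}, with Obsidian: {connected:.1f}")
print(lift)
```

The means come out at 47.4 and 95.7, which round to the 47 and 96 quoted above, and Haiku's lift of 78.8 points is the largest of the three.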
Claude cannot point you to the official source
Ask Claude alone for the instrument behind a rule and you get a plausible-looking citation that you then have to verify yourself, assuming it exists at all. Connected to Obsidian, every Claude answer came back with the official source attached and cited correctly, on every question and across all three models: the instrument, its exact reference, the jurisdiction, and a direct link to the official document, often the source PDF itself. The legal status came back too, correct on every question for Sonnet and Opus and on 80% of questions for Haiku.
An answer with the tier-0 source attached is one you can forward to an auditor without re-checking it. That is the difference between a draft Claude imagined and an obligation you can act on.
Claude hallucinates
We broke every answer into its individual factual claims and checked each against the official source. Alone, 30% of Claude's claims held up. Connected to Obsidian, 84% did, and not a single claim contradicted the official text. The figure is not higher because Claude writes rich, elaborated answers and adds context beyond the strict source. The dangerous kind of error, the confident statement with nothing behind it, is gone.
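The grounded-claims measure can be pictured as a simple tally over labelled claims. The sketch below assumes each atomic claim has already been extracted and labelled against the official text. The three labels, the `Claim` type, and the sample claims are hypothetical illustrations, not the benchmark's actual pipeline or data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label set: the article does not specify how claims are labelled.
GROUNDED = "grounded"          # traces back to the official source
UNSUPPORTED = "unsupported"    # extra context the source neither confirms nor denies
CONTRADICTED = "contradicted"  # conflicts with the official text

@dataclass(frozen=True)
class Claim:
    text: str
    label: str

def grounding_report(claims):
    """Return (grounded share in percent, number of contradicted claims)."""
    if not claims:
        raise ValueError("an answer must contain at least one claim")
    counts = Counter(c.label for c in claims)
    share = 100.0 * counts[GROUNDED] / len(claims)
    return share, counts[CONTRADICTED]

# Invented example answer, for illustration only.
answer = [
    Claim("The instrument is in force.", GROUNDED),
    Claim("It applies to large undertakings.", GROUNDED),
    Claim("Reporting is annual.", GROUNDED),
    Claim("Many firms find compliance costly.", UNSUPPORTED),
]
share, contradictions = grounding_report(answer)
print(f"{share:.0f}% grounded, {contradictions} contradicted")
```

Keeping unsupported claims separate from contradicted ones shows how an answer can fall short of 100% grounded while containing no contradiction at all.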
The full data, for the purists
Three Claude models, both conditions. "Alone" is the model with no data layer; "with Obsidian" is the same model connected. Accuracy is a 0 to 100 score from a blind judge against human-verified ground truth. "Grounded claims" is the share of the answer's atomic factual claims that trace back to the official source, alone versus with Obsidian.
| # | Model | Tier | Accuracy alone | Accuracy + Obsidian | Lift | Cites source | Status correct | Grounded claims (alone → + Obsidian) | Latency | Speed | Price / 1M tokens | Cost / question |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | sonnet-4.6 | mid | 63.4 | 99.0 | +35.6 | 100% | 100% | 29% → 75% | 7.6s | 56 tok/s | $6.00 | $0.020 |
| 2 | opus-4.8 | advanced | 67.4 | 98.0 | +30.6 | 100% | 100% | 34% → 85% | 6.5s | 75 tok/s | $10.00 | $0.033 |
| 3 | haiku-4.5 | light | 11.4 | 90.2 | +78.8 | 100% | 80% | 26% → 90% | 4.3s | 88 tok/s | $2.00 | $0.022 |
The questions are five current ESG cases, each in a different jurisdiction (Belgium, France, Italy, the Netherlands, the United States). Each model answered every question alone and with Obsidian, and a blind judge scored each answer against human-verified ground truth tied to the official source. This first run is a five-question pilot. A full-scale benchmark across more questions and domains is in progress and will be published the same way, with every number reproducible.
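The protocol is a paired design: every question is answered under both conditions and scored the same way. A rough sketch of such a harness follows. `ask`, `judge`, and the `Question` fields are placeholders standing in for whatever model client and judging procedure are actually used, which this article does not detail.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass(frozen=True)
class Question:
    jurisdiction: str
    prompt: str
    ground_truth: str  # human-verified answer tied to the official source

def run_paired(
    questions: list[Question],
    ask: Callable[[str, bool], str],     # (prompt, connected) -> answer text
    judge: Callable[[str, str], float],  # (answer, ground_truth) -> score 0..100
) -> dict[str, float]:
    """Score every question in both conditions and return the mean per condition."""
    scores: dict[str, list[float]] = {"alone": [], "connected": []}
    for q in questions:
        for condition, connected in (("alone", False), ("connected", True)):
            answer = ask(q.prompt, connected)
            # The judge receives only the answer and the ground truth,
            # never the condition, which keeps the scoring blind.
            scores[condition].append(judge(answer, q.ground_truth))
    return {condition: mean(vals) for condition, vals in scores.items()}
```

Passing `ask` and `judge` in as callables keeps the harness independent of any particular model client, so the same loop can be rerun unchanged as more questions and domains are added.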
Make Claude the model in row one
Connect Obsidian to Claude and every regulatory answer comes back with its official source, date, and legal status. Free tier, two-minute setup.
Explore the Obsidian data layer
What this means if you run regulatory work through Claude
You do not need a different model, and you do not need to settle for guesses. The Claude you already use, given verified regulatory data, answers with the precision of a specialist and the receipts of an auditor. The background is here too: why AI hallucinates on regulatory questions, what tier-0 regulatory data is, and the idea of agentic regulatory intelligence. To test it on your own questions, connect the Obsidian regulatory data layer. The full cross-provider results are in the regulatory AI benchmark.