If you track CSRD, the ESRS, or the EU Taxonomy through an AI assistant, you have probably been burned. Ask which instrument transposed CSRD in a given country and the answer comes back fluent and wrong: an ordinance that does not exist, a deadline that predates the stop-the-clock directive, a rule quoted as binding when it is still a draft. The conclusion feels obvious: AI cannot be trusted on sustainability regulation.
It is the wrong conclusion. The models you already use, Claude and ChatGPT, reason about ESG rules well. What they lack is reach. A general model answers from a frozen snapshot of the open web, with no way to open the actual transposition text or to know whether a measure is in force today. Give it that text, and it stops guessing.
That text is what Obsidian supplies: a verified, tier-0 regulatory data layer built for AI, deep and current on ESG, returning the official wording, the legal status, and a direct link to the source. We ran six Claude and ChatGPT models on five live ESG questions: the CSRD transposition in Belgium, France, Italy, and the Netherlands, and the SEC climate rule in the United States. Each model answered every question twice, once alone and once connected to Obsidian.
AI is inaccurate on ESG regulation
Alone, the six models averaged 40 out of 100 on the ESG questions. That is the experience behind the complaint, and on those terms the complaint is fair. Connected to Obsidian, the same six averaged 97, and Claude Sonnet 4.6 reached 99. Nothing about the models changed between those two numbers. Only the data in front of them did.
The lift does not track price. GPT-5.4-nano, the cheapest model in the test at $0.46 per million tokens, scores 12 alone and 98.6 connected, ahead of models costing more than twenty times as much. On these CSRD and SEC questions, the data layer, not the size of the model, decided accuracy.
AI cannot point you to the official ESG source
Ask a model alone which instrument transposed CSRD and you get a plausible-looking citation that you then have to verify yourself, assuming the instrument exists at all. Connected to Obsidian, every answer came back with the official source attached: the Belgian Law of 2 December 2024, the French Ordinance 2023-1142, the Italian conversion of the stop-the-clock decree, the Dutch implementing bill 36678, the SEC climate rule (Release 33-11275), each with its exact reference, its current legal status, and a direct link to the official document, often the source PDF itself.
An answer with the tier-0 source attached is one you can hand to your auditor or your assurance provider without re-checking it. That is the difference between a draft a model imagined and an obligation you can act on.
AI hallucinates
We broke every answer into its individual factual claims and checked each against the official source. Alone, 35% of a model's claims held up; the rest were unsupported, and some directly contradicted the official text. Connected to Obsidian, 90% of claims traced back to the source, and not one contradicted it. In a field where a wrong deadline or a draft mistaken for law has real cost, that is the number that matters.
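The claim-level check described above reduces to a simple tally. What follows is a minimal Python sketch, assuming each atomic claim has already been labelled supported, unsupported, or contradicted against the official source; the `Verdict` labels, the `Claim` type, and `grounding_summary` are hypothetical names for illustration, not the benchmark's actual tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical labels for one atomic claim checked against the official source
    SUPPORTED = "supported"        # traceable to the official text
    UNSUPPORTED = "unsupported"    # not found in the official text
    CONTRADICTED = "contradicted"  # conflicts with the official text


@dataclass
class Claim:
    text: str
    verdict: Verdict


def grounding_summary(claims: list[Claim]) -> dict[str, float]:
    """Return the share of claims grounded in the source and the share contradicting it."""
    if not claims:
        return {"grounded": 0.0, "contradicted": 0.0}
    total = len(claims)
    grounded = sum(c.verdict is Verdict.SUPPORTED for c in claims)
    contradicted = sum(c.verdict is Verdict.CONTRADICTED for c in claims)
    return {"grounded": grounded / total, "contradicted": contradicted / total}


# Placeholder claims for illustration only; not taken from the benchmark
answer = [
    Claim("claim 1", Verdict.SUPPORTED),
    Claim("claim 2", Verdict.SUPPORTED),
    Claim("claim 3", Verdict.UNSUPPORTED),
    Claim("claim 4", Verdict.CONTRADICTED),
]
summary = grounding_summary(answer)
print(summary)  # {'grounded': 0.5, 'contradicted': 0.25}
```

Counting contradictions separately from unsupported claims is deliberate in this sketch: an unsupported claim may still turn out to be true, while a contradicted one is wrong by the official text itself.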
The full data, for the purists
Six Claude and ChatGPT models, both conditions, on the five ESG questions. Accuracy is a 0 to 100 score from a blind judge against human-verified ground truth. "Grounded claims" is the share of the answer's atomic factual claims that trace back to the official source, alone versus with Obsidian.
| # | Model | Provider | Tier | Acc. alone | Acc. + Obsidian | Lift | Cites source | Status correct | Grounded claims (alone → +Obs) | Latency | Price /1M tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | sonnet-4.6 | Anthropic | mid | 63.4 | 99.0 | +35.6 | 100% | 100% | 29% → 75% | 7.6s | $6.00 |
| 2 | gpt-5.4-nano | OpenAI | light | 12.4 | 98.6 | +86.2 | 100% | 100% | 30% → 100% | 2.0s | $0.46 |
| 3 | gpt-5.4-mini | OpenAI | mid | 68.8 | 98.0 | +29.2 | 100% | 100% | 57% → 100% | 1.9s | $0.70 |
| 4 | opus-4.8 | Anthropic | advanced | 67.4 | 98.0 | +30.6 | 100% | 100% | 34% → 85% | 6.5s | $10.00 |
| 5 | gpt-5.5 | OpenAI | advanced | 19.0 | 96.0 | +77.0 | 100% | 100% | 50% → 100% | 5.1s | $11.25 |
| 6 | haiku-4.5 | Anthropic | light | 11.4 | 90.2 | +78.8 | 100% | 80% | 26% → 90% | 4.3s | $2.00 |
The five questions cover CSRD transposition in Belgium, France, Italy, and the Netherlands, and the SEC climate disclosure rule in the United States, each tied to its official source. This first run is a five-question pilot; a full-scale ESG benchmark across more questions is in progress, published the same way, every number reproducible.
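For readers who want to check the arithmetic, this short Python sketch recomputes the headline averages, the lift column, and the price multiples from the figures in the table. It assumes the averages quoted earlier are unweighted means across the six models, and it treats the listed prices simply as per-1M-token figures; the variable names are illustrative.

```python
# Figures copied from the results table:
# (model, accuracy alone, accuracy with Obsidian, price per 1M tokens in USD)
RESULTS = [
    ("sonnet-4.6", 63.4, 99.0, 6.00),
    ("gpt-5.4-nano", 12.4, 98.6, 0.46),
    ("gpt-5.4-mini", 68.8, 98.0, 0.70),
    ("opus-4.8", 67.4, 98.0, 10.00),
    ("gpt-5.5", 19.0, 96.0, 11.25),
    ("haiku-4.5", 11.4, 90.2, 2.00),
]


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Unweighted mean accuracy across the six models, per condition
avg_alone = mean([r[1] for r in RESULTS])
avg_connected = mean([r[2] for r in RESULTS])

# Lift = accuracy with Obsidian minus accuracy alone
lifts = {name: round(connected - alone, 1) for name, alone, connected, _ in RESULTS}

# For each model the cheapest model outscores when connected, how many times its price
cheapest = min(RESULTS, key=lambda r: r[3])
outscored = {
    name: round(price / cheapest[3], 1)
    for name, _, connected, price in RESULTS
    if connected < cheapest[2]
}

print(f"average alone: {avg_alone:.1f}")              # average alone: 40.4
print(f"average with Obsidian: {avg_connected:.1f}")  # average with Obsidian: 96.6
print(lifts)
print(outscored)
```

The unweighted means round to the 40 and 97 quoted in the text, and the two advanced models that GPT-5.4-nano outscores come out at roughly 22 and 24 times its price.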
Put the official CSRD source behind every answer
Connect Obsidian to Claude or ChatGPT and every ESG answer comes back with its official instrument, date, and legal status. Free tier, two-minute setup.
What this means for ESG and sustainability reporting teams
You do not need a more expensive model, and you do not need to accept guesses on CSRD or the ESRS. The assistant you already use, given verified ESG data, answers with the precision of a specialist and the official instrument attached. For background, see ESG and CSRD regulatory intelligence, the CSRD and ESRS Omnibus changes, and what tier-0 regulatory data is. To test it on your own questions, connect the Obsidian regulatory data layer. The full cross-provider results are in the regulatory AI benchmark.