If you track CSRD, the ESRS, or the EU Taxonomy through an AI assistant, you have probably been burned. Ask which instrument transposed CSRD in a given country and the answer comes back fluent and wrong: an ordinance that does not exist, a deadline from before the stop-the-clock, a rule quoted as binding when it is still a draft. The conclusion feels obvious: AI cannot be trusted on sustainability regulation.

It is the wrong conclusion. The models you already use, Claude and ChatGPT, reason about ESG rules perfectly well. What fails them is reach. A general model answers from a frozen snapshot of the open web, with no way to open the actual transposition text or to know whether a measure is in force today. Give it that text, and it stops guessing.

That text is what Obsidian supplies: a verified, tier-0 regulatory data layer built for AI, deep and current on ESG, returning the official wording, the legal status, and a direct link to the source. We ran six Claude and ChatGPT models on five live ESG questions: the CSRD transposition in Belgium, France, Italy, and the Netherlands, and the SEC climate rule in the United States. Each question was answered twice. Once alone. Once connected to Obsidian.

40 → 97: average accuracy on the ESG questions, the same models alone vs. connected (out of 100)
100%: connected answers that came back with a direct link to the official ESG source
35% → 90%: share of an answer's factual claims grounded in the official source

AI is inaccurate on ESG regulation

Alone, the six models averaged 40 out of 100 on the ESG questions. That is the experience behind the complaint, and on those terms the complaint is fair. Connected to Obsidian, the same six averaged 97, and Claude Sonnet 4.6 reached 99. Nothing about the models changed between those two numbers. Only the data in front of them did.

Figure: Accuracy on the ESG questions against price for six Claude and ChatGPT models, alone and connected to the Obsidian data layer. Connected to Obsidian (the wider coins), every model converges near the top.
Figure: The same accuracy against average response time, alone and connected to Obsidian.

The lift does not track price. GPT-5.4-nano, the cheapest model in the test at $0.46 per million tokens, scores 12.4 alone and 98.6 connected, ahead of opus-4.8 and gpt-5.5, which each cost more than twenty times as much. On these CSRD and SEC questions, the data layer decides accuracy, not the size of the model.
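The price comparison can be checked with a few lines of Python. This is a minimal sketch using the prices and connected scores from the results table further down; the variable names and the 20x threshold are illustrative choices, not part of the benchmark tooling.

```python
# (model, price per 1M tokens in USD, accuracy when connected), copied from the results table.
MODELS = [
    ("sonnet-4.6", 6.00, 99.0),
    ("gpt-5.4-nano", 0.46, 98.6),
    ("gpt-5.4-mini", 0.70, 98.0),
    ("opus-4.8", 10.00, 98.0),
    ("gpt-5.5", 11.25, 96.0),
    ("haiku-4.5", 2.00, 90.2),
]

BASELINE = "gpt-5.4-nano"
base_price, base_score = next((p, s) for m, p, s in MODELS if m == BASELINE)

# Price of every other model expressed as a multiple of the cheapest one.
price_multiples = {
    m: round(p / base_price, 1) for m, p, _ in MODELS if m != BASELINE
}

# Models at 20x the baseline price or more that still scored lower when connected.
pricier_but_lower = [
    m for m, p, s in MODELS if p / base_price >= 20 and s < base_score
]

for model, multiple in price_multiples.items():
    print(f"{model}: {multiple}x the price of {BASELINE}")
print("20x+ price, lower connected score:", pricier_but_lower)
```

Under these figures, only opus-4.8 and gpt-5.5 clear the 20x mark, and both score below gpt-5.4-nano when connected.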

AI cannot point you to the official ESG source

Ask a model alone which instrument transposed CSRD and you get a plausible-looking citation you then have to go and verify yourself, assuming it exists. Connected to Obsidian, every answer came back with the official source attached: the Belgian Law of 2 December 2024, the French Ordinance 2023-1142, the Italian conversion of the stop-the-clock decree, the Dutch implementing bill 36678, the SEC climate rule (Release 33-11275), each with its exact reference, its current legal status, and a direct link to the official document, often the source PDF itself.

An answer with the tier-0 source attached is one you can hand to your auditor or your assurance provider without re-checking it. That is the difference between a draft a model imagined and an obligation you can act on.

AI hallucinates

We broke every answer into its individual factual claims and checked each against the official source. Alone, 35% of a model's claims held up; the rest were unsupported, and some directly contradicted the official text. Connected to Obsidian, 90% of claims traced back to the source, and not one contradicted it. In a field where a wrong deadline or a draft mistaken for law has real cost, that is the number that matters.
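The claim-level check described above reduces to a simple share calculation once each claim has a verdict. The sketch below assumes three verdict labels (grounded, unsupported, contradicted) taken from the wording above. The function name and the example verdicts are hypothetical and are not data from the benchmark.

```python
from collections import Counter

# Assumed verdict labels for an answer's atomic claims (illustrative, not the benchmark's schema).
LABELS = ("grounded", "unsupported", "contradicted")


def claim_shares(verdicts):
    """Return the share of claims carrying each verdict label, as fractions of 1."""
    if not verdicts:
        raise ValueError("an answer needs at least one claim to score")
    unknown = set(verdicts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    counts = Counter(verdicts)
    total = len(verdicts)
    return {label: counts[label] / total for label in LABELS}


# Made-up example: one answer split into ten claims.
example = ["grounded"] * 7 + ["unsupported"] * 2 + ["contradicted"]
shares = claim_shares(example)
for label in LABELS:
    print(f"{label}: {shares[label]:.0%}")
```

Keeping "unsupported" and "contradicted" as separate labels matters here: a claim the source does not back is a different failure from one the source refutes.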

The full data, for the purists

Six Claude and ChatGPT models, both conditions, on the five ESG questions. Accuracy is a 0 to 100 score from a blind judge against human-verified ground truth. "Grounded claims" is the share of the answer's atomic factual claims that trace back to the official source, alone versus with Obsidian.

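| # | Model | Provider | Tier | Acc. alone | Acc. + Obsidian | Lift | Cites source | Status correct | Grounded claims (alone → +Obs) | Latency | Price / 1M tokens |
|---|-------|----------|------|------------|-----------------|------|--------------|----------------|--------------------------------|---------|-------------------|
| 1 | sonnet-4.6 | Anthropic | mid | 63.4 | 99.0 | +35.6 | 100% | 100% | 29% → 75% | 7.6s | $6.00 |
| 2 | gpt-5.4-nano | OpenAI | light | 12.4 | 98.6 | +86.2 | 100% | 100% | 30% → 100% | 2.0s | $0.46 |
| 3 | gpt-5.4-mini | OpenAI | mid | 68.8 | 98.0 | +29.2 | 100% | 100% | 57% → 100% | 1.9s | $0.70 |
| 4 | opus-4.8 | Anthropic | advanced | 67.4 | 98.0 | +30.6 | 100% | 100% | 34% → 85% | 6.5s | $10.00 |
| 5 | gpt-5.5 | OpenAI | advanced | 19.0 | 96.0 | +77.0 | 100% | 100% | 50% → 100% | 5.1s | $11.25 |
| 6 | haiku-4.5 | Anthropic | light | 11.4 | 90.2 | +78.8 | 100% | 80% | 26% → 90% | 4.3s | $2.00 |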
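The headline accuracy figures can be recomputed from the rows above. The sketch below assumes the headline averages are unweighted means across the six models, which the article does not state explicitly; the variable names are illustrative.

```python
# (model, accuracy alone, accuracy with Obsidian), copied from the results table.
RESULTS = [
    ("sonnet-4.6", 63.4, 99.0),
    ("gpt-5.4-nano", 12.4, 98.6),
    ("gpt-5.4-mini", 68.8, 98.0),
    ("opus-4.8", 67.4, 98.0),
    ("gpt-5.5", 19.0, 96.0),
    ("haiku-4.5", 11.4, 90.2),
]


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation, see the note above)."""
    values = list(values)
    return sum(values) / len(values)


avg_alone = mean(alone for _, alone, _ in RESULTS)
avg_connected = mean(connected for _, _, connected in RESULTS)

# Lift is the connected score minus the standalone score, rounded to one decimal.
lifts = {model: round(connected - alone, 1) for model, alone, connected in RESULTS}

print(f"average alone: {avg_alone:.1f}")
print(f"average connected: {avg_connected:.1f}")
for model, lift in lifts.items():
    print(f"{model}: +{lift}")
```

Under that assumption the means come out at 40.4 and 96.6, which round to the 40 and 97 quoted at the top, and each computed lift matches the Lift column.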

The five questions cover CSRD transposition in Belgium, France, Italy, and the Netherlands, and the SEC climate disclosure rule in the United States, each tied to its official source. This first run is a five-question pilot. A full-scale ESG benchmark across more questions is in progress and will be published the same way, with every number reproducible.

Put the official CSRD source behind every answer

Connect Obsidian to Claude or ChatGPT and every ESG answer comes back with its official instrument, date, and legal status. Free tier, two-minute setup.

What this means for ESG and sustainability reporting teams

You do not need a more expensive model, and you do not need to accept guesses on CSRD or the ESRS. The assistant you already use, given verified ESG data, answers with the precision of a specialist and the official instrument attached. The background is here too: ESG and CSRD regulatory intelligence, the CSRD and ESRS Omnibus changes, and what tier-0 regulatory data is. To test it on your own questions, connect the Obsidian regulatory data layer. The full cross-provider results are in the regulatory AI benchmark.