How do I get AI to cite the exact chemicals instrument or amendment?

Connect your assistant to the Obsidian tier-0 chemicals data layer over MCP. It returns the convention or regulation, the exact COP decision or REACH Annex XVII entry, the current revision and a direct link to the official text. Free tier, about two minutes to set up.

Which AI model is best for chemicals regulation questions?

The model matters less than the data it can reach. On the chemicals set, the cheapest model, once connected, outscored frontier models answering alone, so connect the assistant you already use and pick the tier on cost.

How current is the chemicals data, for things like UN GHS revisions and REACH restrictions?

The data layer is maintained from the primary sources (the convention secretariats, the UN and ECHA), and each listing and revision is tracked. A connected model therefore cites the current revision and entry rather than a superseded one it may have learned in training.

Regulatory AI for Chemicals Compliance: Claude and ChatGPT, Benchmarked

If you handle chemicals compliance, the hard part is that nothing stands still. Substances move onto restriction and candidate lists, hazard classifications get revised, the global position on PFAS keeps shifting, and the international conventions add entries at every meeting. Ask an AI whether a substance is restricted under REACH, what its current GHS classification is, or whether a persistent pollutant has been listed, and the answer comes back confident but often a revision or two out of date, and sometimes it cites a rule that does not exist.

The models reason about chemicals rules perfectly well. What fails them is reach: a general model cannot open the current restriction list or the latest convention text, and has no way to know which revision is in force. Give it that text, and it stops guessing.

That text is what Obsidian supplies, with deep coverage of the global chemicals regimes. We put the models through hundreds of complex chemicals tasks across REACH, CLP, the UN GHS and the Stockholm, Basel, Rotterdam and Minamata Conventions. Each task was answered twice: by the model alone, and by the same model connected to Obsidian.

53 → 95

Average regulatory accuracy, the same models alone vs. connected (out of 100)

24% → 91%

Share of an answer's factual claims grounded in the official source

96%

Connected answers that cited the correct official source

AI is inaccurate on chemicals regulation

Alone, the models averaged 53 out of 100. Connect them to Obsidian and the average climbs to 95. The top-ranked pairing, gpt-5.4-mini, reached 95.8, a score grok-4.3 also reached. The models did not change between those two numbers. Only the data in front of them did.

Figure: Regulatory accuracy versus price per 1M tokens. Connected to Obsidian (the wider coins), every model converges near the top.

Figure: Regulatory accuracy versus average response time in seconds.

Chemicals work punishes stale knowledge harder than almost any field: a restriction status, a hazard classification or a listing that changed last quarter, and an answer built on last year's revision is simply wrong. That is where the data layer earns its place. gemini-3.1-flash-lite, at $0.175 per million tokens, climbs from 56 to 95 once connected, into the band of models many times its price. A light-tier model connected to Obsidian beat a frontier model answering alone in 16 of 16 head-to-head pairings on the chemicals set.
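The 16 of 16 figure can be reproduced from the results table further down, under one assumption that the article does not state outright: that "frontier" means the four models labelled advanced in the Tier column, and that a pairing compares mean scores (4 light models times 4 advanced models). A minimal sketch of that check, using only scores copied from the table:

```python
from itertools import product

# Accuracy scores copied from the results table in this article.
# Light-tier models, connected to Obsidian.
light_connected = {
    "gpt-5.4-nano": 95.5,
    "gemini-3.1-flash-lite": 95.4,
    "grok-3-mini": 94.8,
    "haiku-4.5": 93.6,
}
# Advanced-tier models answering alone (assumed here to be the "frontier" models).
advanced_alone = {
    "opus-4.8": 58.4,
    "gpt-5.5": 40.7,
    "grok-4.20-reasoning": 56.4,
    "gemini-3.1-pro": 61.1,
}

# Every light-connected model against every advanced-alone model.
pairings = list(product(light_connected.items(), advanced_alone.items()))
wins = sum(1 for (_, light), (_, adv) in pairings if light > adv)
smallest_margin = min(light - adv for (_, light), (_, adv) in pairings)

print(f"{wins} of {len(pairings)} pairings won by the connected light-tier model")
print(f"smallest margin: {smallest_margin:.1f} points")
```

Under that reading, the closest pairing is haiku-4.5 connected against gemini-3.1-pro alone, and the margin is still more than 30 points.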

AI cannot point you to the official chemicals source

For a product-stewardship or regulatory-affairs team the citation is the deliverable. Connected to Obsidian, an answer arrives with the regime, the current restriction or listing, the revision in force and a direct link to the official text. From a model answering alone, you get a plausible reference that you have to verify yourself, on questions where the status and the revision are the entire answer and a wrong call can hold up a product.

An answer with the tier-0 source attached is one you can forward to an auditor without re-checking it. That is the difference between a draft a model imagined and an obligation you can act on.

AI hallucinates

We broke every answer into its individual factual claims and checked each against the official source. The gap between the two grounded-claim figures above, 24% alone and 91% connected, is, for a substance restriction, a classification or a listing, the difference between an answer you can act on and one you re-check line by line. The ungrounded remainder of a connected answer is added context, not invented references.
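To make the grounded-claim metric concrete, here is a minimal sketch of how such a share could be computed once an answer has been split into claims and each claim has been checked. The `Claim` class, the `grounded_share` function and the example claims are hypothetical illustrations, not the benchmark's actual tooling or data.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One atomic factual claim extracted from an answer (hypothetical structure)."""
    text: str
    grounded: bool  # True if the claim was traced back to the official source


def grounded_share(claims: list[Claim]) -> float:
    """Return the percentage of claims marked as grounded (0.0 if there are none)."""
    if not claims:
        return 0.0
    return 100.0 * sum(claim.grounded for claim in claims) / len(claims)


# Made-up example answer with placeholder claims, not benchmark data.
answer = [
    Claim("Substance X is covered by a specific restriction entry", True),
    Claim("The entry was last amended in revision N", True),
    Claim("The restriction applies above a stated concentration limit", True),
    Claim("Industry generally expects further tightening", False),
]

print(f"{grounded_share(answer):.0f}% of claims grounded")  # prints "75% of claims grounded"
```

The last claim in the example is the kind of added context that counts as ungrounded without being an invented reference.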

The full data, for the purists

Every model, both conditions. "Alone" is the model with no data layer; "with Obsidian" is the same model connected. Accuracy is a 0 to 100 score from a blind judge against human-verified ground truth. "Grounded claims" is the share of the answer's atomic factual claims that trace back to the official source, alone versus with Obsidian.

#	Model	Provider	Tier	Accuracy alone	Accuracy + Obsidian	Lift	Cites source	Status correct	Grounded claims (alone → + Obsidian)	Latency	Speed	Price / 1M tokens	Cost / question
1	gpt-5.4-mini	OpenAI	mid	63.0	95.8	+32.8	96%	100%	35% → 97%	1.14s	83 tok/s	$0.7	$0.000486
2	grok-4.3	xAI	mid	53.1	95.8	+42.7	97%	100%	29% → 94%	3.22s	120 tok/s	$1.562	$0.002179
3	gpt-5.4-nano	OpenAI	light	38.3	95.5	+57.2	95%	100%	22% → 96%	1.21s	84 tok/s	$0.463	$0.000302
4	opus-4.8	Anthropic	advanced	58.4	95.5	+37.1	97%	100%	20% → 85%	3.7s	71 tok/s	$10.0	$0.013676
5	gemini-3.1-flash-lite	Google	light	56.3	95.4	+39.1	93%	100%	22% → 98%	0.74s	118 tok/s	$0.175	$0.000125
6	gpt-5.5	OpenAI	advanced	40.7	95.4	+54.7	95%	100%	46% → 96%	4.76s	33 tok/s	$11.25	$0.009351
7	grok-4.20-reasoning	xAI	advanced	56.4	95.0	+38.6	96%	100%	24% → 92%	2.49s	225 tok/s	$6.0	$0.012179
8	sonnet-4.6	Anthropic	mid	59.1	95.0	+35.9	96%	100%	21% → 83%	6.2s	50 tok/s	$6.0	$0.007406
9	grok-3-mini	xAI	light	49.7	94.8	+45.1	95%	98%	32% → 91%	3.19s	118 tok/s	$0.35	$0.000479
10	gemini-3.5-flash	Google	mid	60.0	94.4	+34.4	99%	100%	22% → 94%	2.9s	178 tok/s	$3.375	$0.006279
11	gemini-3.1-pro	Google	advanced	61.1	93.9	+32.8	95%	100%	23% → 96%	5.79s	111 tok/s	$6.0	$0.013897
12	haiku-4.5	Anthropic	light	41.5	93.6	+52.1	95%	100%	18% → 88%	1.97s	87 tok/s	$2.0	$0.001841

On a domain that punishes stale knowledge, the connected accuracy and the grounded-claim jump are the tests that matter, and the data layer clears both.
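The headline averages and the Lift column follow directly from the table. The sketch below recomputes them from the two accuracy columns, assuming the headline figures are simple unweighted means across the twelve models (the article does not say how they were averaged).

```python
from statistics import mean

# (model, accuracy alone, accuracy with Obsidian), copied from the table above.
rows = [
    ("gpt-5.4-mini", 63.0, 95.8),
    ("grok-4.3", 53.1, 95.8),
    ("gpt-5.4-nano", 38.3, 95.5),
    ("opus-4.8", 58.4, 95.5),
    ("gemini-3.1-flash-lite", 56.3, 95.4),
    ("gpt-5.5", 40.7, 95.4),
    ("grok-4.20-reasoning", 56.4, 95.0),
    ("sonnet-4.6", 59.1, 95.0),
    ("grok-3-mini", 49.7, 94.8),
    ("gemini-3.5-flash", 60.0, 94.4),
    ("gemini-3.1-pro", 61.1, 93.9),
    ("haiku-4.5", 41.5, 93.6),
]

avg_alone = mean(alone for _, alone, _ in rows)
avg_connected = mean(connected for _, _, connected in rows)

# Lift is the connected score minus the alone score, rounded to one decimal.
lifts = {model: round(connected - alone, 1) for model, alone, connected in rows}
biggest = max(lifts, key=lifts.get)

print(round(avg_alone), round(avg_connected))  # prints "53 95"
print(biggest, lifts[biggest])                 # prints "gpt-5.4-nano 57.2"
```

The unweighted means round to the 53 and 95 quoted at the top of the article, and the largest lift belongs to the model with the lowest score alone.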

How we measured it

The full model set from Anthropic, OpenAI, Google and xAI.
Hundreds of complex chemicals tasks across REACH, CLP, the UN GHS, the Stockholm, Basel, Rotterdam and Minamata Conventions and the Global Framework on Chemicals, each tied to its official source and current revision.
Two conditions: the model alone, and connected to Obsidian.
A blind judge scores each answer; grounded claims come from a separate per-claim check against the official source.

Put the official chemicals source behind every answer

Connect Obsidian to the AI you already use and every REACH, GHS or convention answer comes back with the regime, the current restriction and the revision in force. Free tier, two-minute setup.

Explore the Obsidian data layer

What this means

For chemicals and advanced-materials teams tracking restrictions, listings and classifications across jurisdictions, the assistant you already use, given verified data, answers with the official source attached, so a regulatory specialist can act on it rather than re-checking it. The background is here too: tier-0 regulatory data and agentic regulatory intelligence. The full cross-industry results are in the regulatory AI benchmark. To test it on your own questions, connect the Obsidian regulatory data layer.
