Offchain Labs' new sampling-based protocol cuts AI inference verification from 15 minutes to milliseconds, solving a trust gap that has made it impossible for users to confirm what model actually runs on a GPU.

Arbitrum developer Offchain Labs published a research paper on June 3 proposing a sampling-based method that reduces AI inference proof generation from roughly 15 minutes to milliseconds.
"The per-token pricing model creates a concrete economic incentive for fraud — serving a 7-billion-parameter model costs less than a 70-billion-parameter one, and running quantized inference is cheaper than full precision," the paper's authors at Offchain Labs wrote in the March 2026 study titled Towards Verifiable AI with Lightweight Cryptographic Proofs of Inference.
Existing cryptographic proofs, of the kind used by zero-knowledge rollups, can demonstrate correct computation without re-execution. However, schemes such as zkLLM take roughly 15 minutes to generate a single inference proof for a 13-billion-parameter model, which is far too slow for APIs that must respond in under a second. The Offchain Labs proposal does not prove every operation. It relies on random sampling instead. The server commits to a digital fingerprint of the model weights and internal values. The client then selects a random path toward the output and asks the server to reveal only the values along that path. If the server substituted a smaller model, the revealed values will be inconsistent and verification will fail. The probability of detection accumulates across repeated queries.
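A minimal Python sketch can make the commit-then-reveal step concrete. It uses a standard SHA-256 Merkle tree. That choice is an illustrative assumption, since the article does not describe the paper's exact commitment scheme. The helper names (`build_levels`, `open_leaf`, `verify_leaf`) are hypothetical. The sketch shows only the binding property: once the server publishes the root, it cannot reveal a different value at a position the client picks at random. It does not model the paper's path selection or the checks that tie revealed values back to the model weights.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """All Merkle tree levels, from hashed leaves (first) up to the root (last)."""
    level = [h(b"\x00" + leaf) for leaf in leaves]  # domain-separated leaf hashes
    levels = [level]
    while len(level) > 1:
        # On odd-sized levels, pair the last node with itself.
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [h(b"\x01" + padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def open_leaf(levels, index):
    """Authentication path for one leaf: (sibling hash, node-is-right-child) pairs."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        path.append((sibling, index % 2 == 1))
        index //= 2
    return path


def verify_leaf(root, leaf, path):
    """Client-side check that `leaf` sits under the committed `root`."""
    node = h(b"\x00" + leaf)
    for sibling, is_right in path:
        node = h(b"\x01" + sibling + node) if is_right else h(b"\x01" + node + sibling)
    return node == root


# Server: commit to (hypothetical, serialized) internal values of the claimed model.
claimed_values = [f"act-{i}:{i * 0.5}".encode() for i in range(10)]
levels = build_levels(claimed_values)
root = levels[-1][0]  # published before any challenge is issued

# Client: choose a random position only after the commitment exists.
i = secrets.randbelow(len(claimed_values))

# An honest server opens the committed value together with its path.
assert verify_leaf(root, claimed_values[i], open_leaf(levels, i))

# A server that actually ran a different model cannot open a different value.
assert not verify_leaf(root, b"act-from-smaller-model", open_leaf(levels, i))
```

The random index is drawn with `secrets` only after the root exists. A server that could predict the challenge in advance could commit honestly to just the positions it expects to be asked about.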
The protocol applies the dispute-resolution logic that protects Arbitrum One to neural network inference. In an optimistic rollup, only the disputed step is re-executed, not the whole computation. The protocol uses a bisection procedure to narrow a disagreement between two servers down to a single step. The number of rounds this takes grows logarithmically with the length of the computation. For the emerging market of autonomous agents, and for regulated industries that require model governance, the difference between a transparency claim and a verifiable claim is beginning to carry direct consequences.
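The bisection game can be sketched in a few lines of Python. All names are hypothetical. The sketch makes two assumptions: each server has produced a full trace of integer states for a toy step function, and exactly one server is honest. In a live protocol, each midpoint would be a separate round of commitments rather than a local list lookup, and the paper applies the idea to neural network values rather than integers. The invariant is that the parties agree at `lo` and disagree at `hi`, and each round halves the gap between them.

```python
from typing import Callable, List, Optional, Tuple


def run(step: Callable[[int], int], x0: int, n: int, corrupt_at: Optional[int] = None) -> List[int]:
    """Hypothetical trace: the state after each of n steps, optionally tampering with one."""
    trace = [x0]
    for k in range(1, n + 1):
        x = step(trace[-1])
        if k == corrupt_at:
            x += 1  # a cheating server alters one intermediate value
        trace.append(x)
    return trace


def bisect_dispute(trace_a: List[int], trace_b: List[int]) -> Tuple[int, int]:
    """Return (i, rounds) such that the parties agree on state i but not on state i + 1."""
    lo, hi = 0, len(trace_a) - 1
    if trace_a[lo] != trace_b[lo] or trace_a[hi] == trace_b[hi]:
        raise ValueError("need agreement on the input and disagreement on the output")
    rounds = 0
    while hi - lo > 1:  # invariant: agree at lo, disagree at hi
        mid = (lo + hi) // 2
        rounds += 1  # one query: both parties reveal their state at mid
        if trace_a[mid] == trace_b[mid]:
            lo = mid
        else:
            hi = mid
    return lo, rounds


def resolve(step: Callable[[int], int], trace_a: List[int], trace_b: List[int]) -> str:
    """The referee re-executes only the single disputed step (assumes one party is honest)."""
    i, _ = bisect_dispute(trace_a, trace_b)
    return "A" if trace_a[i + 1] == step(trace_a[i]) else "B"


def step(x: int) -> int:
    """Toy stand-in for one unit of inference work."""
    return (3 * x + 1) % 1_000_003


honest = run(step, 7, 1024)
cheater = run(step, 7, 1024, corrupt_at=700)
print(bisect_dispute(honest, cheater))  # → (699, 10)
print(resolve(step, cheater, honest))   # → B
```

Over 1,024 steps, the procedure isolates the tampered step in 10 rounds, which is log2 of the trace length. After that, the referee performs one step of real computation instead of replaying all 1,024.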
According to the paper, Stanford researchers documented that the behavior of GPT-3.5 and GPT-4 changed in measurable ways between March and June 2023 on the same evaluation tasks. The current API contract, however, gives users no way to detect such a change. The economic incentive for model substitution grows with volume. A provider can redirect a fraction of queries to a smaller or quantized model while still charging the fee for the larger one.
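Simple arithmetic shows how the chance of detection accumulates against partial substitution. The sketch below makes three assumptions. First, checks on different queries are independent. Second, the provider routes a fraction `f` of queries to a cheaper model. Third, a single check on a substituted query catches the inconsistency with probability `p`. Both parameters are hypothetical and are not figures from the paper.

```python
import math


def detection_probability(f: float, p: float, n: int) -> float:
    """Chance that at least one of n independently checked queries exposes substitution."""
    q = f * p  # per-query chance: the query was substituted AND the check caught it
    return 1.0 - (1.0 - q) ** n


def queries_needed(f: float, p: float, target: float) -> int:
    """Smallest n such that detection_probability(f, p, n) >= target."""
    q = f * p
    if not (0.0 < q < 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < f * p < 1 and 0 <= target < 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - q))


# Hypothetical provider: 5% of traffic goes to a cheaper model,
# and each check on a substituted query catches it half the time.
print(queries_needed(0.05, 0.5, 0.99))  # → 182
```

Under these assumptions, a provider that diverts only 5% of its traffic would still be caught with 99% probability within 182 checked queries. The incentive to cheat shrinks as query volume rises.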
The paper makes the connection to Arbitrum explicit. Optimistic rollups rest on the same intuition. Re-executing every step of a long computation on every machine is expensive, while checking only the disputed step is cheap. The proposed protocol applies that logic to neural network values, using the same bisection-based dispute resolution structure that protects Arbitrum One.
For regulated industries, model governance teams, and the emerging market of autonomous agents, the protocol does not require developers to modify their existing stacks; it only requires that someone in the system — whether the provider, the auditor, or the platform — produce a verifiable statement.
This article is for informational purposes only and does not constitute investment advice.