An explorable explanation

From the bug to the bet

Twelve beats. Each one is a single idea you can hold. Together they name the thing the runtime crowd has had no language for: a proof claim, made before the pipeline runs, on every input the schema allows.

Panel 1 of 12

The bug

Knight Capital, August 2012. $460 million lost in 45 minutes. The proximate cause was a deployment pipeline that activated dead code; the trigger was a SQL-shaped predicate that swept up trades it should have excluded. The firm nearly went under and was acquired within months.

Tests didn’t catch it because tests check known cases. The bug was in a case nobody thought to test. Knight’s engineers had unit tests, integration tests, a staging environment, and a deployment review. None of those artifacts ask the question that mattered: across every possible input, does this predicate behave the way I think it does? Testing is the wrong shape of instrument for that question.

The replay on the right is a SQL query structurally similar to the Knight pattern: an off-by-one in a date interval. The cursor walks through at reading speed; the chip stack shows the verifier’s abstract value at each line. At line 7 the verifier halts — the predicate’s interval runs backwards in time, and a counterexample lands in 30 milliseconds.

Nobody wrote a test for “d1 = today, no rows match the window.” Why would they? The bug lives precisely in the inputs nobody thought to enumerate.A property doesn’t need anyone to think of the input. It checks all of them.

Try the “Edit the bug” control once the verifier halts. Swap the interval bounds. Replay. The verifier passes. You just shipped a fix in front of the demo. Now: was Knight a one-off, or one of a family?

trade_window.sql · Knight 2012 patternplaying · line 1/9

1-- Daily trade rollup for the last 7 days, per symbol.
4  SUM(qty) AS total_qty,
5  COUNT(*) AS n_trades
6FROM trades
7WHERE date BETWEEN d1 AND d1 - INTERVAL '7 days'
8  AND status = 'filled'
9GROUP BY symbol;

abstract values

(waiting for the verifier to bind values…)

Panel 2 of 12

The class

Knight wasn’t unique. The same year, JPMorgan’s London Whale lost $6.2B partly because a spreadsheet’s VLOOKUP was fed a denominator that silently truncated past row 16,500. Citibank’s 2020 Revlon transfer sent $894M because a UI radio-button defaulted to the wrong selection. The UK’s Public Health England COVID export lost 16,000 cases because the upstream feed used .xls not .xlsx — and .xls caps at 65,536 rows.

These bugs are in the same family.Each is a property violated by a code path nobody enumerated. “No transaction may complete with negative net.” “Every order matches a customer.” “The interval [a, b] has a ≤ b.” The properties were never articulated; they lived as folklore in the heads of engineers who left two years before the deploy that broke the invariant.

Hover any row in the table on the right to see the SQL-shaped pattern that, in production, would have triggered the verifier. What looks like five different incidents is, mathematically, the same thing happening in five different settings: a value that could be one of N possibilities, a code path that handles M of them, M < N, and no test that exercised the (N − M) gap.

The veric thesis: all of these failures are checkable in a PR.Not by writing more tests, but by carrying enough information through the analysis to rule out the gap. So what does “enough information” look like? It starts with a statement testing can’t make.

Incidents · five years · same shape

YearIncidentCostRoot-cause class

2012Knight Capital$460MStale flag, divergent readers

2012London Whale (JPM)$6.2BSpreadsheet truncation

2020Citi / Revlon$900MUI off-by-one (transfer)

2020ICE Excel / Brexit16,000 cases lostFormat truncation

2018TSB Bank migration£330MSchema-rewrite race

Hover a row to see the pattern

-- (any row above)

Full archive: /incidents

Panel 3 of 12

The property

Tests check “for input X, output is Y.” Properties check “for allinputs in some class, output satisfies P.” A property is an infinity of tests collapsed into one statement.

The natural notation is mathematical: ∀ x ∈ Inputs, P(f(x)). The universal quantifier ∀ does the work — it ranges over every element of Inputs, not just the ones a developer thought to enumerate. The natural object that holds a property is a proof, not a test run.

On the right, pick any plain-English property and watch it typeset into math. The translation is mechanical: nouns become sets, verbs become predicates, “for every” becomes ∀, “there exists” becomes ∃. What looks like a leap from prose to symbols is, structurally, the same statement in two notations — one for humans, one for the verifier.

Once a property is written in this form, “does this hold?” stops being a question about yourintuition and becomes a question with a definite answer. Either the property holds for every input in the domain, or it doesn’t. The verifier’s job is to decide which.

The cost of doing this work historically has been writing the property by hand. Veric’s position: most of the properties you care about are inferable from the schema and the SQL.You don’t need to know what to assert. You need to know what to ship.But there’s a catch: checking ∀ by literal enumeration is exponential. The trick that makes it tractable is on the next panel.

Property typesetter · prose → math

plain English

All customer revenues are non-negative.

typeset

∀ c ∈ Customers, revenue(c) ≥ 0

domain: Customers (every row in the customer table)

Panel 4 of 12

The abstract value

Checking ∀ inputs literally would mean running the program on every possible input — exponential, dead before it starts. The trick that broke the exponential, in 1977, was abstract interpretation: summarize sets of inputs by an abstract value. Instead of carrying the integer 42, carry the symbol integer ≥ 0.

The set of all inputs satisfying integer ≥ 0 is infinite. The abstract value integer ≥ 0 is one symbol. The analyzer carries symbols, not values— and that’s the move that turns an exponential into a polynomial.

On the right is Cousot’s 1977 picture: the violet cone covers every concrete execution (amber threads). Drag the sliders. Widening forces convergence at the cost of precision; narrowing pulls the cone back toward the concrete hull. The cone is the statement “every real run lives somewhere inside this envelope” — and that envelope is something a verifier can reason about in finite time.

But the cone has structure. Concrete and abstract are not just two sets; they are connected by a precise mathematical correspondence, which is what makes the analysis sound rather than merely convenient.

The violet cone is the abstract value; the amber threads are concrete executions. Widening forces the cone to converge at a controlled precision cost.

Panel 5 of 12

The Galois connection

The mapping from concrete to abstract is not a casual simplification. Concrete and abstract values form posets — partially ordered by precision — and the abstraction-concretization pair (α, γ) is an adjunction. This is Cousot & Cousot’s 1977 insight, and it is the load-bearing piece of the whole edifice.

The property α(c) ⊑ a ⟺ c ⊑ γ(a) is what makes the analysis sound: the abstract value covers the concrete value iff the concrete value lies in the region the abstract value denotes. No false negatives leak through this equivalence. Soundness is a structural property of the math, not a guarantee the vendor signs in marketing copy.

Cast as a categorical diagram, the same statement reads as a commuting square: evaluate concretely then abstract, or abstract then evaluate — the two paths to the abstract meaning agree. That is the Conal-Elliott denotational-design idiom, and it is the picture every soundness proof reduces to.

With the connection in hand, the question becomes: how do abstract values relate to each other? The answer is the lattice.

The Galois pair (α, γ) is an adjunction between the concrete poset (left) and the abstract poset (right). α(c) ⊑ a iff c ⊑ γ(a) — the law that makes the analysis sound.

denotational design · the soundness square

evalAbstract ∘ α ⊑ ⟦α⟧ ∘ evalConcrete
the square commutes · soundness

Panel 6 of 12

The lattice

Abstract values form a lattice: any two of them have a least upper bound (their join ⊔ — the most precise common supertype) and a greatest lower bound (their meet ⊓ — the most precise common subtype). Joins compose information from divergent paths in a program; meets compose constraints. The whole structure is bounded by ⊤ (no information) at the top and ⊥ (impossible / no inputs) at the bottom.

Hover any node to see its upset (everything at-or-above) and downset (at-or-below). Click two nodes to flash their join (green) and meet (amber). The full catalog of canonical lattices — intervals, signs, parity, nullability — lives at /lattice.

The lattice is the data structure of the analysis.Every program point gets an abstract value; the value at each point is the join of values flowing in from predecessors. To run an analysis is to walk the program, propagate values, and watch the walk converge on a value at each point that doesn’t change anymore. That convergence has a name.

The boolean-3 Hasse diagram. Edges go upward by precision; ⊤ is the join of everything below, ⊥ the meet. Joins compose paths; meets compose constraints.

confluence · two reductions, one normal form

two reductions · one normal form

Panel 7 of 12

The fixpoint

A static analysis is a function f from program state to program state — propagating the abstract value through each line. The answer is lfp(f): the least fixpoint, the point where applying fone more time changes nothing. That stationary value is the analysis’s claim about the program.

Kleene: lfp(f) = ⊔ fⁿ(⊥). Start at ⊥ (no information), apply f, join, repeat. The analyzer iterates until it stops moving. For some lattices iteration terminates by itself; for others, widening forces termination at a controlled precision cost — the same widening cone from panel 4, doing its job in time rather than space.

On the right, watch a Kleene iteration converge on a real interval lattice. The bars climb, the values stabilize, the walk halts. That halting state is the proof.

Kleene: lfp(f) = ⊔ fⁿ(⊥). The analyzer climbs the lattice until f stops moving. Widening forces termination at a controlled precision cost.

Panel 8 of 12

The proof

Once lfp(f) is computed, checking the property P reduces to a single question: does the abstract value at every program point satisfy P? If yes, the property holds for every concrete input. If no, the failure tells us where to look for a counterexample.

The proof is constructive. The lattice trajectory isthe witness of correctness — not a separate certificate computed after the fact, but the byproduct of the analysis converging. When the verifier finishes, you don’t need to trust it: you can replay the trajectory and check each step yourself, or hand it to a second checker that re-validates from scratch. This is what “machine-checked” means in practice. Every trajectory veric ships gets a permalink at /proof.

On the right, a verifier run prints out, line by line. The lattice walk takes 47 milliseconds. 19 program points are checked. The verdict lands. Below the terminal, a cluster of tier shields fades in — each one certifies that the query cleared a specific class of correctness checks.

The relationship between “the proof” and the tier shields matters: the shields are not a UI label, they are a forwardable artifact. They get baked into the SARIF certificate (the next panel), they survive the round-trip through GitHub or any code-quality tool, they show up in the same JSON schema regulators have used for two decades. Correctness becomes auditable at the same friction surface as test coverage.

Soundness — the guarantee that if the verifier says PROVEN, the property really holds — falls out of the Galois connection from earlier. The abstraction is conservative; the fixpoint over-approximates the concrete behaviour; if the over-approximation satisfies the property, the underlying program does too. The math is the deliverable.

And when the math says fails, what does it hand back?

symbolic execution · all paths covered

4 paths · 4 verified · 0 unexplored

Proof terminal · lfp computed → verdict

tiers certified for this query

hoare triple · weakest precondition

wp ← postcondition

{ ? }

y := x + 1

{ ? }

z := y * 2

{ z > 0 }

Panel 9 of 12

The witness (when it fails)

A counterexample. When the analysis says “fails,” it does not just say “fails.” It constructs a minimum witness: the smallest concrete input that demonstrates the failure. Watch the wide failing input row on the right collapse to its 2-cell core — the verifier shrinks the evidence until nothing more can be removed without losing the failure.

Property-based testing pioneered this with Hypothesis and QuickCheck; abstract interpretation extends it to settings where you can’t even enumerate the input space. The witness is the bridge from the mathematical statement back to the engineer’s intuition— the moment a developer reads the row and says “oh, of course.”

But a witness in the developer’s terminal is not the only artifact the verifier produces. The same run also emits something durable, signable, and machine-readable.

A failing input row collapses to its minimum 2-cell core — the smallest concrete counterexample that demonstrates the property failure.

Panel 10 of 12

The certificate

The analyzer’s output is a proof certificate — a machine-checkable artifact attesting that a specific verifier checked a specific property on a specific input at a specific time. Watch the SARIF blob unfold on the right: schema, run metadata, the lattice trajectory, the verdict, the signature. Each field is something a regulator can replay tomorrow, or seven years from now.

Industries that demand correctness — avionics, payments, medical devices — have used certificates for decades. Data has not. Veric’s certificates are SARIF files: readable by any code-quality tool, signable with public-key cryptography, embeddable in a Git commit message or a CI badge. The same JSON schema regulators have used for two decades, now carrying proofs about your data pipeline.

None of this works without the engine that produces them — and the engine itself is the part most worth understanding.

The verifier emits a SARIF certificate: tier, ruleId, witness, signed hash. Readable by any code-quality tool, embeddable in a Git commit or CI badge.

Panel 11 of 12

The substrate

The verifier is not a hand-rolled tower of conditionals. It is generated by an attribute grammar tower — a sequence of compilers, each compiling the next, where the seed compiler compiles itself to a fixpoint. The seed has 58 productions; the fixpoint is currently at 2.85 megabytes of binary; the round-trip from seed-to-self closes in two passes.

What this gets you, structurally: the verifier’s rules are data, not code. New rules are new attributes; new abstract domains are new lattices; new SQL dialects are new grammars composed onto the existing tower. The substrate is reused at every layer. This is why veric ships rules at one-per-week velocity instead of one-per-quarter — the cost of a new check is the cost of writing the AG equation, not the cost of plumbing it through a hand-written analysis pipeline.

Self-hosting is the receipt that the architecture is sound. If the seed compiler can compile its own source code to a byte-identical second compiler, every shape the AG can express has been exercised by the AG itself. There is no dark corner of the substrate that has never been exercised in production — the substrate compiles itself in production every time we build.

Theoretically, this matters because abstract interpretation is a schema for analyses, not a single analysis. Cousot & Cousot (1977) showed that any sound static analysis can be cast as a Galois connection over a lattice. The substrate is the machinery for casting them: each analysis is a domain (the lattice) plus a transfer function (one AG equation per production) plus a property check (one more AG equation). The tower is the abstract-interpretation framework rendered as a self-extending Rust binary.

Pragmatically, this matters because the unit economics of static analysis vendors collapse if every new check requires a senior compiler engineer for a quarter. Veric’s substrate makes new checks an artifact a junior engineer can land in a day — often by composing existing primitives. The substrate is the moat. Which sets up the question of what, exactly, is being bet.

The attribute-grammar tower: each layer compiles the next. The seed (58 productions) compiles itself to a byte-identical fixpoint at 2.85MB — the receipt that the substrate is sound.

Panel 12 of 12

The bet

Every other code-correctness vendor makes a runtime claim: ran your test, ran your linter, ran your warehouse, looked at your PR. Veric is the first to make a proof claim: for all inputs conforming to the schema, the property holds. Runtime claims have a ceiling — the inputs you didn’t think to enumerate. Proof claims don’t.

The lineage is short and public. Pentium FDIV (1996) — the arithmetic bug that cost Intel $475M and made formal hardware verification a budget line. CompCert (2006) — the first production-grade formally verified C compiler. seL4 (2009) — the first formally verified operating-system microkernel. s2n (~2014) — Amazon’s TLS implementation, formally checked. Certora (~2022) — formal verification for Solidity smart contracts. Data, now. Each milestone happened when the cost of formal methods dropped below the cost of the failure class they prevented. SQL crossed that line in 2024.

The lane is empty in public. There is no “Certora for data.” The reference customers for SQL correctness today are still using runtime sampling, schema linters, and pull-request review — instruments that catch yesterday’s bug class and miss tomorrow’s. Every dbt project of non-trivial size has accumulated a folklore of “don’t touch this model on a Friday” rules; veric replaces the folklore with proof.

The substrate is built. The wedge is silent-correctness in dbt — the bugs that don’t throw errors, don’t fail tests, and don’t surface until a regulator asks why the quarterly revenue restated. The play is to be the canonical reference for the next decade: the company that, when a CTO asks “how do you know your data pipeline is correct?”, is the answer.

Three doors out: open the playground and watch a verifier run on a real query; open the equivalence demo and see two SQL queries proven semantically identical; or read the incident catalog and see the lineage of failures that motivated this whole approach.

Try the playground — verify a query in your browser

or:see equivalence proofs·read the incident catalog