Monthly revenue silently wrapped past $2.1B — 2022
Cost: one month of executive-dashboard revenue figures reported as a small negative number; emergency restatement; downstream commission calculations re-run · Time-to-detect: ~36 hours from monthly close to first executive question · Root cause class: T1 (type conflict) compounded by T7 (range / interval bound)
What happened
A late-stage payments platform ingested transaction data from a legacy ledger system into its analytics warehouse. The ingestion pipeline declared the amount_cents column as INT32 — a choice that long predated the company's growth and had never been revisited. For per-transaction values, INT32's ~$21M ceiling was comfortable. For monthly aggregates, it wasn't.
When monthly revenue first crossed the equivalent of $2.147B in amount_cents (INT32 max = 2,147,483,647), the warehouse's monthly aggregation SUM(amount_cents) silently overflowed. The downstream warehouse — Postgres in the analytics layer, not the source system — interpreted the overflowed INT32 as a wrapped negative number, and the monthly executive dashboard rendered the company's revenue as roughly negative $40M.
The CFO noticed within two business days. A short-term fix cast the column to BIGINT at the aggregation step; a longer-term fix migrated the column type at the source. Commission and revenue-share calculations that had run against the bad number were re-issued, and one quarterly KPI report had to be republished.
The pattern
A type's representable range was lower than the value the pipeline could legitimately produce, and the overflow was silent rather than loud. The column type was correct for the per-row use case and wrong for the aggregated use case; nothing in the pipeline tracked the value-range under aggregation. Schema tests passed because the type was unchanged and every individual value fit; row-level data quality tests passed because each transaction was within the per-row bound.
Any pipeline where a numeric column's declared type is sufficient for individual row values but insufficient for downstream aggregations has this exposure: USD revenue in INT32, event counts in INT32 for high-volume products, byte-size columns in 32-bit types when files grow past 2GB, millisecond timestamps in 32-bit columns.
How veric would catch it
veric's T1 tier checks declared types against operations, and the T7 tier propagates a value-range annotation along the model graph that includes range expansion under aggregation. Given transactions.amount_cents : int32, range(0, 21_474_836), the verifier infers that SUM(amount_cents) GROUP BY month projects to a range bounded by range(0, monthly_txn_count) × max(amount_cents). When that projected range exceeds int32's representable range, it flags: "aggregation SUM(amount_cents) projects range (0, ~9.2e12) for monthly grain; result column declared int32 with range (-2.1e9, 2.1e9) — T1/T7 range VIOLATED, widen the result type or declare a tighter range bound."
The same primitive catches the family: any aggregation whose projected range exceeds its declared type's capacity.
Try it: open the example below and watch the verdict change as you toggle the offending pattern on and off.
See also
- /explore — the abstract value — range under aggregation is exactly the kind of widening reasoning the cone illustrates.
- /explore — the fixpoint — the value-range propagating along the model graph converges at a least fixpoint.
- Adjacent incidents: Zillow iBuyer, PHE Excel truncation.
Sources
- Anonymised; pattern reflects the well-documented INT32 overflow class. Public references include the YouTube "Gangnam Style" 2,147,483,647 view-count overflow (Google, 2014) and the recurring discussion of monetary-aggregation overflow in payments-systems engineering literature (e.g. Stripe's published guidance on storing money values).