Public Health England Excel row-truncation — October 2020
Cost: ~15,841 unreported COVID-19 positive test results over one week, with downstream contact-tracing failure for tens of thousands of close contacts · Time-to-detect: ~7 days (Sep 25 – Oct 2, 2020) · Root cause class: T4 (cardinality / silent truncation)
What happened
Between September 25 and October 2, 2020, Public Health England (PHE) failed to report 15,841 positive COVID-19 cases to the UK's contact-tracing system. The cases were not lost from the originating labs; they were dropped during a CSV-to-Excel ingestion step. Per PHE's own statement and reporting in the BBC, The Register, and a City University technical analysis, PHE's pipeline received CSV files from testing labs and loaded them into Excel templates that downstream systems could ingest. The templates were saved in the legacy .xls format, which has a hard 65,536-row limit. With several rows per test result, each template silently capped at roughly 1,400 cases. When a file exceeded that, Excel did not error — it dropped the overflow rows. Contact tracing for those cases was delayed by days at a critical point in the second wave.
The pattern
A pipeline stage had a hard cardinality limit (65,536 rows) that was lower than its realistic input size, and the limit was enforced by silent truncation rather than a hard error. Any pipeline step where the maximum-row capacity of a downstream sink is not written down as a verified upper bound on its upstream source has this exposure: BigQuery/Snowflake export limits, CSV loaders with row caps, dashboard tools that quietly stop rendering past N rows, JSON serializers that crash above a payload size.
How veric would catch it
veric's cardinality tier (T4) tracks declared row-count bounds along the model graph. If the schema for the xls_template sink declared max_rows = 65536, veric would compare it against the upstream lab_results source's projected cardinality and flag in the PR diff: "sink xls_template.rows has declared max 65,536; upstream lab_results projects > 65,536 rows for week of 2020-09-25 — T4 cardinality-bounded VIOLATED, no spill or pagination declared." Honest scope: veric requires the bound to be declared. The deeper failure here was nobody writing the .xls row limit down at all — but a single one-line annotation makes the silent truncation provably loud.
Try it: open the example below and watch the verdict change as you toggle the offending pattern on and off.
See also
- /explore — the class — the cardinality-truncation pattern as one shape in a family.
- /explore — the abstract value — declared row bounds are an abstract domain veric tracks along the model graph.
- Adjacent incidents: USD revenue int32 overflow, marketplace refund double-count.
Sources
- BBC News, "Excel: Why using Microsoft's tool caused Covid-19 results to be lost" (Oct 5, 2020): https://www.bbc.com/news/technology-54423988
- The Register, "Excel spreadsheet blunder blamed after England under-reports 16,000 COVID-19 cases" (Oct 5, 2020): https://www.theregister.com/2020/10/05/excel_england_coronavirus_contact_error/
- City, University of London short-courses blog, technical analysis (Oct 13, 2020): https://blogs.city.ac.uk/cityshortcourses/2020/10/13/what-really-caused-the-excel-error-in-nhs-test-and-trace-covid-19-system-an-in-depth-technical-analysis/