Strava global heat-map deanonymization — January 2018
Cost: Disclosed locations of US, UK, French, and Russian forward operating bases and patrol routes; multi-government security review; long-term GPS-data policy revisions across the fitness-app industry · Time-to-detect: ~2 months from heat-map release (Nov 2017) to first public identification (Jan 2018) · Root cause class: T8 (lattice PII — sensitivity label collapsed under join / aggregation)
What happened
In November 2017, Strava published a global heat-map visualizing roughly 1 billion activities (runs, rides, and other GPS-recorded exercise sessions) uploaded by its users since 2015. Each point on the map was an aggregate of anonymized sessions; no individual track was rendered. Per the Washington Post and BBC reporting that surfaced in late January 2018 (after analyst Nathan Ruser flagged the issue on Twitter), the aggregation was sufficient to reveal the layout, perimeter, and patrol routes of US military bases in Syria and Afghanistan, French outposts in Niger, a suspected CIA black site, and Russian forces in eastern Ukraine, all locations that did not appear on conventional public maps. Soldiers running daily perimeter loops with their phones recording produced bright traces in otherwise dark regions of the map, a signature that no single anonymized session would have given away.
Strava had treated each session as anonymized for the purposes of heat-map publication: no user IDs, no timestamps, no individual tracks rendered. Each point on the heat-map was the join (under spatial bucketing) of every user who had passed through that cell, which in a remote area could be a handful of people from a single installation. The label "anonymous" survived that join in the system's own bookkeeping; the information content did not. US Central Command opened a review; Strava added opt-out controls and removed several segments; the Department of Defense issued new guidance on personal-fitness-device use in operational areas.
The pattern
A dataset was tagged "anonymous" or "non-PII" at the row level, then aggregated through a spatial / temporal join. The aggregation produced a derived value whose sensitivity was strictly higher than any individual contributor's. The system's policy engine, to the extent one existed, tracked the input label, not the output label. This is the canonical lattice-PII failure: sensitivity is not closed under the operations the warehouse provides. Two anonymous rows joined can produce a row that is not anonymous, so handing the output the label its inputs carried is sound only if the abstraction respects the lattice's join structure: the output label must be an upper bound on everything that flowed into it, including what the combination itself reveals.
Any pipeline where a public-tier output is computed from inputs whose individual tiers permit it but whose joint information content does not has this exposure: anonymized location aggregations, k-anonymized health datasets joined back to a small geographic bucket, ad-targeting cohorts derived from cross-referenced "non-PII" attributes.
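The failure mode described above can be reproduced in a few lines. The following is a minimal Python sketch, not Strava's pipeline: the rows, the 0.01-degree grid, and the helper names (`grid_cell`, `build_heatmap`, `naive_label`) are all assumptions made for illustration. It buckets "anonymous" rows into cells and propagates the label the way row-level bookkeeping would.

```python
import math
from collections import defaultdict

# Hypothetical rows: (user_id, lat, lon, row_label). All values are made up.
ROWS = [
    ("u1", 10.0001, 20.0001, "anonymous"),
    ("u1", 10.0002, 20.0003, "anonymous"),
    ("u1", 10.0004, 20.0002, "anonymous"),
    ("u2", 40.0051, 50.0051, "anonymous"),
    ("u3", 40.0052, 50.0053, "anonymous"),
    ("u4", 40.0054, 50.0052, "anonymous"),
]


def grid_cell(lat, lon, cell_deg=0.01):
    """Spatial bucketing: snap a coordinate to an integer grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_heatmap(rows):
    """Aggregate rows per cell, keeping heat, contributors, and input labels."""
    cells = defaultdict(lambda: {"heat": 0, "users": set(), "labels": set()})
    for user_id, lat, lon, label in rows:
        cell = cells[grid_cell(lat, lon)]
        cell["heat"] += 1
        cell["users"].add(user_id)
        cell["labels"].add(label)
    return dict(cells)


def naive_label(cell):
    """Row-level bookkeeping: the output inherits the label its inputs shared."""
    return "anonymous" if cell["labels"] == {"anonymous"} else "pii"


heatmap = build_heatmap(ROWS)
for key, cell in sorted(heatmap.items()):
    print(key, "heat:", cell["heat"],
          "contributors:", len(cell["users"]),
          "naive label:", naive_label(cell))
```

Both cells end up with the same heat and the same "anonymous" label, yet one of them is built entirely from a single user's sessions. The label bookkeeping has no field in which that difference could show up, which is the gap the pattern describes.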
How veric would catch it
veric's T8 tier tracks information-flow labels through every join, union, and aggregate, and verifies that the output label is at least as restrictive as the lattice join (the least upper bound) of every input contribution to that output. In the SQL/dbt analog: a model that aggregates gps_traces (declared pii.geo) and produces a heat-map cell would carry the pii.geo label forward unless the aggregate's minimum group size is statically provable to meet a k-anonymity threshold. With a declared policy ("heatmap_publish.cell may not be derived from any path whose joined cardinality across users is less than k=N at any spatial resolution"), the verifier flags: "model heatmap_publish.cell joins gps_traces (label pii.geo); aggregate group-by-grid produces buckets where minimum group-size is not statically bounded: T8 lattice-PII VIOLATED."
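The rule in the preceding paragraph can be sketched as a small label check. This is an illustrative Python model under stated assumptions, not veric's implementation or API: the `Label` enum, the `Aggregate` record, and the `min_group_size` field (standing in for whatever lower bound a static analysis managed to prove) are hypothetical names, and the message text only imitates the style of the verdict quoted above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Label(IntEnum):
    """Two-point sensitivity lattice; a higher value is more restrictive."""
    PUBLIC = 0
    PII_GEO = 1  # stands in for the pii.geo label in the text


def join(labels: Tuple[Label, ...]) -> Label:
    """Least upper bound: at least as restrictive as every input."""
    return max(labels, default=Label.PUBLIC)


@dataclass(frozen=True)
class Aggregate:
    name: str
    input_labels: Tuple[Label, ...]
    # Hypothetical: proven lower bound on distinct users per group.
    # None means the analysis could not establish any bound.
    min_group_size: Optional[int] = None


def output_label(agg: Aggregate, k: int) -> Label:
    """Carry the joined label forward unless the k bound is proven."""
    carried = join(agg.input_labels)
    if agg.min_group_size is not None and agg.min_group_size >= k:
        return Label.PUBLIC
    return carried


def check_publish(agg: Aggregate, k: int) -> str:
    """Return a verdict string for publishing the aggregate at PUBLIC."""
    label = output_label(agg, k)
    if label is Label.PUBLIC:
        return f"OK: {agg.name} may be published"
    return (f"VIOLATED: {agg.name} carries {label.name}; "
            f"minimum group size is not bounded at k={k}")


unbounded = Aggregate("heatmap_publish.cell", (Label.PII_GEO,))
bounded = Aggregate("heatmap_publish.cell", (Label.PII_GEO,), min_group_size=50)

print(check_publish(unbounded, k=20))
print(check_publish(bounded, k=20))
```

The design choice worth noting is the default: an aggregate with no proven bound keeps the restrictive label, so a missing proof blocks publication rather than permitting it.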
Honest scope: veric does not bound the information in an aggregate; it bounds the cardinality over which the aggregate is taken, and refuses to drop the label until the cardinality is provably above the declared threshold. That refusal is what would have stopped a public publish step on a base-perimeter-shaped query.
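To make that scope concrete, here is a runtime analog of the cardinality bound, written as a Python sketch. It is an assumption-laden illustration: the check described above is static, whereas this filter runs over data, and the cell names, user IDs, and the `publishable_cells` helper are hypothetical.

```python
def publishable_cells(cell_users, k):
    """Keep only cells whose distinct-contributor count is at least k.

    Returns a mapping from cell to contributor count for released cells.
    """
    return {
        cell: len(users)
        for cell, users in cell_users.items()
        if len(users) >= k
    }


# Hypothetical mapping from grid cell to the set of users who crossed it.
CELL_USERS = {
    "cell_a": {"u1"},
    "cell_b": {"u2", "u3", "u4"},
    "cell_c": {"u5", "u6", "u7", "u8"},
}

released = publishable_cells(CELL_USERS, k=3)
withheld = sorted(set(CELL_USERS) - set(released))

print("released:", released)
print("withheld:", withheld)
```

The filter withholds the single-contributor cell and releases the other two. It says nothing about what the released cells depict: a cell crossed by k or more members of the same unit passes the bound with its shape intact, which is the limit the scope statement is being candid about.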
See also
- /explore — the property — “sensitivity survives every operator” is the kind of universally-quantified statement that no test corpus can establish.
- /explore — the substrate — lattice-tracking through join is exactly what the attribute-grammar tower buys you cheaply.
- /explore — the certificate — a SARIF artifact attesting "no path with less-than-k aggregation" is the artifact a regulator can replay.
- Adjacent incidents: Equifax 2017, marketing PII near-miss 2024.
Sources
- Washington Post, "U.S. soldiers are revealing sensitive and dangerous information by jogging" (Jan 29, 2018): https://www.washingtonpost.com/world/a-map-showing-the-users-of-fitness-devices-lets-the-world-see-where-us-soldiers-are-and-what-they-are-doing/2018/01/28/86915662-052d-11e8-aa61-f3391373867e_story.html
- BBC, "Fitness app Strava lights up staff at military bases" (Jan 29, 2018): https://www.bbc.com/news/technology-42853072
- Nathan Ruser, original Twitter thread (Jan 27, 2018): https://twitter.com/Nrg8000/status/957318498102865920
- Wikipedia, "Strava heatmap controversy" (consolidated timeline): https://en.wikipedia.org/wiki/Strava#Privacy_concerns