Marketing-segment export almost shipped a phone-number column — 2024

Cost: zero — caught in PR review before deploy; flagged here as the canonical near-miss · Time-to-detect: ~3 hours from PR open to PR block · Root cause class: T6 (information-flow reachability)

What happened

A growth-engineering team at a consumer fintech added a personal_phone column to dim_customers to support a new SMS-based win-back campaign. In the same PR, they wired the new column into marketing_segment_export, the dbt model that produces the daily CSV their marketing-automation vendor pulls. The model was a 200-line union over six customer cohorts, and the engineer added personal_phone to the select list in three of the six branches.

What the PR did not include: a WHERE consent_marketing_sms = true filter on all three of those branches. Two of the three already had the filter; the third, a branch the engineer had copied, did not. The customer-records table held phone numbers for every signed-up customer; only about 38% had given marketing-SMS consent. Without the filter on that branch, the next morning's export would have shipped phone numbers belonging to the other 62%, including users who had explicitly opted out of marketing comms, to a third-party vendor outside the company's regulated boundary.
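
The failure mode is easy to reproduce in miniature. The sketch below is not the team's model: it uses an in-memory SQLite table with invented rows, a hypothetical cohort column, and hypothetical cohort names, and it writes the consent flag as 1/0 because SQLite has no native boolean type. It only shows how one unfiltered union branch lets a non-consenting customer's phone number into the export.

```python
import sqlite3

# Hypothetical miniature of dim_customers; the cohort column and all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customers ("
    "customer_id INTEGER, cohort TEXT, personal_phone TEXT, "
    "consent_marketing_sms INTEGER)"
)
conn.executemany(
    "INSERT INTO dim_customers VALUES (?, ?, ?, ?)",
    [
        (1, "lapsed", "555-0101", 1),
        (2, "lapsed", "555-0102", 0),
        (3, "dormant", "555-0103", 1),
        (4, "dormant", "555-0104", 0),
        (5, "churn_risk", "555-0105", 1),
        (6, "churn_risk", "555-0106", 0),
    ],
)

# Three union branches select personal_phone; only the first two filter on consent.
EXPORT_SQL = """
SELECT customer_id, personal_phone FROM dim_customers
WHERE cohort = 'lapsed' AND consent_marketing_sms = 1
UNION ALL
SELECT customer_id, personal_phone FROM dim_customers
WHERE cohort = 'dormant' AND consent_marketing_sms = 1
UNION ALL
SELECT customer_id, personal_phone FROM dim_customers
WHERE cohort = 'churn_risk'
"""

exported_ids = {row[0] for row in conn.execute(EXPORT_SQL)}
consented_ids = {
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM dim_customers WHERE consent_marketing_sms = 1"
    )
}

# Customers whose phone number is exported despite no marketing-SMS consent.
leaked_ids = sorted(exported_ids - consented_ids)
print("exported without consent:", leaked_ids)  # exported without consent: [6]
```

Every row in the output is well-formed, which is why schema and row-level tests have nothing to object to.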

The PR was caught in code review by a data-platform engineer who asked the question "does the consent filter apply to every branch of this union?" and answered it manually. The fix was three lines.
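
The reviewer's question can be approximated mechanically. The following is a deliberately naive sketch, assuming the model is a flat UNION / UNION ALL of simple selects and that the filter is written the same way in every branch; the function name and the cohort table names are hypothetical. It matches text only, so comments, CTEs, subqueries, and differently phrased predicates would all defeat it.

```python
import re


def branches_missing_filter(sql: str, required_predicate: str) -> list[int]:
    """Return 1-based indexes of UNION branches whose text lacks the predicate."""

    def normalise(text: str) -> str:
        # Collapse whitespace and case so formatting differences do not matter.
        return re.sub(r"\s+", " ", text).strip().lower()

    # Split on UNION or UNION ALL; each piece is treated as one branch.
    branches = re.split(r"\bunion(?:\s+all)?\b", sql, flags=re.IGNORECASE)
    wanted = normalise(required_predicate)
    return [
        index
        for index, branch in enumerate(branches, start=1)
        if wanted not in normalise(branch)
    ]


# Hypothetical three-branch excerpt; the third branch omits the consent filter.
model_sql = """
select customer_id, personal_phone from cohort_a
where consent_marketing_sms = true
union all
select customer_id, personal_phone from cohort_b
where  consent_marketing_sms = true
union all
select customer_id, personal_phone from cohort_c
"""

missing = branches_missing_filter(model_sql, "consent_marketing_sms = true")
print(missing)  # [3]
```

A check like this is a lint, not a guarantee, which is the gap between a careful reviewer and a verifier that reasons about the query's actual data flow.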

The pattern

A column with a declared sensitivity classification (pii, marketing-consent-gated) reached a sink (vendor_export) along a code path where the consent filter was absent. The schema was unchanged, every existing dbt test passed, and there was no warehouse-level guard that distinguished "phone number with consent" from "phone number without". The only thing standing between the column and the vendor's S3 bucket was the engineer remembering the filter on every code path.

Any pipeline where a sensitive column reaches an external sink along an unfiltered code path that exists in parallel to filtered ones has this exposure: PII in marketing exports, financial data in BI dashboards, healthcare identifiers in analytics events, geolocation in product telemetry.

How veric would catch it

veric's T6 tier propagates a sensitivity label with every column through the model graph and checks it against the policy declared at each sink. Given dim_customers.personal_phone : pii(consent=marketing_sms) and marketing_segment_export : sink(allow=pii(consent=marketing_sms)), the verifier traces every path along which the column reaches the sink and checks each path's predicate against the declared consent. In the PR diff, it would have flagged: "column personal_phone reaches marketing_segment_export via 3 branches; 2 branches enforce consent_marketing_sms = true, 1 branch does not — T6 information-flow VIOLATED, sensitivity policy unmet on path union_branch_3 → output."
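
The shape of that check can be illustrated with a toy model. This is not veric's implementation or API: Branch, SinkPolicy, and check_flow are hypothetical names, each union branch is reduced to the columns it selects and the predicates it enforces, and predicates are compared as literal strings rather than analysed semantically.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    columns: frozenset[str]     # columns this branch passes to the sink
    predicates: frozenset[str]  # filters this branch enforces


@dataclass
class SinkPolicy:
    sink: str
    required: dict[str, str]    # labelled column -> predicate every path must enforce


def check_flow(branches: list[Branch], policy: SinkPolicy) -> list[str]:
    """Report every path on which a labelled column reaches the sink unguarded."""
    findings = []
    for column, predicate in policy.required.items():
        for branch in branches:
            if column in branch.columns and predicate not in branch.predicates:
                findings.append(
                    f"{column} reaches {policy.sink} via {branch.name} "
                    f"without {predicate}"
                )
    return findings


consent = "consent_marketing_sms = true"
branches = [
    Branch("union_branch_1", frozenset({"personal_phone"}), frozenset({consent})),
    Branch("union_branch_2", frozenset({"personal_phone"}), frozenset({consent})),
    Branch("union_branch_3", frozenset({"personal_phone"}), frozenset()),
]
policy = SinkPolicy("marketing_segment_export", {"personal_phone": consent})

findings = check_flow(branches, policy)
for finding in findings:
    print(finding)
```

The point of the toy is the quantifier: the policy must hold on every path to the sink, so a single unguarded branch is enough to fail the check.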

This is the canonical near-miss veric is built for: not a postmortem about damage already done, but a PR check that turns a class of three-line oversight into a build-time failure.

Sources

Anonymised; pattern reflects the well-documented marketing-export PII-leak class. Public references include the FTC's 2023 enforcement actions on consent-gated data-sharing and the recurring discussion of consent-aware data exports in the dbt and Locally Optimistic data-engineering communities.