run.veric.dev

Travel-ops booking summaries drift by one hour — Summer 2024

Cost: ~6 weeks of booking summaries off by one hour for half the year; daily revenue reports re-issued for the full DST window · Time-to-detect: ~42 days (May DST shift to first analyst escalation in late June) · Root cause class: T7 (range / interval inference on a timestamp join key)

What happened

A travel-operations platform serving short-term rentals ran a nightly dbt job that built a daily_bookings_summary table for ops dashboards and partner payouts. In a refactor early in 2024, an engineer normalised the bookings.created_at column from raw UTC to the property's local time inside the stg_bookings model — a sensible-looking change, since most downstream consumers wanted local-day buckets. The join with dim_properties.timezone happened in the same model and looked correct.

The problem was that a separate model, int_payment_events, had been written months earlier and continued to use UTC created_at as its join key against bookings.id. The two were combined in fct_booking_revenue via a join on (booking_id, created_at_day). After the refactor, half the year — northern-hemisphere DST — pushed the local-time created_at_day over a midnight boundary that the UTC payment events never crossed. Roughly 4-7% of bookings on any given day landed in the wrong daily bucket, and a small but persistent fraction of payouts went to the wrong reporting day.

The pattern

Two models that joined on a timestamp-derived key disagreed about that key's interval semantics. One model had silently retimed the key; the other had not. The schema was unchanged — created_at_day was still a DATE in both — so every existing schema check, type check, and dbt test passed. The drift was a property of the interval the column denoted, not of its declared type.

Any pipeline where a timestamp-derived join key is recomputed in one model but not its sibling, with no enforced agreement on the interval the key represents, has this exposure: timezone normalisations, calendar-week vs ISO-week derivations, fiscal-year offsets, "as-of" backdating in slowly-changing dimensions.

How veric would catch it

veric's T7 tier infers an interval annotation on every column in the model graph — for timestamps, the interval includes the timezone the value is denominated in. When fct_booking_revenue joins two columns with disagreeing inferred timezones (UTC vs Europe/London after DST), the verifier flags in the PR diff: "join key created_at_day has interval tz=local on stg_bookings and tz=UTC on int_payment_events; intervals are not unifiable — T7 range/interval VIOLATED, declare a coercion or align the source."

The same primitive catches the broader family: any join where one side has been silently retimed or rebucketed against the other.

Try it: open the example below and watch the verdict change as you toggle the offending pattern on and off.

See also

Sources

  • Anonymised; pattern composited from public dbt-community incident reports and the well-documented general DST-join failure mode (e.g. dbt Discourse, "timezone-aware joins" threads, 2022-2024).
  • General reference on timestamp-interval semantics: Snodgrass, Developing Time-Oriented Database Applications in SQL (Morgan Kaufmann, 1999).
Reproduce in playgroundT7 · Range / interval in the glossary

Opens the playground pre-loaded with a model that exhibits this pattern. Toggle the offending lines to watch the verdict change.