run.veric.dev
AI vertical · DP preview

Italian DPA temporary ChatGPT ban — March 30, 2023

Cost: ~one-month service blackout in Italy; €15M GPDP fine on OpenAI announced Dec 20, 2024 · Time-to-detect: triggered by a Mar 20, 2023 ChatGPT data-leak incident exposing other users' chat titles and partial payment data · Root cause class: T6 (information-flow reachability: EU personal data → training corpus with no documented lawful basis) compounded by a T7-style erasure-completeness gap (no Art. 17 workflow)

What happened

On March 30, 2023 Italy's data-protection authority (Garante per la protezione dei dati personali, GPDP) ordered OpenAI to immediately stop processing the personal data of Italian data subjects in ChatGPT. The order followed a March 20, 2023 incident in which a Redis-client bug caused some logged-in users to see other users' chat titles and partial payment information. The GPDP framed that incident as the visible tip of a deeper compliance gap: (a) no lawful basis under GDPR Art. 6 had been documented for the mass ingestion of personal data scraped from the web into the training corpus; (b) data subjects had no meaningful Art. 13/14 transparency notice; (c) ChatGPT exposed minors to age-inappropriate content with no age-gate; (d) outputs frequently contained inaccurate personal data with no Art. 16 rectification workflow.

OpenAI took ChatGPT offline for Italian IPs the same day. Over the following four weeks OpenAI added a privacy notice to the registration flow, an opt-out page allowing users (including non-customers) to object to use of their data for training, an age-attestation prompt, and a workflow for Art. 16/17 requests. The service resumed in Italy on April 28, 2023.

In December 2024 the GPDP fined OpenAI €15 million for the underlying violations, concluding the investigation that the original ban had paused, and ordered a six-month public-information campaign about ChatGPT's data-processing practices. The decision is the first major EU enforcement action against a foundation-model provider for training-data lawful-basis failures, and other European DPAs are citing it as an enforcement template.

The pattern

A training corpus was assembled by web scraping, with no documented lawful basis under GDPR Art. 6 for the personal data it contained. There was no tagging step that would let downstream operators attest "every personal-data record in this corpus carries a lawful_basis ∈ {consent, legitimate_interest_documented, …} annotation." There was no deletion pathway that would let the controller execute an Art. 17 erasure request without retraining the entire model.

Any pipeline where a personal-data class flows into a training corpus without a per-record lawful-basis attestation, and where the trained artifact has no defined erasure pathway, fails the GDPR Art. 5 → Art. 17 chain. The Italian GPDP enforcement is the canonical reading of how that chain applies to LLM training.

Which tier failed

T6 information-flow at training-corpus assembly: no eu_personal_data → lawful_basis flow tag, so the assembly step had no contract to fail against. The visible Mar 20 incident exposed deployed-model data, but the GPDP's main finding was about the training-corpus lawful-basis gap — the deeper bug was that nobody could even enumerate, at assembly time, which records were in scope for which lawful basis.
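To make "enumerate, at assembly time, which records were in scope for which lawful basis" concrete, here is a minimal Python sketch. The `CorpusRecord` shape, its field names, and the `"UNTAGGED"` bucket are assumptions made for illustration; nothing here reflects OpenAI's actual pipeline or any real schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shape; field names are illustrative only.
@dataclass(frozen=True)
class CorpusRecord:
    record_id: str
    jurisdiction: str               # e.g. "EU", "US"
    contains_personal_data: bool
    lawful_basis: Optional[str]     # e.g. "consent"; None means no tag at all


def enumerate_scope(records):
    """Group EU personal-data record IDs by their lawful-basis tag.

    Records with no tag land in an explicit "UNTAGGED" bucket, so the gap
    is visible instead of silently absent.
    """
    scope = defaultdict(list)
    for r in records:
        if r.jurisdiction == "EU" and r.contains_personal_data:
            scope[r.lawful_basis or "UNTAGGED"].append(r.record_id)
    return dict(scope)


corpus = [
    CorpusRecord("r1", "EU", True, "consent"),
    CorpusRecord("r2", "EU", True, None),    # in scope, no lawful basis
    CorpusRecord("r3", "US", True, None),    # out of scope for this check
    CorpusRecord("r4", "EU", False, None),   # no personal data
    CorpusRecord("r5", "EU", True, "legitimate_interest_documented"),
]

scope = enumerate_scope(corpus)
for basis, ids in scope.items():
    print(basis, ids)
```

The point of the sketch is only that the enumeration is cheap once the tag exists; the hard part in the case above was that the tag did not exist.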

T7-adjacent gap: when an EU data subject exercises Art. 17, the controller must demonstrate erasure. For training data baked into model weights, the only way to demonstrate completeness is to retrain. The absence of a structured erasure-completeness contract is what makes Art. 17 in the foundation-model context an open regulatory question (see also OpenAI memory + GDPR Art. 17 conflict).
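One way to picture a structured erasure-completeness record is a lineage lookup: given the records that belong to a data subject, list every artifact they reached and the erasure pathway that artifact declares. The sketch below is an assumption-laden illustration; the `Artifact` class, the pathway labels, and the `"NO_DECLARED_PATH"` marker are hypothetical names, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Optional, Set


# Hypothetical lineage entry: which source records fed which artifact.
@dataclass
class Artifact:
    name: str
    source_record_ids: Set[str]
    erasure_path: Optional[str] = None   # declared pathway, or None if absent


def erasure_report(subject_record_ids, artifacts):
    """Map each artifact reached by the subject's records to its declared
    erasure pathway, flagging artifacts that declare none."""
    report = {}
    for artifact in artifacts:
        if artifact.source_record_ids & subject_record_ids:
            report[artifact.name] = artifact.erasure_path or "NO_DECLARED_PATH"
    return report


artifacts = [
    Artifact("raw_corpus_v1", {"r1", "r2", "r5"}, "delete_rows"),
    Artifact("model_weights_v1", {"r1", "r2", "r5"}, None),
    Artifact("eval_set_v1", {"r9"}, "delete_rows"),
]

# An Art. 17 request from the data subject behind record "r2".
report = erasure_report({"r2"}, artifacts)
print(report)
```

The report does not erase anything. It only makes explicit which artifacts an Art. 17 request touches and where the controller has no declared answer.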

What an AG-tower-driven control would have done

Two contracts, both refutable at compile time. First: every_record_in(corpus) where data_subject_jurisdiction = EU → has_tag(lawful_basis) — the training-pipeline build refuses to proceed if any EU-scoped record carries no Art. 6 attestation. Second: for_every(data_subject) ∃ erasure_path(data_subject) → model_artifact.weights — the model-card build refuses to proceed if there is no declared erasure pathway. Neither contract requires solving the "unlearning" problem; they just require honesty about what the artifact attests. A compile-time refutation is not a fine; it is a build break, weeks before the GPDP shows up.
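The two contracts can be sketched as a build gate. This is a minimal Python illustration under stated assumptions, not the AG-tower implementation: the dictionary keys, the `ContractViolation` exception, and the `gate` function are hypothetical, and a runtime check stands in here for what the text describes as a compile-time refutation.

```python
class ContractViolation(Exception):
    """Raised to break the build when a contract is refuted."""


def untagged_eu_records(records):
    """Contract 1: every EU-scoped record must carry a lawful_basis tag."""
    return [
        r["id"]
        for r in records
        if r["data_subject_jurisdiction"] == "EU" and not r.get("lawful_basis")
    ]


def gate(records, model_card):
    """Run both contracts; raise if either is refuted, else return True."""
    failures = []

    missing = untagged_eu_records(records)
    if missing:
        failures.append(f"lawful_basis missing on EU records: {missing}")

    # Contract 2: the model card must declare some erasure pathway.
    # The gate checks that a pathway is declared, not that unlearning works.
    if not model_card.get("erasure_path"):
        failures.append("model card declares no erasure pathway")

    if failures:
        raise ContractViolation("; ".join(failures))
    return True


records = [
    {"id": "r1", "data_subject_jurisdiction": "EU", "lawful_basis": "consent"},
    {"id": "r2", "data_subject_jurisdiction": "EU", "lawful_basis": None},
    {"id": "r3", "data_subject_jurisdiction": "US", "lawful_basis": None},
]

try:
    gate(records, model_card={})
except ContractViolation as exc:
    print(f"BUILD BROKEN: {exc}")
```

Collecting every failure before raising is a deliberate choice in this sketch: a single build break then reports both gaps at once.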

See also

Sources

See the AI-provenance tag glossary
T6 · Information-flow reachability in the canonical glossary

Each refutation in this archive is a SARIF artifact a regulator could replay tomorrow — the same artifact format the SQL-vertical playground emits today, with the AI-provenance tag glossary swapped in.
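For readers unfamiliar with SARIF, the sketch below wraps one refutation in a minimal SARIF 2.1.0 log using only the Python standard library. The tool name, the rule ID `T6-lawful-basis`, and the `recordIds` property are hypothetical placeholders; the archive's real artifacts may be structured differently.

```python
import json


def to_sarif(rule_id, message, record_ids):
    """Wrap one contract refutation in a minimal SARIF 2.1.0 log."""
    return {
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": "example-contract-checker",  # hypothetical tool name
                        "rules": [{"id": rule_id}],
                    }
                },
                "results": [
                    {
                        "ruleId": rule_id,
                        "level": "error",
                        "message": {"text": message},
                        # SARIF property bag; the key name is an assumption.
                        "properties": {"recordIds": record_ids},
                    }
                ],
            }
        ],
    }


log = to_sarif(
    "T6-lawful-basis",
    "EU-scoped record carries no lawful_basis tag",
    ["r2"],
)
print(json.dumps(log, indent=2))
```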

These write-ups are journalism + product framing; they are not legal advice. Regulatory citations are best-effort references to public documents at time of writing. For anticipated cases, the entry labels the framing explicitly as anticipated rather than closed.