run.veric.dev
AI vertical · DP preview

Replika — Italian DPA enforcement, February 2, 2023

Cost: Italian processing ban from Feb 3, 2023 (formally lifted only after extensive remediation); €5M GPDP fine on Replika developer Luka Inc. (Apr 10, 2025); €5M further fine on the training of the Replika LLM (May 19, 2025) · Time-to-detect: months between user reports and GPDP order · Root cause class: T6 (information-flow reachability — minors' personal data → training corpus with no age-verification or lawful basis) compounding into T7-style erasure-completeness gap

What happened

On February 2, 2023, Italy's GPDP issued an immediate-effect order (Provv. n. 9852506) prohibiting Replika, an AI-companion app developed by Luka Inc., from processing the personal data of Italian users. The GPDP's findings were stark. Replika had no effective age-verification mechanism. It had no clear lawful basis for processing users' personal data, even though some users were under 18 and some under 13. And the chatbot was reported to produce sexually explicit content in conversations that included minors. The ban applied at once, and Luka was given 20 days to report the measures it had taken to comply, under threat of a penalty of up to €20M.

In April 2025 the GPDP fined Luka Inc. €5M for the original 2023 violations, confirming the lawful-basis, transparency, and minor-protection failures that had prompted the temporary ban. In May 2025 the GPDP extended its enforcement to the underlying language model itself, fining Luka a further €5M for failures at the training stage: no documented lawful basis for ingesting user-conversation data into model training, no per-conversation deletion pathway when users exercised their Art. 17 right to erasure, and no demonstrated compliance with GDPR Art. 5(1)(c) data minimisation in the corpus used to fine-tune the deployed model.

The two-fine pattern is doctrinally important: the GPDP separated deployment-stage violations (the 2023 finding: a chatbot deployed without age verification, a lawful basis, or adequate transparency) from training-stage violations (the 2025 finding: user data ingested into model fine-tuning without an independent lawful basis or an erasure pathway). This appears to be the architecture European DPAs are settling on: each stage of the pipeline must independently attest its own lawful basis, and the trained artifact must independently attest erasure-completeness.
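A minimal sketch of stage-scoped attestation, assuming each record carries a mapping from pipeline stage to the Art. 6 basis attested for that stage. The stage names `inference` and `fine_tuning`, the `Record` type, and the helpers `has_basis_for` and `admissible` are hypothetical illustrations, not a published AG-tower API. What it shows is the doctrinal point above: a basis attested for deployment is not inherited by training.

```python
from dataclasses import dataclass, field

# Hypothetical stage identifiers; the write-up distinguishes deployment
# (inference) from training (fine-tuning) but defines no names for them.
INFERENCE = "inference"
FINE_TUNING = "fine_tuning"


@dataclass
class Record:
    record_id: str
    # Stage -> Art. 6 basis attested for that stage. A basis attested for
    # one stage is deliberately NOT inherited by any other stage.
    lawful_basis: dict = field(default_factory=dict)


def has_basis_for(record: Record, stage: str) -> bool:
    """True only if this specific stage carries its own attestation."""
    return bool(record.lawful_basis.get(stage))


def admissible(records, stage):
    """Split records into (admitted records, refuted record IDs) for one stage."""
    admitted, refuted = [], []
    for record in records:
        if has_basis_for(record, stage):
            admitted.append(record)
        else:
            refuted.append(record.record_id)
    return admitted, refuted


# Usage: a chat record with only a deployment-stage basis.
chat = Record("r1", {INFERENCE: "contract"})
print(admissible([chat], INFERENCE)[1])    # []
print(admissible([chat], FINE_TUNING)[1])  # ['r1']
```

The same chat record is admitted at inference and refuted at fine-tuning until someone attests a separate training-stage basis for it.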

The pattern

A consumer surface (the Replika chat app) collected personal data from users, including unverified minors. That data flowed into both (a) the deployed inference path and (b) the fine-tuning corpus that updated the underlying model. There was no compile-time check that "every record entering the fine-tuning corpus carries a lawful_basis ∈ {…} attestation, and class:minor_user records are excluded entirely." Nor was there an erasure pathway: when a user deleted their account, the GPDP found no demonstrated process for removing their conversation contributions from the trained model checkpoints.

This is the same architectural shape as the Italian DPA's ChatGPT ban, with two aggravating factors: the data subjects included minors, and the data included chat content reported to be sexually explicit. Any pipeline in which end-user-generated content flows into model fine-tuning without a per-record age check, a lawful-basis attestation, and an erasure pathway carries the same exposure.

Which tier failed

T6 (information-flow reachability) failed at fine-tuning corpus assembly. The tag class:minor_user should have been a flow tag that the corpus assembler refused to admit, and the lawful_basis attestation required on every class:eu_personal_data record was missing across the entire ingestion path. The GPDP's 2025 fine isolates exactly this: training-stage processing requires its own lawful basis, separate from deployment-stage processing.
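The T6 check can be sketched as tag propagation to a fixpoint over a description of the assembly graph. This assumes the graph is available as an adjacency map, that source nodes declare the flow tags they introduce, and that filter nodes (such as a hypothetical `age_gate`) declare which tags they strip. All node names, and the tag-level model of filtering itself, are illustrative assumptions. Because the check runs on the graph description rather than on data, the refutation exists before any record is ingested.

```python
from collections import deque


def reaching_tags(edges, source_tags, strips=None):
    """Propagate flow tags along assembly-graph edges until nothing changes.

    edges:       node -> iterable of downstream nodes
    source_tags: node -> set of tags introduced at that node
    strips:      node -> set of tags a filter node removes from its output
                 (hypothetical, coarse model of e.g. an age-verification gate)
    Returns node -> set of tags that can reach that node.
    """
    strips = strips or {}
    tags = {node: set(t) for node, t in source_tags.items()}
    queue = deque(tags)
    while queue:
        node = queue.popleft()
        outgoing = tags.get(node, set()) - strips.get(node, set())
        for nxt in edges.get(node, ()):
            seen = tags.setdefault(nxt, set())
            new = outgoing - seen
            if new:
                seen |= new  # tag sets only grow, so the loop terminates
                queue.append(nxt)
    return tags


def refute(tags, sink, forbidden):
    """Forbidden tags that reach the sink; an empty list means the contract holds."""
    return sorted(tags.get(sink, set()) & forbidden)


SINK = "corpus.fine_tuning_inputs"
SOURCES = {"signup": {"class:minor_user", "class:eu_personal_data"}}

# Ungated: signup -> chat_logs -> corpus
ungated = {"signup": ["chat_logs"], "chat_logs": [SINK]}
print(refute(reaching_tags(ungated, SOURCES), SINK, {"class:minor_user"}))
# ['class:minor_user']

# Gated: an age gate strips class:minor_user before chat logs reach the corpus
gated = {"signup": ["age_gate"], "age_gate": ["chat_logs"], "chat_logs": [SINK]}
print(refute(reaching_tags(gated, SOURCES, {"age_gate": {"class:minor_user"}}),
             SINK, {"class:minor_user"}))
# []
```

Note that class:eu_personal_data still reaches the corpus in the gated graph; that is correct, and it is what the separate lawful-basis contract must then cover.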

T7-adjacent failure: when an EU minor (or their parent) exercises Art. 17, the controller must demonstrate erasure from both the operational database and every trained-model checkpoint that incorporated the user's data. Luka could not demonstrate the second.
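Demonstrating erasure needs a lineage index from data subject to contributed records, plus a manifest of which records each checkpoint was trained on. The sketch below assumes both exist as plain sets (a hypothetical layout, as are the function names) and reports where a subject's records still survive. It detects gaps; it does not remove anything from model weights. A missing lineage entry is itself reported as a failure, because erasure then cannot be demonstrated at all.

```python
def erasure_gaps(subject_id, lineage, operational_db, checkpoints):
    """Report where a data subject's records still survive after an Art. 17 request.

    lineage:        subject_id -> set of record_ids that subject contributed
    operational_db: set of record_ids still held in the operational store
    checkpoints:    checkpoint_id -> set of record_ids in its training corpus
    (All three structures are hypothetical stand-ins for real indexes.)
    """
    records = lineage.get(subject_id)
    if records is None:
        # No lineage entry: erasure cannot be demonstrated at all.
        return {"no_lineage": True, "db": [], "checkpoints": []}
    return {
        "no_lineage": False,
        "db": sorted(records & operational_db),
        "checkpoints": sorted(
            ckpt for ckpt, used in checkpoints.items() if records & used
        ),
    }


def is_complete(gaps):
    """Erasure is demonstrated only if lineage exists and nothing survives."""
    return not gaps["no_lineage"] and not gaps["db"] and not gaps["checkpoints"]


lineage = {"u1": {"r1", "r2"}}
db_after_delete = {"r3"}  # u1's rows are already gone from the operational store
checkpoints = {"ckpt-a": {"r1", "r3"}, "ckpt-b": {"r3"}}
gaps = erasure_gaps("u1", lineage, db_after_delete, checkpoints)
print(gaps)               # {'no_lineage': False, 'db': [], 'checkpoints': ['ckpt-a']}
print(is_complete(gaps))  # False
```

The database deletion in the example looks complete on its own, but ckpt-a still incorporates r1: the kind of residual that an account-deletion flow alone never surfaces.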

What an AG-tower-driven control would have done

Three contracts at the corpus-assembly stage:

First: class:minor_user ∉ corpus.fine_tuning_inputs. Refutes at compile time when the assembly graph admits a record sourced from a user whose age-verification flag is unverified or under_18.

Second: every_record where data_subject_jurisdiction = EU → has_tag(lawful_basis). Refutes when the corpus admits any EU record without an Art. 6 attestation.

Third: for_each(data_subject) ∃ erasure_pathway → corpus_record_set. Refutes at model-card build time when the artifact has no defined Art. 17 implementation.
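The three contracts can be sketched as checks over a corpus manifest that return refutations instead of silently filtering records out. The `CorpusRecord` fields, the age-verification values, the rule IDs, and the representation of erasure pathways as a set of covered subject IDs are all hypothetical simplifications; a real pathway would map each subject to the concrete record set it can erase.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical age-verification states; only "verified_adult" is admissible.
MINOR_OR_UNVERIFIED = {"unverified", "under_18"}


@dataclass
class CorpusRecord:
    record_id: str
    subject_id: str
    jurisdiction: str                   # e.g. "EU", "US"
    age_verification: str               # "verified_adult", "unverified", "under_18"
    lawful_basis: Optional[str] = None  # training-stage Art. 6 attestation, if any


def check_contracts(corpus, erasure_pathways):
    """Evaluate the three corpus contracts.

    erasure_pathways: set of subject_ids with a defined Art. 17 pathway
    (a simplification of subject -> erasable record set).
    Returns a list of (rule_id, target, message) refutations; empty means all hold.
    """
    refutations = []
    for r in corpus:
        # Contract 1: no record from a minor or unverified user.
        if r.age_verification in MINOR_OR_UNVERIFIED:
            refutations.append(("no-minor-user", r.record_id,
                                f"age_verification={r.age_verification}"))
        # Contract 2: every EU record carries a lawful-basis attestation.
        if r.jurisdiction == "EU" and not r.lawful_basis:
            refutations.append(("eu-lawful-basis", r.record_id,
                                "EU record without Art. 6 attestation"))
    # Contract 3: every contributing data subject has an erasure pathway.
    for subject in sorted({r.subject_id for r in corpus}):
        if subject not in erasure_pathways:
            refutations.append(("erasure-pathway", subject,
                                "no defined Art. 17 pathway to corpus records"))
    return refutations


corpus = [
    CorpusRecord("r1", "u1", "EU", "verified_adult", "consent"),
    CorpusRecord("r2", "u2", "EU", "unverified"),
    CorpusRecord("r3", "u3", "US", "verified_adult"),
]
for refutation in check_contracts(corpus, erasure_pathways={"u1", "u3"}):
    print(refutation)
```

Returning refutations rather than dropping offending records keeps the failure visible at build time, which is what makes each contract refutable rather than merely enforced.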

None of these require solving "machine unlearning." They require honesty about which classes the assembly graph admits and which the model card attests are recoverable. The GPDP's 2025 ruling is the doctrinal endpoint: training-stage and deployment-stage need separate, refutable contracts. AG-tower's job is to produce them.

See also


See the AI-provenance tag glossary
T6 · Information-flow reachability in the canonical glossary

Each refutation in this archive is a SARIF artifact a regulator could replay tomorrow — the same artifact format the SQL-vertical playground emits today, with the AI-provenance tag glossary swapped in.
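For a sense of what such an artifact might contain, here is a minimal sketch that wraps refutations as SARIF 2.1.0 results using only standard SARIF fields (`version`, `runs`, `tool.driver`, `rules`, `results`, `ruleId`, `level`, `message`, and logical locations). The tool name `ag-tower-sketch`, the rule IDs, and the `(rule_id, target, message)` tuple shape are assumptions for illustration, not the playground's actual output.

```python
import json

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def to_sarif(refutations, tool_name="ag-tower-sketch"):
    """Wrap (rule_id, target, message) refutations in a minimal SARIF 2.1.0 log."""
    rule_ids = sorted({rule_id for rule_id, _, _ in refutations})
    results = [
        {
            "ruleId": rule_id,
            "level": "error",
            "message": {"text": message},
            # Logical (not file/line) location naming the refuted record or subject.
            "locations": [{"logicalLocations": [{"fullyQualifiedName": target}]}],
        }
        for rule_id, target, message in refutations
    ]
    return {
        "version": "2.1.0",
        "$schema": SARIF_SCHEMA,
        "runs": [{
            "tool": {"driver": {"name": tool_name,
                                "rules": [{"id": rid} for rid in rule_ids]}},
            "results": results,
        }],
    }


log = to_sarif([("eu-lawful-basis", "r2", "EU record without Art. 6 attestation")])
print(json.dumps(log, indent=2))
```

Logical locations are used because a corpus refutation points at a record or data subject rather than a line of source code.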

These write-ups are journalism + product framing; they are not legal advice. Regulatory citations are best-effort references to public documents at time of writing. For anticipated cases, the entry labels the framing explicitly as anticipated rather than closed.