run.veric.dev
AI vertical · DP preview

EU AI Act Art. 53(1)(d) summary-completeness enforcement — anticipated, 2026+

Status: anticipated enforcement; no first action filed at time of writing · Indicative cost ceiling: EU AI Act Art. 101 max for GPAI providers, the higher of €15M or 3% of worldwide annual turnover · Time-to-detect: obligation applies from Aug 2, 2025 for newly placed GPAI models per Art. 113 (models already on the market have until Aug 2, 2027 per Art. 111(3)); first AI-Office referrals expected in 2026 · Root cause class: T8 (provenance-flow: training-data summary fails to enumerate corpus content)

What happened (anticipated framing)

This entry projects a near-certain enforcement action — what an Art. 53(1)(d) referral will look like — rather than a closed case. Treat it as a forecast grounded in the regulatory text and template, not as a record of an action that has occurred.

EU AI Act Art. 53(1)(d), applicable since August 2, 2025 to newly placed GPAI models (models already on the market must comply by Aug 2, 2027), requires general-purpose AI providers to publish a "sufficiently detailed summary about the content used for training" using a template the European Commission published on July 24, 2025. The template requires per-model disclosure of: data sources by category, the structure and modality of each, identifiable data sources or representative samples, types of personal data and lawful basis, treatment of opt-outs under DSM Directive Art. 4(3), and the methodology used to generate the summary.

What the template does not say, but what enforcement will turn on, is completeness. If a provider lists "Common Crawl, Wikipedia, licensed publishers, and other public-web sources," and a journalist or rights-holder demonstrates that a specific work was in the training corpus and is not covered by any of those categories, the summary is not sufficiently detailed. Either the category list is wrong, the methodology is wrong, or the corpus has been misrepresented. Each of the three is a failure to meet Art. 53(1)(d), and each exposes the provider to fines.

The first enforcement action under Art. 53(1)(d), whether opened by the Commission's AI Office, by a national DPA, or following a complaint lodged under Art. 85, is anticipated within 18 months of the obligation taking effect. The likely fact pattern: a journalist or rights-holder team selects 50–100 known works, prompts a deployed GPAI model in ways known to elicit memorised text, and compiles a list of works the model has clearly seen but the model card's Art. 53(1)(d) summary does not enumerate. The referral is filed; the AI Office requests methodology and sample-level disclosures; the provider updates the summary, retrains, or pays.
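The probing step in that fact pattern can be sketched as a verbatim-overlap score per work. This is a minimal illustration, assuming the model outputs have already been collected as plain strings; the function names and the 50-character threshold are hypothetical, and character-level overlap is only one possible memorisation signal.

```python
from difflib import SequenceMatcher


def longest_shared_run(reference: str, output: str) -> int:
    """Length of the longest verbatim substring shared by both texts."""
    # autojunk=False keeps the match exact on long inputs.
    matcher = SequenceMatcher(None, reference, output, autojunk=False)
    match = matcher.find_longest_match(0, len(reference), 0, len(output))
    return match.size


def flag_memorised(references: dict[str, str],
                   outputs: dict[str, str],
                   min_run: int = 50) -> list[str]:
    """Return titles whose output reproduces at least min_run characters."""
    flagged = []
    for title, reference in references.items():
        output = outputs.get(title, "")  # no output collected -> no match
        if longest_shared_run(reference, output) >= min_run:
            flagged.append(title)
    return sorted(flagged)
```

The flagged titles are the candidate list a referral would then compare against the published summary categories.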

The pattern

A model-card build emits a training-data summary derived from the corpus-assembly metadata. There is no compile-time contract stating that "every source set in the corpus is covered by at least one category in the model-card summary, and every category in the summary corresponds to at least one source set in the corpus." When the corpus contains shards or sources the assembler forgot to register against the summary categories, the summary is silently incomplete. The bug is not that a given summary is wrong; the bug is that there is no compile-time link between the corpus and the summary.

Any pipeline where the model card's Art. 53(1)(d) summary is generated by a separate process from the corpus assembly graph, with no verifier asserting their closure, has this exposure. That is essentially every pipeline today.

Which tier failed (anticipated)

T8 provenance-flow at the model-card stage. The summary categories should be a closure attestation: every source-set in the corpus is covered by at least one summary category, and every summary category corresponds to at least one corpus source-set. The contract is bidirectional. The compile-time refutation surface answers the question "could this summary be challenged by an external party who finds a single source in the model that is not categorised?" If the answer is "yes," ship is blocked.
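The bidirectional contract can be sketched as a check that returns both kinds of violation. This is an illustrative sketch, not the product's verifier: `ClosureReport`, `check_closure`, and the shape of the source-to-category mapping are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosureReport:
    uncovered_sources: frozenset[str]  # in corpus, mapped to no summary category
    empty_categories: frozenset[str]   # in summary, backed by no corpus source

    @property
    def ok(self) -> bool:
        return not self.uncovered_sources and not self.empty_categories


def check_closure(source_to_categories: dict[str, set[str]],
                  summary_categories: set[str]) -> ClosureReport:
    """Check both directions of the corpus/summary contract."""
    uncovered = set()
    used = set()
    for source, categories in source_to_categories.items():
        declared = categories & summary_categories
        if declared:
            used |= declared       # these categories have a real source
        else:
            uncovered.add(source)  # this source is invisible in the summary
    return ClosureReport(frozenset(uncovered),
                         frozenset(summary_categories - used))
```

A build gate would block release whenever `report.ok` is false, which is the "ship is blocked" outcome described above.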

T6-adjacent dependency: completeness of the summary depends on the underlying flow tags being assigned at corpus-assembly time. If license_class, data_modality, and source_attribution are not flow tags, the summary cannot enumerate them. Art. 53(1)(d) compliance presupposes a Tier-6 / Tier-8 substrate that almost no current foundation-model provider has built.
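One way to picture the dependency is a registry that refuses any source set lacking the three tags named above. This is a sketch under assumptions: `FlowTags`, `register_source`, and the example tag values are hypothetical, and only the field names license_class, data_modality, and source_attribution come from this entry.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FlowTags:
    license_class: Optional[str] = None
    data_modality: Optional[str] = None
    source_attribution: Optional[str] = None


def missing_tags(tags: FlowTags) -> list[str]:
    """Names of flow tags that are unset or empty, in declaration order."""
    return [f.name for f in fields(tags) if not getattr(tags, f.name)]


def register_source(registry: dict[str, FlowTags],
                    source_id: str, tags: FlowTags) -> None:
    """Admit a source set to the corpus only if every flow tag is assigned."""
    missing = missing_tags(tags)
    if missing:
        raise ValueError(f"{source_id}: missing flow tags {missing}")
    registry[source_id] = tags
```

Rejecting untagged sources at assembly time is what makes a later summary enumerable; a source admitted without tags cannot be categorised after the fact.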

What an AG-tower-driven control would have done

A contract summary.categories ⊇ corpus.source_sets ∧ corpus.source_sets ⊇ summary.categories (i.e., closure equality, with each source set compared via the category it maps to) refutes at model-card-build time when the assembly graph contains source sets that don't map to any summary category, or when the summary enumerates a category with no corpus source. The verifier surfaces: "summary categories {Common Crawl, Wikipedia, licensed publishers} are not a closure of corpus source sets {commoncrawl-2023-09, wikipedia-en-20240101, licenced/conde-nast-2023, …, scraped/anonymous-archive-018}; missing in summary: scraped/anonymous-archive-018. Art. 53(1)(d) summary build FAIL."
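A build-time verdict of this kind could be produced as follows. The sketch assumes categories are bound to source sets by glob patterns, which is a hypothetical mapping scheme and not one this entry specifies; `closure_verdict` and the patterns are illustrative, and only the four source sets named in the message are used.

```python
from fnmatch import fnmatchcase


def closure_verdict(category_patterns: dict[str, list[str]],
                    source_sets: list[str]) -> str:
    """Return a PASS/FAIL message for the corpus/summary closure check."""
    all_patterns = [p for ps in category_patterns.values() for p in ps]
    # Direction 1: source sets that no summary category claims.
    missing_in_summary = sorted(
        s for s in source_sets
        if not any(fnmatchcase(s, p) for p in all_patterns)
    )
    # Direction 2: summary categories that match no source set.
    missing_in_corpus = sorted(
        c for c, ps in category_patterns.items()
        if not any(fnmatchcase(s, p) for s in source_sets for p in ps)
    )
    if not missing_in_summary and not missing_in_corpus:
        return "Art. 53(1)(d) summary build PASS."
    parts = []
    if missing_in_summary:
        parts.append("missing in summary: " + ", ".join(missing_in_summary))
    if missing_in_corpus:
        parts.append("categories with no corpus source: "
                     + ", ".join(missing_in_corpus))
    return "; ".join(parts) + ". Art. 53(1)(d) summary build FAIL."
```

With patterns such as `commoncrawl-*`, `wikipedia-*`, and `licenced/*`, the source set scraped/anonymous-archive-018 matches nothing and the build fails on that one entry.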

This is identical in shape to the SQL-tier "did your dbt model summary enumerate every upstream source" check. Same primitive, AI-vertical tag glossary. The Annex IV pack ships with a SARIF artifact attesting closure equality; the AI Office reads the SARIF and either accepts it or asks one structured follow-up. The current state of the art is unstructured prose about the corpus; the substrate exists to make it auditable, the obligation already exists, and the first enforcement is the only thing missing.
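For orientation, a closure finding could be wrapped in a minimal SARIF 2.1.0 log along these lines. The tool name `closure-verifier`, the rule id `T8-CLOSURE`, and the `sourceSet` property are assumptions made for illustration; the actual artifact layout is not described in this entry.

```python
import json


def closure_sarif(uncovered_sources: list[str]) -> dict:
    """Build a minimal SARIF 2.1.0 log with one result per uncovered source."""
    results = [
        {
            "ruleId": "T8-CLOSURE",  # hypothetical rule id
            "level": "error",
            "message": {"text": f"source set '{s}' maps to no summary category"},
            "properties": {"sourceSet": s},  # free-form SARIF property bag
        }
        for s in sorted(uncovered_sources)
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {
                "name": "closure-verifier",  # hypothetical tool name
                "rules": [{
                    "id": "T8-CLOSURE",
                    "shortDescription": {
                        "text": "Summary categories must close over corpus source sets"
                    },
                }],
            }},
            "results": results,
        }],
    }


if __name__ == "__main__":
    print(json.dumps(closure_sarif(["scraped/anonymous-archive-018"]), indent=2))
```

An empty `results` list records that the check ran and found nothing, which is the form an attestation of closure equality would take in this sketch.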

See also

See the AI-provenance tag glossary
T8 · Lattice PII (closure under join) in the canonical glossary

Each refutation in this archive is a SARIF artifact a regulator could replay tomorrow — the same artifact format the SQL-vertical playground emits today, with the AI-provenance tag glossary swapped in.

These write-ups are journalism + product framing; they are not legal advice. Regulatory citations are best-effort references to public documents at time of writing. For anticipated cases, the entry labels the framing explicitly as anticipated rather than closed.