privacy architecture · sketch · Q3 2027

Peer insights without ever seeing your code

This page is the public draft of how veric plans to compute cross-codebase property recommendations under an opt-in, zero-knowledge regime. It is a design sketch, not a shipped architecture; nothing on this page is implemented in the currently deployed product. We are publishing it early so customers can review the proposed contract before any computation over cross-tenant data takes place.

What problem are we solving?

A codebase that has been verified for, say, twenty properties is a more valuable asset to its owner if the owner can see what comparable codebases are verifying that theirs is not. The recommendation surface (the constellation) is a network-effect product: the more codebases on veric, the better the recommendation. But a recommendation engine that requires its provider to observe customer code — even in aggregate — is a non-starter for the customers most likely to derive value from veric. They are exactly the customers whose code is the asset that must not leak.

The architecture below is our attempt to resolve that tension: every recommendation is computed against a cryptographic commitment of which properties a given codebase verifies, never the property contents, the verification artifacts, or anything observable from inside the customer's deployment.

What crosses a customer boundary, what never does

Crosses: a Merkle commitment, signed by the customer's veric deployment, of the multiset of property identities that the deployment has successfully verified in the current release. The identity is a p_… token drawn from veric's public catalogue; its preimage is not the property body, source code, or a hash of either.
Never crosses: source code, AST nodes, lattice states, decoration tables, verification timelines, file paths, identifier names, type bodies, function signatures, runtime values, schema fragments, or anything derivable from any of the above. The customer's code never leaves the customer's environment.
Customer-controlled: the property identity catalogue is opt-in property-by-property. A deployment may publish the identity p_money_balanced while withholding p_money_kyc_required if the latter would leak that the customer operates in a regulated jurisdiction.
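
As a minimal sketch of the customer-controlled step, the filter below keeps only the verified identities the customer has opted in to publish, before anything is committed. The function name publishable_identities and the opt-in set are illustrative assumptions, not part of any veric API.

```python
from collections import Counter

def publishable_identities(verified, opted_in):
    """Return the multiset of verified identities the customer agreed to publish.

    verified: iterable of property identity tokens (may contain repeats).
    opted_in: set of tokens the customer has explicitly opted in to publish.
    """
    counts = Counter(verified)  # multiset of verified identities
    return Counter({token: n for token, n in counts.items() if token in opted_in})

# The two tokens are the ones named in the text above.
verified = ["p_money_balanced", "p_money_balanced", "p_money_kyc_required"]
opted_in = {"p_money_balanced"}
published = publishable_identities(verified, opted_in)
```

Only the filtered multiset would feed the commitment, so a withheld identity such as p_money_kyc_required never influences anything that crosses the boundary.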

ZKP-style summaries — the construction

Each customer deployment maintains a Merkle tree whose leaves are the property identity tokens it has verified. The root is signed by an HSM-anchored deployment key. The customer publishes the root commitment plus a zero-knowledge set-membership argument (Plonk over BN254, or a Bulletproofs-based variant during the prototype phase) showing that a queried token p_x is present in the tree, without revealing the rest of the tree.
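
To make the commitment concrete, here is a sketch of a Merkle root over identity tokens with a plain inclusion proof. This is deliberately not the zero-knowledge argument described above: a plain proof reveals sibling hashes, which is exactly what the Plonk or Bulletproofs layer would hide. SHA-256, the leaf and node prefixes, the sorted leaf order, the duplicate-last-node padding, and all function names are assumptions made for illustration; HSM signing of the root is omitted.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _leaf(token: str) -> bytes:
    # Distinct prefixes keep leaf hashes and interior hashes in separate domains.
    return _h(b"\x00" + token.encode("utf-8"))

def _node(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def build_levels(tokens):
    """Return every tree level, leaves first; tokens are sorted for a deterministic root."""
    level = [_leaf(t) for t in sorted(tokens)]
    if not level:
        raise ValueError("at least one token is required")
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # assumed padding rule: duplicate the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(tokens) -> bytes:
    return build_levels(tokens)[-1][0]

def prove(tokens, token):
    """Return the inclusion path as (sibling_hash, node_is_left_child) pairs."""
    index = sorted(tokens).index(token)  # raises ValueError if token is absent
    path = []
    for level in build_levels(tokens)[:-1]:
        path.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return path

def verify(root: bytes, token: str, path) -> bool:
    acc = _leaf(token)
    for sibling, is_left in path:
        acc = _node(acc, sibling) if is_left else _node(sibling, acc)
    return acc == root
```

A verifier holding only the root can check the claim with verify(root, token, path). In the proposed design that check would run inside a zero-knowledge circuit, so the path itself would never be disclosed.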

Recommendation queries are then computed by a coordinator that holds no customer-specific keys and never sees a preimage. The coordinator can answer:

How many distinct deployments have verified property p_x? (cardinality over committed memberships.)
Of the deployments tagged fintech at onboarding, how many verify p_x? (cardinality stratified by a self-reported industry tag, the only non-derivable field that crosses the boundary, and only at the tenant's explicit opt-in.)
Which property identities are most over-represented in the deployments closest to mine on a public, low-rank code-shape sketch? (locality-sensitive hashing over a public structural fingerprint that the customer chooses the inputs to.)
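
The third query depends on a locality-sensitive hash of a structural fingerprint. The draft does not name a scheme, so the sketch below uses MinHash as an assumed stand-in: two deployments whose customer-chosen feature sets overlap heavily will agree on many signature slots. The feature strings, the signature length of 64, and the function names are hypothetical.

```python
import hashlib

def minhash_signature(features, num_hashes=64):
    """Compute a MinHash signature over a set of customer-chosen feature strings."""
    features = set(features)
    if not features:
        raise ValueError("at least one feature is required")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # one salted hash function per slot
        signature.append(min(
            int.from_bytes(hashlib.sha256(salt + f.encode("utf-8")).digest()[:8], "big")
            for f in features
        ))
    return signature

def estimated_similarity(sig_a, sig_b):
    """Fraction of matching slots, an estimate of Jaccard similarity of the inputs."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical coarse structural features chosen by the customer.
mine = minhash_signature({"layers:3", "services:12", "lang:rust"})
peer = minhash_signature({"layers:3", "services:12", "lang:go"})
score = estimated_similarity(mine, peer)
```

Because the customer picks the inputs and computes the signature locally, only the signature would need to be shared for the closeness comparison.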

Crucially, the coordinator never learns deployment-to-deployment edges in the clear. Aggregates below a configurable threshold (default k=10) are suppressed; the constellation never highlights a cluster that has fewer than k distinct deployments.
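
The suppression rule can be sketched as a filter over per-property counts. The sketch assumes the coordinator can count distinct deployments through opaque handles (here plain strings), which is a simplification of counting over committed memberships; the function name and input shape are illustrative.

```python
from collections import defaultdict

K_DEFAULT = 10  # default threshold stated in the text above

def suppressed_counts(memberships, k=K_DEFAULT):
    """Count distinct deployments per property and drop any count below k.

    memberships: iterable of (opaque_deployment_handle, property_token) pairs.
    """
    deployments_by_token = defaultdict(set)
    for handle, token in memberships:
        deployments_by_token[token].add(handle)  # sets make the count distinct
    return {
        token: len(handles)
        for token, handles in deployments_by_token.items()
        if len(handles) >= k  # clusters with fewer than k deployments are suppressed
    }
```

For example, a property verified by 12 deployments is reported with its count, while one verified by 3 is omitted entirely rather than reported as a small number.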

What recommendations look like in practice

When a customer hovers a property star in the constellation, the surfaced popover ("used by 47 codebases, 14 in fintech, 3 with shape similar to yours") is computed by the coordinator from membership counts plus the customer's own structural fingerprint hashed locally. The customer's deployment verifies the coordinator's response by re-running the small set-membership argument against its own committed tree before rendering. No round-trip ever asks the coordinator to reveal which other tenant verified what.

Threat model and limitations

We assume the coordinator is honest-but-curious; we do not assume it is fully trusted. A malicious coordinator could attempt traffic-analysis attacks on query timing or query shape. Mitigations under design include constant-rate dummy queries, fixed-size response padding, and per-tenant query budgets. We do not yet have a deployable mitigation for a fully malicious coordinator that colludes with another tenant; closing that gap is a named milestone for the Q3 2027 review.
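
Of the mitigations listed, fixed-size response padding is the simplest to illustrate. The sketch below assumes a JSON payload, a 4-byte length prefix, and a 1024-byte envelope; all three are hypothetical choices, and the function names are not from any veric interface.

```python
import json

RESPONSE_SIZE = 1024  # hypothetical fixed envelope size in bytes

def pad_response(payload: dict, size: int = RESPONSE_SIZE) -> bytes:
    """Encode payload so every response has exactly `size` bytes on the wire."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if len(body) + 4 > size:
        raise ValueError("payload exceeds the fixed response size")
    padding = b"\x00" * (size - 4 - len(body))
    return len(body).to_bytes(4, "big") + body + padding

def unpad_response(blob: bytes) -> dict:
    """Recover the payload using the 4-byte big-endian length prefix."""
    length = int.from_bytes(blob[:4], "big")
    return json.loads(blob[4:4 + length].decode("utf-8"))
```

With this envelope, a response carrying one count and a response carrying several are the same length, so an observer of response sizes learns nothing about the shape of the answer. It does nothing about timing, which is why constant-rate dummy queries are listed separately.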

Side-channel leakage from the customer's self-reported industry tag is a known limitation. A customer who self-reports healthcare-EU at onboarding has, by definition, leaked that industry tag. This is why industry-stratified counts are an explicit opt-in, and why the constellation always renders the customer's own codebase locally without involving the coordinator at all.

Timeline

Q1 2027: external privacy review begins (third-party academic + one independent applied-cryptography firm).
Q2 2027: prototype the membership argument against a private test cohort of 3-5 design partners.
Q3 2027: targeted GA, opt-in only, with the published threat model and external review report posted alongside.

None of the above is a committed roadmap. We are publishing this draft so that interested customers, researchers, and counterparties can push back on it before any line of production code lands. Comments to privacy@veric.dev.
