Samsung internal-data leak via ChatGPT — April 2023

Cost: undisclosed; three confirmed leak events including ~6,000-line semiconductor source code · Time-to-detect: within 20 days of internal ChatGPT use being permitted; full company-wide ban followed in May 2023 · Root cause class: T6 (information-flow reachability — "internal/confidential" tag escapes to external sink) compounded by T7-style erasure-completeness gap (no path to remove submitted data from the third-party model)

What happened

In late March 2023, Samsung Electronics' Device Solutions (semiconductor) division relaxed its policy on ChatGPT use to evaluate it as an engineering productivity tool. Within roughly 20 days, three separate confidential-data leak events were attributed to employee prompts submitted to ChatGPT. Per reporting in The Economist Korea (later summarized by Bloomberg and Forbes), the events included:

An engineer pasted a section of proprietary semiconductor measurement-database source code into ChatGPT to ask for a bug fix.
A second engineer submitted equipment yield/sensor data and asked the model to optimise the process.
A third employee submitted a transcript of a recorded internal meeting and asked the model to produce minutes.

Each prompt was, under OpenAI's then-current terms for the consumer ChatGPT service, eligible for retention and use in model improvement. Samsung's internal review concluded the data was effectively unrecoverable: once submitted, there was no contractual or technical path that would let Samsung force OpenAI to enumerate every model checkpoint that had seen the prompts and either redact or retrain. On May 1, 2023, Samsung issued a company-wide ban on generative-AI tools on company-owned devices and internal networks.

The incident is now the canonical "private-data exfiltration via prompt" case study and is referenced by NIST AI 600-1 (Jul 2024) under the "Data Privacy" risk category as an example of why deployers must monitor third-party AI use.

The pattern

A confidentiality classification (internal-only, trade-secret) was attached to source code and operational data on Samsung-owned systems. That classification did not propagate to the request that submitted the data to a third-party model. There was no compile-time or runtime check enforcing "data tagged internal-only cannot reach a sink owned by an external party." The model vendor's data-use default did the rest. At the time of the incident, consumer ChatGPT retained conversations and could use them for training by default; OpenAI had switched the API to no-training-by-default in March 2023, but the consumer surface only gained an equivalent control later. As a result, the internal data became eligible for inclusion in someone else's training corpus.

Any system where classified data is reachable from an interactive surface that submits prompts to an external model API has this exposure. The pattern is identical in shape to the Equifax T6 case from the existing incident archive — a regulated/sensitive class reaches a public sink — but the sink is now "external model vendor" instead of "public-internet HTTP handler."
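
The missing runtime check can be illustrated with a small sketch. It assumes a hypothetical Labeled wrapper that carries a classification alongside the text, and a hypothetical allowlist of deployer-operated hosts (INTERNAL_HOSTS); none of these names come from Samsung's or OpenAI's systems. The point is only that the label travels with the data, survives concatenation into a prompt, and is consulted at the egress boundary.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PUBLIC = 0
    INTERNAL_ONLY = 1
    TRADE_SECRET = 2


@dataclass(frozen=True)
class Labeled:
    """Text that carries its confidentiality label with it (hypothetical type)."""
    text: str
    label: Classification

    def __add__(self, other: "Labeled") -> "Labeled":
        # Concatenation keeps the more restrictive of the two labels.
        label = max(self.label, other.label, key=lambda c: c.value)
        return Labeled(self.text + other.text, label)


class FlowViolation(Exception):
    """Raised when labelled data is about to reach an external sink."""


# Assumption: hosts operated by the deployer itself. Everything else is external.
INTERNAL_HOSTS = {"llm.corp.example"}


def check_egress(prompt: Labeled, host: str) -> None:
    """Refuse to send non-public data to a host outside the allowlist."""
    if host not in INTERNAL_HOSTS and prompt.label is not Classification.PUBLIC:
        raise FlowViolation(
            f"{prompt.label.name} data may not reach external sink {host}"
        )


# Usage: a public question combined with trade-secret code inherits the stricter label.
question = Labeled("Please fix this bug:\n", Classification.PUBLIC)
code = Labeled("def measure(): ...", Classification.TRADE_SECRET)
try:
    check_egress(question + code, "api.openai.com")
except FlowViolation as err:
    print(err)
```

A guard like this only helps if every path to the external host goes through it, which is why the static reachability framing later in this entry matters as well.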

Which tier failed

T6 information-flow on the deployer side: the data classification did not flow with the data through to the API egress. Samsung's existing DLP controls were focused on email and file uploads, not LLM-API egress, and the new sink class had not been enumerated.
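
The "sink class had not been enumerated" gap can be made concrete with a rough sketch of an egress inventory audit. The inventory contents, host names, and function names below are hypothetical; the sketch simply compares observed outbound hosts against the sink classes a DLP policy knows about and lists the ones no rule covers.

```python
# Assumption: a DLP sink inventory keyed by domain suffix (illustrative values only).
SINK_CLASSES = {
    "mail.corp.example": "email",
    "files.corp.example": "file_upload",
}


def classify_host(host, inventory):
    """Return the sink class covering host, or None if no rule matches."""
    for suffix, sink_class in inventory.items():
        if host == suffix or host.endswith("." + suffix):
            return sink_class
    return None


def unenumerated_sinks(observed_hosts, inventory):
    """Hosts seen in egress logs that no sink class accounts for."""
    return sorted({h for h in observed_hosts if classify_host(h, inventory) is None})


# Usage: LLM endpoints show up as uncovered until a sink class is added for them.
observed = ["mail.corp.example", "chat.openai.com", "api.openai.com",
            "eu.files.corp.example"]
print(unenumerated_sinks(observed, SINK_CLASSES))

covered = dict(SINK_CLASSES, **{"openai.com": "external_llm_api"})
print(unenumerated_sinks(observed, covered))
```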

T7-adjacent failure on the vendor side: once submitted, there was no erasure-completeness pathway. Even when OpenAI later offered "no-retention" mode and ChatGPT Enterprise with no-training defaults, the original submitted prompts could not be retroactively excluded from any model checkpoint that had already incorporated them.

What an AG-tower-driven control would have done

On the deployer side: a flow contract flow(class:internal_only) ∉ flow(external_api_egress) refutes at static-analysis time when an internal tool is wired to invoke a third-party LLM with classified data on the prompt path. The verifier flags: "function assistant.send_prompt is reachable from class internal_only via 2 hops; sink is external host api.openai.com; T6 information-flow VIOLATED."
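
As a minimal sketch of the reachability check behind that message, the flow contract can be read as "no path exists from the internal_only source to a function that talks to an external host." The graph, node names, and verify_t6 helper below are hypothetical illustrations, not the verifier's actual implementation; the search is a plain breadth-first traversal over a data-flow graph.

```python
from collections import deque


def shortest_flow(edges, sources, sinks):
    """Breadth-first search; return the shortest source-to-sink path, or None."""
    queue = deque((s, [s]) for s in sources)
    seen = set(sources)
    while queue:
        node, path = queue.popleft()
        if node in sinks:
            return path
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


def verify_t6(edges, source, external_sinks):
    """Report whether data of class `source` can reach any external sink."""
    path = shortest_flow(edges, [source], external_sinks)
    if path is None:
        return "T6 information-flow HOLDS"
    sink = path[-1]
    hops = len(path) - 1
    return (f"function {sink} is reachable from {source} via {hops} hops; "
            f"sink is external host {external_sinks[sink]}; "
            "T6 information-flow VIOLATED")


# Assumption: a toy data-flow graph for an internal editor plugin.
EDGES = {
    "class:internal_only": ["editor.get_selection"],
    "editor.get_selection": ["assistant.send_prompt"],
}
# Assumption: functions known to transmit to an external host.
EXTERNAL_SINKS = {"assistant.send_prompt": "api.openai.com"}

print(verify_t6(EDGES, "class:internal_only", EXTERNAL_SINKS))
```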

On the vendor side: an erasure-completeness contract for_each(submitted_prompt) ∃ retract_pathway → model_checkpoint refutes when a vendor attests retention-and-training defaults but cannot exhibit a pathway. The model card carries a Stub refutation in the Annex IV pack until either the retention default flips or a real unlearning pathway lands. Either way, the deployer reads a refuted SARIF, not an Economist article.
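
One way to read the erasure-completeness contract is: for every submitted prompt and every checkpoint that incorporated it, a retraction pathway must be exhibited. The sketch below works under that reading. The identifiers, data shapes, rule ID, and tool name are hypothetical; the output follows only the basic top-level shape of a SARIF 2.1.0 log (version, runs, tool.driver.name, results) and is not a complete or validated report.

```python
import json


def erasure_refutations(prompt_checkpoints, retract_pathways):
    """Return (prompt_id, checkpoint) pairs for which no pathway is exhibited."""
    missing = []
    for prompt_id, checkpoints in prompt_checkpoints.items():
        for ckpt in checkpoints:
            if (prompt_id, ckpt) not in retract_pathways:
                missing.append((prompt_id, ckpt))
    return missing


def to_sarif(missing):
    """Wrap refutations in a minimal SARIF-shaped dictionary."""
    results = [
        {
            "ruleId": "T7-erasure-completeness",  # hypothetical rule name
            "level": "error",
            "message": {"text": f"no retract pathway from {p} to {c}"},
        }
        for p, c in missing
    ]
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "ag-tower-sketch"}},
                  "results": results}],
    }


# Assumption: the vendor attests which checkpoints saw which prompts.
prompt_checkpoints = {"prompt-001": ["ckpt-a", "ckpt-b"], "prompt-002": []}
# Assumption: pathways the vendor can actually exhibit.
retract_pathways = {("prompt-001", "ckpt-a")}

missing = erasure_refutations(prompt_checkpoints, retract_pathways)
print(json.dumps(to_sarif(missing), indent=2))
```

An empty results list corresponds to the contract holding; any entry is a refutation the deployer can read before sending data.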

Sources

Mark Gurman, "Samsung Bans Staff's AI Use After Spotting ChatGPT Data Leak," Bloomberg (May 2, 2023): https://www.bloomberg.com/news/articles/2023-05-02/samsung-bans-chatgpt-and-other-generative-ai-use-by-staff-after-leak
Siladitya Ray, "Samsung Bans ChatGPT Among Employees After Sensitive Code Leak," Forbes (May 2, 2023): https://www.forbes.com/sites/siladityaray/2023/05/02/samsung-bans-chatgpt-and-other-chatbots-for-employees-after-sensitive-code-leak/
The Economist Korea (Apr 2023, Korean original; English summaries in Mashable and Tom's Hardware): https://www.theeconomistkorea.com/news/articleView.html?idxno=178233
NIST AI 600-1 (GenAI Profile, Jul 2024) §2.4 Data Privacy: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

These write-ups are journalism + product framing; they are not legal advice. Regulatory citations are best-effort references to public documents at time of writing. For anticipated cases, the entry labels the framing explicitly as anticipated rather than closed.