Data Feedback and Training
Whether upstream model providers retain prompt and output for training — three major providers' default terms, methods to disable feedback, ways to prove to customers.
Problem overview
Every agent invocation of upstream model APIs (Anthropic, OpenAI, etc.) sends customer prompt content and agent output through the provider’s servers. The provider may by default retain this data for training future models — this is a direct threat to customer compliance and a hard topic in vendor contracts and sales.
Concerns across compliance scenarios:
- GDPR: personal data used for other purposes requires explicit consent; training use without customer authorization is a violation
- HIPAA: medical data may not be used for any secondary purpose, including model training
- Enterprise trade secrets: customer prompts may contain unpublished strategy, contract clauses, customer information — provider retention means potential leakage
- Industry-specific: finance, legal, government customers have similar but stricter restrictions
Three major providers’ default terms
The table below reflects major providers’ default terms at the time of writing (early 2026). Terms change frequently; always verify the latest version before signing contracts.
| Provider | Standard API default | Enterprise plan | Methods to disable training use |
|---|---|---|---|
| Anthropic | Default not used for training (revised) | Same + DPA + BAA | Default already satisfies; extra needs via Enterprise contract |
| OpenAI | API default not used for training (since 2023) | ZDR (Zero Data Retention) option + DPA + BAA | Apply for ZDR; 30-day default retention reducible to 0 |
| Google (Vertex AI) | Default not used for training (enterprise accounts) | Data residency region options + detailed DPA | Default already satisfies |
| Self-hosted models | Fully controlled | Custom | No external feedback |
Key trend changes:
- 2022-2023 default terms were “may be used for training”; enterprise market pushback led providers to revise to “default not for training”
- But terms can change at any time — written commitment at contract signing is safer than relying on default settings
- ChatGPT and other consumer products differ from API terms — user conversations are by default used for training; only ChatGPT Enterprise offers ZDR options
Three layers to cut feedback
Layer 1: contract terms
Sign enterprise contracts with upstream model providers (not default API access); contracts explicitly state:
- “Provider shall not use Customer Data to train, fine-tune, or otherwise improve any AI models.”
- “Provider shall not retain Customer Data beyond [X] hours for operational purposes.”
- Add DPA (Data Processing Agreement) — GDPR requirement
- For healthcare add BAA (Business Associate Agreement) — HIPAA requirement
Layer 2: technical switches
API options provided by upstream:
- OpenAI: set
data_retention: "zero"on API calls or enable ZDR in account settings - Anthropic: default zero retention; enterprise accounts additionally sign DPA confirming terms
- Vertex AI: account-level data control settings
Technical switches and contract terms should be used together — contract is legal protection, technical switch is technical enforcement.
Layer 3: audit verification
How is contract + technical switch proven to the customer?
- Upstream providers typically do not directly issue certificates to “your customers” (your customer is not their direct customer)
- Practical approach: you (the vendor) hold the upstream provider’s contracts / certifications, and prove the sub-processor relationship to customers
- Customer contracts include sub-processor list + data usage terms for each sub-processor
- Provider compliance certifications (SOC2, HIPAA-eligible, etc.) serve as indirect endorsement
Standard response to customers
Enterprise customer due diligence checklists almost always include this question. Prepare a standard response to avoid answering from scratch in every sales conversation:
Q: Will our data be used to train AI models?
A: No. Specifically:
1. Our upstream model providers ([list, e.g., Anthropic / OpenAI / ...])
commit by default not to use API input for model training. We have
signed enterprise-grade contracts with providers ([contract name])
that further clarify this clause.
2. We enable Zero Data Retention (ZDR) configuration at the API call
layer (where the provider supports this option).
3. At our own infrastructure layer, prompt and output are retained only
in audit logs (for compliance and troubleshooting), retention period
[X] days, accessible only to authorized personnel.
4. We can provide the following upon customer due diligence request:
- Upstream provider DPA / BAA
- Our SOC2 Type II report (if applicable)
- Complete sub-processor list
For more details, contact compliance@[domain].
Self-hosted / private model trade-offs
Some customers (finance, government, healthcare) require fully private deployment — model weights inside customer infrastructure, prompts never leave the customer boundary.
Trade-offs:
| Dimension | Public API (upstream provider) | Private deployment (self-hosted model / customer infra) |
|---|---|---|
| Compliance boundary | Cross-organization (multiple sub-processors) | Single boundary (customer’s own) |
| Cost | Low unit cost (shared infrastructure) | High unit cost (dedicated infra + model license) |
| Model capability | Always current (provider continuous upgrade) | Lagging (self-hosted updates slow) |
| Deployment complexity | Low (API call) | High (requires GPU cluster, model management) |
| Suitable customers | Most enterprise customers | Top-compliance customers, government, sensitive industries |
Practical strategy:
- Default product offering: public API + strict compliance configuration (suits 80% of customers)
- Enterprise upsell: private deployment option (suits remaining 20%, unit price 5-10× standard contract)
- Do not proactively push private deployment to all customers — cost and operational burden are large; offer only when customers explicitly require
Cross-section connections
- Data residency physical location choices: overview
- Private deployment impact on unit economics: metrics/unit-economics
- Compliance-customer pricing tier: pricing/tier-design