Before a sales engineer types a single character into Claude, their name, email, internal user ID and organisation identifier have already left the building. Intercom got the lot on page load. Datadog received the organisation tag. None of this required them to type a question, let alone submit one. That is the most striking finding in a new measurement study from UC Davis researchers, who tested 20 web chatbots with one prompt and watched where the traffic went. AI chatbot data leakage is not theoretical; it is a default behaviour of the modern chatbot stack. The vendor that markets hardest on privacy turns out to illustrate the pattern most cleanly. That its disclosures are among the most detailed in the cohort sharpens the point rather than softening it.
AI chatbot data leakage is a supply chain problem
Most coverage of AI chatbot data leakage frames it as a privacy story. For a CRO at a mid-market B2B firm, that framing misses the point. The conversation your revenue team has with a chatbot about a live deal is not a private utterance. It is a payload, and it travels through a stack of analytics, support and ad-tech vendors that procurement has never reviewed. Salesforce was vetted. The marketing automation suite was vetted. Most AI vendor governance reviews stop at the model provider. The Intercom widget loaded inside that provider’s product sits one layer beyond, and nobody reviewed it.
The supply chain framing is the honest one. The third-party stack behind a chatbot is an unvetted dependency in every revenue conversation that touches one. By now, that is most of them. Acceptable use policies written before this evidence existed need a second look.
What actually leaves the building
The UC Davis team observed 17 of 20 chatbots contacting at least one third-party domain in a normal session. Across the cohort they counted 47 distinct third-party owners. AI chatbot data leakage takes three forms that matter to a revenue leader.
Identity on page load
Claude and Mistral both boot the Intercom support widget as soon as the page loads. The boot payload contains user email, user name, internal user ID and a user hash. Datadog separately receives an organisation identifier that ties telemetry to a named user inside a named company. Character.AI sends the same identity bundle to Sentry and Statsig. None of this requires the user to ask a question. Once they do, page titles derived from the prompt are also posted to Intercom’s metrics endpoint.
Prompt and response in plaintext
Four chatbots embed Microsoft Clarity, a session-replay tool. On Genspark, SeaArt and ChatOn, Clarity received the exact prompt text and readable snippets of the assistant’s reply. The researchers captured Genspark sending “Here are a few pregnancy test options near you.” straight to Microsoft’s analytics endpoint. ChatOn forwarded “Most pharmacies like CVS, Walgreens, or Rite Aid carry home pregnancy tests.” In the same sessions, Clarity also picked up the user’s name and email, because those fields render in the chat header. Three of these vendors do not list Clarity in their privacy policies.
Chat URLs propagating across the ad layer
Fifteen of the 20 chatbots send chat URLs or conversation identifiers to at least one third party via standard analytics and ad tags. SeaArt’s chat identifiers reach 12 destinations, including Facebook, Google, TikTok and Yandex. Even where the URL does not contain the prompt itself, it is a stable conversation pointer. Stable pointers are exactly what ad-tech graphs are built to join together.
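To make the “stable pointer” point concrete, here is a minimal Python sketch of how two unrelated third parties could join their records on a conversation ID pulled from a chat URL, without either of them ever seeing the prompt. The `/chat/<id>` URL format, the event fields and both vendor logs are hypothetical; the study does not describe any vendor’s actual pipeline.

```python
from typing import Optional
from urllib.parse import urlparse


def conversation_id(url: str) -> Optional[str]:
    """Extract a conversation ID from a hypothetical /chat/<id> URL path."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "chat":
        return parts[1]
    return None


# Hypothetical records, as an analytics tag and an ad tag might each log them.
analytics_events = [
    {"page": "https://chat.example.com/chat/3f9a1c", "visitor": "v-123"},
    {"page": "https://chat.example.com/chat/77b2e0", "visitor": "v-456"},
]
ad_events = [
    {"referrer": "https://chat.example.com/chat/3f9a1c", "ad_id": "ad-abc"},
]


def join_on_conversation(analytics, ads):
    """Link analytics visitors to ad identifiers via a shared conversation ID."""
    by_conversation = {}
    for event in analytics:
        cid = conversation_id(event["page"])
        if cid is not None:
            by_conversation.setdefault(cid, []).append(event)
    joined = []
    for event in ads:
        cid = conversation_id(event["referrer"])
        for match in by_conversation.get(cid, []):
            joined.append(
                {"conversation": cid, "visitor": match["visitor"], "ad_id": event["ad_id"]}
            )
    return joined


if __name__ == "__main__":
    print(join_on_conversation(analytics_events, ad_events))
```

Neither log contains prompt text. The shared identifier alone is enough to connect a visitor profile in one system to an ad identifier in another.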
The identity layer most teams cannot see
Hashed emails travel from Perplexity to Singular and from SeaArt to TikTok. The Meta pixel fires from Character.AI, ChatOn, Genspark, Manus, PolyBuzz and SeaArt. Microsoft’s Bing identifiers fire on four of those same chatbots. The point is not that any single hop is catastrophic; it is that the persistent identifiers used to measure ad performance are exactly the identifiers that link chatbot activity to the rest of a person’s web behaviour.
For a B2B firm, that means deal context, account names and competitive intelligence become joinable to professional identity in advertising graphs with which you have no contract and no recourse. This is AI chatbot data leakage in its purest form: behaviour fully documented in marketing speak, and entirely invisible in a normal procurement review.
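Why a hashed email is still an identifier: hashing is deterministic, so every party that hashes the same address gets the same value and can match on it. The minimal Python sketch below assumes SHA-256 over a trimmed, lowercased address, a common normalisation convention in ad-platform matching; the study does not say which hash function or normalisation Perplexity, SeaArt, Singular or TikTok actually use.

```python
import hashlib


def normalise(email: str) -> str:
    """Assumed normalisation: strip surrounding whitespace, then lowercase."""
    return email.strip().lower()


def hashed_email(email: str) -> str:
    """SHA-256 hex digest of the normalised address (assumed scheme)."""
    return hashlib.sha256(normalise(email).encode("utf-8")).hexdigest()


# The same person, as two unrelated vendors might receive the address.
from_chatbot = hashed_email("  Jane.Doe@Example.com ")
from_ad_platform = hashed_email("jane.doe@example.com")

if __name__ == "__main__":
    # Identical digests: the "anonymised" value works as a ready-made join key.
    print(from_chatbot == from_ad_platform)
```

The digest hides the address from a casual reader, but not from anyone who already holds the address or its hash, which is exactly the position an ad platform is in.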
AI chatbot data leakage has a free control
Seven chatbots in the study offer a private or temporary mode: ChatGPT, Gemini, Claude, Grok, Perplexity, Qwen and Mistral. With that mode switched on, third-party content exposure and identity exposure dropped to zero across all seven. The UI disclosures for these modes talk about training and retention, not about tracking. Empirically, private mode is also a tracking switch, and a free one.
Tell revenue, customer success and sales engineering teams to use private or temporary mode for any chatbot conversation involving a named account. That single instruction is a Monday-morning action with a measurable effect on AI chatbot data leakage. It does not need procurement sign-off, a new tool or a DPIA. It needs an internal note and a manager who reads it out at the team meeting.
The researchers flag one caveat worth preserving: the study used one prompt, one browser and one snapshot in time, on web UIs only. Vendor configurations change. None of that softens what was observed; it bounds the claim.
The commercial argument
Responsible AI as a growth advantage means knowing what leaves your perimeter when your team works. The data protection conversation will happen anyway, and your DPO will run it competently. The revenue conversation is different: whether your competitive context is showing up in someone else’s analytics dashboard, joined to your sales engineer’s professional identifier, the day before a contract is signed. AI chatbot data leakage turns that question from a hypothetical into an operational one.
Decide what your revenue team types into which chatbot, and in which mode. The rest follows. The AI Operational Strategy Prep Track works through this kind of workforce-level decision in practice; if your acceptable use policy predates the evidence above, it is the natural place to start.