Whistledown: Combining User-Level Privacy with Conversational Coherence in LLMs
📝 Abstract
Users increasingly rely on large language models (LLMs) for personal, emotionally charged, and socially sensitive conversations. However, prompts sent to cloud-hosted models can contain personally identifiable information (PII) that users do not want logged, retained, or leaked. We observe this to be especially acute when users discuss friends, coworkers, or adversaries, i.e., when they spill the tea. Enterprises face the same challenge when they want to use LLMs for internal communication and decision-making. In this whitepaper, we present Whistledown, a best-effort privacy layer that modifies prompts before they are sent to the LLM. Whistledown combines pseudonymization and $ε$-local differential privacy ($ε$-LDP) with transformation caching to provide best-effort privacy protection without sacrificing conversational utility. Whistledown is designed to have low compute and memory overhead, allowing it to be deployed directly on a client’s device in the case of individual users. For enterprise users, Whistledown is deployed centrally within a zero-trust gateway that runs on an enterprise’s trusted infrastructure. Whistledown requires no changes to the existing APIs of popular LLM providers.
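As background on the $ε$-LDP building block named above: the classic randomized-response mechanism satisfies $ε$-LDP for a single binary attribute. This is standard textbook background, not Whistledown's specific mechanism, and the function name is our own:

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps),
    otherwise report its flip. This classic mechanism satisfies
    eps-local differential privacy for a single binary attribute."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else 1 - true_bit
```

Larger $ε$ makes the reported value more accurate but less private; as $ε \to 0$ the output approaches a fair coin flip, revealing nothing about the true value.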
📄 Content
Cloud-hosted LLMs from providers like Anthropic [2] and OpenAI [1] have become ubiquitous for general advice, emotional support, and casual conversation. However, using cloud-hosted LLMs poses privacy risks when user prompts contain personal or sensitive data sent to untrusted servers. This risk is particularly pronounced for what we term teaspilling prompts, i.e., conversations where users discuss colleagues, friends, romantic partners, or adversaries by name, often seeking advice on interpersonal conflicts or sharing emotionally charged narratives.
The same fundamental challenge faces enterprises and smaller organizations that want to use LLMs for internal communication and decision-making. For example, an enterprise may want LLMs to help employees make decisions about company policies, products, or services, even though those prompts contain sensitive data the enterprise does not want to share with the LLM provider. Similarly, an enterprise might want to use LLMs to analyze job applications, which contain sensitive data about applicants; here, the enterprise may additionally want to minimize demographic bias in the LLM’s analysis of each applicant.
There are several existing approaches to this problem, many of which are complementary. The first is to build a user education program [17], for example, advising users not to share real names. However, user training is error-prone, breaks conversational flow in the LLM context, and adds unnecessary cognitive overhead. A second approach is to use only open-source or in-house LLMs running on a user’s device or in an enterprise’s infrastructure [8]. However, this approach either sacrifices output quality [14] or requires significant compute resources [8]. A third approach is to sanitize prompts in trusted infrastructure before they are sent to an external LLM, similar in spirit to the NIST SP 800-122 recommendation [16] to sanitize data before external release. At its core, Whistledown takes the third approach.
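To make the third approach concrete, here is a minimal sketch of sanitizing a prompt by pseudonymization before it leaves trusted infrastructure. The hard-coded name list and the `Person A` placeholders are our own illustrative choices; a real deployment would detect PII automatically rather than rely on a fixed dictionary:

```python
import re

# Hypothetical mapping of real names to pseudonyms; in practice this
# would be produced by a PII detector, not a hard-coded list.
PSEUDONYMS = {"Alice": "Person A", "Bob": "Person B"}

def sanitize(prompt: str) -> str:
    """Replace known sensitive names with pseudonyms before the
    prompt leaves trusted infrastructure."""
    for real, fake in PSEUDONYMS.items():
        # \b anchors avoid rewriting substrings of longer words.
        prompt = re.sub(rf"\b{re.escape(real)}\b", fake, prompt)
    return prompt

print(sanitize("Alice told Bob she might quit."))
# → Person A told Person B she might quit.
```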
Dorcha provides an AI security platform that exposes a zero-trust gateway located between users and external LLM providers. Dorcha’s Gateway helps enterprises de-risk the process of integrating their internal infrastructure with LLMs. The gateway can be used to manage non-human identities, define secure access policies between internal entities and LLM agents, anonymize metadata (e.g., IP addresses, device info), and make all requests between internal entities and agentic services observable and auditable.
Dorcha also provides a lighter-weight tool for everyday users called Saoirse. Saoirse is a client-side AI assistant that sends all user requests to a managed instance of Dorcha’s Gateway that we run in a trusted execution environment. Saoirse lets everyday users and smaller enterprises benefit from common-sense security policies without managing their own infrastructure. Saoirse is essentially an application providing a ChatGPT-like conversational interface but with built-in security and privacy functionality.
Recently, users of Saoirse expressed discomfort about including real personal names and other socially sensitive data in prompts. This was especially true when users were seeking advice or spilling the tea on people they know in real life. Users are concerned that these names could be logged, leaked, or used for training by external models. Beta users of the Saoirse application shared that they were manually substituting fake names and other sensitive data to mitigate this risk, which is error-prone and cumbersome.
A similar concern was brought up during discussions with enterprise leaders who wanted to use LLMs for internal communication and decision-making. The enterprises work primarily in the humanitarian and non-profit sectors, with employees and users from a diverse range of backgrounds. The enterprises are interested in making data-driven decisions regarding highly sensitive topics such as human rights violations or medical aid distribution. Therefore, integrating LLMs into their decision-making processes, even for casual data gathering or advice, comes with significant risks.
To address this, we implemented Whistledown, a best-effort privacy and bias mitigation layer that modifies prompts after a user has submitted them but before they are sent to the LLM. Whistledown is designed to have very low performance overhead, allowing it to be deployed directly on a client’s device in the case of non-enterprise users. When deployed on a client’s device, Whistledown modifies the prompt before it exits the user’s device, so only the user sees the unmodified prompt. For enterprise users, Whistledown can also be deployed centrally within Dorcha’s zero-trust gateway that runs on an enterprise’s trusted infrastructure. When deployed within Dorcha’s Gateway, prompts are likewise modified before they leave the enterprise’s trusted infrastructure, so the external LLM provider never sees the unmodified prompt.
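One way to realize the transformation caching mentioned in the abstract is to keep a per-conversation table that maps each real name to a stable pseudonym, so repeated mentions stay consistent across turns and pseudonyms in the LLM's reply can be mapped back before the user sees it. The sketch below is our own illustration under that assumption; the class and method names are invented, not Whistledown's actual API:

```python
import itertools

class TransformationCache:
    """Per-conversation cache of real-name -> pseudonym mappings.

    Keeping the mapping stable means the LLM sees a coherent cast of
    stand-in characters, and responses can be rewritten back to the
    real names before display. Illustrative sketch only.
    """

    def __init__(self):
        self._forward = {}                 # real name -> pseudonym
        self._reverse = {}                 # pseudonym -> real name
        self._counter = itertools.count(1)

    def pseudonym(self, real_name: str) -> str:
        """Return the cached pseudonym, minting one on first use."""
        if real_name not in self._forward:
            fake = f"Person {next(self._counter)}"
            self._forward[real_name] = fake
            self._reverse[fake] = real_name
        return self._forward[real_name]

    def rewrite_prompt(self, prompt: str, detected_names: list[str]) -> str:
        """Replace detected names (from an upstream PII detector)."""
        for name in detected_names:
            prompt = prompt.replace(name, self.pseudonym(name))
        return prompt

    def restore_response(self, response: str) -> str:
        """Map pseudonyms in the LLM's reply back to real names.
        Naive substring replacement; real code would match whole
        tokens (e.g., 'Person 1' vs. 'Person 10')."""
        for fake, real in self._reverse.items():
            response = response.replace(fake, real)
        return response

cache = TransformationCache()
out = cache.rewrite_prompt("Maya argued with Ravi again.", ["Maya", "Ravi"])
print(out)   # Person 1 argued with Person 2 again.
back = cache.restore_response("Tell Person 1 to talk to Person 2 calmly.")
print(back)  # Tell Maya to talk to Ravi calmly.
```

Because the cache lives in trusted infrastructure (on-device, or inside the gateway for enterprises), the mapping itself is never exposed to the external LLM provider.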