AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs
Modern LLMs are increasingly accessed via black-box APIs, requiring users to transmit sensitive prompts, outputs, and fine-tuning data to external providers, creating a critical privacy risk at the API boundary. We introduce AlienLM, a deployable API-only privacy layer that protects text by translating it into an Alien Language via a vocabulary-scale bijection, enabling lossless recovery on the client side. Using only standard fine-tuning APIs, Alien Adaptation Training (AAT) adapts target models to operate directly on alienized inputs. Across four LLM backbones and seven benchmarks, AlienLM retains over 81% of plaintext-oracle performance on average, substantially outperforming random-bijection and character-level baselines. Under adversaries with access to model weights, corpus statistics, and learning-based inverse translation, recovery attacks reconstruct fewer than 0.22% of alienized tokens. Our results demonstrate a practical pathway for privacy-preserving LLM deployment under API-only access, substantially reducing plaintext exposure while maintaining task performance.
💡 Research Summary
Background and Motivation
The rapid adoption of commercial large language model (LLM) APIs (e.g., OpenAI, Anthropic, Google Cloud) has shifted the primary privacy risk from model training to the API boundary. Users must send prompts, responses, and fine‑tuning corpora over the network, often containing personally identifiable information (PII), medical notes, financial records, or proprietary documents. Existing cryptographic approaches for secure inference—fully homomorphic encryption, secure multi‑party computation, trusted execution environments—require white‑box access to model weights or specialized runtimes, leading to prohibitive latency and incompatibility with black‑box APIs. Differential privacy and federated learning protect training data but do not hide inference‑time inputs and outputs. Consequently, a practical, text‑level privacy mechanism that works solely with API calls is missing.
Core Idea: Alien Language
AlienLM introduces an “Alien Language” defined by a bijective permutation over the entire token vocabulary of a target LLM. Let V be the set of (token string, token ID) pairs; special tokens S remain unchanged, while the remaining IDs I are permuted by a bijection f : I → I. The resulting alien vocabulary V_alien has the same size but each non‑special token string is replaced by a completely different string. The permutation is designed to (1) maximize normalized edit distance between original and alien token strings, thereby reducing human readability, and (2) preserve cosine similarity between token embeddings, ensuring that the model can still learn the underlying semantics after fine‑tuning.
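The vocabulary-scale bijection described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the seed-based derivation of the permutation and the function names are assumptions, and a real deployment would derive the permutation from a client-held secret.

```python
import random

def build_alien_bijection(vocab_ids, special_ids, seed=0):
    """Build a bijection f over non-special token IDs (illustrative sketch)."""
    normal = sorted(set(vocab_ids) - set(special_ids))
    rng = random.Random(seed)  # seed stands in for a client-held secret
    shuffled = normal[:]
    rng.shuffle(shuffled)
    forward = dict(zip(normal, shuffled))
    inverse = {alien: orig for orig, alien in forward.items()}
    return forward, inverse

def alienize(token_ids, forward):
    # Special tokens are absent from the map and pass through unchanged.
    return [forward.get(t, t) for t in token_ids]

def dealienize(token_ids, inverse):
    # Lossless client-side recovery: invert the permutation.
    return [inverse.get(t, t) for t in token_ids]
```

Because `forward` is a permutation of the non-special IDs, `dealienize(alienize(ids))` recovers the original sequence exactly, which is the lossless-recovery property the abstract claims.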
Permutation Optimization
Because the target model’s embeddings are inaccessible in a black-box setting, AlienLM uses embeddings e_P from an open-source proxy model (e.g., Llama-2-7B). For the permuted subset I_ρ (controlled by the alienization ratio ρ), the bijection f is chosen to maximize an objective of the form
max_f ∑_{i∈I_ρ} [ d_edit(v_i, v_{f(i)}) + λ · cos(e_P(v_i), e_P(v_{f(i)})) ],
where d_edit is the normalized edit distance between the original and alien token strings, cos is the cosine similarity between their proxy embeddings, and λ trades off human unreadability against post-fine-tuning semantic recoverability.
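Assuming the objective combines normalized edit distance over token strings with cosine similarity over proxy embeddings, as the text describes, scoring a candidate permutation might look like the following sketch. The helper names, the weight `lam`, and the plain-Python Levenshtein routine are illustrative assumptions, not the paper's code.

```python
import math

def normalized_edit_distance(a, b):
    """Levenshtein distance divided by max string length (single-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion
                        dp[j - 1] + 1,      # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, n, 1)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def permutation_score(f, strings, emb, lam=1.0):
    """Sum of [d_edit + lam * cos] over all mapped ID pairs (i, f(i))."""
    return sum(normalized_edit_distance(strings[i], strings[j])
               + lam * cosine(emb[i], emb[j])
               for i, j in f.items())
```

A search procedure (greedy swaps, simulated annealing, etc.) would then pick the permutation `f` with the highest score, so alien strings look nothing like the originals while their proxy embeddings stay semantically close.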