CryptoTensors: An Encrypted Tensor Format for Confidential LLM Deployment
📝 Abstract
To enhance the performance of large language models (LLMs) in various domain-specific applications, sensitive data from fields such as healthcare, law, and finance are being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as either personal privacy assets or corporate intellectual property. Therefore, protecting model weights and maintaining strict confidentiality during deployment and distribution have become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment rely either on computationally expensive cryptographic techniques or on tightly controlled private infrastructure. Although these approaches can be effective in specific scenarios, they are difficult and costly to apply at scale. In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension of the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access control policies while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a lightweight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world, widespread deployments.
📄 Content
The rapid and widespread adoption of large language models (LLMs) has led to a growing number of closed-weight models which cannot be publicly released due to privacy, commercial, or regulatory constraints. These models generally fall into two categories. The first comprises foundation models trained from scratch by major companies, such as GPT-5 [4], Claude [6], and Gemini [54]. These models remain proprietary to safeguard substantial training investments and maintain competitive advantage. The second category includes domain-specific models fine-tuned from open-weight LLMs (e.g., LLaMA [37,58], Mistral [24,25], Gemma [55], DeepSeek [17,35], Kimi [56,57]).
The proliferation of open models has greatly lowered the barrier to entry, allowing developers to adapt them using proprietary or sensitive datasets in domains such as healthcare [41], law [52], and finance [30]. The resulting models often inherit the sensitivity of their training data and, consequently, cannot be openly shared.
Deploying these closed-weight models in broader settings exposes natural trade-offs between privacy, inference speed, and hardware cost [34]. Most current development efforts focus on improving performance, usability, and format compatibility, for example by optimizing inference efficiency [19,49] or introducing new model storage formats [15,50]. Existing solutions for securely deploying proprietary LLMs, however, tend to be either too complex or inaccessible to most developers. In practice, large organizations depend on private cloud environments, custom communication protocols, encrypted delivery systems, or legal agreements to control model access [5,10,22]. While these strategies are effective at scale, they demand substantial infrastructure and legal resources, making them impractical for smaller teams or open model ecosystems. Conversely, academic research has explored privacy-preserving computation techniques that offer strong security guarantees. Yet these approaches are difficult to apply in real-world AI workflows, as they often require model-specific modifications and heavy runtime support [12,48]. As a result, there remains no lightweight, general-purpose solution that provides secure and controlled model usage while remaining compatible with today's common deployment pipelines.
Facing the above challenges, in this paper we propose CryptoTensors, a lightweight LLM file format for highly secure model distribution. To protect model weights, CryptoTensors encrypts them, ensuring that even a user who copies the entire model file cannot load or use the model. To minimize infrastructure requirements, CryptoTensors relies only on software-based encryption techniques. Most importantly, we maintain compatibility with Safetensors [21], so our format works with the existing, widely adopted model ecosystem, ensuring compatibility with frameworks such as vLLM and Hugging Face Transformers. As a result, users only need a Python environment to decrypt the model weights and perform model inference. To ensure broad applicability, our encryption operates at the tensor level, encrypting each tensor (i.e., each weight matrix) with a unique data encryption key. This fine-grained design makes the encryption independent of the model's architecture, allowing it to be applied to any model structure, whether dense or sparse. Overall, the primary contributions of this work are as follows:
• We introduce CryptoTensors, a secure LLM file format that extends Safetensors for controlled distribution and protection of closed-weight models. Following standard security practice, we minimize the trusted computing base (the subset of software and hardware components that must be assumed trustworthy) to a small set of cryptographic primitives and avoid system-level dependencies, thereby reducing the attack surface and ensuring strong confidentiality guarantees with practical overhead.
• CryptoTensors adopts a decoupled structure where the file header is in plaintext but its integrity is protected through digital signatures. The tensor body is protected via fine-grained encryption. This enables compatibility with existing indexing mechanisms and preserves essential optimizations such as lazy loading and partial deserialization.
• The file format is self-contained, embedding cryptographic metadata to enable transparent and policy-compliant decryption for authorized users without manual intervention. Our design supports automated key provisioning and policy validation through integration with remote Key Broker Services (KBS), enabling flexible licensing models and secure deployments across diverse environments.
• We implement a proof-of-concept library and comprehensively benchmark its performance, including serialization/deserialization overhead and runtime characteristics. We further demonstrate its practical applicability by integrating it into real-world inference frameworks such as Hugging Face Transformers and vLLM.
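The tensor-level encryption described above follows a standard envelope-encryption pattern: each tensor gets its own data encryption key (DEK), which is in turn wrapped under a key-encryption key (KEK). The sketch below is illustrative only and is not the paper's actual implementation; it uses a SHA-256 counter-mode keystream as a stdlib stand-in for a real AEAD cipher such as AES-GCM, and all function names are hypothetical.

```python
# Illustrative per-tensor envelope encryption (hypothetical API; a SHA-256
# counter-mode keystream stands in for a real cipher such as AES-GCM).
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream by hashing key || nonce || counter blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_tensor(raw: bytes, kek: bytes) -> dict:
    """Encrypt one tensor with its own DEK, then wrap the DEK under the KEK."""
    dek = secrets.token_bytes(32)      # unique data encryption key per tensor
    nonce = secrets.token_bytes(12)
    body = bytes(a ^ b for a, b in zip(raw, _keystream(dek, nonce, len(raw))))
    wrapped = bytes(a ^ b for a, b in zip(dek, _keystream(kek, nonce, 32)))
    return {"nonce": nonce, "wrapped_dek": wrapped, "body": body}

def decrypt_tensor(blob: dict, kek: bytes) -> bytes:
    """Unwrap the per-tensor DEK with the KEK, then recover the raw bytes."""
    dek = bytes(a ^ b for a, b in zip(blob["wrapped_dek"],
                                      _keystream(kek, blob["nonce"], 32)))
    return bytes(a ^ b for a, b in zip(blob["body"],
                                       _keystream(dek, blob["nonce"],
                                                  len(blob["body"]))))
```

Because every tensor carries its own wrapped DEK, this scheme needs no knowledge of the model's architecture, matching the paper's claim that the format applies equally to dense and sparse models.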
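The decoupled layout in the second contribution (a plaintext, signature-protected header over an encrypted body) can be sketched as follows. This is a minimal stand-in, not the actual CryptoTensors wire format: HMAC-SHA256 substitutes for a digital signature, and the 8-byte length prefix plus JSON offset table merely mimic the Safetensors-style header that makes lazy loading and partial deserialization possible.

```python
# Sketch of a signed plaintext header with per-tensor byte offsets, followed
# by the tensor body. HMAC-SHA256 stands in for the digital signature.
import hashlib
import hmac
import json
import struct

def write_file(path: str, tensors: dict, sign_key: bytes) -> None:
    offsets, body, pos = {}, bytearray(), 0
    for name, data in tensors.items():
        offsets[name] = [pos, pos + len(data)]
        body += data
        pos += len(data)
    header = json.dumps({"offsets": offsets}).encode()
    tag = hmac.new(sign_key, header, hashlib.sha256).digest()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte header length
        f.write(header)                          # plaintext, indexable header
        f.write(tag)                             # 32-byte integrity tag
        f.write(body)                            # (encrypted) tensor body

def read_tensor(path: str, name: str, sign_key: bytes) -> bytes:
    """Lazy load: verify the header, then read only one tensor's byte range."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<Q", f.read(8))
        header = f.read(hlen)
        tag = f.read(32)
        expect = hmac.new(sign_key, header, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expect):
            raise ValueError("header signature mismatch")
        start, end = json.loads(header)["offsets"][name]
        f.seek(8 + hlen + 32 + start)
        return f.read(end - start)
```

Keeping the header in plaintext is what preserves compatibility with existing indexing mechanisms: a loader can locate and fetch a single tensor's byte range without decrypting anything else in the file.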
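The key-provisioning flow through a remote Key Broker Service (KBS) in the third contribution might look like the toy sketch below: the loader asks the broker for the key-encryption key, and the broker validates the embedded access policy before releasing it. The class, policy fields, and licensing checks here are all hypothetical illustrations, not the paper's actual protocol.

```python
# Hypothetical KBS sketch: release the KEK only if the embedded
# access-control policy (licensee allowlist, expiry) is satisfied.
import datetime

class KeyBrokerService:
    def __init__(self, keks: dict, policies: dict):
        self._keks = keks          # model_id -> key-encryption key
        self._policies = policies  # model_id -> access policy

    def request_kek(self, model_id: str, licensee: str,
                    today: datetime.date) -> bytes:
        policy = self._policies[model_id]
        if licensee not in policy["licensees"]:
            raise PermissionError("licensee not authorized")
        if today > policy["expires"]:
            raise PermissionError("license expired")
        return self._keks[model_id]
```

Because the cryptographic metadata is embedded in the file itself, an authorized loader can perform this exchange automatically; an unauthorized copy of the file yields only ciphertext, which is the licensing property the paper targets.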
This content is AI-processed based on ArXiv data.