FastFHE: A Homomorphic-Encryption-Based Technique for Accelerating Deep Learning Inference


📝 Abstract

Deep learning (DL) has been penetrating daily life in many domains, so keeping DL model inference secure and sample data private in an encrypted environment has become an urgent and increasingly important issue for security-critical applications. To date, several approaches have been proposed based on the Residue Number System variant of the Cheon-Kim-Kim-Song (RNS-CKKS) scheme, but they all suffer from high latency, which severely limits their use in real-world tasks. Research on encrypted inference in deep CNNs currently faces three main bottlenecks: i) the time and storage costs of convolution; ii) the time overhead of the many bootstrapping operations; and iii) the consumption of circuit multiplicative depth. To address these three challenges, this paper proposes FastFHE, an efficient and effective mechanism that accelerates model inference while retaining high inference accuracy over fully homomorphic encryption. Concretely, our work makes four contributions. First, we propose a new scalable ciphertext data-packing scheme to reduce time and storage consumption. Second, we devise a depthwise-separable convolution scheme to reduce the computational load of convolution. Third, we design a BN dot-product fusion matrix that merges the ciphertext convolutional layer with the batch-normalization layer without incurring extra multiplicative depth. Finally, we adopt a low-degree Legendre polynomial to approximate the nonlinear smooth activation function SiLU while guaranteeing a tiny accuracy gap between plaintext and encrypted inference.
We conduct multi-faceted experiments to verify the efficiency and effectiveness of the proposed approach. The results show that, at the standard 128-bit security level, FastFHE enables the popular ResNet20 to achieve a 2.41× reduction in inference latency and a 2.38× reduction in amortized runtime (using 30 threads) compared to state-of-the-art methods. Moreover, results on several other representative deep CNN architectures, such as ResNet32, ResNet44 and VGG11, also show remarkable advantages in both inference overhead and accuracy.
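The BN-fusion idea in the abstract can be illustrated in plaintext: a convolution (reduced here to a dot-product linear map for brevity) followed by inference-mode batch normalization is algebraically equal to a single linear map with rescaled weights and a folded bias, so the fused layer consumes no extra multiplicative depth. This is a minimal sketch under illustrative shapes and random parameters, not the paper's actual fusion matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear (dot-product) layer standing in for a convolution: y = W @ x + b
W = rng.normal(size=(4, 8))
b = rng.normal(size=4)

# Batch-norm parameters (inference mode: running statistics are fixed)
gamma = rng.normal(size=4)
beta = rng.normal(size=4)
mean = rng.normal(size=4)
var = rng.uniform(0.5, 2.0, size=4)
eps = 1e-5

x = rng.normal(size=8)

# Sequential evaluation: linear layer, then BN -- two multiplicative steps
y_seq = gamma * ((W @ x + b) - mean) / np.sqrt(var + eps) + beta

# Fused evaluation: fold BN's scale into W and its shift into b ahead of time,
# so only one plaintext-ciphertext product would be needed under encryption
scale = gamma / np.sqrt(var + eps)
W_fused = scale[:, None] * W
b_fused = scale * (b - mean) + beta
y_fused = W_fused @ x + b_fused

assert np.allclose(y_seq, y_fused)
```

Because the folding happens entirely on plaintext model weights before encryption, the server evaluates one linear map instead of two scaled ones.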

📄 Content

In today’s digital era, spurred by the rapid advancement of artificial intelligence, machine learning has achieved significant breakthroughs in both software and hardware and has gradually become the direction of development across a wide range of industries. As data volumes grow and hardware computing power continues to rise, Machine Learning as a Service (MLaaS) has become a preferred choice for many practitioners [1]. MLaaS is a cloud-based service whose primary goal is to lower the entry barriers to machine learning, enabling users from various fields to focus on their business demands without worrying about hardware resources, algorithm implementation, or deployment details [2], [3]. However, MLaaS not only encounters conventional adversarial attacks [4]-[6], but also suffers from privacy-specific breaches of both sample data and model information, such as membership inference attacks [7]-[10], model inversion [11], and model extraction [12]-[14]. A main concern of clients is sample data leakage: uploading sensitive data to the cloud server carries large risks of information leakage and privacy breach, which strongly motivates researchers to study privacy-preserving machine learning (PPML) in real-world applications.

PPML is a technical paradigm designed to accomplish machine learning tasks while ensuring data privacy. It addresses the leakage of sensitive data in the machine learning pipeline, such as medical records, financial transactions, and personal behavioral data. Among the core PPML techniques, fully homomorphic encryption (FHE), which keeps the data encrypted throughout the entire computation, provides the highest level of privacy protection in theory and has increasingly become the choice of many researchers for executing PPML.

With the success of deep neural networks, secure inference for deep-learning models in an FHE environment is becoming a promising privacy-preserving MLaaS solution. FHE supports arbitrary homomorphic additions and multiplications on ciphertexts without the risk of decryption errors [15], [16]. Fig. 1 briefly illustrates the process of secure deep CNN inference over FHE: a user encrypts private data with the FHE public key and sends the encrypted data to the server in the cloud; the server performs deep CNN inference on the encrypted data and returns the encrypted result to the user, who decrypts it with the private key to obtain the final plaintext outcome. Throughout this pipeline, the user’s private data remains encrypted during the entire inference process, ensuring that the server cannot access any of its content. Scholars have predominantly focused on constructing deep CNNs over homomorphic encryption; for example, Gilad-Bachrach et al. [17] and Badawi [18] successfully deployed neural networks in encrypted environments. Nevertheless, practicality is severely constrained by shallow network depth and the absence of non-linear activation functions, because the multiplicative depth available in leveled homomorphic encryption (Leveled-HE) is limited. When FHE is subsequently employed to build deeper neural networks, the bootstrapping operation [16] becomes necessary: it refreshes both the noise level and the multiplicative depth of ciphertext computation, effectively turning Leveled-HE into FHE and making arbitrarily deep neural networks possible. In [19], bootstrapping was used for the first time to realize a deep neural network over FHE.
However, that approach employed a high-degree polynomial to approximate the activation function ReLU, which consumed considerable multiplicative depth and required many bootstrapping operations, ultimately resulting in prolonged inference time.
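To make the activation-approximation trade-off concrete, the snippet below fits a low-degree polynomial to SiLU in a Legendre basis by least squares and reports the worst-case error on the fitting interval. The degree and interval here are illustrative assumptions; the paper's actual coefficients and input range are not reproduced.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

# Fit on a bounded interval: under CKKS, inputs must be range-limited anyway.
lo, hi = -8.0, 8.0   # illustrative interval
deg = 7              # low degree => low multiplicative depth
xs = np.linspace(lo, hi, 4001)

# Least-squares fit in the Legendre basis (better conditioned than monomials)
coeffs = leg.legfit(xs, silu(xs), deg)
approx = leg.legval(xs, coeffs)

max_err = np.max(np.abs(approx - silu(xs)))
print(f"degree-{deg} Legendre fit, max |error| on [{lo}, {hi}] = {max_err:.4f}")
```

A degree-7 polynomial can be evaluated in about ceil(log2(7)) = 3 levels of multiplicative depth, which is why keeping the degree low directly cuts the number of bootstrapping operations a deep network needs.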

As is well known, information propagates along a long path of layers in deep CNNs, so operating on encrypted data over FHE costs substantial time. A common and effective remedy is to optimize the convolution computation via an appropriate data-packing scheme in order to reduce the overall overhead. For instance, GAZELLE [20] offers an efficient FHE-matched convolution algorithm that significantly lowers the number of homomorphic operations needed for traditional convolution in the ciphertext domain. However, in deeper neural networks, the increasing number of input/output channels still leads to high overhead. Further, building on GAZELLE [20], Lee [21] and Lee [19] propose more efficient convolution algorithms and succeed in implementing deeper neural networks. Nonetheless, due to the high computational cost of convolution and the substantial multiplicative depth required for the activation function ReLU, their approaches still demand more o
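The rotate-and-accumulate pattern behind GAZELLE-style packed convolution can be simulated in plaintext: when an input is packed into the slots of a single ciphertext, a size-k kernel costs k slot rotations plus k plaintext multiplications rather than one ciphertext per input element. In this sketch np.roll stands in for a homomorphic slot rotation; it is a plaintext illustration of the packing idea, not an FHE implementation.

```python
import numpy as np

def rotate(slots, s):
    """Stand-in for a homomorphic slot rotation: slot i takes value slots[i+s]."""
    return np.roll(slots, -s)

def packed_conv1d(slots, kernel):
    """Circular 1-D convolution as k rotations + k scalar products + additions,
    mirroring how one packed ciphertext would be convolved under CKKS."""
    acc = np.zeros_like(slots)
    for j, w in enumerate(kernel):
        acc = acc + w * rotate(slots, j - len(kernel) // 2)
    return acc

rng = np.random.default_rng(1)
v = rng.normal(size=16)          # one "ciphertext" holding 16 packed values
k = np.array([0.25, 0.5, 0.25])  # illustrative size-3 kernel

out = packed_conv1d(v, k)

# Reference: direct circular convolution, element by element
n = len(v)
ref = np.array([sum(k[j] * v[(i + j - 1) % n] for j in range(3))
                for i in range(n)])
assert np.allclose(out, ref)
```

The appeal of packing is amortization: all 16 slots are convolved by the same 3 rotations, so the per-element cost shrinks as the slot count grows, which is exactly the lever FastFHE's data-packing scheme pulls.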

This content is AI-processed based on ArXiv data.
