Navigating the Deep: End-to-End Extraction on Deep Neural Networks

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Neural network model extraction has recently emerged as an important security concern, as adversaries attempt to recover a network’s parameters via black-box queries. At CRYPTO'20, Carlini et al. proposed a model extraction approach consisting of two steps: signature extraction and sign extraction. In practice, however, this signature-extraction method is limited to very shallow networks, and the proposed sign-extraction method runs in exponential time. Recently, Canales-Martínez et al. (Eurocrypt'24) proposed a polynomial-time sign-extraction method, but it assumes the corresponding signatures have already been successfully extracted and can fail on so-called low-confidence neurons. In this work, we first revisit and refine the signature-extraction process by systematically identifying and, for the first time, addressing critical limitations of Carlini et al.’s method, including rank deficiency and noise propagation from deeper layers. We propose efficient algorithmic solutions for each of the identified issues, permitting the extraction of much deeper networks than previously possible. In addition, we introduce new methods to improve numerical precision in signature extraction, and we enhance the sign-extraction step by combining two polynomial-time methods, avoiding exponential exhaustive search on low-confidence neurons. This yields the very first end-to-end model extraction method that runs in polynomial time. We validate our attack through extensive experiments on ReLU-based neural networks, demonstrating significant improvements in extraction depth: our attack consistently extracts at least eight layers of networks trained on either MNIST or CIFAR-10, whereas previous works could barely extract the first three layers of networks of similar width.


💡 Research Summary

Model extraction attacks aim to reconstruct the parameters of a deployed neural network by only observing its input–output behavior. Since the seminal work of Carlini et al. (CRYPTO 2020), the community has viewed this problem through a cryptanalytic lens: a two‑step procedure first recovers the absolute values of the weights (the “signature”) and then determines their signs. While the signature‑extraction step works in principle, the original method suffers from two fundamental issues that become fatal as the depth of the network grows. First, the linear systems built from critical points often become rank‑deficient, leading to incorrect signatures for deeper layers. Second, activations from layers below the target layer can “leak” into the observed critical points, causing noise propagation that corrupts the recovered signatures. Moreover, the sign‑extraction step proposed by Carlini et al. is exponential in the number of neurons, and even the polynomial‑time improvement of Canales‑Martínez et al. (Eurocrypt 2024) assumes that perfect signatures are already available and still fails on low‑confidence neurons, resorting to exhaustive search.

The present paper systematically diagnoses these bottlenecks and proposes concrete algorithmic remedies that together enable the first end‑to‑end model extraction attack running in polynomial time on substantially deeper networks.
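To make the critical-point machinery behind this two-step procedure concrete, the sketch below locates a point where one ReLU of a toy network changes state, by binary-searching for a kink in the output along a line segment. The toy network and all helper names are hypothetical illustrations, not the paper's implementation; the actual attack sees the target only through black-box queries.

```python
import numpy as np

# Toy two-layer ReLU network standing in for the black-box target.
W1 = np.array([[1.0, 0.0], [0.0, 1.0]])
b1 = np.array([0.5, -0.5])
w2 = np.array([1.0, 2.0])

def f(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

def slope(x, d, eps=1e-6):
    # Finite-difference slope of f along direction d.
    return (f(x + eps * d) - f(x)) / eps

def find_critical_point(x0, x1, tol=1e-10):
    """Binary search on the segment [x0, x1] for a point where the
    directional slope changes, i.e. where some ReLU flips state."""
    d = x1 - x0
    s0 = slope(x0, d)
    if np.isclose(s0, slope(x1, d)):
        return None  # no kink detected on this segment
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.isclose(slope(x0 + mid * d, d), s0):
            lo = mid  # still on the same linear piece as x0
        else:
            hi = mid
    return x0 + 0.5 * (lo + hi) * d
```

At a critical point, one first-layer preactivation vanishes; collecting many such points yields the linear equations from which a layer's signature is solved.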

Signature extraction improvements

  1. Rank augmentation – The authors observe that the set of critical points generated by the original algorithm does not span the full weight space for deep layers. They therefore introduce a controlled random perturbation scheme that generates additional, linearly independent critical points. By augmenting the original query set and solving an expanded linear system, they guarantee full rank (proved via singular‑value analysis) and thus obtain correct absolute weight values.
  2. Deep‑layer noise filtering – To prevent activations from lower layers from contaminating the target layer’s signature, the paper tracks the activation pattern S(i) and the associated polytope Pₓ for each input. Using this information, they predict the contribution of deeper layers and subtract it before solving for the signature. This filtering dramatically reduces the variance of the recovered signatures and allows the method to scale to eight‑layer networks.
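The rank-augmentation idea in step 1 can be sketched as follows, with hypothetical helper names: test the linear system's rank via its singular values, and keep appending rows obtained from freshly perturbed queries until the system reaches full column rank.

```python
import numpy as np

def has_full_column_rank(A, tol=1e-8):
    # Rank test via singular values: the smallest must be
    # comfortably above zero relative to the largest.
    s = np.linalg.svd(A, compute_uv=False)
    return A.shape[0] >= A.shape[1] and s[-1] > tol * s[0]

def augment_until_full_rank(rows, fresh_row, dim, max_extra=100):
    """Stack existing equation rows; while the system is rank-deficient,
    append rows derived from new (perturbed) critical points."""
    A = np.asarray(rows, dtype=float).reshape(-1, dim)
    for _ in range(max_extra):
        if has_full_column_rank(A):
            break
        A = np.vstack([A, fresh_row()])
    return A
```

Here `fresh_row` stands in for one black-box query that contributes one new linear equation; in the attack, such rows come from the controlled random perturbations described above.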

Sign extraction enhancements
The low‑confidence neuron problem is tackled by a hybrid of two polynomial‑time techniques, avoiding the exponential exhaustive search that earlier approaches fell back on for such neurons. Combined with the improved signature extraction, this yields the first end‑to‑end extraction attack that runs in polynomial time.
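For context, the exponential baseline that polynomial-time sign extraction replaces can be demonstrated on a toy hidden layer: given the signature (each neuron's weights and bias known up to a per-neuron sign), a naive attacker enumerates all 2^n sign patterns and keeps the one that reproduces the oracle. The toy network and helper below are an illustrative sketch, not the paper's algorithm.

```python
import numpy as np
from itertools import product

# Toy black-box: one hidden ReLU layer. The attacker knows each hidden
# row and bias only up to a per-neuron sign (the "signature").
W_sig = np.array([[1.5, -2.0], [0.5, -1.0]])
b_sig = np.array([0.3, 0.7])
true_signs = np.array([1.0, -1.0])  # ground truth, hidden from the attacker
v_true = np.array([1.0, -1.0])

def oracle(x):
    h = np.maximum((true_signs[:, None] * W_sig) @ x + true_signs * b_sig, 0.0)
    return v_true @ h

def recover_signs(probes, ys):
    """Try every sign pattern; for each, fit the output layer by least
    squares and keep the pattern with the smallest residual. Cost: 2^n."""
    best, best_res = None, np.inf
    for signs in product([1.0, -1.0], repeat=W_sig.shape[0]):
        s = np.asarray(signs)
        H = np.maximum(probes @ (s[:, None] * W_sig).T + s * b_sig, 0.0)
        v_fit, *_ = np.linalg.lstsq(H, ys, rcond=None)
        res = np.linalg.norm(H @ v_fit - ys)
        if res < best_res:
            best, best_res = s, res
    return best
```

The cost doubles with every neuron, which is exactly why a polynomial-time alternative for low-confidence neurons matters.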

