Stealing Machine Learning Models via Prediction APIs


Authors: Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart

Florian Tramèr, EPFL; Fan Zhang, Cornell University; Ari Juels, Cornell Tech, Jacobs Institute; Michael K. Reiter, UNC Chapel Hill; Thomas Ristenpart, Cornell Tech

Abstract

Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis.

The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.

1 Introduction

Machine learning (ML) aims to provide automated extraction of insights from data by means of a predictive model. A predictive model is a function that maps feature vectors to a categorical or real-valued output.
In a supervised setting, a previously gathered data set consisting of possibly confidential feature-vector inputs (e.g., digitized health records) with corresponding output class labels (e.g., a diagnosis) serves to train a predictive model that can generate labels on future inputs. Popular models include support vector machines (SVMs), logistic regressions, neural networks, and decision trees.

[This is an extended version of a paper that appeared at USENIX Security, 2016.]

ML algorithms' success in the lab and in practice has led to an explosion in demand. Open-source frameworks such as PredictionIO and cloud-based services offered by Amazon, Google, Microsoft, BigML, and others have arisen to broaden and simplify ML model deployment. Cloud-based ML services often allow model owners to charge others for queries to their commercially valuable models. This pay-per-query deployment option exemplifies an increasingly common tension: The query interface of an ML model may be widely accessible, yet the model itself and the data on which it was trained may be proprietary and confidential. Models may also be privacy-sensitive because they leak information about training data [4, 23, 24]. For security applications such as spam or fraud detection [9, 29, 36, 56], an ML model's confidentiality is critical to its utility: An adversary that can learn the model can also often evade detection [4, 36].

In this paper we explore model extraction attacks, which exploit the tension between query access and confidentiality in ML models. We consider an adversary that can query an ML model (a.k.a. a prediction API) to obtain predictions on input feature vectors. The model may be viewed as a black box. The adversary may or may not know the model type (logistic regression, decision tree, etc.) or the distribution over the data used to train the model.
The adversary's goal is to extract an equivalent or near-equivalent ML model, i.e., one that achieves (close to) 100% agreement on an input space of interest.

We demonstrate successful model extraction attacks against a wide variety of ML model types, including decision trees, logistic regressions, SVMs, and deep neural networks, and against production ML-as-a-service (MLaaS) providers, including Amazon and BigML.¹ In nearly all cases, our attacks yield models that are functionally very close to the target. In some cases, our attacks extract the exact parameters of the target (e.g., the coefficients of a linear classifier or the paths of a decision tree). For some targets employing a model type, parameters or features unknown to the attacker, we additionally show a successful preliminary attack step involving reverse-engineering these model characteristics.

Table 1: Results of model extraction attacks on ML services. For each target model, we report the number of prediction queries made to the ML API in an attack that extracts a 100% equivalent model. The attack time is primarily influenced by the service's prediction latency (≈100ms/query for Amazon and ≈500ms/query for BigML).

  Service  Model Type           Data set       Queries  Time (s)
  Amazon   Logistic Regression  Digits         650      70
  Amazon   Logistic Regression  Adult          1,485    149
  BigML    Decision Tree        German Credit  1,150    631
  BigML    Decision Tree        Steak Survey   4,013    2,088

Our most successful attacks rely on the information-rich outputs returned by the ML prediction APIs of all cloud-based services we investigated. Those of Google, Amazon, Microsoft, and BigML all return high-precision confidence values in addition to class labels. They also respond to partial queries lacking one or more features. Our setting thus differs from traditional learning-theory settings [3, 7, 8, 15, 30, 33, 36, 54] that assume only membership queries, outputs consisting of a class label only.
For example, for logistic regression, the confidence value is a simple log-linear function 1/(1 + e^(−(w·x + β))) of the d-dimensional input vector x. By querying d + 1 random d-dimensional inputs, an attacker can with high probability solve for the unknown d + 1 parameters w and β defining the model. We emphasize that while this model extraction attack is simple and non-adaptive, it affects all of the ML services we have investigated.

Such equation-solving attacks extend to multiclass logistic regressions and neural networks, but do not work for decision trees, a popular model choice. (BigML, for example, initially offered only decision trees.) For decision trees, a confidence value reflects the number of training data points labeled correctly on an input's path in the tree; simple equation-solving is thus inapplicable. We show how confidence values can nonetheless be exploited as pseudo-identifiers for paths in the tree, facilitating discovery of the tree's structure. We demonstrate successful model extraction attacks that use adaptive, iterative search algorithms to discover paths in a tree.

We experimentally evaluate our attacks by training models on an array of public data sets suitable as stand-ins for proprietary ones. We validate the attacks locally using standard ML libraries, and then present case studies on BigML and Amazon. For both services, we show computationally fast attacks that use a small number of queries to extract models matching the targets on 100% of tested inputs. See Table 1 for a quantitative summary.

[1: We simulated victims by training models in our own accounts. We have disclosed our results to affected services in February 2016.]

Having demonstrated the broad applicability of model extraction attacks to existing services, we consider the most obvious potential countermeasure ML services might adopt: Omission of confidence values, i.e., output of class labels only.
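The logistic-regression case can be made concrete with a short sketch. A local function stands in for the remote prediction API; the secret w and β are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

# Hypothetical secret model behind the prediction API:
# the API returns the confidence value 1 / (1 + e^{-(w·x + β)}).
w_secret = rng.normal(size=d)
beta_secret = 0.5

def api_query(x):
    return 1.0 / (1.0 + np.exp(-(x @ w_secret + beta_secret)))

# Attack: d + 1 random, non-adaptive queries give d + 1 linear
# equations  w·x + β = σ^{-1}(f_1(x))  in the d + 1 unknowns.
X = rng.normal(size=(d + 1, d))
p = np.array([api_query(x) for x in X])
logits = np.log(p / (1.0 - p))            # inverse sigmoid
A = np.hstack([X, np.ones((d + 1, 1))])   # unknowns: (w, β)
sol = np.linalg.solve(A, logits)
w_hat, beta_hat = sol[:d], sol[d]
```

Because the queries are random rather than adaptive, all d + 1 of them could be submitted in a single batch request.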
This approach would place model extraction back in the membership query setting of prior work in learning theory [3, 8, 36, 54]. We demonstrate a generalization of an adaptive algorithm by Lowd and Meek [36] from binary linear classifiers to more complex model types, and also propose an attack inspired by the agnostic learning algorithm of Cohn et al. [18]. Our new attacks extract models matching targets on >99% of the input space for a variety of model classes, but need up to 100× more queries than equation-solving attacks (specifically for multiclass linear regression and neural networks). While less effective than equation-solving, these attacks remain attractive for certain types of adversary. We thus discuss further ideas for countermeasures.

In summary, we explore model extraction attacks, a practical kind of learning task that, in particular, affects emerging cloud-based ML services being built by Amazon, Google, Microsoft, BigML, and others. We show:

• Simple equation-solving model extraction attacks that use non-adaptive, random queries to solve for the parameters of a target model. These attacks affect a wide variety of ML models that output confidence values. We show their success against Amazon's service (using our own models as stand-ins for victims'), and also report successful reverse-engineering of the (only partially documented) model type employed by Amazon.

• A new path-finding algorithm for extracting decision trees that abuses confidence values as quasi-identifiers for paths. To our knowledge, this is the first example of practical "exact" decision tree learning. We demonstrate the attack's efficacy via experiments on BigML.

• Model extraction attacks against models that output only class labels, the obvious countermeasure against extraction attacks that rely on confidence values. We show slower, but still potentially dangerous, attacks in this setting that build on prior work in learning theory.
We additionally make a number of observations about the implications of extraction. For example, attacks against Amazon's system indirectly leak various summary statistics about a private training set, while extraction against kernel logistic regression models [58] recovers significant information about individual training data points.

The source code for our attacks is available online at https://github.com/ftramer/Steal-ML.

2 Background

For our purposes, a ML model is a function f : X → Y. An input is a d-dimensional vector in the feature space X = X_1 × X_2 × ··· × X_d. Outputs lie in the range Y. We distinguish between categorical features, which assume one of a finite set of values (whose set size is the arity of the feature), and continuous features, which assume a value in a bounded subset of the real numbers. Without loss of generality, for a categorical feature of arity k, we let X_i = Z_k. For a continuous feature taking values between bounds a and b, we let X_i = [a, b] ⊂ R.

Inputs to a model may be pre-processed to perform feature extraction. In this case, inputs come from a space M, and feature extraction involves application of a function ex : M → X that maps inputs into a feature space. Model application then proceeds by composition in the natural way, taking the form f(ex(M)). Generally, feature extraction is many-to-one. For example, M may be a piece of English language text and the extracted features counts of individual words (so-called "bag-of-words" feature extraction). Other examples are input scaling and one-hot-encoding of categorical features.

We focus primarily on classification settings in which f predicts a nominal variable ranging over a set of classes. Given c classes, we use as class labels the set Z_c. If Y = Z_c, the model returns only the predicted class label.
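The feature-extraction composition f(ex(·)) can be illustrated with a minimal sketch; the categories, weights, and helper names below are invented for the example and are not from the paper:

```python
import numpy as np

# Minimal sketch of many-to-one feature extraction ex : M -> X,
# here one-hot-encoding a categorical input of arity 3.
CATEGORIES = ["red", "green", "blue"]  # hypothetical feature values

def ex(color):
    """Map a raw input in M to a feature vector in X = {0,1}^3."""
    x = np.zeros(len(CATEGORIES))
    x[CATEGORIES.index(color)] = 1.0
    return x

def f(x):
    """A toy linear model applied to the extracted features."""
    w = np.array([0.2, -0.4, 0.9])
    return float(w @ x)

# Model application proceeds by composition: f(ex(m)).
y = f(ex("blue"))
```

Note that ex is many-to-one in general; here distinct raw inputs mapping to the same category become indistinguishable to f.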
In some applications, however, additional information is often helpful, in the form of real-valued measures of confidence on the labels output by the model; these measures are called confidence values. The output space is then Y = [0, 1]^c. For a given x ∈ X and i ∈ Z_c, we denote by f_i(x) the i-th component of f(x) ∈ Y. The value f_i(x) is a model-assigned probability that x has associated class label i. The model's predicted class is defined by the value argmax_i f_i(x), i.e., the most probable label.

We associate with Y a distance measure d_Y. We drop the subscript Y when it is clear from context. For Y = Z_c we use 0-1 distance, meaning d(y, y′) = 0 if y = y′ and d(y, y′) = 1 otherwise. For Y = [0, 1]^c, we use the 0-1 distance when comparing predicted classes; when comparing class probabilities directly, we instead use the total variation distance, given by d(y, y′) = ½ ∑_i |y[i] − y′[i]|. In the rest of this paper, unless explicitly specified otherwise, d_Y refers to the 0-1 distance over class labels.

Training algorithms. We consider models obtained via supervised learning. These models are generated by a training algorithm T that takes as input a training set {(x_i, y_i)}_i, where (x_i, y_i) ∈ X × Y is an input with an associated (presumptively correct) class label. The output of T is a model f defined by a set of parameters, which are model-specific, and hyper-parameters, which specify the type of models T generates. Hyper-parameters may be viewed as distinguished parameters, often taken from a small number of standard values; for example, the kernel-type used in an SVM, of which only a small set are used in practice, may be seen as a hyper-parameter.

Figure 1: Diagram of ML model extraction attacks. A data owner has a model f trained on its data and allows others to make prediction queries. An adversary uses q prediction queries to extract an f̂ ≈ f.

3 Model Extraction Attacks

An ML model extraction attack arises when an adversary obtains black-box access to some target model f and attempts to learn a model f̂ that closely approximates, or even matches, f (see Figure 1).

As mentioned previously, the restricted case in which f outputs class labels only matches the membership query setting considered in learning theory, e.g., PAC learning [54] and other previous works [3, 7, 8, 15, 30, 33, 36]. Learning theory algorithms have seen only limited study in practice, e.g., in [36], and our investigation may be viewed as a practice-oriented exploration of this branch of research. Our initial focus, however, is on a different setting common in today's MLaaS services, which we now explain in detail. Models trained by these services emit data-rich outputs that often include confidence values, and in which partial feature vectors may be considered valid inputs. As we show later, this setting greatly advantages adversaries.

Machine learning services. A number of companies have launched or are planning to launch cloud-based ML services. A common denominator is the ability of users to upload data sets, have the provider run training algorithms on the data, and make the resulting models generally available for prediction queries. Simple-to-use Web APIs handle the entire interaction. This service model lets users capitalize on their data without having to set up their own large-scale ML infrastructure. Details vary greatly across services. We summarize a number of them in Table 2 and now explain some of the salient features.

A model is white-box if a user may download a representation suitable for local use.
It is black-box if accessible only via a prediction query interface. Amazon and Google, for example, provide black-box-only services. Google does not even specify what training algorithm their service uses, while Amazon provides only partial documentation for its feature extraction ex (see Section 5). Some services allow users to monetize trained models by charging others for prediction queries.

Table 2: Particularities of major MLaaS providers. 'White-box' refers to the ability to download and use a trained model locally, and 'Monetize' means that a user may charge other users for black-box access to her models. Model support for each service is obtained from available documentation. The models listed for Google's API are a projection based on the announced support of models in standard PMML format [25]. Details on ML models are given in Appendix A.

  Service            White-box  Monetize  Confidence Scores  Logistic Regression  SVM  Neural Network  Decision Tree
  Amazon [1]         no         no        yes                yes                  no   no              no
  Microsoft [38]     no         no        yes                yes                  yes  yes             yes
  BigML [11]         yes        yes       yes                yes                  no   no              yes
  PredictionIO [44]  yes        no        no                 yes                  yes  no              yes
  Google [25]        no         yes       yes                yes                  yes  yes             yes

To use these services, a user uploads a data set and optionally applies some data pre-processing (e.g., field removal or handling of missing values). She then trains a model by either choosing one of many supported model classes (as in BigML, Microsoft, and PredictionIO) or having the service choose an appropriate model class (as in Amazon and Google). Two services have also announced upcoming support for users to upload their own trained models (Google) and their own custom learning algorithms (PredictionIO). When training a model, users may tune various parameters of the model or training algorithm (e.g., regularizers, tree size, learning rates) and control feature-extraction and transformation methods.

For black-box models, the service provides users with information needed to create and interpret predictions, such as the list of input features and their types.
Some services also supply the model class, chosen training parameters, and training data statistics (e.g., BigML gives the range, mean, and standard deviation of each feature).

To get a prediction from a model, a user sends one or more input queries. The services we reviewed accept both synchronous requests and asynchronous 'batch' requests for multiple predictions. We further found varying degrees of support for 'incomplete' queries, in which some input features are left unspecified [47]. We will show that exploiting incomplete queries can drastically improve the success of some of our attacks. Apart from PredictionIO, all of the services we examined respond to prediction queries with not only class labels, but a variety of additional information, including confidence scores (typically class probabilities) for the predicted outputs.

Google and BigML allow model owners to monetize their models by charging other users for predictions. Google sets a minimum price of $0.50 per 1,000 queries. On BigML, 1,000 queries consume at least 100 credits, costing $0.10–$5, depending on the user's subscription.

Attack scenarios. We now describe possible motivations for adversaries to perform model extraction attacks. We then present a more detailed threat model informed by characteristics of the aforementioned ML services.

Avoiding query charges. Successful monetization of prediction queries by the owner of an ML model f requires confidentiality of f. A malicious user may seek to launch what we call a cross-user model extraction attack, stealing f for subsequent free use. More subtly, in black-box-only settings (e.g., Google and Amazon), a service's business model may involve amortizing up-front training costs by charging users for future predictions. A model extraction attack will undermine the provider's business model if a malicious user pays less for training and extracting than for paying per-query charges.
Violating training-data privacy. Model extraction could, in turn, leak information about sensitive training data. Prior attacks such as model inversion [4, 23, 24] have shown that access to a model can be abused to infer information about training set points. Many of these attacks work better in white-box settings; model extraction may thus be a stepping stone to such privacy-abusing attacks. Looking ahead, we will see that in some cases, significant information about training data is leaked trivially by successful model extraction, because the model itself directly incorporates training set points.

Stepping stone to evasion. In settings where an ML model serves to detect adversarial behavior, such as identification of spam, malware classification, and network anomaly detection, model extraction can facilitate evasion attacks. An adversary may use knowledge of the ML model to avoid detection by it [4, 9, 29, 36, 56].

In all of these settings, there is an inherent assumption of secrecy of the ML model in use. We show that this assumption is broken for all ML APIs that we investigate.

Threat model in detail. Two distinct adversarial models arise in practice. An adversary may be able to make direct queries, providing an arbitrary input x to a model f and obtaining the output f(x). Or the adversary may be able to make only indirect queries, i.e., queries on points in input space M yielding outputs f(ex(M)). The feature extraction mechanism ex may be unknown to the adversary. In Section 5, we show how ML APIs can further be exploited to "learn" feature extraction mechanisms. Both direct and indirect access to f arise in ML services. (Direct query interfaces arise when clients are expected to perform feature extraction locally.) In either case, the output value can be a class label, a confidence value vector, or some data structure revealing various levels of information, depending on the exposed API.
We model the adversary, denoted by A, as a randomized algorithm. The adversary's goal is to use as few queries as possible to f in order to efficiently compute an approximation f̂ that closely matches f. We formalize "closely matching" using two different error measures:

• Test error R_test: This is the average error over a test set D, given by R_test(f, f̂) = ∑_{(x,y)∈D} d(f(x), f̂(x)) / |D|. A low test error implies that f̂ matches f well for inputs distributed like the training data samples.²

• Uniform error R_unif: For a set U of vectors uniformly chosen in X, let R_unif(f, f̂) = ∑_{x∈U} d(f(x), f̂(x)) / |U|. Thus R_unif estimates the fraction of the full feature space on which f and f̂ disagree. (In our experiments, we found |U| = 10,000 was sufficiently large to obtain stable error estimates for the models we analyzed.)

We define the extraction accuracy under test and uniform error as 1 − R_test(f, f̂) and 1 − R_unif(f, f̂). Here we implicitly refer to accuracy under 0-1 distance. When assessing how close the class probabilities output by f̂ are to those of f (with the total-variation distance) we use the notations R^TV_test(f, f̂) and R^TV_unif(f, f̂).

An adversary may know any of a number of pieces of information about a target f: What training algorithm T generated f, the hyper-parameters used with T, the feature extraction function ex, etc. We will investigate a variety of settings in this work corresponding to different APIs seen in practice. We assume that A has no more information about a model's training data than what is provided by an ML API (e.g., summary statistics). For simplicity, we focus on proper model extraction: If A believes that f belongs to some model class, then A's goal is to extract a model f̂ from the same class.
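The uniform-error estimate is straightforward to compute. A minimal sketch with toy stand-ins for f and f̂ (the models below are invented for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
w = rng.normal(size=d)
w_hat = w + 1e-3 * rng.normal(size=d)   # an almost-equivalent extraction

def f(x):
    return int(x @ w > 0)               # target's class label (0-1 output)

def f_hat(x):
    return int(x @ w_hat > 0)           # extracted model's class label

# R_unif: average 0-1 distance over uniformly random points of X = [-1, 1]^d,
# estimating the fraction of feature space where f and f_hat disagree.
U = rng.uniform(-1, 1, size=(10_000, d))
R_unif = np.mean([f(x) != f_hat(x) for x in U])
accuracy_unif = 1.0 - R_unif            # extraction accuracy under uniform error
```

R_test is computed the same way, with U replaced by a held-out test set drawn from the training distribution.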
We discuss some intuition in favor of proper extraction in Appendix D, and leave a broader treatment of improper extraction strategies as an interesting open problem.

4 Extraction with Confidence Values

We begin our study of extraction attacks by focusing on prediction APIs that return confidence values. As per Section 2, the output of a query to f thus falls in a range [0, 1]^c where c is the number of classes. To motivate this, we recall that most ML APIs reveal confidence values for models that support them (see Table 2). This includes logistic regressions (LR), neural networks, and decision trees, defined formally in Appendix A. We first introduce a generic equation-solving attack that applies to all logistic models (LR and neural networks). In Section 4.2, we present two novel path-finding attacks on decision trees.

4.1 Equation-Solving Attacks

Many ML models we consider directly compute class probabilities as a continuous function of the input x and real-valued model parameters.

[2: Note that for some D, it is possible that f̂ predicts true labels better than f, yet R_test(f, f̂) is large, because f̂ does not closely match f.]

Table 3: Data sets used for extraction attacks. We train two models on the Adult data, with targets 'Income' and 'Race'. SVMs and binary logistic regressions are trained on data sets with 2 classes. Multiclass regressions and neural networks are trained on multiclass data sets. For decision trees, we use a set of public models shown in Table 5.

  Data set        Synthetic  # records  # classes  # features
  Circles         Yes        5,000      2          2
  Moons           Yes        5,000      2          2
  Blobs           Yes        5,000      3          2
  5-Class         Yes        1,000      5          20
  Adult (Income)  No         48,842     2          108
  Adult (Race)    No         48,842     5          105
  Iris            No         150        3          4
  Steak Survey    No         331        5          40
  GSS Survey      No         16,127     3          101
  Digits          No         1,797      10         64
  Breast Cancer   No         683        2          10
  Mushrooms       No         8,124      2          112
  Diabetes        No         768        2          8
In this case, an API that reveals these class probabilities provides an adversary A with samples (x, f(x)) that can be viewed as equations in the unknown model parameters. For a large class of models, these equation systems can be efficiently solved, thus recovering f (or some good approximation of it).

Our approach for evaluating attacks will primarily be experimental. We use a suite of synthetic or publicly available data sets to serve as stand-ins for proprietary data that might be the target of an extraction attack. Table 3 displays the data sets used in this section, which we obtained from various sources: the synthetic ones we generated; the others are taken from public surveys (Steak Survey [26] and GSS Survey [50]), from scikit [43] (Digits) or from the UCI ML library [35]. More details about these data sets are in Appendix B. Before training, we remove rows with missing values, apply one-hot-encoding to categorical features, and scale all numeric features to the range [−1, 1]. We train our models over a randomly chosen subset of 70% of the data, and keep the rest for evaluation (i.e., to calculate R_test). We discuss the impact of different pre-processing and feature extraction steps in Section 5, when we evaluate equation-solving attacks on production ML services.

4.1.1 Binary logistic regression

As a simple starting point, we consider the case of logistic regression (LR). A LR model performs binary classification (c = 2), by estimating the probability of a binary response, based on a number of independent features. LR is one of the most popular binary classifiers, due to its simplicity and efficiency. It is widely used in many scientific fields (e.g., medical and social sciences) and is supported by all the ML services we reviewed. Formally, a LR model is defined by parameters w ∈ R^d, β ∈ R, and outputs a probability f_1(x) = σ(w·x + β), where σ(t) = 1/(1 + e^(−t)).
LR is a linear classifier: it defines a hyperplane in the feature space X (defined by w·x + β = 0), that separates the two classes. Given an oracle sample (x, f(x)), we get a linear equation w·x + β = σ^(−1)(f_1(x)). Thus, d + 1 samples are both necessary and sufficient (if the queried x are linearly independent) to recover w and β. Note that the required samples are chosen non-adaptively, and can thus be obtained from a single batch request to the ML service.

We stress that while this extraction attack is rather straightforward, it directly applies, with possibly devastating consequences, to all cloud-based ML services we considered. As an example, recall that some services (e.g., BigML and Google) let model owners monetize black-box access to their models. Any user who wishes to make more than d + 1 queries to a model would then minimize the prediction cost by first running a cross-user model extraction attack, and then using the extracted model for personal use, free of charge. As mentioned in Section 3, attackers with a final goal of model-inversion or evasion may also have incentives to first extract the model. Moreover, for services with black-box-only access (e.g., Amazon or Google), a user may abuse the service's resources to train a model over a large data set D (i.e., |D| ≫ d), and extract it after only d + 1 predictions. Crucially, the extraction cost is independent of |D|. This could undermine a service's business model, should prediction fees be used to amortize the high cost of training.

For each binary data set shown in Table 3, we train a LR model and extract it given d + 1 predictions. In all cases, we achieve R_test = R_unif = 0. If we compare the probabilities output by f and f̂, R^TV_test and R^TV_unif are lower than 10^−9. For these models, the attack requires only 41 queries on average, and 113 at most.
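The d + 1 query bound is easy to reproduce locally. A sketch using scikit-learn as the victim's training algorithm (with synthetic data in place of the data sets in Table 3; this mirrors the attack's structure, not the paper's actual experiment code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
d = 8
X_train = rng.normal(size=(500, d))
# Noisy labels keep the problem non-separable and the coefficients moderate.
y_train = (X_train @ rng.normal(size=d) + rng.normal(size=500) > 0).astype(int)

# The "victim": a binary LR trained locally as a stand-in for a model
# hosted by an ML service; the attacker sees only its confidence values.
victim = LogisticRegression().fit(X_train, y_train)

def api(x):
    return victim.predict_proba(x.reshape(1, -1))[0, 1]  # f_1(x)

# Attack: d + 1 non-adaptive queries, then solve the linear system
# w·x + β = σ^{-1}(f_1(x)).
Q = rng.normal(size=(d + 1, d))
p = np.array([api(x) for x in Q])
A = np.hstack([Q, np.ones((d + 1, 1))])
sol = np.linalg.solve(A, np.log(p / (1 - p)))
w_hat, b_hat = sol[:d], sol[d]

# The extracted model matches the victim's predictions on held-out inputs.
X_test = rng.normal(size=(1000, d))
agree = np.mean((X_test @ w_hat + b_hat > 0) == victim.predict(X_test))
```

The recovered (w_hat, b_hat) equal the victim's coefficients up to floating-point error, so agreement is total, matching the paper's R_test = R_unif = 0 result.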
On Google's platform for example, an extraction attack would cost less than $0.10, and subvert any further model monetization.

4.1.2 Multiclass LRs and Multilayer Perceptrons

We now show that such equation-solving attacks broadly extend to all model classes with a 'logistic' layer, including multiclass (c > 2) LR and deeper neural networks. We define these models formally in Appendix A.

A multiclass logistic regression (MLR) combines c binary models, each with parameters w_i, β_i, to form a multiclass model. MLRs are available in all ML services we reviewed. We consider two types of MLR models: softmax and one-vs-rest (OvR), that differ in how the c binary models are trained and combined: A softmax model fits a joint multinomial distribution to all training samples, while a OvR model trains a separate binary LR for each class, and then normalizes the class probabilities.

A MLR model f is defined by parameters w ∈ R^(cd), β ∈ R^c. Each sample (x, f(x)) gives c equations in w and β. The equation system is non-linear however, and has no analytic solution. For softmax models for instance, the equations take the form e^(w_i·x + β_i) / (∑_{j=0}^{c−1} e^(w_j·x + β_j)) = f_i(x).

Table 4: Success of equation-solving attacks. Models to extract were trained on the Adult data set with multiclass target 'Race'. For each model, we report the number of unknown model parameters, the number of queries used, and the running time of the equation solver. The attack on the MLP with 11,125 queries converged after 490 epochs.

  Model    Unknowns  Queries  1 − R_test  1 − R_unif  Time (s)
  Softmax  530       265      99.96%      99.75%      2.6
  Softmax  530       530      100.00%     100.00%     3.1
  OvR      530       265      99.98%      99.98%      2.8
  OvR      530       530      100.00%     100.00%     3.5
  MLP      2,225     1,112    98.17%      94.32%      155
  MLP      2,225     2,225    98.68%      97.23%      168
  MLP      2,225     4,450    99.89%      99.82%      195
  MLP      2,225     11,125   99.96%      99.99%      89

A common method for solving such a system is by minimizing an appropriate loss function, such as the logistic loss. With a regularization term, the loss function is strongly convex, and the optimization thus converges to a global minimum (i.e., a function f̂ that predicts the same probabilities as f for all available samples). A similar optimization (over class labels rather than probabilities) is actually used for training logistic models. Any MLR implementation can thus easily be adapted for model extraction with equation-solving.

This approach naturally extends to deeper neural networks. We consider multilayer perceptrons (MLP), that first apply a non-linear transform to all inputs (the hidden layer), followed by a softmax regression in the transformed space. MLPs are becoming increasingly popular due to the continued success of deep learning methods; the advent of cloud-based ML services is likely to further boost their adoption. For our attacks, MLPs and MLRs mainly differ in the number of unknowns in the system to solve. For perceptrons with one hidden layer, we have w ∈ R^(dh+hc), β ∈ R^(h+c), where h is the number of hidden nodes (h = 20 in our experiments). Another difference is that the loss function for MLPs is not strongly convex. The optimization may thus converge to a local minimum, i.e., a model f̂ that does not exactly match f's behavior.

To illustrate our attack's success, we train a softmax regression, a OvR regression and a MLP on the Adult data set with target 'Race' (c = 5). For the non-linear equation systems we obtain, we do not know a priori how many samples we need to find a solution (in contrast to linear systems where d + 1 samples are necessary and sufficient). We thus explore various query budgets of the form α·k, where k is the number of unknown model parameters, and α is a budget scaling factor. For MLRs, we solve the equation system with BFGS [41] in scikit [43].
For MLPs, we use theano [52] to run stochastic gradient descent for 1,000 epochs. Our experiments were performed on a commodity laptop (2-core Intel CPU @3.1GHz, 16GB RAM, no GPU acceleration).

Table 4 shows the extraction success for each model, as we vary α from 0.5 to at most 5. For MLR models (softmax and OvR), the attack is extremely efficient, requiring around one query per unknown parameter of f (each query yields c = 5 equations). For MLPs, the system to solve is more complex, with about 4 times more unknowns. With a sufficiently over-determined system, we converge to a model f̂ that very closely approximates f. As for LR models, queries are chosen non-adaptively, so A may submit a single 'batch request' to the API.

Figure 2: Training data leakage in KLR models. (a) Displays 5 of 20 training samples used as representers in a KLR model (top) and 5 of 20 extracted representers (bottom). (b) For a second model, shows the average of all 1,257 representers that the model classifies as a 3, 4, 5, 6 or 7 (top) and 5 of 10 extracted representers (bottom).

We further evaluated our attacks over all multiclass data sets from Table 3. For MLR models with k = c · (d + 1) parameters (c is the number of classes), k queries were sufficient to achieve perfect extraction (R_test = R_unif = 0, R^TV_test and R^TV_unif below 10^−7). We use 260 samples on average, and 650 for the largest model (Digits). For MLPs with 20 hidden nodes, we achieved > 99.9% accuracy with 5,410 samples on average and 11,125 at most (Adult). With 54,100 queries on average, we extracted a f̂ with 100% accuracy over tested inputs. As for binary LRs, we thus find that cross-user model extraction attacks for these model classes can be extremely efficient.
4.1.3 Training Data Leakage for Kernel LR

We now move to a less mainstream model class, kernel logistic regression [58], that illustrates how extraction attacks can leak private training data, when a model's outputs are directly computed as a function of that data.

Kernel methods are commonly used to efficiently extend support vector machines (SVM) to nonlinear classifiers [14], but similar techniques can be applied to logistic regression [58]. Compared to kernel SVMs, kernel logistic regressions (KLR) have the advantage of computing class probabilities, and of naturally extending to multiclass problems. Yet, KLRs have not reached the popularity of kernel SVMs or standard LRs, and are not provided by any MLaaS provider at the time. We note that KLRs could easily be constructed in any ML library that supports both kernel functions and LR models.

A KLR model is a softmax model, where we replace the linear components w_i · x + β_i by a mapping ∑_{r=1}^{s} α_{i,r} K(x, x_r) + β_i. Here, K is a kernel function, and the representers x_1, ..., x_s are a chosen subset of the training points [58]. More details are in Appendix A.

Each sample (x, f(x)) from a KLR model yields c equations over the parameters α ∈ R^{sc}, β ∈ R^c and the representers x_1, ..., x_s. Thus, by querying the model, A obtains a non-linear equation system, the solution of which leaks training data. This assumes that A knows the exact number s of representers sampled from the data. However, we can relax this assumption: First, note that f's outputs are unchanged by adding 'extra' representers with weights α = 0. Thus, over-estimating s still results in a consistent system of equations, of which a solution is the model f, augmented with unused representers. We will also show experimentally that training data may leak even if A extracts a model f̂ with s′ ≪ s representers.
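For concreteness, a KLR forward pass can be sketched as follows (toy sizes, random parameters, and an RBF kernel with an arbitrary γ; the point is only that predictions depend directly on the representers, i.e., on individual training points):

```python
import numpy as np

rng = np.random.default_rng(2)
d, c, s = 8, 3, 5  # features, classes, representers (toy sizes)

def rbf(x, z, gamma=0.1):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Representers are actual (private) training points; alpha, beta are learned
reps = rng.normal(size=(s, d))
alpha, beta = rng.normal(size=(c, s)), rng.normal(size=c)

def klr_predict(x):
    k = np.array([rbf(x, xr) for xr in reps])  # kernel values vs. representers
    z = alpha @ k + beta                       # per-class 'logits'
    e = np.exp(z - z.max())
    return e / e.sum()                         # class probabilities f(x)

p = klr_predict(rng.normal(size=d))
```

Each queried pair (x, f(x)) therefore constrains alpha, beta, and the representers themselves, which is what makes individual training points recoverable by equation-solving.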
We build two KLR models with a radial-basis function (RBF) kernel for a data set of handwritten digits. We select 20 random digits as representers for the first model, and all 1,257 training points for the second. We extract the first model, assuming knowledge of s, by solving a system of 50,000 equations in 1,490 unknowns. We use the same approach as for MLPs, i.e., logistic-loss minimization using gradient descent. We initialize the extracted representers to uniformly random vectors in X, as we assume A does not know the training data distribution. In Figure 2a, we plot 5 of the model's representers from the training data, and the 5 closest (in l_1 norm) extracted representers. The attack clearly leaks information on individual training points. We measure the attack's robustness to uncertainty about s, by attacking the second model with only 10 local representers (10,000 equations in 750 unknowns). Figure 2b shows the average image of training points classified as a 3, 4, 5, 6 or 7 by the target model f, along with 5 extracted representers of f̂. Perhaps surprisingly, the attack seems to leak the 'average representer' of each class in the training data.

4.1.4 Model Inversion Attacks on Extracted Models

Access to a model may enable inference of privacy-damaging information, particularly about the training set [4, 23, 24]. The model inversion attack explored by Fredrikson et al. [23] uses access to a classifier f to find the input x_opt that maximizes the class probability for class i, i.e., x_opt = argmax_{x ∈ X} f_i(x). This was shown to allow recovery of recognizable images of training set members' faces when f is a facial recognition model. Their attacks work best in a white-box setting, where the attacker knows f and its parameters.
Yet, the authors also note that in a black-box setting, remote queries to a prediction API, combined with numerical approximation techniques, enable successful, albeit much less efficient, attacks. Furthermore, their black-box attacks inherently require f to be queried adaptively. They leave making black-box attacks more efficient as an open question.

We explore a composed attack that first attempts to extract a model f̂ ≈ f, and then uses it with the white-box inversion attack of [23]. Our extraction techniques replace adaptive queries with a non-adaptive 'batch' query to f, followed by local computation. We show that extraction plus inversion can require fewer queries and less time than performing black-box inversion directly.

As a case study, we use the softmax model from [23], trained over the AT&T Faces data [5]. The data set consists of images of faces (92 × 112 pixels) of 40 people. The black-box attack from [23] needs about 20,600 queries to reconstruct a recognizable face for a single training set individual. Reconstructing the faces of all 40 individuals would require around 800,000 online queries.

The trained softmax model is much larger than those considered in Section 4.1, with 412,160 unknowns (d = 10,304 and c = 40). We solve an under-determined system with 41,216 equations (using gradient descent with 200 epochs), and recover a model f̂ achieving R^TV_test and R^TV_unif on the order of 10^−3. Note that the number of model parameters to extract is linear in the number of people c, whose faces we hope to recover. By using f̂ in white-box model inversion attacks, we obtain results that are visually indistinguishable from the ones obtained using the true f. Given the extracted model f̂, we can recover all 40 faces using white-box attacks, incurring around 20× fewer remote queries to f than with 40 black-box attacks.
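The white-box inversion step on an extracted softmax model can be sketched with plain gradient ascent (toy dimensions and random weights stand in for the extracted Faces model; the step size and iteration count are arbitrary). For f = softmax(Wx + b), the gradient of log f_i(x) is w_i − ∑_j p_j w_j:

```python
import numpy as np

rng = np.random.default_rng(1)
d, c = 16, 4  # toy 'image' size and number of classes (hypothetical)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-in for the extracted model f_hat
w, b = rng.normal(size=(c, d)), rng.normal(size=c)

def invert(i, steps=2000, lr=0.02):
    """Gradient ascent on log f_i(x), projected onto the input box [0, 1]^d."""
    x = np.zeros(d)
    for _ in range(steps):
        p = softmax(w @ x + b)
        x += lr * (w[i] - p @ w)   # gradient of the log-probability of class i
        x = np.clip(x, 0.0, 1.0)   # stay in the valid input range
    return x

x_opt = invert(0)
p0_start = softmax(b)[0]            # class-0 probability at the start (x = 0)
p0_end = softmax(w @ x_opt + b)[0]  # class-0 probability at x_opt
```

Since log f_i is concave in x for a softmax model, this projected ascent steadily increases the class-i probability toward x_opt.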
For black-box attacks, the authors of [23] estimate a query latency of 70 milliseconds (a little less than in our own measurements of ML services; see Table 1). Thus, it takes 24 minutes to recover a single face (the inversion attack runs in seconds), and 16 hours to recover all 40 images. In contrast, solving the large equation system underlying our model-extraction attack took 10 hours. The 41,216 online queries would take under one hour if executed sequentially, and even less with a batch query. The cost of the 40 local white-box attacks is negligible. Thus, if the goal is to reconstruct faces for all 40 training individuals, performing model inversion over a previously extracted model results in an attack that is both faster and requires 20× fewer online queries.

4.2 Decision Tree Path-Finding Attacks

Contrary to logistic models, decision trees do not compute class probabilities as a continuous function of their input. Rather, decision trees partition the input space into discrete regions, each of which is assigned a label and confidence score. We propose a new path-finding attack that exploits API particularities to extract the 'decisions' taken by a tree when classifying an input.

Prior work on decision tree extraction [7, 12, 33] has focused on trees with Boolean features and outputs. While of theoretical importance, such trees have limited practical use. Kushilevitz and Mansour [33] showed that Boolean trees can be extracted using membership queries (arbitrary queries for class labels), but their algorithm does not extend to more general trees. Here, we propose attacks that exploit ML API specificities, and that apply to decision tree models used in MLaaS platforms.

Our tree model, defined formally in Appendix A, allows for binary and multi-ary splits over categorical features, and binary splits over numeric features. Each leaf of the tree is labeled with a class label and a confidence score.
We note that our attacks also apply (often with better results) to regression trees. In regression trees, each leaf is labeled with a real-valued output and a confidence.

The key idea behind our attack is to use the rich information provided by APIs on a prediction query as a pseudo-identifier for the path that the input traversed in the tree. By varying the value of each input feature, we then find the predicates that an input must satisfy to follow a given path in the tree. We will also exploit the ability to query incomplete inputs, in which each feature x_i is chosen from a space X_i ∪ {⊥}, where ⊥ encodes the absence of a value. One way of handling such inputs ([11, 47]) is to label each node in the tree with an output value. On an input, we traverse the tree until we reach a leaf or an internal node with a split over a missing feature, and output the value of that leaf or node.

We formalize these notions by defining oracles that A can query to obtain an identifier for the leaf or internal node reached by an input. In practice, we instantiate these oracles using prediction API peculiarities.

Definition 1 (Identity Oracles). Let each node v of a tree T be assigned some identifier id_v. A leaf-identity oracle O takes as input a query x ∈ X and returns the identifier of the leaf of the tree T that is reached on input x. A node-identity oracle O_⊥ takes as input a query x ∈ (X_1 ∪ {⊥}) × ··· × (X_d ∪ {⊥}) and returns the identifier of the node or leaf of T at which the tree computation halts.

4.2.1 Extraction Algorithms

We now present our path-finding attack (Algorithm 1), which assumes a leaf-identity oracle that returns unique identifiers for each leaf. We will relax the uniqueness assumption further on. The attack starts with a random input x and gets the leaf id from the oracle.
We then search for all constraints on x that have to be satisfied to remain in that leaf, using procedures LINE SEARCH (for continuous features) and CATEGORY SPLIT (for categorical features), described below. From this information, we then create new queries for unvisited leaves. Once all leaves have been found, the algorithm returns, for each leaf, the corresponding constraints on x. We analyze the algorithm's correctness and complexity in Appendix C.

We illustrate our algorithm with a toy example of a tree over continuous feature Size and categorical feature Color (see Figure 3). The current query is x = {Size = 50, Color = R} and O(x) = id_2. Our goal is two-fold: (1) find the predicates that x has to satisfy to end up in leaf id_2 (i.e., Size ∈ (40, 60] and Color = R), and (2) create new inputs x′ to explore other paths in the tree.

Figure 3: Decision tree over features Color and Size.

Algorithm 1 The path-finding algorithm. The notation id ← O(x) means querying the leaf-identity oracle O with an input x and obtaining a response id. By x[i] ⇒ v we denote the query x′ obtained from x by replacing the value of x_i by v.

 1: x_init ← {x_1, ..., x_d}                  ▷ random initial query
 2: Q ← {x_init}                              ▷ set of unprocessed queries
 3: P ← {}                                    ▷ set of explored leaves with their predicates
 4: while Q not empty do
 5:     x ← Q.POP()
 6:     id ← O(x)                             ▷ call to the leaf-identity oracle
 7:     if id ∈ P then                        ▷ check if leaf already visited
 8:         continue
 9:     end if
10:     for 1 ≤ i ≤ d do                      ▷ test all features
11:         if IS_CONTINUOUS(i) then
12:             for (α, β] ∈ LINE_SEARCH(x, i, ε) do
13:                 if x_i ∈ (α, β] then
14:                     P[id].ADD('x_i ∈ (α, β]')   ▷ current interval
15:                 else
16:                     Q.PUSH(x[i] ⇒ β)            ▷ new leaf to visit
17:                 end if
18:             end for
19:         else
20:             S, V ← CATEGORY_SPLIT(x, i, id)
21:             P[id].ADD('x_i ∈ S')                ▷ values for current leaf
22:             for v ∈ V do
23:                 Q.PUSH(x[i] ⇒ v)                ▷ new leaves to visit
24:             end for
25:         end if
26:     end for
27: end while

The LINE SEARCH procedure (line 12) tests continuous features. We start from bounds on the range of a feature, X_i = [a, b]. In our example, we have Size ∈ [0, 100]. We set the value of Size in x to 0 and 100, query O, and obtain id_1 and id_5. As the ids do not match, a split on Size occurs on the path to id_2. With a binary search over feature Size (and all other features in x fixed), we find all intervals that lead to different leaves, i.e., [0, 40], (40, 60], (60, 100]. From these intervals, we find the predicate for the current leaf (i.e., Size ∈ (40, 60]) and build queries to explore new tree paths.

To ensure termination of the line search, we specify some precision ε. If a split is on a threshold t, we find the value t̃ that is the unique multiple of ε in the range (t − ε, t]. For values x_i with granularity ε, splitting on t̃ is then equivalent to splitting on t.

The CATEGORY SPLIT procedure (line 20) finds splits on categorical features. In our example, we vary the value of Color in x and query O to get a leaf id for each value. We then build a set S of values that lead to the current leaf, i.e., S = {R}, and a set V of values to set in x to explore other leaves (one representative per leaf). In our example, we could have V = {B, G, Y} or V = {B, G, O}.

Using these two procedures, we thus find the predicates defining the path to leaf id_2, and generate new queries x′ for unvisited leaves of the tree.
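The binary search at the core of the line search can be sketched as follows (a toy stump stands in for the leaf-identity oracle O; the feature names and ε are illustrative):

```python
def line_search(oracle, x, i, lo, hi, eps=1e-3):
    """Binary search, to precision eps, for the split threshold on continuous
    feature i at which the leaf id returned by the oracle changes.
    Assumes a single split on feature i inside (lo, hi]."""
    id_lo, id_hi = oracle({**x, i: lo}), oracle({**x, i: hi})
    if id_lo == id_hi:
        return None  # no split on feature i in this range
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if oracle({**x, i: mid}) == id_lo:
            lo = mid
        else:
            hi = mid
    return hi  # approximate threshold (within eps of the true split)

# Toy leaf-identity oracle: a stump splitting on Size at threshold 40
oracle = lambda q: "id1" if q["Size"] <= 40 else "id2"
t = line_search(oracle, {"Size": 50, "Color": "R"}, "Size", 0.0, 100.0)
```

Recovering all intervals in the running example ([0, 40], (40, 60], (60, 100]) amounts to repeating this search recursively on each sub-range whose endpoints reach different leaves.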
Figure 3 shows the path (thick green) to leaf id_2 on input x = {Size = 50, Color = R}.

Data set          # records  # classes  # features
IRS Tax Patterns  191,283    51         31
Steak Survey      430        5          12
GSS Survey        51,020     3          7
Email Importance  4,709      2          14
Email Spam        4,601      2          46
German Credit     1,000      2          11
Medical Cover     163,065    Y = R      13
Bitcoin Price     1,076      Y = R      7

Table 5: Data sets used for decision tree extraction. Trained trees for these data sets are available in BigML's public gallery. The last two data sets are used to train regression trees.

A top-down approach. We propose an empirically more efficient top-down algorithm that exploits queries over partial inputs. It extracts the tree 'layer by layer', starting at the root: We start with an empty query (all features set to ⊥) and get the root's id by querying O_⊥. We then set each feature in turn and query O_⊥ again. For exactly one feature (the root's splitting feature), the input will reach a different node. With procedures similar to those described previously, we extract the root's splitting criterion, and recursively search lower layers of the tree.

Duplicate identities. As we verify empirically, our attacks are resilient to some nodes or leaves sharing the same id. We can modify line 7 in Algorithm 1 to detect duplicate ids, by checking not only whether a leaf with the current id was already visited, but also whether the current query violates that leaf's predicates. The main issue with duplicate ids comes from the LINE SEARCH and CATEGORY SPLIT procedures: if two queries x and x′ differ in a single feature and reach different leaves with the same id, the split on that feature will be missed.

4.2.2 Attack Evaluation

Our tree model (see Appendix A) is the one used by BigML. Other ML services use similar tree models. For our experiments, we downloaded eight public decision trees from BigML (see Table 5), and queried them locally using available API bindings.
More details on these models are in Appendix B. We show online extraction attacks on black-box models from BigML in Section 5.

                                             Without incomplete queries       With incomplete queries
Model             Leaves  Unique IDs  Depth  1−R_test  1−R_unif  Queries     1−R_test  1−R_unif  Queries
IRS Tax Patterns  318     318         8      100.00%   100.00%   101,057     100.00%   100.00%   29,609
Steak Survey      193     28          17     92.45%    86.40%    3,652       100.00%   100.00%   4,013
GSS Survey        159     113         8      99.98%    99.61%    7,434       100.00%   99.65%    2,752
Email Importance  109     55          17     99.13%    99.90%    12,888      99.81%    99.99%    4,081
Email Spam        219     78          29     87.20%    100.00%   42,324      99.70%    100.00%   21,808
German Credit     26      25          11     100.00%   100.00%   1,722       100.00%   100.00%   1,150
Medical Cover     49      49          11     100.00%   100.00%   5,966       100.00%   100.00%   1,788
Bitcoin Price     155     155         9      100.00%   100.00%   31,956      100.00%   100.00%   7,390

Table 6: Performance of extraction attacks on public models from BigML. For each model, we report the number of leaves in the tree, the number of unique identifiers for those leaves, and the maximal tree depth. The chosen granularity ε for continuous features is 10^−3.

To emulate black-box model access, we first issue online queries to BigML, to determine the information contained in the service's responses. We then simulate black-box access locally, by discarding any extra information returned by the local API. Specifically, we make use of the following fields in query responses:

• Prediction. This entry contains the predicted class label (classification) or real-valued output (regression).

• Confidence. For classification and regression trees, BigML computes confidence scores based on a confidence interval for predictions at each node [11]. The prediction and confidence value constitute a node's id.

• Fields.
Responses to black-box queries contain a 'fields' property that lists all features appearing either in the input query or on the path traversed in the tree. If a partial query x reaches an internal node v, this entry tells us which feature v splits on (the feature is in the 'fields' entry, but not in the input x). We make use of this property for the top-down attack variant.

Table 6 displays the results of our attacks. For each tree, we give its number of leaves, the number of unique leaf ids, and the tree depth. We display the success rate for Algorithm 1 and for the 'top-down' variant with incomplete queries. Querying partial inputs vastly improves our attack: we require far fewer queries (except for the Steak Survey model, where Algorithm 1 only visits a fraction of all leaves and thus achieves low success), and we achieve higher accuracy for trees with duplicate leaf ids.

As expected, both attacks achieve perfect extraction when all leaves have unique ids. While this is not always the case for classification trees, it is far more likely for regression trees, where both the label and confidence score take real values. Perhaps surprisingly, the top-down approach also fully extracts some trees with a large number of duplicate leaf ids. The attacks are also efficient: The top-down approach takes less than 10 seconds to extract a tree, and Algorithm 1 takes less than 6 minutes for the largest tree. For online attacks on ML services, discussed next, this cost is trumped by the delay for the inherently adaptive prediction queries that are issued.

5 Online Model Extraction Attacks

In this section, we showcase online model extraction attacks against two ML services: BigML and Amazon. For BigML, we focus on extracting models set up by a user who wishes to charge for predictions. For Amazon, our goal is to extract a model trained by ourselves, to which we only get black-box access. Our attacks only use exposed APIs, and do not in any way attempt to bypass the services' authentication or access-control mechanisms. We only attack models trained in our own accounts.

Model    OHE  Binning  Queries  Time (s)  Price ($)
Circles  -    Yes      278      28        0.03
Digits   -    No       650      70        0.07
Iris     -    Yes      644      68        0.07
Adult    Yes  Yes      1,485    149       0.15

Table 7: Results of model extraction attacks on Amazon. OHE stands for one-hot-encoding. The reported query count is the number used to find quantile bins (at a granularity of 10^−3), plus those queries used for equation-solving. Amazon charges $0.0001 per prediction [1].

5.1 Case Study 1: BigML

BigML currently only allows monetization of decision trees [11]. We train a tree on the German Credit data, and set it up as a black-box model. The tree has 26 leaves, two of which share the same label and confidence score. From another account, we extract the model using the two attacks from Section 4.2. We first find the tree's number of features, their type and their range, from BigML's public gallery. Our attacks (Algorithm 1 and the top-down variant) extract an exact description of the tree's paths, using respectively 1,722 and 1,150 queries. Both attacks' durations (1,030 seconds and 631 seconds) are dominated by query latency (≈ 500 ms/query). The monetary cost of the attack depends on the per-prediction fee set by the model owner. In any case, a user who wishes to make more than 1,150 predictions has an economic incentive to run an extraction attack.

5.2 Case Study 2: Amazon Web Services

Amazon uses logistic regression for classification, and provides black-box-only access to trained models [1]. By default, Amazon uses two feature extraction techniques: (1) Categorical features are one-hot-encoded, i.e., the input space M_i = Z_k is mapped to k binary features encoding the input value. (2) Quantile binning is used for numeric features.
The training data values are split into k quantiles (k equally-sized bins), and the input space M_i = [a, b] is mapped to k binary features encoding the bin that a value falls into. Note that |X| > |M|, i.e., ex increases the number of features. If A reverse-engineers ex, she can query the service on samples M in input space, compute x = ex(M) locally, and extract f in feature space using equation-solving.

We apply this approach to models trained by Amazon. Our results are summarized in Table 7. We first train a model with no categorical features, and quantile binning disabled (this is a manually tunable parameter), over the Digits data set. The attack is then identical to the one considered in Section 4.1.2: using 650 queries to Amazon, we extract a model that achieves R_test = R_unif = 0.

We now consider models with feature extraction enabled. We assume that A knows the input space M, but not the training data distribution. For one-hot-encoding, knowledge of M suffices to apply the same encoding locally. For quantile binning however, applying ex locally requires knowledge of the training data quantiles. To reverse-engineer the binning transformation, we use line searches similar to those we used for decision trees: For each numeric feature, we search the feature's range in input space for thresholds (up to a granularity ε) where f's output changes. Such a change indicates that our value landed in an adjacent bin, with a different learned regression coefficient. Note that learning the bin boundaries may be interesting in its own right, as it leaks information about the training data distribution. Having found the bin boundaries, we can apply both one-hot-encoding and binning locally, and extract f over its feature space. As we are restricted to queries over M, we cannot define an arbitrary system of equations over X.
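Once the bin boundaries are recovered, ex can be re-implemented locally. A minimal sketch (the feature names, categories, and bin edges below are hypothetical placeholders, not values learned from any real Amazon model):

```python
import numpy as np

def local_ex(color, size, colors=("R", "G", "B"), bin_edges=(25.0, 60.0)):
    """Locally re-implemented feature extraction: one-hot-encode a categorical
    feature, and map a numeric feature to binary bin-indicator features,
    using bin edges recovered by line search."""
    onehot = [1.0 if color == c else 0.0 for c in colors]
    k = len(bin_edges) + 1                  # number of quantile bins
    b = sum(size > e for e in bin_edges)    # index of the bin that size falls in
    bins = [1.0 if j == b else 0.0 for j in range(k)]
    return np.array(onehot + bins)

# Input-space sample M -> feature-space vector x = ex(M)
x = local_ex("G", 70.0)
```

With this local ex, every query to the service yields an equation over the feature space X, exactly as in the equation-solving attacks of Section 4.1.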
Building a well-determined and consistent system can be difficult, as the encoding ex generates sparse inputs over X. However, Amazon facilitates this process with the way it handles queries with missing features: if a feature is omitted from a query, all corresponding features in X are set to 0. For a linear model, for instance, we can trivially reconstruct the model by issuing queries with a single feature specified, so as to obtain equations with a single unknown in X.

We trained models for the Circles, Iris and Adult data sets, with Amazon's default feature-extraction settings. Table 7 shows the results of our attacks, for the reverse-engineering of ex and extraction of f. For binary models (Circles and Adult), we use d + 1 queries to solve a linear equation system over X. For models with c > 2 classes, we use c · (d + 1) queries. In all cases, the extracted model matches f on 100% of tested inputs. To optimize the query complexity, the queries we use to find quantile bins are re-used for equation-solving. As line searches require adaptive queries, we do not use batch predictions. However, even for the Digits model, we resorted to using real-time predictions, because of the service's significant overhead in evaluating batches. For attacks that require a large number of non-adaptive queries, we expect batch predictions to be faster than real-time predictions.

5.3 Discussion

Additional feature extractors. In some ML services we considered, users may enable further feature extractors. A common transformation is feature scaling or normalization. If A has access to training data statistics (as provided by BigML, for instance), applying the transformation locally is trivial. More generally, for models with a linear input layer (i.e., logistic regressions, linear SVMs, MLPs), the scaling or normalization can be seen as being applied to the learned weights, rather than the input features.
We can thus view the composition f ◦ ex as a model f′ that operates over the 'un-scaled' input space M, and extract f′ directly using equation-solving.

Further extractors include text analysis (e.g., bag-of-words or n-gram models) and Cartesian products (grouping many features into one). We have not analyzed these in this work, but we believe that they could also be easily reverse-engineered, especially given some training data statistics and the ability to make incomplete queries.

Learning unknown model classes or hyper-parameters. For our online attacks, we obtained information about the model class of f, the enabled feature extraction ex, and other hyper-parameters, directly from the ML service or its documentation. More generally, if A does not have full certainty about certain model characteristics, it may be able to narrow down a guess to a small range. Model hyper-parameters, for instance (such as the free parameter of an RBF kernel), are typically chosen through cross-validation over a default range of values.

Given a set of attack strategies with varying assumptions, A can use a generic extract-and-test approach: each attack is applied in turn, and evaluated by computing R_test or R_unif over a chosen set of points. The adversary succeeds if any of the strategies achieves a low error. Note that A needs to interact with the model f only once, to obtain responses for a chosen set of extraction samples and test samples that can be re-used for each strategy.

Our attacks on Amazon's service followed this approach: We first formulated guesses for model characteristics left unspecified by the documentation (e.g., we found no mention of one-hot-encoding, or of how missing inputs are handled). We then evaluated our assumptions with successive extraction attempts. Our results indicate that Amazon uses softmax regression and does not create binary predictors for missing values.
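The extract-and-test loop just described can be sketched generically (the oracle, candidate strategies, and error threshold below are toy stand-ins for illustration only):

```python
def extract_and_test(strategies, query_batch, test_batch, oracle, err_threshold=0.05):
    """Query the oracle once for extraction and test samples, then try each
    candidate extraction strategy and keep any whose test error is low."""
    Y_extract = [oracle(x) for x in query_batch]
    Y_test = [oracle(x) for x in test_batch]
    results = []
    for strat in strategies:
        f_hat = strat(query_batch, Y_extract)
        err = sum(f_hat(x) != y for x, y in zip(test_batch, Y_test)) / len(test_batch)
        if err <= err_threshold:
            results.append((strat, f_hat, err))
    return results

# Toy demonstration: a scalar threshold oracle and two candidate strategies
oracle = lambda x: x > 0.25

def fit_threshold(xs, ys):
    t = (max((x for x, y in zip(xs, ys) if not y), default=-1.0)
         + min((x for x, y in zip(xs, ys) if y), default=1.0)) / 2
    return lambda x: x > t

def fit_constant(xs, ys):
    return lambda x: False

qs = [i / 20 - 1 for i in range(41)]  # extraction queries in [-1, 1]
ts = [i / 10 - 1 for i in range(21)]  # held-out test queries
good = extract_and_test([fit_threshold, fit_constant], qs, ts, oracle)
```

Only the strategy whose assumptions match the target survives the test step, mirroring how we narrowed down Amazon's undocumented settings.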
Interestingly, BigML takes the 'opposite' approach (i.e., BigML uses OvR regression and adds predictors for missing values).

6 Extraction Given Class Labels Only

The successful attacks given in Sections 4 and 5 show the danger of revealing confidence values. While current ML services have been designed to reveal rich information, our attacks may suggest that returning only labels would be safer. Here we explore model extraction in a setting with no confidence scores. We will discuss further countermeasures in Section 7. We primarily focus on settings where A can make direct queries to an API, i.e., queries for arbitrary inputs x ∈ X. We briefly discuss indirect queries in the context of linear classifiers.

The Lowd-Meek attack. We start with the prior work of Lowd and Meek [36]. They present an attack on any linear classifier, assuming black-box oracle access with membership queries that return just the predicted class label. A linear classifier is defined by a vector w ∈ R^d and a constant β ∈ R, and classifies an instance x as positive if w · x + β > 0 and negative otherwise. SVMs with linear kernels and binary LRs are examples of linear classifiers. Their attack uses line searches to find points arbitrarily close to f's decision boundary (points for which w · x + β ≈ 0), and extracts w and β from these samples. This attack only works for linear binary models.

We describe a straightforward extension to some non-linear models, such as polynomial kernel SVMs. Extracting a polynomial kernel SVM can be reduced to extracting a linear SVM in the transformed feature space. Indeed, for any kernel K_poly(x, x′) = (x^T · x′ + 1)^d, we can derive a projection function φ(·), so that K_poly(x, x′) = φ(x)^T · φ(x′). This transforms the kernel SVM into a linear one, since the decision boundary now becomes w_F · φ(x) + β = 0, where w_F = ∑_{i=1}^{t} α_i φ(x_i).
We can use the Lowd-Meek attack to extract w_F and β as long as φ(x) and its inverse are feasible to compute; this is unfortunately not the case for the more common RBF kernels.³

The retraining approach. In addition to evaluating the Lowd-Meek attack against ML APIs, we introduce a number of other approaches based on the broad strategy of re-training a model locally, given input-output examples. Informally, our hope is that by extracting a model that achieves low training error over the queried samples, we would effectively approximate the target model's decision boundaries. We consider three re-training strategies, described below. We apply these to the model classes that we previously extracted using equation-solving attacks, as well as to SVMs.⁴

³ We did explore using approximations of φ, but found that the adaptive re-training techniques discussed in this section perform better.
⁴ We do not expect retraining attacks to work well for decision trees, because of the greedy approach taken by learning algorithms. We have not evaluated extraction of trees, given class labels only, in this work.

Figure 4: Average error of extracted linear models, for the Uniform, Line-Search, Adaptive, and Lowd-Meek extraction strategies applied to models trained on all binary data sets from Table 3. The left panel shows R_test and the right panel shows R_unif (budget factor α from 0 to 100; error on a log scale from 10^−4 to 10^0).

(1) Retraining with uniform queries. This baseline strategy simply consists in sampling m points x_i ∈ X uniformly at random, querying the oracle, and training a model f̂ on these samples.

(2) Line-search retraining. This strategy can be seen as a model-agnostic generalization of the Lowd-Meek attack. It issues m adaptive queries to the oracle, using line search techniques, to find samples close to the decision boundaries of f.
A model f̂ is then trained on the m queried samples.

(3) Adaptive retraining. This strategy applies techniques from active learning [18, 48]. For some number r of rounds and a query budget m, it first queries the oracle on m/r uniform points, and trains a model f̂. Over a total of r rounds, it then selects m/r new points along the decision boundary of f̂ (intuitively, these are points f̂ is least certain about), and sends those to the oracle before retraining f̂.

6.1 Linear Binary Models

We first explore how well the various approaches work in settings where the Lowd-Meek attack can be applied. We evaluate their attack and our three retraining strategies for logistic regression models trained over the binary data sets shown in Table 3. These models have d + 1 parameters, and we vary the query budget as α · (d + 1), for 0.5 ≤ α ≤ 100. Figure 4 displays the average errors R_test and R_unif over all models, as a function of α.

The retraining strategies that search for points near the decision boundary clearly perform better than simple uniform retraining. The adaptive strategy is the most efficient of our three strategies. For relatively low budgets, it even outperforms the Lowd-Meek attack. However, for budgets large enough to run line searches in each dimension, the Lowd-Meek attack is clearly the most efficient.

For the models we trained, about 2,050 queries on average, and 5,650 at most, are needed to run the Lowd-Meek attack effectively. This is 50× more queries than what we needed for equation-solving attacks. With 827 queries on average, adaptive retraining yields a model f̂ that matches f on over 99% of tested inputs. Thus, even if an ML API only provides class labels, efficient extraction attacks on linear models remain possible.

We further consider a setting where feature-extraction (specifically one-hot-encoding of categorical features) is applied by the ML service, rather than by the user.
A is then limited to indirect queries in input space. Lowd and Meek [36] note that their extraction attack does not work in this setting, as A cannot run line searches directly over X. In contrast, for the linear models we trained, we observed no major difference in extraction accuracy for the adaptive-retraining strategy, when limited to queries over M. We leave an in-depth study of model extraction with indirect queries, and class labels only, for future work.

6.2 Multiclass LR Models

The Lowd-Meek attack is not applicable in multiclass (c > 2) settings, even when the decision boundary is a combination of linear boundaries (as in multiclass regression) [39, 51]. We thus focus on evaluating the three retraining attacks we introduced, for the type of ML models we expect to find in real-world applications.

We focus on softmax models here, as softmax and one-vs-rest models have identical output behaviors when only class labels are provided: in both cases, the class label for an input x is given by argmax_i (w_i · x + β_i). From an extractor's perspective, it is thus irrelevant whether the target was trained using a softmax or OvR approach.

We evaluate our attacks on softmax models trained on the multiclass data sets shown in Table 3. We again vary the query budget as a factor α of the number of model parameters, namely α · c · (d + 1). Results are displayed in Figure 5. We observe that the adaptive strategy clearly performs best, and that the line-search strategy does not improve over uniform retraining, possibly because the line searches have to be split across multiple decision boundaries. We further note that all strategies achieve lower R_test than R_unif. It thus appears that for the models we trained, points from the test set are on average 'far' from the decision boundaries of f (i.e., the trained models separate the different classes with large margins).
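The adaptive retraining loop, strategy (3) above, can be sketched as follows. The synthetic linear target, the plain gradient-descent logistic learner, and all constants are illustrative stand-ins for the attacker's local training pipeline, not our exact experimental setup.

```python
import numpy as np

rng = np.random.RandomState(1)
d = 3
w_true, b_true = rng.randn(d), 0.2
oracle = lambda X: (X @ w_true + b_true > 0).astype(float)  # label-only API

def fit_logreg(X, y, iters=2000, lr=0.5):
    """Plain gradient-descent logistic regression, standing in for the
    attacker's local training procedure."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        z = np.clip(X @ w + b, -30, 30)
        p = 1 / (1 + np.exp(-z))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w, b

m, r = 200, 5  # total query budget and number of rounds
X = rng.uniform(-1, 1, size=(m // r, d))  # round 1: m/r uniform queries
y = oracle(X)
w, b = fit_logreg(X, y)
for _ in range(r - 1):
    # Select m/r candidates the current model is least certain about,
    # i.e., closest to its decision boundary, and query the oracle on them.
    cand = rng.uniform(-1, 1, size=(2000, d))
    picked = cand[np.argsort(np.abs(cand @ w + b))[: m // r]]
    X, y = np.vstack([X, picked]), np.append(y, oracle(picked))
    w, b = fit_logreg(X, y)

# Agreement between the extracted and target models on fresh test points.
X_test = rng.uniform(-1, 1, size=(5000, d))
agreement = np.mean((X_test @ w + b > 0) == (oracle(X_test) == 1))
```

Concentrating later rounds near the current boundary is what lets the adaptive strategy outperform uniform sampling at low budgets.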
For all models, 100 · c · (d + 1) queries resulted in extraction accuracy above 99.9%. This represents 26,000 queries on average, and 65,000 at the most (Digits data set). Our equation-solving attacks achieved similar or better results with 100× fewer queries. Yet, for scenarios with high monetary incentives (e.g., intrusion detector evasion), extraction attacks on MLR models may be attractive, even if APIs only provide class labels.

Figure 5: Average error of extracted softmax models. Results are for three retraining strategies (Uniform, Line-Search, Adaptive) applied to models trained on all multiclass data sets from Table 3. The left plot shows R_test and the right shows R_unif.

Figure 6: Average error of extracted RBF kernel SVMs. Results are for three retraining strategies applied to models trained on all binary data sets from Table 3. The left plot shows R_test and the right shows R_unif.

6.3 Neural Networks

We now turn to attacks on more complex deep neural networks. We expect these to be harder to retrain than multiclass regressions, as deep networks have more parameters and non-linear decision boundaries. Therefore, we may need to find a large number of points close to a decision boundary in order to extract it accurately.

We evaluated our attacks on the multiclass models from Table 3. For the tested query budgets, line-search and adaptive retraining gave little benefit over uniform retraining. For a budget of 100 · k, where k is the number of model parameters, we get 1 − R_test = 99.16% and 1 − R_unif = 98.24%, using 108,200 queries per model on average.
Our attacks might improve for higher budgets, but it is unclear whether they would then provide any monetary advantage over using ML APIs in an honest way.

6.4 RBF Kernel SVMs

Another class of nonlinear models that we consider are support vector machines (SVMs) with radial-basis function (RBF) kernels. A kernel SVM first maps inputs into a higher-dimensional space, and then finds the hyperplane that maximally separates the two classes. As mentioned in Section 6, SVMs with polynomial kernels can be extracted using the Lowd-Meek attack in the transformed feature space. For RBF kernels, this is not possible because the transformed space has infinite dimension.

SVMs do not provide class probability estimates. Our only applicable attack is thus retraining. As for linear models, we vary the query budget as α · (d + 1), where d is the input dimension. We further use the extract-and-test approach from Section 5 to find the value of the RBF kernel's hyper-parameter. Results of our attacks are in Figure 6. Again, we see that adaptive retraining performs best, even though the decision boundary to extract is non-linear (in input space) here. Kernel SVM models are overall harder to retrain than models with linear decision boundaries. Yet, for our largest budgets (2,050 queries on average), we do extract models with over 99% accuracy, which may suffice in certain adversarial settings.

7 Extraction Countermeasures

We have shown in Sections 4 and 5 that adversarial clients can effectively extract ML models given access to rich prediction APIs. Given that this undermines the financial models targeted by some ML cloud services, and potentially leaks confidential training data, we believe researchers should seek countermeasures.

In Section 6, we analyzed the most obvious defense against our attacks: prediction API minimization. The constraint here is that the resulting API must still be useful in (honest) applications.
For example, it is simple to change APIs to not return confidences and not respond to incomplete queries, assuming applications can get by without it. This will prevent many of our attacks, most notably the ones described in Section 4, as well as the feature discovery techniques used in our Amazon case study (Section 5). Yet, we showed that even if we strip an API to only provide class labels, successful attacks remain possible (Section 6), albeit at a much higher query cost. We discuss further potential countermeasures below.

Rounding confidences. Applications might need confidences, but only at lower granularity. A possible defense is to round confidence scores to some fixed precision [23]. We note that ML APIs already work with some finite precision when answering queries. For instance, BigML reports confidences with 5 decimal places, and Amazon provides values with 16 significant digits.

To understand the effects of limiting precision further, we re-evaluate equation-solving and decision tree path-finding attacks with confidence scores rounded to a fixed decimal place. For equation-solving attacks, rounding the class probabilities means that the solution to the obtained equation system might not be the target f, but some truncated version of it. For decision trees, rounding confidence scores increases the chance of node id collisions, and thus decreases our attacks' success rate.

Figure 7: Effect of rounding on model extraction. Shows the average test error of equation-solving attacks on softmax models trained on the benchmark suite (Table 3), as we vary the number of significant digits in reported class probabilities (2–5 decimals). Extraction with no rounding and with class labels only (adaptive retraining) are added for comparison.
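The rounding defense can be sketched as a thin wrapper around a (hypothetical) softmax prediction API; the model parameters and the `rounded_api` name below are illustrative, not any provider's actual interface.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rounded_api(x, W, b, decimals=2):
    """Hypothetical prediction API that rounds class probabilities to a
    fixed number of decimal places before returning them."""
    return np.round(softmax(x @ W + b), decimals)

rng = np.random.RandomState(0)
W, b = rng.randn(4, 3), rng.randn(3)  # toy softmax model, d = 4, c = 3
x = rng.randn(4)

exact = softmax(x @ W + b)
coarse = rounded_api(x, W, b, decimals=2)
```

Rounding to k decimal places perturbs each returned probability by at most 0.5 · 10^(−k), which is why 4–5 decimals leave equation-solving attacks essentially unaffected while 2 decimals noticeably weaken them.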
Figure 7 shows the results of experiments on softmax models, with class probabilities rounded to 2–5 decimals. We plot only R_test, the results for R_unif being similar. We observe that class probabilities rounded to 4 or 5 decimal places (as done already in BigML) have no effect on the attack's success. When rounding further to 3 and 2 decimal places, the attack is weakened, but still vastly outperforms adaptive retraining using class labels only.

For regression trees, rounding has no effect on our attacks. Indeed, for the models we considered, the output itself is unique in each leaf (we could also round outputs, but the impact on utility may be more critical). For classification trees, we re-evaluated our top-down attack, with confidence scores rounded to fewer than 5 decimal places. The attacks on the 'IRS Tax Patterns' and 'Email Importance' models are the most resilient, and suffer no success degradation before scores are rounded to 2 decimal places. For the other models, rounding confidences to 3 or 4 decimal places severely undermines our attack.

Differential privacy. Differential privacy (DP) [22] and its variants [34] have been explored as mechanisms for protecting, in particular, the privacy of ML training data [55]. DP learning has been applied to regressions [17, 57], SVMs [45], decision trees [31] and neural networks [49]. As some of our extraction attacks leak training data information (Section 4.1.3), one may ask whether DP can prevent extraction, or at least reduce the severity of the privacy violations that extraction enables.

Consider naïve application of DP to protect individual training data elements. This should, in theory, decrease the ability of an adversary A to learn information about training set elements, when given access to prediction queries.
One would not expect, however, that this prevents model extraction, as DP is not defined to do so: consider a trivially useless learning algorithm for binary logistic regression, that discards the training data and sets w and β to 0. This algorithm is differentially private, yet w and β can easily be recovered using equation-solving.

A more appropriate strategy would be to apply DP directly to the model parameters, which would amount to saying that a query should not allow A to distinguish between closely neighboring model parameters. How exactly this would work, and what privacy budgets would be required, is left as an open question by our work.

Ensemble methods. Ensemble methods such as random forests return as prediction an aggregation of predictions by a number of individual models. While we have not experimented with ensemble methods as targets, we suspect that they may be more resilient to extraction attacks, in the sense that attackers will only be able to obtain relatively coarse approximations of the target function. Nevertheless, ensemble methods may still be vulnerable to other attacks such as model evasion [56].

8 Related Work

Our work is related to the extensive literature on learning theory, such as PAC learning [54] and its variants [3, 8]. Indeed, extraction can be viewed as a type of learning, in which an unknown instance of a known hypothesis class (model type) is providing labels (without error). This is often called learning with membership queries [3]. Our setting differs from these in two ways. The first is conceptual: in PAC learning one builds algorithms to learn a concept; the terminology belies the motivation of formalizing learning from data. In model extraction, an attacker is literally given a function oracle that it seeks to illicitly determine.
The second difference is more prag- matic: prediction APIs re veal richer information than as- sumed in prior learning theory work, and we exploit that. Algorithms for learning with membership queries hav e been proposed for Boolean functions [7, 15, 30, 33] and various binary classifiers [36, 39, 51]. The latter line of work, initiated by Lo wd and Meek [36], studies strate- gies for model ev asion, in the context of spam or fraud detectors [9, 29, 36, 37, 56]. Intuitiv ely , model extraction seems harder than ev asion, and this is corroborated by results from theory [36, 39, 51] and practice [36, 56]. Evasion attacks fall into the lar ger field of adversarial machine learning , that studies machine learning in gen- eral adv ersarial settings [6, 29]. In that context, a number of authors have considered strategies and defenses for poisoning attacks, that consist in injecting maliciously crafted samples into a model’ s train or test data, so as to decrease the learned model’ s accurac y [10, 21, 32, 40, 46]. In a concurrent work, Papernot et al. [42] make use of improper model extraction techniques in the context of model ev asion attacks against deep neural networks. They first extract a “substitute” network using data- augmentation techniques similar in spirit to our adaptive retraining strategy (Section 6). This substitute is then in turn used to craft adversarial samples likely to fool the target black-box model. For their attacks to remain tractable despite the complexity of the targeted models, they aim for lo wer e xtraction accuracy ( R test ≈ 80%), yet show that this suffices for adversarial samples to “trans- fer” from one model to the other with high probability . In a non-adversarial setting, improper model extrac- tion techniques hav e been applied for interpreting [2, 19, 53] and compressing [16, 27] complex neural netw orks. 
9 Conclusion

We demonstrated how the flexible prediction APIs exposed by current ML-as-a-service providers enable new model extraction attacks that could subvert model monetization, violate training-data privacy, and facilitate model evasion. Through local experiments and online attacks on two major providers, BigML and Amazon, we illustrated the efficiency and broad applicability of attacks that exploit common API features, such as the availability of confidence scores or the ability to query arbitrary partial inputs. We presented a generic equation-solving attack for models with a logistic output layer and a novel path-finding algorithm for decision trees.

We further explored potential countermeasures to these attacks, the most obvious being a restriction on the information provided by ML APIs. Building upon prior work from learning theory, we showed how an attacker that only obtains class labels for adaptively chosen inputs may launch less effective, yet potentially harmful, retraining attacks. Evaluating these attacks, as well as more refined countermeasures, on production-grade ML services is an interesting avenue for future work.

Acknowledgments. We thank Martín Abadi and the anonymous reviewers for their comments. This work was supported by NSF grants 1330599, 1330308, and 1546033, as well as a generous gift from Microsoft.

References

[1] Amazon Web Services. https://aws.amazon.com/machine-learning. Accessed Feb. 10, 2016.
[2] Andrews, R., Diederich, J., and Tickle, A. Survey and critique of techniques for extracting rules from trained artificial neural networks. KBS 8, 6 (1995), 373–389.
[3] Angluin, D. Queries and concept learning. Machine Learning 2, 4 (1988), 319–342.
[4] Ateniese, G., Mancini, L. V., Spognardi, A., Villani, A., Vitali, D., and Felici, G.
Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. IJSN 10, 3 (2015), 137–150.
[5] AT&T Laboratories Cambridge. The ORL database of faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[6] Barreno, M., Nelson, B., Sears, R., Joseph, A. D., and Tygar, J. D. Can machine learning be secure? In ASIACCS (2006), ACM, pp. 16–25.
[7] Bellare, M. A technique for upper bounding the spectral norm with applications to learning. In COLT (1992), ACM, pp. 62–70.
[8] Benedek, G. M., and Itai, A. Learnability with respect to fixed distributions. TCS 86, 2 (1991), 377–389.
[9] Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks against machine learning at test time. In ECML PKDD. Springer, 2013, pp. 387–402.
[10] Biggio, B., Nelson, B., and Laskov, P. Poisoning attacks against support vector machines. In ICML (2012).
[11] BigML. https://www.bigml.com. Accessed Feb. 10, 2016.
[12] Blum, A. L., and Langley, P. Selection of relevant features and examples in machine learning. Artificial Intelligence 97, 1 (1997), 245–271.
[13] Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. Occam's razor. Readings in Machine Learning (1990), 201–204.
[14] Boser, B. E., Guyon, I. M., and Vapnik, V. N. A training algorithm for optimal margin classifiers. In COLT (1992), ACM, pp. 144–152.
[15] Bshouty, N. H. Exact learning boolean functions via the monotone theory. Inform. Comp. 123, 1 (1995), 146–153.
[16] Buciluǎ, C., Caruana, R., and Niculescu-Mizil, A. Model compression. In KDD (2006), ACM, pp. 535–541.
[17] Chaudhuri, K., and Monteleoni, C. Privacy-preserving logistic regression. In NIPS (2009), pp. 289–296.
[18] Cohn, D., Atlas, L., and Ladner, R. Improving generalization with active learning. Machine Learning 15, 2 (1994), 201–221.
[19] Craven, M. W., and Shavlik, J. W. Extracting tree-structured representations of trained networks. In NIPS (1996).
[20] Cybenko, G. Approximation by superpositions of a sigmoidal function. MCSS 2, 4 (1989), 303–314.
[21] Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al. Adversarial classification. In KDD (2004), ACM, pp. 99–108.
[22] Dwork, C. Differential privacy. In ICALP (2006), Springer.
[23] Fredrikson, M., Jha, S., and Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In CCS (2015), ACM, pp. 1322–1333.
[24] Fredrikson, M., Lantz, E., Jha, S., Lin, S., Page, D., and Ristenpart, T. Privacy in pharmacogenetics: An end-to-end case study of personalized Warfarin dosing. In USENIX Security (2014), pp. 17–32.
[25] Google Prediction API. https://cloud.google.com/prediction. Accessed Feb. 10, 2016.
[26] Hickey, W. How Americans like their steak. http://fivethirtyeight.com/datalab/how-americans-like-their-steak, 2014. Accessed Feb. 10, 2016.
[27] Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv:1503.02531 (2015).
[28] Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366.
[29] Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., and Tygar, J. Adversarial machine learning. In AISec (2011), ACM, pp. 43–58.
[30] Jackson, J. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. In FOCS (1994), IEEE, pp. 42–53.
[31] Jagannathan, G., Pillaipakkamnatt, K., and Wright, R. N. A practical differentially private random decision tree classifier. In ICDMW (2009), IEEE, pp. 114–121.
[32] Kloft, M., and Laskov, P. Online anomaly detection under adversarial impact. In AISTATS (2010), pp. 405–412.
[33] Kushilevitz, E., and Mansour, Y. Learning decision trees using the Fourier spectrum. SICOMP 22, 6 (1993), 1331–1348.
[34] Li, N., Qardaji, W., Su, D., Wu, Y., and Yang, W. Membership privacy: A unifying framework for privacy definitions. In CCS (2013), ACM.
[35] Lichman, M. UCI machine learning repository, 2013.
[36] Lowd, D., and Meek, C. Adversarial learning. In KDD (2005), ACM, pp. 641–647.
[37] Lowd, D., and Meek, C. Good word attacks on statistical spam filters. In CEAS (2005).
[38] Microsoft Azure. https://azure.microsoft.com/services/machine-learning. Accessed Feb. 10, 2016.
[39] Nelson, B., Rubinstein, B. I., Huang, L., Joseph, A. D., Lee, S. J., Rao, S., and Tygar, J. Query strategies for evading convex-inducing classifiers. JMLR 13, 1 (2012), 1293–1332.
[40] Newsome, J., Karp, B., and Song, D. Paragraph: Thwarting signature learning by training maliciously. In RAID (2006), Springer, pp. 81–105.
[41] Nocedal, J., and Wright, S. Numerical Optimization. Springer Science & Business Media, 2006.
[42] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697 (2016).
[43] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. JMLR 12 (2011), 2825–2830.
[44] PredictionIO. http://prediction.io. Accessed Feb. 10, 2016.
[45] Rubinstein, B. I., Bartlett, P. L., Huang, L., and Taft, N. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. JPC 4, 1 (2012), 4.
[46] Rubinstein, B. I., Nelson, B., Huang, L., Joseph, A. D., Lau, S.-H., Rao, S., Taft, N., and Tygar, J. Antidote: Understanding and defending against poisoning of anomaly detectors. In IMC (2009), ACM, pp. 1–14.
[47] Saar-Tsechansky, M., and Provost, F. Handling missing values when applying classification models. JMLR (2007).
[48] Settles, B. Active learning literature survey. University of Wisconsin, Madison 52, 55-66 (1995), 11.
[49] Shokri, R., and Shmatikov, V. Privacy-preserving deep learning. In CCS (2015), ACM, pp. 1310–1321.
[50] Smith, T. W., Marsden, P., Hout, M., and Kim, J. General social surveys, 1972–2012, 2013.
[51] Stevens, D., and Lowd, D. On the hardness of evading combinations of linear classifiers. In AISec (2013), ACM, pp. 77–86.
[52] The Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688 (2016).
[53] Towell, G. G., and Shavlik, J. W. Extracting refined rules from knowledge-based neural networks. Machine Learning 13, 1 (1993), 71–101.
[54] Valiant, L. G.
A theory of the learnable. Communications of the ACM 27, 11 (1984), 1134–1142.
[55] Vinterbo, S. Differentially private projected histograms: Construction and use for prediction. In ECML-PKDD (2012).
[56] Šrndić, N., and Laskov, P. Practical evasion of a learning-based classifier: A case study. In Security and Privacy (SP) (2014), IEEE, pp. 197–211.
[57] Zhang, J., Zhang, Z., Xiao, X., Yang, Y., and Winslett, M. Functional mechanism: Regression analysis under differential privacy. In VLDB (2012).
[58] Zhu, J., and Hastie, T. Kernel logistic regression and the import vector machine. In NIPS (2001), pp. 1081–1088.

A Some Details on Models

SVMs. Support vector machines (SVMs) perform binary classification (c = 2) by defining a maximally separating hyperplane in d-dimensional feature space. A linear SVM is a function f(x) = sign(w · x + β), where 'sign' outputs 0 for all negative inputs and 1 otherwise.

Linear SVMs are not suitable for non-linearly separable data. Here one uses instead kernel techniques [14]. A kernel is a function K : X × X → R. Typical kernels include the quadratic kernel K_quad(x, x′) = (x^T · x′ + 1)^2 and the Gaussian radial basis function (RBF) kernel K_rbf(x, x′) = e^(−γ||x − x′||²), parameterized by a value γ ∈ R. A kernel's projection function is a map φ defined by K(x, x′) = φ(x) · φ(x′). We do not use φ explicitly; indeed, for RBF kernels this produces an infinite-dimensional vector. Instead, classification is defined using a "kernel trick": f(x) = sign([∑_{i=1}^t α_i K(x, x_i)] + β), where β is again a learned threshold, α_1, ..., α_t are learned weights, and x_1, ..., x_t are feature vectors of inputs from a training set. The x_i for which α_i ≠ 0 are called support vectors.
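The kernel-trick decision rule above can be written out directly. The support vectors and weights below are hand-picked toy values for illustration, not parameters learned by any training algorithm.

```python
import numpy as np

def rbf_kernel(x, x2, gamma=0.5):
    # K_rbf(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - x2) ** 2))

def svm_decision(x, support_vectors, alphas, beta, gamma=0.5):
    """Kernel-trick classifier: sign([sum_i alpha_i * K(x, x_i)] + beta),
    with 'sign' mapping negatives to 0 and everything else to 1."""
    s = sum(a * rbf_kernel(x, sv, gamma) for a, sv in zip(alphas, support_vectors))
    return int(s + beta >= 0)

# Two toy support vectors with opposite-sign weights (illustrative only).
svs = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
alphas = [1.0, -1.0]

label_pos = svm_decision(np.array([0.9, 1.1]), svs, alphas, beta=0.0)
label_neg = svm_decision(np.array([-0.9, -1.1]), svs, alphas, beta=0.0)
```

Note that the decision function never evaluates φ itself; only kernel evaluations against the support vectors are needed, which is what makes infinite-dimensional feature spaces (RBF) usable in practice.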
Note that for non-zero α_i, it is the case that α_i < 0 if the training-set label of x_i was zero, and α_i > 0 otherwise.

Logistic regression. SVMs do not directly generalize to multiclass settings c > 2, nor do they output class probabilities. Logistic regression (LR) is a popular classifier that does. A binary LR model is defined as f_1(x) = σ(w · x + β) = 1/(1 + e^(−(w·x+β))) and f_0(x) = 1 − f_1(x). A class label is chosen as 1 iff f_1(x) > 0.5.

When c > 2, one fixes c weight vectors w_0, ..., w_{c−1}, each in R^d, and thresholds β_0, ..., β_{c−1} in R, and defines f_i(x) = e^(w_i·x+β_i) / (∑_{j=0}^{c−1} e^(w_j·x+β_j)) for i ∈ Z_c. The class label is taken to be argmax_i f_i(x). Multiclass regression is referred to as multinomial or softmax regression. An alternative approach to softmax regression is to build a binary model σ(w_i · x + β_i) per class in a one-vs-rest fashion, and then set f_i(x) = σ(w_i · x + β_i) / ∑_j σ(w_j · x + β_j).

These are log-linear models, and may not be suitable for data that is not linearly separable in X. Again, one may use kernel techniques to deal with more complex data relationships (c.f. [58]). Then, one replaces w_i · x + β_i with ∑_{r=1}^t α_{i,r} K(x, x_r) + β_i. As written, this uses the entire set of training data points x_1, ..., x_t as so-called representors (here analogous to support vectors). Unlike with SVMs, where most training data set points will never end up as support vectors, here all training set points are potentially representors. In practice, one uses a random subset of the training data of size s < t [58].

Deep neural networks. A popular way of extending softmax regression to handle data that is not linearly separable in X is to first apply one or more non-linear transformations to the input data.
The goal of these hidden layers is to map the input data into a (typically) lower-dimensional space in which the classes are separable by the softmax layer. We focus here on fully connected networks, also known as multilayer perceptrons, with a single hidden layer. The hidden layer consists of a number h of hidden nodes, with associated weight vectors w^(1)_0, ..., w^(1)_{h−1} in R^d and thresholds β^(1)_0, ..., β^(1)_{h−1} in R. The i-th hidden unit applies a non-linear transformation h_i(x) = g(w^(1)_i · x + β^(1)_i), where g is an activation function such as tanh or σ. The vector h(x) ∈ R^h is then input into a softmax output layer with weight vectors w^(2)_0, ..., w^(2)_{c−1} in R^h and thresholds β^(2)_0, ..., β^(2)_{c−1} in R.

Decision trees. A decision tree T is a labeled tree. Each internal node v is labeled by a feature index i ∈ {1, ..., d} and a splitting function ρ : X_i → Z_{k_v}, where k_v ≥ 2 denotes the number of outgoing edges of v. On an input x = (x_1, x_2, ..., x_d), a tree T defines a computation as follows, starting at the root. When we reach a node v, labeled by {i, ρ}, we proceed to the child of v indexed by ρ(x_i). We consider three types of splitting functions ρ that are typically used in practice ([11]):

(1) The feature x_i is categorical with X_i = Z_k. Let {S, T} be some partition of Z_k. Then k_v = 2 and ρ(x_i) = 0 if x_i ∈ S and ρ(x_i) = 1 if x_i ∈ T. This is a binary split on a categorical feature.

(2) The feature x_i is categorical with X_i = Z_k. We have k_v = k and ρ(x_i) = x_i. This corresponds to a k-ary split on a categorical feature of arity k.

(3) The feature x_i is continuous with X_i = [a, b]. Let a < t < b be a threshold. Then k_v = 2 and ρ(x_i) = 0 if x_i ≤ t and ρ(x_i) = 1 if x_i > t. This is a binary split on a continuous feature with threshold t.
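The three splitting functions can be sketched as a small tree interpreter. The nested-dict tree encoding and field names below are our own illustrative choices, not BigML's actual model format.

```python
# A tiny interpreter for the three split types described above.

def route(node, x):
    """Return the child index rho(x_i) for internal node `node` on input x."""
    v = x[node["feature"]]
    kind = node["kind"]
    if kind == "binary-categorical":   # type (1): partition {S, T} of Z_k
        return 0 if v in node["S"] else 1
    if kind == "k-ary-categorical":    # type (2): rho(x_i) = x_i
        return v
    if kind == "continuous":           # type (3): threshold split on [a, b]
        return 0 if v <= node["threshold"] else 1
    raise ValueError("unknown split type: " + kind)

def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's value."""
    node = tree
    while "children" in node:
        node = node["children"][route(node, x)]
    return node["value"]

# Example: the root splits continuously on x_0 at t = 0.5; its left child
# does a binary categorical split on x_1 with S = {0, 2}.
tree = {
    "feature": 0, "kind": "continuous", "threshold": 0.5,
    "children": [
        {"feature": 1, "kind": "binary-categorical", "S": {0, 2},
         "children": [{"value": "A"}, {"value": "B"}]},
        {"value": "C"},
    ],
}
```

Each root-to-leaf walk evaluates one splitting function per internal node, which is the structure the path-finding attack of Appendix C reconstructs query by query.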
When we reach a leaf, we terminate and output that leaf's value. This value can be a class label, or a class label and confidence score. This defines a function f : X → Y.

B Details on Data Sets

Here we give some more information about the data sets we used in this work. Refer back to Table 3 and Table 5.

Synthetic data sets. We used 4 synthetic data sets from scikit [43]. The first two data sets are classic examples of non-linearly separable data, consisting of two concentric Circles, or two interleaving Moons. The next two synthetic data sets, Blobs and 5-Class, consist of Gaussian clusters of points assigned to either 3 or 5 classes.

Public data sets. We gathered a varied set of data sets representative of the type of data we would expect ML service users to use to train logistic and SVM based models. These include famous data sets used for supervised learning, obtained from the UCI ML repository (Adult, Iris, Breast Cancer, Mushrooms, Diabetes). We also consider the Steak and GSS data sets used in prior work on model inversion [23]. Finally, we add a data set of digits available in scikit, to visually illustrate training data leakage in kernelized logistic models (c.f. Section 4.1.3).

Public data sets and models from BigML. For experiments on decision trees, we chose a varied set of models publicly available on BigML's platform. These models were trained by real MLaaS users and they cover a wide range of application scenarios, thus providing a realistic benchmark for the evaluation of our extraction attacks. The IRS model predicts a US state, based on administrative tax records. The Steak and GSS models respectively predict a person's preferred steak preparation and happiness level, from survey and demographic data. These two models were also considered in [23]. The Email Importance model predicts whether Gmail classifies an email as 'important' or not, given message metadata.
The Email Spam model classifies emails as spam, given the presence of certain words in their content. The German Credit data set was taken from the UCI library [35] and classifies a user's loan risk. Finally, two regression models respectively predict Medical Charges in the US based on state demographics, and the Bitcoin Market Price from daily opening and closing values.

C Analysis of the Path-Finding Algorithm

In this section, we analyze the correctness and complexity of the decision tree extraction algorithm in Algorithm 1. We assume that all leaves are assigned a unique id by the oracle O, and that no continuous feature is split into intervals of width smaller than ε. We may use id to refer directly to the leaf with identity id.

Correctness. Termination of the algorithm follows immediately from the fact that new queries are only added to Q when a new leaf is visited. As the number of leaves in the tree is bounded, the algorithm must terminate.

We prove by contradiction that all leaves are eventually visited. Let the depth of a node v denote the length of the path from v to the root (the root has depth 0). For two leaves $id$, $id'$, let A be their deepest common ancestor (A is the deepest node appearing on both the paths of $id$ and $id'$). We denote the depth of A as $\Delta(id, id')$. Suppose Algorithm 1 terminates without visiting all leaves, and let $(id, id')$ be a pair of leaves with maximal $\Delta(id, id')$, such that $id$ was visited but $id'$ was not. Let $x_i$ be the feature that their deepest common ancestor A splits on. When $id$ is visited, the algorithm calls LINE SEARCH or CATEGORY SPLIT on feature $x_i$. As all leaf ids are unique and there are no intervals smaller than ε, we will discover a leaf in each sub-tree rooted at A, including the one that contains $id'$. Thus, we visit a leaf $id''$ for which $\Delta(id'', id') > \Delta(id, id')$, a contradiction.

Complexity. Let m denote the number of leaves in the tree.
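The core step of the LINE SEARCH subroutine invoked above can be sketched as a binary search over one continuous feature (an illustrative sketch, assuming a single threshold in the searched interval and a `query_leaf` oracle that returns the unique leaf id reached for a given feature value; the paper's algorithm recurses to find all thresholds):

```python
def line_search(query_leaf, lo, hi, eps):
    """Locate a splitting threshold t in (lo, hi] to within eps,
    assuming query_leaf(lo) and query_leaf(hi) reach different leaves.
    Uses about log2((hi - lo) / eps) oracle queries."""
    left_id = query_leaf(lo)
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if query_leaf(mid) == left_id:
            lo = mid   # mid is on the same side of t as lo
        else:
            hi = mid   # t lies in (lo, mid]
    return hi
```

Each iteration halves the search interval with one query, which is the source of the $\log_2(b/\varepsilon)$ factor in the query-complexity bound below.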
Each leaf is visited exactly once, and for each leaf we check all d features. Suppose continuous features have range $[0, b]$, and categorical features have arity k. For continuous features, finding one threshold takes at most $\log_2(b/\varepsilon)$ queries. As the total number of splits on one feature is at most m (i.e., all nodes split on the same feature), finding all thresholds uses at most $m \cdot \log_2(b/\varepsilon)$ queries. Testing a categorical feature uses k queries. The total query complexity is $O(m \cdot (d_{cat} \cdot k + d_{cont} \cdot m \cdot \log(b/\varepsilon)))$, where $d_{cat}$ and $d_{cont}$ respectively denote the number of categorical and continuous features. For the special case of boolean trees, the complexity is $O(m \cdot d)$.

In comparison, the algorithm of [33], which uses membership queries only, has a complexity polynomial in d and $2^\delta$, where δ is the tree depth. For degenerate trees, $2^\delta$ can be exponential in m, implying that the assumption of unique leaf identities (obtained from confidence scores, for instance) provides an exponential speed-up over the best-known approach with class labels only. The algorithm from [33] can be extended to regression trees, with a complexity polynomial in the size of the output range Y. Again, under the assumption of unique leaf identities (which could be obtained solely from the output values), we obtain a much more efficient algorithm, with a complexity independent of the output range.

The Top-Down Approach. The correctness and complexity of the top-down algorithm from Section 4.2 (which uses incomplete queries) follow from a similar analysis. The main difference is that we assume that all nodes have a unique id, rather than only the leaves.

D A Note on Improper Extraction

To extract a model f without knowledge of the model class, a simple strategy is to extract a multilayer perceptron $\hat{f}$ with a large enough hidden layer.
Indeed, feed-forward networks with a single hidden layer can, in principle, closely approximate any continuous function over a bounded subset of $\mathbb{R}^d$ [20, 28]. However, this strategy intuitively does not appear to be optimal. Even if we know that we can find a multilayer perceptron $\hat{f}$ that closely matches f, $\hat{f}$ might have a far more complex representation (more parameters) than f. Thus, tailoring the extraction to the 'simpler' model class of the target f appears more efficient. In learning theory, the problem of finding a succinct representation of some target model f is known as Occam Learning [13].

Our experiments indicate that such generic improper extraction indeed appears sub-optimal in the context of equation-solving attacks. We train a softmax regression over the Adult data set with target "Race". The model f is defined by 530 real-valued parameters. As shown in Section 4.1.2, using only 530 queries, we extract a model $\hat{f}$ from the same model class that closely matches f ($\hat{f}$ and f predict the same labels on 100% of tested inputs, and produce class probabilities that differ by less than $10^{-7}$ in TV distance). We also extracted the same model, assuming a multilayer perceptron target class. Even with 1,000 hidden nodes (this model has 111,005 parameters) and 10× more queries (5,300), the extracted model $\hat{f}$ is a weaker approximation of f (99.5% accuracy for class labels and TV distance of $10^{-2}$ for class probabilities).
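The query-efficiency of proper (same-class) extraction via equation solving can be illustrated with a minimal pure-Python sketch for a binary logistic model (the target parameters and query points here are hypothetical; the paper's attack solves a general linear system from d + 1 or more arbitrary queries):

```python
import math

# Hypothetical confidential target: a binary logistic regression
# f(x) = sigmoid(w.x + b), whose confidence value the API returns.
w_true = [1.5, -2.0, 0.7]
b_true = 0.3

def query(x):
    """Black-box prediction API: returns the positive-class confidence."""
    z = sum(wi * xi for wi, xi in zip(w_true, x)) + b_true
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Each confidence value p yields one linear equation
    logit(p) = w.x + b in the d + 1 unknowns (w, b)."""
    return math.log(p / (1.0 - p))

d = len(w_true)
# With the origin and the d unit vectors as queries, the linear
# system becomes trivial: d + 1 queries recover all parameters.
b_hat = logit(query([0.0] * d))
w_hat = [logit(query([1.0 if j == i else 0.0 for j in range(d)])) - b_hat
         for i in range(d)]
```

One query per parameter suffices because confidence values expose the model's real-valued output, not just its label; this is the effect that makes same-class extraction so much cheaper than fitting an over-parameterized multilayer perceptron.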
