Cross-institutional healthcare predictive modeling can accelerate research and facilitate quality improvement initiatives, and thus is important for national healthcare delivery priorities. For example, a model that predicts risk of re-admission for a particular set of patients will be more generalizable if developed with data from multiple institutions. While privacy-protecting methods to build predictive models exist, most are based on a centralized architecture, which presents security and robustness vulnerabilities such as single-point-of-failure (and single-point-of-breach) and accidental or malicious modification of records. In this article, we describe a new framework, ModelChain, to adapt Blockchain technology for privacy-preserving machine learning. Each participating site contributes to model parameter estimation without revealing any patient health information (i.e., only model data, no observation-level data, are exchanged across institutions). We integrate privacy-preserving online machine learning with a private Blockchain network, apply transaction metadata to disseminate partial models, and design a new proof-of-information algorithm to determine the order of the online learning process. We also discuss the benefits and potential issues of applying Blockchain technology to solve the privacy-preserving healthcare predictive modeling task and to increase interoperability between institutions, to support the Nationwide Interoperability Roadmap and national healthcare delivery priorities such as Patient-Centered Outcomes Research (PCOR).
Cross-institution interoperable healthcare predictive modeling can advance research and facilitate quality improvement initiatives, for example, by generating scientific evidence for comparative effectiveness research [1], accelerating biomedical discoveries [2], and improving patient care [3]. For instance, a healthcare provider may be able to predict a certain outcome even if her institution has few or no related patient records, because a predictive model can be "learned" (i.e., its parameters can be estimated) from data originating from other institutions. However, improper data disclosure could place sensitive personal health information at risk. To protect the privacy of individuals, several algorithms (such as GLORE [4], EXPLORER [5], and VERTIGO [6]) have been proposed to conduct predictive modeling by transferring partially trained machine learning models instead of disseminating individual patient-level data. However, these state-of-the-art distributed privacy-preserving predictive modeling frameworks are centralized (i.e., they require a central server to intermediate the modeling process and aggregate the global model) [4][5][6], as shown in Figure 1(a). Such a client-server architecture carries the following risks:
• Institutional policies. For example, a site may not want to cede control to a single central server [7].
• Single-point-of-failure [8,9]. For example, if the central server is shut down for maintenance, the whole network stops working. Furthermore, if the admin user account of the central server is compromised, the entire network is also at risk of being compromised [7].
• Participating sites cannot join/leave the network at any time [10]. If any site joins or leaves the network for a short period of time, the analysis process is disrupted and the server needs to handle the recovery. A new site cannot participate in the network without authentication and reconfiguration on the central server [8,9].
• The data being disseminated and the transfer records are mutable. An attacker could change the partial models without being noticed [7]. The transfer records may also be modified so that no audit trail is available to identify such malicious changes to the data [11,12].
• The client-server architecture may present consensus/synchronization issues on distributed networks. Specifically, the issue is the combination of two problems: the Byzantine Generals Problem [13], in which the participating sites need to agree upon the aggregated model under the constraint that each site may fail accidentally or even maliciously [7], and the Sybil Attack Problem [14], in which an attacker controls a large fraction of the seemingly independent participants and exerts a disproportionate influence on the predictive modeling process [7,15].
To address the abovementioned risks, one plausible solution is to adapt the Blockchain technology (in this article, we use "Blockchain" to denote the technology, and "blockchain" to indicate the actual chain of blocks) [7][9][10][11][12][15][16][17][18][19][20]. A Blockchain-based distributed network has the following desirable features that make it suitable to mitigate the risks of centralized privacy-preserving healthcare predictive modeling networks. First, Blockchain is by design a decentralized (i.e., a peer-to-peer, non-intermediated) architecture (Figure 1(b)); the verification of transactions is achieved by majority proof-of-work voting [17]. Each institution can keep full control of its own computational resources. Also, there is no risk of single-point-of-failure [8,9]. Second, each site (including new sites) can join/leave the network freely without imposing overhead on a central server or disrupting the machine learning process [8][9][10]. Finally, the proof-of-work blockchain provides an immutable audit trail [7,11,12]. That is, changing the data or records is very difficult; the attacker needs to redo the proof-of-work of the target block and all blocks after it, and then catch up with and surpass the work of all honest sites. As shown by Satoshi Nakamoto [17], the inventor of Blockchain and Bitcoin, given that the probability that an honest node finds the next block is larger than the probability that an attacker finds the next block, the probability that the attacker will ever catch up drops exponentially as the number of blocks by which the attacker lags behind increases. This is also why the Blockchain mechanism solves the relaxed version of the Byzantine Generals Problem and the Sybil Attack Problem [9,15,18,20], as formally proved by Miller et al.
Although Blockchain provides the abovementioned security and robustness benefits, a reasonable approach to integrate Blockchain with privacy-preserving healthcare predictive modeling algorithms is yet to be devised. In this article, we propose ModelChain, a private-Blockchain-based privacy-preserving healthcare predictive modeling framework, to combine these two important technologies.
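The catch-up argument attributed to Nakamoto above can be illustrated numerically. The following is a minimal sketch, not part of ModelChain itself: it assumes the simple gambler's-ruin form of the result, in which an attacker who finds the next block with probability q (and honest nodes with probability p = 1 - q) ever catches up from z blocks behind with probability 1 if q >= p and (q/p)^z otherwise. The function name and the example value of q are illustrative assumptions.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker ever catches up from z blocks behind.

    q: probability that the attacker finds the next block (assumed constant).
    z: number of blocks by which the attacker currently lags.
    Uses the gambler's-ruin closed form: 1 if q >= p, else (q / p) ** z,
    where p = 1 - q is the probability that an honest node finds the next block.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q
    if q >= p:
        # An attacker with at least half of the block-finding power always catches up.
        return 1.0
    return (q / p) ** z


# Illustrative only: an attacker with a 10% chance of finding each block.
for lag in range(0, 7):
    print(f"z={lag}: {catch_up_probability(0.10, lag):.3e}")
```

Under these assumptions, each additional block of lag multiplies the attacker's chance of success by the constant factor q/p < 1, which is the exponential decay that underpins the immutability of the audit trail.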
First, we apply privacy-preserving online ma