This paper proposes a homomorphic, privacy-preserving association rule mining algorithm that can be deployed on resource-constrained devices (RCDs). The privacy-preserving exchange of itemset counts among distributed mining sites is a vital part of the association rule mining process. Existing cryptography-based privacy-preserving solutions are computationally expensive because of the complex mathematical operations involved. Privacy solutions that require less computation are therefore necessary for deploying mining applications on RCDs. In the proposed algorithm, a semi-trusted mixer unifies the itemset counts encrypted by all mining sites without revealing any individual value. The algorithm is built on count distribution (CD), a well-known, communication-efficient association rule mining algorithm. Security proofs, together with a performance analysis and comparison, demonstrate the acceptability and effectiveness of the proposed algorithm. Its efficient and straightforward privacy model and satisfactory performance make the protocol one of the initiatives toward deploying data mining applications on RCDs.
Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices
Data mining, sometimes known as data or knowledge discovery, is the process of analyzing data from different points of view and distilling it into useful information that can be applied in areas including advertising, bioinformatics, database marketing, fraud detection, e-commerce, health care, security, sports, telecommunications, the web, weather forecasting, and financial forecasting. Association rule mining is a data mining technique that helps discover underlying correlations among data items in a database. It can reveal hidden and unexpected knowledge that may be of great interest to database owners or miners.
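To make the notion of correlation among data items concrete, the following minimal sketch computes the two standard association rule measures, support and confidence, over a toy database. The function names and the sample transactions are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent | consequent) / support(antecedent).
    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(transactions, both) / support(transactions, a)


# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(TRANSACTIONS, {"bread", "milk"})
rule_confidence = confidence(TRANSACTIONS, {"bread"}, {"milk"})
```

Here the itemset {bread, milk} appears in two of the four transactions, and two of the three transactions containing bread also contain milk. A rule is usually reported only when both measures exceed user-chosen thresholds.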
The rapid development of information technology, the increasing use of advanced devices, and the development of algorithms have amplified the need for privacy preservation in all kinds of transactions. Privacy matters even more in data mining, since sharing information is a primary requirement for completing the mining process. In general, the stronger the privacy requirement, the lower the accuracy the mining process can achieve. A tradeoff between privacy and accuracy must therefore be determined for each application.
In this paper, a Resource Constrained Device (RCD) denotes any device with limited transmission, computation, storage, battery, or other capabilities. Examples include, but are not limited to, mobile phones, Personal Digital Assistants (PDAs), sensor devices, smart cards, and Radio Frequency Identification (RFID) devices. We also use the term lightweight algorithm for a simple algorithm that requires little computation, low communication overhead, and little memory, and that can be deployed on an RCD. Integrating communication devices of various architectures leads to a global heterogeneous network comprising trusted, semi-trusted, untrustworthy, authorized, unauthorized, and suspicious terminals and devices, as well as intruders and hackers, supported by little or no dedicated and authorized infrastructure. Sharing data for mining purposes in such a resource-constrained ad hoc environment is a big challenge in itself, and preserving privacy compounds the problem. Privacy-preserving data mining on RCDs therefore aims to bring mining capability to all these tiny devices, which may have a major impact on the market in the near future.
Data mining capability on RCDs would also advance the coming era of ubiquitous computing. The owner of a device could perform mining operations on the fly. Small sensor devices could optimize or extend their operations based on dynamic circumstances instead of waiting for a time-consuming decision from a server. Scattered agents of a security department could decide instantly how to act on a crime or a criminal while on duty. To appreciate the need for lightweight privacy-preserving data mining, consider another scenario: many sensor devices scattered across a geographical area belong to different authorities and serve different purposes, but hold some common records about the environment. If data must be mined across those devices in real time to serve a common interest of the authorities, preserving privacy is the first issue that must be addressed. Another motivation for the proposed system is healthcare awareness. Suppose some community members or university students want to know the extent of an outbreak of an infectious disease such as swine flu, bird flu, or AIDS. Each individual is very concerned about privacy, since the matter is highly sensitive. Each is equipped with a mobile phone or a similar smart device and wants to know the mining result on the fly. In such circumstances, a distributed, lightweight privacy-preserving data mining technique would be a well-suited solution. In addition, relevant people can be warned or advised based on all available health information, including previously generated knowledge about a particular infectious disease.
Little research has addressed lightweight privacy-preserving data mining, but there is plenty of research on privacy-preserving data mining in general. Essentially, two main approaches have been adopted. The first is randomization, which is mainly used for centralized data. In this approach, data is perturbed using a randomization function and then submitted for mining. The randomization function is chosen such that the aggregate properties of the data can be recovered on the miner side. Such approaches are proposed in [1,2,3]. A major drawback of the randomization approach is that increasing the precision of the mining result means privacy is not fully preserved [4].
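The idea of perturbing individual values while keeping the aggregate recoverable can be illustrated with classic randomized response on a single binary attribute. This is a generic sketch under stated assumptions, not the specific schemes of [1,2,3]: the function names, the truth probability `p`, and the synthetic data are all hypothetical.

```python
import random


def perturb(bit, p, rng):
    """Report the true bit with probability p, otherwise report its flip."""
    return bit if rng.random() < p else 1 - bit


def estimate_fraction(reports, p):
    """Estimate the true fraction of 1s from perturbed reports.

    If pi is the true fraction, the expected observed fraction is
    p * pi + (1 - p) * (1 - pi), which is solved here for pi.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information about the data")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


# Synthetic data (assumption): 30% of 10,000 records hold the sensitive bit.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
P_TRUTH = 0.8  # hypothetical setting; closer to 0.5 means more privacy

reports = [perturb(b, P_TRUTH, rng) for b in true_bits]
estimate = estimate_fraction(reports, P_TRUTH)
```

The miner sees only `reports`, yet `estimate` should land close to the true fraction of 0.3. Moving `P_TRUTH` toward 1 tightens the estimate but exposes more of each individual record, which is the precision versus privacy tension noted above.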
The other is the cryptographic approach, in which the data is encrypted before it is shared. The miner cannot decrypt in
…(Full text truncated)…