Logic Mining Using Neural Networks
W. A. T. Wan Abdullah* and Saratha Sathasivam+
* Department of Physics, Universiti Malaya, 50603 Kuala Lumpur, Malaysia.
+ School of Mathematics, Universiti Sains Malaysia, 11800 Pulau Pinang, Malaysia.
E-mail: saratha@email.com

Abstract

Knowledge can be gained from experts, specialists in the area of interest, or it can be gained by induction from sets of data. Automatic induction of knowledge from data sets, usually stored in large databases, is called data mining. Data mining methods are important in the management of complex systems. Many technologies are available to data mining practitioners, including artificial neural networks, regression, and decision trees. Neural networks have been successfully applied in a wide range of supervised and unsupervised learning applications. Nevertheless, neural network methods are not commonly used for data mining tasks, because they often produce incomprehensible models and require long training times. One way in which the collective properties of a neural network may be used to implement a computational task is through the concept of energy minimization. The Hopfield network is a well-known example of such an approach; it is useful as a content-addressable memory or as an analog computer for solving combinatorial optimization problems. Wan Abdullah [1] proposed a method of doing logic programming on a Hopfield neural network. Minimization of logical inconsistency is carried out by the network after the connection strengths are defined from the logic program; the network relaxes to neural states corresponding to a valid interpretation. In this article, we describe how a Hopfield network is able to induce logical rules from a large database using a reverse analysis method: given the values of the connections of a network, we can hope to know what logical rules are entrenched in the database.
Keywords: Hopfield network, logic programming, data mining, neural networks

Proceedings of the International Conference on Intelligent Systems 2005 (ICIS 2005), Kuala Lumpur, 1-3 December 2005

1.0 Introduction

The main focus of the data mining task is to gain insight into large collections of data. Achieving this goal often involves applying machine-learning methods to inductively construct models of the data at hand. Although neural network learning algorithms have been successfully applied in a wide range of supervised and unsupervised learning applications, they have not often been applied in data mining settings, where two fundamental considerations, comprehensibility and speed, are of prime importance to the data mining community.

Data mining is not merely the automatic collecting of knowledge. Human-computer collaborative knowledge discovery is an interactive process between the data miner and the computer. The aim is to extract novel, plausible, relevant and interesting knowledge from the database. We do not provide an introduction to data mining techniques in this paper, but instead refer the interested reader to one of the good books in the field [2].

Logic programming can be treated as a problem in combinatorial optimization, and can therefore be carried out in a neural network to obtain the desired solution. Our objective is to find a set of interpretations (i.e., truth value assignments) for the atoms in the clauses which satisfy the clauses (i.e., which make all the clauses true). We extend the work relating logic programming and neural networks by introducing a reverse analysis method. This method is capable of inducing the logical rules entrenched in a database. The knowledge obtained from these logical rules can be used to unearth relationships in data that may provide useful insights.

The rest of the paper is organized as follows. In the next section we consider some theory of the Little-Hopfield model.
Section 3 discusses logic programming, focusing on Horn clauses. Section 4 discusses the logic of Hebbian learning. Section 5 describes a method for extracting rules from a database: the reverse analysis method. Finally, Section 6 provides discussion and conclusions.

2.0 Little-Hopfield Model

In order to keep this paper self-contained, we briefly review the Little-Hopfield model [3]. The Hopfield model [4] is a standard model for associative memory. The Little dynamics is asynchronous, with each neuron updating its state deterministically. The system consists of N formal neurons, each of which is described by an Ising variable S_i(t), i = 1, 2, ..., N. Neurons can be modeled as binary, V_i ∈ {0, 1}, obeying the dynamics V_i → θ(h_i), where

    h_i = Σ_j T_ij^(2) V_j + T_i^(1),

with i and j running over all N neurons; T_ij^(2) is the synaptic strength from neuron j to neuron i, -T_i^(1) is the threshold of neuron i, and θ is the step function. Alternatively, neurons can be taken to be bipolar, S_i ∈ {-1, 1}, with S_i → sgn(h_i), where V_i is replaced by S_i in h_i. In what follows we mainly write expressions for the binary case; the corresponding ones for the bipolar case can be deduced accordingly.

Restricting the connections to be symmetric and zero-diagonal, T_ij^(2) = T_ji^(2), T_ii^(2) = 0, allows one to write a Lyapunov energy function

    E = -(1/2) Σ_i Σ_j T_ij^(2) V_i V_j - Σ_i T_i^(1) V_i,    (1)

which decreases monotonically under the dynamics. The two-connection model can be generalized to include higher-order connections. This modifies the "field" to

    h_i = Σ_j Σ_k T_ijk^(3) V_j V_k + Σ_j T_ij^(2) V_j + T_i^(1) + ...,    (2)

where "..." denotes still higher orders, and an energy function can be written as

    E = -(1/3) Σ_i Σ_j Σ_k T_ijk^(3) V_i V_j V_k - (1/2) Σ_i Σ_j T_ij^(2) V_i V_j - Σ_i T_i^(1) V_i,    (3)

provided that T_ijk^(3) = T_[ijk]^(3) for i, j, k distinct, with [...] denoting permutations of the indices, that T_ijk^(3) = 0 when any of i, j, k are equal, and that similar symmetry requirements are satisfied for the higher-order connections. An updating rule reads

    S_i(t + 1) = sgn[h_i(t)].    (4)

3.0 Logic Programming

In the simple propositional case, logic clauses take the form

    A_1, A_2, ..., A_n ← B_1, B_2, ..., B_m.

which says that (A_1 or A_2 or ... or A_n) if (B_1 and B_2 and ... and B_m); they are Horn clauses if n = 1 and m ≥ 0. Thus we can have rules, e.g. A ← B, C., saying A ∨ ¬(B ∧ C) = A ∨ ¬B ∨ ¬C, and assertions, e.g. D ←., saying that D is true. A logic program consists of a set of Horn clause procedures and is activated by an initial goal statement. Each clause is in conjunctive normal form (CNF) and contains at most one positive literal.

Basically, logic programming in the Hopfield model [5] can be treated as a problem in combinatorial optimization, and can therefore be carried out in a neural network to obtain the desired solution. Our objective is to find a set of interpretations (i.e., truth value assignments) for the atoms in the clauses which satisfy the clauses (i.e., which make all the clauses true). As an example, consider the logic program below:

    P = { A ← B, C.
          D ← B.
          C ←. }

Given a goal ← G, we need to show that P ∧ ¬G is inconsistent in order to prove the goal. Alternatively, we need to find an interpretation for the Herbrand base of the problem which is consistent with P (which makes P true) and examine the truth of G in such an interpretation. If we assign the value 1 to true and 0 to false, then ¬P = 0 indicates a consistent interpretation, while ¬P = 1 reveals that at least one of the clauses in the program is not satisfied.
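The search for a consistent interpretation can be illustrated with a short sketch (in Python; the encoding below is ours and follows the example program P, i.e. the clauses A ← B, C., D ← B., and C ←.). Each clause is satisfied unless its body is true while its head is false:

```python
from itertools import product

# Example program P: { A <- B,C.   D <- B.   C <-. }
# A clause  head <- body  is violated only when the body holds and the head fails.
def consistent(VA, VB, VC, VD):
    c1 = VA or not (VB and VC)   # A <- B, C.
    c2 = VD or not VB            # D <- B.
    c3 = VC                      # C <-.  (assertion: C must be true)
    return bool(c1 and c2 and c3)

# not-P = 0 (a consistent interpretation) exactly when every clause is satisfied.
models = [v for v in product([0, 1], repeat=4) if consistent(*v)]
for VA, VB, VC, VD in models:
    print(dict(A=VA, B=VB, C=VC, D=VD))
```

Exhaustive enumeration is feasible here only because the example has four atoms; the point of the neural formulation is to replace this search by energy minimization.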
Therefore, looking for a consistent interpretation is a combinatorial minimization (over assignments of truth values to ground atoms) of the inconsistency, the value of ¬P. Translating all the clauses and their negation into Boolean algebraic form:

    P = (A ∨ ¬B ∨ ¬C) ∧ (D ∨ ¬B) ∧ C,
    ¬P = (¬A ∧ B ∧ C) ∨ (¬D ∧ B) ∨ ¬C.

From these, we may write a cost function which is minimized when all the clauses are satisfied:

    E_P = (1 - V_A) V_B V_C + (1 - V_D) V_B + (1 - V_C),    (5)

where the neuron V_A, for example, represents the truth value of A. Notice that we have chosen the multiplication operation to represent the relationship "AND", and the addition operation to represent "OR". The minimum value of E_P is 0, corresponding to all the clauses being satisfied; the value of E_P (an integer) is proportional to the number of unsatisfied clauses. An energy function is defined as

    H = -(1/3) Σ_i Σ_j Σ_k T_ijk V_i V_j V_k - (1/2) Σ_i Σ_j T_ij V_i V_j - Σ_i J_i V_i,    (6)

where the synaptic strengths are completely symmetric with zeros in the diagonal planes. By comparing (5) and (6), we obtain the connection strengths.

4.0 The Logic of Hebbian Learning

We now reproduce results from [5] which calculate the connection strengths using the Hebb rule [6]. For two-neuron connections, a Hebbian-like learning rule is given by

    ΔT_ij = α (2V_i - 1)(2V_j - 1)    (7)

(or, for bipolar neurons, ΔT_ij = α S_i S_j), where α is a learning rate. For connections of other orders, we can generalize this to

    ΔT_ij...n = α (2V_i - 1)(2V_j - 1) ... (2V_n - 1).    (8)

Assume we have events A, Ā, B, B̄, C, C̄, ... occurring randomly but equally probably: V_A, etc., are randomly 0 or 1 with equal probability. Then there would be no net change in the connection strengths, because ΔT_ij...n has equal probability of being positive as of being negative. Now say, for example, that D̄ does not occur.
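The learning rules (7) and (8) can be sketched directly (a minimal illustration; the learning rate, random seed, and event counts are our own choices, not taken from the paper):

```python
import random

def delta_T(alpha, *neurons):
    # Generalized Hebbian-like update of Eqs. (7)-(8):
    # dT = alpha * (2V_i - 1)(2V_j - 1) ... (2V_n - 1), with each V in {0, 1}.
    change = alpha
    for V in neurons:
        change *= (2 * V - 1)
    return change

random.seed(42)  # arbitrary seed, for reproducibility only

# Equally probable events: each contribution is +alpha or -alpha with equal
# probability, so the accumulated strength shows no systematic drift.
T_random = sum(delta_T(1.0, random.randint(0, 1)) for _ in range(1000))

# If D-bar never occurs (V_D is always 1), T_D grows steadily positive,
# i.e. the assertion  D <-.  is learnt.
T_D = sum(delta_T(1.0, 1) for _ in range(1000))

print(T_D, T_random)
```

The contrast between the drifting and non-drifting sums is the mechanism exploited in the next section: a systematic bias in a connection strength signals a rule.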
This would result in ΔT_D being positive, which is equivalent, according to our analysis in the previous section, to the assertion D ←. being learnt. In this case, our system has learnt a rule which corresponds to only D occurring. For the case of C occurring whenever D occurs (i.e., CD occurs, CD̄ occurs, C̄D does not occur, C̄D̄ occurs), there is a net increase in ΔT_[CD] and a net decrease of the same magnitude in ΔT_D. This is equivalent to the rule C ← D. being learnt, which is the rule that gives rise to events like those observed. However, there is also an increase in ΔT_C, which means that C ←. has also been learnt.

To clarify further, let us look at the case where A occurs if B and C both do. Then the following table summarizes what happens (the event ĀBC does not occur):

                 ΔT_[ABC]  ΔT_[AB]   ΔT_[AC]   ΔT_[BC]   ΔT_A    ΔT_B    ΔT_C
    ABC occurs      +         +         +         +        +       +       +
    ABC̄ occurs      -         +         -         -        +       +       -
    AB̄C occurs      -         -         +         -        +       -       +
    AB̄C̄ occurs      +         -         -         +        +       -       -
    ĀBC̄ occurs      +         -         +         -        -       +       -
    ĀB̄C occurs      +         +         -         -        -       -       +
    ĀB̄C̄ occurs      -         +         +         +        -       -       -
    net             +         +         +         -        +       -       -
    factor       x6(-1/3)  x2(-1/2)  x2(-1/2)  x2(-1/2)  x1(-1)  x1(-1)  x1(-1)

The net change is multiplied by the number of terms in the energy giving the same contribution (the various permutations of the subscripts) and by the factor associated with each term in the energy function. The system "correctly" learns A ← B, C., but also the extra rules A ← B., A ← C., A ←. and so on.

If we use bipolar neurons, the energy change to include the clause C ← D. is

    ΔE = (1/4)(1 - S_C)(1 + S_D).    (9)

Thus the events {CD, CD̄, C̄D̄} correctly give the corresponding change in energy, without the spurious clause that arises with binary neurons. This may be expected, as the change of variable is effectively an overall change in the neural threshold values. For A ← B, C., the change in energy with bipolar neurons is

    ΔE = (1/8)(1 - S_A)(1 + S_B)(1 + S_C).    (10)

The collection of events {ABC, ABC̄, AB̄C, AB̄C̄, ĀBC̄, ĀB̄C, ĀB̄C̄} yields the learning of A ← B, C., plus the extra energy term -S_A S_B S_C, which causes the system to have a liking for A, B, and C all being true.

5.0 Extracting Rules From Database

Companies have been collecting data for decades, building massive data warehouses in which to store it. Even though this data is available, very few companies have been able to realize the actual value stored in it. The question these companies are asking is how to extract this value. In this paper we therefore propose a method, known as reverse analysis, to induce the logical rules entrenched in a database. These logical rules represent significant patterns or trends in the database that would otherwise go unrecognized.

In this section, we describe the implementation of our method for extracting rules from a database:

i) Enumerate the neurons and patterns in the database.
ii) Extract the events from the database and represent them as binary/bipolar patterns, where 0 indicates the false state and 1 indicates the true state (for bipolar patterns, -1 represents false and 1 represents true).
iii) Calculate the connection strengths for the events using Hebbian learning, as indicated in Section 4.0.
iv) Capture the nonzero values (connection strengths) of the third-order connections.
v) Carry out reverse analysis to deduce the underlying logical rules.
vi) Represent the logical rules in the form of Horn clauses.
vii) Calculate the connection strengths for the extracted Horn clauses and deduct their values from those obtained in (iii).
viii) Repeat the same steps for the second-order and first-order connections.
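The computational core of steps (ii)-(iv) can be sketched as follows (a minimal illustration rather than the full reverse analysis; the event set is the seven patterns consistent with A ← B, C. from Section 4, and the index order (A, B, C) is our own convention):

```python
import math
from itertools import product, combinations

def strength(events, idx, alpha=1.0):
    # Net Hebbian-like connection strength (Eq. 8) for the neuron indices
    # in `idx`, accumulated over all observed binary events.
    return sum(alpha * math.prod(2 * e[i] - 1 for i in idx) for e in events)

# Step (ii): every (V_A, V_B, V_C) pattern consistent with  A <- B, C.
# (all patterns except B = C = 1 with A = 0).
events = [p for p in product([0, 1], repeat=3)
          if not (p[1] and p[2] and not p[0])]

# Steps (iii)-(iv): net strengths at third, second and first order.
T_ABC = strength(events, (0, 1, 2))
pairs = {c: strength(events, c) for c in combinations(range(3), 2)}
singles = {i: strength(events, (i,)) for i in range(3)}
print(T_ABC, pairs, singles)
```

The positive third-order strength flags the candidate three-neuron clause A ← B, C., and the signs of the lower-order strengths reproduce the net row of the table in Section 4; steps (v)-(viii) would then subtract the strengths implied by each extracted clause and descend to lower orders.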
The reverse analysis method has been tested on a small data set, shown in Table 1. The logical rules induced by the method agree with the frequent observations.

Table 1: Customers' daily purchases from a supermarket (items: Bread, Jam, Cheese, Sausage, Burger; a tick marks a purchase). Peter purchased one item; John and Mary purchased two items each; Sue and Anne purchased three items each.

From the reverse analysis method discussed in Section 5, we obtained the following rules:

    Bread ← Jam, Cheese.
    Burger ← Sausage, Cheese.

The logical rules induced can help the department store monitor its stock according to customer demand. Significant patterns or trends in the data set have been identified using reverse analysis. The department store can apply these patterns to improve its sales process according to customers' shopping trends. Furthermore, the knowledge obtained may suggest new initiatives and provide information that improves future decision making.

6.0 Conclusion

Data might be one of our most valuable assets, if we know how to reveal the valuable knowledge hidden in the raw data. By doing reverse analysis, given the values of the connections of a network (obtained from the data set), we can hope to know what logical rules are entrenched in it. The reverse analysis method can help us reveal knowledge hidden in data and turn this knowledge into a crucial competitive advantage.

The reverse analysis method has some limitations, such as the existence of extra terms (discussed in Section 4.0) and redundant clauses due to interference effects. However, if we take the learning rate for three-neuron connections to be half that for two-neuron connections, the extra term is lost. Moreover, redundant clauses have been shown not to affect the knowledge base [7]. The logical rules are obtained through frequent observation, and are not necessarily intrinsic to the objects themselves. It is our hope that our reverse analysis method will serve to inspire interesting applications of this method to challenging data mining tasks.
References

[1] W. A. T. Wan Abdullah, "Logic programming on a neural network", Int. J. Intelligent Systems 7, 513-519 (1992).
[2] M. J. A. Berry and G. Linoff, Data Mining Techniques, Wiley Computer Publishing, 1997 (ISBN 0-471-17980-9).
[3] W. A. Little, Math. Biosci. 19, 101-120 (1974).
[4] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities", Proc. Natl. Acad. Sci. USA 79, 2554-2558 (1982).
[5] W. A. T. Wan Abdullah, "The logic of neural networks", Phys. Lett. A 176, 202-206 (1993).
[6] D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
[7] P. Liberatore, "Redundancy in logic I: CNF propositional formulae", Artificial Intelligence 163, 203-232 (2005).