Information-theoretic measures associated with rough set approximations

Ping Zhu a,b, Qiaoyan Wen b

a School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
b State Key Laboratory of Networking and Switching, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

Although some information-theoretic measures of uncertainty or granularity have been proposed in rough set theory, these measures depend only on the underlying partition and the cardinality of the universe, independently of the lower and upper approximations. This seems somewhat unreasonable, since the basic idea of rough set theory is to describe vague concepts by their lower and upper approximations. In this paper, we therefore define new information-theoretic entropy and co-entropy functions, associated with both the partition and the approximations, to measure the uncertainty and granularity of an approximation space. After introducing the novel notions of entropy and co-entropy, we examine their properties. In particular, we discuss the relationship of co-entropies between different universes. The theoretical development is accompanied by illustrative numerical examples.

Keywords: rough set, entropy, co-entropy, uncertainty, granularity

1. Introduction

To handle inexact, uncertain, or vague knowledge in information systems, Pawlak developed rough set theory in the early 1980s [14, 15]. Since then we have witnessed a systematic, worldwide growth of interest in rough set theory and its applications in a number of fields, such as granular computing, data mining, decision analysis, pattern recognition, and approximate reasoning [12, 17, 18, 30, 34, 35].
The starting point of rough set theory in [14, 15] is the idea that elements of a universe having the same description are indiscernible with respect to the available information. The indiscernibility is described by an equivalence relation, in the way that two elements are related by the relation if and only if they are indiscernible from each other. As is well known, any equivalence relation defined on a universe U determines a partition of U into a collection of equivalence classes (blocks): each class contains all and only the elements that are mutually equivalent. Any partition π of U represents a piece of knowledge about the elements of U, forming a classification, and so any equivalence class induced by π is interpreted as a granule of knowledge contained in (or supported by) π. According to Pawlak's terminology in [16], any subset X of the universe U is called a concept in U. If the concept X is a union of equivalence classes from π, then X is precise in π; otherwise X is vague. The basic idea of rough set theory consists in replacing vague concepts with a pair of precise concepts, their lower and upper approximations [16], and thus a basic problem in this framework is to reason about the accessible granules of knowledge. To this end, various knowledge granulations (also called information granulations or granulation measures), as average measures of knowledge granules, have been proposed and addressed in [1, 3, 8, 11, 13, 21, 23, 24, 25, 26, 28, 32]. Among them, there are several information-theoretic measures of uncertainty or granularity for rough sets [1, 3, 8, 10, 11, 13, 21, 23, 25], which are based upon the important notion of entropy introduced by Shannon [22]; for more details, we refer the reader to the excellent survey papers [2, 27].
It is worth noting that the information-theoretic measures mentioned above depend only on the sizes of the equivalence classes (essentially, the underlying partition) and the cardinality of the universe, independently of the lower and upper approximation operators. For example, in [6, 13, 23, 26] the information entropy H(π) of the partition π = {U_1, U_2, ..., U_k} is defined as

H(\pi) = -\sum_{i=1}^{k} \frac{n_i}{n} \log \frac{n_i}{n},

where n_i is the cardinality of U_i and n = \sum_{i=1}^{k} n_i. As a result, some partitions, such as {{1}, {2}} and {{1, 2}, {3, 4}}, have the same entropy (or co-entropy). This seems somewhat unreasonable, since the basic idea of rough set theory aims at describing vague concepts by the lower and upper approximations; in other words, the result of this description relies on both the partition and the approximations. In light of this, we should pay more attention to the lower and upper approximation operators.

Email addresses: pzhubupt@gmail.com (Ping Zhu), wqy@bupt.edu.cn (Qiaoyan Wen)
Preprint submitted to Elsevier, October 31, 2018

The previous observation motivates us to propose, in this paper, another information-theoretic entropy function to measure the uncertainty associated with the partition and the approximation operators. More concretely, given a universe U with n elements and a partition π of U, we count the subsets of U described by each pair of lower and upper approximations. Assume that r_i, 1 ≤ i ≤ m, is the number of subsets described by the rough set approximation (A_i, A'_i), and that every subset of U appears with the same probability. It follows that the rough set approximation (A_i, A'_i) appears with the accumulated probability r_i/2^n, since the number of subsets of U is precisely 2^n. In this way, we obtain a probability distribution

P(\pi) = \left( \frac{r_1}{2^n}, \frac{r_2}{2^n}, \ldots, \frac{r_m}{2^n} \right).
It gives rise to an information entropy, say H(π), according to Shannon's information theory [22]. On the other hand, from the same probability distribution we can obtain a co-entropy G(π). It turns out that H(π) + G(π) = n. After exploring some properties of the entropy and co-entropy, we discuss the relationships of co-entropies between different universes. Roughly speaking, the co-entropy monotonically increases as the partition becomes coarser. For example, the co-entropy of {{1, 2}, {3, 4}} is greater than that of {{1}, {2}}.

The remainder of the paper is structured as follows. In Section 2, we briefly review some basics of Pawlak's rough set theory and the information-theoretic measures of uncertainty and granularity for rough sets in the literature. Section 3 is devoted to our novel notions of entropy and co-entropy and their properties. We address the relationship of co-entropies between different universes in Section 4 and conclude the paper in Section 5 with a brief discussion of future research.

2. Preliminaries

This section consists of two subsections. We briefly recall the definition of Pawlak's rough sets in the first subsection and then review two information-theoretic measures of uncertainty and granularity in rough set theory in the second subsection.

2.1. Rough sets

We start by recalling some basic notions of Pawlak's rough set theory [14, 15]. Let U be a finite and nonempty universal set, and let R ⊆ U × U be an equivalence relation on U. Denote by U/R the set of all equivalence classes induced by R. Such equivalence classes are also called elementary sets; every union (not necessarily nonempty) of elementary sets is called a definable set. For any X ⊆ U, one can characterize X by a pair of lower and upper approximations.
The lower approximation \underline{app}_R X of X is defined as the greatest definable set contained in X, while the upper approximation \overline{app}_R X of X is defined as the least definable set containing X. Formally,

\underline{app}_R X = \bigcup \{ C \in U/R \mid C \subseteq X \} \quad \text{and} \quad \overline{app}_R X = \bigcup \{ C \in U/R \mid C \cap X \neq \emptyset \}.

The pair (\underline{app}_R X, \overline{app}_R X) is referred to as the rough set approximation of X. It follows immediately from the definition that \underline{app}_R X ⊆ X ⊆ \overline{app}_R X for any X ⊆ U. The ordered pair ⟨U, R⟩ is said to be an approximation space. A rough set in ⟨U, R⟩ is the family of all subsets of U having the same lower and upper approximations. Thus, the general notion of a rough set can simply be identified with the rough approximation of any given set.

Recall that a partition of U is a collection of nonempty subsets of U such that every element x of U is in exactly one of these subsets; the subsets making up the partition are called blocks. We write Π(U) for the set of all partitions of U and P(U) for the power set of U. It is well known that the notions of partition and equivalence relation are essentially equivalent: for any equivalence relation R on U, the set U/R is a partition of U, and conversely, from any partition π of U one can define an equivalence relation R_π on U such that U/R_π = π in the obvious way. Thus, we sometimes say that the ordered pair ⟨U, π⟩ is an approximation space and write \underline{app}_π X and \overline{app}_π X for \underline{app}_{R_π} X and \overline{app}_{R_π} X, respectively. More generally, we will use equivalence relations and partitions interchangeably.

If a universe U has more than one element, it is always possible to introduce at least two canonical partitions: one is the trivial partition, denoted by π̌, consisting of a unique block, and the other is the discrete partition, denoted by π̂, consisting of all singletons of U.
Formally, π̌ = {U} and π̂ = {{x} | x ∈ U}. We now define a partial order ⪯ on Π(U): for any π, σ ∈ Π(U), σ ⪯ π if and only if for every C ∈ σ there exists D ∈ π such that C ⊆ D. For instance, π̂ ⪯ π ⪯ π̌ for any π ∈ Π(U). We say that σ is finer than π, and that π is coarser than σ, if σ ⪯ π. When σ ≺ π, that is, σ ⪯ π and σ ≠ π, we say that σ is strictly finer than π and that π is strictly coarser than σ. Informally, this means that σ is a further fragmentation of π.

2.2. Information-theoretic measures

In this subsection, we review two information-theoretic measures associated with rough sets in the literature. These measures are concerned with the uncertainty or granularity of the knowledge provided by a partition. In [6, 13, 23, 26], Shannon entropy [22] has been used as a measure of information for rough set theory as follows. For subsequent need, we fix a notational convention: throughout the paper, all logarithms are to base 2 unless otherwise specified.

Definition 2.1 ([6, 13, 23, 26]). Let ⟨U, π⟩ be an approximation space, where the partition π consists of blocks U_i, 1 ≤ i ≤ k, each having cardinality n_i. The information entropy H(π) of the partition π is defined by

H(\pi) = -\sum_{i=1}^{k} \frac{n_i}{n} \log \frac{n_i}{n}, \quad \text{where } n = \sum_{i=1}^{k} n_i. \qquad (1)

When π = π̌, the entropy function H achieves its minimum value 0, and when π = π̂, it achieves its maximum value log n. Moreover, it has been shown in [23] that for any two partitions π and σ of U, if σ ≺ π, then H(σ) > H(π). Equation (1) can be rewritten as follows:

H(\pi) = \log n - \sum_{i=1}^{k} \frac{n_i}{n} \log n_i. \qquad (2)

Recall that the Hartley measure [7] of uncertainty for a finite set X is H(X) = log |X|, where |X| denotes the cardinality of the set X. It measures the amount of uncertainty associated with a finite set of possible alternatives, that is, the nonspecificity inherent in the set.
The first term log n (i.e., log |U|) in Eq. (2) is exactly the Hartley measure of U, which is a constant independent of any partition. The second term of the equation is essentially an expectation of granularity with respect to all blocks in a partition. This quantity has been used by Yao to measure the granularity of a partition in [26] and has been defined by Liang and Shi as the rough entropy of knowledge in an approximation space in [11]. It has also been referred to as co-entropy by some scholars (see, for example, [2, 3]).

Definition 2.2 ([2, 3, 11, 26]). Let ⟨U, π⟩ be an approximation space, where the partition π consists of blocks U_i, 1 ≤ i ≤ k, each having cardinality n_i. The co-entropy G(π) of the partition π is defined by

G(\pi) = \sum_{i=1}^{k} \frac{n_i}{n} \log n_i, \quad \text{where } n = \sum_{i=1}^{k} n_i. \qquad (3)

It follows immediately from the definitions that H(π) + G(π) = log n. Contrary to the uncertainty measure H, the co-entropy function G achieves its maximum value log n when π = π̌ and its minimum value 0 when π = π̂; moreover, it is known [11] that for any two partitions π and σ of U, if σ ≺ π, then G(σ) < G(π). As argued in [2, 3], the entropy H(π) can be interpreted as an uncertainty measure of the partition π, while the co-entropy G(π) can be regarded as a granularity measure of π.

In [21], Sen and Pal introduced two other entropy measures for crisp sets and fuzzy sets with (crisp or fuzzy) equivalence relations or (crisp or fuzzy) tolerance relations, which are based upon the roughness measures of X and of the complement of X in the universe and have been used to analyze the grayness and spatial ambiguities in images. Under the same name, there are several different concepts of entropy in the literature of rough set theory (see, for example, [9, 20]).
3. A novel pair of entropy and co-entropy

In this section, we first introduce a novel entropy and the corresponding co-entropy and then explore their properties. Let us begin with some notation. Throughout this section, we write ⟨U, π⟩ for an approximation space and assume that |U| = n. Given ⟨U, π⟩, we use A(U, π) to denote the set of rough set approximations of all subsets of U. More formally, we set

\mathcal{A}(U, \pi) = \left\{ \left( \underline{app}_\pi X, \overline{app}_\pi X \right) \mid X \subseteq U \right\}. \qquad (4)

It follows from Eq. (4) that A(U, π) has at least two elements: (∅, ∅) and (U, U). If n = 1, then A(U, π) consists of exactly these two elements; if n > 1 and π = π̌, then A(U, π) contains one more element, (∅, U); for any n ≥ 1, if π = π̂, then A(U, π) = { (X, X) | X ⊆ U }, which consists of 2^n elements. Note that the set A(U, π) is not a multiset; that is, the same element cannot appear more than once in A(U, π). In general, we have |A(U, π)| ≤ 2^n, since the subset X of U in Eq. (4) has only 2^n alternatives. For simplicity, we use m to stand for |A(U, π)|. For any (A_i, A'_i) ∈ A(U, π), 1 ≤ i ≤ m, we set

\mathcal{A}_i = \left\{ X \subseteq U \mid \left( \underline{app}_\pi X, \overline{app}_\pi X \right) = (A_i, A'_i) \right\} \quad \text{and} \quad |\mathcal{A}_i| = r_i. \qquad (5)

In other words, r_i is the number of subsets of U that have the rough set approximation (A_i, A'_i). It turns out that {A_1, A_2, ..., A_m} gives rise to a partition of P(U). Therefore, we get by Eq. (4) that

\sum_{i=1}^{m} r_i = 2^n.

To illustrate the above concepts, let us see an example.

Example 3.1. Consider U = {1, 2, 3, 4} and π = {{1, 2}, {3, 4}}. In this case, U has 16 subsets. For each subset X of U, we compute the rough set approximation of X; the results are listed in Table 1.
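The grouping of subsets by their rough set approximations, as in Eqs. (4) and (5), can be sketched in Python; this is a minimal illustration (the function and variable names are ours, not from the paper):

```python
from itertools import chain, combinations
from collections import defaultdict

def powerset(u):
    """All subsets of u, as frozensets."""
    elems = list(u)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(elems, k) for k in range(len(elems) + 1))]

def approximation(x, partition):
    """The rough set approximation (lower, upper) of x under the given partition."""
    lower = frozenset(e for block in partition if block <= x for e in block)
    upper = frozenset(e for block in partition if block & x for e in block)
    return lower, upper

U = frozenset({1, 2, 3, 4})
pi = [frozenset({1, 2}), frozenset({3, 4})]

# Group the 2^n subsets by their rough set approximation (Eq. (5)).
groups = defaultdict(list)
for x in powerset(U):
    groups[approximation(x, pi)].append(x)

print(len(groups))                              # m = 9 distinct approximations
print(sorted(len(g) for g in groups.values()))  # r-values: [1, 1, 1, 1, 2, 2, 2, 2, 4]
```

The nine groups are exactly the rows of Table 2 below, and the r-values sum to 2^4 = 16, in accordance with Eq. (4).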
Hence, we see that

A(U, π) = { (∅, ∅), (∅, {1, 2}), (∅, {3, 4}), ({1, 2}, {1, 2}), (∅, U), ({3, 4}, {3, 4}), ({1, 2}, U), ({3, 4}, U), (U, U) }.

As an example, let us calculate r_2. By definition,

r_2 = \left| \left\{ X \subseteq U \mid \left( \underline{app}_\pi X, \overline{app}_\pi X \right) = (\emptyset, \{1, 2\}) \right\} \right| = |\{\{1\}, \{2\}\}| = 2.

This is exactly the number of subsets of U that have the rough set approximation (∅, {1, 2}), which can be counted from the table. In light of this, we may obtain Table 2 by rearranging Table 1. It follows immediately from Table 2 that

r_1 = r_4 = r_6 = r_9 = 1, \quad r_2 = r_3 = r_7 = r_8 = 2, \quad \text{and} \quad r_5 = 4.

Table 1: The subsets and corresponding rough set approximations in Example 3.1.

subset      approximation         subset      approximation
∅           (∅, ∅)                {2,3}       (∅, U)
{1}         (∅, {1,2})            {2,4}       (∅, U)
{2}         (∅, {1,2})            {3,4}       ({3,4}, {3,4})
{3}         (∅, {3,4})            {1,2,3}     ({1,2}, U)
{4}         (∅, {3,4})            {1,2,4}     ({1,2}, U)
{1,2}       ({1,2}, {1,2})        {1,3,4}     ({3,4}, U)
{1,3}       (∅, U)                {2,3,4}     ({3,4}, U)
{1,4}       (∅, U)                U           (U, U)

Table 2: The rough set approximations and corresponding subsets in Example 3.1.

approximation         subsets
(∅, ∅)                ∅
(∅, {1,2})            {1}, {2}
(∅, {3,4})            {3}, {4}
({1,2}, {1,2})        {1,2}
(∅, U)                {1,3}, {1,4}, {2,3}, {2,4}
({3,4}, {3,4})        {3,4}
({1,2}, U)            {1,2,3}, {1,2,4}
({3,4}, U)            {1,3,4}, {2,3,4}
(U, U)                U

Because we are concerned with the partition granulation of ⟨U, π⟩ with respect to the approximation operators \underline{app} and \overline{app}, we may assume that every subset of U appears with the same probability 1/2^n.
As a result, the rough set approximation (A_i, A'_i) appears with the accumulated probability r_i/2^n, and we thus obtain a probability distribution

P(\pi) = \left( \frac{r_1}{2^n}, \frac{r_2}{2^n}, \ldots, \frac{r_m}{2^n} \right). \qquad (6)

According to Shannon's information theory [22], the Shannon entropy function of the probability distribution P(π) is defined as follows.

Definition 3.1. Keep the notation as above. The information entropy H(π) of ⟨U, π⟩ (with respect to the approximation operators \underline{app} and \overline{app}) is defined by

H(\pi) = H(P(\pi)) = -\sum_{i=1}^{m} \frac{r_i}{2^n} \log \frac{r_i}{2^n}. \qquad (7)

In the above definition, for simplicity we have used the notation H(π) instead of H(U, π). Following the explanation of Shannon entropy in information theory, the quantity H(π) measures the uncertainty associated with the partition π with respect to the approximation operators \underline{app} and \overline{app}. For instance, the probability distribution corresponding to the partition π = {{1, 2}, {3, 4}} in Example 3.1 is

P(\pi) = \left( \frac{1}{2^4}, \frac{2}{2^4}, \frac{2}{2^4}, \frac{1}{2^4}, \frac{4}{2^4}, \frac{1}{2^4}, \frac{2}{2^4}, \frac{2}{2^4}, \frac{1}{2^4} \right).

It follows from Definition 3.1 that

H(\pi) = -\sum_{i=1}^{9} \frac{r_i}{2^4} \log \frac{r_i}{2^4} = -\left( 4 \cdot \frac{1}{2^4} \log \frac{1}{2^4} + 4 \cdot \frac{2}{2^4} \log \frac{2}{2^4} + \frac{4}{2^4} \log \frac{4}{2^4} \right) = 3.

Similar to other entropy functions in rough set theory, the information entropy in Definition 3.1 has the following properties.

Theorem 3.1.
(1) For any π, σ ∈ Π(U), if σ ≺ π, then H(σ) > H(π).
(2) The entropy function H reaches its maximum value n for the finest partition π̂.
(3) The entropy function H reaches its minimum value n - \frac{2^n - 2}{2^n} \log(2^n - 2) for the coarsest partition π̌.

Proof. (1) Without loss of generality, we may assume that π = {U_1, U_2, ..., U_k} and σ = {U_a, U_b, U_2, ..., U_k}, where U_a ∪ U_b = U_1.
Suppose that |A(U, π)| = m and, for any (A_i, A'_i) ∈ A(U, π), 1 ≤ i ≤ m, write r_i for

\left| \left\{ X \subseteq U \mid \left( \underline{app}_\pi X, \overline{app}_\pi X \right) = (A_i, A'_i) \right\} \right|.

Based on the partition π, the power set P(U) is partitioned into m blocks, and the i-th block has cardinality r_i. Similarly, we denote by s_j the cardinality of the j-th block of P(U) associated with the partition σ. We now consider the elements of A(U, σ). For any (B_j, B'_j) ∈ A(U, σ), there are two possibilities. One is that (B_j, B'_j) ∈ A(U, π), say (B_j, B'_j) = (A_{i_j}, A'_{i_j}) for some i_j; in this case, it is clear that s_j = r_{i_j}. The other case is that (B_j, B'_j) ∈ A(U, σ) \ A(U, π), where A \ B denotes the set of all elements that are members of A but not members of B. In this case, for some i_j,

\left\{ X \subseteq U \mid \left( \underline{app}_\sigma X, \overline{app}_\sigma X \right) = (B_j, B'_j) \right\} \subsetneq \left\{ X \subseteq U \mid \left( \underline{app}_\pi X, \overline{app}_\pi X \right) = (A_{i_j}, A'_{i_j}) \right\},

because the partition σ is strictly finer than π; that is, the i_j-th block of P(U) provided by π is partitioned by σ into smaller blocks, and thus r_{i_j} = \sum_j s_{i_j} > s_{i_j}, where the sum ranges over these smaller blocks. In summary, for every i we have either r_i = s_{i_j} for a single index, or r_i = \sum_j s_{i_j} > s_{i_j}; moreover, the latter case must occur, since σ ≺ π. We thus assume that r_i = s_{i_j} for i ∈ I_1 and r_i = \sum_j s_{i_j} for i ∈ I_2, where I_2 ≠ ∅ and I_1 ∪ I_2 = {1, 2, ..., m}. Let us compare H(σ) with H(π).
H(\pi) = -\sum_{i=1}^{m} \frac{r_i}{2^n} \log \frac{r_i}{2^n}
       = -\sum_{i \in I_1} \frac{r_i}{2^n} \log \frac{r_i}{2^n} - \sum_{i \in I_2} \frac{r_i}{2^n} \log \frac{r_i}{2^n}
       = -\sum_{i \in I_1} \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n} - \sum_{i \in I_2} \frac{\sum_j s_{i_j}}{2^n} \log \frac{\sum_j s_{i_j}}{2^n}
       = -\sum_{i \in I_1} \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n} - \frac{1}{2^n} \sum_{i \in I_2} \Big( \sum_j s_{i_j} \Big) \Big( \log \Big( \sum_j s_{i_j} \Big) - n \Big)
       = -\sum_{i \in I_1} \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n} - \frac{1}{2^n} \sum_{i \in I_2} \Big( \log \Big( \sum_j s_{i_j} \Big)^{\sum_j s_{i_j}} - n \sum_j s_{i_j} \Big)
       < -\sum_{i \in I_1} \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n} - \frac{1}{2^n} \sum_{i \in I_2} \Big( \log \prod_j s_{i_j}^{s_{i_j}} - n \sum_j s_{i_j} \Big)
       = -\sum_{i \in I_1} \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n} - \frac{1}{2^n} \sum_{i \in I_2} \Big( \sum_j s_{i_j} \log s_{i_j} - n \sum_j s_{i_j} \Big)
       = -\sum_{i \in I_1} \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n} - \sum_{i \in I_2} \sum_j \frac{s_{i_j}}{2^n} \log \frac{s_{i_j}}{2^n}
       = H(\sigma),

namely, H(σ) > H(π). Therefore, clause (1) holds.

(2) It follows from (1) that H reaches its maximum value when π = π̂. In this case, we get by definition that

H(\hat{\pi}) = -\sum_{i=1}^{2^n} \frac{1}{2^n} \log \frac{1}{2^n} = n.

This proves (2).

(3) By (1), we see that H reaches its minimum value when π = π̌. In this case, the empty subset ∅ of U has the rough set approximation (∅, ∅), and U itself has the rough set approximation (U, U). Any proper nonempty subset of U, if there is one, has the rough set approximation (∅, U). Hence, r_1 = r_2 = 1 and r_3 = 2^n - 2. We thus obtain by definition that

H(\check{\pi}) = -\frac{1}{2^n} \log \frac{1}{2^n} - \frac{1}{2^n} \log \frac{1}{2^n} - \frac{2^n - 2}{2^n} \log \frac{2^n - 2}{2^n} = n - \frac{2^n - 2}{2^n} \log(2^n - 2).

Whence, (3) holds, finishing the proof of the theorem.

Note that in clause (3) of Theorem 3.1, if n = 1, the value of the corresponding summand 0 log 0 is taken to be 0, which is consistent with the limit lim_{x → 0^+} x log x = 0.
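The value H(π) = 3 from Example 3.1 and the extreme values of Theorem 3.1 can be checked numerically. Here is a small Python sketch (the helper names are ours):

```python
from itertools import chain, combinations
from collections import Counter
from math import log2

def powerset(u):
    """All subsets of u, as frozensets."""
    elems = list(u)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(elems, k) for k in range(len(elems) + 1))]

def approximation(x, partition):
    """The rough set approximation (lower, upper) of x under the given partition."""
    lower = frozenset(e for block in partition if block <= x for e in block)
    upper = frozenset(e for block in partition if block & x for e in block)
    return lower, upper

def entropy(u, partition):
    """H(pi) of Definition 3.1: -sum_i (r_i / 2^n) log(r_i / 2^n)."""
    n = len(u)
    counts = Counter(approximation(x, partition) for x in powerset(u))
    return -sum(r / 2**n * log2(r / 2**n) for r in counts.values())

U = {1, 2, 3, 4}
n = len(U)
print(entropy(U, [frozenset({1, 2}), frozenset({3, 4})]))  # 3.0, as in Example 3.1
print(entropy(U, [frozenset({x}) for x in U]))             # 4.0 = n for the finest partition
print(entropy(U, [frozenset(U)]),                          # minimum for the coarsest partition,
      n - (2**n - 2) / 2**n * log2(2**n - 2))              # both values are approx. 0.6688
```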
For later need, let us recall the following definition from [31].

Definition 3.2. Let ⟨U, π⟩ and ⟨V, σ⟩ be two approximation spaces, and suppose that f : U → V is a mapping.
(1) The mapping f is called a homomorphism from ⟨U, π⟩ to ⟨V, σ⟩ if for any C ∈ π, there exists D ∈ σ such that f(C) ⊆ D, where f(C) = { f(u) | u ∈ C }.
(2) A homomorphism f is called a monomorphism if f is an injective mapping.
(3) A monomorphism f is called strict (strictly monomorphic) if either there exist C ∈ π and D ∈ σ such that f(C) ⊊ D, namely f(C) ⊆ D and f(C) ≠ D, or |V| > |U|.
(4) The mapping f is called an isomorphism if f : U → V is bijective and, moreover, both f and its inverse mapping f^{-1} are homomorphisms.

We can now state the following facts.

Proposition 3.1. Let ⟨U, π⟩ and ⟨V, σ⟩ be two approximation spaces with |U| = |V|, and let f : U → V be a mapping.
(1) If f is a monomorphism from ⟨U, π⟩ to ⟨V, σ⟩ (in particular, if π ⪯ σ), then H(π) ≥ H(σ).
(2) If f is a strict monomorphism from ⟨U, π⟩ to ⟨V, σ⟩ (in particular, if π ≺ σ), then H(π) > H(σ).
(3) If f is an isomorphism from ⟨U, π⟩ to ⟨V, σ⟩, then H(π) = H(σ).

Proof. It follows immediately from Definition 3.1 and Theorem 3.1.

To measure the granularity carried by the partition π with respect to the approximation operators \underline{app} and \overline{app}, we introduce the concept of co-entropy, which corresponds to the information entropy in Definition 3.1.

Definition 3.3. Keep the notation as in Definition 3.1. The co-entropy G(π) of ⟨U, π⟩ (with respect to the approximation operators \underline{app} and \overline{app}) is defined by

G(\pi) = G(P(\pi)) = \sum_{i=1}^{m} \frac{r_i}{2^n} \log r_i. \qquad (8)

The quantity G(π) furnishes a measure of the average granularity carried by the partition π as a whole. It follows immediately from the definitions that

H(\pi) + G(\pi) = n. \qquad (9)
It means that the two measures complement each other with respect to the constant quantity n = |U|, which is invariant under the choice of the partition π of U. The co-entropy function G has the following properties.

Theorem 3.2.
(1) For any π, σ ∈ Π(U), if σ ≺ π, then G(σ) < G(π).
(2) The co-entropy function G reaches its minimum value 0 for the finest partition π̂.
(3) The co-entropy function G reaches its maximum value \frac{2^n - 2}{2^n} \log(2^n - 2) for the coarsest partition π̌.

Proof. All the clauses follow directly from Theorem 3.1 and Eq. (9).

Similar to Proposition 3.1, we have the following observation.

Proposition 3.2. Let ⟨U, π⟩ and ⟨V, σ⟩ be two approximation spaces with |U| = |V|, and let f : U → V be a mapping.
(1) If f is a monomorphism from ⟨U, π⟩ to ⟨V, σ⟩ (in particular, if π ⪯ σ), then G(π) ≤ G(σ).
(2) If f is a strict monomorphism from ⟨U, π⟩ to ⟨V, σ⟩ (in particular, if π ≺ σ), then G(π) < G(σ).
(3) If f is an isomorphism from ⟨U, π⟩ to ⟨V, σ⟩, then G(π) = G(σ).

Proof. It follows immediately from Proposition 3.1 and Eq. (9).

As a corollary of Theorem 3.2 and Proposition 3.2, we see that G is a partition measure on U in the sense of [31, Definition 3.4]; that is, G is nonnegative and satisfies the following two conditions: G(σ) < G(π) if σ ≺ π, and G(π) = G(σ) if there exists an isomorphism from ⟨U, π⟩ to ⟨V, σ⟩. Note that our information entropy and co-entropy are not directly based on the blocks of a partition; therefore, in general, they do not satisfy the definition of expected granularity proposed in [28].

4. Relationship of co-entropies between different universes

In the last section, we have seen that if f is a strict monomorphism from ⟨U, π⟩ to ⟨U, σ⟩ (in particular, if π ≺ σ), then H(π) > H(σ) and G(π) < G(σ).
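Before turning to different universes, the co-entropy of Definition 3.3 and the complementarity H(π) + G(π) = n of Eq. (9) can be verified numerically; a minimal Python sketch follows (helper names are ours):

```python
from itertools import chain, combinations
from collections import Counter
from math import log2

def powerset(u):
    """All subsets of u, as frozensets."""
    elems = list(u)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(elems, k) for k in range(len(elems) + 1))]

def approximation(x, partition):
    """The rough set approximation (lower, upper) of x under the given partition."""
    lower = frozenset(e for block in partition if block <= x for e in block)
    upper = frozenset(e for block in partition if block & x for e in block)
    return lower, upper

def counts(u, partition):
    """The multiplicities r_1, ..., r_m of the rough set approximations (Eq. (5))."""
    return Counter(approximation(x, partition) for x in powerset(u)).values()

def entropy(u, partition):       # Definition 3.1
    n = len(u)
    return -sum(r / 2**n * log2(r / 2**n) for r in counts(u, partition))

def coentropy(u, partition):     # Definition 3.3
    n = len(u)
    return sum(r / 2**n * log2(r) for r in counts(u, partition))

U = {1, 2, 3, 4}
pi = [frozenset({1, 2}), frozenset({3, 4})]
print(coentropy(U, pi))                           # 1.0 for the partition of Example 3.1
print(entropy(U, pi) + coentropy(U, pi))          # 4.0 = n, confirming Eq. (9)
print(coentropy(U, [frozenset({x}) for x in U]))  # 0.0, the minimum of Theorem 3.2(2)
```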
In this section, we consider the monotonicity of H and G across different universes. In other words, we compare H(π) with H(σ) and G(π) with G(σ) when there exists a strict monomorphism from ⟨U, π⟩ to ⟨V, σ⟩, where |V| > |U|. For convenience, we write ⟨U, π⟩ ↪ ⟨V, σ⟩ if |V| > |U| and there exists a strict monomorphism from ⟨U, π⟩ to ⟨V, σ⟩.

We start with the following observation on the entropy function H and the co-entropy function G reviewed in Section 2.2. Consider ⟨U_1, π_1⟩ = ⟨{1}, {{1}}⟩, ⟨U_2, π_2⟩ = ⟨{1, 2}, {{1}, {2}}⟩, and ⟨U_3, π_3⟩ = ⟨{1, 2, 3}, {{1, 3}, {2}}⟩. Clearly, ⟨U_1, π_1⟩ ↪ ⟨U_2, π_2⟩ ↪ ⟨U_3, π_3⟩. It is easy to check by Definition 2.1 that H(π_1) = 0, H(π_2) = 1, and H(π_3) = log 3 - 2/3 < 1. This means that the entropy function H of Definition 2.1 is not monotonic. By the way, a direct computation with our co-entropy (Definition 3.3) gives G(π_1) = 0, G(π_2) = 0, and G(π_3) = 1/2.

Let us continue with the monotonicity of the co-entropy function G of Definition 2.2. Consider ⟨U_1, π_1⟩ = ⟨{1}, {{1}}⟩, ⟨U_2, π_2⟩ = ⟨{1, 2}, {{1, 2}}⟩, and ⟨U_3, π_3⟩ = ⟨{1, 2, 3}, {{1, 2}, {3}}⟩. Again, we see that ⟨U_1, π_1⟩ ↪ ⟨U_2, π_2⟩ ↪ ⟨U_3, π_3⟩. It is easy to check by Definition 2.2 that G(π_1) = 0, G(π_2) = 1, and G(π_3) = 2/3. This shows that the co-entropy function G of Definition 2.2 is not monotonic either. In this case, a direct computation with Definition 3.3 gives G(π_1) = 0, G(π_2) = 1/2, and G(π_3) = 1/2.

Finally, we address the monotonicity of our entropy function H of Definition 3.1. Consider ⟨U_1, π_1⟩ = ⟨{1, 2}, {{1, 2}}⟩, ⟨U_2, π_2⟩ = ⟨{1, 2, 3}, {{1, 2}, {3}}⟩, and ⟨U_3, π_3⟩ = ⟨{1, 2, 3, 4}, {{1, 2, 4}, {3}}⟩. Obviously, we have ⟨U_1, π_1⟩ ↪ ⟨U_2, π_2⟩ ↪ ⟨U_3, π_3⟩.
By a routine computation, we get H(π_1) = 3/2, H(π_2) = 5/2, and H(π_3) = 13/4 - (3/4) log 3 < 5/2. Consequently, the entropy function H of Definition 3.1 is not monotonic either. On the other hand, it follows from Eq. (9) that G(π_1) = 1/2, G(π_2) = 1/2, and G(π_3) = 3/4 + (3/4) log 3.

As a result, in all three cases above we always have G(π_1) ≤ G(π_2) ≤ G(π_3). We thus conjecture that G(π) ≤ G(σ) whenever ⟨U, π⟩ ↪ ⟨V, σ⟩. Indeed, this holds true, as we will see later. To prove the conjecture, it is convenient to introduce the following notion and a key lemma.

Definition 4.1. Let ⟨U, π⟩ be an approximation space and a ∉ U. The approximation space ⟨U ∪ {a}, π ∪ {{a}}⟩ is called the one-point extension of ⟨U, π⟩ by a. We say that ⟨V, σ⟩ is a one-point extension of ⟨U, π⟩ if ⟨V, σ⟩ = ⟨U ∪ {a}, π ∪ {{a}}⟩ for some a.

For example, ⟨U_2, π_2⟩ = ⟨{1, 2, 3}, {{1, 2}, {3}}⟩ is the one-point extension of ⟨U_1, π_1⟩ = ⟨{1, 2}, {{1, 2}}⟩ by 3. The following lemma shows that a one-point extension does not change the co-entropy.

Lemma 4.1. Let ⟨V, σ⟩ be a one-point extension of ⟨U, π⟩. Then G(σ) = G(π).

Proof. Suppose that π = {U_1, U_2, ..., U_k} and σ = {U_1, U_2, ..., U_k, {a}}, where a ∉ U; assume that A(U, π) = { (A_i, A'_i) | 1 ≤ i ≤ m } and

\mathcal{A}_i = \left\{ X \subseteq U \mid \left( \underline{app}_\pi X, \overline{app}_\pi X \right) = (A_i, A'_i) \right\} \quad \text{with } |\mathcal{A}_i| = r_i.

It thus follows that

\mathcal{A}(V, \sigma) = \mathcal{A}(U, \pi) \cup \left\{ \left( A_i \cup \{a\}, A'_i \cup \{a\} \right) \mid 1 \leq i \leq m \right\}.

For any (B_i, B'_i) ∈ A(V, σ), we write B_i for { X ⊆ V | (\underline{app}_σ X, \overline{app}_σ X) = (B_i, B'_i) } and s_i for |B_i|. If (B_i, B'_i) = (A_i, A'_i) ∈ A(U, π), then we see that B_i = A_i, and thus s_i = r_i in this case.
If (B_i, B'_i) = (A_i ∪ {a}, A'_i ∪ {a}) ∈ { (A_i ∪ {a}, A'_i ∪ {a}) | 1 ≤ i ≤ m }, then we have B_i = { X ∪ {a} | X ∈ A_i }, and s_i = r_i still holds in this case. Therefore, we get by Definition 3.3 that

G(\sigma) = \sum_{i=1}^{m} \frac{r_i}{2^{n+1}} \log r_i + \sum_{i=1}^{m} \frac{r_i}{2^{n+1}} \log r_i = \sum_{i=1}^{m} \frac{r_i}{2^n} \log r_i = G(\pi),

finishing the proof of the lemma.

For subsequent need, we generalize Definition 4.1 as follows.

Definition 4.2. Let ⟨U, π⟩ and ⟨V, σ⟩ be two approximation spaces. We say that ⟨V, σ⟩ is a multi-one-point extension of ⟨U, π⟩ if there are approximation spaces ⟨U_i, π_i⟩, 0 ≤ i ≤ l, with ⟨U_0, π_0⟩ = ⟨U, π⟩ and ⟨U_l, π_l⟩ = ⟨V, σ⟩, such that each ⟨U_i, π_i⟩, 1 ≤ i ≤ l, is a one-point extension of ⟨U_{i-1}, π_{i-1}⟩.

For example, ⟨V, σ⟩ = ⟨{1, 2, 3, 4}, {{1, 2}, {3}, {4}}⟩ is a multi-one-point extension of ⟨U, π⟩ = ⟨{1, 2}, {{1, 2}}⟩. In fact, we may take ⟨U_0, π_0⟩ = ⟨U, π⟩, ⟨U_1, π_1⟩ = ⟨{1, 2, 3}, {{1, 2}, {3}}⟩, and ⟨U_2, π_2⟩ = ⟨V, σ⟩.

The following fact follows immediately from Lemma 4.1.

Corollary 4.1. If ⟨V, σ⟩ is a multi-one-point extension of ⟨U, π⟩, then G(σ) = G(π).

In light of the above corollary, let us refer to multi-one-point extensions simply as one-point extensions. Further, we have the following observation.

Theorem 4.1. Suppose that there is a monomorphism from ⟨U, π⟩ to ⟨V, σ⟩. If there exists ⟨U', π'⟩ that satisfies the following two conditions:
(a) either ⟨U', π'⟩ = ⟨U, π⟩ or ⟨U', π'⟩ is a one-point extension of ⟨U, π⟩, and
(b) ⟨U', π'⟩ is isomorphic to ⟨V, σ⟩,
then G(π) = G(σ); otherwise, G(π) < G(σ).

Proof. We first consider the case that there exists ⟨U', π'⟩ satisfying conditions (a) and (b).
In this case, if h U ′ , π ′ i = h U , π i and h U ′ , π ′ i is isomo rphic to h V , σ i , then | V | = | U | and we see by Pro position 3.2 that G ( π ) = G ( σ ). If h U ′ , π ′ i is a one- point e x tension of h U , π i and h U ′ , π ′ i is isomo rphic to h V , σ i , then we get tha t G ( π ) = G ( π ′ ) by Corollary 4.1 and G ( π ′ ) = G ( σ ) b y Proposition 3.2. Consequently , G ( π ) = G ( σ ). W e n ow co nsider the case that there does no t exist h U ′ , π ′ i such that the co nditions are satisfied. I t forces that the mono morph ism, say f , fro m h U , π i to h V , σ i is strict. T wo cases n eed to consider . On e is th at | V | = | U | . In this case, it f ollows f rom Prop osition 3.2 that G ( π ) < G ( σ ). The o ther case is that | V | > | U | . I n this case, let us set h V 1 , σ 1 i = h f ( U ) , f ( π ) i , where f ( U ) is the im age of U under f and f ( π ) = { f ( U ′ ) | U ′ ∈ π } . In fact, f gives rise to an isomorph ism b etween h U , π i an d h V 1 , σ 1 i . Th erefor e, G ( π ) = G ( σ 1 ). Note that V 1 = f ( U ) ⊆ V . W e now take h V 2 , σ 2 i as follows: V 2 = V , σ 2 = σ 1 ∪ {{ a } | a ∈ V \ V 1 } . It fo llows that h V 2 , σ 2 i is a one- point extension of h V 1 , σ 1 i . Hence, G ( σ 1 ) = G ( σ 2 ). Because f is a strict mono mor- phism, we see that h U , π i ֒ → h V 2 , σ 2 i and σ 2 ≺ σ . Whence, we get G ( σ 2 ) < G ( σ ) by Theor em 3.2. As a result, G ( π ) < G ( σ ). This comp letes the proof of the theorem. Let us provide an informal explanation of Theor em 4 .1. The hypothesis that there is a monomo rphism fro m h U , π i to h V , σ i mean s th at h U , π i is fin er than h V , σ i . I n the special case tha t the mon omorp hism is n ot strict, we h av e that h U , π i an d h V , σ i are isomo rphic, and thus, th ey have the same co -entropy . If the mo nomo rphism is strict, th en after renaming the elements of U , we can get a finer partition than h V , σ i by using one- point e x tensions. 
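Since all the universes involved here are small, the co-entropy values quoted in this section can be checked by brute force. The following sketch is ours, not part of the paper: it enumerates every subset of the universe, buckets subsets by their pair of lower and upper approximations, and evaluates $G(\pi) = \sum_i \frac{r_i}{2^n}\log_2 r_i$, a formula read off from the displayed equation in the proof of Lemma 4.1 (the function names are our own).

```python
from math import log2

def approximation_pairs(universe, partition):
    """Bucket every subset X of the universe by its pair
    (lower approximation, upper approximation) and return
    a dict mapping each pair to its multiplicity r_i."""
    elems = sorted(universe)
    n = len(elems)
    counts = {}
    for mask in range(2 ** n):
        x = {e for i, e in enumerate(elems) if (mask >> i) & 1}
        # Lower approximation: union of the blocks contained in X.
        lower = frozenset(e for b in partition if set(b) <= x for e in b)
        # Upper approximation: union of the blocks meeting X.
        upper = frozenset(e for b in partition if set(b) & x for e in b)
        counts[(lower, upper)] = counts.get((lower, upper), 0) + 1
    return counts

def co_entropy(universe, partition):
    """G(pi) = sum_i (r_i / 2^n) * log2(r_i), with n = |U|."""
    n = len(universe)
    return sum(r / 2 ** n * log2(r)
               for r in approximation_pairs(universe, partition).values())
```

For instance, `co_entropy({1, 2}, [{1, 2}])` and `co_entropy({1, 2, 3}, [{1, 2}, {3}])` both evaluate to 0.5, illustrating Lemma 4.1, while $\langle \{1,2,3,4\}, \{\{1,2,3\},\{4\}\}\rangle$ yields $\frac{3}{4} + \frac{3}{4}\log_2 3$, matching the value quoted above for $G(\pi_3)$. (The spaces $\pi_1$–$\pi_3$ are defined earlier in the paper; these inputs are our reconstructions, chosen to be consistent with the quoted values.)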
In short, Theorem 4.1 says that $G(\pi) < G(\sigma)$ whenever $\pi$ is finer than $\sigma$ and neither $\langle U, \pi\rangle$ nor a one-point extension of it is isomorphic to $\langle V, \sigma\rangle$. We end this section with several examples.

Example 4.1. A trivial example is given by $\langle U, \pi\rangle = \langle \{1,2,3\}, \{\{1,2\},\{3\}\}\rangle$ and $\langle V, \sigma\rangle = \langle \{a,b,c\}, \{\{a,b\},\{c\}\}\rangle$. The mapping $f$ that maps $1$, $2$, and $3$ to $a$, $b$, and $c$, respectively, is a monomorphism; in fact, $f$ is an isomorphism. Hence, $G(\pi) = G(\sigma)$. A direct computation shows that $G(\pi) = 0.5 = G(\sigma)$.

Consider $\langle U, \pi\rangle = \langle \{1,2\}, \{\{1,2\}\}\rangle$ and $\langle V, \sigma\rangle = \langle \{a,b,c,d\}, \{\{a,b\},\{c\},\{d\}\}\rangle$. The mapping $f$ that maps $1$ and $2$ to $a$ and $b$, respectively, is a monomorphism, which yields that $\langle U, \pi\rangle$ is isomorphic to $\langle V_1, \sigma_1\rangle = \langle \{a,b\}, \{\{a,b\}\}\rangle$. Clearly, we can obtain $\langle V, \sigma\rangle$ by one-point extensions of $\langle V_1, \sigma_1\rangle$. Therefore, $G(\pi) = G(\sigma)$. On the other hand, a computation gives $G(\pi) = 0.5 = G(\sigma)$.

Finally, consider $\langle U, \pi\rangle = \langle \{1,2\}, \{\{1,2\}\}\rangle$ and $\langle V, \sigma\rangle = \langle \{a,b,c,d\}, \{\{a,b\},\{c,d\}\}\rangle$. As mentioned earlier, the mapping $f$ that maps $1$ and $2$ to $a$ and $b$, respectively, is a monomorphism, which gives an isomorphism between $\langle U, \pi\rangle$ and $\langle V_1, \sigma_1\rangle = \langle \{a,b\}, \{\{a,b\}\}\rangle$. We can obtain $\langle V_2, \sigma_2\rangle = \langle \{a,b,c,d\}, \{\{a,b\},\{c\},\{d\}\}\rangle$ by one-point extensions of $\langle V_1, \sigma_1\rangle$. Clearly, $\sigma_2 \prec \sigma$. As a result, $G(\pi) < G(\sigma)$. On the other hand, a direct computation gives $G(\pi) = 0.5$ and $G(\sigma) = 1$.

5. Conclusion

In this paper, we have proposed novel notions of entropy and co-entropy that take both partitions and the lower and upper approximations into account. Some desirable properties of the entropy and co-entropy have been presented. Furthermore, we have investigated the relationship of co-entropies between different universes. There are several problems worth further study.
Firstly, the present work focuses on classical rough sets based on partitions. It would be interesting to generalize the notions of entropy and co-entropy introduced here to the framework of covering rough sets [4, 19, 29] or fuzzy rough sets [5]. It is also interesting to compare the entropies (co-entropies) under some special homomorphisms, such as the neighborhood-consistent functions introduced in [33]. Secondly, it remains to develop a corresponding roughness measure, based on the entropy or co-entropy, for numerically measuring the roughness of an approximation. Finally, the conditional entropy and conditional co-entropy [2] are yet to be addressed in our framework.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 60821001, 60873191, 60903152, and 61070251.

References

[1] Beaubouef, T., Petry, F. E., Arora, G., 1998. Information-theoretic measures of uncertainty for rough sets and rough relational databases. Inform. Sci. 109, 185–195.
[2] Bianucci, D., Cattaneo, G., 2009. Information entropy and granulation co-entropy of partitions and coverings: A summary. In: Peters, J., Skowron, A., Wolski, M., Chakraborty, M., Wu, W.-Z. (Eds.), Transactions on Rough Sets X. Vol. 5656 of Lect. Notes Comput. Sci. Springer, Berlin/Heidelberg, pp. 15–66.
[3] Bianucci, D., Cattaneo, G., Ciucci, D., 2007. Entropies and co-entropies of coverings with application to incomplete information systems. Fund. Inform. 75, 77–105.
[4] Bryniarski, E., 1989. A calculus of rough sets of the first order. Bull. Pol. Acad. Sci. 36 (16), 71–77.
[5] Dubois, D., Prade, H., 1990. Rough fuzzy sets and fuzzy rough sets. Int. J. Gen. Syst. 17, 191–208.
[6] Düntsch, I., Gediga, G., 1998. Uncertainty measures of rough set prediction. Artif. Intell. 106 (1), 109–137.
[7] Hartley, R. V. L., 1928. Transmission of information. Bell Syst. Tech. J. 7, 535–564.
[8] Liang, J., Shi, Z., Li, D., Wierman, M. J., 2006. Information entropy, rough entropy and knowledge granulation in incomplete information systems. Int. J. Gen. Syst. 35 (6), 641–654.
[9] Liang, J. Y., Chin, K. S., Dang, C. Y., Yam, R. C. M., 2002. A new method for measuring uncertainty and fuzziness in rough set theory. Int. J. Gen. Syst. 31, 331–342.
[10] Liang, J. Y., Qian, Y. H., 2008. Information granules and entropy theory in information systems. Sci. China Ser. F: Inform. Sci. 51 (10), 1427–1444.
[11] Liang, J. Y., Shi, Z. Z., 2004. The information entropy, rough entropy and knowledge granulation in rough set theory. Int. J. Uncert. Fuzz. Knowl. Syst. 12 (1), 37–46.
[12] Lin, T. Y., Cercone, N. (Eds.), 1997. Rough Sets and Data Mining. Kluwer Academic Publishers, Boston.
[13] Miao, D. Q., Wang, J., 1998. On the relationships between information entropy and roughness of knowledge in rough set theory (in Chinese). Pattern Recognit. Artif. Intell. 11 (1), 34–40.
[14] Pawlak, Z., 1982. Rough sets. Int. J. Comput. Inform. Sci. 11 (5), 341–356.
[15] Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Boston.
[16] Pawlak, Z., 1992. Rough sets: a new approach to vagueness. In: Zadeh, L. A., Kacprzyk, J. (Eds.), Fuzzy Logic for the Management of Uncertainty. John Wiley & Sons, Inc., New York, pp. 105–118.
[17] Polkowski, L., Skowron, A. (Eds.), 1998. Rough Sets and Current Trends in Computing. Vol. 1424. Springer, Berlin.
[18] Polkowski, L., Skowron, A. (Eds.), 1998. Rough Sets in Knowledge Discovery. Vol. 1 and 2. Physica-Verlag, Heidelberg.
[19] Pomykala, J. A., 1987. Approximation operations in approximation space. Bull. Pol. Acad. Sci. 35 (9-10), 653–662.
[20] Qian, Y. H., Liang, J. Y., 2008. Combination entropy and combination granulation in rough set theory. Int. J. Uncert. Fuzz. Knowl. Syst. 16 (2), 179–193.
[21] Sen, D., Pal, S. K., 2009. Generalized rough sets, entropy, and image ambiguity measures. IEEE Trans. Syst., Man, Cybern. B, Cybern. 39 (1), 117–128.
[22] Shannon, C. E., 1948. A mathematical theory of communication, I, II. Bell Syst. Tech. J. 27, 379–423, 623–656.
[23] Wierman, M., 1999. Measuring uncertainty in rough set theory. Int. J. Gen. Syst. 28, 283–297.
[24] Xu, W. H., Zhang, X. Y., Zhang, W. X., 2009. Knowledge granulation, knowledge entropy and knowledge uncertainty measure in ordered information systems. Appl. Soft Comput. 9 (4), 1244–1251.
[25] Yao, Y. Y., 2003. Information-theoretic measures for knowledge discovery and data mining. In: Karmeshu (Ed.), Entropy Measures, Maximum Entropy and Emerging Applications. Springer, Berlin, pp. 115–136.
[26] Yao, Y. Y., 2003. Probabilistic approaches to rough sets. Expert Syst. 20 (5), 287–297.
[27] Yao, Y. Y., 2010. Notes on rough set approximations and associated measures. J. Zhejiang Ocean Univ. (Natur. Sci.) 29, 399–410.
[28] Yao, Y. Y., Zhao, L. Q., 2009. Granularity of partitions (private communication).
[29] Zakowski, W., 1983. Approximations in the space (u, π). Demonstr. Math. 16, 761–769.
[30] Zhong, N., Yao, Y., Ohshima, M., 2003. Peculiarity oriented multidatabase mining. IEEE Trans. Knowl. Data Eng. 15 (4), 952–960.
[31] Zhu, P., 2009. An axiomatic approach to the roughness measure of rough sets. Fund. Inform. To appear; available as arXiv preprint arXiv:0911.5395.
[32] Zhu, P., 2009. An improved axiomatic definition of information granulation. arXiv preprint arXiv:0908.3999.
[33] Zhu, P., Wen, Q., 2010. Some improved results on communication between information systems. Inform. Sci. 180 (18), 3521–3531.
[34] Zhu, W., Wang, F. Y., 2006. Covering based granular computing for conflict analysis. In: IEEE Int. Conf. Intell. Secur. Inform. (ISI 2006), San Diego, CA, May 23-24, 2006. Vol. 3975 of Lect. Notes Comput. Sci., pp. 566–571.
[35] Ziarko, W. (Ed.), 1994. Rough Sets, Fuzzy Sets, and Knowledge Discovery. Springer-Verlag, Berlin.
