Data Accuracy Model for Distributed Clustering Algorithm based on Spatial Data Correlation in Wireless Sensor Networks

1 Jyotirmoy Karjee, 2 H.S Jamadagni

1 Centre for Electronics Design and Technology, Indian Institute of Science, Bangalore, India
kjyotirmoy@cedt.iisc.ernet.in
2 Centre for Electronics Design and Technology, Indian Institute of Science, Bangalore, India
hsjam@cedt.iisc.ernet.in

Abstract

Objective: The main objective of this paper is to construct a distributed clustering algorithm based on spatial data correlation among sensor nodes and to evaluate the data accuracy of each distributed cluster at its cluster head node.

Design Procedure/Approach: Because sensor nodes are deployed at high density in the sensor field, the data they observe are highly correlated in the spatial domain. Based on this high data correlation among sensor nodes, we propose a distributed clustering algorithm that forms non-overlapping, irregular clusters of different sizes and collects the most accurate data at the cluster head node of each distributed cluster. To collect the most accurate data at the cluster head node of each distributed cluster, we propose a Data accuracy model and compare its results with the Information accuracy model.

Findings: Simulation results show that, within our proposed distributed clustering algorithm, the proposed Data accuracy model collects more accurate data and performs better than the Information accuracy model at the cluster head node of each distributed cluster. Moreover, there exists an optimal cluster of sensor nodes that is adequate to achieve approximately the same data accuracy as the full cluster.

Practical Implementation: Measuring humidity and moisture content in an agricultural field; measuring temperature in a physical environment.
Inventive/Novel Idea: A distributed clustering algorithm based on spatial data correlation among sensor nodes is proposed together with a Data accuracy model.

Keywords: Spatial correlation, distributed clusters, data accuracy, wireless sensor networks.

1. Introduction

Recent developments in wireless technology and embedded systems have brought drastic improvements to wireless sensor networks. Owing to their ease of deployment and reasonable cost, sensor networks are used in many applications to sense and collect raw data on physical phenomena such as temperature, humidity, seismic events and fire from the physical environment [1]. A small processing device called a node captures raw data on the physical phenomenon from the environment. These nodes can process the raw data, communicate wirelessly with other nodes, and finally transmit the collected data to the base station or sink node. In general, the physically sensed data collected by the sensor nodes are spatially correlated [2] in the sensor field. As the deployed density of sensor nodes increases, spatially proximal sensor observations become highly correlated [3]. Since the observations are highly correlated among sensor nodes, the nodes form distributed clusters [4] in the sensor field to minimize the data collection cost [5]. In the literature, LEACH [6] introduces dynamic distributed cluster formation based on an a priori probability. Each distributed cluster has its own Cluster Head (CH) [7] node, which aggregates the data collected from all the sensor nodes in the cluster and finally transmits the processed data to the sink node. Moreover, SEP [8] addresses cluster formation in heterogeneous sensor networks. The works in [3, 9] use the spatial correlation of observed data among sensor nodes to form distributed clusters.
A grid-based clustering method proposed in [10] presents a spatial correlation model for cluster formation. However, this type of theoretical clustering model rarely occurs in practical sensor-field scenarios. A disk-shaped circular cluster proposed in [11] groups nodes into disjoint sets, each managed by a designated CH node. However, disk-shaped clusters do not really appear in real scenarios either. In most cases, clusters are irregular in shape and size in the spatial domain. In [4], the authors proposed a distributed clustering algorithm with different shapes and sizes based on the shortest distance between sensor nodes and CH nodes in the spatial domain. In this paper, we propose a distributed clustering algorithm based on spatially correlated data among sensor nodes. Our proposed model, which forms clusters of irregular shape and size, is more practical than the previously proposed clustering algorithms in the spatial domain. As the number of sensor nodes in the sensor field grows, the data correlation among the sensor nodes increases [3], and our clustering algorithm forms distributed clusters for a high density of sensor nodes. We thus obtain spatially correlated, irregular, non-overlapping distributed clusters of different sizes with a high density of sensor nodes in the spatial domain. Moreover, the size of each distributed cluster in our algorithm depends on a threshold value given in the data correlation model [4] in the spatial domain. In [12, 13, 14], the authors proposed an Information accuracy (distortion function) model in which the base station or sink node can estimate the information accuracy of the observed data sensed by all the sensor nodes. These models are based on one-hop communication, in which the observed data are sensed by all the sensor nodes and transmitted directly to the sink node.
In [4, 15], by contrast, the authors proposed two-hop communication in which the observed data are transmitted to the sink node via an intermediate node (the CH node) when the sensor field is large. In this paper, we likewise consider two-hop communication for our distributed clustering algorithm based on spatial data correlation among sensor nodes, in which the observed data are transmitted to the sink node via the CH node. The literature survey shows that the estimated data collected from all the sensor nodes in a cluster are sent directly to the CH node for aggregation [24, 25] without verifying their accuracy. Hence it is important to verify the estimated data at the CH node before data aggregation and only then send them to the sink node. For each distributed cluster, the data accuracy is verified using an MMSE estimator [23] before data aggregation, and only the most accurate data are transmitted to the sink node. Verifying data accuracy at the CH node before data aggregation for each distributed cluster may therefore reduce communication overhead. It is possible that some of the sensor nodes in a distributed cluster become malicious [16] because of the external physical environment. In such a situation, sensor nodes can sense and read inaccurate data. The inaccurate data transmitted by malicious nodes may cause incorrect data aggregation at the CH node of the respective cluster. Hence it is necessary to estimate and verify the data accuracy before data aggregation at the CH node of each distributed cluster in order to reduce data redundancy and power consumption. In this paper, we propose a Data accuracy model that uses Minimum Mean Square Error (MMSE) estimation to evaluate the data accuracy at the CH node before data aggregation [17] for each distributed cluster. Most of the existing work [12, 18] performs MMSE estimation at each individual sensor node on the observed data before transmitting the estimated data to the CH node¹ of the cluster.
According to [18], once the CH node has received the estimated data transmitted by all the sensor nodes in a cluster, it averages the estimated data and finally transmits the most accurate data to the sink node. To the best of the authors' knowledge, however, this is the first work to perform MMSE estimation only at the CH node on all the observed data sensed by all the sensor nodes in a cluster. In our Data accuracy model, computing the MMSE estimate only at the CH node for the observed data sensed by all the sensor nodes in a cluster can increase the data accuracy and reduce the communication overhead before data aggregation.

The rest of the paper is organized as follows. In Section 2, we construct a data correlation model [4] among sensor nodes in the spatial domain. The data correlation model gives the correlation coefficient of the observed data among sensor nodes, which is compared against an assumed threshold value. If the correlation coefficient of the observed data among sensor nodes is greater than the threshold value, the observed data are spatially correlated among those nodes; otherwise they are not. From this threshold value we obtain an approximately circular data correlation range around each sensor node, whose size depends on the threshold value. For the sensor nodes that fall within this circular data correlation range, the spatial data are highly correlated, so the correlation coefficients of the observed data among these sensor nodes are greater than the threshold value. In Section 3, we propose a distributed clustering algorithm based on the spatial correlation of observed data among sensor nodes in the sensor field. It forms non-overlapping distributed clusters of irregular shape and different sizes in the spatial domain.
Once the distributed clusters are formed in the sensor field, each cluster can evaluate the data accuracy at its CH node and transmit the most accurate data to the sink node, as discussed in Section 4. We also construct a Data accuracy model and compare it with the Information accuracy model with respect to data accuracy. In Section 5, we present the simulation and validation of our proposed distributed clustering algorithm and Data accuracy model. Finally, we conclude our work in Section 6.

¹ According to [18], the CH node is only a logical entity and can also be called the sink node, depending on the application.

2. Data Correlation Model in Spatial Domain

In this section, we illustrate the spatial data correlation among sensor nodes i and j that sense or measure a tracing point [4] $S_i$ for $i \ge 1$ in a spatial domain. A tracing point is a reference value that we are interested in sensing and measuring in the spatial domain. For example, a tracing point may have a higher concentration of moisture content in an agricultural field. It is a location where the data have a higher concentration and a higher variation than elsewhere in the spatial domain. As the sensor node density increases, the spatial correlation of the observed data $(S_i, S_j)$ among the sensor nodes i and j also increases in the spatial domain. The sensor nodes sense and measure the tracing point over a window frame of time interval T to capture the continuous data samples $S_i = \{s_{i1}, s_{i2}, s_{i3}, \ldots, s_{in}\}$ and $S_j = \{s_{j1}, s_{j2}, s_{j3}, \ldots, s_{jn}\}$ respectively. If the sensor nodes i and j that sense and measure the tracing point are located near each other, the data correlation is strong. The data correlation decreases as the sensor nodes i and j move farther from the tracing point. The sensor nodes i and j can compute the mean of the continuous data samples over a window frame of time interval T.
Thus the means of the continuous data samples sensed and measured by sensor nodes i and j are given as follows.

$$\bar{S}_i = \frac{1}{n}\sum_{k=1}^{n} S_{ik} \quad \text{and} \quad \bar{S}_j = \frac{1}{n}\sum_{k=1}^{n} S_{jk} \tag{1}$$

We compute the variance of the continuous sample data captured by the sensor nodes i and j in the spatial domain. The variance measures how far a set of continuous data samples of a sensor node is spread out. The sensor nodes compute the variance of the sample data as follows.

$$Var(S_i) = \frac{1}{n-1}\sum_{k=1}^{n}(S_{ik}-\bar{S}_i)^2 \quad \text{and} \quad Var(S_j) = \frac{1}{n-1}\sum_{k=1}^{n}(S_{jk}-\bar{S}_j)^2 \tag{2}$$

We compute the covariance of the sample data for nodes i and j, which is given as

$$Cov(S_i, S_j) = \frac{1}{n-1}\sum_{k=1}^{n}(S_{ik}-\bar{S}_i)(S_{jk}-\bar{S}_j) \tag{3}$$

Covariance is a measure of how much the two variables of continuous sample data change together in the spatial domain for sensor nodes i and j. We find the correlation coefficient $\rho(S_i, S_j)$ for the spatial correlation between the sample data $(S_i, S_j)$ sensed by the sensor nodes i and j, which is given as

$$\rho(S_i, S_j) = \frac{Cov(S_i, S_j)}{\sqrt{Var(S_i)\,Var(S_j)}} = \frac{\frac{1}{n-1}\sum_{k=1}^{n}(S_{ik}-\bar{S}_i)(S_{jk}-\bar{S}_j)}{\sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(S_{ik}-\bar{S}_i)^2}\;\sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(S_{jk}-\bar{S}_j)^2}} \tag{4}$$

Thus equation (4) gives the data correlation between the sample data of sensor nodes i and j in the spatial domain. These spatially correlated data among sensor nodes i and j can be modeled as Joint Gaussian Random Variables (JGRV) [12, 14] as follows:

$$E[S_i] = 0, \quad E[S_j] = 0 \quad \text{for } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n$$

$$Var[S_i] = \sigma_S^2, \quad Var[S_j] = \sigma_S^2 \quad \text{for } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n$$

$$Cov[S_i, S_j] = \sigma_S^2\, Corr[S_i, S_j]$$

$$Corr[S_i, S_j] = \rho[S_i, S_j] = \frac{E[S_i S_j]}{\sigma_S^2} = \frac{Cov[S_i, S_j]}{\sigma_S^2}$$

$$K_V(d_{i,j}) = Corr[S_i, S_j] = \rho[S_i, S_j] = \frac{E[S_i S_j]}{\sigma_S^2} = \frac{Cov[S_i, S_j]}{\sigma_S^2} \tag{5}$$

$K_V(\cdot)$ is a correlation model [14], and the Euclidean distance between the sensor nodes i and j is represented as $d_{i,j} = \|S_i - S_j\|$ for the sensed data.
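As a minimal sketch of equations (1) to (4), the following Python function computes the sample correlation coefficient for two windows of readings. The function name `sample_correlation` and the example readings are hypothetical, and the `n - 1` denominators assume the unbiased sample estimators reconstructed in equations (2) and (3).

```python
import math

def sample_correlation(s_i, s_j):
    """Correlation coefficient of equation (4) for two equal-length sample windows."""
    n = len(s_i)
    if n != len(s_j) or n < 2:
        raise ValueError("need two windows of equal length n >= 2")
    # Equation (1): sample means over the window.
    mean_i = sum(s_i) / n
    mean_j = sum(s_j) / n
    # Equation (2): sample variances (assumed 1/(n-1) normalisation).
    var_i = sum((x - mean_i) ** 2 for x in s_i) / (n - 1)
    var_j = sum((y - mean_j) ** 2 for y in s_j) / (n - 1)
    if var_i == 0.0 or var_j == 0.0:
        raise ValueError("correlation is undefined for a constant window")
    # Equation (3): sample covariance.
    cov_ij = sum((x - mean_i) * (y - mean_j) for x, y in zip(s_i, s_j)) / (n - 1)
    # Equation (4): normalise the covariance by both standard deviations.
    return cov_ij / math.sqrt(var_i * var_j)

# Hypothetical readings from two nearby nodes over one time window T.
readings_i = [20.1, 20.4, 20.9, 21.3]
readings_j = [19.8, 20.2, 20.6, 21.1]
rho = sample_correlation(readings_i, readings_j)
```

The resulting `rho` can then be compared against a chosen threshold to decide whether the two nodes count as strongly correlated.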
We assume the covariance function to be non-negative and to decrease monotonically with the distance $d_{i,j} = \|S_i - S_j\|$, with a limiting value of 1 at $d = 0$ and of 0 as $d \to \infty$. We adopt the power exponential model [19, 20], which is given as

$$K_V^{PE}(d_{i,j}) = e^{-(d_{i,j}/\theta_1)^{\theta_2}} \quad \text{for } \theta_1 > 0,\ \theta_2 \in (0, 2] \tag{6}$$

where $\theta_1$ is called the range parameter. It controls the relation between the distance among sensor nodes and the correlation coefficient, that is, how fast the correlation decays with distance. $\theta_2$ is called the smoothness (or roughness) parameter and controls the geometrical properties of the random field. The model reduces to the exponential model for $\theta_2 = 1$ and to the squared exponential model for $\theta_2 = 2$. From equations (5) and (6), we find the correlation coefficient of the observed data $S_i(x_i, y_i)$ and $S_j(x_j, y_j)$ among the sensor nodes i and j using the power exponential model as follows:

$$\rho[S_i, S_j] = e^{-(d_{i,j}/\theta_1)^{\theta_2}} \tag{7}$$

We define a threshold $\tau$ with $0 < \tau \le 1$, which determines whether the spatial data are correlated among the sensor nodes in the sensor field. Using the threshold value $\tau$, we state two properties of spatially correlated data among sensor nodes:

- If $\rho[S_i, S_j] \ge \tau$, the spatial data are strongly correlated among sensor nodes i and j in the spatial domain.
- If $\rho[S_i, S_j] < \tau$, the spatial data are weakly correlated among sensor nodes i and j in the spatial domain.

From equations (4), (6) and (7), we define the correlation coefficient $\rho[S_i, S_j]$ of the observed data under the power exponential model for sensor nodes i and j whose data are strongly correlated in the spatial domain as follows:

$$\rho[S_i, S_j] = \frac{Cov[S_i, S_j]}{\sigma_S^2} = e^{-(d_{i,j}/\theta_1)^{\theta_2}} \ge \tau \tag{8}$$

From equation (8), we find the relation between the threshold value $\tau$ and the power exponential model:

$$e^{-(d_{i,j}/\theta_1)^{\theta_2}} \ge \tau \;\Longrightarrow\; d_{i,j}^2 \le \theta_1^2 \left[\log\frac{1}{\tau}\right]^{2/\theta_2} \tag{9}$$

We compare equation (9) with the Euclidean distance between the coordinates of sensor nodes i and j:

$$d_{i,j}^2 = (x_i - x_j)^2 + (y_i - y_j)^2 \tag{10}$$

From equations (9) and (10), we get

$$(x_i - x_j)^2 + (y_i - y_j)^2 \le \theta_1^2 \left[\log\frac{1}{\tau}\right]^{2/\theta_2} \tag{11}$$

Comparing equation (11) with the equation of a circle, we get

$$(x_i - x_j)^2 + (y_i - y_j)^2 = r^2 \tag{12}$$

From equations (11) and (12), we find the radius r of the circular data correlation area, denoted $cir(i)$, around a sensor node i as centre coordinate:

$$r^2 = \theta_1^2 \left[\log\frac{1}{\tau}\right]^{2/\theta_2} \tag{13}$$

For every sensor node j that falls within $cir(i)$, the observed data of sensor nodes i and j are highly correlated in the spatial domain. The spatial data correlation $\rho[S_i, S_j]$ among sensor nodes i and j within $cir(i)$ is greater than the threshold value $\tau$. Equation (13) shows that the radius r of the circular data correlation area $cir(i)$ depends on the threshold value $\tau$ and on $\theta_1$ and $\theta_2$. We state two properties that follow from equation (13):

- For fixed values of $\theta_1$ and $\theta_2$, if the threshold $\tau$ increases, the radius r of $cir(i)$ decreases monotonically.
- Similarly, for fixed values of $\theta_1$ and $\theta_2$, if the radius r of $cir(i)$ increases, the size of $cir(i)$ also increases and the average number of distributed clusters (discussed in Section 5) decreases exponentially in the sensor region.
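The relation between the threshold and the correlation radius can be illustrated with a short Python sketch of equations (7) and (13). The function names are hypothetical, the logarithm is assumed to be the natural logarithm (the inverse of the exponential in equation (7)), and the parameter values $\theta_1 = 70$, $\theta_2 = 1$ are the ones quoted in Section 5.

```python
import math

def pe_correlation(d, theta1, theta2):
    """Power exponential correlation, equation (7): exp(-(d / theta1) ** theta2)."""
    if d < 0 or theta1 <= 0 or not (0 < theta2 <= 2):
        raise ValueError("require d >= 0, theta1 > 0 and 0 < theta2 <= 2")
    return math.exp(-((d / theta1) ** theta2))

def correlation_radius(tau, theta1, theta2):
    """Radius of cir(i), equation (13): r = theta1 * (ln(1 / tau)) ** (1 / theta2)."""
    if not (0 < tau <= 1):
        raise ValueError("require 0 < tau <= 1")
    return theta1 * math.log(1.0 / tau) ** (1.0 / theta2)

# Radius for a few thresholds; a larger threshold gives a smaller radius.
radii = {tau: correlation_radius(tau, 70.0, 1.0) for tau in (0.3, 0.5, 0.7)}
```

By construction, a node at exactly distance `correlation_radius(tau, ...)` has correlation `tau` with the centre node, which is the boundary case of equation (8).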
Hence we take an appropriate threshold value $\tau$ to find the size of $cir(i)$ within which the observed data among sensor nodes are strongly correlated in the spatial domain. In the next section, we propose a distributed clustering algorithm based on the data correlation among the sensor nodes in each $cir(i)$ in the sensor field.

3. Distributed Clustering Algorithm based on Spatial Data Correlation

In this section, we propose a distributed clustering algorithm that forms non-overlapping clusters of irregular shape and size in the sensor field. As the number of deployed sensor nodes in the sensor field increases, the spatial data correlation among the sensor nodes increases. We construct the distributed clustering algorithm from the spatial data correlation among the sensor nodes in each $cir(i)$ in the sensor field.

Notations used in the clustering algorithm:

$M$ = Total number of sensor nodes in the deployed sensor field
$i$ = Each sensor node, where $i \in M$
$id(i)$ = Identification number of each sensor node i
$cir(i)$ = Range of the data correlation area, approximated by a circular area around each sensor node i as centre
$r$ = Radius of the data correlation range area $cir(i)$
$G(i)$ = Group of neighboring sensor nodes j, which is a subset of $cir(i)$ with node i as centre
$\max NodeG(i)$ = Maximum number of sensor nodes j in $G(i)$ of $cir(i)$
$\max DisG(i)$ = Maximum Euclidean distance from node i, as centre, to the farthest node j in $\max NodeG(i)$
$\min SizeG(i)$ = Minimum size of $\max DisG(i)$
$\rho[S_i, S_j]$ = Spatial data correlation coefficient between nodes i and j
$\tau$ = Threshold value
$W$ = Set of $id(i)$ that have not yet formed a cluster
$W'$ = Set of $id(i)$ that have formed a cluster

Distributed Clustering Algorithm

Step 1: Start
Step 2: Initially $W = \{M\}$, where $id(i) \in M$ for $i = 1, 2, \ldots, M$, and $W' = \{\emptyset\}$
Step 3: For each i, $G(i) = \{ j : d(i,j) \le r,\ i \ne j \}$, where $d(i,j)$ is the Euclidean distance between i and j
Step 4: If $G(i) \subseteq cir(i)$, then $\rho[S_i, S_j]$ is strongly correlated and $\rho[S_i, S_j] \ge \tau$
Step 5: Compute $G(i)$ for $cir(i)$ of each sensor node i
Step 6: Check for $\max NodeG(i)$ in each $cir(i)$ in the sensor field
    If more than one node has the same $\max NodeG(i)$ in the sensor field
    {
        Compute $\max DisG(i)$ for all $\max NodeG(i)$
        Compute $\min SizeG(i)$ among $\max DisG(i)$
        $\min SizeG(i)$ forms the cluster among $\max NodeG(i)$ with node i as CH node; add each $id(i)$ to $W'$
    }
    else
        $\max NodeG(i)$ forms the cluster with node i as CH node; add each $id(i)$ to $W'$
Step 7: Repeat Step 6 until $W = \{\emptyset\}$ and $W' = \{M\}$, where each $id(i) \in M$ for all $i = 1, 2, \ldots, M$ and $G(i) \cap G(j) = \emptyset$ in the sensor field
Step 8: Stop

We consider a rectangular sensor field in which M sensor nodes are randomly deployed. To simplify the deployment topology, we assume, as in MTE routing [6], that every sensor node knows the coordinates of all the sensor nodes in the sensor field. In the previous section, we showed that a threshold value $\tau$ yields the radius r of $cir(i)$ for each sensor node i in the sensor field. Hence we fix a threshold value $\tau$ that gives a radius r, and therefore an appropriate size of $cir(i)$, for each sensor node i in the sensor region. Each sensor node i then evaluates the data correlation with its neighboring [26, 27] sensor nodes j to form $G(i)$ within the data correlation range area $cir(i)$. $cir(i)$ is approximated by a circular area around node i with data correlation radius r. Together with node i itself at the centre of $cir(i)$, $G(i)$ comprises the neighboring nodes j that fall within the data correlation range area $cir(i)$ of radius r.
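Steps 1 to 8 above can be sketched in Python as a greedy loop. This is one possible reading of the algorithm, not a definitive implementation: it assumes that $G(i)$ is recomputed over the nodes still in $W$ after each cluster is formed (so that clusters cannot overlap), that ties on both group size and extent are broken by the lowest node id, and that a node with no remaining neighbors forms a cluster on its own. The names `cluster_nodes` and `dist` and the sample coordinates are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance d(i, j) between two (x, y) coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def cluster_nodes(coords, r):
    """Greedy sketch of Steps 1-8.

    coords: dict mapping node id -> (x, y); r: data correlation radius of cir(i).
    Returns a dict mapping each CH node id -> sorted list of cluster member ids.
    """
    unassigned = set(coords)   # W: ids that have not yet formed a cluster
    clusters = {}              # members of all clusters together make up W'
    while unassigned:
        best_key, best_ch, best_group = None, None, None
        for i in sorted(unassigned):
            # G(i): unassigned neighbours j != i with d(i, j) <= r (Step 3).
            group = [j for j in unassigned
                     if j != i and dist(coords[i], coords[j]) <= r]
            # max DisG(i): distance from i to its farthest group member.
            farthest = max((dist(coords[i], coords[j]) for j in group), default=0.0)
            # Prefer the largest group (max NodeG), then the smallest extent (min SizeG).
            key = (-len(group), farthest)
            if best_key is None or key < best_key:
                best_key, best_ch, best_group = key, i, group
        members = {best_ch, *best_group}
        clusters[best_ch] = sorted(members)
        unassigned -= members  # move the members from W to W'
    return clusters

# Hypothetical deployment of six nodes with correlation radius r = 2.
nodes = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (10, 10), 5: (11, 10), 6: (50, 50)}
result = cluster_nodes(nodes, 2.0)
```

In this example node 1 becomes the first CH because it ties with nodes 2 and 3 on group size but has the smallest distance to its farthest neighbor.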
Thus $G(i)$ for the data correlation among the sensor nodes i and j within $cir(i)$ is given as

$$G(i) = \{ j : d(i,j) \le r,\ i \ne j \} \tag{14}$$

where $d(i,j)$ is the Euclidean distance between sensor nodes i and j. The spatial data correlation of $G(i)$ in $cir(i)$ may overlap partially or fully with that of $G(j)$ in $cir(j)$, so many data correlation range areas overlap in the sensor field. Where $cir(i)$ and $cir(j)$ overlap, $G(i)$ and $G(j)$ share the same correlated data. Sharing the same correlated data is like utilizing the same resource [4] in both $G(i)$ and $G(j)$ in the data correlation range areas $cir(i)$ and $cir(j)$, and it therefore increases the data redundancy among $cir(i)$ and $cir(j)$. The distributed clustering algorithm is proposed to overcome this overlapping of spatially correlated data among $G(i)$ and $G(j)$. The distributed cluster formation consists of the following phases:

Phase-I: Each sensor node i has its node identification number $id(i)$, with $W = \{M\}$, where $id(i) \in M$ for $i = 1, 2, \ldots, M$, and $W' = \{\emptyset\}$. Each sensor node i with $id(i) \in M$ that takes part in forming a cluster in a later phase leaves the array $W$ and is added to the array $W'$. $W'$ is thus the array of sensor nodes i, identified by $id(i)$, that have taken part in forming a cluster.

Phase-II: Each sensor node i computes $G(i)$ within the data correlation range area $cir(i)$ of radius r so that it satisfies equation (14).

Phase-III: Check each sensor node i for the $\max NodeG(i)$ of $cir(i)$ in the sensor field; this group forms the first cluster in the sensor region. Sensor node i becomes the CH node of $\max NodeG(i)$ in $cir(i)$. The nodes of the $\max NodeG(i)$ that forms the cluster in $cir(i)$ leave the array $W$ and are added to the array $W'$.
Phase-IV: If more than one node has the same $\max NodeG(i)$ of $cir(i)$ in the sensor field, the question arises which $\max NodeG(i)$ forms the cluster. This is resolved in two steps:

- First, we compute $\max DisG(i)$ for all $\max NodeG(i)$ of $cir(i)$ in the sensor field.
- Second, we find the $\min SizeG(i)$ among the $\max DisG(i)$ for all $\max NodeG(i)$ of $cir(i)$ in the sensor field.

We select $\min SizeG(i)$ among the $\max DisG(i)$ because the data correlation among the closer nodes of $\min SizeG(i)$ is stronger. Hence $\min SizeG(i)$ forms the cluster among the $\max DisG(i)$ with sensor node i as CH node, and each $id(i)$ is added to the array $W'$.

Phase-V: Repeat Phases III and IV until $W = \{\emptyset\}$ and $W' = \{M\}$, where $id(i) \in M$ for all $i = 1, 2, \ldots, M$. Finally, all the sensor nodes take part in forming non-overlapping distributed clusters with $G(i) \cap G(j) = \emptyset$ in the sensor field.

We have thus constructed a non-overlapping distributed clustering algorithm based on the spatial data correlation among sensor nodes in the sensor field. In the next section, we estimate the data accuracy of each distributed cluster and send the most accurate data to the sink node.

4. Distributed Cluster-based Data Accuracy Model

In the previous section, we developed a non-overlapping distributed clustering algorithm that forms clusters of irregular shape and size in the sensor field based on the data correlation among sensor nodes. We assume that each distributed cluster senses and measures a single tracing point of the same event and evaluates the data accuracy of the measured data at the CH node of the respective cluster. Finally, the CH node of each distributed cluster transmits the most accurate data to the sink node in the sensor region. Each distributed cluster has a different set of sensor nodes with which to evaluate the data accuracy.
The data accuracy is evaluated to verify that the estimated data received at the CH node from all the sensor nodes of a cluster are the most accurate and contain no redundant data. This may reduce the communication overhead. For the simplest analysis of our proposed Data accuracy model, we choose a single cluster of M sensor nodes. The cluster senses a single tracing point, checks the data accuracy at the CH node before data aggregation, and then transmits the most accurate data to the sink node. We now present the mathematical analysis of data accuracy for a single cluster with M sensor nodes. Each sensor node i observes and measures the physically sensed data $S_i$ for the tracing point value $S$ with observation noise $n_i$. The observation made by sensor node i in the cluster is therefore

$$x_i = s_i + n_i, \quad i \in M \tag{15}$$

Sensor node i senses the observation sample $x_i$ and transmits it to the CH node over a shared wireless Additive White Gaussian Noise (AWGN) channel [12, 21], where the $n_i$ are mutually independent and modeled as Gaussian random variables of zero mean and variance $\sigma_N^2$. The observation sample $x_i$ thus passes through the AWGN channel to the CH node of the cluster, which reconstructs an estimate $\hat{S}$ of the tracing point $S$. The CH node receives all M observation samples of the cluster, given by

$$X = AZ + N \tag{16}$$

where $X$ is an $M \times 1$ data vector of the observations made by the M sensor nodes in the cluster, $Z$ is an $(M+1) \times 1$ random vector of the physically sensed data $S_i$, $i \in M$, together with the point event $S$ that we estimate, with $Z \sim \mathcal{N}(0, C_Z)$, $A$ is a known $M \times (M+1)$ matrix, and $N$ is an $M \times 1$ noise vector for the observed data of the M sensor nodes with zero mean and covariance $\sigma_N^2 I_{M \times M}$. The random vector $Z$ with zero mean and covariance $C_Z$ can be written as follows:

$$E[Z] = \begin{bmatrix} E[S] \\ E[S_1] \\ E[S_2] \\ \vdots \\ E[S_M] \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\quad \text{and} \quad
C_Z = E[ZZ^T] = \begin{bmatrix}
E[SS] & E[SS_1] & \cdots & E[SS_M] \\
E[S_1 S] & E[S_1 S_1] & \cdots & E[S_1 S_M] \\
\vdots & \vdots & \ddots & \vdots \\
E[S_M S] & E[S_M S_1] & \cdots & E[S_M S_M]
\end{bmatrix}$$

Thus the covariance matrix is

$$C_Z = \sigma_S^2 \begin{bmatrix} 1_{1 \times 1} & R^T_{1 \times M} \\ R_{M \times 1} & B_{M \times M} \end{bmatrix} \tag{17}$$

where

$$R = \begin{bmatrix} \rho_{S,S_1} \\ \rho_{S,S_2} \\ \vdots \\ \rho_{S,S_M} \end{bmatrix}, \qquad
B = \begin{bmatrix}
\rho_{S_1,S_1} & \rho_{S_1,S_2} & \cdots & \rho_{S_1,S_M} \\
\rho_{S_2,S_1} & \rho_{S_2,S_2} & \cdots & \rho_{S_2,S_M} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{S_M,S_1} & \rho_{S_M,S_2} & \cdots & \rho_{S_M,S_M}
\end{bmatrix}$$

In the matrix $C_Z$, $R_{M \times 1} = [\rho_{S,S_i}]$ gives the correlation coefficients between $S_i$ and $S$, and $B_{M \times M} = [\rho_{S_i,S_j}]$ gives the correlation coefficients between $S_i$ and $S_j$. The power exponential model [19, 20] can be used as the correlation model for the relation between $S_i$ and $S$ as well as between $S_i$ and $S_j$. Thus we get $\rho_{S,S_i} = e^{-(d_{S,i}/\theta_1)^{\theta_2}}$ and $\rho_{S_i,S_j} = e^{-(d_{i,j}/\theta_1)^{\theta_2}}$ in the covariance matrix $C_Z$. The CH node collects all the observations from the M sensor nodes in the cluster to find the estimate $\hat{S}$. If the observed data $X$ can be modeled by the Bayesian linear model [22] for all the sensor nodes in the cluster, the MMSE estimator of the tracing point at the CH node of the cluster is given as

$$\hat{Z} = E[Z \mid X] = \begin{bmatrix} \hat{S} \\ \hat{S}_1 \\ \hat{S}_2 \\ \vdots \\ \hat{S}_M \end{bmatrix}$$

$$\hat{Z} = C_Z A^T \left( A C_Z A^T + \sigma_N^2 I_{M \times M} \right)^{-1} X
= \begin{bmatrix} R^T \\ B \end{bmatrix} \left( B + \frac{\sigma_N^2}{\sigma_S^2} I_{M \times M} \right)^{-1} X \tag{18}$$

The performance of the MMSE estimator at the CH node of the cluster is measured by the error $\varepsilon = (S - \hat{S})$, which has zero mean and the covariance matrix

$$E[(Z - \hat{Z})(Z - \hat{Z})^T] = C_Z - C_Z A^T \left( A C_Z A^T + \sigma_N^2 I_{M \times M} \right)^{-1} A C_Z
= \sigma_S^2 \begin{bmatrix} 1 & R^T \\ R & B \end{bmatrix} - \sigma_S^2 \begin{bmatrix} R^T \\ B \end{bmatrix} \left( B + \frac{\sigma_N^2}{\sigma_S^2} I_{M \times M} \right)^{-1} \begin{bmatrix} R & B \end{bmatrix} \tag{19}$$

From equation (18), we get the estimate $\hat{S}$ of the tracing point at the CH node of the cluster as

$$\hat{S} = R^T \left( B + \frac{\sigma_N^2}{\sigma_S^2} I_{M \times M} \right)^{-1} X \tag{20}$$

We find the distortion factor between $S$ and $\hat{S}$ to evaluate the data accuracy at the CH node of the cluster. From equation (19), we get the distortion factor $D = E[(S - \hat{S})^2]$ as

$$D = \sigma_S^2 - \sigma_S^2 R^T \left( B + \frac{\sigma_N^2}{\sigma_S^2} I_{M \times M} \right)^{-1} R \tag{21}$$

We normalize the distortion factor and calculate the data accuracy for the M sensor nodes of the cluster as $D_A(M) = 1 - D/\sigma_S^2$, that is,

$$D_A(M) = R^T \left( B + \frac{\sigma_N^2}{\sigma_S^2} I_{M \times M} \right)^{-1} R = R^T \beta, \quad \text{where } \beta = \left( B + \frac{\sigma_N^2}{\sigma_S^2} I_{M \times M} \right)^{-1} R \tag{22}$$

$D_A(M)$ is calculated at the CH node of each distributed cluster before data aggregation, and the most appropriate data are finally sent to the sink node. The purpose of verifying the data accuracy $D_A(M)$ at the CH node of each distributed cluster is to ensure that only the most accurate data transmitted by the M sensor nodes are aggregated, rather than aggregating all the redundant data at the CH node. Once the data accuracy $R^T \beta$ has been estimated at the CH node of each distributed cluster, the most precise data are aggregated and finally sent to the sink node. In the Information accuracy model proposed in [18], each sensor node i first calculates the MMSE estimate $\hat{S}_i$ of its observed data and then transmits $\hat{S}_i$ to the CH node in order to find $\hat{S}$.
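As a numerical illustration of equation (22), the following Python sketch builds $R$ and $B$ from node coordinates with the power exponential model and solves for $\beta$ with a small Gaussian elimination routine, so that only the standard library is needed. All names (`pe_corr`, `solve`, `data_accuracy`), the coordinates and the noise-to-signal ratio `noise_ratio` (standing for $\sigma_N^2/\sigma_S^2$) are hypothetical, so the result is not expected to reproduce the figures reported in Section 5.

```python
import math

def pe_corr(p, q, theta1, theta2):
    """Power exponential correlation between two (x, y) positions."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])
    return math.exp(-((d / theta1) ** theta2))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= f * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x

def data_accuracy(nodes, tracing_point, theta1, theta2, noise_ratio):
    """D_A(M) = R^T (B + noise_ratio * I)^-1 R, following equation (22)."""
    # R: correlation between the tracing point S and each node observation S_i.
    R = [pe_corr(tracing_point, p, theta1, theta2) for p in nodes]
    # B + (sigma_N^2 / sigma_S^2) I: node-to-node correlation plus the noise term.
    lhs = [[pe_corr(p, q, theta1, theta2) + (noise_ratio if a == b else 0.0)
            for b, q in enumerate(nodes)]
           for a, p in enumerate(nodes)]
    beta = solve(lhs, R)
    return sum(r_k * b_k for r_k, b_k in zip(R, beta))

# Hypothetical four-node cluster around a tracing point, theta1 = 70, theta2 = 1.
cluster = [(6.0, 2.0), (8.0, 4.0), (4.0, 4.0), (6.0, 6.0)]
accuracy = data_accuracy(cluster, (6.0, 4.0), 70.0, 1.0, 0.1)
```

Because the estimate is formed jointly from all observations at the CH node, adding a further node to `cluster` cannot lower the value returned by this sketch, provided the assumed correlation model holds.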
Finally, all the $\hat{S}_i$ are averaged at the CH node of the cluster of M sensor nodes to obtain $\hat{S}$. In the Data accuracy model $D_A(M)$, by contrast, we first collect all the observed data from the M sensor nodes and only then perform the MMSE estimation at the CH node of each distributed cluster. It is better to perform the MMSE estimation only at the CH node than to perform it at the individual nodes and then average the results at the CH node of the distributed cluster. We perform the MMSE estimation at the CH node because it is the only central authority of each distributed cluster and it knows the activities of the cluster members.

5. Simulation and Validation

The data correlation model discussed in Section 2 describes the spatial correlation of the observed data ($S_i$ and $S_j$) among sensor nodes i and j. The spatial correlation of the observed data is strong when it is greater than some threshold value $\tau$, so we fix a threshold value $\tau$ with $0 < \tau \le 1$. Above this threshold value, the spatial data are strongly correlated among sensor nodes i and j. Depending on the threshold value $\tau$, we get a radius r within which each sensor node i evaluates the data correlation with its neighboring nodes j, approximated by a circular data correlation range area $cir(i)$ around each node i. This means that within the range of $cir(i)$ of node i, with data correlation radius r, the data are strongly correlated with those of the other nodes j. In the first simulation setup, we therefore clarify the relation between $\tau$ and r. In Fig. 1(a), we plot the threshold value $\tau$ against the corresponding data correlation radius r of $cir(i)$ for node i. As the threshold value $\tau$ increases within $0 < \tau \le 1$, with $\theta_1 = 70$ and $\theta_2 = 1$, the radius r of $cir(i)$ decreases monotonically. In Fig. 1(b), we show the data correlation radius r of $cir(i)$ for node i against the average number of clusters based on spatial correlation for the threshold value $\tau = 0.5$.
If r increases, the size of $cir(i)$ increases for $\theta_1 = 70$, $\theta_2 = 1$, and the average number of distributed clusters based on spatial correlation (discussed in Section 3) decreases exponentially. In the second simulation setup, we use a sensor field with a 2 m × 2 m grid-based sensor topology, with the CH node at one corner and a fixed tracing point located in the center, as given in Fig. 2 and following [18]. We deploy thirty-four sensor nodes and a CH node, which form a cluster in the grid-based sensor topology. We are interested in the data accuracy as a function of the number of sensor nodes. We set the same sensor field topology (Fig. 2) as in [18], where the sensing nodes are located at the points (6,2), (8,4), (6,4), (4,4) and the tracing point at (6,4). For these four jointly sensing nodes, the information accuracy $I(M)$ in [18] is 0.7469 and our result for the data accuracy $D_A(M)$ is 0.7545. This shows that our proposed Data accuracy model $D_A(M)$ gives more accurate data than the Information accuracy model $I(M)$ proposed in [18] for the same sensor nodes in the same topology. Moreover, if we introduce a fifth sensing node located at (10,4), the information accuracy $I(M)$ is 0.7462. This shows that a fifth node that is far away from the tracing point dominates the observation results and decreases the information accuracy. In our Data accuracy model $D_A(M)$, by contrast, introducing the fifth node increases the data accuracy to 0.7665. Hence introducing new sensor nodes in the sensor field increases the data accuracy in our proposed $D_A(M)$. Fig. 3 shows that the data accuracy $D_A(M)$ is always greater than the information accuracy $I(M)$ as we keep increasing the number of sensor nodes for $\theta_1 = 70$ and $\theta_2 = 1$.
Overall, our proposed Data accuracy model gives more accurate data and better performance than the Information accuracy model with respect to the number of sensor nodes in a cluster.

Moreover, if we keep increasing the number of sensor nodes in the sensor field, the data accuracy remains approximately the same. Fig. 3 shows that fifteen to twenty sensor nodes are sufficient to reach the data accuracy level achieved with thirty-four sensor nodes. Hence we can reduce the number of sensor nodes in a cluster without a significant loss of data accuracy. It is unnecessary to deploy all thirty-four sensor nodes, because fifteen to twenty sensor nodes already give approximately the same data accuracy level as the full cluster. These fifteen to twenty sensor nodes (the optimal cluster) perform the data accuracy at the CH node of the cluster and transmit the accurate data to the sink node. Thus the optimal cluster carries out the communication process while the rest of the sensor nodes in the cluster go into sleep mode. Using fifteen to twenty sensor nodes instead of thirty-four can reduce the communication overhead as well as the energy consumption in a cluster.

Fig. 1(a). Threshold value versus size of cir(i) radius (in metres).
Fig. 1(b). Size of cir(i) radius (in metres) versus average number of clusters.
Fig. 2. Wireless sensor network topology (markers denote the sensor nodes, the CH node and the tracing point).

In the third simulation setup, we fix the number of sensor nodes (M = 4), which form a single cluster to sense and measure a single tracing point. We place the four sensor nodes in a deployed circular cluster and set the tracing point at the central coordinate of the circle, as shown in Fig. 4(a). Since the number of sensor nodes is fixed, we vary the distance of the M sensor nodes from the tracing point S in the circular topology. As we increase the radius of the deployed circular cluster, in the same proportion for all nodes and with the tracing point S as the centre, the data accuracy decreases for $\theta_1 = 70$ and $\theta_2 = 1$. Comparing our results with those derived in [18], we conclude that as the radius of the deployed circular cluster increases, our Data accuracy model $D_A(M)$ always performs better than the Information accuracy model $I(M)$, although the data accuracy decreases for both, as shown in Fig. 4(b).

In the fourth simulation setup, we deploy thirty sensor nodes randomly in a sensor field of 100 m × 100 m. Each sensor node i has a data correlation radius r = 23.5432 for cir(i), for an assumed threshold value of 0.5. For the neighboring sensor nodes j that fall within the circular data correlation range cir(i) centred on node i, the observed data ($S_i$ and $S_j$) are strongly correlated, that is, their correlation is greater than or equal to the threshold value of 0.5. Using this data correlation radius r of cir(i) for each sensor node i, we have developed a distributed clustering algorithm based on the spatial data correlation among sensor nodes i and j, as discussed in Section 3. The sensor nodes form distributed non-overlapping clusters of irregular shape and size.
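One simple way to obtain non-overlapping clusters from a data correlation radius is a greedy partition, sketched below. This is not the algorithm of Section 3, whose details are not repeated here; the function `greedy_clusters`, the head-selection rule (the unassigned node with the most unassigned neighbours within r) and the random seed are assumptions chosen purely for illustration.

```python
import math
import random


def greedy_clusters(positions, radius):
    """Partition node indices into disjoint (head, members) clusters.

    ASSUMED rule: repeatedly pick the unassigned node that has the most
    unassigned neighbours within `radius`, make it the cluster head, and
    assign those neighbours to it.
    """
    unassigned = set(range(len(positions)))
    clusters = []
    while unassigned:
        def neighbours(i):
            return [j for j in unassigned
                    if j != i and math.dist(positions[i], positions[j]) <= radius]

        # sorted() makes ties resolve to the lowest node index.
        head = max(sorted(unassigned), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(head))
        clusters.append((head, members))
        unassigned -= {head, *members}
    return clusters


random.seed(1)  # hypothetical seed, only to make the layout repeatable
nodes = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(30)]

for number, (head, members) in enumerate(greedy_clusters(nodes, 23.5432), start=1):
    print(f"cluster {number}: head {head}, members {members}")
```

Because every node is removed from the unassigned set as soon as it joins a cluster, the clusters cannot overlap, and their sizes vary with the local node density. A node with no unassigned neighbour within r ends up as a cluster of its own, comparable to cluster 6 in Table 1.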
Each distributed cluster can sense and measure a single tracing point located randomly within the cluster. In a practical scenario, the signal and noise variances of the observed data change with location in the sensor field; for example, the temperature varies from place to place in a dense tropical forest. We therefore adopt slightly different signal and noise variances of the observed data for each distributed cluster in the sensor field. Once a distributed cluster has measured the observed data for its tracing point S, it calculates the data accuracy at the CH node of the cluster and finally transmits the most appropriate data to the sink node. Table 1 compares the Information accuracy model $I(M)$ and the Data accuracy model $D_A(M)$ with respect to data accuracy for each cluster formed by our proposed distributed clustering algorithm. Each distributed cluster has its associated nodes along with a CH node, where the data accuracy is computed. From Table 1 we conclude that $D_A(M)$ gives a higher degree of data accuracy than $I(M)$ in every distributed cluster formed by our clustering algorithm, except the single-node cluster 6, where the two values coincide.

Fig. 3. Number of sensor nodes versus data accuracy.
Fig. 4(a). Circular cluster topology with deployed nodes (N1, N2, N3 and the CH around the tracing point S).
Fig. 4(b). Radius of circular cluster versus data accuracy.

Table 1. Data accuracy for each distributed cluster

Cluster   Cluster head   Associated node IDs       Information        Data accuracy
number    node ID        in cluster                accuracy I(M)      D_A(M)
1         29             1,3,5,9,14,23,24,25,30    0.8909             0.9748
2         7              4,12,19,20                0.8701             0.9462
3         26             6,8,15,16                 0.8393             0.9541
4         13             2,10,21,22                0.8509             0.9660
5         11             18,27,28                  0.9095             0.9701
6         17             -                         0.9476             0.9476

6. Conclusions

We have proposed a non-overlapping distributed clustering algorithm based on data correlation among sensor nodes, which reduces data redundancy in wireless sensor networks. We compute the data accuracy for each distributed cluster at its CH node based on the spatial correlation of data, and show that our proposed Data accuracy model collects more accurate data and performs better than the Information accuracy model. Moreover, our simulation results show that there exists an optimal cluster that is sufficient to achieve approximately the same data accuracy level as the full cluster. Within a cluster, the optimal cluster can perform the data accuracy at the CH node while the rest of the sensor nodes go into sleep mode. This may reduce the communication overhead and energy consumption and increase the lifetime of distributed sensor networks.

References

1. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, "A Survey on Sensor Networks", IEEE Communications Magazine, vol. 40, pp. 102-114, Aug. 2002.
2. S. S. Pradhan, K. Ramchandran, "Distributed Source Coding: Symmetric Rates and Applications to Sensor Networks", Proceedings of the Data Compression Conference, pp. 363-372, 2002.
3. C. Zhang, B. Wang, S. Fang, Z. Li, "Clustering Algorithm for Wireless Sensor Networks using Spatial Data Correlation", Proceedings of IEEE International Conference on Information and Automation, pp. 53-58, June 2008.
4. Jyotirmoy Karjee, H. S. Jamadagni, "Data Accuracy Estimation for Spatially Correlated Data in Wireless Sensor Networks under Distributed Clustering", Journal of Networks, vol. 6, no. 7, pp. 1072-1083, July 2011.
5. A. Abbasi and M. Younis, "A Survey on Clustering Algorithms for Wireless Sensor Networks", Computer Communications, vol. 30, no. 14-15, pp. 2826-2841, 2007.
6. W. B. Heinzelman, Anantha P. Chandrakasan, "An Application Specific Protocol Architecture for Wireless Microsensor Networks", IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 660-670, Oct. 2002.
7. L. Guo, F. Chen, Z. Dai, Z. Liu, "Wireless Sensor Network Cluster Head Selection Algorithm based on Neural Networks", International Conference on Machine Vision and Human-Machine Interface, pp. 258-260, 2010.
8. Georgios Smaragdakis, Ibrahim Matta, Azer Bestavros, "SEP: A Stable Election Protocol for Clustered Heterogeneous Wireless Sensor Networks".
9. Chongqing Zhang, Binguo Wang, Sheng Fang, Jiye Zheng, "Spatial Data Correlation Based Clustering Algorithms for Wireless Sensor Networks", The 3rd International Conference on Innovative Computing Information and Control (ICICIC'08).
10. Zhikui Chen, Song Yang, Liang Li and Zhijiang Xie, "A Clustering Approximation Mechanism based on Data Spatial Correlation in Wireless Sensor Networks", Proceedings of the 9th International Conference on Wireless Telecommunication Symposium, 2010.
11. Ali Dabirmoghaddam, Majid Ghaderi, Carey Williamson, "Energy Efficient Clustering in Wireless Sensor Networks with Spatially Correlated Data", IEEE INFOCOM 2010 Proceedings.
12. Kang Cai, Gang Wei and Huifang Li, "Information Accuracy versus Jointly Sensing Nodes in Wireless Sensor Networks", IEEE Asia Pacific Conference on Circuits and Systems, pp. 1050-1053, 2008.
13. M. Gastpar, M. Vetterli, "Source Channel Communications in Sensor Networks", Second International Workshop on Information Processing in Sensor Networks (IPSN'2003).
14. M. C. Vuran, O. B. Akan and I. F. Akyildiz, "Spatio-Temporal Correlation: Theory and Applications in Wireless Sensor Networks", Computer Networks Journal (Elsevier Science), vol. 45, pp. 245-259, June 2004.
15. Jyotirmoy Karjee, H. S. Jamadagni, "Data Accuracy Estimation for Cluster with Spatially Correlated Data in Wireless Sensor Networks", IEEE International Conference on Information System and Computational Intelligence, vol. 3, pp. 28-291, Harbin, China, 2011.
16. Jyotirmoy Karjee, Sudipto Banerjee, "Tracing the Abnormal Behavior of Malicious Nodes in MANET", Fourth International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-7, Dalian, China, 2008.
17. T. Minming, N. Jieru, W. Hu, Liu Xiaowen, "A Data Aggregation Model for Underground Wireless Sensor Network", WRI World Congress on Computer Science and Information Engineering, vol. 1, pp. 344-348, 2009.
18. Huifang Li, Shengming Jiang, Gang Wei, "Information Accuracy Aware Jointly Sensing Nodes Selection in Wireless Sensor Networks", LNCS 4325, pp. 736-747, MSN, Springer, 2006.
19. J. O. Berger, V. de Oliveira and B. Sanso, "Objective Bayesian Analysis of Spatially Correlated Data", Journal of the American Statistical Association, vol. 96, pp. 1361-1374, 2001.
20. V. De Oliveira, B. Kedem and D. A. Short, "Bayesian Prediction of Transformed Gaussian Random Fields", Journal of the American Statistical Association, pp. 1422-1433, 1992.
21. T. J. Goblick, "Theoretical Limitations on the Transmission of Data from Analog Sources", IEEE Transactions on Information Theory, IT-11(4), pp. 558-567, 1965.
22. Steven M. Kay, "Fundamentals of Statistical Signal Processing: Estimation Theory", Pearson, Volume 1, 2010.
23. V. Poor, "An Introduction to Signal Detection and Estimation", Second Edition, Springer, Berlin, 1994.
24. C. Y. Cho, C. L. Lin, Y. H. Hsiao, J. S. Wang, K. C. Yong, "Data Aggregation with Spatially Correlated Grouping Techniques on Cluster Based WSNs", SENSORCOMM, pp. 584-589, Venice, 2010.
25. Shirshu Varma, Uma Shankar Tiwary, "Data Aggregation in Cluster Based Wireless Sensor Networks", Proceedings of the First International Conference on Intelligent Human Computer Interaction, pp. 391-400, part 5, 2009.
26. S. Soro, Wendi B. Heinzelman, "Cluster Head Election Techniques for Coverage Preservation in Wireless Sensor Networks", Ad Hoc Networks, Elsevier, pp. 955-972, 2009.
27. D. Tian, N. Georganas, "A Node Scheduling Scheme for Energy Conservation in Large Wireless Sensor Networks", Wireless Communications and Mobile Computing Journal, 3(2):271-290, March 2003.
