Optimal ordering of transmissions for computing Boolean threshold functions

We address a sequential decision problem that arises in the computation of symmetric Boolean functions of distributed data. We consider a collocated network, where each node's transmissions can be heard by every other node. Each node has a Boolean me…

Authors: Hemant Kowshik, P. R. Kumar

Most sensor network applications are interested only in computing some relevant function of the correlated data at distributed sensors. For example, one might want to compute the mean temperature for environmental monitoring, or the maximum temperature in fire alarm systems. At the same time, sensor nodes are severely limited in power and bandwidth, yet generate enormous quantities of data. Thus, we seek efficient in-network computation and communication strategies for the function of interest.

Computing and communicating functions of distributed data presents several challenges. On the one hand, since the wireless medium is a broadcast medium, nodes have to deal with interference from other transmissions. On the other hand, nodes can exploit these overheard transmissions, and the structure of the function to be computed, to achieve a more efficient description of their own data. Moreover, the strategy for computation may benefit from interactive information exchange between nodes.

We consider a collocated network where each node's transmissions can be heard by every other node. At most one node is allowed to transmit successfully at any time. Each node has a Boolean variable, and we focus on the specific problem of symmetric Boolean function computation. We adopt a deterministic formulation of the problem of function computation, requiring zero error. We suppose that node measurements are independent and distributed according to given marginal Bernoulli distributions. In this paper, we focus on optimal strategies for Boolean threshold functions, which are equal to 1 if and only if the number of nodes with measurement 1 is greater than a certain threshold. The set of admissible strategies includes all interactive strategies, where a node may exchange several messages with other nodes.
In the case where each node has a single bit, the communication problem is rendered trivial, since it is optimal for the transmitting node to simply indicate its bit value. Thus, it only remains to determine the optimal ordering of the nodes' transmissions so as to minimize the expected number of bits exchanged. For the class of Boolean threshold functions, we present a simple indexing policy for ordering the transmissions and prove its optimality. The optimal policy is dynamic, possibly depending on the previously transmitted bits. Further, the optimal policy depends only on the ordering of the marginal probabilities, but, surprisingly, not on their values.

The problem of optimally ordering the transmissions of nodes is a sequential decision problem and can indeed be solved by dynamic programming. However, this would require solving the dynamic program for all thresholds and all probability distributions, which is computationally hard. We avoid this, and establish a more insightful solution in the form of a simple rule defining the optimal policy.

In Section III, we formulate the problem of single instance computation and derive the resulting dynamic programming equation. We then propose the indexing policy and present a detailed proof of optimality, by induction on the number of nodes in the network. In Section IV, we consider the extension to the case of block computation, where each node has a block of measurements and we are allowed block coding. This problem is significantly harder, and we conjecture the structure of an optimal multi-round policy, building on the optimal policy for single instance computation.

The problem of worst-case block function computation with zero error was formulated in [1]. The authors identify two classes of symmetric functions, namely type-sensitive functions, exemplified by Mean, Median and Mode, and type-threshold functions, exemplified by Maximum and Minimum.
The maximum rates for computation of type-sensitive and type-threshold functions in random planar networks are shown to be $\Theta(1/\log n)$ and $\Theta(1/\log \log n)$ respectively, where $n$ is the number of nodes. If we impose a probability distribution on the node measurements, one can show that the average case complexity of computing type-threshold functions is $\Theta(1)$ [2].

In this paper, we require that every node must compute the function. This approach naturally allows the use of tools from communication complexity [3], where one seeks the minimum number of bits that must be exchanged between two nodes to achieve worst-case zero-error computation of a function of the node variables. The problem of worst-case Boolean function computation was first considered in [4], where the complexity of the Boolean AND function was shown to be $\log_2 3$ bits. In [5], this was considerably generalized to derive the exact complexity of computing Boolean threshold functions.

If the measurements are drawn from some joint probability distribution and one is allowed block computation, we arrive at a distributed source coding problem with a fidelity criterion that is function-dependent, for which little is known. One special case, a source coding problem for function computation with side information, has been studied in [6]. The problem of interactive function computation in collocated networks has been studied in [7]. The problem of minimizing the depth of decision trees for Boolean threshold queries is considered in [8].

In [9], an interesting problem in sequential decision making is studied, where $n$ nodes have i.i.d. measurements, and a central agent wishes to know the identities of the nodes with the $k$ largest values. One is allowed questions of the type "Is $X \geq t$?", to which the central agent receives the list of all nodes which satisfy the condition. Under this framework, the optimal recursive strategy for querying the nodes is found.
A key difference in our formulation is that we are only allowed to query particular nodes, and not all nodes at once.

Consider a collocated network with nodes 1 through $n$, where each node $i$ has a Boolean measurement $X_i \in \{0, 1\}$. The $X_i$ are independent of each other, and each is drawn from a Bernoulli distribution with $P(X_i = 1) =: p_i$. Without loss of generality, we assume that $p_1 \leq p_2 \leq \ldots \leq p_n$.

We address the following problem. Every node wants to compute the same function $f(X_1, X_2, \ldots, X_n)$ of the measurements. We seek communication schemes which achieve correct function computation at each node, with the minimum expected total number of bits exchanged. Throughout this paper, we consider the broadcast scenario where each node's transmission can be heard by every other node. We also suppose that collisions do not convey information, thus restricting ourselves to collision-free strategies as in [1]. This means that for the $k$th bit $b_k$, the identity of the transmitting node $T_k$ depends only on the previously broadcast bits $b_1, b_2, \ldots, b_{k-1}$, while the value of the bit it sends can depend arbitrarily on all previously broadcast bits as well as its own measurement $X_{T_k}$.

First, we note that since each node has exactly one bit of information, it is optimal to set $b_k = X_{T_k}$. Indeed, any other choice $b'_k = g(b_1, \ldots, b_{k-1}, X_{T_k})$ must depend on $X_{T_k}$, for otherwise the transmission itself could be avoided, and the remaining nodes can reconstruct $b'_k$ from $X_{T_k}$ since they already know $b_1, \ldots, b_{k-1}$. Thus the only freedom available is in choosing the transmitting node $T_k$ as a function of $b_1, b_2, \ldots, b_{k-1}$. We call this the ordering problem. By definition, the order can depend dynamically on the previously broadcast bits. In this paper, we address the ordering problem for a class of Boolean functions, namely threshold functions.

Notation: The set of measurements of nodes 1 through $n$ is denoted by $(X_1, X_2, \ldots, X_n)$, which is abbreviated as $X^n$.
In the sequel, we will use $X^n_{-i}$ to denote the set of measurements $(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$. As a natural extension, we use $X^n_{-(i,j)}$ to denote the set of measurements $(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n)$, where $i < j$.

Given a function $\Pi_{n-k}(X^n)$, the ordering problem can indeed be solved using dynamic programming. Let $C(\Pi_{n-k}(X^n))$ denote the minimum expected number of bits required to compute $\Pi_{n-k}(X^n)$. The dynamic programming equation is

$$C(\Pi_{n-k}(X^n)) = 1 + \min_{1 \leq i \leq n} \left[ p_i \, C(\Pi_{n-k-1}(X^n_{-i})) + (1 - p_i) \, C(\Pi_{n-k}(X^n_{-i})) \right].$$

However, solving this equation is computationally complex. Further, it is unclear at the outset whether the optimal strategy will depend only on the ordering of the $p_i$, or on their particular values. This makes the explicit solution of (III-A) for all $n$, $k$ and $(p_1, p_2, \ldots, p_n)$ notoriously hard. We present a very simple characterization of the optimal strategy for each $n$ and $0 \leq k \leq n-1$, and show that it is independent of the particular values of the $p_i$ and depends only on their ordering.

To begin with, we argue that solving the ordering problem for Boolean threshold functions is equivalent to solving the following problem for each $n$ and $k$: in the optimal strategy for computing $\Pi_{n-k}(X_1, X_2, \ldots, X_n)$, determine which node must transmit first. Indeed, if $T(1)$ is the first node to transmit under the optimal strategy, then, depending on whether $X_{T(1)} = 0$ or $X_{T(1)} = 1$, the rest of the nodes would need to compute $\Pi_{n-k}(X^n_{-T(1)})$ or $\Pi_{n-k-1}(X^n_{-T(1)})$, respectively. Since we solve the problem for all $n$ and $k$, we can determine which node should transmit next in either case.

Theorem 1: In order to compute the Boolean threshold function $\Pi_{n-k}(X^n)$, it is optimal for node $k+1$ to transmit first. This result is true for all $n$, all $0 \leq k \leq n-1$, and all probability distributions with $p_1 \leq p_2 \leq \ldots \leq p_n$.

Proof: Define $\tilde{C}(\Pi_{n-k}(X^n)) := C(\Pi_{n-k}(X^n)) - 1$ for notational convenience. We also define the following expressions.
$T_{m,k,i}$ is the difference between the expected number of bits when node $k+1$ transmits first, and the expected number of bits when node $i$ transmits first. We do not yet have an interpretation for $S^{(1)}_{m,k,i}$ and $S^{(2)}_{m,k,i}$. However, we will use these expressions in the sequel.

We establish the above theorem by induction on the number of nodes $n$. However, we need to load the induction hypothesis. Consider the following induction hypothesis. Observe that part (a) immediately establishes that node $k+1$ should transmit first in the optimal strategy for computing the function $\Pi_{m-k}(X^m)$. The basis step for m = 1, k = 1 is trivially true. Let us suppose the induction hypothesis is true for all $m \leq n$. We now proceed to prove the hypothesis for $m = n+1$.

Lemma 1: For fixed $k$ and $i \geq k+2$, we have $S^{(1)}_{n+1,k,i}(X^{n+1}) \leq 0$.
Proof: See Appendix A.

Lemma 2: For fixed $k$ and $i \leq k$, we have $S^{(2)}$ …

Lemmas 1 and 2 establish the induction step for parts (b) and (c) of the induction hypothesis. We now proceed to show the induction step for part (a).

Lemma 3: For fixed $k$ and $i \geq k+2$, we have …

Using Lemmas 3 and 4 together with Lemmas 1 and 2, we see that $T_{n+1,k,i}(X^{n+1}) \leq 0$ for all $0 \leq k \leq n$ and $i \neq k+1$. For the case $i = k+1$, we have $T_{n+1,k,k+1} = 0$ trivially. This completes the induction step for part (a), and the proof of the Theorem. □

IV. OPTIMAL ORDERING FOR BLOCK COMPUTATION

We now shift attention to the case where nodes are allowed to accumulate a block of $N$ measurements, and thus achieve improved efficiency by using block codes. We consider the class of all interactive strategies for computation, where the $k$th bit can depend arbitrarily on all previously broadcast bits. We require that all nodes compute the function with zero error for the block. We present a conjecture for the optimal strategy based on the insight gained from the single instance solution.
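The single instance solution referred to here is the rule of Theorem 1 applied recursively. A minimal Python sketch of that rule follows. It assumes that $\Pi_\theta(x) = 1$ if and only if at least $\theta$ of the measurements equal 1 (so $\theta = n$ is AND and $\theta = 1$ is OR), and that transmissions stop as soon as the broadcast bits determine the function. The function name `run_index_policy` and the 0-based node indices are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def run_index_policy(
    p: Sequence[float], x: Sequence[int], theta: int
) -> Tuple[int, List[int]]:
    """Apply the 'node k+1 transmits first' rule to one realization.

    p[i]  : probability that node i holds a 1 (any order; sorted internally)
    x[i]  : the actual bit held by node i
    theta : assumed threshold, the function is 1 iff sum(x) >= theta

    Returns (function value, list of node indices in transmission order).
    """
    # Nodes that have not transmitted yet, sorted by increasing p_i.
    remaining = sorted(range(len(p)), key=lambda i: p[i])
    order: List[int] = []
    need = theta  # number of additional 1s still required

    # Stop when the value is decided: need <= 0 gives 1, and
    # need > len(remaining) gives 0.
    while 0 < need <= len(remaining):
        # The residual problem is Pi_{m-k} on m = len(remaining) nodes,
        # so k = m - need, and the (k+1)-th smallest p transmits.
        k = len(remaining) - need
        node = remaining.pop(k)
        order.append(node)
        need -= x[node]

    return (1 if need <= 0 else 0), order


if __name__ == "__main__":
    probs = [0.1, 0.5, 0.9]
    print(run_index_policy(probs, [1, 0, 1], theta=2))
```

Note that the rule only inspects the sorted order of `p`, never the values themselves, which mirrors the statement that the optimal policy depends only on the ordering of the marginal probabilities.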
Conjecture 1: In order to compute the Boolean threshold function $\Pi_{n-k}(X^n)$, it is optimal for node $k+1$ to transmit first, using the Huffman code. This result is true for all $n$, all $0 \leq k \leq n-1$, and all probability distributions with $p_1 \leq p_2 \leq \ldots \leq p_n$.

Observe that after node $k+1$ transmits, we are left with two block computation problems. For the instances where $X_{k+1} = 0$, we need to compute $\Pi_{n-k}(X^n_{-(k+1)})$, and for the instances where $X_{k+1} = 1$, we need to compute $\Pi_{n-k-1}(X^n_{-(k+1)})$. Thus the conjectured strategy can be recursively applied, yielding an interactive multi-round strategy. However, proving the optimality of this strategy is significantly harder. For worst-case block computation, the lower bound is established using fooling sets [5]. Adapting this idea to the probabilistic scenario remains an interesting challenge for the future.

We have considered a sequential decision problem that arises in the context of optimal computation of Boolean threshold functions in collocated networks. For single instance computation, we show that the optimal strategy has an elegant structure, which depends only on the ordering of the marginal probabilities, and not on their exact values. The extension to the case of block computation is harder and remains a challenge for the future. It would also be interesting to extend this result to the case of correlated measurements.
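The single instance result can be checked numerically on small networks by brute-force dynamic programming. The sketch below is an illustration under stated assumptions, not the paper's own procedure: it assumes $\Pi_\theta(x) = 1$ if and only if at least $\theta$ measurements equal 1, charges one bit per transmission, and charges nothing once the broadcast bits determine the function. The name `first_move_costs` is hypothetical. For each choice of first transmitter it returns the expected number of bits when play is optimal thereafter, so the minimum of the returned list is the optimal cost.

```python
from functools import lru_cache
from typing import List, Sequence, Tuple


def first_move_costs(p: Sequence[float], theta: int) -> List[float]:
    """Expected bits for each possible first transmitter (1 <= theta <= n).

    Entry j is the expected total number of bits when the node with the
    (j+1)-th smallest probability transmits first and all later choices
    are optimal. Exhaustive over subsets, so only suitable for small n.
    """
    probs = tuple(sorted(p))  # enforce p_1 <= p_2 <= ... <= p_n
    n = len(probs)

    @lru_cache(maxsize=None)
    def cost(remaining: Tuple[int, ...], need: int) -> float:
        # Decided already: enough 1s seen, or too few nodes left.
        if need <= 0 or need > len(remaining):
            return 0.0
        return min(
            cost_if_first(remaining, need, j) for j in range(len(remaining))
        )

    def cost_if_first(remaining: Tuple[int, ...], need: int, j: int) -> float:
        rest = remaining[:j] + remaining[j + 1:]
        q = probs[remaining[j]]
        # One bit is sent; a 1 lowers the residual threshold by one.
        return 1.0 + q * cost(rest, need - 1) + (1.0 - q) * cost(rest, need)

    everyone = tuple(range(n))
    return [cost_if_first(everyone, theta, j) for j in range(n)]


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.6, 0.8]
    for theta in range(1, len(probs) + 1):
        costs = first_move_costs(probs, theta)
        k = len(probs) - theta  # Theorem 1 predicts node k+1 goes first
        print(theta, k + 1, [round(c, 4) for c in costs])
```

In this sketch, Theorem 1 corresponds to the entry at index `len(p) - theta` being a minimizer of the returned list. The memoized recursion visits subsets of nodes, so its cost grows exponentially with the number of nodes, which is the computational burden the index rule avoids.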
