On Practical Algorithms for Entropy Estimation and the Improved Sample Complexity of Compressed Counting

Estimating the p-th frequency moment of a data stream is a very heavily studied problem. The problem is actually trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near p = 1. This…

Authors: Ping Li

The problem of "scaling up for high dimensional data and high speed data streams" is among the "ten challenging problems in data mining research" [39]. This paper is devoted to estimating the entropy of data streams. Mining data streams [20,4,1,32] in (e.g.,) 100 TB scale databases has become an important area of research, e.g., [10,1], as network data can easily reach that scale [39]. Search engines are a typical source of data streams [4].

Consider the Turnstile stream model [32]. The input stream a_t = (i_t, I_t), i_t ∈ [1, D], arriving sequentially, describes the underlying signal A, meaning A_t[i_t] = A_{t-1}[i_t] + I_t, where the increment I_t can be either positive (insertion) or negative (deletion). Restricting A_t[i] ≥ 0 results in the strict-Turnstile model, which suffices for describing almost all natural phenomena [32]. This study adopts the relaxed strict-Turnstile model, which only requires A_t[i] ≥ 0 at the time t one cares about (e.g., the end of the stream); it is hence considerably more flexible than the strict-Turnstile model. Within this model, we study efficient algorithms for estimating the αth frequency moment of data streams, F(α) = Σ_{i=1}^D A_t[i]^α. We are particularly interested in the case α → 1, which is very important for estimating Shannon entropy.

A very useful (e.g., in Web and network applications [12,25,40,30] and neural computations [33]) summary statistic is the Shannon entropy, H = −Σ_{i=1}^D (A_t[i]/F(1)) log(A_t[i]/F(1)). Various generalizations of the Shannon entropy have been proposed. The Rényi entropy [34], denoted by H_α, and the Tsallis entropy [19,36], denoted by T_α, are respectively defined as

H_α = (1/(1−α)) log(F(α)/F(1)^α),    T_α = (1/(α−1)) (1 − F(α)/F(1)^α).

As α → 1, both the Rényi entropy and the Tsallis entropy converge to the Shannon entropy: lim_{α→1} H_α = lim_{α→1} T_α = H. Thus, both can be computed from the αth frequency moment, and one can approximate the Shannon entropy from either H_α or T_α by letting α ≈ 1.
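The convergence of the Rényi and Tsallis entropies to the Shannon entropy as α → 1 is easy to check numerically. The sketch below is our illustration (the frequency vector is a made-up toy example, not from the paper); it computes all three quantities from the empirical distribution p_i = A_t[i]/F(1):

```python
import numpy as np

def shannon(p):
    """Shannon entropy H = -sum p_i log p_i (natural log)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def renyi(p, alpha):
    """Renyi entropy H_alpha = log(sum p_i^alpha) / (1 - alpha)."""
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def tsallis(p, alpha):
    """Tsallis entropy T_alpha = (1 - sum p_i^alpha) / (alpha - 1)."""
    return (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)

# Toy frequency vector A_t; empirical distribution p_i = A_t[i] / F_(1).
A = np.array([50.0, 30.0, 15.0, 4.0, 1.0])
p = A / A.sum()

H = shannon(p)
for delta in (1e-1, 1e-2, 1e-4):
    a = 1.0 - delta  # alpha = 1 - Delta, approaching 1 from below
    print(delta, abs(renyi(p, a) - H), abs(tsallis(p, a) - H))
```

Both gaps shrink roughly linearly in ∆ = 1 − α, which is why approximating H via F(α) requires α extremely close to 1.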
Several studies [40,18,17] used this idea to approximate the Shannon entropy, all of which relied critically on efficient algorithms for estimating the αth frequency moments (2) near α = 1. In fact, one can numerically verify that the α values proposed in [18,17] are extremely close to 1; for example, ∆ = |1 − α| < 10^-7 [18, Alg. 1] or ∆ < 10^-4 [17] are quite likely.

From the definitions of the Rényi and Tsallis entropies, it is clear that, in order to achieve a ν-additive guarantee for the Shannon entropy, it suffices to estimate the αth frequency moment with an ε = ν∆ guarantee (for sufficiently small ∆). For example, suppose an estimator F̂(α) guarantees (with high probability) that (1 − ε)F(α) ≤ F̂(α) ≤ (1 + ε)F(α); then the estimated Rényi entropy, denoted by Ĥ_α, would satisfy H_α − ν ≤ Ĥ_α ≤ H_α + ν, assuming ∆ is sufficiently small.

Another perspective is from the estimation variances. From the definitions of the Rényi and Tsallis entropies, it is clear that we need estimators of the frequency moments with variances proportional to O(∆²) in order to cancel the 1/(1−α)² term. The estimation variance, of course, is also closely related to the sample complexity. Suppose we have an unbiased estimator of F(α) whose variance is (V/k) F²(α), where k is the sample size. Then the sample complexity is essentially O((V F²(α))/(ε² F²(α))) = O(V/ε²), using the standard argument popular in the theory literature, e.g., [23]. The space complexity (in terms of bits) will be O(V/ε² × log Σ_{s=1}^t |I_s|). The drawback of this argument is that it does not fully specify the constants.

In summary, in order to provide a ν (e.g., 0.1) additive approximation of the Shannon entropy, one should use O(V/ε²) = O(V/(ν∆)²) samples for estimating the (1 ± ∆)th frequency moments. This bound initially appears disappointing because, for example, if V = O(1), ν = 0.1, and ∆ = 10^-5, then it requires O(10^12) samples, which is very likely impractical.
Well-known algorithms based on symmetric stable random projections [21,26] indeed exhibit V = O(1).

Network traffic is a typical example of high-rate data streams. Effective and reliable measurement of network traffic in real time is crucial for anomaly detection and network diagnosis, and one such measurement metric is the Shannon entropy [12,24,38,7,25,40]. The Turnstile data stream model (1) is naturally suitable for describing network traffic, especially when the goal is to characterize the statistical distribution of the traffic. In its empirical form, a statistical distribution is described by histograms, A_t[i], i = 1 to D. It is possible that D = 2^64 (IPv6) if one is interested in measuring the traffic streams of unique source or destination addresses.

The Distributed Denial of Service (DDoS) attack is a representative example of network anomalies. A DDoS attack attempts to make computers unavailable to intended users, either by forcing users to reset the computers or by exhausting the resources of service-hosting sites. For example, hackers may maliciously saturate the victim machines by sending many external communication requests. DDoS attacks typically target sites such as banks, credit card payment gateways, or military sites. A DDoS attack changes the statistical distribution of network traffic. Therefore, a common practice for detecting an attack is to monitor the network traffic using certain summary statistics. Since the Shannon entropy is well-suited for characterizing a distribution, a popular detection method is to measure the time history of the entropy and raise an alarm when the entropy becomes abnormal [12,25]. Entropy measurements do not have to be "perfect" for detecting attacks. It is, however, crucial that the algorithm be computationally efficient at low memory cost, because the traffic data generated by large high-speed networks are enormous and transient (e.g., 1 Gbits/second).
Algorithms should be real-time and one-pass, as the traffic data will not be stored [4]. Many algorithms have been proposed for "sampling" the traffic data and estimating entropy over data streams [25,40,6,16,3,8,18,17].

The recent work [30] was devoted to estimating the Shannon entropy of MSN search logs, to help answer some basic problems in Web search, such as: how big is the Web? The search logs can be viewed as data streams, and [30] analyzed several "snapshots" of a sample of MSN search logs. The sample used in [30] contained 10 million triples; each triple corresponded to a click from a particular IP address on a particular URL for a particular query. [30] drew their important conclusions from this (hopefully) representative sample. Alternatively, one could apply data stream algorithms such as Compressed Counting (CC) to the whole history of MSN (or other search engines).

A workshop at NIPS'03 was devoted to entropy estimation, owing to the widespread use of the Shannon entropy in neural computations [33] (http://www.menem.com/~ilya/pages/NIPS03). For example, one application of entropy is to study the underlying structure of spike trains.

The problem of approximating F(α) has been very heavily studied in theoretical computer science and databases, since the pioneering work of [2], which studied α = 0, 2, and α > 2. [11,21,26] provided improved algorithms for 0 < α ≤ 2. [22] provided algorithms for α > 2 to achieve the lower bounds proved by [35,5,37]. [14] suggested using even more space to trade for some speedup in processing time. Note that the first moment (i.e., the sum), F(1), can be computed easily with a simple counter [31,13,2]. This important property was recently captured by the method of Compressed Counting (CC) [27], which is based on maximally-skewed stable random projections.
[27] provided two algorithms, based on the geometric mean and the harmonic mean, and proved some important theoretical results:

• The geometric mean algorithm has variance proportional to O(∆) in the neighborhood of α = 1, where ∆ = |1 − α|. This is the first algorithm that captured the intuition that, in the neighborhood of α = 1, moment estimation algorithms should work better and better as α → 1, in a continuous fashion. The geometric mean algorithm, unfortunately, did not provide an adequate mechanism for entropy estimation. As previously discussed, this method leads to an entropy estimation algorithm with complexity O(1/(ν²∆)), which is actually quite intuitive from the definitions of the Tsallis and Rényi entropies: both entropies contain the 1/(1−α) = 1/∆ term, meaning that the variance will blow up as O(1/∆²), which cannot be canceled by O(∆). Note that [27] did not show that the variance of the harmonic mean algorithm is also proportional to O(∆); this paper will provide the proof.

• For fixed ε, as ∆ → 0, the sample complexity bound of the geometric mean algorithm is O(1/ε) with all constants specified. This result was a major improvement over the well-known O(1/ε²) bound [37,21,26]. Note that the assumption of fixing ε and letting ∆ → 0 is needed for theoretical convenience, in order to derive bounds with no unspecified constants. This study will continue to use this assumption.

Our comments: When α = 1, the moment estimation problem is trivial and only requires one simple counter. Therefore, even intuitively, O(1/ε) cannot possibly be the true complexity bound.

We consider the relaxed strict-Turnstile model (1). Conceptually, we multiply the data stream vector A_t ∈ R^{1×D} by a random projection matrix R ∈ R^{D×k}. The resultant vector X = A_t × R ∈ R^{1×k} is only of length k.
More specifically, the entries r_ij of the projection matrix R are i.i.d. random variables generated from the following (non-standard) skewed stable distribution [41]:

r_ij = sin(α v_ij) / (sin v_ij)^{1/α} × [sin((1−α) v_ij) / w_ij]^{(1−α)/α},    (6)

where v_ij ∼ Uniform(0, π) (i.i.d.) and w_ij ∼ Exp(1) (i.i.d.), an exponential distribution with mean 1. We use this formulation to avoid numerical problems and to simplify the analysis. Of course, in data stream computations, the matrix R is never fully materialized. The standard procedure is to generate the entries of R on demand [21]. In other words, whenever a stream element a_t = (i_t, I_t) arrives, one updates the entries of X as x_j ← x_j + I_t r_{i_t j}, j = 1 to k.

The proposed estimator is

F̂(α) = ∆^{−∆} ( k / Σ_{j=1}^k x_j^{−α/∆} )^∆.    (7)

The following theorem proves that this new estimator is (asymptotically) unbiased with variance proportional to O(∆²). Note that ∆^∆ → 1 as ∆ → 0.

Theorem 1 F̂(α) is asymptotically unbiased, with variance Var(F̂(α)) = (F²(α)/k) O(∆²). Proof: See Appendix A.

In this paper, we only consider α = 1 − ∆ < 1. This is because the maximally-skewed stable distributions have good theoretical properties when α < 1 [27]; for example, all negative moments exist; see Lemma 2.

The standard procedure for sampling from skewed stable distributions is the Chambers-Mallows-Stuck method [9]. To generate a sample from S(α, β = 1, 1), i.e., α-stable, maximally-skewed (β = 1), with unit scale, one first generates an exponential random variable with mean 1, W ∼ Exp(1), and a uniform random variable U ∼ Uniform(−π/2, π/2), and computes

Z = sin(α(U + ρ)) / (cos U)^{1/α} × [cos(U − α(U + ρ)) / W]^{(1−α)/α},

where ρ = π/2 when α < 1 (the case α > 1 is not needed in this paper). Note that cos(πα/2) → 0 as α → 1. In this study, we will only consider α = 1 − ∆ < 1, i.e., ρ = π/2. For convenience (and to avoid numerical problems), after simplification with the substitution V = π/2 + U ∼ Uniform(0, π), we will use

Z = sin(αV) / (sin V)^{1/α} × [sin((1−α)V) / W]^{(1−α)/α}.

This explains (6). Lemma 1 shows log Z = O(|∆ log ∆|), which can be accurately represented using O(log 1/∆) bits. The proof is omitted since it is straightforward.
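The V, W sampling formula can be sanity-checked against the Laplace transform E[exp(−sZ)] = exp(−s^α), which characterizes the one-sided α-stable law (α < 1, β = 1) under this normalization. This is our own check, not from the paper, and scale conventions vary across stable parameterizations:

```python
import numpy as np

def sample_skewed_stable(alpha, size, rng):
    """Draw one-sided (beta = 1, alpha < 1) stable samples via the
    V ~ Uniform(0, pi), W ~ Exp(1) representation in the text."""
    V = rng.uniform(0.0, np.pi, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.sin(V) ** (1.0 / alpha)
            * (np.sin((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
alpha = 0.9
Z = sample_skewed_stable(alpha, 200_000, rng)

# One-sided: every sample is strictly positive.
print(np.all(Z > 0))

# Empirical Laplace transform at s = 1 should be close to exp(-1**alpha).
print(np.exp(-Z).mean(), np.exp(-1.0))
```

With 200,000 samples, the empirical Laplace transform agrees with exp(−s^α) to about three decimal places, which supports the simplified formula.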
Lemma 1 For any fixed V ≠ 0 and W ≠ 0, log Z = O(|∆ log ∆|) as ∆ → 0.

By properties of stable distributions, the entries of X are i.i.d. samples

x_j ∼ S(α, β = 1, F(α) cos(πα/2)),    where F(α) = Σ_{i=1}^D A_t[i]^α.

Therefore, CC boils down to estimating F(α) from k i.i.d. stable samples. [27] provided two statistical estimators, the geometric mean and harmonic mean estimators, which are derived from the following basic moment formula.

Lemma 2 [27] If X ∼ S(α < 1, β = 1, F(α) cos(πα/2)), then X > 0, and E(X^λ) exists for any −∞ < λ < α; in particular, all negative moments exist.

Assume x_j, j = 1 to k, are i.i.d. samples from S(α, β = 1, F(α) cos(πα/2)). After simplifying the corresponding expression in [27], we obtain the geometric mean estimator, which is unbiased; however, as α → 1, its asymptotic variance approaches zero at the rate of only O(∆), which is not adequate. The harmonic mean estimator is asymptotically unbiased, and [27] only graphically showed that it is noticeably better than the geometric mean estimator. We prove the following lemma, which says that the variance of the harmonic mean estimator is also proportional to O(∆); thus, the harmonic mean estimator is not adequate for entropy estimation either.

Lemma 3 The asymptotic variance of the harmonic mean estimator is also proportional to O(∆). Proof: See Appendix B.

Lemma 4 Suppose a random variable Z ∼ S(α < 1, β = 1, cos(πα/2)). Then its cumulative distribution function is

F_Z(z) = (1/π) ∫_0^π exp(−z^{−α/∆} g(θ; ∆)) dθ,    g(θ; ∆) = sin(∆θ) [sin(αθ)]^{α/∆} / [sin θ]^{1/∆}.

Note that g(0+; ∆) = ∆ α^{α/∆} ≈ ∆ e^{−1} approaches zero as ∆ → 0. Thus, one might wonder whether, if we replace g(θ; ∆) by g(0+; ∆), the errors would be quite small. This conjecture is verified in Figure 1: as ∆ → 0, the exact CDF (solid curves) is very close to the approximate CDF (dashed curves), which we obtain by replacing the exact g(θ; ∆) function in Lemma 4 with the limit g(0+; ∆).

Basically, we derive the proposed estimator by "guessing." We first derive a maximum likelihood estimator (MLE) for a slightly different distribution, based on the intuition from Lemma 4 and Figure 1. Then we verify that this MLE is actually a very good estimator (in terms of both the variances and the tail bounds) for the stable distribution we care about.
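The reduction "estimate F(α) from k i.i.d. stable samples" can be illustrated end-to-end with a simple negative-moment (harmonic-mean-flavored) estimator. This is our generic sketch under the normalization E[exp(−sZ)] = exp(−s^α), where E[Z^{−1}] = Γ(1 + 1/α); it is not one of the paper's estimators:

```python
import numpy as np
from math import gamma

def sample_one_sided_stable(alpha, size, rng):
    """V ~ Uniform(0, pi), W ~ Exp(1) representation of a one-sided
    alpha-stable variable (alpha < 1, beta = 1)."""
    V = rng.uniform(0.0, np.pi, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.sin(V) ** (1.0 / alpha)
            * (np.sin((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(1)
alpha = 0.95
A = np.array([3.0, 1.0, 0.5, 0.25])   # nonnegative signal A_t (toy example)
F_alpha = np.sum(A ** alpha)          # ground-truth frequency moment F_(alpha)

# x_j = sum_i A[i] * r_ij with one-sided stable r_ij, so x_j = F^(1/alpha) * Z_j.
k = 100_000
R = sample_one_sided_stable(alpha, (len(A), k), rng)
x = A @ R

# Method of moments via the negative first moment:
# E[x^-1] = F^(-1/alpha) * Gamma(1 + 1/alpha).
F_hat = (gamma(1.0 + 1.0 / alpha) / np.mean(1.0 / x)) ** alpha
print(F_hat, F_alpha)
```

The estimate recovers F(α) to well under 1% relative error at this sample size; the paper's point is that the *constant* in front of 1/k is what matters when ε = ν∆ is tiny.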
Here, we consider a random variable Y whose cumulative distribution function (CDF) is

F_Y(y) = exp(−∆ α^{α/∆} y^{−α/∆}),    y > 0.    (17)

Similar to stable random projections, we are interested in estimating c^α from k i.i.d. samples x_j = cY_j, j = 1 to k. Statistical theory tells us that the maximum likelihood estimator (MLE) has (asymptotic) optimality. Because the distribution function of Y_j is known, we can actually compute the MLE in this case.

Theorem 2 Suppose Y_j, j = 1 to k, are i.i.d. samples from a distribution whose CDF is given by (17). Let x_j = cY_j, where c > 0. Then the maximum likelihood estimator of c^α is

ĉ^α_{MLE} = (1/α^α) ∆^{−∆} ( k / Σ_{j=1}^k x_j^{−α/∆} )^∆.

Proof: See Appendix D.

Compared with the proposed estimator F̂(α) in (7), the MLE solution has the additional term 1/α^α. Note that, while both ∆^∆ and α^α approach 1, ∆^∆ → 1 considerably slower than α^α → 1, because ∆^∆ = e^{∆ log ∆} whereas α^α = e^{(1−∆) log(1−∆)} ≈ e^{−∆}, and |∆ log ∆| ≫ ∆ for small ∆. For example, when ∆ = 0.1, ∆^∆ = 0.7943 and α^α = 0.9095; when ∆ = 0.01, ∆^∆ = 0.9550 and α^α = 0.9901. Therefore, while α^α may be considered negligible, it may be preferable to keep ∆^∆. In fact, when proving that the proposed estimator F̂(α) is (asymptotically) unbiased (see Appendix A), we do need the ∆^∆ term.

Theorem 1 has proved that the proposed estimator is asymptotically unbiased with variance proportional to O(∆²). Using the standard argument, we know that the sample complexity bound must be O(1/ν²) when ε = ν∆. We are, however, very interested in the precise complexity bounds, not just the orders. Normally, we would like to present the tail bounds in the form Pr(F̂(α) ≥ (1 + ε)F(α)) ≤ exp(−kε²/G_R) (and similarly for the left tail with G_L), which immediately leads to the statement that, with probability at least 1 − δ, it suffices to use k ≥ G log(2/δ)/ε² samples. Ideally, we hope G_R will be as small as possible. In fact, in order to achieve a ν-additive algorithm for entropy estimation, we need ε = ν∆ (where ∆ < 10^-4 or even much smaller). Therefore, we really need G_R = O(∆²). In this sense, it is no longer appropriate to treat G_R as a "constant." Theorem 3 presents the tail bounds for F̂(α).
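The quoted values of ∆^∆ and α^α, and the claim that ∆^∆ converges to 1 much more slowly, are easy to check numerically:

```python
# Check the claimed values of Delta^Delta and alpha^alpha (alpha = 1 - Delta).
for delta in (0.1, 0.01):
    alpha = 1.0 - delta
    print(delta, round(delta ** delta, 4), round(alpha ** alpha, 4))
# Delta = 0.1:  Delta^Delta = 0.7943, alpha^alpha = 0.9095
# Delta = 0.01: Delta^Delta = 0.9550, alpha^alpha = 0.9901
```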
Theorem 3 For any ε > 0 and 0 < ∆ = 1 − α < 1, the proposed estimator satisfies a right tail bound of the form Pr(F̂(α) ≥ (1 + ε)F(α)) ≤ exp(−kε²/G_R); for any 0 < ε < 1 and 0 < ∆ = 1 − α < 1, it satisfies a left tail bound Pr(F̂(α) ≤ (1 − ε)F(α)) ≤ exp(−kε²/G_L). Proof: See Appendix E.

These bounds appear too complicated to yield insightful information directly; one may even wonder about the numerical stability of the infinite sums. First of all, we notice that when ∆ = 1 (i.e., α = 0), we can compute the tail bounds exactly, as presented in Lemma 5; the conclusions there follow easily. Next, we re-formulate the tail bounds to facilitate numerical evaluation. Our numerical results show that, when ∆ is small, G_R ≈ (6 ∼ 9)∆² and G_L ≈ (4 ∼ 6)∆², for 0 < ν < 1. Thus, we indeed have an algorithm for entropy estimation with complexity O(1/ν²).

The tail bounds (19) and (21) contain ratios of Gamma functions, which can be rewritten using Stirling's series [15, 8.327]; thus, for numerical reasons, we can rewrite (19) and (21) as convergent series. The infinite series always converge provided t_R ≤ ∆/e and t_L ≤ ∆/e. In fact, because the bounds hold for any t > 0 (not necessarily the optimal values t_R and t_L), we know ε²/G_R = O(1) and ε²/G_L = O(1) if using, e.g., t = 0.5∆/e. In other words, G_R = O(∆²) and G_L = O(∆²), as desired. We state this as a lemma.

Lemma 6 The tail bound constants in (19) and (21) satisfy G_R = O(∆²) and G_L = O(∆²). In other words, to estimate F(α) within a (1 ± ν∆) factor, it suffices to let the sample size be k = O(1/ν²), using the proposed estimator F̂(α).

Figure 2 presents the values of G_R/∆² and G_L/∆² for 0 < ν < 1 and ∆ = 10^-2, 10^-4, 10^-6, together with the closed-form expressions for ∆ = 1 as obtained in Lemma 5. The values are pleasantly small. Thus, at least numerically, we can say that when ∆ is small, G_R ≤ 9∆² and G_L ≤ 9∆². In other words, with probability at least 1 − δ, using the proposed estimator, one can achieve |F̂(α) − F(α)| ≤ (ν∆)F(α) by using k ≥ 9 log(2/δ)/ν² samples. And we know the constant 9 could be replaced by 6 if ν is small.
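Plugging numbers into the k ≥ 9 log(2/δ)/ν² rule gives concrete sample sizes. This is a direct reading of the bound above (the constant 9 is the conservative choice; 6 applies for small ν):

```python
from math import ceil, log

def sample_size(nu, delta_prob, G_const=9.0):
    """Samples needed so that |F_hat - F| <= (nu * Delta) * F with
    probability >= 1 - delta_prob, per k >= G_const * log(2/delta_prob) / nu^2."""
    return ceil(G_const * log(2.0 / delta_prob) / nu ** 2)

print(sample_size(0.1, 0.05))  # nu = 0.1, 95% confidence
print(sample_size(0.5, 0.05))  # looser additive accuracy needs far fewer samples
```

Note the sample size is free of ∆; this is precisely the O(1/ν²) claim, in contrast to the O(1/(ν²∆)) and O(1/(ν²∆²)) requirements of earlier methods.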
As ν → 0, both G_R/∆² and G_L/∆² approach 6 − 4∆, as proved in Lemma 7. Also, note that the curves for ∆ = 10^-2, ∆ = 10^-4, and ∆ = 10^-6 largely overlap. Whenever possible, analytical expressions are more desirable; in fact, when ν → 0, we can actually obtain analytical expressions for G_R and G_L.

Lemma 7 As ν → 0, G_R/∆² → 6 − 4∆ and G_L/∆² → 6 − 4∆. Proof: See Appendix F.

In previous (unpublished) work [29], we proposed the sample minimum estimator, which allowed us to prove a much improved sample complexity bound over that in [27]. Interestingly, the proposed estimator F̂(α) in this paper actually converges to the sample minimum estimator, denoted by F̂(α),min, as ∆ → 0. This fact is quite intuitive: as ∆ → 0, the smallest of the x_j's is amplified the most by x_j^{−α/∆}. This is analogous to the well-known fact that the l_p norm approaches the l_∞ norm (the maximum element of the vector) as p → ∞. In [29], we proved a closed-form sample complexity bound for F̂(α),min:

Theorem 4 [29] As ∆ = 1 − α → 0+, for any fixed ε > 0, the sample complexity of F̂(α),min admits a closed-form bound. Basically, in terms of ε = ν∆, Theorem 4 is applicable when ν is large (ν ≫ 1) and ∆ is small. A simulation study in [29] demonstrated that the bound in Theorem 4 can be very sharp.

Real-world data are often dynamic and can be modeled as data streams. Measuring summary statistics of data streams, such as the Shannon entropy, has become an important task in many applications, for example, detecting anomalous events in large-scale networks. One line of active research is to approximate the Shannon entropy using the αth frequency moments of the stream with α very close to 1 (e.g., ∆ = 1 − α < 10^-4 or even much smaller). Efficiently approximating the αth frequency moments of data streams has been very heavily studied in theoretical computer science and databases. When 0 < α ≤ 2, it is well-known that efficient O(1/ε²)-space algorithms exist, for example, symmetric stable random projections [21,26], which, however, are impractical for estimating the Shannon entropy using α extremely close to 1.
Recently, [27] provided an algorithm achieving an O(1/ε) bound in the neighborhood of α = 1, based on the idea of maximally-skewed stable random projections, also called Compressed Counting (CC). The algorithms provided in [27], however, are still impractical. In this paper, we provide a truly practical algorithm for entropy estimation. We prove that its variance is proportional to O(∆²), whereas the previous algorithms for CC developed in [27] have variances proportional only to O(∆). This new algorithm leads to an O(1/ν²) algorithm for entropy estimation with ν-additive accuracy, while previous algorithms must use O(1/(ν²∆²)) samples [21,26] or O(1/(ν²∆)) samples [27]. Note that, because ∆ is so small, it is no longer appropriate to treat it as a "constant." We also analyze the precise sample complexity bound of the proposed estimator, both numerically (for general 0 < ν < 1) and analytically (for small ν), to demonstrate that the bound is free of large constants. This further confirms that the proposed estimator is practical.

Appendix A (proof of Theorem 1). As defined in (7), the proposed estimator can be written as F̂(α) = Ĵ^{−∆}, where Ĵ = (∆/k) Σ_{j=1}^k x_j^{−α/∆}. According to Lemma 2, a bit more algebra yields the moments of Ĵ. We will basically proceed by the "delta method" popular in statistics. We need to be a bit careful here, as ∆ is small; just to make sure the resultant higher-order terms are indeed negligible, we carry out the algebra. By the Taylor expansion about J = E(Ĵ), we expand F̂(α); taking expectations on both sides and evaluating the higher-order moments, the task is to show that the bias and variance behave as claimed as ∆ → 0. Using properties of Gamma functions, for example Γ(1 + x) = xΓ(x), together with the infinite product representation of the Gamma function [15, 8.322], the result follows.

Suppose a random variable Z ∼ S(α < 1, β = 1, cos(πα/2)). We can derive its cumulative distribution function from the sampling representation: recall Z = sin(αV)/(sin V)^{1/α} × [sin((1−α)V)/W]^{(1−α)/α}, where V ∼ Uniform(0, π) and W is exponential with mean 1.
Therefore, for θ ∈ (0, π), let g(θ; ∆) be defined as in Lemma 4. It is easy to show that, as θ → 0+, g(θ; ∆) → ∆ α^{α/∆}. The proof of the monotonicity of g(θ; ∆) is omitted, because it can be inferred from the proof of the convexity. To show that g(θ; ∆) is a convex function of θ, it suffices to show it is log-convex; one can verify that α sin(θ∆) − ∆ sin(αθ) ≥ 0 and that sin(θ∆)/sin(αθ) is convex. Therefore, we have proved the convexity of g(θ; ∆).

Appendix D (proof of Theorem 2). Given k i.i.d. samples x_j = cY_j, the task is to estimate c^α by MLE. The CDF of Y_j is given by (17). Taking derivatives yields the density function of x_j; maximizing the resulting log-likelihood gives the estimator in Theorem 2.

Appendix E (proof of Theorem 3). From the previous results, we first study the right tail bound, choosing the optimal t = t_R to minimize the upper bound. We then study the left tail bound analogously, choosing the optimal t = t_L to minimize the upper bound.

Appendix F (proof of Lemma 7). We have derived ε²/G_R and ε²/G_L in Theorem 3. The task of this lemma is to show that, as ν → 0, G_L/∆² → 6 − 4∆ (and similarly for G_R). To proceed, we first assume that, as ν → 0, the optimal t is on the order of ν, which can be verified later. With this assumption, we can expand ε²/G_L; setting the first derivative to zero confirms that the optimal t is indeed on the order of ν. Therefore, G_L/∆² → 6 − 4∆ as ν → 0. A similar procedure proves G_R/∆² → 6 − 4∆.

This section demonstrates that the proposed estimator F̂(α) in (7) for Compressed Counting (CC) is a truly practical algorithm, while the previously proposed geometric mean algorithm [27] for CC is inadequate for entropy estimation. We also demonstrate that algorithms based on symmetric stable random projections [21,26] are not suitable for entropy estimation. Since the estimation accuracy is what we are interested in, we can simply use static data instead of real data streams. This is because the projected data vector X = R^T A_t is the same at the end of the stream (i.e., time t) regardless of whether it is computed at once (i.e., static) or incrementally (i.e., dynamic). Eight English words are selected from a chunk of Web crawl data.
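The static/dynamic equivalence invoked above is easy to verify in a toy implementation. This is our sketch: the rows of R are regenerated on demand from a per-index seed (a common device for never materializing R), and Gaussian entries are used purely for illustration, where CC would use skewed stable entries:

```python
import numpy as np

k, D = 16, 100

def row(i):
    """Regenerate the i-th row of the projection matrix R on demand from a
    per-index seed, instead of storing R. (Gaussian entries for illustration;
    CC would use maximally-skewed stable entries.)"""
    return np.random.default_rng(seed=i).standard_normal(k)

# Turnstile stream: (index, increment) pairs; increments may be negative.
rng = np.random.default_rng(42)
stream = [(int(rng.integers(D)), float(rng.integers(-3, 4))) for _ in range(500)]

# Incremental (streaming) projection: X <- X + I_t * R[i_t, :]
X = np.zeros(k)
A = np.zeros(D)
for i_t, I_t in stream:
    X += I_t * row(i_t)
    A[i_t] += I_t

# Static projection of the final signal A gives the same X, by linearity.
X_static = sum(A[i] * row(i) for i in range(D))
print(np.allclose(X, X_static))  # True
```

Because the projection is linear, the order and grouping of the updates is irrelevant, which is exactly why static data suffice for studying estimation accuracy.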
The words are selected fairly randomly, although we make sure they cover a whole range of data sparsity, from function words (e.g., "A"), to common words (e.g., "FRIDAY"), to rare words (e.g., "TWIST"). Thus, as summarized in Table 1, our data set consists of 8 vectors, whose entries are the numbers of word occurrences in each document. We estimate the αth frequency moments F(α), for ∆ = 1 − α = 0.2, 0.1, ..., down to 10^-16, using the proposed new estimator F̂(α) and the geometric mean estimator F̂(α),gm, as well as the geometric mean estimator for symmetric stable random projections proposed in [26].

We find F̂(α) is numerically very stable if we express it relative to the first moment F(1), which can be computed exactly. Using Matlab (the 32-bit version), we find no numerical problems with F̂(α), even for very small ∆ (e.g., ∆ = 10^-14; see Figure 3). However, we could not find a numerically stable implementation of the geometric mean estimator F̂(α),gm when ∆ < 10^-5. We tried a variety of ways to implement F̂(α),gm and the Gamma functions (including the tricks used in implementing F̂(α), and using "gammaln" instead of "gamma" in Matlab). Fortunately, we believe ∆ = 10^-5 is sufficiently small for comparing the two estimators.

For F̂(α),gm and F̂(α), we also plot their theoretical variances (dashed curves), which largely overlap the empirical MSEs whenever the algorithms are numerically stable. The proposed new estimator F̂(α) is numerically very stable even when ∆ = 10^-14; in comparison, F̂(α),gm is not stable when ∆ < 10^-5. We experiment with three sample sizes, k = 10, 100, and 1000, and present the estimation errors in terms of the normalized mean square errors (MSE, normalized by the squares of the true values).
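The numerical-stability point can be illustrated with a log-domain (log-sum-exp) evaluation of power sums of the form Σ_j x_j^{−α/∆}, which appear in the estimator: a naive evaluation overflows for tiny ∆, while the log-domain version stays finite even at ∆ = 10^-14. This is our sketch of the kind of trick involved, not the paper's Matlab code:

```python
import numpy as np

def log_power_sum(x, p):
    """log(sum_j x_j^(-p)) computed stably via log-sum-exp,
    even when p = alpha/Delta is astronomically large."""
    logs = -p * np.log(x)
    m = logs.max()
    return m + np.log(np.sum(np.exp(logs - m)))

x = np.array([0.8, 1.1, 1.7, 2.5])
delta = 1e-14
p = (1.0 - delta) / delta  # alpha / Delta, about 1e14

# Naive evaluation overflows to inf:
with np.errstate(over="ignore"):
    naive = np.sum(x ** (-p))
print(naive)  # inf

# Log-domain evaluation stays finite; the power mean (sum)^(-Delta/alpha)
# is dominated by the smallest x_j, consistent with the sample-minimum limit.
ls = log_power_sum(x, p)
print(np.isfinite(ls), np.exp(-delta / (1 - delta) * ls))  # ~ min(x)^alpha
```

The last value is essentially min(x)^α = 0.8^α, illustrating both the stability of the log-domain form and the convergence of the estimator toward the sample minimum as ∆ → 0.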
As ∆ decreases, the MSEs for symmetric stable random projections (right panel of Figure 3) are roughly flat, verifying that algorithms based on symmetric stable random projections do not capture the fact that the first moment (α = 1) should be a trivial problem. Using Compressed Counting (CC), the geometric mean estimator F̂(α),gm (left panel of Figure 3) and the proposed new estimator F̂(α) (middle panel) clearly exhibit the desired property that the MSEs decrease as ∆ decreases. Of course, as expected, the MSE of F̂(α) decreases much faster than that of F̂(α),gm; the latter is also numerically much less stable when ∆ < 10^-5.

After estimating the frequency moments, we use them to estimate the Shannon entropy via the Tsallis entropy. For the data vector "TWIST", we present results at sample sizes k = 3, 10, 100, 1000, and 10000; for all other vectors, we do not experiment with k = 10000. Figures 4 and 5 present the normalized MSEs. Using CC and the proposed estimator F̂(α) (middle panels), only k = 10 samples already produce fairly accurate estimates. In fact, for some vectors (such as "A"), even k = 3 may provide reasonable estimates. We believe the performance of the new estimator is remarkable. Another nice property is that the estimation errors (MSEs) become stable once ∆ < 10^-3 (or 10^-4).

In comparison, the performance of the geometric mean estimator for CC (left panels) is not satisfactory, because its variance decreases only at the rate of O(∆), not O(∆²). Also clearly, using symmetric stable random projections (right panels) would not provide good estimates of the Shannon entropy, unless the sample size is extremely large (≫ 10000) and one could carefully choose a good ∆ to exploit the bias-variance trade-off.
The geometric mean and harmonic mean algorithms could be empirically improved using another algorithm based on numerical optimization [28], which, however, is very difficult to analyze theoretically (variances and bounds). This section provides the distribution function of Z ∼ S(α < 1, β = 1, cos(πα/2)), which will be needed in deriving the proposed estimator (7).
