What is Approximate Counting? Can approximate counting be efficient on hardware?

Arish Sateesan
7 min read · Sep 7, 2021

Counters are an indispensable part of any statistical measurement. However, counting and storage can be difficult when dealing with large data streams, especially in big data, networking, and machine learning applications. The memory requirement would be too large, and accessing large external memories would eventually slow down the processing. Approximate counting is one of the solutions to reduce the storage/memory requirement of counters at the expense of a small error in accuracy, an error that is negligible for large counter values. It was first presented in 1977 by Robert Morris at Bell Labs, who proposed a probabilistic technique to increment the counter.

Morris Algorithm

The basic idea behind the Morris algorithm is that the counter only stores an "order of magnitude approximation" of the actual count. Instead of storing the whole value, only an approximation of its magnitude is stored.

Let’s keep it simple; no puzzling math or incomprehensible equations. The equations that led to the actual Morris algorithm are explained well in many other blog posts and you can refer to one such post here.

The algorithm can be simply stated as:

Let n be the actual count and v be the value stored in memory; nᵥ denotes the count estimated from a stored value v.

1. Initialize v=0 for n=0.

2. For each increment of n, increment v with probability p=1/2ᵛ: generate a random number r from the uniformly distributed interval (0,1) and increment v only if r<p. (If the count is to be incremented by a value x, then p=x/2ᵛ.)

3. The estimated count can be calculated as nᵥ=2ᵛ-1.

Let’s explain it with an example. Assume that the present value of v is 7. The estimated value nᵥ=2ᵛ-1=127. Now, n has to be incremented by a value of 1, and n now becomes 128.

The probability p = 1/2⁷ = 0.0078125. A random probability r is generated from the uniformly distributed interval (0,1). Consider both scenarios, p<r and p>r.

  1. p<r : The value of v remains unchanged at 7. The estimated value remains nᵥ=2ᵛ-1=127.
  2. p>r : The value of v is incremented to 8. The new estimated value is nᵥ=2ᵛ-1=255.

The value 255 seems to be double the actual value of 128, but bear in mind that the probability of having p>r is only 1/2ᵛ since r is uniformly distributed. Hence, v is expected to be incremented only once while the actual count grows from 2ᵛ to 2ᵛ⁺¹, so on average the estimate tracks the actual count.
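To make this concrete, below is a minimal Python sketch of a Morris counter (an illustrative reconstruction of the steps above, not code from the original paper; the function names are mine):

import random

def morris_increment(v):
    # Increment the stored value v with probability p = 1/2^v
    p = 1.0 / (2 ** v)
    r = random.random()  # uniform random number in (0,1)
    if r < p:
        v += 1
    return v

def morris_estimate(v):
    # Estimated count n_v = 2^v - 1
    return (2 ** v) - 1

# Count one million events; the stored value v stays tiny (around 20)
v = 0
for _ in range(1_000_000):
    v = morris_increment(v)
print(morris_estimate(v))  # typically on the order of 1,000,000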

Can approximate counting algorithms be implemented efficiently on hardware?

Following the approach of the Morris algorithm, many improved probabilistic counting techniques have been proposed, such as the adaptive non-linear sampling method, DISCO, ICE Buckets, SA Counter, additive-error counters, and many more. These algorithms significantly reduce the error, but with some memory and computational trade-offs. However, they are software-oriented and not tailored for hardware; operations such as logarithmic computation in fact make them far more difficult to implement on hardware. One exception is Small Active Counters (SAC) by Stanojevic. Even though SAC offers comparatively low accuracy compared to some of the aforementioned algorithms, it is free from complex computations such as logarithms and is more hardware-friendly.

Small Active Counters (SAC)

In SAC, the memory is allocated for n counters. Each counter consists of two parts, mode[i] and A[i], where mode[i] is the exponent part and A[i] is the estimation part of the iᵗʰ counter. l bits are allocated for mode[i] and k bits are allocated for A[i]. A global parameter r is used as a scaling factor for the whole set of counters. The structure of this counter array is shown below.

SAC counter array structure

The estimated value can be calculated from the equation:

Vᵢ = A[i] × 2^(r×mode[i])

The pseudo-code of SAC is shown below. Here, inc is the value by which the counter is to be incremented.
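The update procedure is easiest to follow in executable form, so here is a Python sketch reconstructed from the SAC description and the worked examples below (my reconstruction, not the paper's exact pseudo-code; the class and helper names are mine):

import math
import random

def prob_round(x):
    # Keep floor(x), and add 1 with probability equal to the fractional part
    f = math.floor(x)
    return f + (1 if random.random() < (x - f) else 0)

class SAC:
    def __init__(self, n, k=5, l=2, r=1):
        self.k, self.l, self.r = k, l, r  # field widths and global scale factor
        self.A = [0] * n                  # k-bit estimation parts
        self.mode = [0] * n               # l-bit exponent parts

    def estimate(self, i):
        # V_i = A[i] * 2^(r * mode[i])
        return self.A[i] * 2 ** (self.r * self.mode[i])

    def update(self, i, inc):
        # Add the increment scaled down by 2^(r * mode[i]), rounding probabilistically
        self.A[i] = prob_round(self.A[i] + inc / 2 ** (self.r * self.mode[i]))
        # If A[i] no longer fits in k bits, grow the exponent part
        while self.A[i] >= 2 ** self.k:
            self.mode[i] += 1
            self.A[i] = prob_round(self.A[i] / 2 ** self.r)
        # If the exponent no longer fits in l bits, renormalize the whole array
        if self.mode[i] >= 2 ** self.l:
            self.renormalize()

    def renormalize(self):
        # Rescale every counter to the new global scale factor r+1
        r = self.r
        for i in range(len(self.A)):
            old_mode = self.mode[i]
            self.mode[i] = math.ceil(old_mode * r / (r + 1))
            r1 = self.mode[i] * (r + 1) - old_mode * r
            self.A[i] = prob_round(self.A[i] / 2 ** r1)
        self.r = r + 1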

Let’s consider an example to understand the algorithm better. Assume that r=1, mode[i]=3, and A[i]=17. The estimated value Vᵢ = 17×2³ = 136. This value has to be incremented by inc=12, i.e., Vᵢ = 136+12 = 148.

UpdateCounter(i,inc):

A[i] = 17 + floor(12/2³) = 17+1 = 18

probability p = normalized(inc/2³) = 12/2³ - floor(12/2³) = 0.5

A random probability pᵣ is generated from the uniformly distributed interval (0,1).

If p > pᵣ : A[i] = A[i]+1; else A[i] is unchanged. Assuming that p > pᵣ, A[i] = 18+1 = 19. The estimated value becomes Vᵢ = 19×2³ = 152.

Consider another scenario where l=2 bits and k=5 bits. Let r=1, mode[i]=3, and A[i]=31, so the estimated value Vᵢ = 31×2³ = 248, and let inc=18. The new value should be 248+18 = 266. During the update, A[i] becomes 33, which is greater than 2⁵-1 = 31, the maximum a k-bit field can hold. Hence, the value of mode[i] must be updated.

UpdateCounter(i,inc):

A[i] = 31 + floor(18/2³) = 31+2 = 33

probability p = normalized(inc/2³) = 18/2³ - floor(18/2³) = 0.25

A random probability pᵣ is generated from the uniformly distributed interval (0,1).

If p > pᵣ : A[i] = A[i]+1; else A[i] is unchanged. Assuming that p < pᵣ, A[i] = 33. Since 33 does not fit in 5 bits, mode[i] must be updated.

mode[i] = mode[i]+1 = 3+1 = 4

probability p = normalized(A[i]/2ʳ) = 33/2¹ - floor(33/2¹) = 0.5

A random probability pᵣ is generated from the uniformly distributed interval (0,1).

A[i] = floor(A[i]/2ʳ) = floor(33/2) = 16

Assuming that p > pᵣ, A[i] = 16+1 = 17

Now, mode[i]=4=2ˡ. Hence, renormalization is required.

RenormalizeCounters(r):

old_mode = 4

mode[i] = ceil(old_mode×(r/(r+1))) = ceil(4×(1/2)) = 2

r₁ = mode[i]×(r+1) - old_mode×r = 4-4 = 0

probability p = normalized(A[i]/2^r₁) = 17/2⁰ - floor(17/2⁰) = 0

A random probability pᵣ is generated from the uniformly distributed interval (0,1).

A[i] = floor(A[i]/2^r₁) = floor(17/2⁰) = 17

Since p=0, p < pᵣ always holds, so A[i] remains 17.

Finally, r = r+1 = 1+1 = 2

Now we have the new values r=2, mode[i]=2, and A[i]=17. The estimated value Vᵢ = 17×2^(2×2) = 17×2⁴ = 272.
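Replaying this scenario with the Python sketch given earlier reproduces the same behaviour (the probabilistic roundings make individual runs vary):

# Replaying the second SAC scenario with the sketch above
sac = SAC(n=1, k=5, l=2, r=1)
sac.mode[0], sac.A[0] = 3, 31          # V = 31 * 2^3 = 248
sac.update(0, 18)                      # increment by inc = 18
print(sac.r, sac.mode[0], sac.A[0], sac.estimate(0))
# one possible outcome: 2 2 17 272, matching the walkthrough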

We can see that the error is not really significant: the estimated value 272 is close to the actual value 266. Also, no complex computations such as logarithms are involved. Yet, operations such as floating-point computations do not favour hardware. SAC also exhibits flaws for some parameter values. For example, consider l=2 and let the current exponent value be mode[i]=3. Also, assume that r=4. At the next increment, the exponent becomes mode[i]=4=2ˡ, so re-normalization is required, and the new value of mode[i] is changed to ceil(mode[i]×(r/(r+1))) = ceil(4×(4/5)) = 4, which in fact is again 2ˡ, and the algorithm collapses.

Another issue with the SAC array is that all counters use the same scaling factor r. In SAC, re-normalization is required for all counters each time the value of r is updated. This can stall the update process and cause loss of information when the stream processing speed is high. Moreover, when r is common to all the counters in an array, a large value in a single counter inflicts a large relative error on all the other counters holding small flows.

A hardware-oriented modification of SAC named HSAC was introduced in [1], and it proved to perform well with much lower computational complexity.

Hardware-oriented SAC (HSAC)

HSAC is a hardware-optimized version of SAC, with a slightly different structure. SAC has a common scaling parameter for all the counters in the array, but in HSAC the scaling parameter is unique to each counter. That means the total number of bits q of a counter is split into 3 parts: scale (m bits), exp (l bits), and count (k bits).

HSAC counter structure

In HSAC, all the operations are made powers of two, which replaces the complex arithmetic operations with shifting. This also reduces the complex renormalization operation to a single shift operation without any change in accuracy, and drastically reduces the hardware resource requirements. The optimizations also keep the algorithm from collapsing for smaller parameter values.

The estimated value can be calculated using the equation (the SAC formula with the global scaling factor r replaced by the per-counter power-of-two factor 2^scale):

Vᵢ = count[i] × 2^(exp[i] × 2^scale)

It is optimal to keep the maximum value of 2^scale at 4 to limit the error to a minimum. Hence, only 2 bits are required to represent scale, which can of course support a value of 2^scale up to 8 (the maximum value that can be represented using 2 bits is 3, leading to a maximum of 2^scale = 2³ = 8).
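As an illustration of the shift-only decoding this enables, here is a small Python sketch (my own illustration based on the description above; the field widths q=16, m=2, l=3, k=11 and the r = 2^scale estimate are assumptions, not necessarily the paper's exact layout):

def hsac_estimate(word, m=2, l=3, k=11):
    # Split a q-bit counter word into scale (m bits), exp (l bits),
    # and count (k bits); the widths here are illustrative assumptions
    scale = (word >> (l + k)) & ((1 << m) - 1)
    exp = (word >> k) & ((1 << l) - 1)
    count = word & ((1 << k) - 1)
    # Shift-only estimate: V = count * 2^(exp * 2^scale) = count << (exp << scale)
    return count << (exp << scale)

# Example: scale=1, exp=3, count=17 -> 17 << (3 << 1) = 17 * 2^6 = 1088
word = (1 << (3 + 11)) | (3 << 11) | 17
print(hsac_estimate(word))  # 1088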

An extensive analysis of the accuracy and the FPGA implementation results is presented in [1].

[1] Sateesan, A., Vliegen, J., Scherrer, S., Hsiao, H.-C., Perrig, A., and Mentens, N., "Speed Records in Network Flow Measurement on FPGA," in Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL), 2021. https://netsec.ethz.ch/publications/papers/sateesan2021speed.pdf
