prolly v0.2.0 Prolly.HyperLogLog

Use HyperLogLog when you want to count the number of distinct elements in a stream using sublinear memory.

m = the number of registers, >= 16

a = the “alpha” correction factor, which depends on m

b = the number of least-significant hash bits used to select a register. Must be log2(m); for example, with 64 registers the 6 rightmost bits determine which register a value updates

alpha_m_squared = a * m * m, memoized
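
The relationship between m, a, b and alpha_m_squared can be sketched as below. This is a minimal sketch assuming the alpha constants from the original HyperLogLog paper (Flajolet et al., 2007); the module name HLLParamsSketch is hypothetical and not part of Prolly, and Prolly may compute these values differently. For m = 64 it yields the a = 0.709, b = 6 and alpha_m_squared = 2904.064 shown in the new/2 examples.

```elixir
defmodule HLLParamsSketch do
  # Hypothetical helper, not part of Prolly. Alpha constants are the ones
  # published by Flajolet et al. (2007); assumed, not confirmed, to match Prolly.
  def alpha(16), do: 0.673
  def alpha(32), do: 0.697
  def alpha(64), do: 0.709
  def alpha(m) when m >= 128, do: 0.7213 / (1 + 1.079 / m)

  # b = log2(m): how many low-order hash bits form the register index.
  def index_bits(m), do: round(:math.log2(m))

  # a * m * m, computed once up front and reused by every count.
  def alpha_m_squared(m), do: alpha(m) * m * m
end
```

Precomputing alpha_m_squared makes sense because it depends only on m, while counting may happen many times.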

Summary

Functions

count(loglog)
Estimate the number of distinct elements in a HyperLogLog

new(m, hash_fn)
Create a new HyperLogLog

update(loglog, value)
Add a value to a HyperLogLog

Types

Functions

count(loglog)
count(t) :: integer

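Estimate the number of distinct elements added to a HyperLogLog. The result is an approximation: in the example below, 5800 distinct values produce an estimate of 5813.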

Examples

iex> require Prolly.HyperLogLog, as: HLL
iex> hll = HLL.new(64, fn(value) -> :erlang.phash2(value) end)
iex> Enum.reduce(1..5800, hll, fn(val, acc) -> HLL.update(acc, val) end) |> HLL.count
5813
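
count/1 turns the registers into a single estimate. The sketch below shows the standard HyperLogLog estimator, assuming Prolly follows the usual formula: a raw estimate of alpha_m_squared / Σ 2^(-M[j]), with linear counting as a small-range correction while some registers are still zero. HLLCountSketch is a hypothetical module, registers are passed as a plain list rather than Prolly's Vector, and the paper's large-range correction is omitted. With m = 64 the typical relative error is about 1.04/√64 ≈ 13%, so an estimate of 5813 for 5800 distinct values is well within the expected range.

```elixir
defmodule HLLCountSketch do
  # Hypothetical sketch of the standard HyperLogLog estimator
  # (Flajolet et al., 2007); Prolly's exact corrections may differ.

  # registers: list of m register values; alpha_m_squared: a * m * m.
  def count(registers, alpha_m_squared) do
    m = length(registers)

    # Harmonic-mean denominator: the sum over j of 2^(-M[j]).
    z = Enum.reduce(registers, 0.0, fn r, acc -> acc + :math.pow(2, -r) end)
    raw = alpha_m_squared / z

    zeros = Enum.count(registers, &(&1 == 0))

    if raw <= 2.5 * m and zeros > 0 do
      # Small-range correction: linear counting on the empty registers.
      round(m * :math.log(m / zeros))
    else
      round(raw)
    end
  end
end
```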

new(m, hash_fn)
new(pos_integer, (String.t -> integer)) :: t

Create a new HyperLogLog with m registers, using hash_fn to hash each added value

Examples

iex> require Prolly.HyperLogLog, as: HLL
iex> HLL.new(64, fn(value) -> :erlang.phash2(value) end).m
64

iex> require Prolly.HyperLogLog, as: HLL
iex> HLL.new(64, fn(value) -> :erlang.phash2(value) end).a
0.709

iex> require Prolly.HyperLogLog, as: HLL
iex> HLL.new(64, fn(value) -> :erlang.phash2(value) end).b
6

iex> require Prolly.HyperLogLog, as: HLL
iex> HLL.new(64, fn(value) -> :erlang.phash2(value) end).alpha_m_squared
2904.064

iex> require Prolly.HyperLogLog, as: HLL
iex> HLL.new(64, fn(value) -> :erlang.phash2(value) end).registers |> Vector.to_list
Enum.map(1..64, fn _ -> 0 end)

update(loglog, value)
update(t, String.Chars) :: t

Add a value to a HyperLogLog and return the updated HyperLogLog

Examples

# with a String
iex> require Prolly.HyperLogLog, as: HLL
iex> hll = HLL.new(64, fn(value) -> :erlang.phash2(value) end)
iex> HLL.update(hll, "hi").registers |> Vector.to_list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# with any term
iex> require Prolly.HyperLogLog, as: HLL
iex> hll = HLL.new(64, fn(value) -> :erlang.phash2(value) end)
iex> HLL.update(hll, 4242).registers |> Vector.to_list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
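
The update step can be sketched as follows, assuming the common HyperLogLog scheme: the low b bits of the hash select a register, and that register keeps the maximum "rank" (1 plus the number of trailing zero bits) of the remaining hash bits. HLLUpdateSketch is hypothetical, stores registers in a tuple instead of Prolly's Vector, and assumes a 27-bit hash, which matches the default range of :erlang.phash2/1 (0 to 2^27 - 1). Prolly may rank bits differently, so this sketch does not necessarily reproduce register values such as the 2 produced for "hi" above.

```elixir
defmodule HLLUpdateSketch do
  import Bitwise

  # Hypothetical sketch, not Prolly's code. Assumes a 27-bit hash,
  # the default range of :erlang.phash2/1.
  @hash_bits 27

  # registers: tuple of m counters (Prolly itself uses a Vector).
  def update(registers, value, hash_fn) do
    m = tuple_size(registers)
    # b = log2(m) low-order bits pick the register.
    b = round(:math.log2(m))
    hash = hash_fn.(value)

    index = hash &&& (m - 1)
    # The remaining high-order bits determine the rank.
    rest = hash >>> b
    rank = rank(rest, @hash_bits - b)

    # A register only ever grows: keep the largest rank seen so far.
    put_elem(registers, index, max(elem(registers, index), rank))
  end

  # Assumed rank rule: 1 + number of trailing zero bits in w.
  # If w is 0, all `width` remaining bits were zero.
  defp rank(0, width), do: width + 1
  defp rank(w, _width) when (w &&& 1) == 1, do: 1
  defp rank(w, width), do: 1 + rank(w >>> 1, width - 1)
end
```

Keeping only the maximum makes an update idempotent: adding the same value again leaves the registers unchanged, which is why duplicates do not inflate the count.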