prolly v0.2.0 Prolly.CountMinSketch
Use CountMinSketch when you want to count and query the approximate number of occurrences of values in a stream using sublinear memory.
For example, “how many times has the string foo appeared in the stream so far?” is a reasonable question for CountMinSketch.
A CountMinSketch never undercounts, but it may overcount: the count it reports for a value can be higher than that value's true number of occurrences.
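The overcount-only behaviour can be seen in a toy model. The following is a minimal sketch, not the library's implementation: `ToySketch`, its two deliberately weak hash functions, and the use of plain nested lists in place of the library's matrix are all assumptions made for illustration. Because "foo" and "bar" collide in every row, the estimate for "foo" absorbs the count of "bar".

```elixir
defmodule ToySketch do
  # Hypothetical toy model: 2 counters per row, one row per hash function.
  @columns 2

  # Deliberately weak hash functions so that collisions are easy to force.
  defp hash_fns, do: [&byte_size/1, &:binary.first/1]

  # One row of zeroed counters per hash function.
  def new, do: List.duplicate(List.duplicate(0, @columns), length(hash_fns()))

  # Increment exactly one counter in every row.
  def add(rows, value) do
    rows
    |> Enum.zip(hash_fns())
    |> Enum.map(fn {row, hash} ->
      List.update_at(row, rem(hash.(value), @columns), &(&1 + 1))
    end)
  end

  # The estimate is the smallest counter that the value maps to.
  def count(rows, value) do
    rows
    |> Enum.zip(hash_fns())
    |> Enum.map(fn {row, hash} -> Enum.at(row, rem(hash.(value), @columns)) end)
    |> Enum.min()
  end
end

sketch =
  ToySketch.new()
  |> ToySketch.add("foo")
  |> ToySketch.add("foo")
  |> ToySketch.add("bar")

# "foo" was added twice, but "bar" shares both of its counters: estimate is 3.
foo_count = ToySketch.count(sketch, "foo")

# "quux" was never added and maps to untouched counters: estimate is 0.
quux_count = ToySketch.count(sketch, "quux")
```

Every counter a value maps to is incremented on each update, so the minimum can never fall below the true count; collisions can only push it higher.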
Summary
Functions
get_count/2: Query a sketch for the count of a given value
new/3: Create a CountMinSketch
union/2: Union two sketches by cell-wise adding their counts
update/2: Update a sketch with a value
Types
Functions
get_count/2
Query a sketch for the approximate count of a given value
Examples
iex> require Prolly.CountMinSketch, as: Sketch
iex> Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi") |> Sketch.get_count("hi")
1
iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
...> |> Sketch.update("hi")
...> |> Sketch.update("hi")
iex> Sketch.get_count(sketch, "hi")
3
iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
iex> Sketch.get_count(sketch, [77, "list"])
5
new(width, depth, hash_fns)
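Create a CountMinSketch. In the example below, new(3, 5, ...) with three hash functions produces a matrix of 3 rows, each holding 5 zeroed counters.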
Examples
iex> require Prolly.CountMinSketch, as: Sketch
iex> Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end]).matrix
...> |> Enum.map(&Vector.to_list(&1))
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
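The examples on this page repeat the same three anonymous functions. One possible way to avoid that repetition is to build the list once from the algorithm names. This is only a convenience sketch: the `algorithms` and `hash_fns` variable names are illustrative, and only `:crypto.hash/2` and `:crypto.bytes_to_integer/1` from Erlang/OTP are used.

```elixir
# Build one hash function per digest algorithm instead of repeating the lambda.
algorithms = [:sha, :md5, :sha256]

hash_fns =
  for algorithm <- algorithms do
    fn value -> :crypto.hash(algorithm, value) |> :crypto.bytes_to_integer() end
  end

# Each function maps a binary (or iodata) to a non-negative integer.
hashes = Enum.map(hash_fns, fn hash -> hash.("hi") end)
```

The resulting list has the same shape as the third argument in the examples, so it could be passed as `Sketch.new(3, 5, hash_fns)`.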
union/2
Union two sketches by adding their counts cell by cell
Examples
iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch1 = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
iex> sketch2 = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
iex> Sketch.union(sketch1, sketch2).matrix |> Enum.map(&Vector.to_list(&1))
[[0, 2, 0, 0, 0], [0, 0, 2, 0, 0], [0, 2, 0, 0, 0]]
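Cell-wise addition can be illustrated on the matrices from the example above. This is a sketch under assumptions: plain nested lists stand in for the library's matrix, `add_cells` is a hypothetical helper rather than part of Prolly, and `Enum.zip_with/3` requires Elixir 1.12 or later.

```elixir
# Add two equally sized matrices of counters, cell by cell.
add_cells = fn matrix1, matrix2 ->
  Enum.zip_with(matrix1, matrix2, fn row1, row2 ->
    Enum.zip_with(row1, row2, &(&1 + &2))
  end)
end

# Both matrices are the result of a single update with "hi" in the example above.
matrix1 = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]
matrix2 = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]

merged = add_cells.(matrix1, matrix2)
# merged == [[0, 2, 0, 0, 0], [0, 0, 2, 0, 0], [0, 2, 0, 0, 0]]
```

The sum is only meaningful when both sketches have the same dimensions and the same hash functions, since otherwise a value would map to different cells in each sketch.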
update/2
Update a sketch with a value
Examples
iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
iex> sketch.matrix |> Enum.map(&Vector.to_list(&1))
[[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]
iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...> fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update(["a", "list", "of", "things"])
iex> sketch.matrix |> Enum.map(&Vector.to_list(&1))
[[0, 0, 0, 0, 1], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
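The list values in the examples work with these hash functions because `:crypto.hash/2` accepts iodata and hashes a list as its flattened bytes. The snippet below only demonstrates that property of `:crypto`; whether the library passes the value to the hash functions unchanged is an assumption, not something this page states.

```elixir
# :crypto.hash/2 accepts iodata, so a list is hashed as its flattened bytes.
list_digest = :crypto.hash(:sha, [77, "list"])

# 77 is the byte for "M", so the list above flattens to "Mlist".
flat_digest = :crypto.hash(:sha, "Mlist")
flattened = IO.iodata_to_binary([77, "list"])

same? = list_digest == flat_digest
```

Under that assumption, two different lists that flatten to the same bytes would share counters, so they would be counted as the same value.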