prolly v0.2.0 Prolly.CountMinSketch

Use CountMinSketch when you want to count and query the approximate number of occurrences of values in a stream using sublinear memory.

For example, “how many times has the string "foo" appeared in the stream so far?” is a reasonable question for CountMinSketch.

A CountMinSketch never undercounts, but it may overcount: hash collisions can make the reported count for a value higher than its real number of occurrences.
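The overcount-only guarantee can be illustrated with a minimal, self-contained sketch. `MiniSketch` below is a hypothetical module written for this illustration and is not part of Prolly; it assumes one counter row per hash function, uses `:erlang.phash2/2` salted with the row number in place of user-supplied hash functions, and answers queries with the minimum counter across rows.

```elixir
defmodule MiniSketch do
  # Hypothetical, illustration-only module; not Prolly's implementation.
  defstruct rows: 0, columns: 0, cells: %{}

  def new(rows, columns), do: %__MODULE__{rows: rows, columns: columns}

  # Increment one counter per row for the given value.
  def update(%__MODULE__{} = sketch, value) do
    cells =
      Enum.reduce(0..(sketch.rows - 1), sketch.cells, fn row, acc ->
        Map.update(acc, {row, column(sketch, row, value)}, 1, &(&1 + 1))
      end)

    %__MODULE__{sketch | cells: cells}
  end

  # The estimate is the smallest of the value's counters, one per row.
  def get_count(%__MODULE__{} = sketch, value) do
    0..(sketch.rows - 1)
    |> Enum.map(fn row -> Map.get(sketch.cells, {row, column(sketch, row, value)}, 0) end)
    |> Enum.min()
  end

  # The row number salts the hash so that each row behaves like a different hash function.
  defp column(%__MODULE__{columns: columns}, row, value) do
    :erlang.phash2({row, value}, columns)
  end
end

# Crowd a tiny 3 x 5 sketch with 200 other values, then add "foo" twice.
sketch =
  1..200
  |> Enum.reduce(MiniSketch.new(3, 5), fn i, acc -> MiniSketch.update(acc, "item-#{i}") end)
  |> MiniSketch.update("foo")
  |> MiniSketch.update("foo")

# Collisions can push the estimate above 2, but it can never fall below 2.
IO.inspect(MiniSketch.get_count(sketch, "foo") >= 2)
```

Every update for "foo" increments all of its counters, so each counter is at least the true count; other values can only add to those counters, never subtract from them.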

Summary

Functions

get_count(sketch, value)
Query a sketch for the count of a given value

new(width, depth, hash_fns)
Create a CountMinSketch

union(sketch1, sketch2)
Union two sketches by cell-wise adding their counts

update(sketch, value)
Update a sketch with a value

Types

Functions

get_count(sketch, value)
get_count(t, String.Chars) :: integer

Query a sketch for the count of a given value

Examples

iex> require Prolly.CountMinSketch, as: Sketch
iex> Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi") |> Sketch.get_count("hi")
1

iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
...> |> Sketch.update("hi")
...> |> Sketch.update("hi")
iex> Sketch.get_count(sketch, "hi")
3

iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
...> |> Sketch.update([77, "list"])
iex> Sketch.get_count(sketch, [77, "list"])
5
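
The last example counts a list value. One point worth knowing when using lists with the hash functions shown above: `:crypto.hash/2` accepts iodata, so a list hashes the same as its flattened binary. The snippet below only checks that property of the hash function (the name `sha` is chosen for this illustration); whether Prolly converts values before hashing is not shown on this page, so treat the consequence for counts as an assumption.

```elixir
# Same shape as the hash functions passed to Sketch.new/3 in the examples above.
sha = fn value -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end

# 77 is the code point for "M", so the list flattens to "Mlist".
IO.inspect(sha.([77, "list"]) == sha.("Mlist"))
# prints true

IO.inspect(to_string([77, "list"]))
# prints "Mlist"
```

If the values reach the hash functions unchanged, `[77, "list"]` and `"Mlist"` would land in the same cells and share a count.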

new(width, depth, hash_fns)
new(pos_integer, pos_integer, [(String.t -> integer)]) :: t

Create a CountMinSketch

Examples

iex> require Prolly.CountMinSketch, as: Sketch
iex> Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end]).matrix
...> |> Enum.map(&Vector.to_list(&1))
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

union(sketch1, sketch2)
union(t, t) :: t

Union two sketches by cell-wise adding their counts

Examples

iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch1 = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
iex> sketch2 = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
iex> Sketch.union(sketch1, sketch2).matrix |> Enum.map(&Vector.to_list(&1))
[[0, 2, 0, 0, 0], [0, 0, 2, 0, 0], [0, 2, 0, 0, 0]]
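
The cell-wise addition can be sketched on plain nested lists. `CellWise` is a hypothetical helper written for this illustration, not part of Prolly; the two input matrices are the single-update matrix shown in the examples on this page. The result is only meaningful when both sketches have the same dimensions and were built with the same hash functions, since otherwise a given cell would not refer to the same values in both.

```elixir
defmodule CellWise do
  # Hypothetical helper, not part of Prolly: add two equally sized matrices cell by cell.
  def add(rows1, rows2) do
    rows1
    |> Enum.zip(rows2)
    |> Enum.map(fn {row1, row2} ->
      row1
      |> Enum.zip(row2)
      |> Enum.map(fn {a, b} -> a + b end)
    end)
  end
end

# Each matrix is a sketch that has seen "hi" once.
left = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]
right = [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]

IO.inspect(CellWise.add(left, right))
# prints [[0, 2, 0, 0, 0], [0, 0, 2, 0, 0], [0, 2, 0, 0, 0]]
```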

update(sketch, value)
update(t, String.Chars) :: t

Update a sketch with a value

Examples

iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update("hi")
iex> sketch.matrix |> Enum.map(&Vector.to_list(&1))
[[0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0]]

iex> require Prolly.CountMinSketch, as: Sketch
iex> sketch = Sketch.new(3, 5,
...> [fn(value) -> :crypto.hash(:sha, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:md5, value) |> :crypto.bytes_to_integer() end,
...>  fn(value) -> :crypto.hash(:sha256, value) |> :crypto.bytes_to_integer() end])
...> |> Sketch.update(["a", "list", "of", "things"])
iex> sketch.matrix |> Enum.map(&Vector.to_list(&1))
[[0, 0, 0, 0, 1], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]