KafkaBatcher.Collector (kafka_batcher v1.0.0)

Implementation of a collector for incoming events. The collector accumulates events according to a given strategy, using accumulators supervised by AccumulatorsPoolSupervisor. The strategy is specified by the following parameters:

  • :partition_strategy, allows the values :random, :md5, or a function (e.g. fn _topic, _partitions_count, key, _value -> key end)
  • :partition_fn, a function that takes 4 arguments and returns a partition number (see the example below)
  • :collect_by_partition, if set to true, the producer accumulates messages separately for each partition of the topic
  • :batch_size, the number of messages to accumulate before producing
  • :max_wait_time, the maximum interval between produces, in milliseconds. A batch is sent to Kafka when either batch_size or max_wait_time is reached.
  • :batch_flusher, a module implementing a flush?/2 function. If the function returns true, the current batch is sent to Kafka immediately.
  • :min_delay, optional. Sets a minimal delay before events are sent, which can increase maximum throughput.
  • :max_batch_bytesize, optional. Sets a limit on the maximum batch size in bytes.
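For illustration, a :batch_flusher module might look like the sketch below. The flush?/2 contract (return true to force an immediate send) comes from the description above; the module name, the event shape, and the assumption that the two arguments are the message key and value are hypothetical.

```elixir
defmodule MyApp.SessionEndFlusher do
  # Hypothetical flusher: forces an immediate send when a "session_end"
  # event arrives. The assumption that flush?/2 receives the message key
  # and value is illustrative; consult the batch flusher behaviour in
  # kafka_batcher for the exact argument meanings.
  def flush?(_key, %{"type" => "session_end"}), do: true
  def flush?(_key, _value), do: false
end
```

Such a module would then be passed as `batch_flusher: MyApp.SessionEndFlusher` alongside the other options.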

For example, a collector can be defined as follows:

defmodule KafkaBatcher.Test.Handler1 do
  use KafkaBatcher.Collector,
    collect_by_partition: true,
    topic_key: :topic1,
    partition_fn: &KafkaBatcher.Test.Handler1.calculate_partition/4,
    required_acks: -1,
    batch_size: 30,
    max_wait_time: 20_000,
    min_delay: 0

  def calculate_partition(_topic, partitions_count, _key, value) do
    val = value["client_id"] || value["device_id"]
    :erlang.phash2(val, partitions_count)
  end
end

A collector can save events that cannot be sent to Kafka to an external storage, such as a database. The storage is specified in config.exs like this:

config :kafka_batcher,
  storage_impl: KafkaBatcher.Storage.YourTempStorage
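As a rough sketch only, such a storage module could keep undelivered batches in memory. The callback names save/1 and get_batches/0 below are assumptions made for illustration; check the actual storage behaviour expected by kafka_batcher before implementing one.

```elixir
defmodule KafkaBatcher.Storage.YourTempStorage do
  # Hypothetical sketch: keeps unsent batches in an Agent instead of a
  # database. The callback names save/1 and get_batches/0 are assumed
  # for illustration only; the real kafka_batcher storage contract may
  # differ.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> [] end, name: __MODULE__)

  # Save a batch that could not be delivered to Kafka.
  def save(batch), do: Agent.update(__MODULE__, &[batch | &1])

  # Return saved batches (newest first) for a later retry.
  def get_batches, do: Agent.get(__MODULE__, & &1)
end
```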