KafkaBatcher.Collector (kafka_batcher v1.0.1)
Implementation of a collector for incoming events. The collector accumulates events according to a given strategy, using accumulators supervised by AccumulatorsPoolSupervisor. The strategy is specified by the following parameters:

* `:partition_strategy` - allows the values `:random`, `:md5`, or a function (e.g. `fn _topic, _partitions_count, key, _value -> key end`)
* `:partition_fn` - a function that takes 4 arguments and returns a partition number (see the example below)
* `:collect_by_partition` - if set to `true`, the producer accumulates messages separately for each partition of the topic
* `:batch_size` - the number of messages to accumulate before producing
* `:max_wait_time` - the maximum interval between produces, in milliseconds. A batch is produced to Kafka as soon as either `:batch_size` or `:max_wait_time` is reached.
* `:batch_flusher` - a module implementing a `flush?/2` function. If the function returns `true`, the current batch is sent to Kafka immediately (see the sketch after this list).
* `:min_delay` - optional. Sets a minimal delay before sending events; this can help increase the maximum throughput.
* `:max_batch_bytesize` - optional. Sets a limit on the maximum batch size in bytes.
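As a minimal sketch of a `:batch_flusher` module, the example below implements `flush?/2` so that a particular kind of event forces an immediate flush. The module name, the event shape, and the assumption that the two arguments are the event key and value are all illustrative and not part of the documented contract:

defmodule KafkaBatcher.Test.UrgentFlusher do
  # Hypothetical flusher: the two arguments are assumed here to be the event key and value.
  # Returning true makes the collector produce the current batch immediately.
  def flush?(_key, %{"type" => "payment"}), do: true
  def flush?(_key, _value), do: false
end

Such a module would then be passed to the collector as `batch_flusher: KafkaBatcher.Test.UrgentFlusher`.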
A collector can be defined, for example, as follows:
defmodule KafkaBatcher.Test.Handler1 do
  use KafkaBatcher.Collector,
    collect_by_partition: true,
    topic_key: :topic1,
    partition_fn: &KafkaBatcher.Test.Handler1.calculate_partition/4,
    required_acks: -1,
    batch_size: 30,
    max_wait_time: 20_000,
    min_delay: 0

  # Spread events across partitions by hashing the client/device identifier,
  # so that all events for the same client land on the same partition.
  def calculate_partition(_topic, partitions_count, _key, value) do
    val = value["client_id"] || value["device_id"]
    :erlang.phash2(val, partitions_count)
  end
end
A collector can save events that cannot be sent to Kafka to external storage, such as a database. The storage module is specified in config.exs like this:
config :kafka_batcher,
  storage_impl: KafkaBatcher.Storage.YourTempStorage
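For illustration only, a storage module might take a shape like the sketch below, backed by an in-memory Agent. The function names `save_batch/1`, `get_batches/0`, and `clear/0` are hypothetical placeholders, not the callbacks kafka_batcher actually requires from a `:storage_impl` module; consult the library's storage behaviour for the real contract.

defmodule KafkaBatcher.Storage.YourTempStorage do
  # Sketch only: keeps unsent batches in memory.
  # The function names here are assumptions for illustration; the real callbacks
  # expected from a :storage_impl module are defined by the kafka_batcher library.
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> [] end, name: __MODULE__)
  end

  # Remember a batch that could not be delivered to Kafka.
  def save_batch(batch) do
    Agent.update(__MODULE__, fn batches -> [batch | batches] end)
  end

  # Return all saved batches, oldest first, e.g. for a later retry.
  def get_batches do
    Agent.get(__MODULE__, &Enum.reverse/1)
  end

  # Drop everything once the saved batches have been re-sent successfully.
  def clear do
    Agent.update(__MODULE__, fn _ -> [] end)
  end
end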