kafka_ex v0.6.2 KafkaEx
KafkaEx
KafkaEx is an Elixir client for Apache Kafka with support for Kafka versions 0.8.0 and newer.
See http://hexdocs.pm/kafka_ex/ for documentation, https://github.com/kafkaex/kafka_ex/ for code.
KafkaEx supports the following Kafka features:
- Broker and Topic Metadata
- Produce Messages
- Fetch Messages
- Message Compression with Snappy and gzip
- Offset Management (fetch / commit / autocommit)
See Kafka Protocol Documentation and A Guide to the Kafka Protocol for details of these features.
KafkaEx supports consumer groups for message consumption, a feature added in Kafka 0.8.2. In practice, this means providing a consumer group name when committing offsets. It is up to the client to assign partitions to workers in this mode of operation.
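Because partition assignment is left to the client in this mode, one simple approach is to deal partitions out to workers round-robin. The following is a minimal sketch in plain Elixir; the module `PartitionAssignment` and its `assign/2` function are hypothetical helpers for illustration and are not part of KafkaEx.

```elixir
defmodule PartitionAssignment do
  @moduledoc """
  Hypothetical helper (not part of KafkaEx): deals partition ids out to
  workers round-robin.
  """

  # Returns a map of worker => list of partition ids in ascending order.
  # Workers that receive no partitions map to an empty list.
  def assign(partitions, workers) when is_list(workers) and workers != [] do
    empty = Map.new(workers, fn worker -> {worker, []} end)

    partitions
    |> Enum.sort()
    |> Enum.zip(Stream.cycle(workers))
    |> Enum.reduce(empty, fn {partition, worker}, acc ->
      Map.update!(acc, worker, fn assigned -> assigned ++ [partition] end)
    end)
  end
end

# Five partitions spread over two named workers.
assignment = PartitionAssignment.assign([0, 1, 2, 3, 4], [:worker_a, :worker_b])
# => %{worker_a: [0, 2, 4], worker_b: [1, 3]}
```

Each worker would then fetch only from the partitions assigned to it and commit offsets under the shared consumer group name.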
KafkaEx currently provides limited support for the Kafka ConsumerGroup API that was added in Kafka 0.9.0. Most of the protocol requests are implemented in KafkaEx, but we do not yet support automatic joining and management of consumer group membership (e.g., automatically assigning partitions to clients). We are actively working on an implementation for automatic consumer group management.
Using KafkaEx in an Elixir project
The standard approach for adding dependencies to an Elixir application applies: add KafkaEx to the deps and applications lists in your project’s mix.exs file. You may also optionally add snappy-erlang-nif (required only if you want to use snappy compression).
# mix.exs
defmodule MyApp.Mixfile do
  # ...
  def application do
    [
      mod: {MyApp, []},
      applications: [
        # add to existing apps - :logger, etc..
        :kafka_ex,
        :snappy # if using snappy compression
      ]
    ]
  end
  defp deps do
    [
      # add to your existing deps
      {:kafka_ex, "~> 0.6.2"},
      # if using snappy compression
      {:snappy, git: "https://github.com/fdmanana/snappy-erlang-nif"}
    ]
  end
end
Then run mix deps.get to fetch dependencies.
Configuration
See config/config.exs or KafkaEx.Config for a description of configuration variables, including the Kafka broker list and default consumer group.
You can also override options when creating a worker, see below.
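As a rough illustration, a config/config.exs entry might look like the fragment below. Only the :brokers and :consumer_group keys are taken from this page (they are the keys read via Application.get_env/2 in the examples here); the hosts, ports, and group name are placeholder values, and `use Mix.Config` assumes the Mix configuration style of the Elixir versions this release targets.

```elixir
# config/config.exs
use Mix.Config

config :kafka_ex,
  # list of brokers in {"host", port} form (placeholder values)
  brokers: [{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}],
  # default consumer group; :no_consumer_group is documented for Kafka < 0.8.2
  consumer_group: "kafka_ex"
```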
Usage Examples
Create a KafkaEx Worker
KafkaEx worker processes manage the state of the connection to the Kafka broker.
iex> KafkaEx.create_worker(:pr) # where :pr is the process name of the created worker
{:ok, #PID<0.171.0>}
With custom options:
iex> uris = [{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}]
[{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}]
iex> KafkaEx.create_worker(:pr, [uris: uris, consumer_group: "kafka_ex", consumer_group_update_interval: 100])
{:ok, #PID<0.172.0>}
Create an unnamed KafkaEx worker
You may find you want to create many workers, say in conjunction with a poolboy pool. In this scenario you usually won’t want to name these worker processes.
To create an unnamed worker with create_worker:
iex> KafkaEx.create_worker(:no_name) # indicates to the server process not to name the process
{:ok, #PID<0.171.0>}
Use KafkaEx with a pooling library
Note that KafkaEx has a supervisor to manage its workers. If you are using Poolboy or a similar library, you will want to manually create a worker so that it is not supervised by KafkaEx.Supervisor.
To do this, you will need to call:
GenServer.start_link(KafkaEx.Server,
  [
    [uris: Application.get_env(:kafka_ex, :brokers),
     consumer_group: Application.get_env(:kafka_ex, :consumer_group)],
    :no_name
  ]
)
Retrieve kafka metadata
For all metadata
iex> KafkaEx.metadata
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host:
"192.168.59.103",
node_id: 49162, port: 49162, socket: nil}],
topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
topic: "LRCYFQDVWUFEIUCCTFGP"},
%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
topic: "JSIMKCLQYTWXMSIGESYL"},
%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
topic: "SCFRRXXLDFPOWSPQQMSD"},
%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
...
For a specific topic
iex> KafkaEx.metadata(topic: "foo")
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host: "192.168.59.103",
node_id: 49162, port: 49162, socket: nil}],
topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
topic: "foo"}]}
Retrieve offset from a particular time
Kafka will get the starting offset of the log segment that is created no later than the given timestamp. Due to this, and since the offset request is served only at segment granularity, the offset fetch request returns less accurate results for larger segment sizes.
iex> KafkaEx.offset("foo", 0, {{2015, 3, 29}, {23, 56, 40}}) # Note that the time specified should match/be ahead of time on the server that kafka runs
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [256], partition: 0}], topic: "foo"}]
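The response above is a nested structure, so callers usually need to dig the offset out of it. The sketch below shows one way to do that with pattern matching. `OffsetHelper` is a hypothetical helper, not part of KafkaEx, and it assumes the `offset:` key and the `:no_error` atom shown in the example above. A plain map stands in for the response struct so the snippet runs without a broker.

```elixir
defmodule OffsetHelper do
  @moduledoc """
  Hypothetical helper (not part of KafkaEx): pulls the first offset for a
  partition out of an offset response list shaped like the one shown above.
  """

  # Returns {:ok, offset}, {:error, error_code}, or :not_found.
  def first_offset(responses, topic, partition) when is_list(responses) do
    responses
    |> Enum.filter(fn response -> response.topic == topic end)
    |> Enum.flat_map(fn response -> response.partition_offsets end)
    |> Enum.find(fn entry -> entry.partition == partition end)
    |> case do
      %{error_code: :no_error, offset: [offset | _]} -> {:ok, offset}
      %{error_code: :no_error, offset: []} -> :not_found
      %{error_code: error_code} -> {:error, error_code}
      nil -> :not_found
    end
  end

  # The documented offset functions may also return :topic_not_found.
  def first_offset(:topic_not_found, _topic, _partition), do: :not_found
end

# Plain maps standing in for %KafkaEx.Protocol.Offset.Response{} structs.
offset_response = [
  %{topic: "foo", partition_offsets: [%{error_code: :no_error, offset: [256], partition: 0}]}
]

OffsetHelper.first_offset(offset_response, "foo", 0)
# => {:ok, 256}
```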
Retrieve the latest offset
iex> KafkaEx.latest_offset("foo", 0) # where 0 is the partition
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [16], partition: 0}], topic: "foo"}]
Retrieve the earliest offset
iex> KafkaEx.earliest_offset("foo", 0) # where 0 is the partition
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [0], partition: 0}], topic: "foo"}]
Fetch kafka logs
NOTE You must pass auto_commit: false in the options for fetch/3 when using Kafka < 0.8.2 or when using :no_consumer_group.
iex> KafkaEx.fetch("foo", 0, offset: 5) # where 0 is the partition and 5 is the offset we want to start fetching from
[%KafkaEx.Protocol.Fetch.Response{partitions: [%{error_code: :no_error,
hw_mark_offset: 115,
message_set: [
%KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 5, value: "hey"},
%KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 6, value: "hey"},
%KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 7, value: "hey"},
%KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 8, value: "hey"},
%KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 9, value: "hey"}
...], partition: 0}], topic: "foo"}]
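When fetching in a loop with an explicit offset, the next request typically starts one past the last message received. The sketch below computes that value from a response shaped like the one above. `FetchHelper` is a hypothetical helper, not part of KafkaEx, and plain maps stand in for the response and message structs.

```elixir
defmodule FetchHelper do
  @moduledoc """
  Hypothetical helper (not part of KafkaEx): works out which offset to
  request next from a fetch response list shaped like the one shown above.
  """

  # Returns the last fetched offset + 1 for the given partition, or
  # `fallback` when the response contains no messages for it.
  def next_offset(responses, partition, fallback) do
    messages =
      for response <- responses,
          part <- response.partitions,
          part.partition == partition,
          message <- part.message_set,
          do: message

    case List.last(messages) do
      nil -> fallback
      %{offset: offset} -> offset + 1
    end
  end
end

# Plain maps standing in for the Fetch.Response and Fetch.Message structs.
fetch_response = [
  %{
    topic: "foo",
    partitions: [
      %{
        partition: 0,
        error_code: :no_error,
        hw_mark_offset: 115,
        message_set: [%{offset: 5, value: "hey"}, %{offset: 6, value: "hey"}]
      }
    ]
  }
]

FetchHelper.next_offset(fetch_response, 0, 5)
# => 7
```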
Produce kafka logs
iex> KafkaEx.produce("foo", 0, "hey") # where "foo" is the topic and "hey" is the message
:ok
Stream kafka logs
NOTE You must pass auto_commit: false in the options for stream/3 when using Kafka < 0.8.2 or when using :no_consumer_group.
iex> KafkaEx.create_worker(:stream, [uris: [{"localhost", 9092}]])
{:ok, #PID<0.196.0>}
iex> KafkaEx.produce("foo", 0, "hey", worker_name: :stream)
:ok
iex> KafkaEx.produce("foo", 0, "hi", worker_name: :stream)
:ok
iex> KafkaEx.stream("foo", 0, offset: 0) |> Enum.take(2)
[%{attributes: 0, crc: 4264455069, key: nil, offset: 0, value: "hey"},
%{attributes: 0, crc: 4251893211, key: nil, offset: 1, value: "hi"}]
As mentioned, for Kafka < 0.8.2, stream/3 requires auto_commit: false:
iex> KafkaEx.stream("foo", 0, offset: 0, auto_commit: false) |> Enum.take(2)
Compression
Snappy and gzip compression are supported. Example usage for producing compressed messages:
message1 = %KafkaEx.Protocol.Produce.Message{value: "value 1"}
message2 = %KafkaEx.Protocol.Produce.Message{key: "key 2", value: "value 2"}
messages = [message1, message2]

# snappy
produce_request = %KafkaEx.Protocol.Produce.Request{
  topic: "test_topic",
  partition: 0,
  required_acks: 1,
  compression: :snappy,
  messages: messages}
KafkaEx.produce(produce_request)

# gzip
produce_request = %KafkaEx.Protocol.Produce.Request{
  topic: "test_topic",
  partition: 0,
  required_acks: 1,
  compression: :gzip,
  messages: messages}
KafkaEx.produce(produce_request)
Compression is handled automatically on the consuming/fetching end.
Testing
It is strongly recommended to test using the Dockerized test cluster described below. This is required for contributions to KafkaEx.
NOTE You may have to run the test suite twice to get tests to pass. Due to asynchronous issues, the test suite sometimes fails on the first try.
Dockerized Test Cluster
Testing KafkaEx requires a local SSL-enabled Kafka cluster with 3 nodes: one node listening on each of ports 9092, 9093, and 9094. The easiest way to do this is using the scripts in this repository that utilize Docker and Docker Compose (both of which are freely available). This is the method we use for our CI testing of KafkaEx.
To launch the included test cluster, run
./scripts/docker_up.sh
The docker_up.sh script will attempt to determine an IP address for your computer on an active network interface. If it has trouble with this, you can try manually specifying a network interface in the IP_IFACE environment variable:
IP_IFACE=eth0 ./scripts/docker_up.sh
The test cluster runs Kafka 0.9.2.
Running the KafkaEx Tests
The KafkaEx tests are split up using tags to handle testing multiple scenarios and Kafka versions.
Unit tests
These tests do not require a Kafka cluster to be running.
mix test --no-start
Integration tests
If you are not using the Docker test cluster, you may need to modify config/config.exs for your setup.
The full test suite requires Kafka 0.9+.
Kafka >= 0.9.0
The 0.9 client includes functionality that cannot be tested with older clusters.
mix test --include integration --include consumer_group --include server_0_p_9_p_0
Kafka >= 0.8.2 and < 0.9.0
Kafka 0.8.2 introduced the consumer group API.
mix test --include consumer_group --include integration
Kafka < 0.8.2
If your test cluster is older, the consumer group tests must be omitted.
mix test --include integration
Static analysis
This requires Elixir 1.3.2+.
mix dialyzer
Contributing
All contributions are managed through the KafkaEx GitHub repo.
If you find a bug or would like to contribute, please open an issue or submit a pull request. Please refer to CONTRIBUTING.md for our contribution process.
KafkaEx has a Slack channel: #kafkaex on elixir-lang.slack.com. You can request an invite via http://bit.ly/slackelixir. The Slack channel is appropriate for quick questions or general design discussions. The Slack discussion is archived at http://slack.elixirhq.com/kafkaex.
Summary
Functions
Returns the name of the consumer group for the given worker
create_worker creates KafkaEx workers
Get the offset of the earliest message still persistent in Kafka
Fetch a set of messages from Kafka from the given topic and partition ID
Get the offset of the latest message written to Kafka
Return metadata for the given topic; returns for all topics if topic is empty string
Get the offset of the message sent at the specified date/time
Produces batch messages to kafka logs
Produces messages to kafka logs (this is deprecated, use KafkaEx.produce/2 instead)
Called when an application is started
Returns a stream that consumes fetched messages. This puts the specified worker in streaming mode and blocks the worker indefinitely. The handler is a normal GenEvent handler so you can supply a custom handler, otherwise a default handler is used
Returns true if the input is a valid consumer group or :no_consumer_group
Types
ssl_options() :: [cacertfile: binary, certfile: binary, keyfile: binary, password: binary]
worker_setting :: {:uris, uri} | {:consumer_group, binary | :no_consumer_group} | {:sync_timeout, non_neg_integer} | {:metadata_update_interval, non_neg_integer} | {:consumer_group_update_interval, non_neg_integer} | {:ssl_options, ssl_options}
Functions
consumer_group(atom | pid) :: binary | :no_consumer_group
Returns the name of the consumer group for the given worker.
Worker may be an atom or pid. The default worker is used by default.
consumer_group_metadata(atom, binary) :: KafkaEx.Protocol.ConsumerMetadata.Response.t
create_worker(atom, KafkaEx.worker_init) :: Supervisor.on_start_child
create_worker creates KafkaEx workers
Optional arguments(KeywordList)
- consumer_group: Name of the group of consumers, :no_consumer_group should be passed for Kafka < 0.8.2, defaults to Application.get_env(:kafka_ex, :consumer_group)
- uris: List of brokers in {"host", port} form, defaults to Application.get_env(:kafka_ex, :brokers)
- metadata_update_interval: How often kafka_ex would update the Kafka cluster metadata information in milliseconds, default is 30000
- consumer_group_update_interval: How often kafka_ex would update the Kafka cluster consumer_groups information in milliseconds, default is 30000
- sync_timeout: Timeout for synchronous requests to kafka in milliseconds, default is 1000
- use_ssl: Boolean flag specifying if ssl should be used for the connection by the worker to kafka, default is false
- ssl_options: see SSL OPTION DESCRIPTIONS - CLIENT SIDE at http://erlang.org/doc/man/ssl.html, default is []
Returns {:error, error_description} on invalid arguments
Example
iex> KafkaEx.create_worker(:pr) # where :pr is the name of the worker created
{:ok, #PID<0.171.0>}
iex> KafkaEx.create_worker(:pr, uris: [{"localhost", 9092}])
{:ok, #PID<0.172.0>}
iex> KafkaEx.create_worker(:pr, [uris: [{"localhost", 9092}], consumer_group: "foo"])
{:ok, #PID<0.173.0>}
iex> KafkaEx.create_worker(:pr, [uris: [{"localhost", 9092}], consumer_group: "foo", sync_timeout: 2000])
{:ok, #PID<0.173.0>}
iex> KafkaEx.create_worker(:pr, consumer_group: nil)
{:error, :invalid_consumer_group}
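The examples above do not cover the SSL options, so here is a tentative sketch of how the use_ssl and ssl_options settings might be combined. The option names come from the list above and from the ssl_options() type; the certificate paths and the worker name :ssl_worker are hypothetical placeholders. The actual create_worker call is left as a comment because it needs a running SSL-enabled broker.

```elixir
# Worker options for an SSL connection. The file paths are placeholders;
# substitute the certificate files for your own cluster.
ssl_worker_opts = [
  uris: [{"localhost", 9092}],
  use_ssl: true,
  ssl_options: [
    cacertfile: "ssl/ca-cert",
    certfile: "ssl/cert.pem",
    keyfile: "ssl/key.pem"
  ]
]

# With a broker available, the worker would then be created with:
# KafkaEx.create_worker(:ssl_worker, ssl_worker_opts)
```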
earliest_offset(binary, integer, atom | pid) :: [KafkaEx.Protocol.Offset.Response.t] | :topic_not_found
Get the offset of the earliest message still persistent in Kafka
Example
iex> KafkaEx.earliest_offset("foo", 0)
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [0], partition: 0}], topic: "foo"}]
fetch(binary, number, Keyword.t) :: [KafkaEx.Protocol.Fetch.Response.t] | :topic_not_found
Fetch a set of messages from Kafka from the given topic and partition ID
Optional arguments(KeywordList)
- offset: When supplied the fetch would start from this offset, otherwise would start from the last committed offset of the consumer_group the worker belongs to. For Kafka < 0.8.2 you should explicitly specify this.
- worker_name: the worker we want to run this fetch request through. Default is :kafka_ex
- wait_time: maximum amount of time in milliseconds to block waiting if insufficient data is available at the time the request is issued. Default is 10
- min_bytes: minimum number of bytes of messages that must be available to give a response. If the client sets this to 0 the server will always respond immediately, however if there is no new data since their last request they will just get back empty message sets. If this is set to 1, the server will respond as soon as at least one partition has at least 1 byte of data or the specified timeout occurs. By setting higher values in combination with the timeout the consumer can tune for throughput and trade a little additional latency for reading only large chunks of data (e.g. setting wait_time to 100 and setting min_bytes 64000 would allow the server to wait up to 100ms to try to accumulate 64k of data before responding). Default is 1
- max_bytes: maximum bytes to include in the message set for this partition. This helps bound the size of the response. Default is 1,000,000
- auto_commit: specifies if the last offset should be committed or not. Default is true. You must set this to false when using Kafka < 0.8.2 or :no_consumer_group.
Example
iex> KafkaEx.fetch("foo", 0, offset: 0)
[
%KafkaEx.Protocol.Fetch.Response{partitions: [
%{error_code: 0, hw_mark_offset: 1, message_set: [
%{attributes: 0, crc: 748947812, key: nil, offset: 0, value: "hey foo"}
], partition: 0}
], topic: "foo"}
]
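To make the wait_time / min_bytes trade-off described above concrete, the sketch below builds the option list for a throughput-oriented fetch using the numbers from that description. `FetchOptions` is a hypothetical helper, not part of KafkaEx; the fetch call itself is shown only as a comment since it requires a running broker.

```elixir
defmodule FetchOptions do
  # Hypothetical helper (not part of KafkaEx): builds the keyword list for a
  # throughput-oriented fetch using only the options documented above.
  def throughput(offset) when is_integer(offset) and offset >= 0 do
    [
      offset: offset,
      # let the server wait up to 100 ms ...
      wait_time: 100,
      # ... while it tries to accumulate about 64k of data
      min_bytes: 64_000,
      # the documented default cap for a partition's message set
      max_bytes: 1_000_000,
      # must be false for Kafka < 0.8.2 or :no_consumer_group
      auto_commit: false
    ]
  end
end

fetch_opts = FetchOptions.throughput(0)

# With a broker and the default worker running, the call would be:
# KafkaEx.fetch("foo", 0, fetch_opts)
```

Leaving min_bytes at its default of 1 favors latency instead: the server responds as soon as any data is available.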
latest_offset(binary, integer, atom | pid) :: [KafkaEx.Protocol.Offset.Response.t] | :topic_not_found
Get the offset of the latest message written to Kafka
Example
iex> KafkaEx.latest_offset("foo", 0)
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [16], partition: 0}], topic: "foo"}]
Return metadata for the given topic; returns for all topics if topic is empty string
Optional arguments(KeywordList)
- worker_name: the worker we want to run this metadata request through, when none is provided the default worker :kafka_ex is used
- topic: name of the topic for which metadata is requested, when none is provided all metadata is retrieved
Example
iex> KafkaEx.create_worker(:mt)
iex> KafkaEx.metadata(topic: "foo", worker_name: :mt)
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host: "192.168.59.103",
node_id: 49162, port: 49162, socket: nil}],
topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: 0,
partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: 0,
isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
topic: "foo"}]}
offset(binary, number, :calendar.datetime | :earliest | :latest, atom | pid) :: [KafkaEx.Protocol.Offset.Response.t] | :topic_not_found
Get the offset of the message sent at the specified date/time
Example
iex> KafkaEx.offset("foo", 0, {{2015, 3, 29}, {23, 56, 40}}) # Note that the time specified should match/be ahead of time on the server that kafka runs
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [256], partition: 0}], topic: "foo"}]
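The time argument is an Erlang :calendar.datetime tuple, so a relative time such as "one hour ago" has to be computed before the call. The sketch below does that with the standard :calendar module. `OffsetTime` is a hypothetical helper, not part of KafkaEx, and whether the broker expects UTC or local time is an assumption you should check against your own setup.

```elixir
defmodule OffsetTime do
  # Hypothetical helper (not part of KafkaEx): shifts an Erlang datetime
  # tuple back by the given number of seconds.
  def seconds_before({{_, _, _}, {_, _, _}} = datetime, seconds) when is_integer(seconds) do
    datetime
    |> :calendar.datetime_to_gregorian_seconds()
    |> Kernel.-(seconds)
    |> :calendar.gregorian_seconds_to_datetime()
  end
end

one_hour_earlier = OffsetTime.seconds_before({{2015, 3, 30}, {0, 56, 40}}, 3600)
# => {{2015, 3, 29}, {23, 56, 40}}

# With a broker available, the result could be passed on, for example:
# KafkaEx.offset("foo", 0, OffsetTime.seconds_before(:calendar.universal_time(), 3600))
```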
offset_commit(atom, KafkaEx.Protocol.OffsetCommit.Request.t) :: KafkaEx.Protocol.OffsetCommit.Response.t
offset_fetch(atom, KafkaEx.Protocol.OffsetFetch.Request.t) :: [KafkaEx.Protocol.OffsetFetch.Response.t] | :topic_not_found
produce(KafkaEx.Protocol.Produce.Request.t, Keyword.t) :: nil | :ok | {:ok, integer} | {:error, :closed} | {:error, :inet.posix} | {:error, any} | iodata | :leader_not_available
Produces batch messages to kafka logs
Optional arguments(KeywordList)
- worker_name: the worker we want to run this request through, when none is provided the default worker :kafka_ex is used
Example
iex> KafkaEx.produce(%KafkaEx.Protocol.Produce.Request{topic: "foo", partition: 0, required_acks: 1, messages: [%KafkaEx.Protocol.Produce.Message{value: "hey"}]})
{:ok, 9772}
iex> KafkaEx.produce(%KafkaEx.Protocol.Produce.Request{topic: "foo", partition: 0, required_acks: 1, messages: [%KafkaEx.Protocol.Produce.Message{value: "hey"}]}, worker_name: :pr)
{:ok, 9773}
produce(binary, number, binary, Keyword.t) :: nil | :ok | {:ok, integer} | {:error, :closed} | {:error, :inet.posix} | {:error, any} | iodata | :leader_not_available
Produces messages to kafka logs (this is deprecated, use KafkaEx.produce/2 instead)
Optional arguments(KeywordList)
- worker_name: the worker we want to run this request through, when none is provided the default worker :kafka_ex is used
- key: is used for partition assignment, can be nil, when none is provided it is defaulted to nil
- required_acks: indicates how many acknowledgements the servers should receive before responding to the request. If it is 0 the server will not send any response (this is the only case where the server will not reply to a request). If it is 1, the server will wait until the data is written to the local log before sending a response. If it is -1 the server will block until the message is committed by all in-sync replicas before sending a response. For any number > 1 the server will block waiting for this number of acknowledgements to occur (but the server will never wait for more acknowledgements than there are in-sync replicas), default is 0
- timeout: provides a maximum time in milliseconds the server can await the receipt of the number of acknowledgements in RequiredAcks, default is 100 milliseconds
- compression: specifies the compression type (:none, :snappy, :gzip)
Example
iex> KafkaEx.produce("bar", 0, "hey")
:ok
iex> KafkaEx.produce("foo", 0, "hey", [worker_name: :pr, required_acks: 1])
{:ok, 9771}
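The return type of produce has several shapes, which callers may want to collapse into a uniform ok/error tuple. The sketch below is one way to do so. `ProduceResult` is a hypothetical helper, not part of KafkaEx. Judging from the examples above, a bare :ok appears to correspond to required_acks: 0 (no broker response) and {:ok, offset} to an acknowledged write; that reading is an assumption rather than documented behavior.

```elixir
defmodule ProduceResult do
  # Hypothetical helper (not part of KafkaEx): collapses the documented
  # return shapes of produce into {:ok, _} / {:error, _} tuples.

  # Assumed to mean the request was sent without waiting for an ack.
  def normalize(:ok), do: {:ok, :unacknowledged}
  # Acknowledged write carrying an integer (an offset in the examples above).
  def normalize({:ok, offset}) when is_integer(offset), do: {:ok, offset}
  def normalize({:error, reason}), do: {:error, reason}
  def normalize(:leader_not_available), do: {:error, :leader_not_available}
  # nil, iodata, or anything else listed in the spec is treated as unexpected.
  def normalize(other), do: {:error, {:unexpected, other}}
end

ProduceResult.normalize({:ok, 9771})
# => {:ok, 9771}
ProduceResult.normalize(:leader_not_available)
# => {:error, :leader_not_available}
```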
Called when an application is started.
This function is called when an application is started using Application.start/2 (and functions on top of that, such as Application.ensure_started/2). This function should start the top-level process of the application (which should be the top supervisor of the application’s supervision tree if the application follows the OTP design principles around supervision).
start_type defines how the application is started:
- :normal - used if the startup is a normal startup or if the application is distributed and is started on the current node because of a failover from another node and the application specification key :start_phases is :undefined.
- {:takeover, node} - used if the application is distributed and is started on the current node because of a takeover from the node node.
- {:failover, node} - used if the application is distributed and is started on the current node because of a failover on node node, and the application specification key :start_phases is not :undefined.
start_args are the arguments passed to the application in the :mod specification key (e.g., mod: {MyApp, [:my_args]}).
This function should either return {:ok, pid} or {:ok, pid, state} if startup is successful. pid should be the PID of the top supervisor. state can be an arbitrary term, and if omitted will default to []; if the application is later stopped, state is passed to the stop/1 callback (see the documentation for the c:stop/1 callback for more information).
use Application provides no default implementation for the start/2 callback.
Callback implementation for Application.start/2.
Returns a stream that consumes fetched messages. This puts the specified worker in streaming mode and blocks the worker indefinitely. The handler is a normal GenEvent handler so you can supply a custom handler, otherwise a default handler is used.
This function should be used with care as the queue is unbounded and can cause OOM.
Optional arguments(KeywordList)
- worker_name: the worker we want to run this request through, when none is provided the default worker :kafka_ex is used
- offset: When supplied the fetch would start from this offset, otherwise would start from the last committed offset of the consumer_group the worker belongs to. For Kafka < 0.8.2 you should explicitly specify this.
- handler: the handler we want to handle the streaming events, when none is provided the default KafkaEx.Handler is used
- handler_init: initial state for the handler - leave the default value [] when using the default handler
- auto_commit: specifies if the last offset should be committed or not. Default is true. You must set this to false when using Kafka < 0.8.2 or :no_consumer_group.
Example
iex> KafkaEx.create_worker(:stream, [uris: [{"localhost", 9092}]])
{:ok, #PID<0.196.0>}
iex> KafkaEx.produce("foo", 0, "hey", worker_name: :stream)
iex> KafkaEx.produce("foo", 0, "hi", worker_name: :stream)
iex> KafkaEx.stream("foo", 0) |> Enum.take(2)
[%{attributes: 0, crc: 4264455069, key: nil, offset: 0, value: "hey"},
%{attributes: 0, crc: 4251893211, key: nil, offset: 1, value: "hi"}]