kafka_ex v0.6.0 KafkaEx

KafkaEx

Apache Kafka (>= 0.8.0) client for Elixir/Erlang.

Usage

Add KafkaEx to your mix.exs dependencies:

defp deps do
  [{:kafka_ex, "~> 0.5.0"}]
end

Add KafkaEx to your mix.exs applications:

def application do
  [applications: [:kafka_ex]]
end

And run:

mix deps.get

Note: if you wish to use snappy for compression or decompression, you must add snappy-erlang-nif to your project’s mix.exs. Also add snappy to your application list, e.g.:

def application do
  [applications: [:kafka_ex, :snappy]]
end

and to your deps list, e.g.:

defp deps do
  [
    {:kafka_ex, "0.5.0"},
    {:snappy, git: "https://github.com/fdmanana/snappy-erlang-nif"}
  ]
end

Configuration

See config/config.exs for a description of configuration variables, including the Kafka broker list and default consumer group. See http://elixir-lang.org/getting-started/mix-otp/distributed-tasks-and-configuration.html#application-environment-and-configuration for general info if you are unfamiliar with OTP application environments.

You can also override options when creating a worker; see below.
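
As a rough sketch of what such an application-environment entry can look like: the :brokers and :consumer_group keys below are the ones this README reads with Application.get_env/2, while the hosts and ports are placeholder values. config/config.exs in the repository remains the authoritative list of settings.

```elixir
# config/config.exs (sketch; hosts and ports are placeholder values)
use Mix.Config

config :kafka_ex,
  # list of {"host", port} tuples, the same form as the uris worker option
  brokers: [{"localhost", 9092}, {"localhost", 9093}],
  # default consumer group; pass :no_consumer_group for Kafka < 0.8.2
  consumer_group: "kafka_ex"
```

On Elixir 1.9 and later, `import Config` replaces `use Mix.Config`.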

Create KafkaEx worker

iex> KafkaEx.create_worker(:pr) # where :pr is the process name of the created worker
{:ok, #PID<0.171.0>}

With custom options:

iex> uris = [{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}]
[{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}]
iex> KafkaEx.create_worker(:pr, [uris: uris, consumer_group: "kafka_ex", consumer_group_update_interval: 100])
{:ok, #PID<0.172.0>}

Create an unnamed KafkaEx worker

You may find you want to create many workers, say in conjunction with a poolboy pool. In this scenario you usually won’t want to name these worker processes.

To create an unnamed worker with create_worker:

iex> KafkaEx.create_worker(:no_name) # indicates to the server process not to name the process
{:ok, #PID<0.171.0>}

Using KafkaEx with a pooling library

Note that KafkaEx has a supervisor to manage its workers. If you are using Poolboy or a similar library, you will want to manually create a worker so that it is not supervised by KafkaEx.Supervisor. To do this, you will need to call:

GenServer.start_link(KafkaEx.Server,
  [
    [uris: Application.get_env(:kafka_ex, :brokers),
     consumer_group: Application.get_env(:kafka_ex, :consumer_group)],
    :no_name
  ]
)
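
One way to wire this into poolboy is sketched below. MyApp.KafkaWorker, the :kafka_pool name, and the pool sizes are hypothetical. The sketch assumes poolboy's usual contract of calling start_link/1 on the worker module, and it passes the checked-out pid as the worker argument of latest_offset/3, which is specced as atom | pid.

```elixir
defmodule MyApp.KafkaWorker do
  # Hypothetical wrapper module: poolboy calls start_link/1 on the
  # configured worker_module to start each pooled process.
  def start_link(_args) do
    GenServer.start_link(KafkaEx.Server, [
      [
        uris: Application.get_env(:kafka_ex, :brokers),
        consumer_group: Application.get_env(:kafka_ex, :consumer_group)
      ],
      :no_name
    ])
  end

  # Child spec for a pool of 5 unnamed workers (sizes are assumptions).
  def pool_spec do
    :poolboy.child_spec(
      :kafka_pool,
      [name: {:local, :kafka_pool}, worker_module: __MODULE__, size: 5, max_overflow: 0],
      []
    )
  end

  # Check out a worker pid, run one request through it, and return it to the pool.
  def latest_offset(topic, partition) do
    :poolboy.transaction(:kafka_pool, fn pid ->
      KafkaEx.latest_offset(topic, partition, pid)
    end)
  end
end
```

Adding MyApp.KafkaWorker.pool_spec() to the children of your own supervisor lets the pool, rather than KafkaEx.Supervisor, own the workers.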

Retrieve kafka metadata

For all metadata

iex> KafkaEx.metadata
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host:
 "192.168.59.103",
   node_id: 49162, port: 49162, socket: nil}],
 topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "LRCYFQDVWUFEIUCCTFGP"},
  %KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "JSIMKCLQYTWXMSIGESYL"},
  %KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "SCFRRXXLDFPOWSPQQMSD"},
  %KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
...

For a specific topic

iex> KafkaEx.metadata(topic: "foo")
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host: "192.168.59.103",
   node_id: 49162, port: 49162, socket: nil}],
 topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "foo"}]}

Retrieve offset from a particular time

Kafka will get the starting offset of the log segment that is created no later than the given timestamp. Due to this, and since the offset request is served only at segment granularity, the offset fetch request returns less accurate results for larger segment sizes.

iex> KafkaEx.offset("foo", 0, {{2015, 3, 29}, {23, 56, 40}}) # the time given should match or be ahead of the clock on the server Kafka runs on
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [256], partition: 0}], topic: "foo"}]
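
The effect of segment granularity can be illustrated with a toy model. The sketch below is not KafkaEx or Kafka code: SegmentLookup and the {created_at, base_offset} tuples are hypothetical, and they only mimic the rule stated above, namely that the answer is the starting offset of the newest segment created no later than the timestamp.

```elixir
defmodule SegmentLookup do
  # Toy model: a log is a list of {created_at, base_offset} segments.

  # Returns the base offset of the newest segment created at or before
  # `timestamp`, or nil when every segment is newer than `timestamp`.
  def offset_before(segments, timestamp) do
    case Enum.filter(segments, fn {created_at, _base} -> created_at <= timestamp end) do
      [] ->
        nil

      eligible ->
        {_created_at, base_offset} =
          Enum.max_by(eligible, fn {created_at, _base} -> created_at end)

        base_offset
    end
  end
end

segments = [{100, 0}, {200, 500}, {300, 1000}]

# A message written at time 290 sits near offset 1000, yet the lookup can
# only answer with the base offset of the segment created at time 200.
SegmentLookup.offset_before(segments, 290)
# returns 500
```

The larger each segment, the further the returned offset can be from the message actually written at the requested time.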

Retrieve the latest offset

iex> KafkaEx.latest_offset("foo", 0) # where 0 is the partition
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [16], partition: 0}], topic: "foo"}]

Retrieve the earliest offset

iex> KafkaEx.earliest_offset("foo", 0) # where 0 is the partition
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [0], partition: 0}], topic: "foo"}]

Fetch kafka logs

NOTE You must pass auto_commit: false in the options for fetch/3 when using Kafka < 0.8.2 or when using :no_consumer_group.

iex> KafkaEx.fetch("foo", 0, offset: 5) # where 0 is the partition and 5 is the offset we want to start fetching from
[%KafkaEx.Protocol.Fetch.Response{partitions: [%{error_code: :no_error,
     hw_mark_offset: 115,
     message_set: [
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 5, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 6, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 7, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 8, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 9, value: "hey"}
...], partition: 0}], topic: "foo"}]

Produce kafka logs

iex> KafkaEx.produce("foo", 0, "hey") # where "foo" is the topic and "hey" is the message
:ok

Stream kafka logs

NOTE You must pass auto_commit: false in the options for stream/3 when using Kafka < 0.8.2 or when using :no_consumer_group.

iex> KafkaEx.create_worker(:stream, [uris: [{"localhost", 9092}]])
{:ok, #PID<0.196.0>}
iex> KafkaEx.produce("foo", 0, "hey", worker_name: :stream)
:ok
iex> KafkaEx.produce("foo", 0, "hi", worker_name: :stream)
:ok
iex> KafkaEx.stream("foo", 0, offset: 0) |> Enum.take(2)
[%{attributes: 0, crc: 4264455069, key: nil, offset: 0, value: "hey"},
 %{attributes: 0, crc: 4251893211, key: nil, offset: 1, value: "hi"}]

As mentioned, for Kafka < 0.8.2 stream/3 requires auto_commit: false:

iex> KafkaEx.stream("foo", 0, offset: 0, auto_commit: false) |> Enum.take(2)

Compression

Snappy and gzip compression are supported. Example usage for producing compressed messages:

message1 = %KafkaEx.Protocol.Produce.Message{value: "value 1"}
message2 = %KafkaEx.Protocol.Produce.Message{key: "key 2", value: "value 2"}
messages = [message1, message2]

#snappy
produce_request = %KafkaEx.Protocol.Produce.Request{
  topic: "test_topic",
  partition: 0,
  required_acks: 1,
  compression: :snappy,
  messages: messages}
KafkaEx.produce(produce_request)

#gzip
produce_request = %KafkaEx.Protocol.Produce.Request{
  topic: "test_topic",
  partition: 0,
  required_acks: 1,
  compression: :gzip,
  messages: messages}
KafkaEx.produce(produce_request)

Compression is handled automatically on the consuming/fetching end.

Test

Unit tests

mix test --no-start

Integration tests

Add the broker config to config/config.exs and run:

Kafka >= 0.8.2
mix test --only consumer_group --only integration
Kafka < 0.8.2
mix test --only integration

All tests

Kafka >= 0.8.2
mix test --include consumer_group --include integration
Kafka < 0.8.2
mix test --include integration

Static analysis

mix dialyze --unmatched-returns --error-handling --race-conditions --underspecs

Contributing

All contributions are managed through the kafkaex GitHub repo.

If you find a bug or would like to contribute, please open an issue or submit a pull request. Please refer to CONTRIBUTING.md for our contribution process.

KafkaEx has a Slack channel: #kafkaex on elixir-lang.slack.com. You can request an invite via http://bit.ly/slackelixir. The Slack channel is appropriate for quick questions or general design discussions. The Slack discussion is archived at http://slack.elixirhq.com/kafkaex.

Summary

Functions

consumer_group(worker \\ Config.default_worker())
Returns the name of the consumer group for the given worker

create_worker(name, worker_init \\ [])
create_worker creates KafkaEx workers

earliest_offset(topic, partition, name \\ Config.default_worker())
Get the offset of the earliest message still persistent in Kafka

fetch(topic, partition, opts \\ [])
Fetch a set of messages from Kafka from the given topic and partition ID

latest_offset(topic, partition, name \\ Config.default_worker())
Get the offset of the latest message written to Kafka

metadata(opts \\ [])
Return metadata for the given topic; returns for all topics if topic is empty string

offset(topic, partition, time, name \\ Config.default_worker())
Get the offset of the message sent at the specified date/time

produce(produce_request, opts \\ [])
Produces batch messages to kafka logs

produce(topic, partition, value, opts \\ [])
Produces messages to kafka logs (this is deprecated, use KafkaEx.produce/2 instead)

start(type, args)
Callback implementation for Application.start/2

stream(topic, partition, opts \\ [])
Returns a stream that consumes fetched messages. This puts the specified worker in streaming mode and blocks the worker indefinitely. The handler is a normal GenEvent handler so you can supply a custom handler, otherwise a default handler is used

valid_consumer_group?(b)
Returns true if the input is a valid consumer group or :no_consumer_group

Types

ssl_options :: [ssl_ca_cert_file: binary, ssl_cert_file: binary, ssl_cert_key_file: binary, password: binary]
uri :: [{binary | charlist, number}]
worker_setting ::
  {:uris, uri} |
  {:consumer_group, binary | :no_consumer_group} |
  {:sync_timeout, non_neg_integer} |
  {:metadata_update_interval, non_neg_integer} |
  {:consumer_group_update_interval, non_neg_integer} |
  {:ssl_options, ssl_options}

Functions

consumer_group(worker \\ Config.default_worker())

Specs

consumer_group(atom | pid) ::
  binary |
  :no_consumer_group

Returns the name of the consumer group for the given worker.

Worker may be an atom or pid. The default worker is used by default.

consumer_group_metadata(worker_name, supplied_consumer_group)

Specs

consumer_group_metadata(atom, binary) :: KafkaEx.Protocol.ConsumerMetadata.t

create_worker(name, worker_init \\ [])

Specs

create_worker creates KafkaEx workers

Optional arguments(KeywordList)

  • consumer_group: Name of the group of consumers, :no_consumer_group should be passed for Kafka < 0.8.2, defaults to Application.get_env(:kafka_ex, :consumer_group)
  • uris: List of brokers in {"host", port} form, defaults to Application.get_env(:kafka_ex, :brokers)
  • metadata_update_interval: How often kafka_ex would update the Kafka cluster metadata information in milliseconds, default is 30000
  • consumer_group_update_interval: How often kafka_ex would update the Kafka cluster consumer_groups information in milliseconds, default is 30000
  • sync_timeout: Timeout for synchronous requests to kafka in milliseconds, default is 1000
  • use_ssl: Boolean flag specifying if ssl should be used for the connection by the worker to kafka, default is false
  • ssl_options: see SSL OPTION DESCRIPTIONS - CLIENT SIDE at http://erlang.org/doc/man/ssl.html, default is []

Returns {:error, error_description} on invalid arguments

Example

iex> KafkaEx.create_worker(:pr) # where :pr is the name of the worker created
{:ok, #PID<0.171.0>}
iex> KafkaEx.create_worker(:pr, uris: [{"localhost", 9092}])
{:ok, #PID<0.172.0>}
iex> KafkaEx.create_worker(:pr, [uris: [{"localhost", 9092}], consumer_group: "foo"])
{:ok, #PID<0.173.0>}
iex> KafkaEx.create_worker(:pr, [uris: [{"localhost", 9092}], consumer_group: "foo", sync_timeout: 2000])
{:ok, #PID<0.173.0>}
iex> KafkaEx.create_worker(:pr, consumer_group: nil)
{:error, :invalid_consumer_group}

earliest_offset(topic, partition, name \\ Config.default_worker())

Specs

earliest_offset(binary, integer, atom | pid) ::
  [KafkaEx.Protocol.Offset.Response.t] |
  :topic_not_found

Get the offset of the earliest message still persistent in Kafka

Example

iex> KafkaEx.earliest_offset("foo", 0)
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [0], partition: 0}], topic: "foo"}]

fetch(topic, partition, opts \\ [])

Specs

fetch(binary, number, Keyword.t) ::
  [KafkaEx.Protocol.Fetch.Response.t] |
  :topic_not_found

Fetch a set of messages from Kafka from the given topic and partition ID

Optional arguments(KeywordList)

  • offset: When supplied the fetch would start from this offset, otherwise would start from the last committed offset of the consumer_group the worker belongs to. For Kafka < 0.8.2 you should explicitly specify this.
  • worker_name: the worker we want to run this fetch request through. Default is :kafka_ex
  • wait_time: maximum amount of time in milliseconds to block waiting if insufficient data is available at the time the request is issued. Default is 10
  • min_bytes: minimum number of bytes of messages that must be available to give a response. If the client sets this to 0 the server will always respond immediately, however if there is no new data since their last request they will just get back empty message sets. If this is set to 1, the server will respond as soon as at least one partition has at least 1 byte of data or the specified timeout occurs. By setting higher values in combination with the timeout the consumer can tune for throughput and trade a little additional latency for reading only large chunks of data (e.g. setting wait_time to 100 and setting min_bytes 64000 would allow the server to wait up to 100ms to try to accumulate 64k of data before responding). Default is 1
  • max_bytes: maximum bytes to include in the message set for this partition. This helps bound the size of the response. Default is 1,000,000
  • auto_commit: specifies if the last offset should be committed or not. Default is true. You must set this to false when using Kafka < 0.8.2 or :no_consumer_group.

Example

iex> KafkaEx.fetch("foo", 0, offset: 0)
[
  %KafkaEx.Protocol.Fetch.Response{partitions: [
    %{error_code: 0, hw_mark_offset: 1, message_set: [
      %{attributes: 0, crc: 748947812, key: nil, offset: 0, value: "hey foo"}
    ], partition: 0}
  ], topic: "foo"}
]
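
The wait_time and min_bytes trade-off described in the options above can be captured in a small helper. TunedFetch is a hypothetical module, not part of KafkaEx; the values mirror the 100 ms / 64k example from the min_bytes description, and auto_commit is disabled on the assumption that offsets are tracked by the caller.

```elixir
defmodule TunedFetch do
  # Hypothetical helper, not part of KafkaEx.

  # Options that let the broker wait up to 100 ms to accumulate about
  # 64k of data before responding, trading a little latency for throughput.
  def throughput_opts(offset) do
    [
      offset: offset,
      wait_time: 100,
      min_bytes: 64_000,
      max_bytes: 1_000_000,
      # do not commit the last offset; required for Kafka < 0.8.2
      # or when using :no_consumer_group
      auto_commit: false
    ]
  end

  # Runs the fetch through the default worker with the options above.
  def fetch_batch(topic, partition, offset) do
    KafkaEx.fetch(topic, partition, throughput_opts(offset))
  end
end
```

Calling TunedFetch.fetch_batch("foo", 0, 0) would then issue the same kind of request as the example above, only with the tuned options.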

latest_offset(topic, partition, name \\ Config.default_worker())

Specs

latest_offset(binary, integer, atom | pid) ::
  [KafkaEx.Protocol.Offset.Response.t] |
  :topic_not_found

Get the offset of the latest message written to Kafka

Example

iex> KafkaEx.latest_offset("foo", 0)
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [16], partition: 0}], topic: "foo"}]

metadata(opts \\ [])

Specs

metadata(Keyword.t) :: KafkaEx.Protocol.Metadata.Response.t

Return metadata for the given topic; returns for all topics if topic is empty string

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this metadata request through, when none is provided the default worker :kafka_ex is used
  • topic: name of the topic for which metadata is requested, when none is provided all metadata is retrieved

Example

iex> KafkaEx.create_worker(:mt)
iex> KafkaEx.metadata(topic: "foo", worker_name: :mt)
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host: "192.168.59.103",
   node_id: 49162, port: 49162, socket: nil}],
 topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: 0,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: 0,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "foo"}]}

offset(topic, partition, time, name \\ Config.default_worker())

Specs

offset(binary, number, :calendar.datetime | atom, atom | pid) ::
  [KafkaEx.Protocol.Offset.Response.t] |
  :topic_not_found

Get the offset of the message sent at the specified date/time

Example

iex> KafkaEx.offset("foo", 0, {{2015, 3, 29}, {23, 56, 40}}) # the time given should match or be ahead of the clock on the server Kafka runs on
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [256], partition: 0}], topic: "foo"}]

offset_commit(worker_name, offset_commit_request)

Specs

offset_commit(atom, OffsetCommitRequest.t) :: KafkaEx.Protocol.OffsetCommit.Response.t

offset_fetch(worker_name, offset_fetch_request)

Specs

offset_fetch(atom, KafkaEx.Protocol.OffsetFetch.Request.t) ::
  [KafkaEx.Protocol.OffsetFetch.Response.t] |
  :topic_not_found

produce(produce_request, opts \\ [])

Specs

produce(KafkaEx.Protocol.Produce.Request.t, Keyword.t) ::
  nil |
  :ok |
  {:ok, integer} |
  {:error, :closed} |
  {:error, :inet.posix} |
  {:error, any} |
  iodata |
  :leader_not_available

Produces batch messages to kafka logs

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this produce request through, when none is provided the default worker :kafka_ex is used

Example

iex> KafkaEx.produce(%KafkaEx.Protocol.Produce.Request{topic: "foo", partition: 0, required_acks: 1, messages: [%KafkaEx.Protocol.Produce.Message{value: "hey"}]})
{:ok, 9772}
iex> KafkaEx.produce(%KafkaEx.Protocol.Produce.Request{topic: "foo", partition: 0, required_acks: 1, messages: [%KafkaEx.Protocol.Produce.Message{value: "hey"}]}, worker_name: :pr)
{:ok, 9773}

produce(topic, partition, value, opts \\ [])

Specs

produce(binary, number, binary, Keyword.t) ::
  nil |
  :ok |
  {:ok, integer} |
  {:error, :closed} |
  {:error, :inet.posix} |
  {:error, any} |
  iodata |
  :leader_not_available

Produces messages to kafka logs (this is deprecated, use KafkaEx.produce/2 instead)

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this produce request through, when none is provided the default worker :kafka_ex is used
  • key: is used for partition assignment, can be nil, when none is provided it is defaulted to nil
  • required_acks: indicates how many acknowledgements the servers should receive before responding to the request. If it is 0 the server will not send any response (this is the only case where the server will not reply to a request). If it is 1, the server will wait until the data is written to the local log before sending a response. If it is -1 the server will block until the message is committed by all in-sync replicas before sending a response. For any number > 1 the server will block waiting for this number of acknowledgements to occur (but the server will never wait for more acknowledgements than there are in-sync replicas), default is 0
  • timeout: provides a maximum time in milliseconds the server can await the receipt of the number of acknowledgements in RequiredAcks, default is 100 milliseconds
  • compression: specifies the compression type (:none, :snappy, :gzip)

Example

iex> KafkaEx.produce("bar", 0, "hey")
:ok
iex> KafkaEx.produce("foo", 0, "hey", [worker_name: :pr, required_acks: 1])
{:ok, 9771}
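
The required_acks rules listed in the options above can be restated as a small function. AckModel is a hypothetical toy model, not KafkaEx or broker code; it only encodes the documented cases, with isr_count standing in for the number of in-sync replicas.

```elixir
defmodule AckModel do
  # Toy model of the required_acks rules; not part of KafkaEx.
  # Returns how many acknowledgements the server waits for before
  # responding, given the number of in-sync replicas (isr_count).

  # 0: the server sends no response and waits for nothing.
  def acks_awaited(0, _isr_count), do: 0

  # -1: wait until all in-sync replicas have committed the message.
  def acks_awaited(-1, isr_count), do: isr_count

  # 1 or more: wait for that many acknowledgements, capped at the
  # number of in-sync replicas.
  def acks_awaited(n, isr_count) when n >= 1, do: min(n, isr_count)
end

AckModel.acks_awaited(5, 3)
# returns 3, because only 3 replicas are in sync
```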

start(type, args)

Callback implementation for Application.start/2.

stop_streaming(opts \\ [])

Specs

stop_streaming(Keyword.t) :: :stop_streaming

stream(topic, partition, opts \\ [])

Specs

stream(binary, number, Keyword.t) :: GenEvent.Stream.t

Returns a stream that consumes fetched messages. This puts the specified worker in streaming mode and blocks the worker indefinitely. The handler is a normal GenEvent handler so you can supply a custom handler, otherwise a default handler is used.

This function should be used with care as the queue is unbounded and can cause OOM.

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this stream request through, when none is provided the default worker :kafka_ex is used
  • offset: When supplied the fetch would start from this offset, otherwise would start from the last committed offset of the consumer_group the worker belongs to. For Kafka < 0.8.2 you should explicitly specify this.
  • handler: the handler we want to handle the streaming events, when none is provided the default KafkaEx.Handler is used
  • handler_init: initial state for the handler - leave the default value [] when using the default handler
  • auto_commit: specifies if the last offset should be committed or not. Default is true. You must set this to false when using Kafka < 0.8.2 or :no_consumer_group.

Example

iex> KafkaEx.create_worker(:stream, [uris: [{"localhost", 9092}]])
{:ok, #PID<0.196.0>}
iex> KafkaEx.produce("foo", 0, "hey", worker_name: :stream)
iex> KafkaEx.produce("foo", 0, "hi", worker_name: :stream)
iex> KafkaEx.stream("foo", 0) |> Enum.take(2)
[%{attributes: 0, crc: 4264455069, key: nil, offset: 0, value: "hey"},
 %{attributes: 0, crc: 4251893211, key: nil, offset: 1, value: "hi"}]

valid_consumer_group?(b)

Specs

valid_consumer_group?(any) :: boolean

Returns true if the input is a valid consumer group or :no_consumer_group