kafka_ex v0.9.0

KafkaEx


KafkaEx is an Elixir client for Apache Kafka with support for Kafka versions 0.8.0 and newer. KafkaEx requires Elixir 1.1.1+ and Erlang OTP 18+.

See http://hexdocs.pm/kafka_ex/ for documentation and https://github.com/kafkaex/kafka_ex/ for the code.

KafkaEx supports the following Kafka features:

  • Broker and Topic Metadata
  • Produce Messages
  • Fetch Messages
  • Message Compression with Snappy and gzip
  • Offset Management (fetch / commit / autocommit)
  • Consumer Groups

See Kafka Protocol Documentation and A Guide to the Kafka Protocol for details of these features.

Using KafkaEx in an Elixir project

The standard approach for adding dependencies to an Elixir application applies: add KafkaEx to the deps list in your project’s mix.exs file. You may also optionally add snappy-erlang-nif (required only if you want to use snappy compression).

# mix.exs
defmodule MyApp.Mixfile do
  # ...

  defp deps do
    [
      # add to your existing deps
      {:kafka_ex, "~> 0.9.0"},
      # if using snappy compression
      {:snappy, git: "https://github.com/fdmanana/snappy-erlang-nif"}
    ]
  end
end

Then run mix deps.get to fetch dependencies.

Adding kafka_ex application

When using Elixir < 1.4, you will need to add :kafka_ex to the applications list of your mix.exs file.

# mix.exs
defmodule MyApp.Mixfile do
  # ...

  def application do
    [
      mod: {MyApp, []},
      applications: [
        # add to existing apps - :logger, etc..
        :kafka_ex,
        :snappy # if using snappy compression
      ]
    ]
  end
end

Configuration

See config/config.exs or KafkaEx.Config for a description of configuration variables, including the Kafka broker list and default consumer group.

You can also override options when creating a worker, see below.
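
For reference, a minimal config might look like the following (the values are illustrative; consult KafkaEx.Config for the full set of variables):

# config/config.exs
use Mix.Config

config :kafka_ex,
  # a list of brokers to connect to in {"host", port} format
  brokers: [{"localhost", 9092}],
  # the default consumer group for worker processes
  consumer_group: "kafka_ex"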

Usage Examples

Consumer Groups

To use a consumer group, first implement a handler module using KafkaEx.GenConsumer.

defmodule ExampleGenConsumer do
  use KafkaEx.GenConsumer

  alias KafkaEx.Protocol.Fetch.Message

  require Logger

  # note - messages are delivered in batches
  def handle_message_set(message_set, state) do
    for %Message{value: message} <- message_set do
      Logger.debug(fn -> "message: " <> inspect(message) end)
    end
    {:async_commit, state}
  end
end

Then add a KafkaEx.ConsumerGroup to your application’s supervision tree and configure it to use the implementation module.
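
For example, a minimal sketch (the group name, topic, and application module are illustrative; see KafkaEx.ConsumerGroup for the supported arguments and options):

defmodule MyApp do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec

    children = [
      # supervises the consumer group processes for our handler module
      supervisor(KafkaEx.ConsumerGroup, [
        ExampleGenConsumer,   # the KafkaEx.GenConsumer implementation above
        "example_group",      # consumer group name
        ["example_topic"]     # topics to consume
      ])
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end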

See the KafkaEx.GenConsumer and KafkaEx.ConsumerGroup documentation for details.

Create a KafkaEx Worker

KafkaEx worker processes manage the state of the connection to the Kafka broker.

iex> KafkaEx.create_worker(:pr) # where :pr is the process name of the created worker
{:ok, #PID<0.171.0>}

With custom options:

iex> uris = [{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}]
[{"localhost", 9092}, {"localhost", 9093}, {"localhost", 9094}]
iex> KafkaEx.create_worker(:pr, [uris: uris, consumer_group: "kafka_ex", consumer_group_update_interval: 100])
{:ok, #PID<0.172.0>}

Create an unnamed KafkaEx worker

You may find you want to create many workers, say in conjunction with a poolboy pool. In this scenario you usually won’t want to name these worker processes.

To create an unnamed worker with create_worker:

iex> KafkaEx.create_worker(:no_name) # indicates to the server process not to name the process
{:ok, #PID<0.171.0>}

Use KafkaEx with a pooling library

Note that KafkaEx has a supervisor to manage its workers. If you are using Poolboy or a similar library, you will want to manually create a worker so that it is not supervised by KafkaEx.Supervisor. To do this, you will need to call:

GenServer.start_link(KafkaEx.Config.server_impl,
  [
    [uris: KafkaEx.Config.brokers(),
     consumer_group: Application.get_env(:kafka_ex, :consumer_group)],
    :no_name
  ]
)
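
For example, a hypothetical poolboy setup might wrap this call in a worker module (the module name, pool name, and sizing below are illustrative):

defmodule MyApp.KafkaWorker do
  # poolboy invokes start_link/1 on the worker module
  def start_link(_opts) do
    GenServer.start_link(KafkaEx.Config.server_impl,
      [
        [uris: KafkaEx.Config.brokers(),
         consumer_group: Application.get_env(:kafka_ex, :consumer_group)],
        :no_name
      ]
    )
  end
end

# pool definition, e.g. in your supervision tree
:poolboy.child_spec(:kafka_pool,
  name: {:local, :kafka_pool},
  worker_module: MyApp.KafkaWorker,
  size: 5,
  max_overflow: 2
)

# checking out a worker pid to run a request
:poolboy.transaction(:kafka_pool, fn worker ->
  KafkaEx.metadata(worker_name: worker)
end)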

Retrieve kafka metadata

For all metadata

iex> KafkaEx.metadata
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host:
 "192.168.59.103",
   node_id: 49162, port: 49162, socket: nil}],
 topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "LRCYFQDVWUFEIUCCTFGP"},
  %KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "JSIMKCLQYTWXMSIGESYL"},
  %KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "SCFRRXXLDFPOWSPQQMSD"},
  %KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
...

For a specific topic

iex> KafkaEx.metadata(topic: "foo")
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host: "192.168.59.103",
   node_id: 49162, port: 49162, socket: nil}],
 topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: :no_error,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: :no_error,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "foo"}]}

Retrieve offset from a particular time

Kafka will get the starting offset of the log segment that is created no later than the given timestamp. Due to this, and since the offset request is served only at segment granularity, the offset fetch request returns less accurate results for larger segment sizes.

iex> KafkaEx.offset("foo", 0, {{2015, 3, 29}, {23, 56, 40}}) # Note that the time specified should match/be ahead of time on the server that kafka runs
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [256], partition: 0}], topic: "foo"}]

Retrieve the latest offset

iex> KafkaEx.latest_offset("foo", 0) # where 0 is the partition
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offsets: [16], partition: 0}], topic: "foo"}]

Retrieve the earliest offset

iex> KafkaEx.earliest_offset("foo", 0) # where 0 is the partition
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: :no_error, offset: [0], partition: 0}], topic: "foo"}]

Fetch kafka logs

NOTE You must pass auto_commit: false in the options for fetch/3 when using Kafka < 0.8.2 or when using :no_consumer_group.
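
For example:

iex> KafkaEx.fetch("foo", 0, offset: 5, auto_commit: false) # required for Kafka < 0.8.2 or :no_consumer_group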

iex> KafkaEx.fetch("foo", 0, offset: 5) # where 0 is the partition and 5 is the offset we want to start fetching from
[%KafkaEx.Protocol.Fetch.Response{partitions: [%{error_code: :no_error,
     hw_mark_offset: 115,
     message_set: [
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 5, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 6, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 7, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 8, value: "hey"},
      %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 4264455069, key: nil, offset: 9, value: "hey"}
...], partition: 0}], topic: "foo"}]

Produce kafka logs

iex> KafkaEx.produce("foo", 0, "hey") # where "foo" is the topic and "hey" is the message
:ok

Stream kafka logs

See the KafkaEx.stream/3 documentation for details on streaming.

iex> KafkaEx.produce("foo", 0, "hey")
:ok
iex> KafkaEx.produce("foo", 0, "hi")
:ok
iex> KafkaEx.stream("foo", 0, offset: 0) |> Enum.take(2)
[%{attributes: 0, crc: 4264455069, key: nil, offset: 0, value: "hey"},
 %{attributes: 0, crc: 4251893211, key: nil, offset: 1, value: "hi"}]

For Kafka < 0.8.2, stream/3 requires auto_commit: false:

iex> KafkaEx.stream("foo", 0, offset: 0, auto_commit: false) |> Enum.take(2)

Compression

Snappy and gzip compression is supported. Example usage for producing compressed messages:

message1 = %KafkaEx.Protocol.Produce.Message{value: "value 1"}
message2 = %KafkaEx.Protocol.Produce.Message{key: "key 2", value: "value 2"}
messages = [message1, message2]

#snappy
produce_request = %KafkaEx.Protocol.Produce.Request{
  topic: "test_topic",
  partition: 0,
  required_acks: 1,
  compression: :snappy,
  messages: messages}
KafkaEx.produce(produce_request)

#gzip
produce_request = %KafkaEx.Protocol.Produce.Request{
  topic: "test_topic",
  partition: 0,
  required_acks: 1,
  compression: :gzip,
  messages: messages}
KafkaEx.produce(produce_request)

Compression is handled automatically on the consuming/fetching end.
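
For example, fetching the messages produced above needs no extra options; the values come back decompressed (a sketch, assuming the produce requests above succeeded and the topic was empty beforehand):

iex> KafkaEx.fetch("test_topic", 0, offset: 0) # returns "value 1" and "value 2" as plain message values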

Testing

It is strongly recommended to test using the Dockerized test cluster described below. This is required for contributions to KafkaEx.

NOTE You may have to run the test suite twice to get tests to pass. Due to asynchronous issues, the test suite sometimes fails on the first try.

Dockerized Test Cluster

Testing KafkaEx requires a local SSL-enabled Kafka cluster with 3 nodes: one node listening on each port 9092, 9093, and 9094. The easiest way to do this is using the scripts in this repository that utilize Docker and Docker Compose (both of which are freely available). This is the method we use for our CI testing of KafkaEx.

To launch the included test cluster, run

./scripts/docker_up.sh

The docker_up.sh script will attempt to determine an IP address for your computer on an active network interface. If it has trouble with this, you can try manually specifying a network interface in the IP_IFACE environment variable:

IP_IFACE=eth0 ./scripts/docker_up.sh

The test cluster runs Kafka 0.10.1.0.

Running the KafkaEx Tests

The KafkaEx tests are split up using tags to handle testing multiple scenarios and Kafka versions.

Unit tests

These tests do not require a Kafka cluster to be running (see test/test_helper.exs:3 for the tags excluded when running this).

mix test --no-start

Integration tests

If you are not using the Docker test cluster, you may need to modify config/config.exs for your set up.

The full test suite requires Kafka 0.10.1.0+.

Kafka >= 0.10.1.0

To run the full suite against a 0.10.1.0+ cluster:

./all_tests.sh

Kafka >= 0.9.0

The 0.9 client includes functionality that cannot be tested with older clusters.

mix test --include integration --include consumer_group --include server_0_p_9_p_0

Kafka >= 0.8.2 and < 0.9.0

Kafka 0.8.2 introduced the consumer group API.

mix test --include consumer_group --include integration

Kafka < 0.8.2

If your test cluster is older, the consumer group tests must be omitted.

mix test --include integration --include server_0_p_8_p_0

Static analysis

This requires Elixir 1.3.2+.

mix dialyzer

Contributing

All contributions are managed through the kafkaex GitHub repo.

If you find a bug or would like to contribute, please open an issue or submit a pull request. Please refer to CONTRIBUTING.md for our contribution process.

KafkaEx has a Slack channel: #kafkaex on elixir-lang.slack.com. You can request an invite via http://bit.ly/slackelixir. The Slack channel is appropriate for quick questions or general design discussions. The Slack discussion is archived at http://slack.elixirhq.com/kafkaex.

Summary

Functions

Retrieve supported api versions for each api key

Builds options to be used with workers

Returns the name of the consumer group for the given worker

Create topics. Must provide a list of CreateTopicsRequest, each containing all the information needed for the creation of a new topic

create_worker creates KafkaEx workers

Get the offset of the earliest message still persistent in Kafka

Fetch a set of messages from Kafka from the given topic and partition ID

Sends a heartbeat to maintain membership in a consumer group

Sends a request to join a consumer group

Get the offset of the latest message written to Kafka

Sends a request to leave a consumer group

Returns metadata for the given topic; returns metadata for all topics if the topic is an empty string

Get the offset of the message sent at the specified date/time

Produces a batch of messages to kafka logs

Produces messages to kafka logs (deprecated; use KafkaEx.produce/2 instead)


Called when an application is started

Stop a worker created with create_worker/2

Returns a streamable struct that may be used for consuming messages

Sends a request to synchronize with a consumer group

Returns true if the input is a valid consumer group or :no_consumer_group

Types

ssl_options()
ssl_options() :: [
  cacertfile: binary(),
  certfile: binary(),
  keyfile: binary(),
  password: binary()
]

uri()
uri() :: [{binary() | [char()], number()}]

worker_init()
worker_init() :: [worker_setting()]

worker_setting()
worker_setting() ::
  {:uris, uri()}
  | {:consumer_group, binary() | :no_consumer_group}
  | {:metadata_update_interval, non_neg_integer()}
  | {:consumer_group_update_interval, non_neg_integer()}
  | {:ssl_options, ssl_options()}

Functions

api_versions(opts \\ [])
api_versions(Keyword.t()) :: KafkaEx.Protocol.ApiVersions.Response.t()

Retrieve supported api versions for each api key.

build_worker_options(worker_init)
build_worker_options(worker_init()) ::
  {:ok, worker_init()} | {:error, :invalid_consumer_group}

Builds options to be used with workers

Merges the given options with defaults from the application env config. Returns {:error, :invalid_consumer_group} if the consumer group configuration is invalid, and {:ok, merged_options} otherwise.

Note this happens automatically when using KafkaEx.create_worker.
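
For example (a sketch; the merged options depend on your application config):

iex> {:ok, options} = KafkaEx.build_worker_options(uris: [{"localhost", 9092}])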

consumer_group(worker \\ Config.default_worker())
consumer_group(atom() | pid()) :: binary() | :no_consumer_group

Returns the name of the consumer group for the given worker.

The worker may be given as an atom or pid; if not specified, the default worker is used.
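
For example, assuming the default worker is configured with the "kafka_ex" consumer group:

iex> KafkaEx.consumer_group()
"kafka_ex"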

consumer_group_metadata(worker_name, supplied_consumer_group)
consumer_group_metadata(atom(), binary()) ::
  KafkaEx.Protocol.ConsumerMetadata.Response.t()

create_topics(requests, opts \\ [])
create_topics([KafkaEx.Protocol.CreateTopics.Request.t()], Keyword.t()) ::
  KafkaEx.Protocol.CreateTopics.Response.t()

Create topics. Must provide a list of CreateTopicsRequest, each containing all the information needed for the creation of a new topic.
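
A sketch of a request (the struct fields shown are assumptions; check KafkaEx.Protocol.CreateTopics.Request for the exact fields in your version):

request = %KafkaEx.Protocol.CreateTopics.Request{
  topic: "new_topic",
  num_partitions: 3,
  replication_factor: 1,
  replica_assignment: [],
  config_entries: []
}

KafkaEx.create_topics([request])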

create_worker(name, worker_init \\ [])

create_worker creates KafkaEx workers

Optional arguments(KeywordList)

  • consumer_group: Name of the group of consumers, :no_consumer_group should be passed for Kafka < 0.8.2, defaults to Application.get_env(:kafka_ex, :consumer_group)
  • uris: List of brokers in {"host", port} or comma separated value "host:port,host:port" form, defaults to Application.get_env(:kafka_ex, :brokers)
  • metadata_update_interval: How often, in milliseconds, kafka_ex updates the Kafka cluster metadata; default is 30000
  • consumer_group_update_interval: How often, in milliseconds, kafka_ex updates the Kafka cluster consumer group information; default is 30000
  • use_ssl: Boolean flag specifying whether SSL should be used for the worker's connection to Kafka; default is false
  • ssl_options: see SSL OPTION DESCRIPTIONS - CLIENT SIDE at http://erlang.org/doc/man/ssl.html, default is []

Returns {:error, error_description} on invalid arguments

Example

iex> KafkaEx.create_worker(:pr) # where :pr is the name of the worker created
{:ok, #PID<0.171.0>}
iex> KafkaEx.create_worker(:pr, uris: [{"localhost", 9092}])
{:ok, #PID<0.172.0>}
iex> KafkaEx.create_worker(:pr, [uris: [{"localhost", 9092}], consumer_group: "foo"])
{:ok, #PID<0.173.0>}
iex> KafkaEx.create_worker(:pr, consumer_group: nil)
{:error, :invalid_consumer_group}

earliest_offset(topic, partition, name \\ Config.default_worker())
earliest_offset(binary(), integer(), atom() | pid()) ::
  [KafkaEx.Protocol.Offset.Response.t()] | :topic_not_found

Get the offset of the earliest message still persistent in Kafka

Example

iex> KafkaEx.earliest_offset("foo", 0)
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [0], partition: 0}], topic: "foo"}]

fetch(topic, partition, opts \\ [])
fetch(binary(), number(), Keyword.t()) ::
  [KafkaEx.Protocol.Fetch.Response.t()] | :topic_not_found

Fetch a set of messages from Kafka from the given topic and partition ID

Optional arguments(KeywordList)

  • offset: When supplied, fetching starts from this offset; otherwise it starts from the last committed offset of the consumer_group the worker belongs to. For Kafka < 0.8.2 you should explicitly specify this.
  • worker_name: the worker we want to run this fetch request through. Default is :kafka_ex
  • wait_time: maximum amount of time in milliseconds to block waiting if insufficient data is available at the time the request is issued. Default is 10
  • min_bytes: minimum number of bytes of messages that must be available to give a response. If the client sets this to 0, the server will always respond immediately, but if there is no new data since the last request it will return empty message sets. If this is set to 1, the server will respond as soon as at least one partition has at least 1 byte of data or the specified timeout occurs. Setting higher values in combination with the timeout lets the consumer tune for throughput, trading a little additional latency for reading only large chunks of data (e.g. setting wait_time to 100 and min_bytes to 64000 would allow the server to wait up to 100ms to try to accumulate 64k of data before responding). Default is 1
  • max_bytes: maximum bytes to include in the message set for this partition. This helps bound the size of the response. Default is 1,000,000
  • auto_commit: specifies if the last offset should be committed or not. Default is true. You must set this to false when using Kafka < 0.8.2 or :no_consumer_group.

Example

iex> KafkaEx.fetch("foo", 0, offset: 0)
[
  %KafkaEx.Protocol.Fetch.Response{partitions: [
    %{error_code: 0, hw_mark_offset: 1, message_set: [
      %{attributes: 0, crc: 748947812, key: nil, offset: 0, value: "hey foo"}
    ], partition: 0}
  ], topic: "foo"}
]

heartbeat(request, opts \\ [])
heartbeat(KafkaEx.Protocol.Heartbeat.Request.t(), Keyword.t()) ::
  KafkaEx.Protocol.Heartbeat.Response.t()

Sends a heartbeat to maintain membership in a consumer group.

join_group(request, opts \\ [])
join_group(KafkaEx.Protocol.JoinGroup.Request.t(), Keyword.t()) ::
  KafkaEx.Protocol.JoinGroup.Response.t()

Sends a request to join a consumer group.

latest_offset(topic, partition, name \\ Config.default_worker())
latest_offset(binary(), integer(), atom() | pid()) ::
  [KafkaEx.Protocol.Offset.Response.t()] | :topic_not_found

Get the offset of the latest message written to Kafka

Example

iex> KafkaEx.latest_offset("foo", 0)
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offsets: [16], partition: 0}], topic: "foo"}]

leave_group(request, opts \\ [])
leave_group(KafkaEx.Protocol.LeaveGroup.Request.t(), Keyword.t()) ::
  KafkaEx.Protocol.LeaveGroup.Response.t()

Sends a request to leave a consumer group.

metadata(opts \\ [])
metadata(Keyword.t()) :: KafkaEx.Protocol.Metadata.Response.t()

Returns metadata for the given topic; returns metadata for all topics if the topic is an empty string

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this metadata request through, when none is provided the default worker :kafka_ex is used
  • topic: name of the topic for which metadata is requested, when none is provided all metadata is retrieved

Example

iex> KafkaEx.create_worker(:mt)
iex> KafkaEx.metadata(topic: "foo", worker_name: :mt)
%KafkaEx.Protocol.Metadata.Response{brokers: [%KafkaEx.Protocol.Metadata.Broker{host: "192.168.59.103",
   node_id: 49162, port: 49162, socket: nil}],
 topic_metadatas: [%KafkaEx.Protocol.Metadata.TopicMetadata{error_code: 0,
   partition_metadatas: [%KafkaEx.Protocol.Metadata.PartitionMetadata{error_code: 0,
     isrs: [49162], leader: 49162, partition_id: 0, replicas: [49162]}],
   topic: "foo"}]}

offset(topic, partition, time, name \\ Config.default_worker())
offset(
  binary(),
  number(),
  :calendar.datetime() | :earliest | :latest,
  atom() | pid()
) :: [KafkaEx.Protocol.Offset.Response.t()] | :topic_not_found

Get the offset of the message sent at the specified date/time

Example

iex> KafkaEx.offset("foo", 0, {{2015, 3, 29}, {23, 56, 40}}) # Note that the time specified should match/be ahead of time on the server that kafka runs
[%KafkaEx.Protocol.Offset.Response{partition_offsets: [%{error_code: 0, offset: [256], partition: 0}], topic: "foo"}]

offset_commit(worker_name, offset_commit_request)
offset_commit(atom(), KafkaEx.Protocol.OffsetCommit.Request.t()) :: [
  KafkaEx.Protocol.OffsetCommit.Response.t()
]

offset_fetch(worker_name, offset_fetch_request)
offset_fetch(atom(), KafkaEx.Protocol.OffsetFetch.Request.t()) ::
  [KafkaEx.Protocol.OffsetFetch.Response.t()] | :topic_not_found

produce(produce_request, opts \\ [])
produce(KafkaEx.Protocol.Produce.Request.t(), Keyword.t()) ::
  nil
  | :ok
  | {:ok, integer()}
  | {:error, :closed}
  | {:error, :inet.posix()}
  | {:error, any()}
  | iodata()
  | :leader_not_available

Produces a batch of messages to kafka logs

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this metadata request through, when none is provided the default worker :kafka_ex is used

Example

iex> KafkaEx.produce(%KafkaEx.Protocol.Produce.Request{topic: "foo", partition: 0, required_acks: 1, messages: [%KafkaEx.Protocol.Produce.Message{value: "hey"}]})
{:ok, 9772}
iex> KafkaEx.produce(%KafkaEx.Protocol.Produce.Request{topic: "foo", partition: 0, required_acks: 1, messages: [%KafkaEx.Protocol.Produce.Message{value: "hey"}]}, worker_name: :pr)
{:ok, 9773}

produce(topic, partition, value, opts \\ [])
produce(binary(), number(), binary(), Keyword.t()) ::
  nil
  | :ok
  | {:ok, integer()}
  | {:error, :closed}
  | {:error, :inet.posix()}
  | {:error, any()}
  | iodata()
  | :leader_not_available

Produces messages to kafka logs (deprecated; use KafkaEx.produce/2 instead)

Optional arguments(KeywordList)

  • worker_name: the worker we want to run this metadata request through, when none is provided the default worker :kafka_ex is used
  • key: is used for partition assignment, can be nil, when none is provided it is defaulted to nil
  • required_acks: indicates how many acknowledgements the servers should receive before responding to the request. If it is 0 the server will not send any response (this is the only case where the server will not reply to a request). If it is 1, the server will wait until the data is written to the local log before sending a response. If it is -1 the server will block until the message is committed by all in-sync replicas before sending a response. For any number > 1 the server will block waiting for this number of acknowledgements to occur (but the server will never wait for more acknowledgements than there are in-sync replicas); default is 0
  • timeout: provides a maximum time in milliseconds the server can await the receipt of the number of acknowledgements in RequiredAcks, default is 100 milliseconds
  • compression: specifies the compression type (:none, :snappy, :gzip)

Example

iex> KafkaEx.produce("bar", 0, "hey")
:ok
iex> KafkaEx.produce("foo", 0, "hey", [worker_name: :pr, required_acks: 1])
{:ok, 9771}

start(type, args)

Called when an application is started.

This function is called when an application is started using Application.start/2 (and functions on top of that, such as Application.ensure_started/2). This function should start the top-level process of the application (which should be the top supervisor of the application’s supervision tree if the application follows the OTP design principles around supervision).

start_type defines how the application is started:

  • :normal - used if the startup is a normal startup or if the application is distributed and is started on the current node because of a failover from another node and the application specification key :start_phases is :undefined.
  • {:takeover, node} - used if the application is distributed and is started on the current node because of a takeover from the node node.
  • {:failover, node} - used if the application is distributed and is started on the current node because of a failover from the node node, and the application specification key :start_phases is not :undefined.

start_args are the arguments passed to the application in the :mod specification key (e.g., mod: {MyApp, [:my_args]}).

This function should either return {:ok, pid} or {:ok, pid, state} if startup is successful. pid should be the PID of the top supervisor. state can be an arbitrary term, and if omitted will default to []; if the application is later stopped, state is passed to the stop/1 callback (see the documentation for the c:stop/1 callback for more information).

use Application provides no default implementation for the start/2 callback.

Callback implementation for Application.start/2.

stop_worker(worker)
stop_worker(atom() | pid()) ::
  :ok | {:error, :not_found} | {:error, :simple_one_for_one}

Stop a worker created with create_worker/2

Returns :ok on success, or an {:error, ...} tuple if the worker is not a valid worker
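
For example (a sketch; the worker name and pid are illustrative):

iex> KafkaEx.create_worker(:to_stop)
{:ok, #PID<0.174.0>}
iex> KafkaEx.stop_worker(:to_stop)
:ok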

stream(topic, partition, opts \\ [])
stream(binary(), integer(), Keyword.t()) :: KafkaEx.Stream.t()

Returns a streamable struct that may be used for consuming messages.

The returned struct is compatible with the Stream and Enum modules. Some important usage notes follow; see below for a detailed list of options.

iex> KafkaEx.produce("foo", 0, "hey")
:ok
iex> KafkaEx.produce("foo", 0, "hi")
:ok
iex> stream = KafkaEx.stream("foo", 0)
%KafkaEx.Stream{...}
iex> Enum.take(stream, 2)
[%KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 1784030606, key: "",
    offset: 0, value: "hey"},
 %KafkaEx.Protocol.Fetch.Message{attributes: 0, crc: 3776653906, key: "",
     offset: 1, value: "hi"}]
iex> stream |> Stream.map(fn(msg) -> IO.puts(msg.value) end) |> Stream.run
"hey"
"hi"
#  NOTE this will block!  See below.

Reusing streams

Reusing the same KafkaEx.Stream struct results in consuming the same messages multiple times. This is by design and mirrors the functionality of File.stream!/3. If you want to reuse the same stream struct, update its :offset before reuse.

iex> stream = KafkaEx.stream("foo", 0)
iex> [m1, m2] = Enum.take(stream, 2)
iex> [m1, m2] = Enum.take(stream, 2)   # these will be the same messages
iex> stream = %{stream | fetch_request: %{stream.fetch_request | offset: m2.offset + 1}}
iex> [m3, m4] = Enum.take(stream, 2)   # new messages

Streams block at log end

By default, the stream consumes indefinitely and will block at log end until new messages are available. Use the no_wait_at_logend: true option to have the stream halt when no more messages are available. This mirrors the command line arguments of SimpleConsumerShell.

Note that this means that fetches will return up to as many messages as are immediately available in the partition, regardless of arguments.

iex> Enum.map(1..3, fn(ix) -> KafkaEx.produce("bar", 0, "Msg #{ix}") end)
iex> stream = KafkaEx.stream("bar", 0, no_wait_at_logend: true, offset: 0)
iex> Enum.map(stream, fn(m) -> m.value end) # does not block
["Msg 1", "Msg 2", "Msg 3"]
iex> stream |> Stream.map(fn(m) -> m.value end) |> Enum.take(10)
# only 3 messages are available
["Msg 1", "Msg 2", "Msg 3"]

Consumer group and auto commit

If you pass a value for the consumer_group option and true for auto_commit, the offset of the last message consumed will be committed to the broker during each cycle.

For example, suppose we start at the beginning of a partition with millions of messages and the max_bytes setting is such that each fetch request gets 25 messages. In this setting, we will (roughly) be committing offsets 25, 50, 75, etc.

Note that offsets are committed immediately after messages are retrieved and before you know if you have successfully consumed them. It is therefore possible that you could miss messages if your consumer crashes in the middle of consuming a batch, effectively losing the guarantee of at-least-once delivery. If you need this guarantee, we recommend that you construct a GenServer-based consumer module and manage your commits manually.

iex> Enum.map(1..10, fn(ix) -> KafkaEx.produce("baz", 0, "Msg #{ix}") end)
iex> stream = KafkaEx.stream("baz", 0, consumer_group: "my_consumer", auto_commit: true)
iex> stream |> Enum.take(2) |> Enum.map(fn(msg) -> msg.value end)
["Msg 1", "Msg 2"]
iex> stream |> Enum.take(2) |> Enum.map(fn(msg) -> msg.value end)
["Msg 1", "Msg 2"]  # same values
iex> stream2 = KafkaEx.stream("baz", 0, consumer_group: "my_consumer", auto_commit: true)
iex> stream2 |> Enum.take(1) |> Enum.map(fn(msg) -> msg.value end)
["Msg 3"] # stream2 got the next available offset

Options

KafkaEx.stream/3 accepts a keyword list of options for the third argument.

  • no_wait_at_logend (boolean): Set this to true to halt the stream when there are no more messages available. Defaults to false, i.e., the stream blocks to wait for new messages.

  • worker_name (term): The KafkaEx worker to use for communication with the brokers. Defaults to :kafka_ex (the default worker).

  • consumer_group (string): Name of the consumer group used for the initial offset fetch and automatic offset commit (if auto_commit is true). Omit this value or use :no_consumer_group to not use a consumer group (default). Consumer groups are not compatible with Kafka < 0.8.2.

  • offset (integer): The offset from which to start fetching. By default, this is the last available offset of the partition when no consumer group is specified. When a consumer group is specified, the next message after the last committed offset is used. For Kafka < 0.8.2 you must explicitly specify an offset.

  • auto_commit (boolean): If true, the stream automatically commits offsets of fetched messages. See discussion above.

sync_group(request, opts \\ [])
sync_group(KafkaEx.Protocol.SyncGroup.Request.t(), Keyword.t()) ::
  KafkaEx.Protocol.SyncGroup.Response.t()

Sends a request to synchronize with a consumer group.

valid_consumer_group?(b)
valid_consumer_group?(any()) :: boolean()

Returns true if the input is a valid consumer group or :no_consumer_group.
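
For example:

iex> KafkaEx.valid_consumer_group?("a_group")
true
iex> KafkaEx.valid_consumer_group?(:no_consumer_group)
true
iex> KafkaEx.valid_consumer_group?(nil)
false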