Xandra v0.13.0 Xandra.Cluster

Connection to a Cassandra cluster.

This module is a "proxy" connection with support for connecting to multiple nodes in a Cassandra cluster and executing queries on such nodes based on a given strategy.

Usage

This module manages connections to different nodes in a Cassandra cluster. Each connection to a node is a Xandra connection (so it can also be a pool of connections). When a Xandra.Cluster connection is started, one Xandra pool of connections is started for each node specified in the :nodes option, plus one for each autodiscovered node if the :autodiscovery option is true.

The API provided by this module mirrors the API provided by the Xandra module. Queries executed through this module are "routed" to nodes in the provided list of nodes based on a strategy. See the "Load balancing strategies" section below.

Note that regardless of the underlying pool, Xandra.Cluster will establish one extra connection to each node in the specified list of :nodes (used for internal purposes).

Here is an example of how one could use Xandra.Cluster to connect to multiple nodes:

Xandra.Cluster.start_link(
  nodes: ["cassandra1.example.net", "cassandra2.example.net"],
  pool_size: 10
)

The code above will establish a pool of ten connections to each of the nodes specified in :nodes, for a total of twenty connections going out of the current machine, plus two extra connections (one per node) used for internal purposes. Since :autodiscovery defaults to true, pools for any autodiscovered nodes come on top of these.
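As a back-of-the-envelope check, the count above can be written as a tiny pure-Elixir helper. The module and function names are hypothetical (not part of Xandra), and the sketch assumes no autodiscovered nodes, so only the nodes in :nodes get pools.

```elixir
defmodule ConnectionCountSketch do
  # Hypothetical helper, not part of Xandra: counts the outgoing
  # connections for a fixed list of nodes, assuming no autodiscovered
  # nodes are added.
  def total_connections(node_count, pool_size) do
    # One pool of `pool_size` connections per node in :nodes.
    pool_connections = node_count * pool_size
    # Plus one extra "control connection" per node in :nodes.
    control_connections = node_count
    pool_connections + control_connections
  end
end

# Two nodes with pool_size: 10 -> 20 pooled + 2 control connections.
IO.inspect(ConnectionCountSketch.total_connections(2, 10))
# prints 22
```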

Autodiscovery

When the :autodiscovery option is true (which is the default), Xandra.Cluster discovers nodes in the same cluster as the nodes specified in the :nodes option. The nodes in :nodes act as "seed" nodes. When nodes in the cluster are discovered, a Xandra pool of connections is started for each node that is in the same datacenter as one of the nodes in :nodes. For now, there is no limit on how many nodes in the same datacenter Xandra.Cluster discovers and connects to.

As mentioned before, a "control connection" for internal purposes is established to each node in :nodes. These control connections are not established for autodiscovered nodes. This means that if you have only one seed node in :nodes, there will be only one control connection: if that control connection goes down for some reason, you won't receive cluster change events anymore. This can lead to disconnections, but it will not technically break anything.
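The datacenter rule for discovered nodes can be illustrated with a pure-Elixir sketch. The module name and the peer maps with :address and :data_center keys are hypothetical shapes chosen for illustration; Xandra's internal representation of discovered nodes may differ.

```elixir
defmodule AutodiscoverySketch do
  # Hypothetical sketch, not Xandra's implementation: out of the
  # discovered peers, keep only those whose datacenter matches the
  # datacenter of at least one seed node, and return their addresses.
  def nodes_to_connect(seeds, discovered) do
    seed_dcs = MapSet.new(seeds, & &1.data_center)

    discovered
    |> Enum.filter(&MapSet.member?(seed_dcs, &1.data_center))
    |> Enum.map(& &1.address)
  end
end

seeds = [%{address: "10.0.0.1", data_center: "dc1"}]

discovered = [
  %{address: "10.0.0.2", data_center: "dc1"},
  %{address: "10.1.0.1", data_center: "dc2"}
]

IO.inspect(AutodiscoverySketch.nodes_to_connect(seeds, discovered))
# prints ["10.0.0.2"]
```

No limit is applied to the number of matching peers, in line with the note above that there is currently no such limit.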

Load balancing strategies

For now, there are two load balancing "strategies" implemented:

  • :random - it will choose one of the connected nodes at random and execute the query on that node.

  • :priority - it will choose a node to execute the query according to the order nodes appear in :nodes. Not supported when :autodiscovery is true.
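The difference between the two strategies can be sketched in plain Elixir. This is an illustration under stated assumptions, not Xandra's code: the module name is hypothetical, nodes are plain strings, and "connected" is modeled as a MapSet of nodes that are currently up.

```elixir
defmodule LoadBalancingSketch do
  # Hypothetical sketch of the two strategies, not Xandra's code.
  # `nodes` is the configured :nodes order; `connected` is a MapSet of
  # the nodes that are currently up.

  # :random picks any connected node uniformly at random.
  def pick(:random, nodes, connected) do
    case Enum.filter(nodes, &MapSet.member?(connected, &1)) do
      [] -> {:error, :not_connected}
      candidates -> {:ok, Enum.random(candidates)}
    end
  end

  # :priority picks the first connected node in :nodes order.
  def pick(:priority, nodes, connected) do
    case Enum.find(nodes, &MapSet.member?(connected, &1)) do
      nil -> {:error, :not_connected}
      node -> {:ok, node}
    end
  end
end

nodes = ["cassandra1", "cassandra2", "cassandra3"]
connected = MapSet.new(["cassandra2", "cassandra3"])

IO.inspect(LoadBalancingSketch.pick(:priority, nodes, connected))
# prints {:ok, "cassandra2"}
```

The :priority clause depends on the order of :nodes, which is one way to see why it does not fit autodiscovered nodes, whose order is not given by the user.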

Disconnections and reconnections

Xandra.Cluster also supports nodes disconnecting and reconnecting: if Xandra detects that one of the nodes in :nodes has gone down, it stops executing queries against it, and it starts executing queries on it again as soon as it detects that the node is back up.

If all specified nodes happen to be down when a query is executed, a Xandra.ConnectionError with reason {:cluster, :not_connected} will be returned.
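The up/down bookkeeping described above can be sketched as a plain map from node to status. The module and function names are hypothetical, and the plain {:error, {:cluster, :not_connected}} tuple stands in for the real Xandra.ConnectionError struct that carries that reason.

```elixir
defmodule NodeStatusSketch do
  # Hypothetical bookkeeping, not Xandra's internals: a map from node to
  # :up or :down, updated as the cluster reports status changes.
  def new(nodes), do: Map.new(nodes, &{&1, :up})

  def node_down(status, node), do: Map.put(status, node, :down)
  def node_up(status, node), do: Map.put(status, node, :up)

  # Only nodes that are up are eligible for queries. When none are, return
  # an error tagged {:cluster, :not_connected}, mirroring the reason above.
  def eligible(status) do
    up_nodes = for {node, :up} <- status, do: node

    case up_nodes do
      [] -> {:error, {:cluster, :not_connected}}
      _ -> {:ok, Enum.sort(up_nodes)}
    end
  end
end

status =
  NodeStatusSketch.new(["cassandra1", "cassandra2"])
  |> NodeStatusSketch.node_down("cassandra1")

IO.inspect(NodeStatusSketch.eligible(status))
# prints {:ok, ["cassandra2"]}

status = NodeStatusSketch.node_down(status, "cassandra2")
IO.inspect(NodeStatusSketch.eligible(status))
# prints {:error, {:cluster, :not_connected}}
```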

Summary

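Functions

  • child_spec(init_arg) - Returns a specification to start this module under a supervisor.

  • execute(cluster, query, params_or_options \\ []) - Same as execute/4 but with optional arguments.

  • execute(cluster, query, params, options) - Executes a query on a node in the cluster.

  • execute!(cluster, query, params_or_options \\ []) - Same as execute/3 but returns the result directly or raises in case of errors.

  • execute!(cluster, query, params, options) - Same as execute/4 but returns the result directly or raises in case of errors.

  • prepare(cluster, statement, options \\ []) - Same as Xandra.prepare/3.

  • prepare!(cluster, statement, options \\ []) - Same as prepare/3 but raises in case of errors.

  • run(cluster, options \\ [], fun) - Runs a function with a given connection.

  • start_link(options) - Starts a cluster connection.

  • stream_pages!(cluster, query, params, options \\ []) - Returns a stream of pages.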

Types

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

execute(cluster, query, params_or_options \\ [])
execute(cluster(), Xandra.statement() | Xandra.Prepared.t(), Xandra.values()) ::
  {:ok, Xandra.result()} | {:error, Xandra.error()}
execute(cluster(), Xandra.Batch.t(), keyword()) ::
  {:ok, Xandra.Void.t()} | {:error, Xandra.error()}

Same as execute/4 but with optional arguments.

execute(cluster, query, params, options)

Executes a query on a node in the cluster.

This function executes a query on a node in the cluster. The node is chosen based on the load balancing strategy given in start_link/1.

Supports the same options as Xandra.execute/4. In particular, the :retry_strategy option is cluster-aware, meaning that queries are retried on possibly different nodes in the cluster.
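What "cluster-aware" retrying means can be illustrated with a pure-Elixir sketch in which node selection happens again on every attempt. Everything here is hypothetical: the module name, the `query_fun` callback, and the node-cycling policy (used instead of a real load balancing strategy so the result is deterministic). It does not implement Xandra's actual :retry_strategy mechanism, which is documented under Xandra.execute/4.

```elixir
defmodule RetrySketch do
  # Hypothetical illustration, not Xandra's retry machinery: each attempt
  # re-selects a node, so a retry can land on a different node than the
  # failed attempt. For determinism this sketch cycles through `nodes`
  # instead of using :random or :priority.
  def execute(nodes, max_retries, query_fun)
      when is_list(nodes) and nodes != [] and max_retries >= 0 do
    nodes
    |> Stream.cycle()
    # One initial attempt plus up to `max_retries` retries.
    |> Enum.take(max_retries + 1)
    |> Enum.reduce_while({:error, :no_attempts}, fn node, _acc ->
      case query_fun.(node) do
        {:ok, result} -> {:halt, {:ok, result}}
        {:error, reason} -> {:cont, {:error, reason}}
      end
    end)
  end
end

# A query that always times out on "node1" but succeeds elsewhere.
flaky = fn
  "node1" -> {:error, :timeout}
  node -> {:ok, "result from " <> node}
end

IO.inspect(RetrySketch.execute(["node1", "node2"], 1, flaky))
# prints {:ok, "result from node2"}
```

With a node-local retry (as when retrying inside run/3), every attempt in this example would hit "node1" and fail; re-selecting the node per attempt is what lets the retry succeed.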

execute!(cluster, query, params_or_options \\ [])

Same as execute/3 but returns the result directly or raises in case of errors.

execute!(cluster, query, params, options)

Same as execute/4 but returns the result directly or raises in case of errors.

prepare(cluster, statement, options \\ [])
prepare(cluster(), Xandra.statement(), keyword()) ::
  {:ok, Xandra.Prepared.t()} | {:error, Xandra.error()}

Same as Xandra.prepare/3.

Preparing a query through Xandra.Cluster will prepare it only on one node, according to the load balancing strategy chosen in start_link/1. To prepare and execute a query on the same node, you could use run/3:

Xandra.Cluster.run(cluster, fn conn ->
  # "conn" is the pool of connections for a specific node.
  prepared = Xandra.prepare!(conn, "SELECT * FROM system.local")
  Xandra.execute!(conn, prepared, _params = [])
end)

Thanks to the prepared query cache, we can always re-prepare the query before executing it: after the first time on each node, the prepared query is fetched from the cache. However, if a prepared query is unknown on a node, Xandra will prepare it on that node on the fly, so we can simply do this as well:

prepared = Xandra.Cluster.prepare!(cluster, "SELECT * FROM system.local")
Xandra.Cluster.execute!(cluster, prepared, _params = [])

Note that this goes through the cluster twice, so there's a high chance that the query will be prepared on one node and then executed on another node. This is, however, useful if you want to use the :retry_strategy option in execute!/4: in the run/3 example above, if you pass :retry_strategy to Xandra.execute!/4, the query will be retried on the same pool of connections to the same node. Xandra.Cluster.execute!/4, instead, retries queries by going through the cluster again.

prepare!(cluster, statement, options \\ [])

Same as prepare/3 but raises in case of errors.

If the function is successful, the prepared query is returned directly instead of in an {:ok, prepared} tuple like in prepare/3.

run(cluster, options \\ [], fun)
run(cluster(), keyword(), (Xandra.conn() -> result)) :: result when result: var

Runs a function with a given connection.

The connection that is passed to fun is a Xandra connection, not a cluster, so you should call Xandra functions on it. Since this connection (or pool of connections) goes to a single specific node, you can do things like prepare a query and then execute it, knowing that the query is prepared on the same node where you're executing it.

Examples

query = "SELECT * FROM system_schema.keyspaces"

Xandra.Cluster.run(cluster, fn conn ->
  prepared = Xandra.prepare!(conn, query)
  Xandra.execute!(conn, prepared, _params = [])
end)

start_link(options)
start_link([Xandra.start_option() | {:load_balancing, atom()}]) ::
  GenServer.on_start()

Starts a cluster connection.

Note that a cluster connection starts an additional connection for each node specified in :nodes. Such "control connection" is used for monitoring cluster updates.

Options

This function accepts all options accepted by Xandra.start_link/1 and forwards them to each connection or pool of connections. The following options are specific to this function:

  • :load_balancing - (atom) load balancing "strategy". Either :random or :priority. See the "Load balancing strategies" section in the module documentation. If :autodiscovery is true, the only supported strategy is :random. Defaults to :random.

  • :nodes - (list of strings) a list of nodes to use as seed nodes when setting up the cluster. The behaviour of this option depends on the :autodiscovery option. See the "Autodiscovery" section in the module documentation. If the :autodiscovery option is false, the cluster only connects to the nodes in :nodes and sets up one additional control connection for each one of these nodes. Defaults to ["127.0.0.1"].

  • :autodiscovery - (boolean) whether to autodiscover nodes in the cluster. See the "Autodiscovery" section in the module documentation. Defaults to true.

  • :autodiscovered_nodes_port - (integer) the port to use when connecting to autodiscovered nodes. Cassandra does not advertise the port of nodes when discovering them, so you'll need to specify one explicitly. This might get fixed in future Cassandra versions. Defaults to 9042.
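The defaults and the one documented incompatibility (:priority with :autodiscovery set to true) can be summarized in a small validation sketch. The module and function are hypothetical and not part of Xandra; how Xandra itself reports an invalid combination is not shown here.

```elixir
defmodule ClusterOptionsSketch do
  # Hypothetical validation helper, not part of Xandra: applies the
  # documented defaults for the cluster-specific options and rejects
  # the :priority strategy when :autodiscovery is enabled.
  @defaults [
    load_balancing: :random,
    nodes: ["127.0.0.1"],
    autodiscovery: true,
    autodiscovered_nodes_port: 9042
  ]

  def validate(options) do
    # User-supplied options override the defaults.
    options = Keyword.merge(@defaults, options)

    case {options[:load_balancing], options[:autodiscovery]} do
      {:priority, true} ->
        {:error, ":priority load balancing is not supported with autodiscovery"}

      {strategy, _} when strategy in [:random, :priority] ->
        {:ok, options}

      {other, _} ->
        {:error, "unknown load balancing strategy: #{inspect(other)}"}
    end
  end
end

IO.inspect(ClusterOptionsSketch.validate(load_balancing: :priority))
IO.inspect(ClusterOptionsSketch.validate(load_balancing: :priority, autodiscovery: false))
```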

Examples

Starting a cluster connection to two specific nodes in the cluster:

{:ok, cluster} =
  Xandra.Cluster.start_link(
    nodes: ["cassandra1.example.net", "cassandra2.example.net"],
    autodiscovery: false
  )

Starting a pool of five connections to each node in the same datacenter as the given "seed" node:

{:ok, cluster} =
  Xandra.Cluster.start_link(
    autodiscovery: true,
    nodes: ["cassandra-seed.example.net"],
    pool_size: 5
  )

Passing options down to each connection:

{:ok, cluster} =
  Xandra.Cluster.start_link(
    nodes: ["cassandra.example.net"],
    after_connect: &Xandra.execute!(&1, "USE my_keyspace")
  )

stream_pages!(cluster, query, params, options \\ [])

Returns a stream of pages.

When streaming pages through a cluster, the streaming is done from a single node, that is, this function just calls out to Xandra.stream_pages!/4 after choosing a node appropriately.

All options are forwarded to Xandra.stream_pages!/4, including retrying options.
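The "single node per stream" behaviour can be pictured with a pure-Elixir sketch: the node is chosen once, up front, and every page is then produced from it. The module name, the in-memory `rows` list, and the integer offset standing in for Cassandra's paging state are all hypothetical; the real function delegates to Xandra.stream_pages!/4.

```elixir
defmodule PageStreamSketch do
  # Hypothetical illustration, not Xandra's implementation: a node is
  # chosen once, and every page of the lazy stream is then "fetched"
  # from that same node. An integer offset stands in for the paging state.
  def stream_pages(nodes, rows, page_size) do
    # Stand-in for "choosing a node appropriately".
    node = hd(nodes)
    total = length(rows)

    Stream.unfold(0, fn
      offset when offset >= total ->
        nil

      offset ->
        page = Enum.slice(rows, offset, page_size)
        {{node, page}, offset + page_size}
    end)
  end
end

["node1", "node2"]
|> PageStreamSketch.stream_pages(Enum.to_list(1..5), 2)
|> Enum.to_list()
|> IO.inspect()
# prints [{"node1", [1, 2]}, {"node1", [3, 4]}, {"node1", [5]}]
```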