EctoBackfiller behaviour (EctoBackfiller v0.5.0)

Orchestrator of a back-pressured backfill strategy for Ecto repos.

Starts a producer process and dynamically starts consumers. The number of consumers is determined by the resources available on your infrastructure, such as database connections or I/O capacity.

Define a module to execute the backfill, which must use EctoBackfiller and implement its callbacks.

Let's imagine a silly example to illustrate the use of the library. Suppose you have a User schema described as:

defmodule MyApp.Users.User do
  use Ecto.Schema

  # Ecto's schema/2 requires a source name; "users" is assumed here
  schema "users" do
    field :email_verified_at, :naive_datetime
  end
end

Later on, your business requirements lead you to add an email_verified boolean field to the schema, representing whether the user has verified their email. You write the migration and then have to populate the new column for all existing users.

To do so, you can write a module using EctoBackfiller as:

defmodule MyApp.Backfills.UserEmailVerifiedBackfill do
  use EctoBackfiller, repo: MyApp.Repo

  alias MyApp.Users
  alias MyApp.Users.User

  @impl true
  def query, do: Ecto.Queryable.to_query(User)

  @impl true
  def step, do: 5

  @impl true
  def handle_batch(users) do
    Enum.each(users, fn user ->
      # A user counts as verified when email_verified_at is set
      verified? = not is_nil(user.email_verified_at)
      {:ok, _user} = Users.update(user, %{email_verified: verified?})
    end)
  end
end

Note that the handle_batch/1 callback MUST NOT modify the results of the query, as they are used to determine the next batch of data to fetch.

You also need to guarantee the ordering of the data fetched, since each batch is queried by seeking past the seek column value of the last result in the previous batch. If the data is not ordered, you may end up with duplicated events and/or missing data.
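To see why ordering matters, here is a minimal in-memory sketch of seek-based batching. It is not the library's implementation: the SeekSketch module, its visited_ids/3 function, and the fetch_order argument are hypothetical names introduced only for illustration, and plain maps stand in for database rows.

```elixir
defmodule SeekSketch do
  @moduledoc false

  # Walks `rows` in batches of `step`, seeking past the id of the last row of
  # each batch. `fetch_order` stands in for the ordering applied by the query.
  def visited_ids(rows, step, fetch_order) do
    do_visit(fetch_order.(rows), step, nil, [])
  end

  defp do_visit(rows, step, last_seeked_val, visited) do
    batch =
      rows
      |> Enum.filter(fn row -> is_nil(last_seeked_val) or row.id > last_seeked_val end)
      |> Enum.take(step)

    case batch do
      [] ->
        visited

      _ ->
        # The next batch starts after the seek value of the last row fetched
        next_seek = List.last(batch).id
        do_visit(rows, step, next_seek, visited ++ Enum.map(batch, fn row -> row.id end))
    end
  end
end

rows = for id <- [5, 1, 4, 2, 6, 3], do: %{id: id}

# Ordered by the seek column: every row is visited exactly once
ordered = SeekSketch.visited_ids(rows, 2, fn rs -> Enum.sort_by(rs, fn row -> row.id end) end)
# ordered == [1, 2, 3, 4, 5, 6]

# Unordered: row 5 is visited three times, rows 2 and 3 are never visited
unordered = SeekSketch.visited_ids(rows, 2, fn rs -> rs end)
# unordered == [5, 1, 5, 4, 5, 6]
```

In the unordered run the last row of a batch does not carry the highest seek value seen so far, so the next seek either re-fetches rows or skips past rows that were never processed.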

Now you are ready to execute it. To do so, you must start the Supervisor, which is registered under the backfill module's name. In other words, there is a single process per backfill module.

The example below will affect users with ID greater than 100 (not including 100) and will backfill data until the id column reaches 1_000. Inside the application IEx session:

alias MyApp.Backfills.UserEmailVerifiedBackfill

last_seeked_val = 100
stop_seek_val = 1_000
UserEmailVerifiedBackfill.start_link(last_seeked_val, stop_seek_val)
:ok

UserEmailVerifiedBackfill.add_consumer()
:ok

UserEmailVerifiedBackfill.start()
:ok

You can tweak the start_link/2 arguments to start from the beginning by setting last_seeked_val to nil, or to stop only when all users are backfilled by setting stop_seek_val to nil.

You may add more consumers on the fly, depending on how the application performs with the step used and the number of consumers subscribed.

Summary

Callbacks

handle_batch/1
Handles the backfill logic given a list of data

query/0
Queryable used on Repo.all/2 to fetch chunks of data

seek_col/0
Column used to determine what to seek

step/0
Amount of data fetched per step

Callbacks

@callback handle_batch([struct()]) :: :ok

Handles the backfill logic given a list of data

@callback query() :: Ecto.Query.t()

Queryable used on Repo.all/2 to fetch chunks of data

@callback seek_col() :: atom()

Column used to determine what to seek

@callback step() :: pos_integer()

Amount of data fetched per step