EctoBackfiller behaviour (EctoBackfiller v0.5.0)
Orchestrator of a back-pressured backfill strategy for Ecto repos.
Starts a producer process and dynamically starts consumers. The number of consumers is determined by the availability of resources on your infrastructure, such as available database connections or I/O usage.
Define a module to execute the backfill, which must use EctoBackfiller and implement its callbacks.
Let's imagine a silly example to illustrate the use of the library. Suppose you have a User schema described as:
    defmodule MyApp.Users.User do
      use Ecto.Schema

      # Ecto's schema/2 requires a source name; "users" is assumed here.
      schema "users" do
        field :email_verified_at, :naive_datetime
      end
    end
Later on, your business requirements lead you to add a boolean email_verified field to the schema, representing whether the user has verified their email. You write the migration and then have to populate the new column for all existing users.
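The migration mentioned above could look roughly like the sketch below. The module name and the `users` table name are assumptions made for illustration; adapt them to your application.

```elixir
defmodule MyApp.Repo.Migrations.AddEmailVerifiedToUsers do
  use Ecto.Migration

  def change do
    # Hypothetical table name; the column starts out NULL for existing rows,
    # which is what the backfill is meant to fix.
    alter table(:users) do
      add :email_verified, :boolean
    end
  end
end
```

The schema would also need a matching `field :email_verified, :boolean` line so that updates can write to the column.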
To do so, you can write a module using EctoBackfiller as:
    defmodule MyApp.Backfills.UserEmailVerifiedBackfill do
      use EctoBackfiller, repo: MyApp.Repo

      alias MyApp.Users
      alias MyApp.Users.User

      @impl true
      def query, do: Ecto.Queryable.to_query(User)

      # Number of rows fetched per batch.
      @impl true
      def step, do: 5

      @impl true
      def handle_batch(users) do
        Enum.each(users, fn user ->
          # A user counts as verified when email_verified_at is set.
          verified? = not is_nil(user.email_verified_at)
          {:ok, _user} = Users.update(user, %{email_verified: verified?})
        end)
      end
    end
Please mind that the handle_batch/1 callback MUST NOT modify the results of the query, as they will be used to determine the next batch of data to be fetched.
You also need to guarantee the ordering of the data fetched, since the backfill queries the next batch using the seek column value of the last result. If the data is not ordered, you may end up with duplicated events and/or missing data.
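To see why ordering matters, here is a minimal in-memory sketch of seek-based batching. It is not EctoBackfiller's implementation: `SeekSketch` and its functions are hypothetical, and it assumes each batch contains the rows whose seek column is strictly greater than the last seeked value.

```elixir
defmodule SeekSketch do
  # Hypothetical helper, not part of EctoBackfiller: returns up to `step`
  # rows whose `seek_col` value is greater than `last_seeked_val`, keeping
  # the rows in the order they are given (no sorting is applied on purpose).
  def next_batch(rows, seek_col, last_seeked_val, step) do
    rows
    |> Enum.filter(fn row -> Map.fetch!(row, seek_col) > last_seeked_val end)
    |> Enum.take(step)
  end

  # Walks all batches, using the seek value of the last row of each batch as
  # the starting point of the next one, and returns the visited seek values.
  def visited(rows, seek_col, last_seeked_val, step, acc \\ []) do
    case next_batch(rows, seek_col, last_seeked_val, step) do
      [] ->
        acc

      batch ->
        seen = Enum.map(batch, fn row -> Map.fetch!(row, seek_col) end)
        visited(rows, seek_col, List.last(seen), step, acc ++ seen)
    end
  end
end

ordered = Enum.map(1..6, fn id -> %{id: id} end)
unordered = Enum.map([1, 5, 2, 6, 3, 4], fn id -> %{id: id} end)

ordered_ids = SeekSketch.visited(ordered, :id, 0, 2)
unordered_ids = SeekSketch.visited(unordered, :id, 0, 2)
```

With the ordered rows every id from 1 to 6 is visited. With the unordered rows the first batch ends at id 5, so the walk only visits 1, 5 and 6 and silently skips 2, 3 and 4.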
Now you are ready to execute it. To do so, you must start the Supervisor, which is named after the backfill module; in other words, there is a single process per backfill module.
The example below will affect users with ID greater than 100 (not including 100) and will backfill data until the id column reaches 1_000. Inside the application IEx session:
    alias MyApp.Backfills.UserEmailVerifiedBackfill
    last_seeked_val = 100
    stop_seek_val = 1_000
    UserEmailVerifiedBackfill.start_link(last_seeked_val, stop_seek_val)
    :ok
    UserEmailVerifiedBackfill.add_consumer()
    :ok
    UserEmailVerifiedBackfill.start()
    :ok
You can tweak the start_link/2 call to start from the beginning by setting last_seeked_val to nil, or to stop only when all users are backfilled by setting stop_seek_val to nil.
You may add more consumers on the fly, depending on how the application performs with the step used and the number of consumers already subscribed.
Summary
Callbacks
handle_batch/1: Handles the backfill logic given a list of data
query/0: Queryable used on Repo.all/2 to fetch chunks of data
seek_col/0: Column used to determine what to seek
step/0: Amount of data fetched per step
Callbacks
@callback handle_batch([struct()]) :: :ok
Handles the backfill logic given a list of data
@callback query() :: Ecto.Query.t()
Queryable used on Repo.all/2 to fetch chunks of data
@callback seek_col() :: atom()
Column used to determine what to seek
@callback step() :: pos_integer()
Amount of data fetched per step