ExSystolic.Backend.Partitioned (ex_systolic v0.2.0)


A tile-based parallel execution backend.

The partitioned backend divides the array into rectangular tiles and dispatches tile computations in parallel. Each tick follows the Bulk Synchronous Parallel (BSP) model:

  1. Inject -- push external input streams into boundary links
  2. Read -- read all link buffers (globally, before any PE executes)
  3. Dispatch -- submit tile PE computations in parallel
  4. Collect -- gather all tile results
  5. Write -- write all outputs into link buffers
  6. Record -- merge trace events
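The read, dispatch, collect, and write phases above can be sketched with plain Elixir tasks. This is a minimal illustration under stated assumptions, not the ExSystolic implementation: the `BspSketch` module, the `cells` map, and the "add the west neighbour" rule are all hypothetical stand-ins for PEs and links.

```elixir
defmodule BspSketch do
  # Hypothetical stand-in for one BSP tick; not ExSystolic's code.
  # `cells` maps {row, col} to an integer. `tiles` is a list of coordinate lists.
  def tick(cells, tiles) do
    # Read: freeze every input before any tile executes.
    snapshot = cells

    # Dispatch + Collect: one task per tile, results gathered in tile order.
    updates =
      tiles
      |> Task.async_stream(&compute_tile(&1, snapshot), ordered: true)
      |> Enum.flat_map(fn {:ok, tile_updates} -> tile_updates end)

    # Write: apply all outputs only after every tile has finished.
    Map.merge(cells, Map.new(updates))
  end

  # Each cell adds its west neighbour's frozen value (0 at the boundary).
  defp compute_tile(coords, snapshot) do
    for {r, c} = coord <- coords do
      west = Map.get(snapshot, {r, c - 1}, 0)
      {coord, Map.fetch!(snapshot, coord) + west}
    end
  end
end

cells = %{{0, 0} => 1, {0, 1} => 2}
tiles = [[{0, 0}], [{0, 1}]]
next_cells = BspSketch.tick(cells, tiles)
```

Because every tile reads from `snapshot` rather than from `cells` as it is being updated, the result does not depend on which tile finishes first.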

Dispatch strategies

Two strategies are available, both deterministic:

  • :tasks (default) -- uses Task.Supervisor.async_stream/4 against ExSystolic.TaskSupervisor with ordered: true. Spawns one supervised task per tile per tick.
  • :pool -- uses the :systolic_pool Poolex pool of ExSystolic.Backend.PoolexWorker GenServers. Reuses long-lived workers, eliminating per-tick task spawn overhead. Select via dispatch: :pool.
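To illustrate why `ordered: true` keeps the :tasks strategy deterministic, the sketch below runs work items that finish out of order and still collects them in submission order. It assumes an anonymous supervisor started just for the example (not ExSystolic.TaskSupervisor), and the sleep-based "tiles" are hypothetical.

```elixir
# Anonymous supervisor for this sketch only; ExSystolic uses its own.
{:ok, sup} = Task.Supervisor.start_link()

# Each hypothetical "tile" sleeps for a different time, so completion
# order differs from submission order.
delays_ms = [30, 10, 20]

slow_tile = fn ms ->
  Process.sleep(ms)
  ms
end

results =
  sup
  |> Task.Supervisor.async_stream(delays_ms, slow_tile, ordered: true)
  |> Enum.map(fn {:ok, ms} -> ms end)
```

With `ordered: true`, `results` follows the order of `delays_ms` even though the 10 ms task completes first.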

Determinism guarantee

Even though tiles execute in parallel within a tick, the BSP barrier ensures that all tiles see the same frozen inputs for a given tick. No tile reads data produced by another tile in the same tick. Trace events are sorted by {tick, coord} before recording so the trace list is byte-identical across runs and dispatch strategies.
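The sorting step can be sketched as follows. The event shape (a map with `:tick`, `:coord`, and `:value` keys) is an assumption made for illustration; the actual trace event structure may differ.

```elixir
# Hypothetical trace events, in the arbitrary order tiles might return them.
events = [
  %{tick: 1, coord: {1, 0}, value: 7},
  %{tick: 0, coord: {0, 1}, value: 3},
  %{tick: 1, coord: {0, 0}, value: 5},
  %{tick: 0, coord: {0, 0}, value: 1}
]

# Sort by the {tick, coord} key; tuples compare element by element.
sort_trace = fn evs -> Enum.sort_by(evs, fn e -> {e.tick, e.coord} end) end

sorted = sort_trace.(events)
sorted_values = Enum.map(sorted, & &1.value)
```

Since each `{tick, coord}` key is unique here, any permutation of `events` sorts to the same list, which is the property the determinism guarantee relies on.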

Options

  • :ticks -- number of ticks to run (required)
  • :tile_rows -- rows per tile (default: array rows, i.e. single tile)
  • :tile_cols -- cols per tile (default: array cols, i.e. single tile)
  • :dispatch -- :tasks (default) or :pool
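As a rough picture of how :tile_rows and :tile_cols could carve up an array, the sketch below splits a grid into rectangular tiles of coordinates. `TileSketch` is a hypothetical helper, not part of ExSystolic, and the handling of edge tiles (truncated when the array size is not a multiple of the tile size) is an assumption.

```elixir
defmodule TileSketch do
  # Hypothetical helper: splits a rows x cols grid into rectangular tiles
  # holding at most tile_rows x tile_cols coordinates each.
  def tiles(rows, cols, tile_rows, tile_cols) do
    for r0 <- 0..(rows - 1)//tile_rows,
        c0 <- 0..(cols - 1)//tile_cols do
      # Edge tiles are clipped to the array bounds.
      for r <- r0..min(r0 + tile_rows - 1, rows - 1),
          c <- c0..min(c0 + tile_cols - 1, cols - 1),
          do: {r, c}
    end
  end
end

# Tile size equal to the array size yields a single tile.
single = TileSketch.tiles(2, 2, 2, 2)

# A 4x4 array with 2x2 tiles yields four tiles.
quads = TileSketch.tiles(4, 4, 2, 2)
```

Under these assumptions, smaller tiles mean more parallel work items per tick, at the cost of more dispatch overhead.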

When to use

Use the partitioned backend when:

  • The array is large enough for parallelism to pay off (larger than 8x8)
  • You need multi-core throughput
  • You have confirmed determinism parity with the interpreted backend

For small arrays or debugging, the interpreted backend is simpler and has less overhead.

Examples

iex> alias ExSystolic.{Array, Backend.Partitioned, PE.MAC}
iex> array = Array.new(rows: 2, cols: 2) |> Array.fill(MAC) |> Array.connect(:west_to_east) |> Array.connect(:north_to_south)
iex> array = Array.input(array, :west, [{{0,0}, [1,2]}, {{1,0}, [3,4]}])
iex> array = Array.input(array, :north, [{{0,0}, [5,7]}, {{0,1}, [6,8]}])
iex> result = Partitioned.run(array, ticks: 5)
iex> result.tick
5

Functions

run(array, opts)

Runs the array for the given number of ticks using tile-based parallel execution.

Returns the final array state, which includes the updated PEs, links, tick counter, and trace.

step(array, opts \\ [])

Executes a single tick using the partitioned backend.

This is the BSP step: inject, read, dispatch, collect, write, record, then advance the tick counter. Links are managed globally, as in the interpreted backend; only PE execution is parallelized across tiles.
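One way to think about the relationship between single ticks and a multi-tick run is as a fold over a step function. The sketch below assumes that relationship for illustration only; `StepLoopSketch` is hypothetical, and its `step/1` merely advances a counter instead of executing PEs.

```elixir
defmodule StepLoopSketch do
  # Hypothetical single tick: only advances the tick counter.
  def step(state), do: %{state | tick: state.tick + 1}

  # Hypothetical multi-tick run: applies step/1 `ticks` times.
  # The explicit //1 step makes `ticks: 0` an empty range.
  def run(state, ticks) do
    Enum.reduce(1..ticks//1, state, fn _i, acc -> step(acc) end)
  end
end

final = StepLoopSketch.run(%{tick: 0}, 5)
```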

Examples

iex> alias ExSystolic.{Array, Backend.Partitioned, PE.MAC}
iex> array = Array.new(rows: 2, cols: 1) |> Array.fill(MAC) |> Array.connect(:west_to_east)
iex> array = Array.input(array, :west, [{{0,0}, [10]}, {{1,0}, [20]}])
iex> array = Partitioned.step(array)
iex> array.tick
1