A tile-based parallel execution backend.
The partitioned backend divides the array into rectangular tiles and dispatches tile computations in parallel. Each tick follows the Bulk Synchronous Parallel (BSP) model:
- Inject -- push external input streams into boundary links
- Read -- read all link buffers (globally, before any PE executes)
- Dispatch -- submit tile PE computations in parallel
- Collect -- gather all tile results
- Write -- write all outputs into link buffers
- Record -- merge trace events
- Advance -- increment the tick counter
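The steps above can be illustrated with a small, sequential Elixir sketch. It uses a hypothetical, simplified state (plain maps for PEs, links and edges, not ExSystolic's real structures) where each PE accumulates its input and forwards it downstream. The point it shows is the barrier: every PE reads from a snapshot taken before any PE runs, so a value written in tick N is only visible in tick N+1.

```elixir
defmodule BSPSketch do
  # Hypothetical, simplified state (not ExSystolic's real structures):
  #   pes:   %{coord => accumulator}
  #   links: %{coord => value waiting for the PE at coord}
  #   edges: %{coord => downstream coord}
  #   tick:  integer tick counter
  def tick(%{pes: pes, links: links, edges: edges, tick: t} = state, injected) do
    # Inject: external inputs are placed on boundary links.
    links = Map.merge(links, injected)

    # Read: snapshot every PE's input before any PE executes.
    inputs = Map.new(pes, fn {coord, _acc} -> {coord, Map.get(links, coord, 0)} end)

    # Dispatch + Collect: each PE sees only the frozen snapshot
    # (sequential here; the partitioned backend runs tiles in parallel).
    results =
      Map.new(pes, fn {coord, acc} ->
        x = Map.fetch!(inputs, coord)
        {coord, {acc + x, x}}
      end)

    # Write: forward each PE's input to its downstream link. Links not
    # rewritten this tick are dropped, i.e. values are consumed once read.
    new_links =
      for {coord, {_acc, out}} <- results, Map.has_key?(edges, coord), into: %{} do
        {Map.fetch!(edges, coord), out}
      end

    # Advance (Record/trace omitted for brevity).
    new_pes = Map.new(results, fn {coord, {acc, _out}} -> {coord, acc} end)
    %{state | pes: new_pes, links: new_links, tick: t + 1}
  end
end

s0 = %{pes: %{{0, 0} => 0, {0, 1} => 0}, links: %{}, edges: %{{0, 0} => {0, 1}}, tick: 0}
s1 = BSPSketch.tick(s0, %{{0, 0} => 5})
s2 = BSPSketch.tick(s1, %{{0, 0} => 2})
s2.pes
# => %{{0, 0} => 7, {0, 1} => 5}
```

In tick 2, PE {0, 1} reads the 5 forwarded during tick 1, not the 2 injected in the same tick.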
Dispatch strategies
Two strategies are available, both deterministic:
- :tasks (default) -- uses Task.Supervisor.async_stream/4 against ExSystolic.TaskSupervisor with ordered: true. Spawns one supervised task per tile per tick.
- :pool -- uses the :systolic_pool Poolex pool of ExSystolic.Backend.PoolexWorker GenServers. Reuses long-lived workers, eliminating per-tick task spawn overhead. Select via dispatch: :pool.
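A minimal sketch of the :tasks style of dispatch, using only the standard library. It starts its own anonymous Task.Supervisor rather than ExSystolic.TaskSupervisor, and the tile shape ({tile_id, inputs}) and per-tile compute function are hypothetical. With ordered: true, results come back in input order regardless of which task finishes first.

```elixir
defmodule TaskDispatchSketch do
  # Runs one supervised task per tile and returns results in input order,
  # in the spirit of the :tasks strategy. `tiles` is a list of
  # {tile_id, inputs} pairs and `compute` is a hypothetical per-tile function.
  def dispatch(sup, tiles, compute) do
    sup
    |> Task.Supervisor.async_stream(tiles, fn {id, inputs} -> {id, compute.(inputs)} end,
      ordered: true
    )
    # Tasks are linked to the caller, so a crash propagates instead of
    # producing an error tuple; successful results arrive as {:ok, value}.
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

{:ok, sup} = Task.Supervisor.start_link()
tiles = [{:t0, [1, 2]}, {:t1, [3, 4]}, {:t2, [5]}]
TaskDispatchSketch.dispatch(sup, tiles, &Enum.sum/1)
# => [{:t0, 3}, {:t1, 7}, {:t2, 5}]
```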
Determinism guarantee
Even though tiles execute in parallel within a tick, the BSP barrier
ensures that all tiles see the same frozen inputs for a given tick.
No tile reads data produced by another tile in the same tick. Trace
events are sorted by {tick, coord} before recording so the trace
list is byte-identical across runs and dispatch strategies.
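The canonical ordering can be sketched as a single sort on the {tick, coord} key. The event map shape below is hypothetical; the relevant fact is that Elixir compares tuples element by element, so {tick, {row, col}} orders by tick first and then by coordinate, independent of the order in which parallel tiles produced the events.

```elixir
defmodule TraceSortSketch do
  # Canonical trace order: by tick, then by PE coordinate {row, col}.
  # The event map shape (%{tick: _, coord: _, ...}) is hypothetical.
  def canonical(events), do: Enum.sort_by(events, &{&1.tick, &1.coord})
end

events = [
  %{tick: 1, coord: {1, 0}, v: :c},
  %{tick: 0, coord: {0, 1}, v: :b},
  %{tick: 0, coord: {0, 0}, v: :a},
  %{tick: 1, coord: {0, 0}, v: :d}
]

# Completion order from parallel tiles varies; the sorted trace does not.
TraceSortSketch.canonical(Enum.shuffle(events)) == TraceSortSketch.canonical(events)
# => true
```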
Options
- :ticks -- number of ticks to run (required for run/2)
- :tile_rows -- rows per tile (default: array rows, i.e. a single tile)
- :tile_cols -- columns per tile (default: array cols, i.e. a single tile)
- :dispatch -- :tasks (default) or :pool
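To make the :tile_rows and :tile_cols options concrete, here is a hypothetical partitioning function (not part of ExSystolic) that splits a rows × cols grid into rectangular tiles, defaulting to a single tile as the options above describe. How the backend handles sizes that do not divide evenly is not documented; this sketch assumes edge tiles are simply smaller. Stepped ranges (first..last//step) need Elixir 1.12 or later.

```elixir
defmodule TileSketch do
  # Splits a rows x cols grid into tiles of at most tile_rows x tile_cols.
  # Defaults to one tile covering the whole grid. Edge tiles may be
  # smaller when the sizes do not divide evenly (an assumption).
  def tiles(rows, cols, opts \\ []) do
    tr = Keyword.get(opts, :tile_rows, rows)
    tc = Keyword.get(opts, :tile_cols, cols)

    # Top-left corner of each tile, then every coordinate inside it.
    for r0 <- 0..(rows - 1)//tr, c0 <- 0..(cols - 1)//tc do
      for r <- r0..(min(r0 + tr, rows) - 1)//1,
          c <- c0..(min(c0 + tc, cols) - 1)//1,
          do: {r, c}
    end
  end
end

TileSketch.tiles(3, 2, tile_rows: 2)
# => [[{0, 0}, {0, 1}, {1, 0}, {1, 1}], [{2, 0}, {2, 1}]]
```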
When to use
Use the partitioned backend when:
- The array is large enough that parallelism helps (> 8x8)
- You need multi-core throughput
- You have confirmed determinism parity with the interpreted backend
For small arrays or debugging, the interpreted backend is simpler and has less overhead.
Examples
iex> alias ExSystolic.{Array, Backend.Partitioned, PE.MAC}
iex> array = Array.new(rows: 2, cols: 2) |> Array.fill(MAC) |> Array.connect(:west_to_east) |> Array.connect(:north_to_south)
iex> array = Array.input(array, :west, [{{0,0}, [1,2]}, {{1,0}, [3,4]}])
iex> array = Array.input(array, :north, [{{0,0}, [5,7]}, {{0,1}, [6,8]}])
iex> result = Partitioned.run(array, ticks: 5)
iex> result.tick
5
Summary
Functions
run/2 -- Runs the array for the given number of ticks using tile-based parallel execution.
step/2 -- Executes a single tick using the partitioned backend.
Functions
@spec run(ExSystolic.Array.t(), keyword()) :: ExSystolic.Array.t()
Runs the array for the given number of ticks using tile-based parallel execution.
Returns the final array state, which includes the updated PEs, links, tick counter, and trace.
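Conceptually, running for N ticks amounts to applying a single-tick step N times. The sketch below shows that shape with a stand-in step function over a toy map; it is not the library's run/2, which may be implemented differently.

```elixir
defmodule RunSketch do
  # Applies a single-tick function `ticks` times. `step_fun` is a stand-in
  # for something like step/2; the real run/2 may be implemented differently.
  def run(state, step_fun, ticks) when is_integer(ticks) and ticks >= 0 do
    # 1..0//1 is empty, so ticks = 0 returns the state unchanged.
    Enum.reduce(1..ticks//1, state, fn _tick, acc -> step_fun.(acc) end)
  end
end

RunSketch.run(%{tick: 0}, fn s -> %{s | tick: s.tick + 1} end, 5)
# => %{tick: 5}
```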
@spec step(ExSystolic.Array.t(), keyword()) :: ExSystolic.Array.t()
Executes a single tick using the partitioned backend.
This is the BSP step: inject, read, dispatch, collect, write, record, advance. Links are managed globally (like the interpreted backend); only PE execution is parallelized across tiles.
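Because links stay global and tiles cover disjoint sets of PEs, gathering tile results into one map cannot depend on which tile finished first. A minimal sketch of that collect step, assuming (hypothetically) that each tile returns a map from coordinate to new PE state:

```elixir
defmodule CollectSketch do
  # Each tile returns %{coord => new_pe_state} for its own coordinates only.
  # Because tiles are disjoint, the merge order cannot change the result.
  def collect(tile_results) do
    Enum.reduce(tile_results, %{}, fn tile_map, acc -> Map.merge(acc, tile_map) end)
  end
end

tile_a = %{{0, 0} => 10, {0, 1} => 11}
tile_b = %{{1, 0} => 20, {1, 1} => 21}
CollectSketch.collect([tile_a, tile_b]) == CollectSketch.collect([tile_b, tile_a])
# => true
```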
Examples
iex> alias ExSystolic.{Array, Backend.Partitioned, PE.MAC}
iex> array = Array.new(rows: 2, cols: 1) |> Array.fill(MAC) |> Array.connect(:west_to_east)
iex> array = Array.input(array, :west, [{{0,0}, [10]}, {{1,0}, [20]}])
iex> array = Partitioned.step(array)
iex> array.tick
1