All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.2.0] - 2025-05-03
Added
ExSystolic.Backend.Partitioned-- tile-based parallel backend usingTask.Supervisor(default) orPoolexworker pool (dispatch: :pool)ExSystolic.Backend.PoolexWorker-- GenServer worker for Poolex-based dispatchExSystolic.Backend.LinkOps-- shared link buffer operations (inject_streams/2,drain_links/2,write_pe_outputs/2), eliminating triple-duplicated code across Clock, Interpreted, and PartitionedExSystolic.Tile-- tile data structure for partitioned execution with documented invariantsExSystolic.TilePartitioner-- splits arrays into rectangular tiles with validation (tile_rows > 0,tile_cols > 0)ExSystolic.Application-- supervision tree withExSystolic.TaskSupervisorand:systolic_poolPoolex poolExSystolic.Backendbehaviour -- documents the 6-step BSP tick contract (INJECT, READ, EXECUTE, COLLECT, WRITE, RECORD)Space.links/2callback -- Space modules now produce boundary + internal links for a given direction;Grid2Dimplements it:west_to_eastand:north_to_southArray.result_map/1-- returnscoord => statefor any Space implementationPE.value/2andPE.present?/2-- helpers for handling:empty/nilPE inputs idiomaticallyArray.input/3-- accepts any atom port (not just:west/:north); raisesArgumentErroron duplicate{coord, port}keys- Architecture diagram in
ExSystolicmoduledoc showing BSP dataflow - Determinism contract in
PEmoduledoc forbidding side effects (Process, Agent, ETS, IO, timers) :emptydocumented as reserved atom inLinkandPEmoduledocs- Backend conformance test (
test/ex_systolic/backend/conformance_test.exs) -- result parity and trace parity across backends - Cross-backend trace parity test (sorted
{tick, coord, state_before, state_after}comparison) - Link FIFO interleaved property test
- Capacity > 1 link tests (multiple writes, FIFO order,
:fullon overflow) - Poolex pool dispatch test (
dispatch: :poolparity with interpreted) - Application supervision tests (TaskSupervisor alive, Poolex pool accepts calls)
- Custom space end-to-end execution test through Clock
- Full PE state parity test beyond
result_matrix - Doctest directives in 6 test files (34 doctests, up from 12)
Changed
Array.connect/2delegates toSpace.links/2; Grid2D-specific link functions removed fromarray.exArray.result_matrix/1raisesArgumentErrorfor non-Grid2D spaces (was silently producing nil cells)Partitionedbackend usesTask.Supervisor.async_streamwithordered: trueandtimeout: :infinity; trace events sorted by{tick, coord}after collectionPartitionedacceptsdispatch: :tasks | :pooloption (default:tasks)Clock.run/2raisesArgumentErrorfor unknown backends (wasCaseClauseError)Clock.step/1usesLinkOpshelpers instead of inline link operationsInterpreted.write_outputs/3delegates toLinkOps.write_pe_outputs+LinkOps.inject_streamsMAC.step/4usesPE.value/2andPE.present?/2instead of inline:emptychecksTrace.at/2andTrace.for_coord/2return events in chronological order (filter-then-reverse); ordering documentedTrace.Eventhas a single canonical typespec with@type event :: Event.t()convenience alias- All ranges use
//1step syntax for Elixir 1.20-rc compatibility (15 occurrences) Link.read/1typespec changed to{term() | :empty, t()}Link.write/2andLink.read/1doctests fixed (bind-then-assert pattern, no bare_)Trace.record/2doctest now verifies event content (state_after,tick)Link.tick/1doctest now demonstrates buffer preservationGrid.member?/2doctest includes negative-coordinate edge caseTilePartitionermoduledoc accurately describes boundary-link handling (excluded from tiles, managed globally)Applicationhas full@moduledocand@doconstart/2- Backend table in
ExSystolicmoduledoc lists all 19 modules
Removed
Interpreted.read_inputs/2-- dead code that violated buffer drainage contract; tests removedArray.input/3deadcase directionbranch (identity mapping:west -> :west,:north -> :north)Application.start/1deadCode.ensure_loaded?(Poolex)conditional (Poolex is always a dependency)Tile.boundary_inputsfield -- dead struct field never read by Partitioned- Grid2D-specific
build_direction_linksclauses and helper functions fromarray.ex(~90 LOC)
Fixed
- Trace event ordering was non-deterministic in partitioned backend (
ordered: falseinTask.async_stream) - Link doctests for
write/2andread/2never executed (bare_in expected output) result_matrix/1silently returned nil cells for non-Grid2D spacesArray.input/3silently overwrote duplicate{coord, port}streams- Range syntax incompatibility with Elixir 1.20-rc (bare
0..(n-1)where n could be 0) PoolexWorker.handle_calltimeout risk -- callers now pass:infinity- Dialyzer opaque type violation in
TilePartitioner(MapSet.member?replaced withMap.has_key?)
Performance
LinkOps.drain_links/2builds endpoint index map for O(1) lookup (mutation still list-based; see review item 2.1 for full indexed-map refactoring)
Known Limitations
- Trace.events list grows unbounded (no
max_eventsor sampling) - Link mutation still uses
List.replace_at(O(n)); indexed map storage not yet implemented - Latency > 1 links are not yet implemented in the backend
- No sparse data support; every PE executes every tick
- No semi-ring abstraction; MAC hard-codes arithmetic operations
[0.1.0] - 2025-05-01
Added
ExSystolic.Grid-- rectangular coordinate geometry with neighbour lookups (north, south, east, west)ExSystolic.Link-- FIFO communication channels between PE ports with configurable capacity and latencyExSystolic.PE-- behaviour for processing elements withinit/1andstep/4callbacksExSystolic.PE.MAC-- multiply-accumulate PE implementing the classic Kung-Leiserson systolic GEMM PEExSystolic.Array-- array construction API:new,fill,connect,input,trace,result_matrixExSystolic.Clock-- tick-by-tick execution driver withrun/2andstep/1ExSystolic.Trace-- optional execution trace recording with per-tick and per-coordinate queryingExSystolic.Backend.Interpreted-- single-process reference backend implementing the strict tick order (inject, read, execute, write, record)ExSystolic.Examples.GEMM-- general matrix multiply using systolic wavefront with skewed input streams- Tutorial livebook:
notebooks/introduction_to_systolic_arrays.livemd - 86 unit tests + 9 property-based tests (95.4% coverage)
- CI pipeline: Elixir 1.18/OTP 27, 1.19/OTP 28, 1.20/OTP 29 (experimental)
- Publish workflow: Hex.pm release on
v*tag push - Quality toolchain: credo, dialyxir, excoveralls, stream_data, mix_audit
Design Principles
- Everything is pure Elixir; no GenServer, no Nx, no NIFs
- Execution is deterministic and reproducible
- All data structures are immutable
- All public functions have
@doc, typespecs, and doctests
Known Limitations
- Interpreted backend has no parallelism; large arrays will be slow
- No sparse data support; every PE executes every tick
- Latency > 1 links are not yet implemented in the backend
- Trace is held entirely in memory
- No semi-ring abstraction; MAC hard-codes arithmetic operations
Performance
No benchmarks have been run. The interpreted backend is chosen for architectural clarity and correctness, not for speed. Performance claims will be made only after benchmarks exist.