Statwise is an Elixir statistics library that aims for idiomatic Elixir APIs with results checked against well-known Python references.

This first milestone includes:

  • Descriptive statistics for lists and one-dimensional Nx tensors.
  • Normal and Student's t distribution helpers.
  • One-sample, paired, Welch, and pooled t-tests.
  • Average-rank utilities.
  • Asymptotic and exact Mann-Whitney U tests.
  • Dataframe-style column wrappers for running tests from maps or Explorer dataframes.
  • Visualization builders with Vega-Lite-compatible output: histograms, ECDFs, QQ plots, box plots, scatter plots, line plots, summary bars and points with intervals, count plots, strip plots, and heatmaps.
  • Committed JSONL fixtures generated from pinned Python references.

Examples

Statwise.Descriptive.mean([1, 2, 3])
#=> 2.0

Statwise.TTest.independent([1.2, 1.9, 2.4], [2.2, 3.0, 3.4],
  variance: :welch
)
#=> %Statwise.TestResult{}

Statwise.MannWhitney.test([1, 3, 5], [2, 4],
  alternative: :two_sided,
  method: :asymptotic
)
#=> %Statwise.TestResult{}

Statwise.Visualization.histogram([1, 2, 2, 3], bins: 10)
|> Statwise.Visualization.to_vega_lite()
#=> %{"$schema" => "https://vega.github.io/schema/vega-lite/v5.json", ...}

# In Livebook with :jason, :vega_lite, and :kino_vega_lite installed:
Statwise.Visualization.histogram([1, 2, 2, 3], bins: 10)
|> Statwise.Visualization.with_style(width: 420, color: "#2563eb")
|> Statwise.Visualization.show()

rows = [
  %{site: :north, treatment: :control, time: 1, score: 1.2},
  %{site: :north, treatment: :control, time: 2, score: 1.8},
  %{site: :south, treatment: :treated, time: 1, score: 2.4},
  %{site: :south, treatment: :treated, time: 2, score: 2.9}
]

rows
|> Statwise.Visualization.plot(x: :time, y: :score, color: :treatment)
|> Statwise.Visualization.add(:point)
|> Statwise.Visualization.add(:line)
|> Statwise.Visualization.facet(column: :site)
|> Statwise.Visualization.show()

rows
|> Statwise.Visualization.box_plot(x: :treatment, y: :score)
|> Statwise.Visualization.with_test(:t_test, groups: {:control, :treated})
|> Statwise.Visualization.show()

T-Tests

Statwise.TTest.one_sample([2.5, 3.1, 3.6, 4.0], mean: 3.0)

Statwise.TTest.paired(
  [10.2, 11.5, 12.1, 13.8],
  [9.9, 10.8, 11.2, 12.6],
  alternative: :greater
)

Statwise.TTest.independent(
  [1.2, 1.9, 2.4, 2.9],
  [2.2, 3.0, 3.4, 4.1, 4.8],
  variance: :welch,
  alternative: :less,
  null_difference: 0.0,
  confidence_level: 0.95,
  effect_size: true
)

The test APIs can also pull samples from dataframe-like column data. Statwise does not depend on Explorer, but if your application has Explorer loaded, Explorer.DataFrame columns are accepted. Maps of columns work too:

df = %{
  before: [10.2, 11.5, 12.1, 13.8],
  after: [9.9, 10.8, 11.2, 12.6],
  control: [1.2, 1.9, 2.4, 2.9],
  treatment: [2.2, 3.0, 3.4, 4.1]
}

Statwise.TTest.one_sample(df, columns: [:before, :after], mean: 10.0)
#=> %{before: %Statwise.TestResult{}, after: %Statwise.TestResult{}}

Statwise.TTest.paired(df, columns: [:before, :after])
#=> %Statwise.TestResult{}

Statwise.TTest.independent(df, columns: [:control, :treatment], variance: :welch)
#=> %Statwise.TestResult{}

Column extraction defaults to ordinary lists. Pass input: :tensor to extract map or Explorer columns as one-dimensional f64 tensors. With Explorer loaded, Statwise uses Explorer.Series.to_tensor/2 when it is available:

Statwise.TTest.one_sample(df,
  columns: [:before, :after],
  mean: 10.0,
  input: :tensor,
  backend: :tensor
)

Use pairs: to run several two-sample tests in one call:

Statwise.TTest.paired(df,
  pairs: [
    before: :after,
    control: :treatment
  ]
)
#=> %{{:before, :after} => %Statwise.TestResult{}, ...}
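
To make the shape of a pairs: result concrete, here is a minimal sketch of how a keyword list of pairs can be turned into a map keyed by {left, right} tuples. PairsSketch and its test_fun argument are hypothetical names used only for illustration; this is not Statwise's implementation, and a plain mean difference stands in for a real test.

```elixir
defmodule PairsSketch do
  # Hypothetical helper: runs `test_fun` on each {left, right} column pair
  # and keys every result by the pair tuple.
  def run(columns, pairs, test_fun) do
    Map.new(pairs, fn {left, right} ->
      a = Map.fetch!(columns, left)
      b = Map.fetch!(columns, right)
      {{left, right}, test_fun.(a, b)}
    end)
  end
end

# Stand-in for a real test: difference of sample means.
mean_diff = fn a, b -> Enum.sum(a) / length(a) - Enum.sum(b) / length(b) end

columns = %{before: [2.0, 4.0], after: [1.0, 3.0]}
PairsSketch.run(columns, [before: :after], mean_diff)
#=> %{{:before, :after} => 1.0}
```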

Supported alternatives are :two_sided, :greater, and :less. Independent t-tests support variance: :welch and variance: :pooled. T-test results include confidence intervals by default. Pass effect_size: true to include Cohen's d and Hedges' g.
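
For readers who want to see what the two variance: options mean, the sketch below computes the textbook Welch and pooled t statistics with their degrees of freedom. TTestSketch is a hypothetical module written for this README; it assumes a null difference of zero and omits p-values, confidence intervals, and input validation, so it should not be read as Statwise's internals.

```elixir
defmodule TTestSketch do
  # Sample mean of a non-empty list of numbers.
  defp mean(xs), do: Enum.sum(xs) / length(xs)

  # Unbiased sample variance (divides by n - 1).
  defp variance(xs) do
    m = mean(xs)
    Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / (length(xs) - 1)
  end

  # Welch: separate variances, Welch-Satterthwaite degrees of freedom.
  def statistic(a, b, :welch) do
    {na, nb} = {length(a), length(b)}
    {qa, qb} = {variance(a) / na, variance(b) / nb}
    t = (mean(a) - mean(b)) / :math.sqrt(qa + qb)
    df = (qa + qb) * (qa + qb) / (qa * qa / (na - 1) + qb * qb / (nb - 1))
    {t, df}
  end

  # Pooled: one shared variance estimate, na + nb - 2 degrees of freedom.
  def statistic(a, b, :pooled) do
    {na, nb} = {length(a), length(b)}
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / :math.sqrt(pooled * (1 / na + 1 / nb))
    {t, df}
  end
end

TTestSketch.statistic([1.2, 1.9, 2.4], [2.2, 3.0, 3.4], :welch)
TTestSketch.statistic([1.2, 1.9, 2.4], [2.2, 3.0, 3.4], :pooled)
```

When both samples have the same size and the same variance, the two variants give the same statistic and the same degrees of freedom; they diverge as the sample sizes or variances become unequal.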

Nonparametric Tests

Statwise.Nonparametric.Rank.ranks([10, 20, 20, 30])
#=> [1.0, 2.5, 2.5, 4.0]
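
The average-rank result above can be reproduced with a few lines of plain Elixir. The following is a minimal sketch of the average-rank method (tied values share the mean of the positions they occupy); RankSketch is a hypothetical module, not the Statwise source, and it converts every value to a float so that 1 and 1.0 count as a tie.

```elixir
defmodule RankSketch do
  # Returns 1-based average ranks in the original order of `values`.
  def average_ranks(values) do
    values
    |> Enum.map(&(&1 * 1.0))
    |> Enum.with_index()
    # Sort by value (then by original index) and number the sorted positions.
    |> Enum.sort()
    |> Enum.with_index(1)
    # Group runs of equal values; each run shares the mean of its positions.
    |> Enum.chunk_by(fn {{value, _index}, _position} -> value end)
    |> Enum.flat_map(fn group ->
      positions = Enum.map(group, fn {_entry, position} -> position end)
      rank = Enum.sum(positions) / length(positions)
      Enum.map(group, fn {{_value, index}, _position} -> {index, rank} end)
    end)
    # Restore the original order.
    |> Enum.sort()
    |> Enum.map(fn {_index, rank} -> rank end)
  end
end

RankSketch.average_ranks([10, 20, 20, 30])
#=> [1.0, 2.5, 2.5, 4.0]
```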

Statwise.MannWhitney.test(
  [1.0, 3.0, 5.0],
  [2.0, 4.0],
  alternative: :two_sided,
  method: :auto,
  continuity: true
)

Dataframe columns are supported with the same columns: and pairs: options:

Statwise.MannWhitney.test(df, columns: [:control, :treatment], method: :auto)

Statwise.MannWhitney.test(df,
  pairs: [
    control: :treatment,
    before: :after
  ],
  method: :auto
)

Ranking currently supports SciPy-compatible average ranks for ties. Mann-Whitney U supports method: :asymptotic, method: :exact, and method: :auto. Like SciPy, an explicit method: :exact does not apply a tie correction. With method: :auto, Statwise uses exact p-values when there are no ties and the smaller sample has at most 8 observations; otherwise it uses the asymptotic normal approximation. Mann-Whitney results include common-language and rank-biserial effect sizes; effect_size.cliffs_delta is also provided as an alias of the rank-biserial value.
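
The :auto rule and the two effect sizes can be sketched in plain Elixir. MannWhitneySketch below is a hypothetical module for illustration only. It assumes the U statistic is counted for the first sample (ties count one half), that the common-language effect size is U / (n1 * n2), and that rank-biserial is 2 * CL - 1; the sign convention Statwise uses may differ.

```elixir
defmodule MannWhitneySketch do
  # Threshold taken from the paragraph above.
  @exact_max_n 8

  # Chooses :exact only when there are no ties and the smaller sample is small.
  def auto_method(a, b) do
    combined = Enum.map(a ++ b, &(&1 * 1.0))
    has_ties = length(Enum.uniq(combined)) != length(combined)

    if not has_ties and min(length(a), length(b)) <= @exact_max_n do
      :exact
    else
      :asymptotic
    end
  end

  # U for the first sample: number of (x, y) pairs with x > y, ties as 0.5.
  def u_statistic(a, b) do
    scores =
      for x <- a, y <- b do
        cond do
          x > y -> 1.0
          x == y -> 0.5
          true -> 0.0
        end
      end

    Enum.sum(scores)
  end

  # Assumed definitions: CL = U / (n1 * n2), rank-biserial = 2 * CL - 1.
  def effect_sizes(a, b) do
    cl = u_statistic(a, b) / (length(a) * length(b))
    %{common_language: cl, rank_biserial: 2.0 * cl - 1.0}
  end
end

MannWhitneySketch.auto_method([1, 3, 5], [2, 4])
#=> :exact

MannWhitneySketch.effect_sizes([1, 3, 5], [2, 4])
#=> %{common_language: 0.5, rank_biserial: 0.0}
```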

First-milestone behavior is intentionally strict: raw samples must be finite numeric lists or one-dimensional Nx tensors. Test APIs can also extract raw samples from dataframe-style columns with columns: or pairs:. Tensor-native Nx reductions are opt-in with backend: :tensor; by default, Statwise still uses the scalar implementation, which is the fastest option on the current Nx binary backend. NaN behavior is controlled with nan_policy: :raise | :propagate | :omit; see docs/compatibility.md. Degenerate t-tests with zero standard error return explicit :nan, :infinity, or :neg_infinity statistics according to the compatibility contract.
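
As an illustration of the degenerate case, the sketch below maps a zero standard error to an explicit atom instead of dividing. The mapping shown (positive difference to :infinity, negative to :neg_infinity, zero to :nan) is an assumption that mirrors IEEE 754 division; docs/compatibility.md remains the authoritative contract, and DegenerateSketch is a hypothetical module.

```elixir
defmodule DegenerateSketch do
  # Assumed mapping for a zero standard error; Elixir itself would raise
  # ArithmeticError on division by zero, so the result is made explicit.
  def statistic(diff, se) when se == 0 do
    cond do
      diff > 0 -> :infinity
      diff < 0 -> :neg_infinity
      true -> :nan
    end
  end

  # Ordinary case: the t statistic is the difference over its standard error.
  def statistic(diff, se), do: diff / se
end

DegenerateSketch.statistic(1.5, 0.0)
#=> :infinity

DegenerateSketch.statistic(0.0, 0.0)
#=> :nan
```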

Python Compatibility

The Elixir tests use committed fixtures from:

  • NumPy 2.3.0 for descriptive statistics.
  • SciPy 1.16.0 for distributions and Mann-Whitney U.
  • Statsmodels 0.14.6 for independent t-tests.

Python is not required for the normal test suite. To intentionally refresh fixtures:

cd reference/python
uv sync
uv run python generate_fixtures.py
cd ../..
mix test

Review fixture diffs before committing refreshed values.

For randomized pre-release checks against Python references:

cd reference/python
uv sync
uv run python differential_check.py --cases 250 --seed 202607

See docs/release_checklist.md for the release readiness checklist.

For runnable tutorials, see docs/statistical_tests_gallery.livemd and docs/visualization_gallery.livemd.

CI

Run:

mix format --check-formatted
mix compile --warnings-as-errors
mix test