Current priority is numerical compatibility and predictable edge behavior.
Run the Python-reference comparison benchmark with:
cd reference/python
uv run python benchmark.py --mode quick
The benchmark generates deterministic sample data, times the pinned Python
references, and then runs Statwise against the same data via
reference/elixir/benchmark.exs. Results are reported as microseconds per
operation using the best timed trial from several full repeat batches to reduce
scheduler noise. Use --trials N to override the default trial count,
--json-output benchmark_baseline_quick.json to refresh the tracked baseline,
or --baseline benchmark_baseline_quick.json --fail-ratio 2.0 to check for
regressions.
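The timing and regression rules described above can be sketched roughly as follows. This is a minimal illustration, not the code in benchmark.py: the helper names best_us_per_op and find_regressions, the batch size, and the flat name-to-microseconds baseline layout are all assumptions made for the example.

```python
import time


def best_us_per_op(fn, ops_per_batch=1000, trials=5):
    """Return microseconds per operation from the fastest of `trials` batches.

    Hypothetical helper: taking the minimum rather than the mean discards
    batches that were slowed down by scheduler noise.
    """
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(ops_per_batch):
            fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Convert seconds per batch into microseconds per single operation.
    return best / ops_per_batch * 1e6


def find_regressions(current, baseline, fail_ratio=2.0):
    """Return names whose current timing exceeds baseline * fail_ratio.

    Assumes both arguments map operation name -> microseconds per operation;
    the real baseline JSON layout may differ. Operations missing from the
    baseline are ignored rather than reported.
    """
    return sorted(
        name
        for name, us in current.items()
        if name in baseline and us > baseline[name] * fail_ratio
    )
```

Under these assumptions, an operation only counts as a regression when it is strictly slower than fail_ratio times its baseline, so a timing that exactly doubles with a ratio of 2.0 still passes.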
Implemented now:
- Descriptive statistics accept one-dimensional Nx tensors.
- Tensor-native descriptive reductions are available with backend: :tensor; the default tensor path still normalizes through the scalar implementation because it benchmarks faster on Nx.BinaryBackend for the current workloads.
- List-backed descriptive statistics use direct Elixir reductions to avoid building Nx tensors for scalar results.
- Ranking and Mann-Whitney U use ordinary Elixir control flow because tie grouping and exact distribution logic are easier to audit this way.
- Mann-Whitney U computes rank sums and tie correction from one sorted pass, and caches exact U distributions by sample-size pair.
- T-tests use scalar formulas after one-dimensional input normalization.
- T-tests reuse single-pass sample summaries instead of recomputing mean/variance/standard error through repeated normalization.
- Student's t quantiles stop bisection after double-precision convergence instead of running a fixed long iteration count.
- Dataframe-style test APIs extract columns first, then reuse the same raw sample implementations.
- Dataframe-style test APIs support input: :tensor, including Explorer.Series.to_tensor/2 when Explorer is loaded by the caller.
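As an illustration of the Mann-Whitney items in the list above, the following Python sketch computes both rank sums and the tie term from one pass over the sorted pooled sample, and caches exact U counts by sample-size pair. It is not Statwise's Elixir implementation; the function names rank_sums_and_ties and u_counts are hypothetical, and the exact counts use the textbook recurrence, which assumes no ties.

```python
from functools import lru_cache


def rank_sums_and_ties(xs, ys):
    """Return (rank sum of xs, rank sum of ys, tie term) from one sorted pass.

    The tie term is the sum of t**3 - t over groups of t equal values, the
    quantity used by the usual normal-approximation tie correction.
    """
    tagged = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    sums = [0.0, 0.0]
    tie_term = 0
    i = 0
    n = len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][0] == tagged[i][0]:
            j += 1
        # Positions i..j-1 (0-based) share ranks i+1..j; assign their average.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            sums[tagged[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    return sums[0], sums[1], tie_term


@lru_cache(maxsize=None)
def u_counts(n1, n2):
    """Return a tuple whose index u holds the number of arrangements with U == u.

    Cached by the (n1, n2) sample-size pair. Uses the recurrence
    count(u; n1, n2) = count(u - n2; n1 - 1, n2) + count(u; n1, n2 - 1),
    which is valid only when there are no ties.
    """
    if n1 == 0 or n2 == 0:
        return (1,)
    counts = [0] * (n1 * n2 + 1)
    for u, c in enumerate(u_counts(n1 - 1, n2)):
        counts[u + n2] += c
    for u, c in enumerate(u_counts(n1, n2 - 1)):
        counts[u] += c
    return tuple(counts)
```

With the rank sum R1 of the first sample in hand, U1 follows as R1 - n1 * (n1 + 1) / 2, and an exact tail probability is a partial sum of u_counts(n1, n2) divided by the total number of arrangements.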
Before optimizing:
- Benchmark list input versus tensor input.
- Benchmark dataframe column extraction overhead separately from test computation.
- Identify hot paths with representative sample sizes.
- Preserve fixture compatibility before and after optimization.
- Prefer Nx.Defn only when the algorithm maps cleanly to tensor operations.
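One way to check fixture compatibility before and after an optimization is a tolerance-based comparison of named results, sketched here in Python. The helper name fixture_mismatches, the name-to-float result layout, and the default tolerances are assumptions for illustration, not values taken from the project's fixtures.

```python
import math


def fixture_mismatches(expected, actual, rel_tol=1e-12, abs_tol=1e-15):
    """Return sorted names whose values are missing or differ beyond tolerance.

    Hypothetical helper: `expected` and `actual` map result name -> float.
    """
    bad = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None:
            bad.append(name)
        elif math.isnan(want) or math.isnan(got):
            # NaN never compares close to anything, so require both to be NaN.
            if not (math.isnan(want) and math.isnan(got)):
                bad.append(name)
        elif not math.isclose(want, got, rel_tol=rel_tol, abs_tol=abs_tol):
            bad.append(name)
    return sorted(bad)
```

Running the same comparison against the fixture before and after a change makes it easy to see whether an optimization altered any result, including edge cases that are expected to produce NaN.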
Candidate future work:
- Batched descriptive statistics with axis.
- Batched one-sample and independent t-tests.
- Faster ranking for very large Mann-Whitney samples.
- Faster Student's t CDF approximations for t-test p-values.
- Optional EXLA benchmarks to determine when backend: :tensor becomes a net win.