Compatibility Contract

Statwise is a statistics library for one-dimensional data. Its results are checked against NumPy, SciPy, and Statsmodels fixtures. The public API is Elixir-native; the Python libraries are behavioral references, not API templates.

Shared Input Rules

  • Raw samples are finite numeric lists or one-dimensional Nx.Tensors.
  • Integers are cast to f64.
  • Multidimensional tensors raise ArgumentError.
  • Infinite values (:infinity and :neg_infinity from Nx special values) raise ArgumentError.
  • NaN behavior is controlled with nan_policy.

The inferential test APIs also accept dataframe-style column inputs through columns: and pairs: options:

  • A map of columns may be passed directly, with atom or string column keys.
  • An Explorer.DataFrame may be passed when Explorer is loaded by the caller's application. Explorer is optional and is not a Statwise dependency.
  • Extracted columns must contain raw sample values supported by Statwise.
  • nil column values are treated as :nan and then handled by the selected nan_policy.
  • Column extraction defaults to input: :list. Pass input: :tensor to convert map columns to one-dimensional f64 tensors or, for Explorer columns, to call Explorer.Series.to_tensor/2 when available.

For two-sample tests, columns: [:x, :y] returns one result. Passing pairs: [x: :y, before: :after] returns a map keyed by {left_column, right_column}. For one-sample t-tests, columns: :x returns one result and columns: [:x, :y] returns a map keyed by column.

Tensor-native reductions are opt-in with backend: :tensor. Without this option, tensor inputs are normalized through the same scalar path as lists, which is currently faster for many small and mid-sized operations on Nx.BinaryBackend.

NaN Policy

Supported values:

  • :raise rejects NaN inputs. This is the default.
  • :propagate returns NaN for the statistic and p-value of inferential tests, and NaN values for descriptive and ranking operations where applicable.
  • :omit removes NaNs before computing.

Paired t-tests apply :omit pairwise: a pair is removed when either side is NaN. Independent tests and Mann-Whitney U apply :omit per sample.

If omission leaves too few observations, the function raises the same insufficient-sample error it would raise for a too-small original sample.
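
The three policies, and the difference between per-sample and pairwise omission, can be sketched in plain Elixir. This is an illustration only: `NanPolicySketch`, `apply_policy/2`, and `omit_pairs/2` are hypothetical names rather than Statwise functions, samples are assumed to be lists that mark NaN with the atom `:nan`, and returning a bare `:nan` for `:propagate` stands in for whatever result shape the real functions use.

```elixir
# Illustrative sketch only; NanPolicySketch is a hypothetical module,
# not part of Statwise. NaN is assumed to be represented by the atom :nan.
defmodule NanPolicySketch do
  # :raise rejects any sample that contains NaN.
  def apply_policy(sample, :raise) do
    if Enum.any?(sample, &(&1 == :nan)) do
      raise ArgumentError, "sample contains NaN"
    else
      {:ok, sample}
    end
  end

  # :propagate signals that the result should be NaN.
  def apply_policy(sample, :propagate) do
    if Enum.any?(sample, &(&1 == :nan)), do: :nan, else: {:ok, sample}
  end

  # :omit drops NaNs from this sample only (per-sample omission).
  def apply_policy(sample, :omit) do
    {:ok, Enum.reject(sample, &(&1 == :nan))}
  end

  # Pairwise omission for paired tests: a pair is dropped when either side is NaN.
  def omit_pairs(x, y) when length(x) == length(y) do
    x
    |> Enum.zip(y)
    |> Enum.reject(fn {a, b} -> a == :nan or b == :nan end)
    |> Enum.unzip()
  end
end
```

With `x = [1.0, :nan, 3.0]` and `y = [4.0, 5.0, :nan]`, pairwise omission keeps only the first pair, while per-sample omission would keep two values on each side.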

Descriptive Statistics

Reference: NumPy 2.3.0.

Functions:

Variance defaults to sample variance with correction: 1. Population variance is available with correction: 0.
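
As a reminder of what the correction term does, the sketch below divides the sum of squared deviations by `n - correction`. `VarianceSketch` is a hypothetical module written for this note and is not the Statwise implementation; it assumes a plain list of finite numbers.

```elixir
# Illustrative sketch only; VarianceSketch is not part of Statwise.
defmodule VarianceSketch do
  # Sum of squared deviations from the mean, divided by (n - correction).
  # correction = 1 gives the sample variance, correction = 0 the population variance.
  def variance(sample, correction \\ 1) when length(sample) > correction do
    n = length(sample)
    mean = Enum.sum(sample) / n

    sum_sq =
      sample
      |> Enum.map(fn v -> (v - mean) * (v - mean) end)
      |> Enum.sum()

    sum_sq / (n - correction)
  end
end
```

For `[1, 2, 3, 4]` the sum of squared deviations is 5.0, so the sample variance is 5 / 3 and the population variance is 5 / 4.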

T-Tests

References:

  • Statsmodels 0.14.6 for independent t-tests.
  • SciPy 1.16.0 for one-sample and paired t-tests.

Functions:

Supported alternatives are :two_sided, :greater, and :less.

Independent tests support:

  • variance: :welch
  • variance: :pooled
  • null_difference: float
  • confidence_level: float, defaulting to 0.95
  • effect_size: boolean, defaulting to false

Confidence intervals are returned in result.confidence_interval.

  • One-sample t-tests report intervals for the sample mean, matching SciPy's TtestResult.confidence_interval.
  • Paired t-tests report intervals for the mean paired difference.
  • Independent t-tests report intervals for mean_x - mean_y.
  • One-sided alternatives use one infinite bound, represented as :infinity or :neg_infinity.

When effect_size: true, t-test results include:

  • cohens_d
  • hedges_g

One-sample and paired tests use the sample standard deviation as the standardizer. Independent tests use the pooled standard deviation as the standardizer for both Welch and pooled tests. Hedges' g uses the small-sample correction 1 - 3 / (4 * df - 1).
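
The relationship between the two effect sizes can be sketched for the one-sample case. The module and function names below are hypothetical, `popmean` is an assumed name for the null-hypothesis mean, and using `df = n - 1` in the correction is an assumption for the one-sample case; only the correction formula itself comes from the text above.

```elixir
# Illustrative sketch only; EffectSizeSketch is not part of Statwise.
defmodule EffectSizeSketch do
  # Small-sample correction quoted above: 1 - 3 / (4 * df - 1).
  def hedges_correction(df), do: 1 - 3 / (4 * df - 1)

  # One-sample effect sizes, standardized by the sample standard deviation.
  # Assumes df = n - 1 and a hypothetical popmean argument.
  def one_sample(sample, popmean \\ 0.0) do
    n = length(sample)
    mean = Enum.sum(sample) / n

    sum_sq =
      sample
      |> Enum.map(fn v -> (v - mean) * (v - mean) end)
      |> Enum.sum()

    sd = :math.sqrt(sum_sq / (n - 1))
    d = (mean - popmean) / sd

    %{cohens_d: d, hedges_g: d * hedges_correction(n - 1)}
  end
end
```

With five observations, `df = 4` and the correction factor is 1 - 3 / 15 = 0.8, so Hedges' g is 80% of Cohen's d.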

Zero standard-error cases are explicit:

  • If the observed difference is zero, any of statistic, p_value, and the Welch df that are undefined are returned as :nan.
  • If the observed difference is positive with zero standard error, the statistic is :infinity.
  • If the observed difference is negative with zero standard error, the statistic is :neg_infinity.
  • Pooled independent t-tests keep their finite degrees of freedom when the standard error is zero. Welch independent t-tests return df: :nan when both samples have zero variance, matching Statsmodels' degenerate-output shape.
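
The statistic rules in this list reduce to a small case analysis. The sketch below uses a hypothetical `ZeroSeSketch.statistic/2` that takes the observed difference and the standard error; it covers only the statistic, not p_value or df.

```elixir
# Illustrative sketch only; ZeroSeSketch is not part of Statwise.
defmodule ZeroSeSketch do
  # With a zero standard error, the sign of the observed difference decides the result.
  def statistic(diff, se) when se == 0 do
    cond do
      diff > 0 -> :infinity
      diff < 0 -> :neg_infinity
      true -> :nan
    end
  end

  # Regular case: the usual ratio of difference to standard error.
  def statistic(diff, se), do: diff / se
end
```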

Ranking

Reference: SciPy 1.16.0 rankdata(method="average").

Function:

Only average tie ranking is currently supported. Other tie methods are intentionally deferred.
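
Average tie ranking assigns each group of tied values the mean of the 1-based positions they occupy in sorted order. A minimal sketch, assuming a NaN-free list of numbers and using a hypothetical `RankSketch` module rather than the Statwise function:

```elixir
# Illustrative sketch only; RankSketch is not part of Statwise.
defmodule RankSketch do
  def average_ranks(sample) do
    # Cast to floats first so that 1 and 1.0 are treated as the same value.
    floats = Enum.map(sample, &(&1 / 1))

    # Map each distinct value to the mean of its 1-based sorted positions.
    rank_by_value =
      floats
      |> Enum.sort()
      |> Enum.with_index(1)
      |> Enum.group_by(fn {v, _pos} -> v end, fn {_v, pos} -> pos end)
      |> Map.new(fn {v, positions} -> {v, Enum.sum(positions) / length(positions)} end)

    # Return ranks in the original input order.
    Enum.map(floats, &Map.fetch!(rank_by_value, &1))
  end
end
```

For `[10, 20, 20, 30]`, the two tied values occupy positions 2 and 3 and both receive rank 2.5.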

Mann-Whitney U

Reference: SciPy 1.16.0 mannwhitneyu.

Function:

Supported alternatives are :two_sided, :greater, and :less.

Supported methods:

  • :asymptotic
  • :exact
  • :auto

As in SciPy, an explicit method: :exact does not apply a tie correction. :auto uses exact p-values when there are no ties and the smaller sample has at most 8 observations; otherwise it uses the asymptotic normal approximation.
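
The :auto rule can be sketched as a small resolver. `MethodSketch.resolve/3` is a hypothetical helper, and it assumes that "ties" means any repeated value in the combined sample and that both samples are lists of floats.

```elixir
# Illustrative sketch only; MethodSketch is not part of Statwise.
defmodule MethodSketch do
  # :auto picks :exact only when there are no ties and the smaller sample is small.
  def resolve(:auto, x, y) do
    combined = x ++ y
    no_ties = length(Enum.uniq(combined)) == length(combined)
    smaller = min(length(x), length(y))

    if no_ties and smaller <= 8, do: :exact, else: :asymptotic
  end

  # Explicit methods are passed through unchanged.
  def resolve(method, _x, _y) when method in [:exact, :asymptotic], do: method
end
```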

The returned statistic is U1, the U statistic for the first sample. U1 and U2 are also available in result metadata.

Mann-Whitney U results include:

  • effect_size.common_language, computed as U1 / (n_x * n_y).
  • effect_size.rank_biserial, computed as 2 * common_language - 1.
  • effect_size.cliffs_delta, an alias of rank_biserial.
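
The effect-size formulas above can be checked by hand with a short sketch. `MannWhitneyEffectSketch` is a hypothetical module; its `u1/2` uses the standard pair-counting definition of U (one point for each pair where the first-sample value is larger, half a point for each tie), which is an assumption about how U1 is defined here rather than a description of the Statwise implementation.

```elixir
# Illustrative sketch only; MannWhitneyEffectSketch is not part of Statwise.
defmodule MannWhitneyEffectSketch do
  # U for the first sample by pair counting: 1 per pair with a > b, 0.5 per tie.
  def u1(x, y) do
    pairs = for a <- x, b <- y, do: {a, b}

    Enum.reduce(pairs, 0.0, fn
      {a, b}, acc when a > b -> acc + 1.0
      {a, b}, acc when a == b -> acc + 0.5
      _pair, acc -> acc
    end)
  end

  # Effect sizes derived from U1 using the formulas listed above.
  def effect_sizes(u1, n_x, n_y) do
    common_language = u1 / (n_x * n_y)
    rank_biserial = 2 * common_language - 1

    %{
      common_language: common_language,
      rank_biserial: rank_biserial,
      cliffs_delta: rank_biserial
    }
  end
end
```

For `x = [3.0, 4.0, 5.0]` and `y = [1.0, 2.0, 3.5, 6.0]`, 8 of the 12 pairs have the larger value in `x`, so the common-language effect size is 8 / 12 and the rank-biserial value is 2 * (8 / 12) - 1 = 1 / 3.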

Deferred Compatibility Areas

  • Weighted tests.
  • Multidimensional axis behavior.
  • Missing-data policies beyond nan_policy for the current functions.
  • Masked arrays.
  • Permutation tests.
  • Additional rank tie methods.