Chi2fit.Statistics.empirical_cdf

empirical_cdf(data, bin \\ {1.0, 0.5}, algorithm \\ :wilson, correction \\ 0)

Specs

empirical_cdf(
  [{float(), number()}],
  {number(), number()},
  algorithm(),
  integer()
) ::
  {cdf(), bins :: [float()], numbins :: pos_integer(), sum :: float()}

Generates an empirical Cumulative Distribution Function from sample data.

Three parameters determine the resulting empirical distribution:

  1. the algorithm for assigning errors,

  2. the size of the bins,

  3. a correction for limiting the bounds on the 'y' values

When, for example, task effort or duration is modeled, some measured tasks take 0 time. In practice this means that the task effort is between 0 and 1 hour. This is where binning of the data comes in: the bin size controls how values are grouped. A bin size of 1 means that 0 effort is mapped to 1/2 effort (the middle of the bin). This also prevents problems when the fitted distribution cannot cope with an effort of zero.
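The binning idea can be illustrated with a hypothetical helper. It maps a value to the middle of the bin that contains it, given a bin width, with bins starting at 0. The sketch assumes only a bin width. How Chi2fit interprets the two elements of its bin tuple (default {1.0, 0.5}) is not spelled out here, so treat this as an illustration rather than the library's behavior.

```elixir
defmodule BinSketch do
  # Hypothetical helper, NOT part of Chi2fit: maps x to the midpoint of the
  # bin of width `size` that contains it, with bins [0, size), [size, 2*size), ...
  def to_bin_middle(x, size) when size > 0 do
    # x / size always yields a float in Elixir, so Float.floor/1 is safe here
    Float.floor(x / size) * size + size / 2
  end
end
```

With a width of 1.0, the values 0, 0, 1 and 3 map to 0.5, 0.5, 1.5 and 3.5, so no sample is left at exactly zero.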

Two ways of assigning errors are supported: the Wald score and the Wilson score; see [1]. Valid values for the algorithm argument are :wald and :wilson.
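The two error assignments can be sketched as confidence intervals for an observed CDF value p computed from n samples. The formulas below are the textbook Wald and Wilson score intervals. The module ScoreSketch and its functions are hypothetical, not Chi2fit's API, and the way Chi2fit turns these intervals into errors may differ. The default critical value z = 1.0 is an arbitrary choice for illustration (compare the t values listed in [4]).

```elixir
defmodule ScoreSketch do
  # Hypothetical helpers, NOT Chi2fit's API.
  # p: observed proportion (a CDF value), n: number of samples, z: critical value.

  # Wald interval: p +/- z * sqrt(p * (1 - p) / n)
  def wald(p, n, z \\ 1.0) do
    half = z * :math.sqrt(p * (1 - p) / n)
    {p - half, p + half}
  end

  # Wilson score interval:
  #   center = (p + z^2 / (2n)) / (1 + z^2 / n)
  #   half   = z / (1 + z^2 / n) * sqrt(p * (1 - p) / n + z^2 / (4n^2))
  def wilson(p, n, z \\ 1.0) do
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z / denom * :math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    {center - half, center + half}
  end
end
```

At p = 1.0 the Wald interval collapses to {1.0, 1.0}. The Wilson interval for n = 10 keeps a nonzero width, reaching down to roughly 0.91.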

The Handbook of Monte Carlo Methods [1] constructs a cumulative distribution in which the 'y' value for the largest 'x' value in the sample is exactly one (1). Combined with the Wald score, this gives a zero error on the value 1. If the resulting distribution is used to fit a curve, this may give an infinite contribution to the maximum likelihood function. Use the correction argument to make that 'y' value slightly less than 1 and prevent this. The combination of a correction of 0, the :wald algorithm, and the 'linear' model for handling asymmetric errors is especially problematic.
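A minimal sketch of why the correction matters follows. It assumes, hypothetically, that the i-th smallest of n samples is assigned y = i / (n + correction). That formula is only an illustration; the text above does not state exactly how Chi2fit applies its integer correction argument. The module CorrectionSketch is likewise hypothetical.

```elixir
defmodule CorrectionSketch do
  # Hypothetical, NOT Chi2fit's implementation: the i-th smallest of n samples
  # gets y = i / (n + correction); correction = 0 gives the plain empirical CDF.
  def cdf_points(samples, correction \\ 0) do
    n = length(samples)

    samples
    |> Enum.sort()
    |> Enum.with_index(1)
    |> Enum.map(fn {x, i} -> {x, i / (n + correction)} end)
  end

  # Wald half-width z * sqrt(y * (1 - y) / n); it is exactly zero when y == 1.0
  def wald_half_width(y, n, z \\ 1.0), do: z * :math.sqrt(y * (1 - y) / n)
end
```

With correction 0, the last point for the samples [2.5, 0.5, 3.5, 1.5] is {3.5, 1.0} and its Wald half-width is 0.0. A likelihood term that divides by that error then becomes infinite. With correction 1 the last point becomes {3.5, 0.8} and the half-width is positive.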

The algorithm parameter determines how the errors on the 'y' values are computed; the default is :wilson.

References

[1] "Handbook of Monte Carlo Methods" by Kroese, Taimre, and Botev, section 8.4
[2] https://en.wikipedia.org/wiki/Cumulative_frequency_analysis
[3] https://arxiv.org/pdf/1112.2593v3.pdf
[4] https://en.wikipedia.org/wiki/Student%27s_t-distribution
    90% confidence ==> t = 1.645 for many data points (> 120)
    70% confidence ==> t = 1.000