Chi-SquaredFit v0.3.0 Chi2fit.Utilities
Provides various utilities:
- Bootstrapping
- Derivatives
- Creating Cumulative Distribution Functions / Histograms from sample data
- Solving linear, quadratic, and cubic equations
- Autocorrelation coefficients
Summary
Types
Algorithm used to assign errors to frequency data: Wald score and Wilson score
Cumulative Distribution Function
Functions
Calculates the autocorrelation coefficient of a list of observations
Implements bootstrapping procedure as resampling with replacement
Converts a CDF function to a list of data points
Calculates the partial derivative of a function and returns the value
Generates an empirical Cumulative Distribution Function from sample data
Calculates and returns the error associated with a list of observables
Calculates the empirical CDF from a sample
Calculates the jacobian of the function at the point x
Converts a list of numbers to frequency data
Converts the input so that the result is a Puiseux diagram, that is, a strictly convex shape
Reads data from a file specified by filename and returns a stream with the data parsed as floats
Returns the real roots of polynomials of degree 1, 2, and 3 as a list
Returns a cumulative distribution corresponding to the input data
Converts a list of x,y data into a Cumulative Distribution function
Types
Algorithm used to assign errors to frequency data: Wald score and Wilson score.
Cumulative Distribution Function
Functions
Calculates the autocorrelation coefficient of a list of observations.
For available options see fft/2. Returns a list of the autocorrelation coefficients.
Example
iex> auto [1,2,3]
[14.0, 7.999999999999999, 2.999999999999997]
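The values in the example are consistent with unnormalized sums of lagged products: 14 = 1·1 + 2·2 + 3·3, 8 = 1·2 + 2·3, and 3 = 1·3. The following is a minimal sketch of that direct summation, assuming this reading of the example is right. The module `AutoSketch` is hypothetical and not part of Chi2fit; the reference to fft/2 suggests the library itself uses a Fourier transform, which would explain the rounding noise in its output.

```elixir
defmodule AutoSketch do
  # Hypothetical helper, not part of Chi2fit:
  # direct O(n^2) summation of lagged products.
  def auto([]), do: []

  def auto(list) do
    for lag <- 0..(length(list) - 1) do
      list
      # Pair each element with the element `lag` positions later;
      # Enum.zip/2 stops at the shorter list.
      |> Enum.zip(Enum.drop(list, lag))
      |> Enum.reduce(0.0, fn {a, b}, acc -> acc + a * b end)
    end
  end
end

IO.inspect(AutoSketch.auto([1, 2, 3]))
```

For long series the direct sum becomes slow, which is the usual reason for preferring an FFT-based approach.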
bootstrap(total :: integer, data :: [number], fun :: ([number], integer -> number), options :: Keyword.t) :: [number]
Implements bootstrapping procedure as resampling with replacement.
It supports saving intermediate results to a file using :dets. Use the options :safe and :filename (see below).
Arguments:
`total` - Total number of resamplings to perform
`data` - The sample data
`fun` - The function to evaluate
`options` - A keyword list of options, see below.
Options
`:safe` - Whether to save intermediate results to a file, so as to support continuation when it is interrupted.
Valid values are `:safe` and `:cont`.
`:filename` - The filename to use for storing intermediate results
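Resampling with replacement can be illustrated with a minimal sketch. The module `BootstrapSketch` is hypothetical: it omits the :dets persistence and the options, and its statistic takes only the resampled data, whereas the `fun` documented above also receives an integer.

```elixir
defmodule BootstrapSketch do
  # Hypothetical helper, not the Chi2fit implementation:
  # no :dets persistence, and `fun` receives only the resampled data.
  def bootstrap(total, data, fun) when total > 0 and data != [] do
    size = length(data)

    for _ <- 1..total do
      # Draw `size` elements uniformly at random, with replacement.
      resample = for _ <- 1..size, do: Enum.random(data)
      fun.(resample)
    end
  end
end

mean = fn xs -> Enum.sum(xs) / length(xs) end
estimates = BootstrapSketch.bootstrap(200, [1.0, 2.0, 3.0, 4.0, 5.0], mean)
IO.inspect(length(estimates))
```

The spread of `estimates` approximates the sampling uncertainty of the statistic, here the mean.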
convert_cdf({cdf, range :: [float, ...]}) :: [{float, float, float, float}]
Converts a CDF function to a list of data points.
Example
iex> convert_cdf {fn x->{:math.exp(-x),:math.exp(-x)/16,:math.exp(-x)/4} end, [1,4]}
[{1, 0.6321205588285577, 0.9080301397071394, 0.9770075349267848},
{2, 0.8646647167633873, 0.9661661791908468, 0.9915415447977117},
{3, 0.950212931632136, 0.987553232908034, 0.9968883082270085},
{4, 0.9816843611112658, 0.9954210902778164, 0.9988552725694542}]
der([float | {float, integer}], ([float] -> float), Keyword.t) :: float
Calculates the partial derivative of a function and returns the value.
Examples
The function value at a point:
iex> der([3.0], fn [x]-> x*x end) |> Float.round(10)
9.0
The first derivative of a function at a point:
iex> der([{3.0,1}], fn [x]-> x*x end) |> Float.round(10)
6.0
The second derivative of a function at a point:
iex> der([{3.0,2}], fn [x]-> x*x end) |> Float.round(10)
2.0
Partial derivatives with respect to two variables:
iex> der([{2.0,1},{3.0,1}], fn [x,y] -> 3*x*x*y end) |> Float.round(10)
12.0
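Numerical derivatives like these are commonly computed with central finite differences. The sketch below shows that technique for a function of one variable. The module `DerSketch` and the step sizes are assumptions for illustration; the source does not state which scheme `der` uses.

```elixir
defmodule DerSketch do
  # Hypothetical helpers; the step sizes are assumptions, not Chi2fit defaults.

  # First derivative: (f(x + h) - f(x - h)) / 2h
  def first(f, x, h \\ 1.0e-5), do: (f.(x + h) - f.(x - h)) / (2 * h)

  # Second derivative: (f(x + h) - 2 f(x) + f(x - h)) / h^2
  def second(f, x, h \\ 1.0e-4), do: (f.(x + h) - 2 * f.(x) + f.(x - h)) / (h * h)
end

square = fn x -> x * x end
IO.inspect(DerSketch.first(square, 3.0))
IO.inspect(DerSketch.second(square, 3.0))
```

The results approximate 6.0 and 2.0 up to rounding error, which is why the examples above round the output.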
Generates an empirical Cumulative Distribution Function from sample data.
Three parameters determine the resulting empirical distribution:
1) algorithm for assigning errors,
2) the size of the bins,
3) a correction for limiting the bounds on the ‘y’ values
When modeling, for example, task effort or duration, some measured tasks have 0 time. What is actually meant is that the task effort is between 0 and 1 hour. This is where binning of the data happens. Specify the bin size to control how this is done. A bin size of 1 means that 0 effort is mapped to 1/2 effort (the middle of the bin). This also prevents problems when the fitted distribution cannot cope with an effort of zero.
In the handbook of MCMC [1] a cumulative distribution is constructed. For the largest ‘x’ value
in the sample, the ‘y’ value is exactly one (1). In combination with the Wald score this
gives zero errors on the value ‘1’. If the resulting distribution is used to fit a curve
this may give an infinite contribution to the maximum likelihood function.
Use the correction number to have a ‘y’ value of slightly less than 1 to prevent this from
happening.
The combination of a correction of 0, the algorithm :wald, and the ‘linear’ model for handling asymmetric errors is especially problematic.
The algorithm parameter determines how the errors on the ‘y’ value are determined. Currently supported values include :wald and :wilson.
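The zero-error problem at ‘y’ = 1 can be reproduced with the textbook interval formulas. This is a minimal sketch, assuming the standard Wald and Wilson score intervals with a critical value of 1.96. The module `ScoreSketch` is hypothetical, and the library's exact expressions and critical value may differ.

```elixir
defmodule ScoreSketch do
  # Hypothetical helpers using the textbook formulas; Chi2fit's exact
  # expressions and critical value may differ.

  # Wald interval: p ± z * sqrt(p * (1 - p) / n)
  def wald(p, n, z \\ 1.96) do
    half = z * :math.sqrt(p * (1 - p) / n)
    {p - half, p + half}
  end

  # Wilson score interval, which keeps a nonzero width at p = 0 and p = 1.
  def wilson(p, n, z \\ 1.96) do
    z2 = z * z
    centre = (p + z2 / (2 * n)) / (1 + z2 / n)
    half = z * :math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / (1 + z2 / n)
    {centre - half, centre + half}
  end
end

# At p = 1 the Wald interval has zero width, the Wilson interval does not.
IO.inspect(ScoreSketch.wald(1.0, 5))
IO.inspect(ScoreSketch.wilson(1.0, 5))
```

A correction that keeps the largest ‘y’ value slightly below 1 avoids the zero-width case for the Wald score.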
References
[1] “Handbook of Monte Carlo Methods” by Kroese, Taimre, and Botev, section 8.4
error([{gamma :: number, k :: pos_integer}], :initial_sequence_method) :: {number, number}
Calculates and returns the error associated with a list of observables.
Usually these are the result of a Markov Chain Monte Carlo simulation run.
The only supported method is the so-called Initial Sequence Method. See section 1.10.2 (Initial sequence method) of [1].
Input is a list of autocorrelation coefficients. This may be the output of auto/2.
References
[1] ‘Handbook of Markov Chain Monte Carlo’
Calculates the empirical CDF from a sample.
Convenience function that chains make_histogram/2 and empirical_cdf/3.
Calculates the Jacobian of the function at the point x.
Examples
iex> jacobian([2.0,3.0], fn [x,y] -> x*y end) |> Enum.map(&Float.round(&1,10))
[3.0, 2.0]
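For a scalar function of several variables, the result is the list of first partial derivatives, one per coordinate. A minimal sketch using one central difference per coordinate follows. The module `JacobianSketch` and the step size `h` are assumptions, not the library's implementation.

```elixir
defmodule JacobianSketch do
  # Hypothetical helper: one central difference per coordinate;
  # h is an assumed step size.
  def jacobian(x, f, h \\ 1.0e-5) when x != [] do
    for i <- 0..(length(x) - 1) do
      # Shift only the i-th coordinate up and down by h.
      up = List.update_at(x, i, &(&1 + h))
      down = List.update_at(x, i, &(&1 - h))
      (f.(up) - f.(down)) / (2 * h)
    end
  end
end

[2.0, 3.0]
|> JacobianSketch.jacobian(fn [x, y] -> x * y end)
|> IO.inspect()
```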
make_histogram([number], number) :: %{required(number) => pos_integer}
Converts a list of numbers to frequency data.
The data is divided into bins of size binsize and the number of data points inside each bin is counted. A map is returned with the bin’s index as key and the number of data points in that bin as value.
Examples
iex> make_histogram [1,2,3]
[{1, 1}, {2, 1}, {3, 1}]
iex> make_histogram [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9]
[{1, 1}, {2, 1}, {3, 2}, {4, 3}, {5, 3}, {6, 2}, {7, 1}, {8, 1}, {9, 1}]
iex> make_histogram [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9], 3
[{0, 2}, {1, 8}, {2, 4}, {3, 1}]
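The examples are consistent with a bin index of `trunc(x / binsize)`. The sketch below reproduces them under that assumption. The module `HistogramSketch` is hypothetical, and it returns a sorted list of tuples to match the example output.

```elixir
defmodule HistogramSketch do
  # Hypothetical helper: bin index = trunc(x / binsize),
  # which is only appropriate for non-negative data.
  def make_histogram(data, binsize \\ 1) do
    data
    |> Enum.group_by(&trunc(&1 / binsize))
    |> Enum.map(fn {bin, members} -> {bin, length(members)} end)
    |> Enum.sort()
  end
end

IO.inspect(HistogramSketch.make_histogram([1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5, 6, 7, 8, 9], 3))
# prints [{0, 2}, {1, 8}, {2, 4}, {3, 1}]
```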
puiseaux([number], [number], boolean) :: [number]
Converts the input so that the result is a Puiseux diagram, that is, a strictly convex shape.
Examples
iex> puiseaux [1]
[1]
iex> puiseaux [5,3,3,2]
[5, 3, 2.5, 2]
Reads data from a file specified by filename and returns a stream with the data parsed as floats.
It expects one data point per line and removes entries that:
- are not floats, or
- are smaller than zero (0)
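The filtering described above can be sketched as a lazy stream pipeline. The module `ReadSketch` and the example file name are hypothetical. The sketch assumes that a line counts as a float when `Float.parse/1` consumes it completely, which may be more lenient than the library.

```elixir
defmodule ReadSketch do
  # Hypothetical helper: one value per line; lines that do not parse
  # completely as a number, or that are negative, are dropped.
  def read_data(filename) do
    filename
    |> File.stream!()
    |> Stream.map(&String.trim/1)
    |> Stream.flat_map(fn line ->
      case Float.parse(line) do
        {value, ""} when value >= 0.0 -> [value]
        _ -> []
      end
    end)
  end
end

# Write a small example file to the temporary directory and read it back.
path = Path.join(System.tmp_dir!(), "read_sketch_example.txt")
File.write!(path, "1.5\nabc\n-2.0\n3.25\n")
IO.inspect(ReadSketch.read_data(path) |> Enum.to_list())
# prints [1.5, 3.25]
```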
Returns the real roots of polynomials of degree 1, 2, and 3 as a list.
Examples
Solve `2.0*x + 5.0 = 0`
iex> solve [2.0,5.0]
[-2.5]
iex> solve [2.0,-14.0,24.0]
[4.0,3.0]
iex> solve [1.0,0.0,5.0,6.0]
[-0.9999999999999999]
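The linear and quadratic cases follow from the closed-form formulas. The sketch below covers only those two degrees and assumes, as the examples above suggest, that coefficients are listed from the highest power down. The module `SolveSketch` is hypothetical; the cubic case is omitted.

```elixir
defmodule SolveSketch do
  # Hypothetical helper covering degree 1 and 2 only; coefficients are
  # listed from the highest power down, as in the examples above.

  # a*x + b = 0
  def solve([a, b]) when a != 0, do: [-b / a]

  # a*x^2 + b*x + c = 0, real roots only
  def solve([a, b, c]) when a != 0 do
    disc = b * b - 4 * a * c

    cond do
      disc < 0 ->
        []

      disc == 0 ->
        [-b / (2 * a)]

      true ->
        root = :math.sqrt(disc)
        [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    end
  end
end

IO.inspect(SolveSketch.solve([2.0, 5.0]))
IO.inspect(SolveSketch.solve([2.0, -14.0, 24.0]))
```

A negative discriminant yields an empty list, since only real roots are returned.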
to_cdf([number], number, number) :: [{float, float}]
Returns a cumulative distribution corresponding to the input data.
Example
iex> to_cdf [1,2,3,4,5], 0.5, 1
[{0.5, 0.0}, {1.5, 1.0}, {2.5, 2.0}, {3.5, 3.0}, {4.5, 4.0}, {5.5, 5.0}]
iex> to_cdf [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9], 0.5, 2
[{0.5, 0.0}, {2.5, 2.0}, {4.5, 7.0}, {6.5, 12.0}, {8.5, 14.0}, {10.5, 15.0}]
Converts a list of x,y data into a Cumulative Distribution function.
Supports two ways of assigning errors: Wald score or Wilson score. See [1]. Valid values for the algorithm argument are :wald or :wilson.
The second argument numpoints specifies the size of the original sample.
The returned function returns tuples for its argument where the first element is the actual value of the function and the second and third elements give the minimum and maximum confidence bounds.
References
[1] See https://en.wikipedia.org/wiki/Cumulative_frequency_analysis
[2] https://arxiv.org/pdf/1112.2593v3.pdf
[3] See https://en.wikipedia.org/wiki/Student%27s_t-distribution:
90% confidence ==> t = 1.645 for many data points (> 120)
70% confidence ==> t = 1.000
Example
iex(1)> fun = [1,2,3,4,5]
...> |> to_cdf(0.5, 1)
...> |> Enum.map(fn {x,y}->{x,y/5} end)
...> |> to_cdf_fun(5,:wilson)
iex(2)> fun.(2.2)
{0.2, 0.027223328910987405, 0.5233625708498564}