Chi-SquaredFit v1.0.0-beta.8 Chi2fit.Utilities
Provides various utilities:
- Bootstrapping
- Derivatives
- Creating Cumulative Distribution Functions / Histograms from sample data
- Solving linear, quadratic, and cubic equations
- Autocorrelation coefficients
Summary
Types
Algorithm used to assign errors to frequency data: Wald score and Wilson score
Average and standard deviation (error)
Cumulative Distribution Function
Binned data with error bounds specified through low and high values
Supported numerical integration methods
Functions
Adjusts the times to working hours and/or work days
Walks a map structure while applying the function fun
Pretty-prints a nested array-like structure (list or tuple) as a table
Calculates the autocorrelation coefficient of a list of observations
Calculates the systematic errors for bins due to uncertainty in assigning data to bins
Implements bootstrapping procedure as resampling with replacement
Converts a CDF function to a list of data points
Reads CSV data, extracts one column, and returns it as a list of NaiveDateTime
Generates a Cullen & Frey plot for the sample data
Extracts data point with standard deviation from Cullen & Frey plot data
Calculates the partial derivative of a function and returns the value
Displays results of the function Chi2fit.Fit.chi2probe/4
Displays results of the function Chi2fit.Fit.chi2fit/4
Pretty prints subsequences
Generates an empirical Cumulative Distribution Function from sample data
Calculates and returns the error associated with a list of observables
Forecasts how many time periods are needed to complete size items
Returns a function for forecasting the duration to complete a number of items
Returns a function for forecasting the number of completed items in a number of periods
Calculates the empirical CDF from a sample
Numerical integration providing Gauss and Romberg types
Returns a Stream that generates a stream of dates
Calculates the jacobian of the function at the point x
Converts a list of numbers to frequency data
Maps the date to weekdays such that weekends are eliminated; it does so with respect to a given Saturday
Maps the time of a day into the working hour period
Basic Monte Carlo simulation to repeatedly run a simulation multiple times
Calculates the nth moment of the sample
Calculates the nth centralized moment of the sample
Calculates the nth centralized moment of the sample
Calculates the nth normalized moment of the sample
Calculates the nth normalized moment of the sample
Calculates the nth normalized moment of the sample
Newton-Fourier method for locating roots and returning the interval where the root is located
Converts the input so that the result is a Puiseaux diagram, that is a strict convex shape
Outputs and formats the errors that result from a call to Chi2fit.Fit.chi2/4
Reads data from a file specified by filename and returns a stream with the data parsed as floats
Resamples the subsequences of numbers contained in the list as determined by analyze/2
Richardson extrapolation
Examples
Counts the number of dates (datelist) that lie between consecutive dates in intervals and returns the result as a list of numbers
Returns a list of time differences (assumes an ordered list as input)
Converts raw data to binned data with (asymmetrical) errors
Unzips lists of 1-, 2-, 3-, 4-, and 5-tuples
Types
Algorithm used to assign errors to frequency data: Wald score and Wilson score.
Average and standard deviation (error)
Cumulative Distribution Function
Binned data with error bounds specified through low and high values
method() :: :gauss | :gauss2 | :gauss3 | :romberg | :romberg2 | :romberg3
Supported numerical integration methods
Functions
adjust_times(Enumerable.t(), options :: Keyword.t()) :: Enumerable.t()
Adjusts the times to working hours and/or work days.
Options
`workhours` - a 2-tuple containing the starting and ending hours of the work day (defaults
to {8.0, 18.0})
`epoch` - the epoch to which all data elements are relative (defaults to 1970-01-01)
`saturday` - number of days since the epoch that corresponds to a Saturday (defaults
to 9)
`correct` - whether to correct the times for working hours and weekdays; possible values
`:worktime`, `:weekday`, `:"weekday+worktime"` (defaults to `false`)
Walks a map structure while applying the function fun.
Pretty-prints a nested array-like structure (list or tuple) as a table.
Calculates the autocorrelation coefficient of a list of observations.
The implementation uses the discrete Fast Fourier Transform to calculate the autocorrelation.
For available options see Chi2fit.FFT.fft/2. Returns a list of the autocorrelation coefficients.
Example
iex> auto [1,2,3]
[14.0, 7.999999999999999, 2.999999999999997]
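The example output above is consistent with unnormalized sums of lagged products (lag 0: 1+4+9, lag 1: 2+6, lag 2: 3). As a minimal sketch of that reading, the following computes the same sums directly in O(n^2) time instead of using the FFT. The module name `AutoSketch` is hypothetical and is not part of Chi2fit.

```elixir
defmodule AutoSketch do
  # Direct evaluation: for each lag k, sum x[i] * x[i + k].
  # This is an illustration only; the library uses an FFT instead.
  def auto([]), do: []

  def auto(list) do
    n = length(list)

    for k <- 0..(n - 1) do
      list
      # Enum.zip stops at the shorter list, so only overlapping pairs remain.
      |> Enum.zip(Enum.drop(list, k))
      |> Enum.map(fn {a, b} -> a * b end)
      |> Enum.sum()
    end
  end
end

AutoSketch.auto([1, 2, 3])
# returns [14, 8, 3]
```

The FFT-based result agrees with these integers up to floating-point rounding.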
binerror( data :: [number()], noise_fun :: (Enumerable.t() -> Enumerable.t()), options :: Keyword.t() ) :: [{bin :: number(), avg :: number(), error :: number()}]
Calculates the systematic errors for bins due to uncertainty in assigning data to bins.
Options
`bin` - the size of bins to use (defaults to 1)
`iterations` - the number of iterations to use to estimate the error due to noise (defaults to 100)
Implements bootstrapping procedure as resampling with replacement.
It supports saving intermediate results to a file using :dets. Use the options :safe and :filename (see below).
Arguments:
`total` - Total number of resamplings to perform
`data` - The sample data
`fun` - The function to evaluate
`options` - A keyword list of options, see below.
Options
`:safe` - Whether to save intermediate results to a file, so as to support continuation when the run is interrupted.
Valid values are `:safe` and `:cont`.
`:filename` - The filename to use for storing intermediate results
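Resampling with replacement can be sketched as follows. This is an illustration under assumptions, not the library's implementation: it leaves out the :dets persistence, and the module name `BootstrapSketch` and the choice to return a plain list of evaluated values are hypothetical.

```elixir
defmodule BootstrapSketch do
  # Draws `total` resamples of the same size as `data`, with replacement,
  # and evaluates `fun` on each resample.
  def bootstrap(total, data, fun) when total > 0 and data != [] do
    size = length(data)
    # A tuple gives constant-time random access by index.
    indexed = List.to_tuple(data)

    for _ <- 1..total do
      # :rand.uniform(size) returns an integer in 1..size.
      resample = for _ <- 1..size, do: elem(indexed, :rand.uniform(size) - 1)
      fun.(resample)
    end
  end
end

BootstrapSketch.bootstrap(100, [1, 2, 3, 4], &Enum.sum/1)
```

Each element of the result is the statistic evaluated on one resample, so the spread of the list estimates the sampling variability of that statistic.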
Converts a CDF function to a list of data points.
Example
iex> convert_cdf {fn x->{:math.exp(-x),:math.exp(-x)/16,:math.exp(-x)/4} end, {1,4}}
[{1, 0.36787944117144233, 0.022992465073215146, 0.09196986029286058},
{2, 0.1353352832366127, 0.008458455202288294, 0.033833820809153176},
{3, 0.049787068367863944, 0.0031116917729914965, 0.012446767091965986},
{4, 0.01831563888873418, 0.0011447274305458862, 0.004578909722183545}]
csv_to_list( csvcata :: Enumerable.t(), key :: String.t(), options :: Keyword.t() ) :: [NaiveDateTime.t()]
Reads CSV data, extracts one column, and returns it as a list of NaiveDateTime.
Examples
iex> csv = ["Done","2019/05/01","2019/06/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]
iex> csv = ["Done","2019/May/01","2019/Jun/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: "{YYYY}/{Mshort}/{0D}"
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]
iex> csv = ["Done","2019/May/01","2019/06/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: "{YYYY}/{Mshort}/{0D}"
[~N[2019-05-01 00:00:00]]
iex> csv = ["Done","2019/May/01","2019/06/01"] |> Stream.map(& &1)
...> csv_to_list csv, "Done", header?: true, format: ["{YYYY}/{Mshort}/{0D}","{YYYY}/{0M}/{0D}"]
[~N[2019-06-01 00:00:00], ~N[2019-05-01 00:00:00]]
cullen_frey(sample :: [number()], n :: integer()) :: cullenfrey()
Generates a Cullen & Frey plot for the sample data.
The kurtosis returned is the ‘excess kurtosis’.
cullen_frey_point(data :: cullenfrey()) :: {{x :: float(), dx :: float()}, {y :: float(), dy :: float()}}
Extracts data point with standard deviation from Cullen & Frey plot data.
Calculates the partial derivative of a function and returns the value.
Examples
The function value at a point:
iex> der([3.0], fn [x]-> x*x end) |> Float.round(3)
9.0
The first derivative of a function at a point:
iex> der([{3.0,1}], fn [x]-> x*x end) |> Float.round(3)
6.0
The second derivative of a function at a point:
iex> der([{3.0,2}], fn [x]-> x*x end) |> Float.round(3)
2.0
Partial derivatives with respect to two variables:
iex> der([{2.0,1},{3.0,1}], fn [x,y] -> 3*x*x*y end) |> Float.round(3)
12.0
display(device :: IO.device(), Chi2fit.Fit.chi2probe() | avgsd()) :: none()
Displays results of the function Chi2fit.Fit.chi2probe/4
display( device :: IO.device(), hdata :: ecdf(), model :: Chi2fit.Distribution.model(), Chi2fit.Fit.chi2fit(), options :: Keyword.t() ) :: none()
Displays results of the function Chi2fit.Fit.chi2fit/4
display_subsequences( device :: IO.device(), trends :: list(), intervals :: [NaiveDateTime.t()] ) :: none()
Pretty prints subsequences.
Generates an empirical Cumulative Distribution Function from sample data.
Three parameters determine the resulting empirical distribution:
1) algorithm for assigning errors,
2) the size of the bins,
3) a correction for limiting the bounds on the ‘y’ values
When e.g. task effort/duration is modeled, some measured tasks have 0 time. In practice, what is actually meant is that the task effort is between 0 and 1 hour. This is where binning of the data happens. Specify a size of the bins to control how this is done. A bin size of 1 means that 0 effort will be mapped to 1/2 effort (at the middle of the bin). This also prevents problems when the fitted distribution cannot cope with an effort of zero.
Supports two ways of assigning errors: Wald score or Wilson score. See [1]. Valid values for the algorithm argument are :wald or :wilson.
In the handbook of MCMC [1] a cumulative distribution is constructed. For the largest ‘x’ value
in the sample, the ‘y’ value is exactly one (1). In combination with the Wald score this
gives zero errors on the value ‘1’. If the resulting distribution is used to fit a curve
this may give an infinite contribution to the maximum likelihood function.
Use the correction number to have a ‘y’ value of slightly less than 1 to prevent this from
happening.
Especially the combination of 0 correction, algorithm :wald, and ‘linear’ model for
handling asymmetric errors gives problems.
The algorithm parameter determines how the errors on the ‘y’ value are determined. Currently
supported values include :wald and :wilson.
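To illustrate why the Wald score yields zero error at a ‘y’ value of exactly 1 while the Wilson score does not, here is a minimal sketch of the two textbook interval formulas for a proportion `p` estimated from `n` observations. The module name `ScoreSketch`, the parameter `z`, and its default of 1.0 are assumptions made for this example; the library's internal handling may differ.

```elixir
defmodule ScoreSketch do
  # Wald interval: p +/- z * sqrt(p * (1 - p) / n).
  def wald(p, n, z \\ 1.0) do
    half = z * :math.sqrt(p * (1 - p) / n)
    {p - half, p + half}
  end

  # Wilson score interval: the center is shifted toward 1/2 and the
  # width stays positive even when p is exactly 0 or 1.
  def wilson(p, n, z \\ 1.0) do
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z / denom * :math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    {center - half, center + half}
  end
end

ScoreSketch.wald(1.0, 10)
# returns {1.0, 1.0}: a zero-width interval
ScoreSketch.wilson(1.0, 10)
# returns an interval of non-zero width with a lower bound below 1
```

A zero-width error on a data point is what produces the infinite contribution described above, which is why a small correction or the Wilson score avoids the problem.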
References
[1] "Handbook of Monte Carlo Methods" by Kroese, Taimre, and Botev, section 8.4
[2] See https://en.wikipedia.org/wiki/Cumulative_frequency_analysis
[3] https://arxiv.org/pdf/1112.2593v3.pdf
[4] See https://en.wikipedia.org/wiki/Student%27s_t-distribution:
90% confidence ==> t = 1.645 for many data points (> 120)
70% confidence ==> t = 1.000
error([{gamma :: number(), k :: pos_integer()}], :initial_sequence_method) :: {var :: number(), lag :: number()}
Calculates and returns the error associated with a list of observables.
Usually these are the result of a Markov Chain Monte Carlo simulation run.
The only supported method is the so-called Initial Sequence Method. See section 1.10.2 (Initial sequence method) of [1].
Input is a list of autocorrelation coefficients. This may be the output of auto/2.
References
[1] 'Handbook of Markov Chain Monte Carlo'
forecast( fun :: (() -> non_neg_integer()), size :: pos_integer(), tries :: pos_integer(), update :: (() -> number()) ) :: number()
Forecasts how many time periods are needed to complete size items.
Related functions: forecast_duration/2 and forecast_items/2.
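The idea of counting periods until a number of items is complete can be sketched as a single simulation run. This is an assumption-laden illustration, not the behaviour of forecast/4: the module name `ForecastSketch`, the function `periods_needed`, and the `max_periods` safety cap are all hypothetical, and the `tries` and `update` arguments of the real function are not modelled.

```elixir
defmodule ForecastSketch do
  # Repeatedly draws the number of items completed in one period and
  # counts the periods until the running total reaches `size`.
  # `max_periods` (hypothetical) guards against draws that never add up.
  def periods_needed(draw, size, max_periods \\ 10_000) do
    loop(draw, size, 0, 0, max_periods)
  end

  defp loop(_draw, _size, _done, periods, max) when periods >= max, do: :not_reached

  defp loop(draw, size, done, periods, max) do
    done = done + draw.()

    if done >= size do
      periods + 1
    else
      loop(draw, size, done, periods + 1, max)
    end
  end
end

ForecastSketch.periods_needed(fn -> 3 end, 10)
# returns 4
```

With historical throughput data, a draw function such as `fn -> Enum.random(data) end` turns each call into one random run; repeating the call many times gives a distribution of durations.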
forecast_duration(data :: [number()] | (() -> number()), size :: pos_integer()) :: (() -> number())
Returns a function for forecasting the duration to complete a number of items.
This function is a wrapper for forecast/4.
Arguments
`data` - either a data set to base the forecasting on, or a function that returns (random) numbers
`size` - the number of items to complete
forecast_items(data :: [number()] | (() -> number()), periods :: pos_integer()) :: (() -> number())
Returns a function for forecasting the number of completed items in a number of periods.
This function is a wrapper for forecast/4.
Arguments
`data` - either a data set to base the forecasting on, or a function that returns (random) numbers
`periods` - the number of periods to forecast the number of completed items for
Calculates the empirical CDF from a sample.
Convenience function that chains make_histogram/2 and empirical_cdf/3.
Numerical integration providing Gauss and Romberg types.
Returns a Stream that generates a stream of dates.
Calculates the jacobian of the function at the point x.
Examples
iex> jacobian([2.0,3.0], fn [x,y] -> x*y end) |> Enum.map(&Float.round(&1))
[3.0, 2.0]
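One common way to obtain such a result numerically is with central differences, one coordinate at a time. The sketch below shows that technique under assumptions: the module name `JacobianSketch` and the step size `h` are hypothetical, and the library may use a different scheme (for example one combined with Richardson extrapolation).

```elixir
defmodule JacobianSketch do
  # Central difference for each coordinate i:
  #   df/dx_i ~ (f(x + h * e_i) - f(x - h * e_i)) / (2 * h)
  def jacobian(x, fun, h \\ 1.0e-5) do
    x
    |> Enum.with_index()
    |> Enum.map(fn {_xi, i} ->
      up = List.update_at(x, i, &(&1 + h))
      down = List.update_at(x, i, &(&1 - h))
      (fun.(up) - fun.(down)) / (2 * h)
    end)
  end
end

JacobianSketch.jacobian([2.0, 3.0], fn [x, y] -> x * y end)
|> Enum.map(&Float.round(&1, 3))
# returns [3.0, 2.0]
```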
make_histogram([number()], number(), number()) :: [ {non_neg_integer(), pos_integer()} ]
Converts a list of numbers to frequency data.
The data is divided into bins of size binsize and the number of data points inside each bin is counted. The result is a list of {index, count} tuples, pairing each bin's index with the number of data points in that bin.
Examples
iex> make_histogram [1,2,3]
[{1, 1}, {2, 1}, {3, 1}]
iex> make_histogram [1,2,3], 1.0, 0
[{1, 1}, {2, 1}, {3, 1}]
iex> make_histogram [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9]
[{1, 1}, {2, 1}, {3, 2}, {4, 3}, {5, 3}, {6, 2}, {7, 1}, {8, 1}, {9, 1}]
iex> make_histogram [1,2,3,4,5,6,5,4,3,4,5,6,7,8,9], 3, 1.5
[{0, 1}, {1, 6}, {2, 6}, {3, 2}]
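The examples above are consistent with a bin index of ceil((x - offset) / binsize), with values at or below the offset landing in bin 0. The following sketch implements that inferred rule; it is a reading of the examples, not the library's source, and the module name `HistogramSketch` is hypothetical.

```elixir
defmodule HistogramSketch do
  # Assigns each value to bin ceil((x - offset) / binsize), clamped at 0,
  # then counts the values per bin and sorts by bin index.
  def histogram(data, binsize \\ 1.0, offset \\ 0.0) do
    data
    |> Enum.map(fn x -> max(0, ceil((x - offset) / binsize)) end)
    |> Enum.frequencies()
    |> Enum.sort()
  end
end

HistogramSketch.histogram([1, 2, 3, 4, 5, 6, 5, 4, 3, 4, 5, 6, 7, 8, 9], 3, 1.5)
# returns [{0, 1}, {1, 6}, {2, 6}, {3, 2}]
```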
map2weekdays(t :: number(), sat :: pos_integer()) :: number()
Maps the date to weekdays such that weekends are eliminated; it does so with respect to a given Saturday.
Example
iex> map2weekdays(43568.123,43566)
43566.123
iex> map2weekdays(43574.123,43566)
43571.123
Maps the time of a day into the working hour period.
Scales the resulting part of the day between 0..1.
Arguments
`t` - date and time of day as a float; the integer part specifies the day and the fractional part the hour of the day
`startofday` - start of the work day in hours
`endofday` - end of the working day in hours
Example
iex> map2workhours(43568.1, 8, 18)
43568.0
iex> map2workhours(43568.5, 8, 18)
43568.4
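The two examples are consistent with clamping the hour of the day to the working period and rescaling it linearly to 0..1. A minimal sketch of that interpretation follows; the module name `WorkHoursSketch` is hypothetical, and the treatment of times after the end of the work day (clamped to the end) is an assumption.

```elixir
defmodule WorkHoursSketch do
  # Splits `t` into a whole day and an hour of the day, clamps the hour
  # to [startofday, endofday], and rescales it to a fraction in 0..1.
  def map2workhours(t, startofday, endofday) do
    day = trunc(t)
    hour = (t - day) * 24
    clamped = hour |> max(startofday) |> min(endofday)
    day + (clamped - startofday) / (endofday - startofday)
  end
end

WorkHoursSketch.map2workhours(43568.1, 8, 18)
# 02:24 lies before the work day starts, so the result is 43568.0
WorkHoursSketch.map2workhours(43568.5, 8, 18)
# noon is 4 of 10 working hours in, so the result is approximately 43568.4
```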
mc( iterations :: pos_integer(), fun :: (pos_integer() -> float()), options :: Keyword.t() ) :: {avg :: float(), sd :: float(), tries :: [float()]} | {avg :: float(), sd :: float()}
Basic Monte Carlo simulation to repeatedly run a simulation multiple times.
Options
`:collect_all?` - If true, collects data from each individual simulation run and returns this as the third element of the result tuple
moment(sample :: [number()], n :: pos_integer()) :: float()
Calculates the nth moment of the sample.
Example
iex> moment [1,2,3,4,5,6], 1
3.5
momentc(sample :: [number()], n :: pos_integer()) :: float()
Calculates the nth centralized moment of the sample.
Example
iex> momentc [1,2,3,4,5,6], 1
0.0
iex> momentc [1,2,3,4,5,6], 2
2.9166666666666665
momentc(sample :: [number()], n :: pos_integer(), mu :: float()) :: float()
Calculates the nth centralized moment of the sample.
Example
iex> momentc [1,2,3,4,5,6], 2, 3.5
2.9166666666666665
momentn(sample :: [number()], n :: pos_integer()) :: float()
Calculates the nth normalized moment of the sample.
Example
iex> momentn [1,2,3,4,5,6], 1
0.0
iex> momentn [1,2,3,4,5,6], 2
1.0
iex> momentn [1,2,3,4,5,6], 4
1.7314285714285718
momentn(sample :: [number()], n :: pos_integer(), mu :: float()) :: float()
Calculates the nth normalized moment of the sample.
Example
iex> momentn [1,2,3,4,5,6], 4, 3.5
1.7314285714285718
momentn( sample :: [number()], n :: pos_integer(), mu :: float(), sigma :: float() ) :: float()
Calculates the nth normalized moment of the sample.
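The example values for the moment functions above can be reproduced with population (divide-by-N) formulas, where the normalized moment is the centralized moment divided by sigma^n. The sketch below encodes that reading. The module name `MomentSketch` is hypothetical, and the optional `mu` and `sigma` arguments of the library functions are not modelled.

```elixir
defmodule MomentSketch do
  # nth raw moment: mean of x^n.
  def moment(sample, n), do: mean(Enum.map(sample, &:math.pow(&1, n)))

  # nth centralized moment: mean of (x - mu)^n, with mu the sample mean.
  def momentc(sample, n) do
    mu = moment(sample, 1)
    mean(Enum.map(sample, &:math.pow(&1 - mu, n)))
  end

  # nth normalized moment: centralized moment divided by sigma^n,
  # where sigma^2 is the second centralized moment.
  def momentn(sample, n) do
    sigma = :math.sqrt(momentc(sample, 2))
    momentc(sample, n) / :math.pow(sigma, n)
  end

  defp mean(list), do: Enum.sum(list) / length(list)
end

MomentSketch.momentc([1, 2, 3, 4, 5, 6], 2)
# approximately 2.9167
MomentSketch.momentn([1, 2, 3, 4, 5, 6], 4)
# approximately 1.7314
```

The fourth normalized moment is the (non-excess) kurtosis, which is why it is close to the value 1.8 of a continuous uniform distribution for this evenly spaced sample.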
Newton-Fourier method for locating roots and returning the interval where the root is located.
See [https://en.wikipedia.org/wiki/Newton%27s_method#Newton.E2.80.93Fourier_method]
Converts the input so that the result is a Puiseaux diagram, that is a strict convex shape.
Examples
iex> puiseaux [1]
[1]
iex> puiseaux [5,3,3,2]
[5, 3, 2.5, 2]
Outputs and formats the errors that result from a call to Chi2fit.Fit.chi2/4.
Errors are tuples of length 2 and larger: {[min1,max1], [min2,max2], ...}.
Reads data from a file specified by filename and returns a stream with the data parsed as floats.
It expects a single data point on a separate line and removes entries that:
- are not floats, and
- are smaller than zero (0)
Resamples the subsequences of numbers contained in the list as determined by analyze/2.
Richardson extrapolation.
subsequences(Enumerable.t()) :: Enumerable.t()
Examples
iex> subsequences []
[]
iex> subsequences [:a, :b]
[[:a], [:a, :b]]
iex> Stream.cycle([1,2,3]) |> subsequences |> Enum.take(4)
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 1]]
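Judging from these examples, the function yields the successive prefixes of its input and works lazily on infinite streams. A minimal sketch of that behaviour using `Stream.scan/3` follows; the module name `SubsequencesSketch` is hypothetical, and unlike the first example above this version always returns a stream, so list input needs `Enum.to_list/1`.

```elixir
defmodule SubsequencesSketch do
  # Emits the growing prefixes [x1], [x1, x2], ... lazily.
  # Appending with ++ keeps the sketch short; it is O(n) per step.
  def subsequences(enumerable) do
    Stream.scan(enumerable, [], fn item, prefix -> prefix ++ [item] end)
  end
end

Stream.cycle([1, 2, 3]) |> SubsequencesSketch.subsequences() |> Enum.take(4)
# returns [[1], [1, 2], [1, 2, 3], [1, 2, 3, 1]]
```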
throughput(intervals :: Enumerable.t(), datelist :: [NaiveDateTime.t()]) :: [ number() ]
Counts the number of dates (datelist) that lie between consecutive dates in intervals and returns the result as a list of numbers.
time_diff(data :: Enumerable.t(), options :: Keyword.t()) :: Enumerable.t()
Returns a list of time differences (assumes an ordered list as input).
Options
`cutoff` - time differences below the cutoff are changed to the cutoff value (defaults to `0.01`)
`drop?` - whether to drop time differences below the cutoff (defaults to `false`)
Converts raw data to binned data with (asymmetrical) errors.
Unzips lists of 1-, 2-, 3-, 4-, and 5-tuples.