Dataset v0.4.0

Datasets represent labeled tabular data.

Datasets are enumerable:

iex> Dataset.new([{:a, :b, :c},
...>              {:A, :B, :C},
...>              {:i, :ii, :iii},
...>              {:I, :II, :III}],
...>             {"one", "two", "three"})
...> |> Enum.map(&elem(&1, 2))
[:c, :C, :iii, :III]

Datasets are also collectable:

iex> for x <- 0..10, into: Dataset.empty({:n}), do: x
%Dataset{labels: {:n}, rows: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

Functions

Return a dataset with no rows and labels specified by the tuple passed as label. If label is not specified, return an empty dataset with zero columns.

inner_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing an inner join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...> Dataset.inner_join(country_clicks, iso_countries, :country_name,
...>   right: :iso_country,
...>   left: :clicks
...> )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [{"ca", "4"}, {"de", "4"}, {"uk", "11"}, {"us", "13"}]
}
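
To make the join semantics concrete, here is a minimal plain-Elixir sketch of what an inner join computes. It works on bare lists of tuples rather than the Dataset API, and the variable names and trimmed-down data are illustrative assumptions. It is not the library's implementation and does not reproduce its row ordering.

```elixir
# Left rows are {country_name, clicks}; right rows are {iso_country, country_name}.
clicks = [{"United States", "13"}, {"Canada", "4"}, {"France", "2"}]
iso = [{"us", "United States"}, {"ca", "Canada"}, {"sg", "Singapore"}]

# Keep only pairs whose key (the country name) appears on both sides.
# The pin (^name) makes the second generator match the name bound by the first.
joined = for {name, n} <- clicks, {code, ^name} <- iso, do: {code, n}

# joined == [{"us", "13"}, {"ca", "4"}]
```

"France" and "Singapore" drop out because neither has a partner on the other side.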

left_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing a left join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...> Dataset.left_join(country_clicks, iso_countries, :country_name,
...>   right: :iso_country,
...>   left: :clicks
...> )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [{"ca", "4"}, {nil, "2"}, {"de", "4"}, {"uk", "11"}, {"us", "13"}]
}
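
The nil in the result marks a left row with no match on the right. A minimal plain-Elixir sketch of that behavior follows. It assumes keys are unique on the right side and uses illustrative variable names, and it is not the library's implementation.

```elixir
# Left rows are {country_name, clicks}; right rows are {iso_country, country_name}.
clicks = [{"United States", "13"}, {"Canada", "4"}, {"France", "2"}]
iso = [{"us", "United States"}, {"ca", "Canada"}, {"sg", "Singapore"}]

# Index the right side by its key so each left row can look up its partner.
iso_by_name = Map.new(iso, fn {code, name} -> {name, code} end)

# Every left row is kept; Map.get/2 returns nil when no right row matches.
left = Enum.map(clicks, fn {name, n} -> {Map.get(iso_by_name, name), n} end)

# left == [{"us", "13"}, {"ca", "4"}, {nil, "2"}]
```

A right join is the mirror image: iterate over the right rows and look up the left side instead.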

new(rows \\ [], labels \\ nil)

Construct a new dataset. A dataset is a list of tuples. With no arguments, an empty dataset with zero columns is constructed. With one argument, the passed object is interpreted as the rows, and integer labels beginning with 0 are generated; the number of labels is determined by the size of the first tuple in the data. With two arguments, the second is used as the tuple of labels.

iex> Dataset.new()
%Dataset{rows: [], labels: {}}

iex> Dataset.new([{:foo, :bar}, {:eggs, :ham}])
%Dataset{rows: [foo: :bar, eggs: :ham], labels: {0, 1}}

iex> Dataset.new([{0, 0}, {1, 1}, {2, 4}, {3, 9}],
...>             {:x, :x_squared})
%Dataset{labels: {:x, :x_squared}, rows: [{0, 0}, {1, 1}, {2, 4}, {3, 9}]}
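
The default-label rule described above can be sketched in plain Elixir. This is an illustration of the rule only, with assumed variable names, and not the library's code. It also assumes at least one row, since hd/1 raises on an empty list.

```elixir
rows = [{:foo, :bar}, {:eggs, :ham}]

# The width of the first tuple decides how many labels are generated.
width = rows |> hd() |> tuple_size()

# Labels are the integers 0..width-1, packed into a tuple.
labels = 0..(width - 1) |> Enum.to_list() |> List.to_tuple()

# labels == {0, 1}
```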

outer_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing an outer join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...> Dataset.outer_join(country_clicks, iso_countries, :country_name,
...>   right: :iso_country,
...>   left: :clicks
...> )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [
    {"ca", "4"},
    {nil, "2"},
    {"de", "4"},
    {"nl", nil},
    {"sg", nil},
    {"uk", "11"},
    {"us", "13"}
  ]
}
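
An outer join keeps unmatched rows from both sides and fills the missing half with nil. A minimal plain-Elixir sketch follows. It assumes unique keys on each side and uses illustrative maps keyed by country name, and it is not the library's implementation.

```elixir
# Both sides are indexed by the join key (the country name).
clicks = %{"Canada" => "4", "France" => "2"}
iso = %{"Canada" => "ca", "Singapore" => "sg"}

# Take the union of keys from both sides, sorted for a stable order.
keys = (Map.keys(clicks) ++ Map.keys(iso)) |> Enum.uniq() |> Enum.sort()

# Emit one {iso_country, clicks} row per key; a missing side becomes nil.
outer = Enum.map(keys, fn k -> {Map.get(iso, k), Map.get(clicks, k)} end)

# outer == [{"ca", "4"}, {nil, "2"}, {"sg", nil}]
```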

right_join(ds1, ds2, k1, k2 \\ nil, out_labels)

Return the result of performing a right join on datasets ds1 and ds2, using k1 and k2 as the key labels on each respective dataset. The returned dataset will contain columns for each label specified in out_labels, which is a keyword list of the form [left_or_right: label, ...].

iex> iso_countries =
...>   Dataset.new(
...>     [
...>       {"us", "United States"},
...>       {"uk", "United Kingdom"},
...>       {"ca", "Canada"},
...>       {"de", "Germany"},
...>       {"nl", "Netherlands"},
...>       {"sg", "Singapore"}
...>     ],
...>     {:iso_country, :country_name}
...>   )
...>
...> country_clicks =
...>   Dataset.new(
...>     [
...>       {"United States", "13"},
...>       {"United Kingdom", "11"},
...>       {"Canada", "4"},
...>       {"Germany", "4"},
...>       {"France", "2"}
...>     ],
...>     {:country_name, :clicks}
...>   )
...>
...> Dataset.right_join(country_clicks, iso_countries, :country_name,
...>   right: :iso_country,
...>   left: :clicks
...> )
%Dataset{
  labels: {:iso_country, :clicks},
  rows: [
    {"ca", "4"},
    {"de", "4"},
    {"nl", nil},
    {"sg", nil},
    {"uk", "11"},
    {"us", "13"}
  ]
}

Return a dataset with each value in row i and column j moved to row j and column i. The result is labeled with integer indices beginning with zero.

iex> Dataset.new([{:a, :b, :c},
...>              {:A, :B, :C},
...>              {:i, :ii, :iii},
...>              {:I, :II, :III}])
...> |> Dataset.rotate()
%Dataset{
  labels: {0, 1, 2, 3},
  rows: [{:a, :A, :i, :I},
         {:b, :B, :ii, :II},
         {:c, :C, :iii, :III}]
}
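
The same transposition can be sketched in plain Elixir on a bare list of tuples. This only illustrates the operation, with assumed variable names, and is not the library's implementation.

```elixir
rows = [{:a, :b, :c}, {:A, :B, :C}]

# Enum.zip/1 pairs up the i-th elements of every row, yielding one tuple per column.
rotated = rows |> Enum.map(&Tuple.to_list/1) |> Enum.zip()

# One integer label per original row, starting at zero.
labels = 0..(length(rows) - 1) |> Enum.to_list() |> List.to_tuple()

# rotated == [{:a, :A}, {:b, :B}, {:c, :C}]
# labels == {0, 1}
```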

Return a new dataset with columns chosen from the input dataset ds.

iex> Dataset.new([{:a, :b, :c},
...>              {:A, :B, :C},
...>              {:i, :ii, :iii},
...>              {:I, :II, :III}],
...>             {"first", "second", "third"})
...> |> Dataset.select(["second"])
%Dataset{rows: [{:b}, {:B}, {:ii}, {:II}], labels: {"second"}}
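
Column selection amounts to mapping each requested label to its position and picking those elements out of every row. The plain-Elixir sketch below illustrates this with assumed variable names. It keeps columns in the order requested, which may or may not match what Dataset.select does, and it assumes every requested label exists.

```elixir
labels = {"first", "second", "third"}
rows = [{:a, :b, :c}, {:A, :B, :C}]
wanted = ["second", "first"]

# Resolve each requested label to its column index.
label_list = Tuple.to_list(labels)
indexes = Enum.map(wanted, fn l -> Enum.find_index(label_list, &(&1 == l)) end)

# Build each output row from the chosen columns only.
selected =
  Enum.map(rows, fn row ->
    indexes |> Enum.map(&elem(row, &1)) |> List.to_tuple()
  end)

# indexes == [1, 0]
# selected == [{:b, :a}, {:B, :A}]
```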

Return the contents of _ds as a list of maps.
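
No example accompanies this function, so the following plain-Elixir sketch shows the shape of such a conversion on bare labels and rows. The variable names are illustrative assumptions, and it does not call the library or claim to match its output ordering.

```elixir
labels = {:x, :x_squared}
rows = [{0, 0}, {1, 1}, {2, 4}]

keys = Tuple.to_list(labels)

# Pair each label with the value in the same column, one map per row.
maps =
  Enum.map(rows, fn row ->
    keys
    |> Enum.zip(Tuple.to_list(row))
    |> Map.new()
  end)

# maps == [%{x: 0, x_squared: 0}, %{x: 1, x_squared: 1}, %{x: 2, x_squared: 4}]
```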