Explorer.DataFrame (Explorer v0.1.0)

The DataFrame struct and API.

Dataframes are two-dimensional tabular data structures similar to a spreadsheet. For example, the Iris dataset:

iex> Explorer.Datasets.iris()
#Explorer.DataFrame<
  [rows: 150, columns: 5]
  sepal_length float [5.1, 4.9, 4.7, 4.6, 5.0, "..."]
  sepal_width float [3.5, 3.0, 3.2, 3.1, 3.6, "..."]
  petal_length float [1.4, 1.4, 1.3, 1.5, 1.4, "..."]
  petal_width float [0.2, 0.2, 0.2, 0.2, 0.2, "..."]
  species string ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-setosa", "..."]
>

This dataframe has 150 rows and five columns. Each column is an Explorer.Series of the same length (150).

iex> df = Explorer.Datasets.iris()
iex> df["sepal_length"]
#Explorer.Series<
  float[150]
  [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0, ...]
>

Creating dataframes

Dataframes can be created from ordinary Elixir terms. The main ways to do this are Explorer.DataFrame.from_columns/1 and Explorer.DataFrame.from_rows/1. For example:

iex> Explorer.DataFrame.from_columns(a: ["a", "b"], b: [1, 2])
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  a string ["a", "b"]
  b integer [1, 2]
>

IO

Explorer supports reading and writing of delimited files (such as CSV), IPC files, newline-delimited JSON (ndjson) files, and Parquet files.

Verbs

Explorer provides a consistent set of SQL-like verbs, modeled on dplyr, that help solve common data manipulation challenges. These are split into single table verbs and multiple table verbs.

Single table verbs

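Single table verbs are (unsurprisingly) used for manipulating a single dataframe. These are:

  • select/3 for selecting a subset of columns by name
  • filter/2 for subsetting rows using column values
  • mutate/2 for creating and modifying columns
  • arrange/2 for sorting rows by columns
  • distinct/2 for taking distinct rows
  • summarise/2 for summarising each group to a single row
  • pivot_longer/3 and pivot_wider/4 for pivoting data to longer or wider forms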

Each of these combines with Explorer.DataFrame.group_by/2 for operating by group.
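As a minimal sketch of combining a verb with grouping, the snippet below groups the bundled fossil fuels dataset by year and then summarises each group. It assumes Explorer is available as a dependency. The keyword-list shape passed to summarise/2 (a column name mapped to a list of aggregations) is an assumption here, since this overview does not spell it out.

```elixir
alias Explorer.DataFrame

df = Explorer.Datasets.fossil_fuels()

# Group first; verbs applied afterwards operate per group.
by_year = DataFrame.group_by(df, "year")

# Assumption: summarise/2 accepts a keyword list of column => aggregations.
summary = DataFrame.summarise(by_year, total: [:max, :min])

# The summary should hold one row per distinct year.
IO.inspect(DataFrame.groups(by_year), label: "groups")
IO.inspect(DataFrame.n_rows(summary), label: "summary rows")
```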

Multiple table verbs

Multiple table verbs are used for combining tables. These are:

  • join/3 for joining two dataframes
  • concat_rows/1 for combining dataframes row-wise (stacking)

Access

In addition to this "grammar" of data manipulation, you'll find useful functions for slicing and dicing dataframes such as Explorer.DataFrame.pull/2, Explorer.DataFrame.head/2, Explorer.DataFrame.sample/3, Explorer.DataFrame.slice/3, and Explorer.DataFrame.take/2.
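A rough sketch of a few of these helpers on the Iris dataset follows. It assumes Explorer is available as a dependency, and that slice/3 takes an offset followed by a length, which this overview does not state explicitly.

```elixir
alias Explorer.DataFrame

df = Explorer.Datasets.iris()

# First three rows of the dataframe.
first_three = DataFrame.head(df, 3)

# Rows picked by a list of indices.
picked = DataFrame.take(df, [0, 50, 100])

# A contiguous window of rows; assumed argument order is offset, then length.
window = DataFrame.slice(df, 10, 5)

# A single column extracted as a series.
lengths = DataFrame.pull(df, "sepal_length")

IO.inspect(DataFrame.n_rows(first_three), label: "head rows")
IO.inspect(DataFrame.n_rows(picked), label: "take rows")
IO.inspect(DataFrame.n_rows(window), label: "slice rows")
IO.inspect(lengths, label: "pulled series")
```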

Explorer.DataFrame also implements the Access behaviour. This should be familiar to users coming from other languages with dataframes, such as R or Python. For example:

iex> df = Explorer.Datasets.wine()
iex> df["class"]
#Explorer.Series<
  integer[178]
  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
>

Summary

Functions

Arranges/sorts rows by columns.

Combine two or more dataframes row-wise (stack).

Combine two dataframes row-wise.

Takes distinct rows by a selection of columns.

Gets the dtypes of the dataframe columns.

Turns a set of columns to dummy variables.

Subset rows using column values.

Creates a new dataframe from a map or keyword of lists or series.

Creates a new dataframe from a list of maps or keyword lists.

Group the dataframe by one or more variables.

Returns the groups of a dataframe.

Returns the first n rows of the dataframe.

Creates and modifies columns.

Returns the number of columns in the dataframe.

Returns the number of rows in the dataframe.

Gets the names of the dataframe columns.

Pivot data from wide to long.

Extracts a single column as a series.

Similar to read_csv/2 but raises if there is a problem reading the CSV.

Reads a delimited file into a dataframe.

Similar to read_ipc/2 but raises if there is a problem reading the IPC file.

Reads an IPC file into a dataframe.

Reads a file of JSON objects or lists separated by newlines.

Reads a parquet file into a dataframe.

Renames columns.

Renames columns with a function.

Sample rows from a dataframe.

Selects a subset of columns by name.

Gets the shape of the dataframe as a {height, width} tuple.

Subset a continuous set of rows.

Summarise each group to a single row.

Display the DataFrame in a tabular fashion.

Returns the last n rows of the dataframe.

Subset rows with a list of indices.

Writes a dataframe to a binary representation of a delimited file.

Converts a dataframe to a map.

Removes grouping variables.

Similar to write_csv/3 but raises if there is a problem writing the CSV.

Writes a dataframe to a delimited file.

Writes a dataframe to an IPC file.

Writes a dataframe to an ndjson file.

Writes a dataframe to a parquet file.

Types

@type data() :: Explorer.Backend.DataFrame.t()
@type t() :: %Explorer.DataFrame{data: data(), groups: term()}

Functions

@spec arrange(
  df :: t(),
  columns :: String.t() | [String.t() | {:asc | :desc, String.t()}]
) :: t()

Arranges/sorts rows by columns.

Examples

A single column name will sort ascending by that column:

iex> df = Explorer.DataFrame.from_columns(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, "a")
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a string ["a", "b", "c"]
  b integer [3, 1, 2]
>

You can also sort descending:

iex> df = Explorer.DataFrame.from_columns(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, desc: "a")
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a string ["c", "b", "a"]
  b integer [2, 1, 3]
>

Sorting by more than one column sorts them in the order they are entered:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.arrange(df, asc: "total", desc: "country")
#Explorer.DataFrame<
  [rows: 1094, columns: 10]
  year integer [2010, 2012, 2011, 2013, 2014, "..."]
  country string ["ZIMBABWE", "ZIMBABWE", "ZIMBABWE", "ZIMBABWE", "ZIMBABWE", "..."]
  total integer [2121, 2125, 2608, 3184, 3278, "..."]
  solid_fuel integer [1531, 917, 1584, 1902, 2097, "..."]
  liquid_fuel integer [481, 1006, 888, 1119, 1005, "..."]
  gas_fuel integer [0, 0, 0, 0, 0, "..."]
  cement integer [109, 201, 136, 162, 177, "..."]
  gas_flaring integer [0, 0, 0, 0, 0, "..."]
  per_capita float [0.15, 0.15, 0.18, 0.21, 0.22, "..."]
  bunker_fuels integer [7, 9, 8, 9, 9, "..."]
>

Combine two or more dataframes row-wise (stack).

Column names and dtypes must match. The only exception is numeric columns, which can be mixed together and are cast automatically to float columns.

Examples

iex> df1 = Explorer.DataFrame.from_columns(x: [1, 2, 3], y: ["a", "b", "c"])
iex> df2 = Explorer.DataFrame.from_columns(x: [4, 5, 6], y: ["d", "e", "f"])
iex> Explorer.DataFrame.concat_rows([df1, df2])
#Explorer.DataFrame<
  [rows: 6, columns: 2]
  x integer [1, 2, 3, 4, 5, "..."]
  y string ["a", "b", "c", "d", "e", "..."]
>

iex> df1 = Explorer.DataFrame.from_columns(x: [1, 2, 3], y: ["a", "b", "c"])
iex> df2 = Explorer.DataFrame.from_columns(x: [4.2, 5.3, 6.4], y: ["d", "e", "f"])
iex> Explorer.DataFrame.concat_rows([df1, df2])
#Explorer.DataFrame<
  [rows: 6, columns: 2]
  x float [1.0, 2.0, 3.0, 4.2, 5.3, "..."]
  y string ["a", "b", "c", "d", "e", "..."]
>

Combine two dataframes row-wise.

concat_rows(df1, df2) is equivalent to concat_rows([df1, df2]).
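A small sketch of that equivalence, assuming Explorer is available as a dependency: both call forms are applied to the same pair of dataframes and the resulting row counts are compared.

```elixir
alias Explorer.DataFrame

df1 = DataFrame.from_columns(x: [1, 2, 3], y: ["a", "b", "c"])
df2 = DataFrame.from_columns(x: [4, 5, 6], y: ["d", "e", "f"])

# Two-argument form.
pairwise = DataFrame.concat_rows(df1, df2)

# List form.
listed = DataFrame.concat_rows([df1, df2])

# Both results should stack the three rows of df1 on the three rows of df2.
IO.inspect(DataFrame.n_rows(pairwise), label: "pairwise rows")
IO.inspect(DataFrame.n_rows(listed), label: "listed rows")
```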

distinct(df, opts \\ [])

@spec distinct(df :: t(), opts :: Keyword.t()) :: t()

Takes distinct rows by a selection of columns.

Examples

By default, only the unique values of the requested columns are returned:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.distinct(df, columns: ["year", "country"])
#Explorer.DataFrame<
  [rows: 1094, columns: 2]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
>

If keep_all? is set to true, then the first value of each column not in the requested columns will be returned:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.distinct(df, columns: ["year", "country"], keep_all?: true)
#Explorer.DataFrame<
  [rows: 1094, columns: 10]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  total integer [2308, 1254, 32500, 141, 7924, "..."]
  solid_fuel integer [627, 117, 332, 0, 0, "..."]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, "..."]
  gas_fuel integer [74, 7, 14565, 0, 374, "..."]
  cement integer [5, 177, 2598, 0, 204, "..."]
  gas_flaring integer [0, 0, 2623, 0, 3697, "..."]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  bunker_fuels integer [9, 7, 663, 0, 321, "..."]
>

A callback on the dataframe's names can be passed instead of a list (like select/3):

iex> df = Explorer.DataFrame.from_columns(x1: [1, 3, 3], x2: ["a", "c", "c"], y1: [1, 2, 3])
iex> Explorer.DataFrame.distinct(df, columns: &String.starts_with?(&1, "x"))
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  x1 integer [1, 3]
  x2 string ["a", "c"]
>

drop_nil(df, columns_or_column \\ [])

@spec drop_nil(df :: t(), columns_or_column :: [String.t()] | String.t()) :: t()

Drop nil values.

Optionally accepts a subset of columns.

Examples

iex> df = Explorer.DataFrame.from_columns(a: [1, 2, nil], b: [1, nil, 3])
iex> Explorer.DataFrame.drop_nil(df)
#Explorer.DataFrame<
  [rows: 1, columns: 2]
  a integer [1]
  b integer [1]
>
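
To illustrate the optional column subset, here is a minimal sketch (assuming Explorer is available as a dependency) that restricts the nil check to one column and then to an explicit list of columns, as allowed by the spec above.

```elixir
alias Explorer.DataFrame

df = DataFrame.from_columns(a: [1, 2, nil], b: [1, nil, 3])

# Only nils in column "a" cause a row to be dropped; the nil in "b" is kept.
only_a = DataFrame.drop_nil(df, "a")

# A list of column names is accepted as well.
both = DataFrame.drop_nil(df, ["a", "b"])

IO.inspect(DataFrame.n_rows(only_a), label: "rows after drop_nil on a")
IO.inspect(DataFrame.n_rows(both), label: "rows after drop_nil on a and b")
```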
@spec dtypes(df :: t()) :: [atom()]

Gets the dtypes of the dataframe columns.

Examples

iex> df = Explorer.DataFrame.from_columns(floats: [1.0, 2.0], ints: [1, 2])
iex> Explorer.DataFrame.dtypes(df)
[:float, :integer]

Turns a set of columns to dummy variables.

Examples

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "a", "c"], b: ["b", "a", "b", "d"])
iex> Explorer.DataFrame.dummies(df, ["a"])
#Explorer.DataFrame<
  [rows: 4, columns: 3]
  a_a integer [1, 0, 1, 0]
  a_b integer [0, 1, 0, 0]
  a_c integer [0, 0, 0, 1]
>

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "a", "c"], b: ["b", "a", "b", "d"])
iex> Explorer.DataFrame.dummies(df, ["a", "b"])
#Explorer.DataFrame<
  [rows: 4, columns: 6]
  a_a integer [1, 0, 1, 0]
  a_b integer [0, 1, 0, 0]
  a_c integer [0, 0, 0, 1]
  b_a integer [0, 1, 0, 0]
  b_b integer [1, 0, 1, 0]
  b_d integer [0, 0, 0, 1]
>
@spec filter(df :: t(), mask :: Explorer.Series.t() | [boolean()]) :: t()
@spec filter(df :: t(), callback :: function()) :: t()

Subset rows using column values.

Examples

You can pass a mask directly:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.filter(df, Explorer.Series.greater(df["b"], 1))
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  a string ["b", "c"]
  b integer [2, 3]
>

You can combine masks using Explorer.Series.and/2 or Explorer.Series.or/2:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> b_gt = Explorer.Series.greater(df["b"], 1)
iex> a_eq = Explorer.Series.equal(df["a"], "b")
iex> Explorer.DataFrame.filter(df, Explorer.Series.and(a_eq, b_gt))
#Explorer.DataFrame<
  [rows: 1, columns: 2]
  a string ["b"]
  b integer [2]
>
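
The example above uses Explorer.Series.and/2. A comparable sketch with Explorer.Series.or/2 might look like the following, assuming Explorer is available as a dependency; rows are kept when either mask is true.

```elixir
alias Explorer.DataFrame
alias Explorer.Series

df = DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])

# Two independent boolean masks over the same dataframe.
a_eq = Series.equal(df["a"], "a")
b_gt = Series.greater(df["b"], 2)

# Keep rows where either mask is true (the "a" row and the "c" row here).
either = DataFrame.filter(df, Series.or(a_eq, b_gt))

IO.inspect(DataFrame.n_rows(either), label: "rows kept")
```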

A mask can also be a list of booleans:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.filter(df, [false, true, false])
#Explorer.DataFrame<
  [rows: 1, columns: 2]
  a string ["b"]
  b integer [2]
>

Or you can invoke a callback on the dataframe:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.filter(df, &Explorer.Series.greater(&1["b"], 1))
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  a string ["b", "c"]
  b integer [2, 3]
>

from_columns(series, opts \\ [])

@spec from_columns(series :: map() | Keyword.t(), opts :: Keyword.t()) :: t()

Creates a new dataframe from a map or keyword of lists or series.

Lists and series must be the same length. This function applies the same validations as Explorer.Series.from_list/2 to lists, so they must conform to the requirements for making a series.

Options

Examples

iex> Explorer.DataFrame.from_columns(%{floats: [1.0, 2.0], ints: [1, nil]})
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  floats float [1.0, 2.0]
  ints integer [1, nil]
>

iex> Explorer.DataFrame.from_columns([floats: [1.0, 2.0], ints: [1, nil]])
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  floats float [1.0, 2.0]
  ints integer [1, nil]
>

iex> Explorer.DataFrame.from_columns(floats: Explorer.Series.from_list([1.0, 2.0]), ints: Explorer.Series.from_list([1, nil]))
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  floats float [1.0, 2.0]
  ints integer [1, nil]
>

iex> Explorer.DataFrame.from_columns(%{floats: [1.0, 2.0], ints: [1, "wrong"]})
** (ArgumentError) cannot create series "ints": cannot make a series from mismatched types - the value "wrong" does not match inferred dtype integer

from_rows(rows, opts \\ [])

@spec from_rows(rows :: [map()] | Keyword.t(), opts :: Keyword.t()) :: t()

Creates a new dataframe from a list of maps or keyword lists.

Each map in the list should have the same keys, but missing keys will yield a nil value for that row. All values for a given key should be of the same dtype.

Keyword lists should all be in the same order.

Options

Examples

iex> rows = [%{id: 1, name: "José"}, %{id: 2, name: "Christopher"}, %{id: 3, name: "Cristine"}]
iex> Explorer.DataFrame.from_rows(rows)
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  id integer [1, 2, 3]
  name string ["José", "Christopher", "Cristine"]
>

iex> rows = [[id: 1, name: "José"], [id: 2, name: "Christopher"], [id: 3, name: "Cristine"]]
iex> Explorer.DataFrame.from_rows(rows)
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  id integer [1, 2, 3]
  name string ["José", "Christopher", "Cristine"]
>

With a list of maps, missing keys will yield a nil value:

iex> rows = [%{id: 1, name: "José", date: ~D[2001-01-01]}, %{id: 2, date: ~D[1993-01-01]}, %{id: 3, name: "Cristine"}]
iex> Explorer.DataFrame.from_rows(rows)
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  date date [2001-01-01, 1993-01-01, nil]
  id integer [1, 2, 3]
  name string ["José", nil, "Cristine"]
>
@spec group_by(df :: t(), groups_or_group :: [String.t()] | String.t()) :: t()

Group the dataframe by one or more variables.

When the dataframe has grouping variables, operations are performed per group. Explorer.DataFrame.ungroup/2 removes grouping.

Examples

You can group by a single variable:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.group_by(df, "country")
#Explorer.DataFrame<
  [rows: 1094, columns: 10, groups: ["country"]]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  total integer [2308, 1254, 32500, 141, 7924, "..."]
  solid_fuel integer [627, 117, 332, 0, 0, "..."]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, "..."]
  gas_fuel integer [74, 7, 14565, 0, 374, "..."]
  cement integer [5, 177, 2598, 0, 204, "..."]
  gas_flaring integer [0, 0, 2623, 0, 3697, "..."]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  bunker_fuels integer [9, 7, 663, 0, 321, "..."]
>

Or you can group by multiple variables:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.group_by(df, ["country", "year"])
#Explorer.DataFrame<
  [rows: 1094, columns: 10, groups: ["country", "year"]]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  total integer [2308, 1254, 32500, 141, 7924, "..."]
  solid_fuel integer [627, 117, 332, 0, 0, "..."]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, "..."]
  gas_fuel integer [74, 7, 14565, 0, 374, "..."]
  cement integer [5, 177, 2598, 0, 204, "..."]
  gas_flaring integer [0, 0, 2623, 0, 3697, "..."]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  bunker_fuels integer [9, 7, 663, 0, 321, "..."]
>
@spec groups(df :: t()) :: [String.t()]

Returns the groups of a dataframe.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df = Explorer.DataFrame.group_by(df, "country")
iex> Explorer.DataFrame.groups(df)
["country"]
@spec head(df :: t(), nrows :: integer()) :: t()

Returns the first n rows of the dataframe.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.head(df)
#Explorer.DataFrame<
  [rows: 5, columns: 10]
  year integer [2010, 2010, 2010, 2010, 2010]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA"]
  total integer [2308, 1254, 32500, 141, 7924]
  solid_fuel integer [627, 117, 332, 0, 0]
  liquid_fuel integer [1601, 953, 12381, 141, 3649]
  gas_fuel integer [74, 7, 14565, 0, 374]
  cement integer [5, 177, 2598, 0, 204]
  gas_flaring integer [0, 0, 2623, 0, 3697]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37]
  bunker_fuels integer [9, 7, 663, 0, 321]
>

join(left, right, opts \\ [])

@spec join(left :: t(), right :: t(), opts :: Keyword.t()) :: t()

Join two tables.

Join types

  • inner - Returns all rows from left where there are matching values in right, and all columns from left and right.
  • left - Returns all rows from left and all columns from left and right. Rows in left with no match in right will have nil values in the new columns.
  • right - Returns all rows from right and all columns from left and right. Rows in right with no match in left will have nil values in the new columns.
  • outer - Returns all rows and all columns from both left and right. Where there are no matching values, the missing side is filled with nil.
  • cross - Also known as a cartesian join. Returns all combinations of left and right. Can be very computationally expensive.

Options

  • on - The columns to join on. Defaults to overlapping columns. Does not apply to cross join.
  • how - One of the join types (as an atom) described above. Defaults to :inner.

Examples

Inner join:

iex> left = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.from_columns(a: [1, 2, 2], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right)
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a integer [1, 2, 2]
  b string ["a", "b", "b"]
  c string ["d", "e", "f"]
>

Left join:

iex> left = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.from_columns(a: [1, 2, 2], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :left)
#Explorer.DataFrame<
  [rows: 4, columns: 3]
  a integer [1, 2, 2, 3]
  b string ["a", "b", "b", "c"]
  c string ["d", "e", "f", nil]
>

Right join:

iex> left = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.from_columns(a: [1, 2, 4], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :right)
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a integer [1, 2, 4]
  c string ["d", "e", "f"]
  b string ["a", "b", nil]
>

Outer join:

iex> left = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.from_columns(a: [1, 2, 4], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :outer)
#Explorer.DataFrame<
  [rows: 4, columns: 3]
  a integer [1, 2, 4, 3]
  b string ["a", "b", nil, "c"]
  c string ["d", "e", "f", nil]
>

Cross join:

iex> left = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.from_columns(a: [1, 2, 4], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, how: :cross)
#Explorer.DataFrame<
  [rows: 9, columns: 4]
  a integer [1, 1, 1, 2, 2, "..."]
  b string ["a", "a", "a", "b", "b", "..."]
  a_right integer [1, 2, 4, 1, 2, "..."]
  c string ["d", "e", "f", "d", "e", "..."]
>

Inner join with different names:

iex> left = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> right = Explorer.DataFrame.from_columns(d: [1, 2, 2], c: ["d", "e", "f"])
iex> Explorer.DataFrame.join(left, right, on: [{"a", "d"}])
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a integer [1, 2, 2]
  b string ["a", "b", "b"]
  c string ["d", "e", "f"]
>

mutate(df, with_columns)

@spec mutate(df :: t(), with_columns :: map() | Keyword.t()) :: t()

Creates and modifies columns.

Columns are added as keyword list arguments. New variables overwrite existing variables of the same name. Column names are coerced from atoms to strings.

Examples

You can pass in a list directly as a new column:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, c: [4, 5, 6])
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c integer [4, 5, 6]
>

Or you can pass in a series:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> s = Explorer.Series.from_list([4, 5, 6])
iex> Explorer.DataFrame.mutate(df, c: s)
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c integer [4, 5, 6]
>

Or you can invoke a callback on the dataframe:

iex> df = Explorer.DataFrame.from_columns(a: [4, 5, 6], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, c: &Explorer.Series.add(&1["a"], &1["b"]))
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a integer [4, 5, 6]
  b integer [1, 2, 3]
  c integer [5, 7, 9]
>

You can overwrite existing columns:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, a: [4, 5, 6])
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a integer [4, 5, 6]
  b integer [1, 2, 3]
>

Scalar values are repeated to fill the series:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, a: 4)
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a integer [4, 4, 4]
  b integer [1, 2, 3]
>

Including when a callback returns a scalar:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, a: &Explorer.Series.max(&1["b"]))
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a integer [3, 3, 3]
  b integer [1, 2, 3]
>

Alternatively, all of the above work with a map instead of a keyword list:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.mutate(df, %{"c" => [4, 5, 6]})
#Explorer.DataFrame<
  [rows: 3, columns: 3]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
  c integer [4, 5, 6]
>
@spec n_cols(df :: t()) :: integer()

Returns the number of columns in the dataframe.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.n_cols(df)
10
@spec n_rows(df :: t()) :: integer()

Returns the number of rows in the dataframe.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.n_rows(df)
1094
@spec names(df :: t()) :: [String.t()]

Gets the names of the dataframe columns.

Examples

iex> df = Explorer.DataFrame.from_columns(floats: [1.0, 2.0], ints: [1, 2])
iex> Explorer.DataFrame.names(df)
["floats", "ints"]

pivot_longer(df, columns, opts \\ [])

@spec pivot_longer(
  df :: t(),
  columns :: [String.t()] | function(),
  opts :: Keyword.t()
) :: t()

Pivot data from wide to long.

Explorer.DataFrame.pivot_longer/3 "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is Explorer.DataFrame.pivot_wider/4.

The second argument (columns) can be either a list of column names to use or a filter callback on the dataframe's names.

value_cols must all have the same dtype.

Options

  • value_cols - Columns to use for values. May be a filter callback on the dataframe's column names. Defaults to an empty list, using all variables except the columns to pivot.
  • names_to - A string specifying the name of the column to create from the data stored in the column names of the dataframe. Defaults to "variable".
  • values_to - A string specifying the name of the column to create from the data stored in series element values. Defaults to "value".

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pivot_longer(df, ["year", "country"], value_cols: &String.ends_with?(&1, "fuel"))
#Explorer.DataFrame<
  [rows: 3282, columns: 4]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  variable string ["solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "solid_fuel", "..."]
  value integer [627, 117, 332, 0, 0, "..."]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pivot_longer(df, ["year", "country"], value_cols: ["total"])
#Explorer.DataFrame<
  [rows: 1094, columns: 4]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  variable string ["total", "total", "total", "total", "total", "..."]
  value integer [2308, 1254, 32500, 141, 7924, "..."]
>
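
The names_to and values_to options are not exercised by the examples above. A minimal sketch of how they might be used follows, assuming Explorer is available as a dependency; the column names "fuel_type" and "emissions" are illustrative choices, not names used by the library.

```elixir
alias Explorer.DataFrame

df = Explorer.Datasets.fossil_fuels()

# Same pivot as the first example, but with custom names for the two new columns.
long =
  DataFrame.pivot_longer(df, ["year", "country"],
    value_cols: &String.ends_with?(&1, "fuel"),
    names_to: "fuel_type",
    values_to: "emissions"
  )

# Inspect the resulting column names and shape.
IO.inspect(DataFrame.names(long), label: "columns")
IO.inspect(DataFrame.shape(long), label: "shape")
```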

pivot_wider(df, names_from, values_from, opts \\ [])

@spec pivot_wider(
  df :: t(),
  names_from :: String.t(),
  values_from :: String.t(),
  opts :: Keyword.t()
) :: t()

Pivot data from long to wide.

Explorer.DataFrame.pivot_wider/4 "widens" data, increasing the number of columns and decreasing the number of rows. The inverse transformation is Explorer.DataFrame.pivot_longer/3.

Due to a restriction upstream, values_from must be a numeric type.

Options

  • id_cols - A set of columns that uniquely identifies each observation. Defaults to all columns in data except for the columns specified in names_from and values_from. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables. May accept a filter callback or list of column names.
  • names_prefix - String added to the start of every variable name. This is particularly useful if names_from is a numeric vector and you want to create syntactic variable names.

Examples

iex> df = Explorer.DataFrame.from_columns(id: [1, 1], variable: ["a", "b"], value: [1, 2])
iex> Explorer.DataFrame.pivot_wider(df, "variable", "value")
#Explorer.DataFrame<
  [rows: 1, columns: 3]
  id integer [1]
  a integer [1]
  b integer [2]
>
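
As a sketch of the names_prefix option described above (assuming Explorer is available as a dependency), the same dataframe can be widened with a prefix added to each new column name. The prefix "col_" is an arbitrary illustration.

```elixir
alias Explorer.DataFrame

df = DataFrame.from_columns(id: [1, 1], variable: ["a", "b"], value: [1, 2])

# Each new column name is expected to start with the given prefix.
wide = DataFrame.pivot_wider(df, "variable", "value", names_prefix: "col_")

IO.inspect(DataFrame.names(wide), label: "columns")
```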
@spec pull(df :: t(), column :: String.t() | non_neg_integer()) :: Explorer.Series.t()

Extracts a single column as a series.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pull(df, "total")
#Explorer.Series<
  integer[1094]
  [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
>

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.pull(df, 2)
#Explorer.Series<
  integer[1094]
  [2308, 1254, 32500, 141, 7924, 41, 143, 51246, 1150, 684, 106589, 18408, 8366, 451, 7981, 16345, 403, 17192, 30222, 147, 1388, 166, 133, 5802, 1278, 114468, 47, 2237, 12030, 535, 58, 1367, 145806, 152, 152, 72, 141, 19703, 2393248, 20773, 44, 540, 19, 2064, 1900, 5501, 10465, 2102, 30428, 18122, ...]
>

read_csv!(filename, opts \\ [])

@spec read_csv!(filename :: String.t(), opts :: Keyword.t()) :: t()

Similar to read_csv/2 but raises if there is a problem reading the CSV.

read_csv(filename, opts \\ [])

@spec read_csv(filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, t()} | {:error, term()}

Reads a delimited file into a dataframe.

Options

  • delimiter - A single character used to separate fields within a record. (default: ",")
  • dtypes - A list of {"column_name", dtype} tuples. Uses column names as read, not as defined in options. If nil, dtypes are inferred from the first 1000 rows. (default: nil)
  • header? - Does the file have a header of column names as the first row or not? (default: true)
  • max_rows - Maximum number of lines to read. (default: Inf)
  • names - A list of column names. Must match the width of the dataframe. (default: nil)
  • null_character - The string that should be interpreted as a nil value. (default: "NA")
  • skip_rows - The number of lines to skip at the beginning of the file. (default: 0)
  • with_columns - A list of column names to keep. If present, only these columns are read into the dataframe. (default: nil)
  • infer_schema_length - Maximum number of rows read for schema inference. Setting this to nil will do a full table scan and will be slow. (default: 1000)
  • parse_dates - Automatically try to parse dates, datetimes, and times. If parsing fails, columns remain of dtype string.
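
A minimal sketch of calling read_csv/2 with a couple of these options and handling both return shapes is shown below. It assumes Explorer is available as a dependency; the file name and its contents are hypothetical and are written to a temporary directory only so the sketch is self-contained.

```elixir
alias Explorer.DataFrame

# Hypothetical sample file, created here purely for illustration.
path = Path.join(System.tmp_dir!(), "explorer_read_csv_sketch.csv")
File.write!(path, "city;population\nLisbon;545000\nPorto;NA\n")

# Semicolon-delimited fields; "NA" is read as nil.
result = DataFrame.read_csv(path, delimiter: ";", null_character: "NA")

case result do
  {:ok, df} ->
    # On success, inspect what was read from the file.
    IO.inspect(DataFrame.names(df), label: "columns")
    IO.inspect(DataFrame.dtypes(df), label: "dtypes")

  {:error, reason} ->
    # read_csv/2 returns an error tuple rather than raising.
    IO.inspect(reason, label: "could not read CSV")
end
```

When a failure should stop the program instead, read_csv!/2 takes the same arguments and raises.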

read_ipc!(filename, opts \\ [])

@spec read_ipc!(filename :: String.t(), opts :: Keyword.t()) :: t()

Similar to read_ipc/2 but raises if there is a problem reading the IPC file.

read_ipc(filename, opts \\ [])

Reads an IPC file into a dataframe.

Options

  • columns - List with name of columns to be selected. Defaults to all columns.
  • projection - List with the index of columns to be selected. Defaults to all columns.

read_ndjson(filename, opts \\ [])

@spec read_ndjson(filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, t()} | {:error, term()}

Reads a file of JSON objects or lists separated by newlines.

Options

  • with_batch_size - Sets the batch size for reading rows. This value may have a significant impact on performance, so adjust it for your needs (default: 1000).

  • infer_schema_length - Maximum number of rows read for schema inference. Setting this to nil will do a full table scan and will be slow (default: 1000).

@spec read_parquet(filename :: String.t()) :: {:ok, t()} | {:error, term()}

Reads a parquet file into a dataframe.

@spec rename(df :: t(), names :: [String.t() | atom()] | map()) :: t()

Renames columns.

To apply a function to a subset of columns, see rename_with/3.

Examples

You can pass in a list of new names:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, ["c", "d"])
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  c string ["a", "b", "a"]
  d integer [1, 3, 1]
>

Or you can rename individual columns using keyword args:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, a: "first")
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  first string ["a", "b", "a"]
  b integer [1, 3, 1]
>

Or you can rename individual columns using a map:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, %{"a" => "first"})
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  first string ["a", "b", "a"]
  b integer [1, 3, 1]
>

Or if you want to use a function:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "a"], b: [1, 3, 1])
iex> Explorer.DataFrame.rename(df, &(&1 <> "_test"))
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a_test string ["a", "b", "a"]
  b_test integer [1, 3, 1]
>

rename_with(df, callback, columns \\ [])

@spec rename_with(df :: t(), callback :: function(), columns :: list() | function()) ::
  t()

Renames columns with a function.

Examples

If no columns are specified, it will apply the function to all column names:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.rename_with(df, &String.upcase/1)
#Explorer.DataFrame<
  [rows: 1094, columns: 10]
  YEAR integer [2010, 2010, 2010, 2010, 2010, "..."]
  COUNTRY string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  TOTAL integer [2308, 1254, 32500, 141, 7924, "..."]
  SOLID_FUEL integer [627, 117, 332, 0, 0, "..."]
  LIQUID_FUEL integer [1601, 953, 12381, 141, 3649, "..."]
  GAS_FUEL integer [74, 7, 14565, 0, 374, "..."]
  CEMENT integer [5, 177, 2598, 0, 204, "..."]
  GAS_FLARING integer [0, 0, 2623, 0, 3697, "..."]
  PER_CAPITA float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  BUNKER_FUELS integer [9, 7, 663, 0, 321, "..."]
>

A callback can be used to filter the column names that will be renamed, similarly to select/3:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.rename_with(df, &String.trim_trailing(&1, "_fuel"), &String.ends_with?(&1, "_fuel"))
#Explorer.DataFrame<
  [rows: 1094, columns: 10]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  total integer [2308, 1254, 32500, 141, 7924, "..."]
  solid integer [627, 117, 332, 0, 0, "..."]
  liquid integer [1601, 953, 12381, 141, 3649, "..."]
  gas integer [74, 7, 14565, 0, 374, "..."]
  cement integer [5, 177, 2598, 0, 204, "..."]
  gas_flaring integer [0, 0, 2623, 0, 3697, "..."]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  bunker_fuels integer [9, 7, 663, 0, 321, "..."]
>

Or you can just pass in the list of column names you'd like to apply the function to:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.rename_with(df, &String.upcase/1, ["total", "cement"])
#Explorer.DataFrame<
  [rows: 1094, columns: 10]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  TOTAL integer [2308, 1254, 32500, 141, 7924, "..."]
  solid_fuel integer [627, 117, 332, 0, 0, "..."]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, "..."]
  gas_fuel integer [74, 7, 14565, 0, 374, "..."]
  CEMENT integer [5, 177, 2598, 0, 204, "..."]
  gas_flaring integer [0, 0, 2623, 0, 3697, "..."]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  bunker_fuels integer [9, 7, 663, 0, 321, "..."]
>

sample(df, n_or_frac, opts \\ [])

@spec sample(df :: t(), n_or_frac :: number(), opts :: Keyword.t()) :: t()

Sample rows from a dataframe.

If given an integer as the second argument, it will return that many rows. If given a float, it will return that proportion of the dataframe's rows.

Can sample with or without replacement.

Options

  • with_replacement? - If set to true, each sample will be independent and therefore values may repeat. Required to be true for n greater than the number of rows in the dataframe or frac > 1.0. (default: false)
  • seed - An integer to be used as a random seed. If nil, a random value between 1 and 1e12 will be used. (default: nil)

Examples

You can sample N rows:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.sample(df, 3, seed: 100)
#Explorer.DataFrame<
  [rows: 3, columns: 10]
  year integer [2012, 2012, 2013]
  country string ["ZIMBABWE", "NICARAGUA", "NIGER"]
  total integer [2125, 1260, 529]
  solid_fuel integer [917, 0, 93]
  liquid_fuel integer [1006, 1176, 432]
  gas_fuel integer [0, 0, 0]
  cement integer [201, 84, 4]
  gas_flaring integer [0, 0, 0]
  per_capita float [0.15, 0.21, 0.03]
  bunker_fuels integer [9, 18, 19]
>

Or you can sample a proportion of rows:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.sample(df, 0.03, seed: 100)
#Explorer.DataFrame<
  [rows: 33, columns: 10]
  year integer [2013, 2012, 2013, 2012, 2010, "..."]
  country string ["BAHAMAS", "POLAND", "SLOVAKIA", "MOZAMBIQUE", "OMAN", "..."]
  total integer [764, 81792, 9024, 851, 12931, "..."]
  solid_fuel integer [1, 53724, 3657, 11, 0, "..."]
  liquid_fuel integer [763, 17353, 2090, 632, 2331, "..."]
  gas_fuel integer [0, 8544, 2847, 47, 9309, "..."]
  cement integer [0, 2165, 424, 161, 612, "..."]
  gas_flaring integer [0, 6, 7, 0, 679, "..."]
  per_capita float [2.02, 2.12, 1.67, 0.03, 4.39, "..."]
  bunker_fuels integer [167, 573, 34, 56, 1342, "..."]
>
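
The with_replacement? option described above can be sketched as follows. This is a minimal illustration, assuming a small made-up dataframe; it asks for more rows than the dataframe holds, which the option list says is only allowed when sampling with replacement:

```elixir
alias Explorer.DataFrame

# A tiny three-row dataframe (hypothetical data, for illustration only).
df = DataFrame.from_columns(a: [1, 2, 3], b: ["x", "y", "z"])

# Requesting 5 rows from a 3-row dataframe requires with_replacement?: true,
# so at least one row is necessarily repeated in the result.
sampled = DataFrame.sample(df, 5, with_replacement?: true, seed: 100)

# The result should have 5 rows and the original 2 columns.
DataFrame.shape(sampled)
```

Which rows are drawn depends on the seed, so the sketch only inspects the shape of the result.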

select(df, columns, keep_or_drop \\ :keep)

@spec select(
  df :: t(),
  columns :: [String.t() | non_neg_integer()],
  keep_or_drop :: :keep | :drop
) :: t()
@spec select(
  df :: t(),
  callback :: function(),
  keep_or_drop :: :keep | :drop
) :: t()

Selects a subset of columns by name, by index, or with a callback.

Can optionally return all but the named columns if :drop is passed as the last argument.

Examples

You can select columns with a list of names:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, ["a"])
#Explorer.DataFrame<
  [rows: 3, columns: 1]
  a string ["a", "b", "c"]
>

You can also use a range or a list of integers:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
iex> Explorer.DataFrame.select(df, [0, 1])
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
>

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
iex> Explorer.DataFrame.select(df, 0..1)
#Explorer.DataFrame<
  [rows: 3, columns: 2]
  a string ["a", "b", "c"]
  b integer [1, 2, 3]
>

Or you can use a callback function that is called with each column name and returns true for the columns to select:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, &String.starts_with?(&1, "b"))
#Explorer.DataFrame<
  [rows: 3, columns: 1]
  b integer [1, 2, 3]
>

If you pass :drop as the third argument, it will return all but the named columns:

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3])
iex> Explorer.DataFrame.select(df, ["b"], :drop)
#Explorer.DataFrame<
  [rows: 3, columns: 1]
  a string ["a", "b", "c"]
>

iex> df = Explorer.DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3], c: [4, 5, 6])
iex> Explorer.DataFrame.select(df, ["a", "b"], :drop)
#Explorer.DataFrame<
  [rows: 3, columns: 1]
  c integer [4, 5, 6]
>
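
The second @spec above suggests that a callback can also be combined with :drop. A minimal sketch of that combination, assuming a made-up dataframe with a hypothetical column named bb:

```elixir
alias Explorer.DataFrame

# Hypothetical data: two of the three column names start with "b".
df = DataFrame.from_columns(a: ["a", "b", "c"], b: [1, 2, 3], bb: [4, 5, 6])

# Drop every column whose name starts with "b"; only "a" should remain.
remaining = DataFrame.select(df, &String.starts_with?(&1, "b"), :drop)

# Convert to a map to inspect which columns survived.
DataFrame.to_map(remaining)
```

This mirrors the list-based :drop examples, with the callback deciding which names are dropped.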
@spec shape(df :: t()) :: {integer(), integer()}

Gets the shape of the dataframe as a {height, width} tuple.

Examples

iex> df = Explorer.DataFrame.from_columns(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])
iex> Explorer.DataFrame.shape(df)
{3, 2}

slice(df, offset, length)

Subsets a contiguous range of rows.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.slice(df, 1, 2)
#Explorer.DataFrame<
  [rows: 2, columns: 10]
  year integer [2010, 2010]
  country string ["ALBANIA", "ALGERIA"]
  total integer [1254, 32500]
  solid_fuel integer [117, 332]
  liquid_fuel integer [953, 12381]
  gas_fuel integer [7, 14565]
  cement integer [177, 2598]
  gas_flaring integer [0, 2623]
  per_capita float [0.43, 0.9]
  bunker_fuels integer [7, 663]
>

Negative offsets count from the end of the dataframe:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.slice(df, -10, 2)
#Explorer.DataFrame<
  [rows: 2, columns: 10]
  year integer [2014, 2014]
  country string ["UNITED STATES OF AMERICA", "URUGUAY"]
  total integer [1432855, 1840]
  solid_fuel integer [450047, 2]
  liquid_fuel integer [576531, 1700]
  gas_fuel integer [390719, 25]
  cement integer [11314, 112]
  gas_flaring integer [4244, 0]
  per_capita float [4.43, 0.54]
  bunker_fuels integer [30722, 251]
>

If the length would run past the end of the dataframe, the result may be shorter than the length:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.slice(df, -10, 20)
#Explorer.DataFrame<
  [rows: 10, columns: 10]
  year integer [2014, 2014, 2014, 2014, 2014, "..."]
  country string ["UNITED STATES OF AMERICA", "URUGUAY", "UZBEKISTAN", "VANUATU", "VENEZUELA", "..."]
  total integer [1432855, 1840, 28692, 42, 50510, "..."]
  solid_fuel integer [450047, 2, 1677, 0, 204, "..."]
  liquid_fuel integer [576531, 1700, 2086, 42, 28445, "..."]
  gas_fuel integer [390719, 25, 23929, 0, 12731, "..."]
  cement integer [11314, 112, 1000, 0, 1088, "..."]
  gas_flaring integer [4244, 0, 0, 0, 8042, "..."]
  per_capita float [4.43, 0.54, 0.97, 0.16, 1.65, "..."]
  bunker_fuels integer [30722, 251, 0, 10, 1256, "..."]
>

summarise(df, with_columns)

@spec summarise(df :: t(), with_columns :: Keyword.t() | map()) :: t()

Summarise each group to a single row.

Implicitly ungroups.

Supported operations

The following aggregations may be performed:

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df |> Explorer.DataFrame.group_by("year") |> Explorer.DataFrame.summarise(total: [:max, :min], country: [:n_unique])
#Explorer.DataFrame<
  [rows: 5, columns: 4]
  year integer [2010, 2011, 2012, 2013, 2014]
  country_n_unique integer [217, 217, 220, 220, 220]
  total_max integer [2393248, 2654360, 2734817, 2797384, 2806634]
  total_min integer [1, 2, 2, 2, 3]
>

Display the DataFrame in a tabular fashion.

Examples

df = Explorer.Datasets.iris()
Explorer.DataFrame.table(df)

@spec tail(df :: t(), nrows :: integer()) :: t()

Returns the last n rows of the dataframe.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.tail(df)
#Explorer.DataFrame<
  [rows: 5, columns: 10]
  year integer [2014, 2014, 2014, 2014, 2014]
  country string ["VIET NAM", "WALLIS AND FUTUNA ISLANDS", "YEMEN", "ZAMBIA", "ZIMBABWE"]
  total integer [45517, 6, 6190, 1228, 3278]
  solid_fuel integer [19246, 0, 137, 132, 2097]
  liquid_fuel integer [12694, 6, 5090, 797, 1005]
  gas_fuel integer [5349, 0, 581, 0, 0]
  cement integer [8229, 0, 381, 299, 177]
  gas_flaring integer [0, 0, 0, 0, 0]
  per_capita float [0.49, 0.44, 0.24, 0.08, 0.22]
  bunker_fuels integer [761, 1, 153, 33, 9]
>

Subset rows with a list of indices.

Examples

iex> df = Explorer.DataFrame.from_columns(a: [1, 2, 3], b: ["a", "b", "c"])
iex> Explorer.DataFrame.take(df, [0, 2])
#Explorer.DataFrame<
  [rows: 2, columns: 2]
  a integer [1, 3]
  b string ["a", "c"]
>

to_binary(df, opts \\ [])

@spec to_binary(df :: t(), opts :: Keyword.t()) :: String.t()

Writes a dataframe to a binary representation of a delimited file.

Options

  • header? - Should the column names be written as the first line of the file? (default: true)
  • delimiter - A single character used to separate fields within a record. (default: ",")

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df |> Explorer.DataFrame.head() |> Explorer.DataFrame.to_binary()
"year,country,total,solid_fuel,liquid_fuel,gas_fuel,cement,gas_flaring,per_capita,bunker_fuels\n2010,AFGHANISTAN,2308,627,1601,74,5,0,0.08,9\n2010,ALBANIA,1254,117,953,7,177,0,0.43,7\n2010,ALGERIA,32500,332,12381,14565,2598,2623,0.9,663\n2010,ANDORRA,141,0,141,0,0,0,1.68,0\n2010,ANGOLA,7924,0,3649,374,204,3697,0.37,321\n"
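
The header? and delimiter options listed above can be combined. A minimal sketch, assuming a small made-up dataframe (the column names id and label are hypothetical):

```elixir
alias Explorer.DataFrame

# Hypothetical two-row dataframe, for illustration only.
df = DataFrame.from_columns(id: [1, 2], label: ["x", "y"])

# Produce semicolon-delimited text and leave out the header line.
csv = DataFrame.to_binary(df, header?: false, delimiter: ";")

IO.puts(csv)
```

With header?: false the column names should not appear in the output, and fields are separated by the given delimiter instead of a comma.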
@spec to_map(df :: t(), Keyword.t()) :: map()

Converts a dataframe to a map.

By default, the constituent series of the dataframe are converted to Elixir lists.

Options

  • :convert_series - Convert the series to lists (default: true)
  • :atom_keys - Configure if the resultant map should have atom keys. (default: false)

Examples

iex> df = Explorer.DataFrame.from_columns(floats: [1.0, 2.0], ints: [1, nil])
iex> Explorer.DataFrame.to_map(df)
%{"floats" => [1.0, 2.0], "ints" => [1, nil]}

iex> df = Explorer.DataFrame.from_columns(floats: [1.0, 2.0], ints: [1, nil])
iex> Explorer.DataFrame.to_map(df, atom_keys: true)
%{floats: [1.0, 2.0], ints: [1, nil]}

ungroup(df, groups \\ [])

@spec ungroup(df :: t(), groups_or_group :: [String.t()] | String.t()) :: t()

Removes grouping variables.

Examples

iex> df = Explorer.Datasets.fossil_fuels()
iex> df = Explorer.DataFrame.group_by(df, ["country", "year"])
iex> Explorer.DataFrame.ungroup(df, ["country"])
#Explorer.DataFrame<
  [rows: 1094, columns: 10, groups: ["year"]]
  year integer [2010, 2010, 2010, 2010, 2010, "..."]
  country string ["AFGHANISTAN", "ALBANIA", "ALGERIA", "ANDORRA", "ANGOLA", "..."]
  total integer [2308, 1254, 32500, 141, 7924, "..."]
  solid_fuel integer [627, 117, 332, 0, 0, "..."]
  liquid_fuel integer [1601, 953, 12381, 141, 3649, "..."]
  gas_fuel integer [74, 7, 14565, 0, 374, "..."]
  cement integer [5, 177, 2598, 0, 204, "..."]
  gas_flaring integer [0, 0, 2623, 0, 3697, "..."]
  per_capita float [0.08, 0.43, 0.9, 1.68, 0.37, "..."]
  bunker_fuels integer [9, 7, 663, 0, 321, "..."]
>

write_csv!(df, filename, opts \\ [])

@spec write_csv!(df :: t(), filename :: String.t(), opts :: Keyword.t()) :: String.t()

Similar to write_csv/3 but raises if there is a problem writing the CSV.

write_csv(df, filename, opts \\ [])

@spec write_csv(df :: t(), filename :: String.t(), opts :: Keyword.t()) ::
  {:ok, String.t()} | {:error, term()}

Writes a dataframe to a delimited file.

Options

  • header? - Should the column names be written as the first line of the file? (default: true)
  • delimiter - A single character used to separate fields within a record. (default: ",")
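
As a rough illustration of the options above, the following sketch writes a tab-delimited file and handles both return shapes from the @spec. The dataframe contents and the output path are hypothetical:

```elixir
alias Explorer.DataFrame

# Hypothetical two-row dataframe, for illustration only.
df = DataFrame.from_columns(id: [1, 2], label: ["x", "y"])

# Hypothetical output path inside the system temp directory.
path = Path.join(System.tmp_dir!(), "explorer_write_csv_example.tsv")

# write_csv/3 returns a tagged tuple, so both outcomes can be matched.
case DataFrame.write_csv(df, path, delimiter: "\t") do
  {:ok, written} -> IO.puts("wrote #{written}")
  {:error, reason} -> IO.puts("could not write: #{inspect(reason)}")
end
```

When a failure should abort the program rather than be handled, write_csv!/3 is the raising variant.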

write_ipc(df, filename, opts \\ [])

Writes a dataframe to an IPC file.

Apache IPC is a language-agnostic columnar data structure that can be used to store dataframes. It excels as a format for quickly exchanging data between different programming languages.

Options

  • compression - Sets the algorithm used to compress the IPC file. It accepts "ZSTD" or "LZ4" compression. (default: nil)

write_ndjson(df, filename)

@spec write_ndjson(df :: t(), filename :: String.t()) ::
  {:ok, String.t()} | {:error, term()}

Writes a dataframe to an NDJSON file.

write_parquet(df, filename)

@spec write_parquet(df :: t(), filename :: String.t()) ::
  {:ok, String.t()} | {:error, term()}

Writes a dataframe to a parquet file.
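
A round trip through write_parquet/2 and read_parquet/1 might look like the sketch below. It assumes both functions return the tagged tuples given in their specs; the output path is hypothetical:

```elixir
alias Explorer.DataFrame

df = DataFrame.from_columns(floats: [1.0, 2.0, 3.0], ints: [1, 2, 3])

# Hypothetical output path inside the system temp directory.
path = Path.join(System.tmp_dir!(), "explorer_example.parquet")

# Write the dataframe, read it back, and report the restored shape.
# If either step returns {:error, reason}, `with` passes that tuple through.
result =
  with {:ok, _filename} <- DataFrame.write_parquet(df, path),
       {:ok, restored} <- DataFrame.read_parquet(path) do
    DataFrame.shape(restored)
  end
```

If both steps succeed, result should hold the same {height, width} tuple as the original dataframe.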