ExArrow.Table (ex_arrow v0.6.0)

An Arrow table: a collection of record batches with a shared schema.

A Table is a logical view over one or more ExArrow.RecordBatch instances that share the same schema. Unlike ExArrow.Stream (which is lazy and backed by a native resource), Table is an Elixir-side aggregation of already-materialised batches.

Position in the hierarchy

Schema       -> Field (metadata)
RecordBatch  -> Array (one per column)
Table        -> RecordBatch (one or more, shared schema)
Stream       -> RecordBatch (lazy sequence)

When to use Table vs Stream

  • Use Table when you have all batches in hand (e.g. after collecting a stream) and want a convenient container with schema/1, num_rows/1, and batches/1.
  • Use Stream for lazy consumption from IPC, Flight, ADBC, or Parquet sources.
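The Table side of this choice can be sketched as follows. This is a minimal illustration that uses only calls shown elsewhere on this page; the module name `TableExample` and its `collect/1` function are hypothetical, and the sketch assumes `ExArrow.Stream.to_list/1` returns a plain list of batches, as in the `from_batches/1` example.

```elixir
defmodule TableExample do
  # Hypothetical helper (not part of ex_arrow): drains an IPC stream
  # and wraps the resulting batches in a Table.
  def collect(ipc_binary) when is_binary(ipc_binary) do
    # With no `else`, a non-matching result from the reader is returned as-is.
    with {:ok, stream} <- ExArrow.IPC.Reader.from_binary(ipc_binary) do
      # to_list/1 materialises every batch, so all of them are held at once.
      stream
      |> ExArrow.Stream.to_list()
      |> ExArrow.Table.from_batches()
    end
  end
end

{:ok, ipc_bin} = ExArrow.Native.ipc_test_fixture_binary()
{:ok, table} = TableExample.collect(ipc_bin)

# The three accessors named in the list above.
IO.inspect(ExArrow.Table.schema(table), label: "schema")
IO.inspect(length(ExArrow.Table.batches(table)), label: "batch count")
IO.inspect(ExArrow.Table.num_rows(table), label: "total rows")
```

If the data is too large to hold in memory at once, consuming the Stream batch by batch is likely the better fit.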

Summary

Functions

batches(table)

Returns the list of record batches in this table.

from_batches(batches)

Create a Table from a list of record batches.

num_rows(table)

Returns the total number of rows across all batches in this table.

schema(table)

Returns the schema of this table.

Types

t()

@type t() :: %ExArrow.Table{
  batches: [ExArrow.RecordBatch.t()],
  schema: ExArrow.Schema.t()
}
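
Because t() is a struct type, a function head can match on it to accept only tables. The sketch below is illustrative: `TableGuards` and `batch_count/1` are hypothetical names, and it reads the batch list through batches/1 rather than the struct field, on the assumption that the accessor functions are the intended public interface.

```elixir
defmodule TableGuards do
  # Hypothetical helper (not part of ex_arrow): the function head matches
  # only %ExArrow.Table{} structs, so any other argument raises
  # FunctionClauseError at the call site.
  @spec batch_count(ExArrow.Table.t()) :: non_neg_integer()
  def batch_count(%ExArrow.Table{} = table) do
    table
    |> ExArrow.Table.batches()
    |> length()
  end
end
```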

Functions

batches(table)

@spec batches(t()) :: [ExArrow.RecordBatch.t()]

Returns the list of record batches in this table.

from_batches(batches)

@spec from_batches([ExArrow.RecordBatch.t()]) :: {:ok, t()} | {:error, String.t()}

Create a Table from a list of record batches.

All batches must share the same schema; the table's schema is taken from the first batch. An empty list returns {:error, message}.

Returns {:ok, table} on success or {:error, message} on failure, where message is a string.

Examples

{:ok, ipc_bin} = ExArrow.Native.ipc_test_fixture_binary()
{:ok, stream}  = ExArrow.IPC.Reader.from_binary(ipc_bin)
batches = ExArrow.Stream.to_list(stream)
{:ok, table} = ExArrow.Table.from_batches(batches)
ExArrow.Table.num_rows(table)  #=> 2
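
The error path documented above can be handled with an ordinary `case`. This is a small sketch that passes an empty list on purpose; the printed wording is illustrative, and the exact text of `message` is not specified on this page.

```elixir
case ExArrow.Table.from_batches([]) do
  {:ok, table} ->
    IO.puts("table with #{ExArrow.Table.num_rows(table)} rows")

  {:error, message} ->
    # An empty list takes this branch; message is a String.t() per the spec.
    IO.puts("could not build table: #{message}")
end
```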

num_rows(table)

@spec num_rows(t()) :: non_neg_integer()

Returns the total number of rows across all batches in this table.

Examples

# `batches` is a non-empty list of ExArrow.RecordBatch structs, e.g. from ExArrow.Stream.to_list/1
{:ok, table} = ExArrow.Table.from_batches(batches)
ExArrow.Table.num_rows(table)  #=> sum of all batch row counts
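
To make the "sum of all batch row counts" semantics concrete, here is a pure-Elixir model of the documented contract. It is not the library's implementation: `TableSketch` is a hypothetical stand-in, a "batch" is modelled as a plain map with `:schema` and `:num_rows` keys, and the error message text is invented for the example.

```elixir
defmodule TableSketch do
  # Hypothetical stand-in for ExArrow.Table, for illustration only.
  # A "batch" here is a plain map such as %{schema: [:id, :name], num_rows: 3}.
  defstruct [:schema, :batches]

  # Mirrors the documented contract: an empty list is an error.
  def from_batches([]), do: {:error, "cannot build a table from an empty list of batches"}

  # The schema is taken from the first batch.
  def from_batches([first | _] = batches) do
    {:ok, %__MODULE__{schema: first.schema, batches: batches}}
  end

  # The total is the sum of the per-batch row counts.
  def num_rows(%__MODULE__{batches: batches}) do
    batches
    |> Enum.map(& &1.num_rows)
    |> Enum.sum()
  end
end

batches = [
  %{schema: [:id, :name], num_rows: 3},
  %{schema: [:id, :name], num_rows: 2}
]

{:ok, table} = TableSketch.from_batches(batches)
IO.inspect(TableSketch.num_rows(table)) # prints 5
```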

schema(table)

@spec schema(t()) :: ExArrow.Schema.t()

Returns the schema of this table.