XlsxReader (xlsx_reader v0.8.6)

Opens XLSX workbooks and reads their worksheets.

Example

{:ok, package} = XlsxReader.open("test.xlsx")

XlsxReader.sheet_names(package)
# ["Sheet 1", "Sheet 2", "Sheet 3"]

{:ok, rows} = XlsxReader.sheet(package, "Sheet 1")
# [
#   ["Date", "Temperature"],
#   [~D[2019-11-01], 8.4],
#   [~D[2019-11-02], 7.5],
#   ...
# ]

Sheet contents

Sheets are loaded on-demand by sheet/3 and sheets/2.

The sheet contents are returned as a list of lists:

[
  ["A1", "B1", "C1" | _],
  ["A2", "B2", "C2" | _],
  ["A3", "B3", "C3" | _],
  | _
]

The behavior of the sheet parser can be customized for each individual sheet; see sheet/3.
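Because a sheet is a plain list of rows, the usual Enum functions are enough to reshape it. The sketch below shows one possible way to use the first row as keys for the remaining rows. RowsExample and the sample data are made up for illustration and are not part of XlsxReader.

```elixir
defmodule RowsExample do
  # Hypothetical helper, not part of XlsxReader: treats the first row as a
  # header and pairs it with the values of every following row.
  def to_maps([header | rows]) do
    Enum.map(rows, fn row ->
      header |> Enum.zip(row) |> Map.new()
    end)
  end

  def to_maps([]), do: []
end

# Stand-in for the rows returned by XlsxReader.sheet/3
rows = [
  ["Date", "Temperature"],
  [~D[2019-11-01], 8.4],
  [~D[2019-11-02], 7.5]
]

maps = RowsExample.to_maps(rows)
IO.inspect(maps)
```

Enum.zip/2 stops at the shorter list, so in this sketch a row with fewer cells than the header simply yields a map with fewer keys.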

Cell types

This library takes a best-effort approach to determining cell types. In order of priority, the actual type of an XLSX cell value is determined using:

  1. basic cell properties (e.g. boolean)
  2. predefined known styles (e.g. default money/date formats)
  3. introspection of the custom format string associated with the cell

Custom formats supported by default

  • percentages
  • ISO 8601 date/time (y-m-d)
  • US date/time (m/d/y)
  • European date/time (d/m/y)

Additional custom formats support

If the spreadsheet you need to process contains some unusual cell formatting, you may provide hints to map format strings to a known cell type.

The hints are given as a list of {matcher, type} tuples. The matcher is either a string or regex to match against the custom format string. The supported types are:

  • :string
  • :number
  • :percentage
  • :date
  • :time
  • :date_time
  • :unsupported (used for explicitly unsupported styles and formats)

Conversion errors

Cell data that cannot be converted using the detected format is returned as the "#ERROR" placeholder.
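Since the placeholder is an ordinary string inside the rows, failed conversions can be located after loading. The following is a minimal sketch of such a scan. ErrorScan is a hypothetical helper and the rows are sample data, not output from a real workbook.

```elixir
defmodule ErrorScan do
  # Hypothetical helper, not part of XlsxReader: returns {row_index, column_index}
  # (both zero-based) for every cell that holds the "#ERROR" placeholder.
  def positions(rows) do
    for {row, r} <- Enum.with_index(rows),
        {cell, c} <- Enum.with_index(row),
        cell == "#ERROR",
        do: {r, c}
  end
end

# Stand-in for the rows returned by XlsxReader.sheet/3
rows = [
  ["Date", "Temperature"],
  [~D[2019-11-01], 8.4],
  ["#ERROR", 7.5]
]

IO.inspect(ErrorScan.positions(rows))
```

Note that this sketch cannot tell a failed conversion apart from a cell whose text is literally "#ERROR".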

Example

[
  {"mmm yy", :date},
  {~r/mmm? yy hh:mm/, :date_time},
  {"[$CHF]0.00", :number}
]
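To make the matcher semantics more concrete, here is a rough sketch of how a hint list like the one above could be resolved against a format string. It assumes that string matchers are compared for equality and regex matchers are tested with Regex.match?/2, with the first matching hint winning. The library's actual lookup may differ, and HintSketch is a hypothetical name.

```elixir
defmodule HintSketch do
  # Hypothetical illustration only; XlsxReader's own lookup may work differently.
  # Returns the type of the first hint whose matcher fits the format string,
  # or nil when no hint applies.
  def resolve(format, hints) do
    Enum.find_value(hints, fn
      {%Regex{} = regex, type} -> if Regex.match?(regex, format), do: type
      {string, type} when is_binary(string) -> if string == format, do: type
    end)
  end
end

hints = [
  {"mmm yy", :date},
  {~r/mmm? yy hh:mm/, :date_time},
  {"[$CHF]0.00", :number}
]

IO.inspect(HintSketch.resolve("mm yy hh:mm", hints))
IO.inspect(HintSketch.resolve("[$CHF]0.00", hints))
IO.inspect(HintSketch.resolve("0.0%", hints))
```

With the library itself, the hint list is passed to open/2 through the supported_custom_formats option.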

To find out what custom formats are in use in the workbook, you can inspect package.workbook.custom_formats:

# num_fmt_id => format string
%{
  "0" => "General",
  "59" => "dd/mm/yyyy",
  "60" => "dd/mm/yyyy hh:mm",
  "61" => "hh:mm",
  "62" => "0.0%",
  "63" => "[$CHF]0.00"
}

Types

@type error() :: {:error, String.t()}

Error tuple with message describing the cause of the error

@type open_option() ::
  {:exclude_hidden_sheets?, boolean()}
  | {:source, source()}
  | {:supported_custom_formats, XlsxReader.Styles.supported_custom_formats()}

Option to specify the XLSX file source

@type row() :: [any()]

List of cell values

@type rows() :: [row()]

List of rows

@type sheet_name() :: String.t()

Sheet name

@type source() :: :path | :binary

Source for the XLSX file: file system (:path) or in-memory (:binary)

Functions

async_sheets(package, sheet_options \\ [], task_options \\ [])

Loads all the sheets in the workbook concurrently.

On success, returns {:ok, [{sheet_name, rows}, ...]}.

When processing files with multiple sheets, async_sheets/3 is ~3x faster than sheets/2, but it comes with a caveat: it uses Task.async_stream/3 under the hood and thus runs each concurrent task with a timeout. If you expect your dataset to be large, you may want to increase the timeout from the default of 10_000 ms (see "Concurrency options" below).

If the order in which the sheets are returned is not relevant for your application, you can pass ordered: false (see "Concurrency options" below) for a modest speed gain.

Filtering options

See sheets/2.

Sheet options

See sheet/3.

Concurrency options

  • max_concurrency - maximum number of tasks to run at the same time (default: System.schedulers_online/0)
  • ordered - maintain order consistent with sheet_names/1 (default: true)
  • timeout - maximum duration in milliseconds to process a sheet (default: 10_000)
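These options share their names with options of Task.async_stream/3. The standalone sketch below shows how they behave there, using a made-up load function as a stand-in for parsing one sheet. It does not call XlsxReader, and the sheet names and rows are sample data.

```elixir
# Stand-in for parsing a single sheet; hypothetical, not XlsxReader code.
load = fn name ->
  Process.sleep(10)
  {name, [["A1", "B1"]]}
end

names = ["Sheet 1", "Sheet 2", "Sheet 3"]

results =
  names
  |> Task.async_stream(load,
    max_concurrency: System.schedulers_online(),
    # true keeps the results in input order; false yields them as tasks finish
    ordered: true,
    # per-task limit in milliseconds
    timeout: 30_000
  )
  |> Enum.map(fn {:ok, result} -> result end)

IO.inspect(Enum.map(results, fn {name, _rows} -> name end))
```

Going by the signature above, the same options would be passed to the library as the third argument, for example XlsxReader.async_sheets(package, [], timeout: 30_000).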

open(file, options \\ [])

@spec open(String.t() | binary(), [open_option()]) ::
  {:ok, XlsxReader.Package.t()} | error()

Opens an XLSX file located on the file system (default) or from memory.

Examples

Opening XLSX file on the file system

{:ok, package} = XlsxReader.open("test.xlsx")

Opening XLSX file from memory

blob = File.read!("test.xlsx")

{:ok, package} = XlsxReader.open(blob, source: :binary)

Options

  • source: :path (on the file system, default) or :binary (in memory)
  • supported_custom_formats: a list of {regex | string, type} tuples (see "Additional custom formats support")
  • exclude_hidden_sheets?: whether to exclude hidden sheets in the workbook

sheet(package, sheet_name, options \\ [])

@spec sheet(XlsxReader.Package.t(), sheet_name(), Keyword.t()) ::
  {:ok, rows()} | error()

Loads the sheet with the given name (see sheet_names/1)

Options

  • type_conversion - boolean (default: true)
  • blank_value - placeholder value for empty cells (default: "")
  • empty_rows - include empty rows (default: true)
  • number_type - type used for numeric conversion: Integer, Decimal or Float (default: Float)
  • skip_row? - function callback that determines if a row should be skipped. Takes precedence over blank_value and empty_rows. Defaults to nil (keeping the behaviour of blank_value and empty_rows).
  • cell_data_format - controls the format of the cell data. Can be :value (default, returns the cell value only) or :cell (returns instances of XlsxReader.Cell).

The Decimal type requires the decimal library.

Examples

Skipping rows

When using the skip_row? callback, rows are ignored in the parser, which is more memory efficient.

# Skip all rows for which all the values are either blank or "-"
XlsxReader.sheet(package, "Sheet1", skip_row?: fn row ->
  Enum.all?(row, & String.trim(&1) in ["", "-"])
end)

# Skip all rows for which the first column contains the text "disabled"
XlsxReader.sheet(package, "Sheet1", skip_row?: fn [column | _] ->
  column == "disabled"
end)
sheet_filter_option(options, key)

sheet_names(package)

@spec sheet_names(XlsxReader.Package.t()) :: [sheet_name()]

Lists the names of the sheets in the package's workbook

sheets(package, options \\ [])

@spec sheets(XlsxReader.Package.t(), Keyword.t()) ::
  {:ok, [{sheet_name(), rows()}]} | error()

Loads all the sheets in the workbook.

On success, returns {:ok, [{sheet_name, rows}, ...]}.

Filtering options

  • only - include the sheets whose name matches the filter
  • except - exclude the sheets whose name matches the filter

Sheets can be filtered by name using:

  • a string (e.g. "Exact Match")
  • a regex (e.g. ~r/Sheet +/)
  • a list of strings and/or regexes (e.g. ["Parameters", ~r/Sheet [12]/])
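The three filter shapes can be illustrated with a small standalone sketch. It assumes that a string must equal the sheet name and that a regex is tested with Regex.match?/2. This is not XlsxReader's implementation, and FilterSketch is a hypothetical name.

```elixir
defmodule FilterSketch do
  # Hypothetical illustration of the filter shapes; not XlsxReader's code.
  # Accepts a string, a regex, or a list mixing both.
  def matches?(name, filters) do
    filters
    |> List.wrap()
    |> Enum.any?(fn
      %Regex{} = regex -> Regex.match?(regex, name)
      string when is_binary(string) -> string == name
    end)
  end
end

names = ["Parameters", "Sheet 1", "Sheet 2", "Sheet 3"]
filter = ["Parameters", ~r/Sheet [12]/]

# Behaves like `only`: keep the matching names
IO.inspect(Enum.filter(names, &FilterSketch.matches?(&1, filter)))

# Behaves like `except`: drop the matching names
IO.inspect(Enum.reject(names, &FilterSketch.matches?(&1, filter)))
```

With the library, the same filter values are passed as the only or except option of sheets/2.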

Sheet options

See sheet/3.