XlsxReader (xlsx_reader v0.8.8)
Opens XLSX workbooks and reads their worksheets.
Example
{:ok, package} = XlsxReader.open("test.xlsx")
XlsxReader.sheet_names(package)
# ["Sheet 1", "Sheet 2", "Sheet 3"]
{:ok, rows} = XlsxReader.sheet(package, "Sheet 1")
# [
# ["Date", "Temperature"],
# [~D[2019-11-01], 8.4],
# [~D[2019-11-02], 7.5],
# ...
# ]
Sheet contents
Sheets are loaded on demand by sheet/3 and sheets/2.
The contents of a sheet are returned as a list of rows, where each row is a list of cell values:
[
["A1", "B1", "C1" | _],
["A2", "B2", "C2" | _],
["A3", "B3", "C3" | _],
| _
]
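Because a sheet is a plain list of lists, ordinary Enum functions are enough to post-process it. The sketch below assumes the first row holds column headers (as in the temperature example above) and turns the remaining rows into maps. RowsSketch and rows_to_maps/1 are hypothetical names, not part of XlsxReader.

```elixir
defmodule RowsSketch do
  # Hypothetical helper: turns [header_row | data_rows] into a list of maps
  # keyed by header. Assumes the first row contains the column names.
  def rows_to_maps([]), do: []

  def rows_to_maps([headers | data_rows]) do
    Enum.map(data_rows, fn row ->
      # Enum.zip/2 stops at the shorter list, so short rows yield fewer keys.
      headers |> Enum.zip(row) |> Map.new()
    end)
  end
end

rows = [
  ["Date", "Temperature"],
  [~D[2019-11-01], 8.4],
  [~D[2019-11-02], 7.5]
]

RowsSketch.rows_to_maps(rows)
# => [%{"Date" => ~D[2019-11-01], "Temperature" => 8.4},
#     %{"Date" => ~D[2019-11-02], "Temperature" => 7.5}]
```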
The behavior of the sheet parser can be customized for each individual sheet (see sheet/3).
Cell types
This library takes a best-effort approach to determining cell types. In order of priority, the actual type of an XLSX cell value is determined using:
- basic cell properties (e.g. boolean)
- predefined known styles (e.g. default money/date formats)
- introspection of the custom format string associated with the cell
Custom formats supported by default
- percentages
- ISO 8601 date/time (y-m-d)
- US date/time (m/d/y)
- European date/time (d/m/y)
Additional custom formats support
If the spreadsheet you need to process contains unusual cell formatting, you may provide hints (the supported_custom_formats option of open/2) that map format strings to a known cell type.
The hints are given as a list of {matcher, type} tuples. The matcher is either a string or a regex to match against the custom format string. The supported types are:
- :string
- :number
- :percentage
- :date
- :time
- :date_time
- :unsupported (used for explicitly unsupported styles and formats)
Conversion errors
Cell data which could not be converted using the detected format is returned as the "#ERROR" placeholder.
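Since conversion failures are signalled in-band rather than through an error tuple, it can be worth scanning the loaded rows for the placeholder. A minimal sketch; ErrorScan, error_positions/1 and the sample rows are made up for illustration.

```elixir
defmodule ErrorScan do
  # Hypothetical helper: returns {row_index, column_index} pairs (0-based)
  # for every cell holding the "#ERROR" placeholder.
  def error_positions(rows) do
    for {row, row_index} <- Enum.with_index(rows),
        {cell, col_index} <- Enum.with_index(row),
        cell == "#ERROR",
        do: {row_index, col_index}
  end
end

rows = [
  ["Date", "Amount"],
  [~D[2019-11-01], 12.5],
  ["#ERROR", 3.0]
]

ErrorScan.error_positions(rows)
# => [{2, 0}]
```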
Example
[
{"mmm yy", :date},
{~r/mmm? yy hh:mm/, :date_time},
{"[$CHF]0.00", :number}
]
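A list like this is passed to open/2 through its supported_custom_formats option, e.g. XlsxReader.open("test.xlsx", supported_custom_formats: hints). To make the matcher semantics concrete, here is a sketch of how such a list could be resolved against a format string. It is an illustration, not the library's implementation: it assumes string matchers compare for equality, regex matchers use Regex.match?/2, and the first matching hint wins. HintSketch and match_type/2 are hypothetical.

```elixir
defmodule HintSketch do
  # Illustration only (not XlsxReader's code): resolve a custom format string
  # against a hints list. Assumes strings must be equal, regexes must match,
  # and the first matching hint wins; returns nil when nothing matches.
  def match_type(format, hints) do
    Enum.find_value(hints, fn
      {%Regex{} = regex, type} -> if Regex.match?(regex, format), do: type
      {string, type} when is_binary(string) -> if string == format, do: type
    end)
  end
end

hints = [
  {"mmm yy", :date},
  {~r/mmm? yy hh:mm/, :date_time},
  {"[$CHF]0.00", :number}
]

HintSketch.match_type("mm yy hh:mm", hints)
# => :date_time
```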
To find out what custom formats are in use in the workbook, you can inspect package.workbook.custom_formats:
# num_fmt_id => format string
%{
"0" => "General",
"59" => "dd/mm/yyyy",
"60" => "dd/mm/yyyy hh:mm",
"61" => "hh:mm",
"62" => "0.0%",
"63" => "[$CHF]0.00"
}
Summary
Types
error() - Error tuple with message describing the cause of the error
open_option() - Option accepted by open/2 (file source, custom format hints, hidden sheet exclusion)
row() - List of cell values
rows() - List of rows
sheet_name() - Sheet name
source() - Source for the XLSX file: file system (:path) or in-memory (:binary)
Functions
async_sheets/3 - Loads all the sheets in the workbook concurrently.
open/2 - Opens an XLSX file located on the file system (default) or from memory.
sheet/3 - Loads the sheet with the given name (see sheet_names/1).
sheet_names/1 - Lists the names of the sheets in the package's workbook.
sheets/2 - Loads all the sheets in the workbook.
Types
@type error() :: {:error, String.t()}
Error tuple with message describing the cause of the error
@type open_option() :: {:exclude_hidden_sheets?, boolean()} | {:source, source()} | {:supported_custom_formats, XlsxReader.Styles.supported_custom_formats()}
Option accepted by open/2: the XLSX file source, custom format hints, or whether to exclude hidden sheets
@type row() :: [any()]
List of cell values
@type rows() :: [row()]
List of rows
@type sheet_name() :: String.t()
Sheet name
@type source() :: :path | :binary
Source for the XLSX file: file system (:path) or in-memory (:binary)
Functions
async_sheets/3
Loads all the sheets in the workbook concurrently.
On success, returns {:ok, [{sheet_name, rows}, ...]}.
When processing files with multiple sheets, async_sheets/3 is about 3x faster than sheets/2, but it comes with a caveat: async_sheets/3 uses Task.async_stream/3 under the hood and therefore runs each concurrent task with a timeout. If you expect your sheets to be large, you may want to increase the timeout from its default of 10,000 ms (see "Concurrency options" below).
If the order in which the sheets are returned is not relevant for your application, you can pass ordered: false (see "Concurrency options" below) for a modest speed gain.
Filtering options
See sheets/2.
Sheet options
See sheet/3.
Concurrency options
- max_concurrency - maximum number of tasks to run at the same time (default: System.schedulers_online/0)
- ordered - maintain order consistent with sheet_names/1 (default: true)
- timeout - maximum duration in milliseconds to process a sheet (default: 10_000)
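The three concurrency options share their names with options of Task.async_stream/3, which explains the timeout caveat above. The sketch below only illustrates that mechanism with a stand-in loader; it is not the library's code, and AsyncSketch and load_sheet/1 are hypothetical. Note that the 10_000 ms default here mirrors async_sheets/3, whereas Task.async_stream/3 itself defaults to 5000 ms.

```elixir
defmodule AsyncSketch do
  # Hypothetical stand-in for parsing one worksheet into rows.
  def load_sheet(name), do: {name, [["A1", "B1"], ["A2", "B2"]]}

  def load_all(names, opts \\ []) do
    names
    |> Task.async_stream(&load_sheet/1,
      max_concurrency: Keyword.get(opts, :max_concurrency, System.schedulers_online()),
      ordered: Keyword.get(opts, :ordered, true),
      timeout: Keyword.get(opts, :timeout, 10_000)
    )
    # With the default on_timeout: :exit, a task exceeding :timeout makes the
    # caller exit, so every element that reaches this point is {:ok, result}.
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

AsyncSketch.load_all(["Sheet 1", "Sheet 2", "Sheet 3"], timeout: 60_000)
```

With ordered: true the results come back in input order; with ordered: false they arrive as tasks finish.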
@spec open(String.t() | binary(), [open_option()]) :: {:ok, XlsxReader.Package.t()} | error()
Opens an XLSX file located on the file system (default) or from memory.
Examples
Opening XLSX file on the file system
{:ok, package} = XlsxReader.open("test.xlsx")
Opening XLSX file from memory
blob = File.read!("test.xlsx")
{:ok, package} = XlsxReader.open(blob, source: :binary)
Options
- source: :path (on the file system, default) or :binary (in memory)
- supported_custom_formats: a list of {regex | string, type} tuples (see "Additional custom formats support")
- exclude_hidden_sheets?: whether to exclude hidden sheets in the workbook
@spec sheet(XlsxReader.Package.t(), sheet_name(), Keyword.t()) :: {:ok, rows()} | error()
Loads the sheet with the given name (see sheet_names/1).
Options
- type_conversion - boolean (default: true)
- blank_value - placeholder value for empty cells (default: "")
- empty_rows - include empty rows (default: true)
- number_type - type used for numeric conversion: Integer, Decimal or Float (default: Float)
- skip_row?: function callback that determines whether a row should be skipped. Takes precedence over blank_value and empty_rows. Defaults to nil (keeping the behavior of blank_value and empty_rows).
- cell_data_format: controls the format of the cell data. Can be :value (default, returns the cell value only) or :cell (returns instances of XlsxReader.Cell).
The Decimal type requires the decimal library.
Examples
Skipping rows
When using the skip_row? callback, rows are discarded by the parser itself, which is more memory efficient than filtering the returned rows afterwards.
# Skip all rows for which all the values are either blank or "-"
XlsxReader.sheet(package, "Sheet1", skip_row?: fn row ->
Enum.all?(row, & String.trim(&1) in ["", "-"])
end)
# Skip all rows for which the first column contains the text "disabled"
XlsxReader.sheet(package, "Sheet1", skip_row?: fn [column | _] ->
column == "disabled"
end)
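The first callback above calls String.trim/1 on every cell. If type_conversion is enabled and the callback receives converted values (an assumption; the options above do not say which values it sees), a numeric or date cell would make String.trim/1 raise. A more defensive sketch that only trims string cells; the variable name skip_blank_or_dash is hypothetical:

```elixir
# Variant of the first callback above: only string cells are trimmed and
# compared; any non-string cell (number, date, ...) means the row is kept.
skip_blank_or_dash = fn row ->
  Enum.all?(row, fn
    cell when is_binary(cell) -> String.trim(cell) in ["", "-"]
    _other -> false
  end)
end

skip_blank_or_dash.(["", " - ", "-"])
# => true
skip_blank_or_dash.(["", 8.4])
# => false
```

It can then be passed as XlsxReader.sheet(package, "Sheet1", skip_row?: skip_blank_or_dash).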
@spec sheet_names(XlsxReader.Package.t()) :: [sheet_name()]
Lists the names of the sheets in the package's workbook
@spec sheets(XlsxReader.Package.t(), Keyword.t()) :: {:ok, [{sheet_name(), rows()}]} | error()
Loads all the sheets in the workbook.
On success, returns {:ok, [{sheet_name, rows}, ...]}.
Filtering options
- only - include the sheets whose name matches the filter
- except - exclude the sheets whose name matches the filter

Sheets can be filtered by name using:
- a string (e.g. "Exact Match")
- a regex (e.g. ~r/Sheet +/)
- a list of strings and/or regexes (e.g. ["Parameters", ~r/Sheet [12]/])
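With the library, a filter is passed alongside the other options, e.g. XlsxReader.sheets(package, only: ["Parameters", ~r/Sheet [12]/]). The sketch below spells out one plausible reading of the rules above on a plain list of names: a string must equal the sheet name, a regex must match it, and a list matches if any element does. Applying only before except is an assumption. FilterSketch and its functions are hypothetical and not the library's implementation.

```elixir
defmodule FilterSketch do
  # Illustration only: a list matches if any of its elements matches.
  def matches?(name, filters) when is_list(filters),
    do: Enum.any?(filters, &matches?(name, &1))

  # A regex matches anywhere in the name; a string must be an exact match.
  def matches?(name, %Regex{} = regex), do: Regex.match?(regex, name)
  def matches?(name, string) when is_binary(string), do: name == string

  # Assumption: :only is applied first, then :except.
  def select(names, opts) do
    names
    |> maybe_keep(Keyword.get(opts, :only))
    |> maybe_drop(Keyword.get(opts, :except))
  end

  defp maybe_keep(names, nil), do: names
  defp maybe_keep(names, filter), do: Enum.filter(names, &matches?(&1, filter))

  defp maybe_drop(names, nil), do: names
  defp maybe_drop(names, filter), do: Enum.reject(names, &matches?(&1, filter))
end

names = ["Parameters", "Sheet 1", "Sheet 2", "Sheet 3"]

FilterSketch.select(names, only: ["Parameters", ~r/Sheet [12]/])
# => ["Parameters", "Sheet 1", "Sheet 2"]

FilterSketch.select(names, except: ~r/Sheet +/)
# => ["Parameters"]
```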
Sheet options
See sheet/3.