XlsxReader (xlsx_reader v0.8.0)
Opens XLSX workbooks and reads their worksheets.
Example
{:ok, package} = XlsxReader.open("test.xlsx")
XlsxReader.sheet_names(package)
# ["Sheet 1", "Sheet 2", "Sheet 3"]
{:ok, rows} = XlsxReader.sheet(package, "Sheet 1")
# [
# ["Date", "Temperature"],
# [~D[2019-11-01], 8.4],
# [~D[2019-11-02], 7.5],
# ...
# ]
Sheet contents
Sheets are loaded on demand by sheet/3 and sheets/2.
The sheet contents are returned as a list of lists:
[
["A1", "B1", "C1" | _],
["A2", "B2", "C2" | _],
["A3", "B3", "C3" | _],
| _
]
The behavior of the sheet parser can be customized for each individual sheet; see sheet/3.
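Because the result is a plain list of lists, it can be reshaped with ordinary Enum functions. The sketch below uses the sample rows from the example at the top of this page (hard-coded here so that it runs without a workbook) and assumes the first row is a header; it turns each remaining row into a map keyed by column name.

```elixir
# Sample rows in the shape returned by XlsxReader.sheet/3 (hard-coded for illustration)
rows = [
  ["Date", "Temperature"],
  [~D[2019-11-01], 8.4],
  [~D[2019-11-02], 7.5]
]

# Assumption: the first row holds the column names
[header | data] = rows

# Pair each cell with its column name and build one map per row
records =
  Enum.map(data, fn row ->
    header |> Enum.zip(row) |> Map.new()
  end)

IO.inspect(records)
```

Enum.zip/2 stops at the shorter list, so a row with fewer cells than the header simply produces a map with fewer keys.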
Cell types
This library takes a best effort approach for determining cell types. In order of priority, the actual type of an XLSX cell value is determined using:
- basic cell properties (e.g. boolean)
- predefined known styles (e.g. default money/date formats)
- introspection of the custom format string associated with the cell
Custom formats supported by default
- percentages
- ISO 8601 date/time (y-m-d)
- US date/time (m/d/y)
- European date/time (d/m/y)
Additional custom formats support
If the spreadsheet you need to process contains some unusual cell formatting, you may provide hints to map format strings to a known cell type.
The hints are given as a list of {matcher, type} tuples. The matcher is either a string or a regex to match against the custom format string. The supported types are:
- :string
- :number
- :percentage
- :date
- :time
- :date_time
- :unsupported (used for explicitly unsupported styles and formats)
Example
[
{"mmm yy", :date},
{~r/mmm? yy hh:mm/, :date_time},
{"[$CHF]0.00", :number}
]
To find out which custom formats are in use in the workbook, inspect package.workbook.custom_formats:
# num_fmt_id => format string
%{
"0" => "General",
"59" => "dd/mm/yyyy",
"60" => "dd/mm/yyyy hh:mm",
"61" => "hh:mm",
"62" => "0.0%",
"63" => "[$CHF]0.00"
}
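Putting the two pieces together, a minimal sketch might pass the hints through the supported_custom_formats option of open/2 (documented under open/2) and then print the format strings found in the workbook. It assumes xlsx_reader is available as a dependency and that a file named test.xlsx exists; the hints are the ones from the example above.

```elixir
# Hints mapping custom format strings to known cell types
hints = [
  {"mmm yy", :date},
  {~r/mmm? yy hh:mm/, :date_time},
  {"[$CHF]0.00", :number}
]

# Assumes test.xlsx exists in the current directory
{:ok, package} = XlsxReader.open("test.xlsx", supported_custom_formats: hints)

# Print every custom format used by the workbook (num_fmt_id => format string)
for {num_fmt_id, format} <- package.workbook.custom_formats do
  IO.puts("#{num_fmt_id} => #{format}")
end
```

Inspecting the formats first and adding hints only for the ones that are not recognized keeps the hint list short.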
Summary
Types
Error tuple with message describing the cause of the error
List of cell values
List of rows
Sheet name
Source for the XLSX file: file system (:path) or in-memory (:binary)
Option to specify the XLSX file source
Functions
Loads all the sheets in the workbook concurrently.
Opens an XLSX file located on the file system (default) or from memory.
Loads the sheet with the given name (see sheet_names/1)
Lists the names of the sheets in the package's workbook
Loads all the sheets in the workbook.
Types
Specs
error() :: {:error, String.t()}
Error tuple with message describing the cause of the error
Specs
row() :: [any()]
List of cell values
Specs
rows() :: [row()]
List of rows
Specs
sheet_name() :: String.t()
Sheet name
Specs
source() :: :path | :binary
Source for the XLSX file: file system (:path) or in-memory (:binary)
Specs
source_option() :: {:source, source()}
Option to specify the XLSX file source
Functions
Loads all the sheets in the workbook concurrently.
On success, returns {:ok, [{sheet_name, rows}, ...]}.
When processing files with multiple sheets, async_sheets/3 is ~3x faster than sheets/2, but it comes with a caveat: async_sheets/3 uses Task.async_stream/3 under the hood and thus runs each concurrent task with a timeout. If you expect your dataset to be large, you may want to increase the timeout from the default 10000ms (see "Concurrency options" below).
If the order in which the sheets are returned is not relevant for your application, you can pass ordered: false (see "Concurrency options" below) for a modest speed gain.
Filtering options
See sheets/2.
Sheet options
See sheet/3.
Concurrency options
- max_concurrency - maximum number of tasks to run at the same time (default: System.schedulers_online/0)
- ordered - maintain order consistent with sheet_names/1 (default: true)
- timeout - maximum duration in milliseconds to process a sheet (default: 10_000)
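As an illustration of the concurrency options, the sketch below raises the per-sheet timeout and gives up ordering. The spec for async_sheets/3 is not shown on this page, so the argument order used here (package, then sheet and filtering options, then concurrency options) is an assumption to verify against the function's spec. It also assumes xlsx_reader is available as a dependency and that test.xlsx exists.

```elixir
{:ok, package} = XlsxReader.open("test.xlsx")

# Assumed argument order: sheet options second, concurrency options third
{:ok, sheets} =
  XlsxReader.async_sheets(
    package,
    [empty_rows: false],
    timeout: 60_000,
    ordered: false
  )

# With ordered: false the sheets may arrive in any order
for {name, rows} <- sheets do
  IO.puts("#{name}: #{length(rows)} rows")
end
```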
Specs
open(String.t() | binary(), [source_option()]) :: {:ok, XlsxReader.Package.t()} | error()
Opens an XLSX file located on the file system (default) or from memory.
Examples
Opening XLSX file on the file system
{:ok, package} = XlsxReader.open("test.xlsx")
Opening XLSX file from memory
blob = File.read!("test.xlsx")
{:ok, package} = XlsxReader.open(blob, source: :binary)
Options
- source: :path (on the file system, default) or :binary (in memory)
- supported_custom_formats: a list of {regex | string, type} tuples (see "Additional custom formats support")
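The examples above match directly on {:ok, package}, which raises a MatchError if the file cannot be opened. Since the spec also allows an error() tuple, a caller that wants to handle failures could branch on the result as sketched below. The file name missing.xlsx is hypothetical, and the sketch assumes xlsx_reader is available as a dependency.

```elixir
# "missing.xlsx" is a hypothetical path used to illustrate the error branch
case XlsxReader.open("missing.xlsx") do
  {:ok, package} ->
    IO.inspect(XlsxReader.sheet_names(package), label: "sheets")

  {:error, reason} ->
    # reason is a String.t() describing the cause, per the error() type
    IO.puts("Could not open workbook: #{reason}")
end
```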
Specs
sheet(XlsxReader.Package.t(), sheet_name(), Keyword.t()) :: {:ok, rows()} | error()
Loads the sheet with the given name (see sheet_names/1)
Options
- type_conversion - boolean (default: true)
- blank_value - placeholder value for empty cells (default: "")
- empty_rows - include empty rows (default: true)
- number_type - type used for numeric conversion: Integer, Decimal or Float (default: Float)
- skip_row?: function callback that determines if a row should be skipped. Takes precedence over blank_value and empty_rows. Defaults to nil (keeping the behaviour of blank_value and empty_rows).
- cell_data_format: controls the format of the cell data. Can be :value (default, returns the cell value only) or :cell (returns instances of XlsxReader.Cell).
The Decimal type requires the decimal library.
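A short sketch of how several of these options might be combined in one call. It assumes xlsx_reader and the decimal library are available as dependencies and that test.xlsx contains a sheet named "Sheet 1", as in the example at the top of this page.

```elixir
{:ok, package} = XlsxReader.open("test.xlsx")

# Use nil for empty cells, drop empty rows and convert numbers to Decimal
{:ok, rows} =
  XlsxReader.sheet(package, "Sheet 1",
    blank_value: nil,
    empty_rows: false,
    number_type: Decimal
  )

IO.inspect(Enum.take(rows, 3), label: "values")

# Return XlsxReader.Cell structs instead of bare values
{:ok, cell_rows} = XlsxReader.sheet(package, "Sheet 1", cell_data_format: :cell)

IO.inspect(Enum.take(cell_rows, 1), label: "cells")
```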
Examples
Skipping rows
When using the skip_row? callback, rows are ignored in the parser, which is more memory efficient.
# Skip all rows for which all the values are either blank or "-"
XlsxReader.sheet(package, "Sheet1", skip_row?: fn row ->
  Enum.all?(row, &(String.trim(&1) in ["", "-"]))
end)
# Skip all rows for which the first column contains the text "disabled"
XlsxReader.sheet(package, "Sheet1", skip_row?: fn [column | _] ->
  column == "disabled"
end)
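String.trim/1 only accepts binaries, so the first callback above would raise if a row held a number or a date. Assuming the callback can receive such converted values when type_conversion is enabled (not confirmed by this page), a more defensive variant might look like the sketch below. The names blank? and skip_row? are illustrative, and the function runs on its own without a workbook.

```elixir
# Treat nil and whitespace-only or "-" strings as blank; anything else is data
blank? = fn
  value when is_binary(value) -> String.trim(value) in ["", "-"]
  nil -> true
  _other -> false
end

# A row is skipped only when every cell is blank
skip_row? = fn row -> Enum.all?(row, blank?) end

IO.inspect(skip_row?.(["", " - ", nil]), label: "all blank")
IO.inspect(skip_row?.(["", 8.4]), label: "has a number")
```

The function can then be passed as the skip_row? option, for example XlsxReader.sheet(package, "Sheet1", skip_row?: skip_row?).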
Specs
sheet_names(XlsxReader.Package.t()) :: [sheet_name()]
Lists the names of the sheets in the package's workbook
Specs
sheets(XlsxReader.Package.t(), Keyword.t()) :: {:ok, [{sheet_name(), rows()}]} | error()
Loads all the sheets in the workbook.
On success, returns {:ok, [{sheet_name, rows}, ...]}.
Filtering options
- only - include the sheets whose name matches the filter
- except - exclude the sheets whose name matches the filter
Sheets can be filtered by name using:
- a string (e.g. "Exact Match")
- a regex (e.g. ~r/Sheet +/)
- a list of strings and/or regexes (e.g. ["Parameters", ~r/Sheet [12]/])
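For instance, the filters might be used as in the sketch below. It assumes xlsx_reader is available as a dependency and that test.xlsx exists; the sheet names are the ones used in the filter examples above and may not exist in your workbook.

```elixir
{:ok, package} = XlsxReader.open("test.xlsx")

# Load only "Parameters" plus any sheet matching ~r/Sheet [12]/
{:ok, selected} = XlsxReader.sheets(package, only: ["Parameters", ~r/Sheet [12]/])

# Load every sheet except "Parameters"
{:ok, others} = XlsxReader.sheets(package, except: "Parameters")

# Each entry is a {sheet_name, rows} tuple
selected_names = Enum.map(selected, fn {name, _rows} -> name end)
other_names = Enum.map(others, fn {name, _rows} -> name end)

IO.inspect(selected_names, label: "only")
IO.inspect(other_names, label: "except")
```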
Sheet options
See sheet/3.