Xlsxir v1.6.0 Xlsxir View Source
Extracts and parses data from a .xlsx
file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data.
Link to this section Summary
Functions
Deletes ETS process tid
and returns :ok
if successful
Deprecated
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, tid}
with the timer argument set to false and returns a tuple of {:ok, tid, time}
where time is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true and tid - is the ETS table id
Accesses ETS process and returns value of specified cell
Accesses tid
ETS process and returns values of specified column in a list
See get_multi_info2
documentation
Accesses ETS process and returns data formatted as a list of row value lists
Accesses ETS process and returns data formatted as a map of cell references and values
Accesses ETS process and returns an indexed map which functions like a multi-dimensional array in other languages
Returns count data based on num_type
specified:
:rows
- Returns number of rows contained in worksheet:cols
- Returns number of columns contained in worksheet:cells
- Returns number of cells contained in worksheet:all
- Returns a keyword list containing all of the above
Accesses ETS process and returns values of specified row in a list
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, table_id}
with the timer argument set to false and returns a tuple of {:ok, table_id, time}
where time
is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true. The table_id
is used to access data for that particular ETS process with the various access functions of the
Xlsxir
module
Extracts the first n number of rows from the specified worksheet contained in the specified .xlsx
file to an ETS process.
Successful extraction returns {:ok, tid}
where tid - is ETS table id
Called when an application is started
Stream worksheet rows contained in the specified .xlsx
file
Link to this section Functions
Deletes ETS process tid
and returns :ok
if successful.
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0)
iex> Xlsxir.close(tid)
:ok
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0)
iex> Xlsxir.close(tid)
:ok
Deprecated
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, tid}
with the timer argument set to false and returns a tuple of {:ok, tid, time}
where time is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true and tid - is the ETS table id
Cells containing formulas in the worksheet are extracted as either a string
, integer
or float
depending on the resulting value of the cell.
Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date()
format (i.e. {year, month, day}
).
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)timer
- boolean flag that tracks extraction process time and returns it when set totrue
. Default value isfalse
.
Options
:max_rows
- the number of rows to fetch from within the worksheet:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or “temp”
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
iex> {:ok, tid} = Xlsxir.extract("./test/test_data/test.xlsx", 0, false, [extract_to: :file])
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
iex> {:ok, tid, _timer} = Xlsxir.extract("./test/test_data/test.xlsx", 0, true)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
Test parallel parsing
iex> task1 = Task.async(fn -> Xlsxir.extract("./test/test_data/test.xlsx", 0) end)
iex> task2 = Task.async(fn -> Xlsxir.extract("./test/test_data/test.xlsx", 0) end)
iex> {:ok, tid1} = Task.await(task1)
iex> {:ok, tid2} = Task.await(task2)
iex> Xlsxir.get_list(tid1)
[["string one", "string two", 10, 20, {2016, 1, 1}]]
iex> Xlsxir.get_list(tid2)
[["string one", "string two", 10, 20, {2016, 1, 1}]]
iex> Xlsxir.close(tid1)
:ok
iex> Xlsxir.close(tid2)
:ok
Example (errors)
iex> Xlsxir.extract("./test/test_data/test.invalidfile", 0)
{:error, "Invalid file type (expected xlsx)."}
iex> Xlsxir.extract("./test/test_data/test.xlsx", 100)
{:error, "Invalid worksheet index."}
Accesses ETS process and returns value of specified cell.
Parameters
table_id
- table identifier of ETS process to be accessedcell_ref
- Reference name of cell to be returned instring
format (i.e."A1"
)
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell ‘A1’ -> “string one”
- cell ‘B1’ -> “string two”
- cell ‘C1’ -> integer of 10
- cell ‘D1’ -> formula of “4 * 5”
cell ‘E1’ -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_cell(tid, “A1”) “string one” iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_cell(tid, “A1”) “string one” iex> Xlsxir.close(tid) :ok
Accesses tid
ETS process and returns values of specified column in a list
.
Parameters
tid
- table identifier of ETS process to be accessedcol
- Reference name of column to be returned instring
format (i.e."A"
)
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell ‘A1’ -> “string one”
- cell ‘B1’ -> “string two”
- cell ‘C1’ -> integer of 10
- cell ‘D1’ -> formula of “4 * 5”
cell ‘E1’ -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_col(tid, “A”) [“string one”] iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_col(tid, “A”) [“string one”] iex> Xlsxir.close(tid) :ok
See get_multi_info2
documentation.
Accesses ETS process and returns data formatted as a list of row value lists.
Parameters
tid
- table identifier of ETS process to be accessed
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell ‘A1’ -> “string one”
- cell ‘B1’ -> “string two”
- cell ‘C1’ -> integer of 10
- cell ‘D1’ -> formula of “4 * 5”
cell ‘E1’ -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_list(tid) [[“string one”, “string two”, 10, 20, {2016, 1, 1}]] iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 2) iex> Xlsxir.get_list(tid) |> List.first |> Enum.count 16384
iex> {:ok, tid} = Xlsxir.multi_extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_list(tid) [[“string one”, “string two”, 10, 20, {2016, 1, 1}]] iex> Xlsxir.close(tid) :ok
Accesses ETS process and returns data formatted as a map of cell references and values.
Parameters
tid
- table identifier of ETS process to be accessed
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell ‘A1’ -> “string one”
- cell ‘B1’ -> “string two”
- cell ‘C1’ -> integer of 10
- cell ‘D1’ -> formula of “4 * 5”
cell ‘E1’ -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_map(tid) %{ “A1” => “string one”, “B1” => “string two”, “C1” => 10, “D1” => 20, “E1” => {2016,1,1}} iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_map(tid) %{ “A1” => “string one”, “B1” => “string two”, “C1” => 10, “D1” => 20, “E1” => {2016,1,1}} iex> Xlsxir.close(tid) :ok
Accesses ETS process and returns an indexed map which functions like a multi-dimensional array in other languages.
Parameters
tid
- table identifier of ETS process to be accessed
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell ‘A1’ -> “string one”
- cell ‘B1’ -> “string two”
- cell ‘C1’ -> integer of 10
- cell ‘D1’ -> formula of “4 * 5”
cell ‘E1’ -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 0) iex> mda = Xlsxir.get_mda(tid) %{0 => %{0 => “string one”, 1 => “string two”, 2 => 10, 3 => 20, 4 => {2016,1,1}}} iex> mda[0][0] “string one” iex> mda[0][2] 10 iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract(“./test/test_data/test.xlsx”, 0) iex> mda = Xlsxir.get_mda(tid) %{0 => %{0 => “string one”, 1 => “string two”, 2 => 10, 3 => 20, 4 => {2016,1,1}}} iex> mda[0][0] “string one” iex> mda[0][2] 10 iex> Xlsxir.close(tid) :ok
Returns count data based on num_type
specified:
:rows
- Returns number of rows contained in worksheet:cols
- Returns number of columns contained in worksheet:cells
- Returns number of cells contained in worksheet:all
- Returns a keyword list containing all of the above
Parameters
tid
- table identifier of ETS process to be accessednum_type
- type of count data to be returned (see above), defaults to:all
Accesses ETS process and returns values of specified row in a list
.
Parameters
tid
- table identifier of ETS process to be accessedrow
- Reference name of row to be returned ininteger
format (i.e.1
)
Example
An example file named test.xlsx
located in ./test/test_data
containing the following:
- cell ‘A1’ -> “string one”
- cell ‘B1’ -> “string two”
- cell ‘C1’ -> integer of 10
- cell ‘D1’ -> formula of “4 * 5”
cell ‘E1’ -> date of 1/1/2016 or Excel date serial of 42370
iex> {:ok, tid} = Xlsxir.extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_row(tid, 1) [“string one”, “string two”, 10, 20, {2016, 1, 1}] iex> Xlsxir.close(tid) :ok
iex> {:ok, tid} = Xlsxir.multi_extract(“./test/test_data/test.xlsx”, 0) iex> Xlsxir.get_row(tid, 1) [“string one”, “string two”, 10, 20, {2016, 1, 1}] iex> Xlsxir.close(tid) :ok
Extracts worksheet data contained in the specified .xlsx
file to an ETS process. Successful extraction
returns {:ok, table_id}
with the timer argument set to false and returns a tuple of {:ok, table_id, time}
where time
is a list containing time elapsed during the extraction process
(i.e. [hour, minute, second, microsecond]
) when the timer argument is set to true. The table_id
is used to access data for that particular ETS process with the various access functions of the
Xlsxir
module.
Cells containing formulas in the worksheet are extracted as either a string
, integer
or float
depending on the resulting value of the cell.
Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date()
format (i.e. {year, month, day}
).
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)timer
- boolean flag that tracts extraction process time and returns it when set totrue
. Defalut value isfalse
.
Options
:max_rows
- the number of rows to fetch from within the worksheets:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or “temp”
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.multi_extract("./test/test_data/test.xlsx", 0)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
Example
Extract all worksheets in an example file named test.xlsx
located in ./test/test_data
:
iex> results = Xlsxir.multi_extract("./test/test_data/test.xlsx")
iex> alive_ids = Enum.map(results, fn {:ok, tid} -> Enum.member?(:ets.all, tid) end)
iex> Enum.all?(alive_ids)
true
iex> Enum.map(results, fn {:ok, tid} -> Xlsxir.close(tid) end) |> Enum.all?(fn result -> result == :ok end)
true
Example
Extract all worksheets in an example file named test.xlsx
located in ./test/test_data
with timer:
iex> results = Xlsxir.multi_extract("./test/test_data/test.xlsx", nil, true)
iex> alive_ids = Enum.map(results, fn {:ok, tid, _timer} -> Enum.member?(:ets.all, tid) end)
iex> Enum.all?(alive_ids)
true
iex> Enum.map(results, fn {:ok, tid, _timer} -> Xlsxir.close(tid) end) |> Enum.all?(fn result -> result == :ok end)
true
Example (errors)
iex> Xlsxir.multi_extract("./test/test_data/test.invalidfile", 0)
{:error, "Invalid file type (expected xlsx)."}
iex> Xlsxir.multi_extract("./test/test_data/test.xlsx", 100)
{:error, "Invalid worksheet index."}
Extracts the first n number of rows from the specified worksheet contained in the specified .xlsx
file to an ETS process.
Successful extraction returns {:ok, tid}
where tid - is ETS table id.
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)rows
- the number of rows to fetch from within the specified worksheet
Options
:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or “temp”
Example
Peek at the first 10 rows of the 9th worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> {:ok, tid} = Xlsxir.peek("./test/test_data/test.xlsx", 8, 10)
iex> Enum.member?(:ets.all, tid)
true
iex> Xlsxir.close(tid)
:ok
Called when an application is started.
This function is called when an the application is started using
Application.start/2
(and functions on top of that, such as
Application.ensure_started/2
). This function should start the top-level
process of the application (which should be the top supervisor of the
application’s supervision tree if the application follows the OTP design
principles around supervision).
start_type
defines how the application is started:
:normal
- used if the startup is a normal startup or if the application is distributed and is started on the current node because of a failover from another mode and the application specification key:start_phases
is:undefined
.{:takeover, node}
- used if the application is distributed and is started on the current node because of a failover on the nodenode
.{:failover, node}
- used if the application is distributed and is started on the current node because of a failover on nodenode
, and the application specification key:start_phases
is not:undefined
.
start_args
are the arguments passed to the application in the :mod
specification key (e.g., mod: {MyApp, [:my_args]}
).
This function should either return {:ok, pid}
or {:ok, pid, state}
if
startup is successful. pid
should be the PID of the top supervisor. state
can be an arbitrary term, and if omitted will default to []
; if the
application is later stopped, state
is passed to the stop/1
callback (see
the documentation for the c:stop/1
callback for more information).
use Application
provides no default implementation for the start/2
callback.
Callback implementation for Application.start/2
.
Stream worksheet rows contained in the specified .xlsx
file.
Cells containing formulas in the worksheet are extracted as either a string
, integer
or float
depending on the resulting value of the cell.
Cells containing an ISO 8601 date format are extracted and converted to Erlang :calendar.date()
format (i.e. {year, month, day}
).
Parameters
path
- file path of a.xlsx
file type instring
formatindex
- index of worksheet from within the Excel workbook to be parsed (zero-based index)
Options
:extract_to
- Specify how the.xlsx
content (i.e. sharedStrings.xml, style.xml and worksheets xml files) will be be extracted before being parsed.:memory
will extract files to memory, and:file
to files in the file system:extract_base_dir
- when extracting to file, files will be extracted in a sub directory in the:extract_base_dir
directory. Defaults toApplication.get_env(:xlsxir, :extract_base_dir)
or “temp”
Example
Extract first worksheet in an example file named test.xlsx
located in ./test/test_data
:
iex> Xlsxir.stream_list("./test/test_data/test.xlsx", 1) |> Enum.take(1)
[[1, 2]]
iex> Xlsxir.stream_list("./test/test_data/test.xlsx", 1) |> Enum.take(3)
[[1, 2], [3, 4]]