Archive (Archive v0.2.0)
Archive provides Elixir bindings to libarchive through the power of the wonderful Zigler library.
Archive provides a high-level API for interacting with archive files.
Like libarchive, Archive treats all files as streams first and foremost, but provides many convenient high-level APIs that make working with archives more natural.
Early Development
Archive is still very early in its development, and currently only supports reading archives with all formats, compressions, and filters enabled. In the future, these will be configurable parameters.
Reading
As streams, archives are not conducive to random-access reads or seeks. Once an archive has been opened and read, it must be closed and reopened to be read again. Reading an archive is often a two-stage process: you first read a list of the contents, then selectively filter which items you want during a second pass.
Archive takes care of all resource allocations, initializations, and cleanup for you. Using the high-level API, you only need to provide a mapping function to determine what to do with each entry as it is streamed.
The High-Level API
Archive's high-level API consists of the following:
- read/3 - The main entry point for reading archive contents as an Enumerable, since it automatically collects the entries. read/3 works for both file-based reads and memory-based reads.
- from_file_streaming/3 - Streams the contents of the archive from a file, applying the supplied function.
- from_memory_streaming/3 - Streams the contents of the archive from memory, applying the supplied function.
The Low-Level Reading Loop
Although Archive's high-level API takes care of all of the resource management for you, it can still be useful to understand how it works:
- Create a new archive reader object.
- Update any global reader properties as appropriate. These properties determine the supported compressions, formats, etc.
- Open the archive.
- Repeatedly call archive_read_next_header to get information about successive archive entries. Call archive_read_data to extract data for entries of interest.
- Clean up the archive reader object.
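The five steps above map naturally onto Elixir's Stream.resource/3, which pairs a setup function, a per-element function, and a cleanup function. The following is a hypothetical sketch: ReadLoopSketch, its in-memory list of {header, data} tuples, and the load callback are illustrative stand-ins. It does not call libarchive or Archive, and it is not how Archive is actually implemented.

```elixir
defmodule ReadLoopSketch do
  @moduledoc """
  Hypothetical model of the libarchive reading loop. The "archive" is a plain
  list of {header, data} tuples; nothing here calls libarchive or Archive.
  """

  # Streams the entries, mapping each one with `fun` while it is "current".
  def stream(raw_entries, fun) do
    Stream.resource(
      # Steps 1-3: create the reader, apply its settings, and open the archive.
      fn -> %{settings: [formats: :all, filters: :all], pending: raw_entries} end,
      fn
        # No more headers: stop the stream.
        %{pending: []} = reader ->
          {:halt, reader}

        # Step 4: pass the next header to `fun`, together with a thunk that
        # reads the entry's data only if `fun` decides to call it.
        %{pending: [{header, data} | rest]} = reader ->
          load = fn -> data end
          {[fun.(header, load)], %{reader | pending: rest}}
      end,
      # Step 5: clean up the reader when the stream halts (including early halts).
      fn _reader -> :ok end
    )
  end
end

raw = [{%{path: "a.txt", size: 3}, "abc"}, {%{path: "big.bin", size: 2000}, "..."}]

raw
|> ReadLoopSketch.stream(fn header, load ->
  if header.size > 1500, do: {header.path, load.()}, else: {header.path, nil}
end)
|> Enum.to_list()
# => [{"a.txt", nil}, {"big.bin", "..."}]
```

Because the data thunk is only meaningful while its header is current, the sketch also shows why per-entry work belongs inside the mapping function.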
The mapping function will accept an Archive.Entry struct, which contains metadata about the entry (such as its path and size). You can use that information to determine what to do in your function.
You can also use functions from the Archive.Entry module to perform different operations on the entry (most commonly Archive.Entry.load/2).
Usage of Archive.Entry functions
It is generally discouraged to use functions from the Archive.Entry module outside of the context of a function passed to the high-level API. As mentioned earlier, since archives are all streaming objects, each entry can only be operated on while it is the current entry in the stream. If you use functions from the Archive.Entry module outside the loop you provide to the various high-level APIs, it is up to you to also supply the Archive struct. The high-level API takes care of ensuring that the Archive gets passed in.
Examples
Set up the archive:
data = File.read!("/path/to/data.zip")
{:ok, a} = Archive.new()
Read the index of entries. Notice that the inspected output shows you how many items are in the archive (given the function you supplied to Archive.read/3), the archive format, the size of the archive, and more.
{:ok, a} = Archive.read(a, data)
{:ok, #Archive[zip]<
147 entries (0 loaded), 506.0 KB
───────────────
.editorconfig (166 B)
.github/ (1 items, 338 B)
workflows/ (1 items, 338 B)
deploy-theme.yml (338 B)
... and 21 more
>}
Here's an example of reading into memory all entries that are larger than 1500 bytes, and storing the entries as a list (rather than as a hierarchical map):
{:ok, a} =
  Archive.read(a, data,
    with: fn entry, archive ->
      if entry.size > 1500 do
        {:ok, entry} = Archive.Entry.load(entry, archive)
        entry
      else
        entry
      end
    end,
    as: :list
  )
{:ok, #Archive[zip]<
147 entries (40 loaded), 506.0 KB
───────────────
.editorconfig (166 B)
.github/ (1 items, 338 B)
workflows/ (1 items, 338 B)
deploy-theme.yml (338 B)
... and 21 more
>}
Writing
TODO
Archive is still very early in development and does not implement any of the writing API yet.
Inspect
Archive and Archive.Entry provide custom implementations of the Inspect protocol.
When inspecting an Archive, the following custom options can be supplied to the custom_options option of inspect:
- :depth - Depth of directories to display. Defaults to 3.
- :breadth - Breadth of items to display. Defaults to 2.
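To make the two options concrete, here is a hypothetical sketch of depth- and breadth-limited rendering over a nested map. TruncateSketch and its tree shape (name => byte size for files, name => map for directories) are assumptions for illustration. This is not Archive's Inspect implementation, and its output format is simpler than the real one.

```elixir
defmodule TruncateSketch do
  # Hypothetical tree shape: name => size (a file) or name => map (a directory).
  # Returns the lines that would be displayed for the given depth and breadth.
  def lines(tree, depth, breadth, indent \\ 0)

  # Past the depth limit nothing more is shown.
  def lines(_tree, depth, _breadth, _indent) when depth <= 0, do: []

  def lines(tree, depth, breadth, indent) do
    # Show at most `breadth` items per level; the rest are summarized.
    {shown, hidden} = tree |> Enum.sort() |> Enum.split(breadth)
    pad = String.duplicate("  ", indent)

    shown_lines =
      Enum.flat_map(shown, fn
        {name, %{} = children} ->
          # Directories recurse one level deeper, consuming one unit of depth.
          [pad <> name <> "/" | lines(children, depth - 1, breadth, indent + 1)]

        {name, size} ->
          [pad <> "#{name} (#{size} B)"]
      end)

    case length(hidden) do
      0 -> shown_lines
      n -> shown_lines ++ [pad <> "... and #{n} more"]
    end
  end
end

tree = %{"a.txt" => 10, "dir" => %{"inner.txt" => 5, "deep" => %{"x" => 1}}, "z.txt" => 7}
TruncateSketch.lines(tree, 2, 2)
# => ["a.txt (10 B)", "dir/", "  deep/", "  inner.txt (5 B)", "... and 1 more"]
```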
Examples
IO.inspect(%Archive{} = a, custom_options: [depth: 3, breadth: 2])
#Archive[zip]<
147 entries (40 loaded), 506.0 KB
───────────────
.editorconfig (166 B)
.github/ (1 items, 338 B)
workflows/ (1 items, 338 B)
deploy-theme.yml (338 B)
... and 21 more
>
Summary
Functions
- from_file_streaming/3 - Streams the contents of an archive from a file, applying the supplied function to each entry.
- from_memory_streaming/3 - Streams the contents of an archive from memory, applying the supplied function to each entry.
- hierarchical/1 - Converts a list of Archive.Entry to a hierarchical map, similar to a filesystem structure.
- init/2 - Initializes an Archive with the appropriate settings and properties.
- read/3 - Reads the content of an archive.
Functions
from_file_streaming(archive, filename, fun)
Streams the contents of an archive from a file, applying the supplied function to each entry.
Refer to read/3 for more information about the supplied function.
from_memory_streaming(archive, data, fun)
Streams the contents of an archive from memory, applying the supplied function to each entry.
Refer to read/3 for more information about the supplied function.
hierarchical(entries)
Converts a list of Archive.Entry to a hierarchical map, similar to a filesystem structure.
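As an illustration of the idea, the following hypothetical sketch builds such a map from entries that carry a :path field. HierarchySketch is an illustrative name. Plain maps stand in for Archive.Entry structs, and the assumption that directory paths end in "/" is also part of the sketch. It is not the library's hierarchical/1.

```elixir
defmodule HierarchySketch do
  # Hypothetical: each entry is a map with a :path such as "dir/sub/file.txt".
  # Files become leaves holding the entry; directories become nested maps.
  def hierarchical(entries) do
    Enum.reduce(entries, %{}, fn entry, tree ->
      parts = entry.path |> String.trim_trailing("/") |> String.split("/")
      insert(tree, parts, entry)
    end)
  end

  # Last path component: store the file, or ensure the directory map exists.
  defp insert(tree, [name], entry) do
    if String.ends_with?(entry.path, "/") do
      # Keep any children that were already inserted under this directory.
      Map.put_new(tree, name, %{})
    else
      Map.put(tree, name, entry)
    end
  end

  # Intermediate component: descend into (or create) the directory map.
  defp insert(tree, [dir | rest], entry) do
    subtree = Map.get(tree, dir, %{})
    Map.put(tree, dir, insert(subtree, rest, entry))
  end
end

entries = [%{path: "docs/"}, %{path: "docs/guide.md", size: 10}, %{path: "README.md", size: 3}]
HierarchySketch.hierarchical(entries)
# => %{"README.md" => %{path: "README.md", size: 3},
#      "docs" => %{"guide.md" => %{path: "docs/guide.md", size: 10}}}
```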
init(archive, opts \\ [])
Initializes an Archive with the appropriate settings and properties.
Properties and Settings
Currently, the properties and settings are not configurable. The default is to support all archive formats (including raw), all compression formats, and not filter any entries.
In the future, init/2 will accept options for all of these to set up the reader/writer in the appropriate modes.
init!(archive, opts \\ [])
new()
Creates a new Archive struct. This must be initialized before any IO operations can occur.
new!()
read(archive, filename_or_data, opts \\ [])
Reads the content of an archive.
This function populates metadata about the archive, such as the total archive size and the archive format.
Options
- :with - Applies the supplied function to each Archive.Entry. Can be either an arity-1 function, which only receives the current Archive.Entry, or an arity-2 function that receives the entry and the Archive. Defaults to the identity function.
- :as - How to collect the entries. Can be :list or :map, where :map creates a hierarchical filesystem-like representation of the entries. Defaults to :map.
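Because :with accepts either arity, a caller of such an option typically normalizes it to a single calling convention. The following hypothetical sketch does this with is_function/2 guards. WithOptionSketch and normalize/1 are illustrative names; this is not taken from Archive's source.

```elixir
defmodule WithOptionSketch do
  # Hypothetical: normalizes a :with option so it can always be called as
  # fun.(entry, archive), whichever arity the caller supplied.
  def normalize(opts) do
    # The default mirrors the documented identity function.
    case Keyword.get(opts, :with, fn entry -> entry end) do
      fun when is_function(fun, 1) ->
        # Arity-1 functions simply ignore the archive argument.
        fn entry, _archive -> fun.(entry) end

      fun when is_function(fun, 2) ->
        fun

      other ->
        raise ArgumentError,
              "expected :with to be an arity-1 or arity-2 function, got: #{inspect(other)}"
    end
  end
end

size_only = WithOptionSketch.normalize(with: fn entry -> entry.size end)
size_only.(%{size: 42}, :some_archive)
# => 42
```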