Archive (Archive v0.1.0)

Archive provides Elixir bindings to libarchive through the power of the wonderful Zigler library.

Archive provides a high-level API for interacting with archive files.

Like libarchive, Archive treats all files as streams first and foremost, but provides many convenient high-level APIs to make working with archives more natural.

Early Development

Archive is still very early in its development, and currently only supports reading archives with all formats, compressions, and filters enabled. In the future, these will be configurable parameters.

Reading

As streams, archives are not conducive to random-access reads or seeks. Once an archive has been opened and read, it must be closed and reopened to be read again. Reading an archive is often a two-stage process: you first read a list of the contents, then select the items you want during a second pass.

Archive takes care of all resource allocations, initializations, and cleanup for you. Using the high-level API, you only need to provide a mapping function to determine what to do with each entry as it is streamed.

The Low-Level Reading Loop

Although Archive's high-level API takes care of all of the resource management for you, it can still be useful to understand how it works:

  1. Create a new archive reader object.
  2. Update any global reader properties as appropriate. These properties determine supported compressions, formats, etc.
  3. Open the archive.
  4. Repeatedly call archive_read_next_header to get information about successive archive entries. Call archive_read_data to extract data for entries of interest.
  5. Clean up the archive reader object.
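The shape of this loop can be sketched in plain Elixir with Stream.resource/3. This is a minimal sketch only: ReaderLoopSketch is a hypothetical module, the "archive" is an in-memory list of {path, data} tuples, and nothing here calls libarchive or the Archive module.

```elixir
defmodule ReaderLoopSketch do
  # Hypothetical stand-in for an archive reader. The "archive" is just a
  # list of {path, data} tuples; no libarchive call is made.
  def stream(entries) when is_list(entries) do
    Stream.resource(
      # Steps 1-3: "create, configure, and open" the reader.
      fn -> entries end,
      # Step 4: emit the next entry header, or halt when none remain.
      fn
        [] -> {:halt, []}
        [{path, data} | rest] -> {[%{path: path, size: byte_size(data)}], rest}
      end,
      # Step 5: clean up the reader (nothing to free in this sketch).
      fn _remaining -> :ok end
    )
  end
end

ReaderLoopSketch.stream([{"a.txt", "hello"}, {"b/c.txt", "hi"}])
|> Stream.filter(&(&1.size > 2))
|> Enum.map(& &1.path)
# => ["a.txt"]
```

Because the stream is consumed lazily, each entry is only visited once per run, which mirrors why an archive has to be reopened to be read again.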

The mapping function accepts an Archive.Entry struct, which contains metadata about the entry (such as path and size). You can use that information to decide what to do in your function.

You can also use functions from the Archive.Entry module to perform operations on the entry (most commonly Archive.Entry.load/1).
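As an illustration of the kind of decision a mapping function might make, here is a small sketch that inspects only the path and size fields mentioned above. EntryFilterSketch and wanted?/1 are hypothetical names, and plain maps stand in for Archive.Entry structs so the snippet runs without the library.

```elixir
defmodule EntryFilterSketch do
  # Hypothetical predicate: keep small YAML files. It matches only on the
  # :path and :size keys, so it accepts any map or struct that has them.
  def wanted?(%{path: path, size: size}) do
    Path.extname(path) == ".yml" and size < 1_000
  end
end

entries = [
  %{path: ".editorconfig", size: 166},
  %{path: ".github/workflows/deploy-theme.yml", size: 338}
]

entries
|> Enum.filter(&EntryFilterSketch.wanted?/1)
|> Enum.map(& &1.path)
# => [".github/workflows/deploy-theme.yml"]
```

Inside a real mapping function, a predicate like this could decide whether to call Archive.Entry.load/1 on the entry or return it unchanged.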

Usage of Archive.Entry functions

Using functions from the Archive.Entry module outside of a function passed to the high-level API is generally discouraged.

As mentioned earlier, since archives are streaming objects, each entry can only be operated on while it is the current entry in the stream. If you use functions from the Archive.Entry module outside of the function you provide to the high-level APIs, it is up to you to ensure that the entry holds the reference to the archive object (the :ref key in the Archive.Entry struct). Most functions do not work when :ref is nil. The high-level API ensures that the Archive.Entry has the reference when your function is applied to it.

Examples

Set up the archive

data = File.read!("/path/to/data.zip")
{:ok, a} = Archive.new()

Read the index of entries. Notice that the output of inspection will show you how many items are in the archive (given the function you supplied to Archive.read/3), the archive format, the size of the archive, and more.

{:ok, a} = Archive.read(a, data)
{:ok, #Archive[zip]<
 147 entries (0 loaded), 506.0 KB
 
   .editorconfig (166 B)
   .github/ (1 items, 338 B)
     workflows/ (1 items, 338 B)
       deploy-theme.yml (338 B)
   ... and 21 more
>}

Here's an example that reads into memory all entries larger than 1500 bytes and stores the entries as a list (rather than as a hierarchical map):

{:ok, a} =
  Archive.read(a, data,
    with: fn entry ->
      if entry.size > 1500 do
        {:ok, entry} = Archive.Entry.load(entry)
        entry
      else
        entry
      end
    end,
    as: :list
  )
{:ok, #Archive[zip]<
 147 entries (40 loaded), 506.0 KB
 
   .editorconfig (166 B)
   .github/ (1 items, 338 B)
     workflows/ (1 items, 338 B)
       deploy-theme.yml (338 B)
   ... and 21 more
>}

Writing

TODO

Archive is still very early in development and does not implement any of the writing API yet.

Inspect

Archive and Archive.Entry provide custom implementations for the Inspect protocol.

When inspecting an Archive, the following options can be supplied through the custom_options option of inspect:

  • :depth - Depth of directories to display. Defaults to 3.
  • :breadth - Breadth of items to display. Defaults to 2.

Examples

IO.inspect(%Archive{} = a, custom_options: [depth: 3, breadth: 2])
#Archive[zip]<
147 entries (40 loaded), 506.0 KB

  .editorconfig (166 B)
  .github/ (1 items, 338 B)
    workflows/ (1 items, 338 B)
      deploy-theme.yml (338 B)
  ... and 21 more
>
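For readers unfamiliar with custom_options, the sketch below shows how an Inspect implementation can read such options from the Inspect.Opts struct. TreeSketch is a hypothetical struct invented for this illustration; this is not Archive's actual implementation, only the general Elixir mechanism.

```elixir
defmodule TreeSketch do
  # Hypothetical struct used only for this illustration; not part of Archive.
  defstruct items: []
end

defimpl Inspect, for: TreeSketch do
  def inspect(%TreeSketch{items: items}, opts) do
    # custom_options is a keyword list carried on the Inspect.Opts struct.
    breadth = Keyword.get(opts.custom_options, :breadth, 2)
    {shown, hidden} = Enum.split(items, breadth)
    suffix = if hidden == [], do: [], else: ["... and #{length(hidden)} more"]
    "#TreeSketch<" <> Enum.join(shown ++ suffix, ", ") <> ">"
  end
end

inspect(%TreeSketch{items: ["a", "b", "c"]})
# => "#TreeSketch<a, b, ... and 1 more>"

inspect(%TreeSketch{items: ["a", "b", "c"]}, custom_options: [breadth: 3])
# => "#TreeSketch<a, b, c>"
```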

Summary

Functions

from_file_streaming(archive, filename, fun)

Streams the contents of an archive from a file, applying the supplied function to each entry.

from_memory_streaming(archive, data, fun)

Streams the contents of an archive from memory, applying the supplied function to each entry.

hierarchical(entries)

Converts a list of Archive.Entry to a hierarchical map, similar to a filesystem structure.

init(archive, opts \\ [])

Initializes an Archive with the appropriate settings and properties.

new()

Creates a new Archive struct. This must be initialized before any IO operations can occur.

read(archive, filename_or_data, opts \\ [])

Reads the content of an archive.

Functions

from_file_streaming(archive, filename, fun)

Streams the contents of an archive from a file, applying the supplied function to each entry.

from_memory_streaming(archive, data, fun)

Streams the contents of an archive from memory, applying the supplied function to each entry.

hierarchical(entries)

Converts a list of Archive.Entry to a hierarchical map, similar to a filesystem structure.
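To make the idea of a filesystem-like map concrete, here is a minimal sketch of one way to nest entries by path. HierarchySketch is a hypothetical helper, not Archive.hierarchical/1 itself, and the exact shape the library returns may differ. The sketch assumes every entry is a file with a non-empty path; directory entries are not handled.

```elixir
defmodule HierarchySketch do
  # Hypothetical helper, not the library's implementation.
  # Directories become nested maps keyed by name; files become leaf values.
  def build(entries) do
    Enum.reduce(entries, %{}, fn %{path: path} = entry, tree ->
      put_nested(tree, Path.split(path), entry)
    end)
  end

  # Last path segment: store the entry itself as the leaf.
  defp put_nested(tree, [leaf], entry), do: Map.put(tree, leaf, entry)

  # Intermediate segment: descend into (or create) the directory map.
  defp put_nested(tree, [dir | rest], entry) do
    subtree = Map.get(tree, dir, %{})
    Map.put(tree, dir, put_nested(subtree, rest, entry))
  end
end

HierarchySketch.build([
  %{path: ".editorconfig", size: 166},
  %{path: ".github/workflows/deploy-theme.yml", size: 338}
])
# => %{
#   ".editorconfig" => %{path: ".editorconfig", size: 166},
#   ".github" => %{
#     "workflows" => %{
#       "deploy-theme.yml" => %{path: ".github/workflows/deploy-theme.yml", size: 338}
#     }
#   }
# }
```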

init(archive, opts \\ [])

Initializes an Archive with the appropriate settings and properties.

Properties and Settings

Currently, the properties and settings are not configurable. The default is to support all archive formats (including raw), all compression formats, and not filter any entries.

In the future, init/2 will accept options for all of these to set up the reader or writer in the appropriate mode.

new()

Creates a new Archive struct. This must be initialized before any IO operations can occur.

read(archive, filename_or_data, opts \\ [])

Reads the content of an archive.

This function populates metadata about the archive, such as total archive size and archive format.

Options

  • :with - Applies the supplied function to each Archive.Entry. Defaults to the identity function.
  • :as - How to collect the entries. Can be :list or :map, where :map creates a hierarchical filesystem-like representation of the entries. Defaults to :map.