EPUB read and write for Elixir, backed by a Rustler NIF. Parses EPUB 2 and EPUB 3 documents into structured Elixir data and generates EPUB 3 documents (with a backward-compatible toc.ncx) from the same structures.

Installation

Add to mix.exs:

def deps do
  [
    {:langelic_epub, "~> 0.1"}
  ]
end

Precompiled NIFs are published for macOS (aarch64, x86_64) and Linux (aarch64-gnu, x86_64-gnu, x86_64-musl). Users on those platforms do not need a Rust toolchain. The artifacts target NIF ABI 2.16, which loads on current OTP releases (tested through OTP 29). Users on other platforms can build from source — see Building from source.
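To see whether a given runtime can load artifacts built against NIF ABI 2.16, the runtime's own NIF version can be compared with that target. The sketch below is not part of langelic_epub; the module name NifAbiCheck is hypothetical. It relies on the documented loading rule that the major versions must match and the library's minor version must not exceed the runtime's.

```elixir
defmodule NifAbiCheck do
  # Hypothetical helper, not part of langelic_epub.
  # The NIF ABI version the precompiled artifacts target.
  @required {2, 16}

  # Reads the runtime's NIF version, e.g. ~c"2.17", as a {major, minor} tuple.
  def runtime_version do
    :nif_version
    |> :erlang.system_info()
    |> List.to_string()
    |> parse()
  end

  # Parses "MAJOR.MINOR" into a tuple of integers.
  def parse(version) when is_binary(version) do
    [major, minor | _] = String.split(version, ".")
    {String.to_integer(major), String.to_integer(minor)}
  end

  # Checks the running VM.
  def compatible?, do: compatible?(runtime_version())

  # Same major version, and a runtime minor version at least as new as the target.
  def compatible?({major, minor}) do
    {req_major, req_minor} = @required
    major == req_major and minor >= req_minor
  end
end
```

Calling NifAbiCheck.compatible?() in iex before adding the dependency gives a quick indication of whether the precompiled download is expected to load or a source build is needed.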

Quick start

# Read
{:ok, doc} = LangelicEpub.parse(File.read!("book.epub"))
doc.title       # => "The Hobbit"
doc.language    # => "en"
length(doc.spine)  # => 23

# Modify a chapter (translate/1 is a placeholder for your own function)
[first | rest] = doc.spine
translated =
  %LangelicEpub.Chapter{first | data: translate(first.data)}
modified = %LangelicEpub.Document{doc | spine: [translated | rest]}

# Write
{:ok, bytes} = LangelicEpub.build(modified)
File.write!("translated.epub", bytes)

Why this library exists

There was a gap on Hex. bupe (the only EPUB-focused Elixir library) was last updated nine years ago and is minimal; other packages are single-purpose or metadata-only. The Rust ecosystem has mature EPUB tooling, so rather than reimplement format handling in pure Elixir — where EPUB 2/3 metadata variants, NCX vs. nav.xhtml, embedded fonts, refines metadata, and OPF schema quirks all accumulate bugs over time — this package wraps two mature Rust crates through a Rustler NIF:

  • iepub handles spine order, TOC tree, and cover detection on the read side.
  • epub-builder handles EPUB 3 generation on the write side.

A small OPF re-parse layer (quick-xml) fills in the fields iepub drops (<dc:language>, <dc:rights>, multiple <dc:creator> entries). A post-processing pass rewrites the generated OPF to preserve identifiers verbatim and inject DC elements epub-builder doesn't emit natively (<dc:publisher>, <dc:date>, <dc:rights>).

Supported features

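| Feature               | Read                    | Write                            |
| --------------------- | ----------------------- | -------------------------------- |
| EPUB 2 input          | yes                     | n/a                              |
| EPUB 3 input          | yes                     | yes (always emitted)             |
| Multiple creators     | yes                     | yes                              |
| NCX TOC               | yes                     | yes (emitted for EPUB 2 readers) |
| nav.xhtml TOC         | yes                     | yes                              |
| Embedded fonts        | yes                     | yes                              |
| Embedded images       | yes                     | yes                              |
| Embedded CSS          | yes                     | yes                              |
| Cover image           | yes                     | yes                              |
| DRM-encrypted content | detected, not decrypted | n/a                              |
| MOBI                  | no                      | no                               |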

Limitations and known issues

  • Identifier round-trip. epub-builder requires the primary <dc:identifier> to be a UUID and prefixes it with urn:uuid:. When the source identifier is not a UUID (e.g. an ISBN or URL), the package generates a deterministic UUID v5 for the primary slot and re-injects the original identifier verbatim via OPF post-processing so readers that look it up still find it.
  • TOC parsing is inconsistent for a small number of source EPUBs. iepub occasionally returns an empty nav tree for valid documents; the cause is a parser quirk for specific NCX/nav.xhtml structures. Generated output always has at least one nav entry per spine chapter (a positional title is used as a fallback) so epubcheck does not flag an empty nav.
  • Multiple <itemref> entries to the same file (common in Calibre-split EPUBs) are deduplicated into a single spine entry on read.
  • epub-builder ID collision warnings. When both <dc:language> and <dc:creator> are present, epub-builder reuses id="epub-creator-N" for both. The package rewrites the language ID to avoid the collision.
  • No streaming API. Both parse/1 and build/1 accept and return full byte buffers in memory, so the whole document is held in memory at once. For documents over ~50 MB this may use more memory than is acceptable.
  • No validate/1 function. External validation should shell out to epubcheck.
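Since the package has no validate/1, external validation can be sketched as a thin wrapper around the epubcheck command line. The module and function names below are hypothetical and not part of langelic_epub. The sketch assumes an epubcheck executable on PATH that takes the file path as its only argument and exits with status 0 when the document is valid.

```elixir
defmodule EpubcheckRunner do
  # Hypothetical helper, not part of langelic_epub.
  # Runs epubcheck on a file and maps the exit status to a result tuple.
  def validate(path, executable \\ "epubcheck") do
    case System.find_executable(executable) do
      nil ->
        # The executable is not on PATH.
        {:error, :epubcheck_not_found}

      exe ->
        # Merge stderr into stdout so diagnostics are captured in one string.
        case System.cmd(exe, [path], stderr_to_stdout: true) do
          {_output, 0} -> :ok
          {output, status} -> {:error, {:epubcheck_failed, status, output}}
        end
    end
  end
end
```

A typical use is to write the bytes returned by LangelicEpub.build/1 to a temporary file and pass that path to EpubcheckRunner.validate/1.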

Error model

Every public function returns {:ok, term} | {:error, %LangelicEpub.Error{}}. The :kind field of the error struct is an atom from a fixed, documented set (:invalid_zip, :missing_container, :malformed_opf, :io, :missing_required_field, :invalid_chapter, :duplicate_id, :panic); the full list is in the moduledoc of LangelicEpub.Error. Panics on the Rust side are caught and converted to {:error, %Error{kind: :panic}}, so a malformed EPUB cannot crash the BEAM scheduler thread.
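Callers can branch on the :kind atom with ordinary pattern matching. The following is a minimal sketch: the module name and the grouping of kinds into "bad input" and "bad document" are assumptions made for illustration, not something the package defines. It matches on a plain map pattern, which also matches a %LangelicEpub.Error{} struct because structs are maps.

```elixir
defmodule EpubErrorHandling do
  # Hypothetical helper, not part of langelic_epub.
  # The grouping of kinds below is an illustrative assumption.
  @input_kinds [:invalid_zip, :missing_container, :malformed_opf]
  @document_kinds [:missing_required_field, :invalid_chapter, :duplicate_id]

  # Turns a result tuple into a short human-readable description.
  def describe({:ok, _value}), do: "ok"

  def describe({:error, %{kind: kind}}) when kind in @input_kinds,
    do: "input is not a readable EPUB (#{kind})"

  def describe({:error, %{kind: kind}}) when kind in @document_kinds,
    do: "document is not valid for building (#{kind})"

  def describe({:error, %{kind: :io}}), do: "I/O failure"

  def describe({:error, %{kind: :panic}}), do: "native code panicked"

  # Any kind not listed above, e.g. one added in a later release.
  def describe({:error, %{kind: other}}), do: "unhandled error kind (#{other})"
end
```

For example, EpubErrorHandling.describe(LangelicEpub.parse(bytes)) yields a message suitable for logging without the caller needing to know every kind up front.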

Architecture

langelic_epub is an Elixir wrapper around a Rustler NIF. The native code lives in native/langelic_epub/ and is compiled as a cdylib. Both NIF functions run on the DirtyCpu scheduler because parsing or building a 5 MB EPUB takes 50–200 ms, well past the roughly 1 ms that a NIF should stay under on a normal scheduler.

lib/langelic_epub/        # Public API, struct modules, error module
lib/langelic_epub/native.ex  # RustlerPrecompiled binding
native/langelic_epub/src/ # Rust NIF (reader, writer, opf, types, error)

Building from source

Required:

  • Elixir ≥ 1.15
  • Rust stable, 1.85 or later

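Set LANGELIC_EPUB_BUILD=true to force a source build rather than downloading the precompiled NIF. Export the variable so that it applies to both commands; a VAR=value prefix would only apply to the first one:

export LANGELIC_EPUB_BUILD=true
mix deps.get && mix compile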

Contributing

Issues and pull requests are welcome. Before submitting a PR:

  • mix format
  • mix credo --strict
  • mix dialyzer
  • mix test --include external (requires epubcheck on PATH)
  • cargo fmt --check and cargo clippy -- -D warnings for Rust changes

License

MIT. See LICENSE.

This package wraps two Rust crates under separate licenses; see NOTICE for attribution.