Image.OCR (image_ocr v0.2.0)


Idiomatic Elixir interface to the Tesseract OCR engine.

Image.OCR is a thin NIF binding (Tesseract 5.x) that accepts images as Vix.Vips.Image structs, file paths, or in-memory image binaries and returns recognised text.

Quick start

{:ok, instance} = Image.OCR.new()
{:ok, text}     = Image.OCR.read_text(instance, "page.png")

Or, in one shot:

{:ok, text} = Image.OCR.quick_read("page.png")

Concurrency

A single Image.OCR instance wraps one tesseract::TessBaseAPI, which is not safe for concurrent use. Calls on a shared instance are serialised through a per-instance mutex; for true parallelism, build one instance per worker — the simplest way is Image.OCR.Pool.
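The per-worker pattern can also be sketched by hand, without Image.OCR.Pool (whose API is not shown on this page). The OcrWorkers module and its read_all/2 function are hypothetical names; only Image.OCR.new/1 and Image.OCR.read_text/3 come from this documentation. A minimal sketch, assuming one instance per task is affordable:

```elixir
defmodule OcrWorkers do
  @moduledoc "Hypothetical helper: gives every worker task its own OCR instance."

  # Splits `paths` into at most `workers` chunks and processes the chunks in
  # parallel. Returns a list of {path, result} tuples in the original order.
  def read_all(paths, workers) when is_integer(workers) and workers > 0 do
    chunk_size = max(1, ceil(length(paths) / workers))

    paths
    |> Enum.chunk_every(chunk_size)
    |> Task.async_stream(&read_chunk/1, max_concurrency: workers, timeout: :infinity)
    |> Enum.flat_map(fn {:ok, results} -> results end)
  end

  # Runs inside one task: the instance built here is never shared, so calls
  # do not queue behind another worker on the per-instance mutex.
  defp read_chunk(chunk) do
    case Image.OCR.new() do
      {:ok, ocr} -> Enum.map(chunk, fn path -> {path, Image.OCR.read_text(ocr, path)} end)
      {:error, _reason} = error -> Enum.map(chunk, fn path -> {path, error} end)
    end
  end
end
```

A call such as OcrWorkers.read_all(paths, System.schedulers_online()) would then build at most one instance per chunk. If instance construction fails in a task, every path in that chunk is paired with the same error tuple.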

Trained-data

Image.OCR ships with English (eng) trained-data. To install additional languages, see Mix.Tasks.Image.Ocr.Tessdata.Add. The trained-data location is resolved by Image.OCR.Tessdata.datapath/1.

Summary

Types

psm()

A page-segmentation mode. See psm/0 for valid values.

t()

An OCR instance — wraps a single Tesseract API resource.

word_result()

A recognised word together with its confidence and bounding box.

Functions

new(options \\ [])

Builds a new OCR instance.

quick_read(input, options \\ [])

One-shot convenience: builds a default instance, recognises input, and returns the recognised text.

read_text(ocr, input, options \\ [])

Recognises text in input using the ocr instance and returns the result as a UTF-8 string.

recognize(ocr, input, options \\ [])

Recognises input and returns each word with its confidence and bounding box.

tesseract_version()

Returns the linked Tesseract library version as a string.

Types

psm()

@type psm() ::
  :osd_only
  | :auto_osd
  | :auto_only
  | :auto
  | :single_column
  | :single_block_vert_text
  | :single_block
  | :single_line
  | :single_word
  | :circle_word
  | :single_char
  | :sparse_text
  | :sparse_text_osd
  | :raw_line

A page-segmentation mode. The valid values are the atoms listed in the type definition above.

t()

@type t() :: %Image.OCR{
  datapath: String.t() | nil,
  locale: String.t(),
  psm: psm(),
  ref: reference() | binary(),
  tesseract_language: String.t()
}

An OCR instance — wraps a single Tesseract API resource.

  • :locale — the user-facing locale identifier as supplied to new/1 (e.g. "en", "en-US", "zh-Hans-CN").

  • :tesseract_language — the resolved Tesseract trained-data code that was passed to TessBaseAPI::Init (e.g. "eng").

word_result()

@type word_result() :: %{
  text: String.t(),
  confidence: float(),
  bbox:
    {non_neg_integer(), non_neg_integer(), non_neg_integer(), non_neg_integer()}
}

A recognised word together with its confidence and bounding box.

Functions

new(options \\ [])

@spec new(keyword()) :: {:ok, t()} | {:error, term()}

Builds a new OCR instance.

Arguments

  • options is an optional keyword list. See the options below.

Options

  • :locale is the locale identifier. Accepts ISO 639-1 codes ("en", :en, "fr"), BCP-47 tags for region/script variants ("en-US", "zh-Hans", "sr-Latn"), Tesseract codes verbatim ("frk", "osd"), or +-joined combinations ("en+fr", "chi_sim+eng"). Defaults to "en". Full BCP-47 parsing requires the optional :localize dependency. See Image.OCR.Languages.

  • :datapath is the directory containing <language>.traineddata files. Defaults to the value resolved by Image.OCR.Tessdata.datapath/1.

  • :psm is the page-segmentation mode atom. Defaults to :auto. See psm/0 for the full list.

  • :variables is a keyword list of SetVariable/2 tweakables, applied after initialisation. Example: [tessedit_char_whitelist: "0123456789"].

Returns

  • {:ok, %Image.OCR{}} on success.

  • {:error, reason} if Tesseract initialisation fails (commonly because the trained-data file is missing).

Examples

iex> {:ok, ocr} = Image.OCR.new()
iex> {ocr.locale, ocr.tesseract_language}
{"en", "eng"}
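The options above can be combined in a single call. In this sketch, DigitReader and its function names are hypothetical; the option keys and the tessedit_char_whitelist variable are taken from the options list, and pairing :single_line with a digit whitelist is only an assumed use case (for example, reading a numeric field).

```elixir
defmodule DigitReader do
  @moduledoc "Hypothetical wrapper around an instance tuned for one line of digits."

  # Options for new/1: English data, one text line per image, digits only.
  def options do
    [
      locale: "en",
      psm: :single_line,
      variables: [tessedit_char_whitelist: "0123456789"]
    ]
  end

  # Builds the instance and turns an initialisation failure into a message.
  def build do
    case Image.OCR.new(options()) do
      {:ok, ocr} -> {:ok, ocr}
      {:error, reason} -> {:error, "could not start Tesseract: #{inspect(reason)}"}
    end
  end
end
```

Keeping the option list in its own function makes it easy to reuse the same settings when building several instances.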

quick_read(input, options \\ [])

@spec quick_read(
  Image.OCR.Input.t(),
  keyword()
) :: {:ok, String.t()} | {:error, term()}

One-shot convenience: builds a default instance, recognises input, and returns the recognised text.

Equivalent to with {:ok, ocr} <- new(options), do: read_text(ocr, input). Prefer new/1 + read_text/3 (or Image.OCR.Pool) when calling more than once — building a fresh Tesseract instance per call is expensive.

Arguments

  • input is any value accepted by read_text/3.

  • options is a keyword list passed to new/1.

Returns

  • {:ok, text} on success.

  • {:error, reason} on failure.
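For repeated calls, the advice above amounts to building the instance once and passing it to every read. PageBatch.read_pages/2 is a hypothetical helper, not part of the library; stopping at the first failure is one possible policy, chosen here only to keep the sketch short.

```elixir
defmodule PageBatch do
  @moduledoc "Hypothetical helper: reads many inputs with a single OCR instance."

  # Returns {:ok, texts} in input order, or the first {:error, reason} met.
  def read_pages(inputs, options \\ []) do
    with {:ok, ocr} <- Image.OCR.new(options) do
      inputs
      |> Enum.reduce_while({:ok, []}, fn input, {:ok, acc} ->
        case Image.OCR.read_text(ocr, input) do
          {:ok, text} -> {:cont, {:ok, [text | acc]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)
      |> case do
        {:ok, texts} -> {:ok, Enum.reverse(texts)}
        error -> error
      end
    end
  end
end
```

The instance is created once per batch, so the initialisation cost is paid a single time however many inputs follow.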

read_text(ocr, input, options \\ [])

@spec read_text(t(), Image.OCR.Input.t(), keyword()) ::
  {:ok, String.t()} | {:error, term()}

Recognises text in input using the ocr instance and returns the result as a UTF-8 string.

Arguments

  • ocr is an %Image.OCR{} struct returned by new/1.

  • input is a Vix.Vips.Image.t(), a path to an image file, or a binary of encoded image data. See Image.OCR.Input.

  • options is reserved for future use.

Returns

  • {:ok, text} on success, where text is the recognised UTF-8 string.

  • {:error, reason} on failure.

Examples

{:ok, ocr}  = Image.OCR.new()
{:ok, text} = Image.OCR.read_text(ocr, "page.png")

# Or with a Vix.Vips.Image already in memory:
{:ok, image} = Vix.Vips.Image.new_from_file("page.png")
{:ok, text}  = Image.OCR.read_text(ocr, image)

recognize(ocr, input, options \\ [])

@spec recognize(t(), Image.OCR.Input.t(), keyword()) ::
  {:ok, [word_result()]} | {:error, term()}

Recognises input and returns each word with its confidence and bounding box.

Arguments

  • ocr is an %Image.OCR{} struct returned by new/1.

  • input is any value accepted by read_text/3.

  • options is reserved for future use.

Returns

  • {:ok, [word_result]} where each entry is a map with :text, :confidence (0–100), and :bbox ({x1, y1, x2, y2}) keys.

  • {:error, reason} on failure.
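The word maps lend themselves to simple post-processing. The WordFilter module below is hypothetical and works only on the map shape described above; treating x2 - x1 and y2 - y1 as width and height is an assumption about how the corner coordinates are meant to be read.

```elixir
defmodule WordFilter do
  @moduledoc "Hypothetical post-processing for the word maps returned by recognize/3."

  # Keeps words whose confidence (0-100) is at least `min_confidence`.
  def confident(words, min_confidence) do
    Enum.filter(words, fn %{confidence: confidence} -> confidence >= min_confidence end)
  end

  # Width and height of a word, derived from its {x1, y1, x2, y2} corners.
  def size(%{bbox: {x1, y1, x2, y2}}), do: {x2 - x1, y2 - y1}

  # Joins the words' text with single spaces, preserving list order.
  def to_text(words), do: Enum.map_join(words, " ", & &1.text)
end
```

Given {:ok, words} from recognize/3, a pipeline such as words |> WordFilter.confident(60) |> WordFilter.to_text() would drop low-confidence fragments before joining the rest. The threshold of 60 is arbitrary and would need tuning per document type.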

tesseract_version()

@spec tesseract_version() :: String.t()

Returns the linked Tesseract library version as a string.

Returns

  • A string such as "5.5.1".
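Because the binding targets Tesseract 5.x, a caller may want to check the linked version at startup. TesseractCheck.major_5?/1 is a hypothetical helper; it assumes the version string parses as a semantic version, as "5.5.1" does, and treats anything else as unsupported.

```elixir
defmodule TesseractCheck do
  @moduledoc "Hypothetical startup check on the linked Tesseract version."

  # Returns true when `version` parses as a semantic version with major 5.
  # Strings that Version.parse/1 rejects are treated as unsupported.
  def major_5?(version) when is_binary(version) do
    case Version.parse(version) do
      {:ok, %Version{major: 5}} -> true
      _other -> false
    end
  end
end
```

It would typically be called as TesseractCheck.major_5?(Image.OCR.tesseract_version()).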