Idiomatic Elixir interface to the Tesseract OCR engine.
Image.OCR is a thin NIF binding (Tesseract 5.x) that accepts images as
Vix.Vips.Image structs, file paths, or in-memory image binaries and
returns recognised text.
Quick start
{:ok, instance} = Image.OCR.new()
{:ok, text} = Image.OCR.read_text(instance, "page.png")
Or, in one shot:
{:ok, text} = Image.OCR.quick_read("page.png")
Concurrency
A single Image.OCR instance wraps one tesseract::TessBaseAPI, which is
not safe for concurrent use. Calls on a shared instance are serialised
through a per-instance mutex; for true parallelism, build one instance per
worker — the simplest way is Image.OCR.Pool.
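The one-instance-per-worker advice can be sketched with plain Task.async_stream/3. This is not a description of how Image.OCR.Pool works, since its API is not shown on this page. The module name ParallelOCRSketch and the injected new_fun and read_fun parameters are hypothetical; they are injected only so the sketch runs without Tesseract installed.

```elixir
defmodule ParallelOCRSketch do
  # Splits `inputs` into at most `workers` chunks and processes each chunk in
  # its own task. Every task builds its own instance via `new_fun`, so no
  # instance is ever shared between processes.
  def read_all(inputs, workers, new_fun, read_fun) when workers > 0 do
    chunk_size = max(1, ceil(length(inputs) / workers))

    inputs
    |> Enum.chunk_every(chunk_size)
    |> Task.async_stream(
      fn chunk ->
        {:ok, instance} = new_fun.()
        Enum.map(chunk, &read_fun.(instance, &1))
      end,
      max_concurrency: workers,
      timeout: :infinity
    )
    # async_stream keeps input order by default, so results line up with inputs.
    |> Enum.flat_map(fn {:ok, results} -> results end)
  end
end
```

With the real library, the call would presumably look like ParallelOCRSketch.read_all(paths, System.schedulers_online(), &Image.OCR.new/0, &Image.OCR.read_text/2).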
Trained-data
Image.OCR ships with English (eng) trained-data. To install additional
languages, see Mix.Tasks.Image.Ocr.Tessdata.Add. The trained-data location
is resolved by Image.OCR.Tessdata.datapath/1.
Summary
Types
psm()
A page-segmentation mode. See psm/0 for valid values.
t()
An OCR instance that wraps a single Tesseract API resource.
word_result()
A recognised word together with its confidence and bounding box.
Functions
new/1
Builds a new OCR instance.
quick_read/2
One-shot convenience: builds a default instance, recognises input, and returns the recognised text.
read_text/3
Recognises text in input using instance and returns the result as a UTF-8 string.
recognize/3
Recognises input and returns each word with its confidence and bounding box.
tesseract_version/0
Returns the linked Tesseract library version as a string.
Types
@type psm() ::
:osd_only
| :auto_osd
| :auto_only
| :auto
| :single_column
| :single_block_vert_text
| :single_block
| :single_line
| :single_word
| :circle_word
| :single_char
| :sparse_text
| :sparse_text_osd
| :raw_line
A page-segmentation mode. See psm/0 for valid values.
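The atoms above are listed in the same order as Tesseract's PageSegMode enum, which runs from PSM_OSD_ONLY = 0 to PSM_RAW_LINE = 13. The sketch below makes that correspondence explicit. The module name PSMSketch is hypothetical, and whether Image.OCR translates atoms this way internally is an assumption.

```elixir
defmodule PSMSketch do
  # Atoms in the order documented for psm(). The index of each atom equals
  # the integer value of the matching tesseract::PageSegMode constant.
  @modes [
    :osd_only,
    :auto_osd,
    :auto_only,
    :auto,
    :single_column,
    :single_block_vert_text,
    :single_block,
    :single_line,
    :single_word,
    :circle_word,
    :single_char,
    :sparse_text,
    :sparse_text_osd,
    :raw_line
  ]

  # Returns {:ok, integer} for a known mode, or an error tuple otherwise.
  def to_integer(mode) when is_atom(mode) do
    case Enum.find_index(@modes, &(&1 == mode)) do
      nil -> {:error, {:invalid_psm, mode}}
      index -> {:ok, index}
    end
  end
end
```

For example, PSMSketch.to_integer(:single_line) gives {:ok, 7}, the value commonly passed to the tesseract command line as --psm 7.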
@type t() :: %Image.OCR{
  datapath: String.t() | nil,
  locale: String.t(),
  psm: psm(),
  ref: reference() | binary(),
  tesseract_language: String.t()
}
An OCR instance — wraps a single Tesseract API resource.
- :locale is the user-facing locale identifier as supplied to new/1 (e.g. "en", "en-US", "zh-Hans-CN").
- :tesseract_language is the resolved Tesseract trained-data code that was passed to TessBaseAPI::Init (e.g. "eng").
@type word_result() :: %{
  text: String.t(),
  confidence: float(),
  bbox: {non_neg_integer(), non_neg_integer(), non_neg_integer(), non_neg_integer()}
}
A recognised word together with its confidence and bounding box.
Functions
new(options \\ [])
Builds a new OCR instance.
Arguments
- options is an optional keyword list. See the options below.
Options
- :locale is the locale identifier. Accepts ISO 639-1 codes ("en", :en, "fr"), BCP-47 tags for region/script variants ("en-US", "zh-Hans", "sr-Latn"), Tesseract codes verbatim ("frk", "osd"), or +-joined combinations ("en+fr", "chi_sim+eng"). Defaults to "en". Full BCP-47 parsing requires the optional :localize dependency. See Image.OCR.Languages.
- :datapath is the directory containing <language>.traineddata files. Defaults to the value resolved by Image.OCR.Tessdata.datapath/1.
- :psm is the page-segmentation mode atom. Defaults to :auto. See psm/0 for the full list.
- :variables is a keyword list of SetVariable/2 tweakables, applied after initialisation. Example: [tessedit_char_whitelist: "0123456789"].
Returns
- {:ok, %Image.OCR{}} on success.
- {:error, reason} if Tesseract initialisation fails (commonly because the trained-data file is missing).
Examples
iex> {:ok, ocr} = Image.OCR.new()
iex> {ocr.locale, ocr.tesseract_language}
{"en", "eng"}
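The options and the error return can be combined as in the following sketch, which configures a digits-only, single-line reader. The module name DigitReaderSketch and the :ocr_init_failed tag are hypothetical. The constructor is passed in as new_fun so the sketch runs without Tesseract; in real use it would be &Image.OCR.new/1.

```elixir
defmodule DigitReaderSketch do
  # Builds an instance restricted to digits on a single text line.
  # `new_fun` stands in for Image.OCR.new/1 and receives the options list.
  def build(new_fun) do
    options = [
      locale: "en",
      psm: :single_line,
      variables: [tessedit_char_whitelist: "0123456789"]
    ]

    case new_fun.(options) do
      {:ok, ocr} ->
        {:ok, ocr}

      # Initialisation commonly fails when the trained-data file is missing,
      # so the reason is wrapped in a tag the caller can match on.
      {:error, reason} ->
        {:error, {:ocr_init_failed, reason}}
    end
  end
end
```

A caller would then use DigitReaderSketch.build(&Image.OCR.new/1) and match on the two tuples.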
@spec quick_read(Image.OCR.Input.t(), keyword()) :: {:ok, String.t()} | {:error, term()}
One-shot convenience: builds a default instance, recognises input, and
returns the recognised text.
Equivalent to with {:ok, ocr} <- new(options), do: read_text(ocr, input).
Prefer new/1 + read_text/3 (or Image.OCR.Pool) when calling more than
once — building a fresh Tesseract instance per call is expensive.
Arguments
- input is any value accepted by read_text/3.
- options is forwarded to new/1.
Returns
- {:ok, text} on success.
- {:error, reason} on failure.
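When several inputs are read in sequence, the advice to build once and reuse can be sketched as below. The module name BatchReadSketch and the injected new_fun and read_fun are hypothetical stand-ins for &Image.OCR.new/0 and &Image.OCR.read_text/2, used so the sketch runs without Tesseract. Stopping at the first error is a design choice of this sketch, not behaviour of the library.

```elixir
defmodule BatchReadSketch do
  # Builds a single instance, then reads every input with it.
  # Returns {:ok, texts} in input order, or the first {:error, reason}.
  def read_many(inputs, new_fun, read_fun) do
    with {:ok, ocr} <- new_fun.() do
      inputs
      |> Enum.reduce_while({:ok, []}, fn input, {:ok, acc} ->
        case read_fun.(ocr, input) do
          {:ok, text} -> {:cont, {:ok, [text | acc]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)
      |> case do
        # Texts were accumulated in reverse, so restore input order.
        {:ok, texts} -> {:ok, Enum.reverse(texts)}
        error -> error
      end
    end
  end
end
```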
@spec read_text(t(), Image.OCR.Input.t(), keyword()) :: {:ok, String.t()} | {:error, term()}
Recognises text in input using instance and returns the result as a
UTF-8 string.
Arguments
- instance is an %Image.OCR{} struct returned by new/1.
- input is a Vix.Vips.Image.t(), a path to an image file, or a binary of encoded image data. See Image.OCR.Input.
- options is reserved for future use.
Returns
- {:ok, text} on success, where text is the recognised UTF-8 string.
- {:error, reason} on failure.
Examples
{:ok, ocr} = Image.OCR.new()
{:ok, text} = Image.OCR.read_text(ocr, "page.png")
# Or with a Vix.Vips.Image already in memory:
{:ok, image} = Vix.Vips.Image.new_from_file("page.png")
{:ok, text} = Image.OCR.read_text(ocr, image)
@spec recognize(t(), Image.OCR.Input.t(), keyword()) :: {:ok, [word_result()]} | {:error, term()}
Recognises input and returns each word with its confidence and bounding
box.
Arguments
- instance is an %Image.OCR{} struct returned by new/1.
- input is any value accepted by read_text/3.
- options is reserved for future use.
Returns
- {:ok, [word_result]} where each entry is a map with :text, :confidence (0–100), and :bbox ({x1, y1, x2, y2}) keys.
- {:error, reason} on failure.
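A typical next step is to drop low-confidence words and measure the rest. The sketch below works on hand-written sample maps shaped like word_result(); the values are illustrative, not real OCR output. The module name WordFilterSketch is hypothetical, and computing the size as x2 - x1 by y2 - y1 assumes the second corner is an exclusive edge, which this page does not state.

```elixir
defmodule WordFilterSketch do
  # Keeps words whose confidence is at least `min_confidence` (0-100 scale).
  def confident_words(words, min_confidence) do
    Enum.filter(words, &(&1.confidence >= min_confidence))
  end

  # Width and height of a word from its {x1, y1, x2, y2} bounding box.
  def size(%{bbox: {x1, y1, x2, y2}}), do: {x2 - x1, y2 - y1}
end

# Made-up sample data in the documented word_result() shape.
words = [
  %{text: "Invoice", confidence: 96.4, bbox: {10, 12, 118, 40}},
  %{text: "n0.", confidence: 41.0, bbox: {126, 12, 160, 40}},
  %{text: "1042", confidence: 91.2, bbox: {168, 12, 230, 40}}
]

kept = WordFilterSketch.confident_words(words, 60.0)
IO.inspect(Enum.map(kept, & &1.text))
IO.inspect(WordFilterSketch.size(hd(kept)))
```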
@spec tesseract_version() :: String.t()
Returns the linked Tesseract library version as a string.
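Because the binding targets Tesseract 5.x, a caller may want to check the reported version at startup. The sketch below does so with the standard Version module. The module name VersionCheckSketch is hypothetical, and it assumes the string is in a semver-like MAJOR.MINOR.PATCH form; any string that does not parse is treated as not 5.x.

```elixir
defmodule VersionCheckSketch do
  # Returns true when `version_string` parses as a 5.x release.
  def tesseract_5?(version_string) when is_binary(version_string) do
    case Version.parse(version_string) do
      {:ok, %Version{major: 5}} -> true
      _ -> false
    end
  end
end
```

In real use the argument would be the result of Image.OCR.tesseract_version().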
Returns
- A string such as "5.5.1".