Image.OCR.Pool (image_ocr v0.2.0)

A NimblePool-backed pool of Image.OCR instances for concurrent OCR.

Each pool worker owns one Image.OCR instance — and therefore one tesseract::TessBaseAPI*. Because each instance is single-threaded, the pool is the simplest way to get parallel recognition across many processes without sharing state.

Sizing

The default pool size is System.schedulers_online(), matching the number of dirty-CPU schedulers Tesseract recognition runs on. Each worker holds its own copy of the loaded language model in memory (typically 2–50 MB depending on the language and trained-data variant), so memory use grows linearly with pool size. Choose a size that fits both your core count and your memory budget.
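As a rough illustration of that trade-off, the sketch below caps the pool size by both the scheduler count and a memory budget. The PoolSizing module and its arguments are hypothetical helpers, not part of image_ocr, and the per-worker model size has to be estimated for your own language data.

```elixir
defmodule PoolSizing do
  # Hypothetical helper, not part of image_ocr.
  # Returns a pool size bounded by the memory budget and the number of
  # online schedulers, never less than one worker.
  def pool_size(memory_budget_mb, model_mb)
      when is_integer(memory_budget_mb) and is_integer(model_mb) and
             memory_budget_mb > 0 and model_mb > 0 do
    by_memory = div(memory_budget_mb, model_mb)
    by_cpu = System.schedulers_online()
    max(1, min(by_memory, by_cpu))
  end
end

# With a 200 MB budget and an assumed 50 MB model, at most 4 workers are used.
size = PoolSizing.pool_size(200, 50)
```

The result can then be passed as the :pool_size option when starting the pool.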

Example

children = [
  {Image.OCR.Pool, name: MyOcr, locale: "en", pool_size: 4}
]
Supervisor.start_link(children, strategy: :one_for_one)

{:ok, text} = Image.OCR.Pool.read_text(MyOcr, "page.png")
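To show the pool being used from several processes at once, here is a minimal sketch built on Task.async_stream/3. It assumes the MyOcr pool from the example above is already running; the file names are placeholders, and the concurrency and timeout values are illustrative only.

```elixir
# Placeholder file names; replace with real image paths.
pages = ["page-1.png", "page-2.png", "page-3.png"]

results =
  pages
  |> Task.async_stream(
    # Each task checks out one worker from the pool for the duration of the call.
    fn path -> {path, Image.OCR.Pool.read_text(MyOcr, path)} end,
    # Matching the pool size (4 in the example above) keeps tasks from queueing.
    max_concurrency: 4,
    timeout: 60_000
  )
  |> Enum.map(fn {:ok, result} -> result end)

# results is a list of {path, {:ok, text} | {:error, reason}} tuples,
# in the same order as pages.
```

Setting max_concurrency higher than the pool size is harmless, but the extra tasks simply wait for a free worker.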

Summary

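Functions

child_spec(options)
Standard supervisor child spec.

read_text(pool, input, options \\ [])
Recognises text in input using a worker checked out from pool.

recognize(pool, input, options \\ [])
Recognises input and returns per-word results. See Image.OCR.recognize/3.

start_link(options)
Starts a pool linked to the calling process.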

Functions

child_spec(options)

@spec child_spec(keyword()) :: Supervisor.child_spec()

Standard supervisor child spec.

read_text(pool, input, options \\ [])

@spec read_text(NimblePool.pool(), Image.OCR.Input.t(), keyword()) ::
  {:ok, String.t()} | {:error, term()}

Recognises text in input using a worker checked out from pool.

Arguments

  • pool is the registered name of the pool.

  • input is any value accepted by Image.OCR.read_text/2.

  • options accepts :timeout (defaults to 30_000 ms).

Returns

  • {:ok, text} on success.

  • {:error, reason} on failure.
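A short sketch of handling both return shapes with a longer timeout, assuming a pool registered as MyOcr and a placeholder file name. This page does not say whether an expired :timeout surfaces as an {:error, reason} tuple or as an exit, so the example only covers the two documented return values.

```elixir
case Image.OCR.Pool.read_text(MyOcr, "page.png", timeout: 60_000) do
  {:ok, text} ->
    IO.puts(text)

  {:error, reason} ->
    # reason is an arbitrary term, so inspect/1 is used to render it.
    IO.puts(:stderr, "OCR failed: #{inspect(reason)}")
end
```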

recognize(pool, input, options \\ [])

@spec recognize(NimblePool.pool(), Image.OCR.Input.t(), keyword()) ::
  {:ok, [Image.OCR.word_result()]} | {:error, term()}

Recognises input and returns per-word results. See Image.OCR.recognize/3.

start_link(options)

@spec start_link(keyword()) :: GenServer.on_start()

Starts a pool linked to the calling process.

Arguments

  • options is a keyword list. See the options below.

Options

  • :name is the registered name for the pool. Required.

  • :pool_size is the number of workers (and therefore Tesseract instances) the pool holds. Defaults to System.schedulers_online().

  • :lazy controls lazy worker initialisation. When true, workers are initialised on demand rather than when the pool starts. Defaults to false.

  • Remaining options are passed straight to Image.OCR.new/1 (:locale, :datapath, :psm, :variables).

Returns

  • {:ok, pid} on success.

  • {:error, reason} on failure.
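For cases where the pool is started directly rather than under a supervisor, the call might look like the following sketch. The option values are illustrative, and MyOcr is the same example name used earlier on this page.

```elixir
# Starts a two-worker pool; with lazy: true the workers are not
# initialised up front.
{:ok, pid} =
  Image.OCR.Pool.start_link(
    name: MyOcr,
    locale: "en",
    pool_size: 2,
    lazy: true
  )

# The pool is linked to the caller, so it stops if the calling process exits.
true = is_pid(pid)
```

In an application, listing the pool in a supervisor's children as shown in the Example section is usually preferable, since the supervisor restarts the pool if it crashes.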