Image.OCR.Tessdata (image_ocr v0.2.0)


Helpers for resolving and managing Tesseract trained-data (tessdata) files.

Trained-data files (<lang>.traineddata) live in a directory that Tesseract reads at initialisation time. Image.OCR resolves that directory in the following order:

  1. The :datapath option passed to Image.OCR.new/1.

  2. The :tessdata_path application environment value:

    config :image_ocr, tessdata_path: "/var/lib/image_ocr/tessdata"

  3. The TESSDATA_PREFIX operating-system environment variable.

  4. The vendored fallback at priv/tessdata/ inside the :image_ocr package.
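The precedence above can be sketched as a chain of fallbacks. This is only an illustration of the documented order, not the library's implementation: the module TessdataResolutionSketch and its resolve/4 function are hypothetical, and each source is passed in explicitly so the order is easy to follow.

```elixir
defmodule TessdataResolutionSketch do
  # Hypothetical helper, not part of image_ocr. The first non-nil source
  # wins, in the documented order: option, app env, OS env, vendored.
  def resolve(options, app_env, os_env, vendored) do
    Keyword.get(options, :datapath) || app_env || os_env || vendored
  end
end

# Gather the four sources. The last argument is a placeholder for the
# vendored directory, which the real library locates inside its package.
TessdataResolutionSketch.resolve(
  [datapath: nil],
  Application.get_env(:image_ocr, :tessdata_path),
  System.get_env("TESSDATA_PREFIX"),
  "priv/tessdata"
)
```

Note that `||` only skips nil and false, so in this sketch an empty string would still count as a configured path. Whether the library treats empty values the same way is not stated here.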

See Mix.Tasks.Image.Ocr.Tessdata.Add and friends for managing the contents of a configured directory.

Summary

Functions

datapath(options \\ [])

Returns the absolute path to the directory that trained-data files are read from and written to.

installed?(language, options \\ [])

Returns true when language has a trained-data file in the resolved trained-data directory.

installed_languages(options \\ [])

Returns the list of language codes installed in the resolved trained-data directory.

language_file(language, options \\ [])

Returns the absolute path to the trained-data file for language inside the resolved trained-data directory.

vendored_path()

Returns the absolute path to the directory of trained-data shipped with the image_ocr package.

Functions

datapath(options \\ [])

@spec datapath(keyword()) :: String.t()

Returns the absolute path to the directory that trained-data files are read from and written to.

Arguments

  • options is an optional keyword list. See the options below.

Options

  • :datapath is an explicit path that overrides every other lookup. When nil (the default), the standard resolution order is used.

Returns

  • A string containing the absolute path to the trained-data directory.

Examples

iex> path = Image.OCR.Tessdata.datapath()
iex> File.dir?(path)
true

installed?(language, options \\ [])

@spec installed?(
  String.t(),
  keyword()
) :: boolean()

Returns true when language has a trained-data file in the resolved trained-data directory.
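As a rough illustration of what this check amounts to, the sketch below builds the `<lang>.traineddata` path and tests whether a regular file exists there. The module TessdataFileSketch and both of its functions are hypothetical stand-ins that take the directory explicitly. The real functions resolve the directory from options, and their internals may differ.

```elixir
defmodule TessdataFileSketch do
  # Hypothetical stand-ins for language_file/2 and installed?/2 that take
  # the directory as an argument instead of resolving it.
  def language_file(language, dir) do
    Path.join(Path.expand(dir), language <> ".traineddata")
  end

  # A language counts as installed here when its file is a regular file.
  def installed?(language, dir) do
    File.regular?(language_file(language, dir))
  end
end

# Create a throwaway directory holding one (empty) trained-data file.
dir = Path.join(System.tmp_dir!(), "tessdata_sketch_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)
File.write!(Path.join(dir, "eng.traineddata"), "")

IO.inspect(TessdataFileSketch.installed?("eng", dir)) # true
IO.inspect(TessdataFileSketch.installed?("fra", dir)) # false
```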

installed_languages(options \\ [])

@spec installed_languages(keyword()) :: [String.t()]

Returns the list of language codes installed in the resolved trained-data directory.

Arguments

  • options is an optional keyword list. See datapath/1 for the supported options.

Returns

  • A list of language code strings (for example ["eng", "fra"]) sorted alphabetically. Returns [] when the directory does not exist.

Examples

iex> "eng" in Image.OCR.Tessdata.installed_languages()
true
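The documented behaviour (alphabetical order, and [] for a missing directory) can be approximated with a directory scan. This is a sketch under the assumption that a language code is simply the file name without its `.traineddata` extension. InstalledLanguagesSketch is a hypothetical module, not the library's code.

```elixir
defmodule InstalledLanguagesSketch do
  # Hypothetical helper: lists language codes found in `dir`.
  def list(dir) do
    case File.ls(dir) do
      {:ok, entries} ->
        entries
        |> Enum.filter(&(Path.extname(&1) == ".traineddata"))
        |> Enum.map(&Path.rootname(&1, ".traineddata"))
        |> Enum.sort()

      # A missing or unreadable directory yields an empty list.
      {:error, _reason} ->
        []
    end
  end
end

# Populate a throwaway directory with two languages and one unrelated file.
langs_dir = Path.join(System.tmp_dir!(), "tessdata_list_#{System.unique_integer([:positive])}")
File.mkdir_p!(langs_dir)
for name <- ["fra.traineddata", "eng.traineddata", "README.txt"] do
  File.write!(Path.join(langs_dir, name), "")
end

IO.inspect(InstalledLanguagesSketch.list(langs_dir))                     # ["eng", "fra"]
IO.inspect(InstalledLanguagesSketch.list(Path.join(langs_dir, "nope")))  # []
```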

language_file(language, options \\ [])

@spec language_file(
  String.t(),
  keyword()
) :: String.t()

Returns the absolute path to the trained-data file for language inside the resolved trained-data directory.

Arguments

  • language is a language code string such as "eng" or "fra".

  • options is an optional keyword list. See datapath/1 for the supported options.

Returns

  • A string containing the absolute path. The file is not guaranteed to exist.

vendored_path()

@spec vendored_path() :: String.t()

Returns the absolute path to the directory of trained-data shipped with the image_ocr package.

Returns

  • A string containing the absolute path to the vendored trained-data directory.

Examples

iex> Image.OCR.Tessdata.vendored_path() |> String.ends_with?("priv/tessdata")
true