mix image.ocr.tessdata.add (image_ocr v0.2.0)

Downloads <language>.traineddata from the upstream tessdata_* GitHub repository into the configured trained-data directory.

Usage

mix image.ocr.tessdata.add LANG [LANG ...] [--variant fast|best|legacy]
                                           [--branch BRANCH]
                                           [--path DIR]
                                           [--source URL]
                                           [--force]

Languages are specified using ISO 639-1 codes (en, fr, de), BCP-47 tags for region/script-specific variants (zh-Hans, zh-Hant, sr-Latn), or Tesseract codes verbatim where ISO 639-1 cannot express the language (frk, osd). See Image.OCR.Languages.

Options

  • --variant — Trained-data variant. One of fast (smallest, fastest;
    ~2-4 MB per language), best (most accurate; ~10-15 MB), or legacy
    (legacy + LSTM combined; largest). Defaults to the
    :image_ocr, :default_variant application config, or fast if unset.

    To install the larger / more accurate English data:

        mix image.ocr.tessdata.add en --variant best

  • --branch — Upstream git branch to fetch from. Defaults to main.

  • --path — Destination directory. Defaults to the value resolved by
    Image.OCR.Tessdata.datapath/0 (which honours the
    :image_ocr, :tessdata_path application config and the
    TESSDATA_PREFIX environment variable).

  • --source — Explicit URL to fetch from. Useful for mirrors. Only
    valid when adding a single language.

  • --force — Overwrite an existing trained-data file even when its
    SHA matches the previous fetch.
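
Because --source is only valid with a single language, fetching several languages from a mirror needs one invocation per language. The following is a minimal sketch of that loop. The helper name add_from_mirror, the LANG=URL argument format, and the mirror URLs are hypothetical, and it assumes --source takes the full URL of the .traineddata file, which this page does not state.

```shell
# Hypothetical helper, not part of image_ocr.
# Usage: add_from_mirror LANG=URL [LANG=URL ...]
# Runs the task once per pair, since --source accepts only one language.
add_from_mirror() {
  for pair in "$@"; do
    lang="${pair%%=*}"   # text before the first "="
    url="${pair#*=}"     # text after the first "="
    # Assumption: --source expects the complete file URL.
    mix image.ocr.tessdata.add "$lang" --source "$url" || return 1
  done
}
```

A call might look like this, with placeholder mirror URLs:

    add_from_mirror en=https://mirror.example.com/eng.traineddata fr=https://mirror.example.com/fra.traineddata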

Examples

mix image.ocr.tessdata.add en
mix image.ocr.tessdata.add fr de --variant best
mix image.ocr.tessdata.add zh-Hans --path /var/lib/tessdata
mix image.ocr.tessdata.add ja zh-Hant ko
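
For provisioning scripts it can help to pin the variant and destination in one place. The wrapper below is a small sketch of that idea. The function name install_tessdata is hypothetical and not part of image_ocr; only the task name and the --variant and --path flags come from the usage above.

```shell
# Hypothetical helper, not part of image_ocr.
# Usage: install_tessdata DIR LANG [LANG ...]
# Installs the given languages with the "best" variant into DIR.
install_tessdata() {
  dest="$1"
  shift
  mix image.ocr.tessdata.add "$@" --variant best --path "$dest"
}
```

For example, install_tessdata /var/lib/tessdata en fr expands to the same command as running the task with en fr --variant best --path /var/lib/tessdata.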