Elixir binding of Magika, Google's deep-learning file content type detector.
Magika identifies the content type of a file (e.g. html, python, pdf,
zip) from its bytes, using a small ONNX model run via
OnnxRuntime. It is a faithful port of
the reference Python implementation's standard_v3_3 model and inference
logic.
Usage
The model is loaded once and hosted by a supervised Magika.Server that
starts automatically with the :magika application. Call the API without
threading an instance around:
{:ok, result} = Magika.identify("<!DOCTYPE html>\n<html>...</html>")
result.prediction.output.label #=> "html"
result.prediction.output.mime_type #=> "text/html"
result.prediction.score #=> 0.99...
{:ok, result} = Magika.identify_path("/path/to/file.pdf")
result.prediction.output.label #=> "pdf"
Prediction mode
The prediction mode controls how strict Magika is before trusting the model's
guess. The hosted server uses :high_confidence by default; change it in your
application config:
config :magika, prediction_mode: :best_guess
The modes:
- :high_confidence (default): keep the model prediction only when its score clears the per-content-type threshold (falling back to the medium-confidence threshold otherwise).
- :medium_confidence: keep the model prediction when its score clears the generic medium-confidence threshold.
- :best_guess: always return the model prediction regardless of score.
When the score is too low for the chosen mode, the output is generalized to
txt (for text content types) or unknown (for binary content types).
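The decision described above can be sketched in plain Elixir. This is a minimal illustration, not Magika's actual code: the module name, the threshold values, and the `text?` flag are hypothetical. It also assumes that the "fall back to the medium-confidence threshold" rule applies to content types that have no dedicated threshold, which is one reading of the description above.

```elixir
defmodule PredictionModeSketch do
  # Hypothetical thresholds, chosen only for illustration.
  @medium_threshold 0.5
  @per_type_thresholds %{"markdown" => 0.75}

  # Returns the label to report for a model guess `label` with `score`.
  # `text?` states whether `label` is a text content type.
  def output_label(mode, label, score, text?) do
    if score >= required_score(mode, label) do
      label
    else
      # Score too low for the chosen mode: generalize the output.
      if text?, do: "txt", else: "unknown"
    end
  end

  # :best_guess accepts any score.
  defp required_score(:best_guess, _label), do: 0.0
  defp required_score(:medium_confidence, _label), do: @medium_threshold

  # Assumption: types without a dedicated threshold use the medium one.
  defp required_score(:high_confidence, label),
    do: Map.get(@per_type_thresholds, label, @medium_threshold)
end
```

With these made-up numbers, a "markdown" guess scored 0.6 is kept in :medium_confidence and :best_guess modes but generalized to "txt" in :high_confidence mode.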
Standalone instances (advanced)
You normally don't need this. For one-off scripts or tests you can build an
instance with new/1 and pass it as the first argument, bypassing the
supervised server. A Magika.t() is immutable and safe to reuse:
magika = Magika.new(prediction_mode: :best_guess)
{:ok, result} = Magika.identify(magika, content)
A specific named server can also be targeted with the :server option:
{:ok, result} = Magika.identify(content, server: MyApp.Magika)
Summary
Functions
Identifies the content type of the given raw content (a binary).
Identifies the content type of the file at path.
Identifies the content type of data read from an open IO device (for example, a file opened in binary mode).
Returns the loaded model's name (the model directory basename).
Creates a new Magika instance, loading the model and configuration.
Types
@type prediction_mode() :: :high_confidence | :medium_confidence | :best_guess
@type t() :: %Magika{ config: Magika.Config.t(), model: OnnxRuntime.Model.t(), prediction_mode: prediction_mode() }
Functions
@spec identify(binary(), keyword()) :: {:ok, Magika.Result.t()}
@spec identify(t(), binary()) :: {:ok, Magika.Result.t()}
Identifies the content type of the given raw content (a binary).
Resolves the hosted instance from a Magika.Server. Pass server: to target
a specific named server (defaults to Magika.Server). Alternatively, pass a
Magika instance as the first argument to bypass the server entirely.
Always returns {:ok, result}: identifying in-memory bytes cannot fail the
way a filesystem read can.
@spec identify_path(Path.t(), keyword()) :: {:ok, Magika.Result.t()} | {:error, Magika.Result.t()}
@spec identify_path(t(), Path.t()) :: {:ok, Magika.Result.t()} | {:error, Magika.Result.t()}
Identifies the content type of the file at path.
Resolves the hosted instance from a Magika.Server. Pass server: to target
a specific named server (defaults to Magika.Server). Alternatively, pass a
Magika instance as the first argument to bypass the server entirely.
Returns {:ok, result} on success, or {:error, result} when the path does
not exist or cannot be read. Directories and other special files are reported
via dedicated content types (directory, symlink, unknown).
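Because both outcomes carry a result struct, callers will usually pattern match on the tag. The helper below is a minimal sketch of that: `PathReport` is a hypothetical module, and since the fields of an error result are not described here, the error branch only inspects the value rather than reading specific fields.

```elixir
defmodule PathReport do
  # Success: the label is read through the documented
  # result.prediction.output.label path.
  def describe({:ok, result}) do
    "detected " <> result.prediction.output.label
  end

  # Failure (missing or unreadable path): no fields are assumed.
  def describe({:error, result}) do
    "could not identify: " <> inspect(result)
  end
end
```

It would typically sit at the end of a pipeline, for example `"/path/to/file.pdf" |> Magika.identify_path() |> PathReport.describe()`.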
@spec identify_stream(IO.device(), keyword()) :: {:ok, Magika.Result.t()}
@spec identify_stream(t(), IO.device()) :: {:ok, Magika.Result.t()}
Identifies the content type of data read from an open IO device (for example, a file opened in binary mode).
Resolves the hosted instance from a Magika.Server. Pass server: to target
a specific named server (defaults to Magika.Server). Alternatively, pass a
Magika instance as the first argument to bypass the server entirely.
The whole stream is read into memory. Magika only needs a bounded prefix and suffix, but reading the stream fully keeps the implementation simple and correct. The caller is responsible for opening and closing the device.
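Since the caller owns the device, opening and closing can be wrapped in one place. The following is a minimal sketch under stated assumptions: `StreamExample` and `identify_file/2` are hypothetical names, and the `identify` argument exists only so the wrapper can be exercised without a running Magika server. It defaults to `&Magika.identify_stream/1`.

```elixir
defmodule StreamExample do
  # Opens `path` in binary mode, passes the device to `identify`, and closes
  # the device afterwards, even if `identify` raises.
  # Returns {:error, reason} from File.open/2 when the file cannot be opened.
  def identify_file(path, identify \\ &Magika.identify_stream/1) do
    with {:ok, device} <- File.open(path, [:read, :binary]) do
      try do
        identify.(device)
      after
        File.close(device)
      end
    end
  end
end
```

With the :magika application running, `StreamExample.identify_file("/path/to/file.pdf")` would return whatever `Magika.identify_stream/1` returns for that device.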
Returns the loaded model's name (the model directory basename).
Creates a new Magika instance, loading the model and configuration.
Options
- :prediction_mode: one of :high_confidence (default), :medium_confidence, :best_guess.
- :model_path: path to a custom model.onnx. Defaults to the vendored standard_v3_3 model.
- :model_config_path: path to a custom config.min.json.
- :content_types_kb_path: path to a custom content_types_kb.min.json.