PhoenixKitCatalogue.Schemas.PdfExtraction (PhoenixKitCatalogue v0.1.17)


Extraction state for one unique PDF file content.

Keyed by file_uuid (the primary key): there is one row per unique phoenix_kit_files.uuid, regardless of how many times that content was uploaded under different filenames. The worker's state machine lives here rather than on Pdf, so two uploads of the same content share one extraction job and one extracted page set.
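
The keying rule can be illustrated with a minimal in-memory sketch. It uses a plain map in place of the database table, and `ExtractionRegistrySketch`, `register_upload/2`, and the `"uuid-a"`/`"uuid-b"` values are hypothetical names introduced for illustration; they are not part of PhoenixKitCatalogue.

```elixir
defmodule ExtractionRegistrySketch do
  # Hypothetical stand-in for the extraction table:
  # a map from file_uuid to that content's extraction state.

  # Registers an upload. The extraction entry is created only if no
  # entry exists for this file_uuid yet; repeat uploads reuse it.
  def register_upload(extractions, file_uuid) do
    Map.put_new(extractions, file_uuid, %{extraction_status: "pending", page_count: nil})
  end
end

extractions =
  %{}
  |> ExtractionRegistrySketch.register_upload("uuid-a")
  |> ExtractionRegistrySketch.register_upload("uuid-a")
  |> ExtractionRegistrySketch.register_upload("uuid-b")

# Two uploads of content "uuid-a" share one entry, so there are two entries in total.
IO.inspect(map_size(extractions))
```

In the real schema the same effect would presumably come from the primary-key constraint on file_uuid rather than from `Map.put_new/3`.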

Status flow: pending → extracting → extracted | scanned_no_text | failed. The row is deleted by cascade when the file row is hard-deleted.
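
The status flow can be read as a small transition table. The sketch below encodes only the transitions stated in the flow; `StatusFlowSketch` and its functions are hypothetical and not part of this module, and treating extracted, scanned_no_text, and failed as terminal is an assumption (the real worker may allow retries).

```elixir
defmodule StatusFlowSketch do
  # Transitions copied from the documented flow. The three end statuses
  # have no outgoing transitions here, which is an assumption.
  @transitions %{
    "pending" => ["extracting"],
    "extracting" => ["extracted", "scanned_no_text", "failed"],
    "extracted" => [],
    "scanned_no_text" => [],
    "failed" => []
  }

  # True when the flow allows moving directly from `from` to `to`.
  def valid_transition?(from, to), do: to in Map.get(@transitions, from, [])

  # True when the status is known and has no outgoing transitions.
  def terminal?(status), do: Map.get(@transitions, status) == []
end

IO.inspect(StatusFlowSketch.valid_transition?("pending", "extracting"))
IO.inspect(StatusFlowSketch.valid_transition?("pending", "extracted"))
```

Under these assumptions the first call returns true and the second returns false, because a row has to pass through extracting before it reaches an end status.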

Summary

Types

t()

@type t() :: %PhoenixKitCatalogue.Schemas.PdfExtraction{
  __meta__: term(),
  error_message: term(),
  extracted_at: term(),
  extraction_status: term(),
  file_uuid: term(),
  inserted_at: term(),
  page_count: term(),
  updated_at: term()
}

Functions

changeset(extraction, attrs)

@spec changeset(
  t()
  | %PhoenixKitCatalogue.Schemas.PdfExtraction{
      __meta__: term(),
      error_message: term(),
      extracted_at: term(),
      extraction_status: term(),
      file_uuid: term(),
      inserted_at: term(),
      page_count: term(),
      updated_at: term()
    },
  map()
) :: Ecto.Changeset.t(t())

status_changeset(extraction, attrs)

@spec status_changeset(t(), map()) :: Ecto.Changeset.t(t())

statuses()

@spec statuses() :: [String.t()]