V111: PDF library tables for the catalogue module.
Backs the "PDFs" subtab in phoenix_kit_catalogue, layered on top
of core's phoenix_kit_files for binary storage / dedup / soft-delete
/ multi-bucket redundancy. Catalogue owns only the per-page text
index and the user-facing per-upload row.
Tables
- phoenix_kit_cat_pdfs: thin per-upload row. One row per "user uploaded this name". file_uuid FK → phoenix_kit_files.uuid, ON DELETE RESTRICT (catalogue manages the lifecycle; core prune can't remove a file referenced by a live catalogue row). Two uploads of identical content (different filenames) → two phoenix_kit_cat_pdfs rows, one shared phoenix_kit_files row, one shared extraction. Soft-delete via status sentinel "active" / "trashed" (workspace convention), plus trashed_at so the UI can show how long a row has been trashed.
- phoenix_kit_cat_pdf_extractions: keyed by file_uuid PK (one row per unique PDF content). Holds the worker's state machine (pending → extracting → extracted | scanned_no_text | failed), page_count, extracted_at, error_message. Cascades on the file row's hard delete.
- phoenix_kit_cat_pdf_page_contents: content-addressed dedup cache. Keyed by content_hash (SHA-256 hex of the page's normalized text). Same page text across multiple PDFs (boilerplate, legal disclaimers, cross-referenced product entries) is stored once. The GIN trigram index on the text column lives here, so the search index doesn't grow with duplication.
- phoenix_kit_cat_pdf_pages: per-page join. Composite PK (file_uuid, page_number). References both the file (cascade on file delete) and the page-content cache (restrict; orphaned content rows are GC'd by a catalogue-side helper, not by FK cascade, so the cache doesn't churn during normal upload/delete cycles).
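The content-addressed split between phoenix_kit_cat_pdf_page_contents and phoenix_kit_cat_pdf_pages can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the module name PageContentHashSketch, the normalization rule (trim and collapse whitespace), the lowercase hex encoding, and the choice to keep the first-seen text are hypothetical and may differ from what the catalogue worker actually does.

```elixir
defmodule PageContentHashSketch do
  @moduledoc false

  # Hypothetical normalization: trim, then collapse whitespace runs to one space.
  # The real normalization rules are not specified in this document.
  def normalize(text) when is_binary(text) do
    text
    |> String.trim()
    |> String.replace(~r/\s+/u, " ")
  end

  # SHA-256 of the normalized text, hex-encoded (lowercase is an assumption).
  def content_hash(text) when is_binary(text) do
    :sha256
    |> :crypto.hash(normalize(text))
    |> Base.encode16(case: :lower)
  end

  # Takes {file_uuid, page_number, text} tuples and returns:
  #   * a map of unique content rows keyed by content_hash (the dedup cache)
  #   * a list of {file_uuid, page_number, content_hash} per-page join rows
  def dedup(pages) do
    {contents, joins} =
      Enum.reduce(pages, {%{}, []}, fn {file_uuid, page_number, text}, {contents, joins} ->
        hash = content_hash(text)
        # put_new keeps the first-seen text, so repeated pages add no content row.
        contents = Map.put_new(contents, hash, text)
        {contents, [{file_uuid, page_number, hash} | joins]}
      end)

    {contents, Enum.reverse(joins)}
  end
end
```

With three pages where two share the same text after normalization, dedup/1 yields two content rows and three join rows, which is the property that keeps the trigram index from growing with duplicated pages.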
Enables pg_trgm for the trigram index.
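As a rough sketch of what enabling the extension and creating the trigram index could look like in an Ecto migration against PostgreSQL: the module name and the index name below are hypothetical, any schema prefix handling is omitted, and the actual V111 migration may issue different statements.

```elixir
defmodule PdfTrigramIndexSketch do
  use Ecto.Migration

  def up do
    # pg_trgm provides the gin_trgm_ops operator class used by the index below.
    execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")

    # Index name is hypothetical; table and column names come from the text above.
    execute("""
    CREATE INDEX IF NOT EXISTS phoenix_kit_cat_pdf_page_contents_text_trgm_idx
    ON phoenix_kit_cat_pdf_page_contents
    USING GIN (text gin_trgm_ops)
    """)
  end

  def down do
    execute("DROP INDEX IF EXISTS phoenix_kit_cat_pdf_page_contents_text_trgm_idx")
    # The extension is deliberately left installed, since other objects may use it.
  end
end
```

Creating the extension usually requires sufficient database privileges, so on managed PostgreSQL it may need to be enabled out of band before the migration runs.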