Use this after the README quickstart and before treating audit capture as production-ready. It complements brownfield-continuity.md for existing data.
For host staging / pooler parity (STG-01–STG-03), use guides/adoption-pilot-backlog.md as the in-repo matrix and rubric: fixed-field topology (STG-HOST-TOPOLOGY-TEMPLATE) plus audited HTTP/job paths with honest status columns under STG-AUDITED-PATH-RUBRIC. Copy rows into issues when something fails; keep evidence pointers redacted and link out to integrator-controlled detail.
1. Capture and triggers
- [ ] `mix threadline.install` and `mix threadline.gen.triggers` migrations applied in the target environment.
- [ ] `MIX_ENV` matches between trigger regeneration and runtime (`mix threadline.gen.triggers` loads `app.config`).
- [ ] `config :threadline, :verify_coverage, expected_tables: [...]` lists every audited table; `mix threadline.verify_coverage` passes in CI and on a production-like host.
- [ ] Run `Threadline.Health.trigger_coverage/1` after deploys, schema changes, and on a periodic cadence you trust; each `{:covered, _}` / `{:uncovered, _}` tuple names one public user table from the same catalog `verify_coverage` reads. Full interpretation: `domain-reference.md#trigger-coverage-operational`.
- [ ] `mix threadline.verify_coverage` only fails CI when an `expected_tables` name is missing triggers or uncovered; `{:uncovered, _}` on other tables is informational. Audit catalog tables `audit_transactions`, `audit_changes`, and `audit_actions` are excluded from `Health`’s per-table list by design (same link).
- [ ] `Threadline.Health.trigger_coverage/1` is wired into health checks or release checks where you need fast failure on drift.
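The difference between a failing and an informational coverage result can be sketched as a small pure function. This is an illustration only, not Threadline code: `CoverageGate` and its table names are hypothetical, and it assumes the second element of each `{:covered, _}` / `{:uncovered, _}` tuple is the table name as a string.

```elixir
defmodule CoverageGate do
  @moduledoc false

  # coverage: list of {:covered, table} / {:uncovered, table} tuples
  #           (tuple payload assumed to be the table name).
  # expected: table names that must have triggers.
  def classify(coverage, expected) do
    uncovered = for {:uncovered, table} <- coverage, do: table
    covered = for {:covered, table} <- coverage, do: table

    %{
      # Expected tables that are uncovered, or missing from the report, fail the gate.
      failing: Enum.reject(expected, &(&1 in covered)),
      # Uncovered tables outside the expected list are informational only.
      informational: Enum.reject(uncovered, &(&1 in expected))
    }
  end
end
```

A release check built on this sketch would stop the deploy only when `failing` is non-empty and would log `informational` for review.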
2. Actor bridge and semantics
- [ ] Request paths set `threadline.actor_ref` inside the same `Ecto.Multi` / `Repo.transaction` as audited writes (transaction-local GUC; safe under PgBouncer transaction pooling; see README PgBouncer section).
- [ ] Background jobs use `Threadline.Job` (or equivalent) so jobs and HTTP requests both attribute actors consistently.
- [ ] Where you need intent beyond row diffs, `Threadline.record_action/2` is called with `:repo` and a valid `ActorRef`.
3. Redaction and sensitive columns
- [ ] `config :threadline, :trigger_capture, tables: %{"users" => [exclude: ..., mask: ...]}` reviewed with security; no column in both `exclude` and `mask`.
- [ ] `mix threadline.gen.triggers --dry-run` used after config changes; migrations applied before relying on new trigger SQL.
- [ ] JSON/JSONB columns: remember masking replaces the whole value (no field-level redaction in current releases).
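The "no column in both `exclude` and `mask`" rule is easy to check mechanically during review. The sketch below is a hypothetical helper, not part of Threadline: `RedactionLint` is an invented name, and it assumes each table entry is a keyword list whose `:exclude` and `:mask` values are lists of column names.

```elixir
defmodule RedactionLint do
  @moduledoc false

  # tables: map of table name => keyword list with optional :exclude and :mask
  # lists (value shape is an assumption for this sketch).
  # Returns a map of table name => sorted columns that appear in both lists.
  def overlaps(tables) do
    for {table, opts} <- tables,
        both = shared(opts),
        both != [],
        into: %{} do
      {table, both}
    end
  end

  defp shared(opts) do
    exclude = MapSet.new(Keyword.get(opts, :exclude, []))
    mask = MapSet.new(Keyword.get(opts, :mask, []))

    exclude
    |> MapSet.intersection(mask)
    |> Enum.sort()
  end
end
```

An empty map means no table lists the same column twice; any entry names the columns to resolve before regenerating triggers.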
4. Retention and purge
- [ ] `config :threadline, :retention` validated (`keep_days` or `max_age_seconds`, not both; positive window).
- [ ] Destructive purge only with `enabled: true` after ops sign-off; always `mix threadline.retention.purge --dry-run` first.
- [ ] Production: `MIX_ENV=prod mix threadline.retention.purge --execute` (requires explicit `--execute`).
- [ ] Batch size and `max_batches` tuned so each run finishes under lock/latency budgets; schedule often enough that volume per run stays bounded.
- [ ] Backups / point-in-time recovery: purges are permanent deletes of `audit_changes` (and optionally empty `audit_transactions`); align retention with compliance needs.
- [ ] Index strategy for audit tables (baseline vs optional btree/GIN) reviewed with your DBA path; see `audit-indexing.md` for shipped index names, timeline/export join semantics, and evidence-first additive patterns.
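The retention window rule above (`keep_days` or `max_age_seconds`, not both, and a positive value) can be illustrated with a standalone check. This is only a sketch of the rule as stated in this checklist: `RetentionLint` and its error atoms are hypothetical, and it does not reproduce the library's own validation of `config :threadline, :retention`.

```elixir
defmodule RetentionLint do
  @moduledoc false

  # opts: keyword list mirroring the retention settings named in this checklist.
  # Returns :ok or {:error, reason}; the reason atoms are invented for this sketch.
  def validate(opts) do
    keep = Keyword.get(opts, :keep_days)
    age = Keyword.get(opts, :max_age_seconds)

    cond do
      # Setting both windows is ambiguous, so reject it outright.
      not is_nil(keep) and not is_nil(age) -> {:error, :both_windows_set}
      is_integer(keep) and keep > 0 -> :ok
      is_integer(age) and age > 0 -> :ok
      # Neither key set, or the one that is set is zero, negative, or not an integer.
      true -> {:error, :no_positive_window}
    end
  end
end
```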
Volume, growth, and purge cadence
- Treat `audit_changes` (and related storage) as a monotonically growing dataset until retention runs; chart table size and free space alongside application traffic so growth surprises surface before purge latency spikes.
- Schedule purges often enough that each run finishes well inside the configured `max_batches` outer loop. If you routinely hit the cap, eligible rows remain until the next run; lowering per-pass volume (smaller `--batch-size` / `batch_size`) or running more frequently is safer than silently leaving a long tail of old rows.
- Start `batch_size` near 500 (the `Threadline.Retention.purge/1` default), then adjust with lock wait, statement duration, and capture concurrency in mind; the Mix task maps `--batch-size` / `--max-batches` to the same options.
- `Threadline.Retention.Policy` is the validated view of `config :threadline, :retention`; call `Threadline.Retention.purge/1` with a required `repo:` keyword (and optional `dry_run:`, `batch_size`, `max_batches`, `cutoff:`) from automation, or use `mix threadline.retention.purge`: always `--dry-run` first, then production `MIX_ENV=prod mix threadline.retention.purge --execute` only after ops sign-off. Until `enabled: true`, programmatic calls return `{:error, :disabled}` and the Mix task raises.
- Monitor each run: Mix and library logs include batch indices and cumulative `deleted_changes` (and empty-transaction counts when applicable); track wall-clock duration per run and whether the final summary shows unused `max_batches` headroom.
- Cutoff clock, orphan `audit_transactions`, and empty-parent semantics stay in `domain-reference.md` (Retention, Phase 13); do not fork a second spec in this checklist.
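The relationship between `batch_size`, `max_batches`, and purge cadence comes down to simple arithmetic, sketched below. `PurgeBudget` is a hypothetical helper, and it assumes each batch deletes at most `batch_size` rows of `audit_changes` and a run stops after `max_batches` batches; it also ignores rows that become eligible while the backlog drains.

```elixir
defmodule PurgeBudget do
  @moduledoc false

  # Upper bound on rows one purge run can delete under the assumptions above.
  def rows_per_run(batch_size, max_batches), do: batch_size * max_batches

  # Number of runs needed to clear a backlog of `eligible` rows (ceiling division).
  def runs_to_drain(eligible, batch_size, max_batches) do
    cap = rows_per_run(batch_size, max_batches)
    div(eligible + cap - 1, cap)
  end
end
```

For example, with a batch size of 500 and a hypothetical cap of 20 batches, one run removes at most 10,000 rows, so a backlog of 25,000 eligible rows needs three runs. If daily growth exceeds the per-run cap, a daily schedule never catches up.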
5. Export and investigation
- [ ] Exports use the same filter keys as `Threadline.timeline/2` (`:repo`, `:table`, `:actor_ref`, `:from`, `:to`, `:correlation_id` only). Unknown keys raise `ArgumentError` with a message pointing at `Threadline.Query`.
- [ ] Large exports: respect default `max_rows` and `truncated` metadata, or use `Threadline.Export.stream_changes/2` with `Stream.take/2` intentionally.
- [ ] Retention vs filters: `Threadline.timeline/2`, `mix threadline.export`, and correlation-heavy playbooks only return rows that still exist after your purge windows. Align `:from` / `:to`, `max_rows`, streaming (`Threadline.Export.stream_changes/2`), and `:correlation_id` investigations with the policy in §4 Retention and purge; export behavior details live in `domain-reference.md` (Export, Phase 14).
- [ ] Operator Surface: If you mount the LiveView operator UI, ensure it is protected behind an authenticated admin pipeline with a strict `:authorize_fn` policy. See the Operator Surface guide for fail-closed requirements.
6. Observability
- [ ] `:telemetry` handlers for Threadline events are attached where you need metrics or logs. Event names and measurements: `domain-reference.md` (Telemetry); per-event narrative and how health counts relate to coverage checks: `domain-reference.md#trigger-coverage-operational`.
- [ ] Retention purge logs (`threadline retention purge batch`, etc.) visible to operators when purge runs.
7. Brownfield and continuity
- [ ] If tables already had rows before capture: read `brownfield-continuity.md`; run `mix threadline.continuity` where applicable; document the honest “gap until first audited write” for stakeholders.
Support incident queries
Incident queries assume the audit rows are still within the retained window. Aggressive purges can make historical answers empty, so reconcile timelines with retention and purge before escalating missing data.
Pre-launch: confirm operators can answer the canonical support questions below (see domain-reference.md for full SQL and API notes). For a skimmable “which public API first?” map before diving into playbooks, see domain-reference.md, Exploration API routing.
| Question (1-line) | API / Mix | SQL |
|---|---|---|
| 1. Row history: PK in a time window | `Threadline.history/3`, `Threadline.Query.timeline/2` | Golden query in domain reference |
| 2. Actor window: one actor across tables | `Threadline.actor_history/2`, `timeline/2` + `:actor_ref` | Golden query |
| 3. Correlation bundle: shared `correlation_id` | `timeline/2`, `mix threadline.export` + `:correlation_id` | Inner-join SQL + strict semantics |
| 4. Export parity: same filters as timeline | `Threadline.Export`, `mix threadline.export` | Filter vocabulary |
| 5. Action ↔ capture: link semantics to rows | `Threadline.record_action/2`, `action_id` | Join pattern |
| 6. Single transaction incident drill-down | Start with domain-reference.md, Exploration API routing | Then use the bundled incident story in incident-playbook.md |
- [ ] Q1 (Row history): Read row history playbook (`audit_changes`, `audit_transactions`, bounded `captured_at`).
- [ ] Q2 (Actor window): Read actor window playbook (`actor_ref` JSON, time bounds).
- [ ] Q3 (Correlation): Read correlation bundle playbook. With `:correlation_id`, timeline/export return only changes whose transaction inner-joins an `audit_actions` row with that correlation (no orphan capture rows).
- [ ] Q4 (Export parity): Read export parity notes (same keys as `Threadline.Query.timeline/2`).
- [ ] Q5 (Action ↔ capture): Read action/capture join (`audit_actions`, `action_id`, `audit_changes`).
- [ ] Q6 (Single transaction incident drill-down): Start with exploration routing, then follow the bundled incident path in incident-playbook.md.
See also
- Adoption pilot backlog: matrix to run this checklist in a real environment and file issues with evidence.
- Domain reference: schema, retention semantics, export behavior.
- HexDocs: `Threadline`, `Threadline.Export`, `Threadline.Retention`, `Threadline.Query`.