Use this after the README quickstart and before treating audit capture as production-ready. It complements brownfield-continuity.md for existing data.
For host staging / pooler parity (STG-01–STG-03), use `guides/adoption-pilot-backlog.md` as the in-repo matrix and rubric. It pairs the fixed-field topology template (`STG-HOST-TOPOLOGY-TEMPLATE`) with audited HTTP/job paths and honest status columns under `STG-AUDITED-PATH-RUBRIC`. When something fails, copy the rows into issues. Keep evidence pointers redacted and link out to integrator-controlled detail.
1. Capture and triggers
- [ ] `mix threadline.install` and `mix threadline.gen.triggers` migrations applied in the target environment.
- [ ] `MIX_ENV` matches between trigger regeneration and runtime (`mix threadline.gen.triggers` loads `app.config`).
- [ ] `config :threadline, :verify_coverage, expected_tables: [...]` lists every audited table; `mix threadline.verify_coverage` passes in CI and on a production-like host.
- [ ] Run `Threadline.Health.trigger_coverage/1` after deploys, after schema changes, and on a periodic cadence you trust. Each `{:covered, _}`/`{:uncovered, _}` tuple names one public user table from the same catalog that `verify_coverage` reads. Full interpretation: `domain-reference.md#trigger-coverage-operational`.
- [ ] `mix threadline.verify_coverage` only fails CI when an `expected_tables` name is missing triggers or is uncovered; `{:uncovered, _}` on other tables is informational. The audit catalog tables `audit_transactions`, `audit_changes`, and `audit_actions` are excluded from `Health`'s per-table list by design (same link).
- [ ] `Threadline.Health.trigger_coverage/1` is wired into health checks or release checks where you need fast failure on drift.
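The pass/fail rule in the items above can be sketched as a small release-check helper. `MyApp.CoverageGate` is hypothetical and not part of Threadline. It assumes you already hold the list of `{:covered, table}` / `{:uncovered, table}` tuples, with table names as strings. It does not call `Threadline.Health.trigger_coverage/1` itself, because you should confirm that function's argument and return wrapping in HexDocs first.

```elixir
defmodule MyApp.CoverageGate do
  @moduledoc """
  Hypothetical release-check helper. Only tables listed in the equivalent of
  `config :threadline, :verify_coverage, expected_tables: [...]` can fail the
  gate; `{:uncovered, _}` on any other table is reported as informational.
  """

  @type coverage :: [{:covered, String.t()} | {:uncovered, String.t()}]

  @spec check(coverage, [String.t()]) :: :ok | {:error, [String.t()]}
  def check(coverage, expected_tables) do
    covered = for {:covered, table} <- coverage, into: MapSet.new(), do: table

    # An expected table fails if it is reported uncovered or (an assumption of
    # this sketch) if it does not appear in the coverage list at all, e.g. a typo.
    case Enum.reject(expected_tables, &MapSet.member?(covered, &1)) do
      [] -> :ok
      missing -> {:error, Enum.sort(missing)}
    end
  end

  @spec informational(coverage, [String.t()]) :: [String.t()]
  def informational(coverage, expected_tables) do
    # Uncovered tables that are not expected: surface them, but never fail on them.
    for {:uncovered, table} <- coverage, table not in expected_tables, do: table
  end
end
```

In a release task, `check/2` returning `{:error, tables}` is the signal to halt. You can log `informational/2` without blocking the deploy.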
Coverage drift visibility
Threadline's strongest production posture comes from making coverage drift impossible to miss. After mounting the operator surface and configuring triggers, verify:
- [ ] Surface header pill renders on every LiveView: visit any operator-surface page and confirm the badge shows either "All covered" (green-muted) or "{N} uncovered" (amber). The badge links to `/audit/coverage`.
- [ ] Coverage dashboard responds at `/audit/coverage`: the page renders three buckets (covered / uncovered / expected) and polls every 30 seconds by default.
- [ ] Mix-task parity for capture-only paths: `mix threadline.health.coverage` prints the same data; use `mix threadline.health.coverage --json` for machine consumption.
- [ ] Adopter-declared expected-uncovered set: if you use Oban, vendor add-ons, or non-Threadline bookkeeping tables, declare them in `config :threadline, :health, expected_uncovered_tables: [...]`. Run `Threadline.Health.Policy.validate!/1` at boot to fail loudly on typos.
- [ ] Telemetry alert on failure: subscribe to `[:threadline, :health, :checked, :error]` so that sustained polling failures (e.g. DB connection issues) page someone instead of silently freezing the dashboard at the last-good count.
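A minimal sketch of the telemetry subscription from the last item. It uses the standard `:telemetry.attach/4`. The module name and handler ID are hypothetical. Measurements and metadata are logged whole with `inspect/1` because their exact keys are documented in `domain-reference.md` (Telemetry), not here.

```elixir
defmodule MyApp.ThreadlineAlerts do
  @moduledoc """
  Hypothetical handler that turns Threadline health-check failures into
  error logs that your alerting pipeline can match on.
  """
  require Logger

  @event [:threadline, :health, :checked, :error]

  # Call once at boot, e.g. from your Application.start/2.
  def attach do
    :telemetry.attach(
      "myapp-threadline-health-error",
      @event,
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Payload keys are not assumed; everything is inspected into the log line.
  def handle_event(@event, measurements, metadata, _config) do
    Logger.error(
      "threadline health check failed: " <>
        "measurements=#{inspect(measurements)} metadata=#{inspect(metadata)}"
    )
  end
end
```

Alert on a sustained rate of these log lines (or increment a counter metric in the handler) rather than on a single occurrence, since one transient DB error is expected under normal polling.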
See also guides/operator-surface.md §"Coverage dashboard".
2. Actor bridge and semantics
- [ ] Request paths set `threadline.actor_ref` inside the same `Ecto.Multi`/`Repo.transaction` as the audited writes (transaction-local GUC, so it is safe under PgBouncer transaction pooling; see the README PgBouncer section).
- [ ] Background jobs use `Threadline.Job` (or equivalent) so that jobs and HTTP requests attribute actors consistently.
- [ ] Where you need intent beyond row diffs, `Threadline.record_action/2` is called with `:repo` and a valid `ActorRef`.
3. Redaction and sensitive columns
- [ ] `config :threadline, :trigger_capture, tables: %{"users" => [exclude: ..., mask: ...]}` reviewed with security; no column appears in both `exclude` and `mask`.
- [ ] `mix threadline.gen.triggers --dry-run` used after config changes; migrations applied before relying on new trigger SQL.
- [ ] Visit `/audit/policy/redaction` after deploys or config changes. Confirm the affected tables land in "Config matches deployed", not "Drift detected" or "Could not introspect".
- [ ] Capture-only path checked too: `mix threadline.policy.show` for human output, `mix threadline.policy.show --json` for machine checks or incident tooling.
- [ ] If any table shows "Drift detected" or "Could not introspect", rerun `mix threadline.gen.triggers`, apply the generated migration, and re-check before declaring the rollout aligned.
- [ ] Confirm the redaction viewer stays safe for operator screenshots and incident notes: it should show only column names and placeholder metadata, never sample values.
- [ ] JSON/JSONB columns: masking replaces the whole value (there is no field-level redaction in current releases).
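An illustrative shape for the redaction config in the first item. This assumes `exclude:` and `mask:` each take a list of column-name strings; check HexDocs for the accepted forms. The table and column names are hypothetical. No column appears in both lists. `profile` stands in for a JSONB column, which is masked as a whole value.

```elixir
# config/config.exs (illustrative; table and column names are hypothetical)
import Config

config :threadline, :trigger_capture,
  tables: %{
    "users" => [
      # Assumed: columns kept out of captured row data.
      exclude: ["password_hash"],
      # Assumed: columns captured with the value replaced by a placeholder.
      # A JSONB column such as "profile" is replaced whole, not per field.
      mask: ["email", "profile"]
    ]
  }
```

After editing, run `mix threadline.gen.triggers --dry-run`, then apply the migration before relying on the new trigger SQL.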
4. Retention and purge
- [ ] `config :threadline, :retention` validated (`keep_days` or `max_age_seconds`, not both; positive window).
- [ ] Destructive purge only with `enabled: true` after ops sign-off; always run `mix threadline.retention.purge --dry-run` first.
- [ ] Production: `MIX_ENV=prod mix threadline.retention.purge --execute` (requires an explicit `--execute`).
- [ ] Batch size and `max_batches` tuned so each run finishes within lock/latency budgets; schedule runs often enough that the volume per run stays bounded.
- [ ] Backups / point-in-time recovery: purges permanently delete `audit_changes` rows (and optionally empty `audit_transactions`); align retention with compliance needs.
- [ ] Index strategy for audit tables (baseline vs optional btree/GIN) reviewed with your DBA. See `audit-indexing.md` for shipped index names, timeline/export join semantics, and evidence-first additive patterns.
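An illustrative `:retention` block that matches the first two items. It assumes the block lives in `config/runtime.exs`. The 90-day window is a placeholder, not a recommendation.

```elixir
# config/runtime.exs (illustrative values)
import Config

config :threadline, :retention,
  # Keep false until ops sign-off. While disabled, Threadline.Retention.purge/1
  # returns {:error, :disabled} and mix threadline.retention.purge raises.
  enabled: false,
  # Set exactly one positive window: keep_days OR max_age_seconds, never both.
  keep_days: 90
```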
Volume, growth, and purge cadence
- Treat `audit_changes` (and related storage) as a monotonically growing dataset until retention runs. Chart table size and free space alongside application traffic so growth surprises surface before purge latency spikes.
- Schedule purges often enough that each run finishes well inside the configured `max_batches` outer loop. If you routinely hit the cap, eligible rows remain until the next run. Running purges more frequently, with a `--batch-size`/`batch_size` small enough to keep each statement short, is safer than silently leaving a long tail of old rows.
- Start `batch_size` near 500 (the `Threadline.Retention.purge/1` default), then adjust it with lock wait, statement duration, and capture concurrency in mind. The Mix task maps `--batch-size`/`--max-batches` to the same options.
- `Threadline.Retention.Policy` is the validated view of `config :threadline, :retention`. You can call `Threadline.Retention.purge/1` from automation; it takes a required `repo:` keyword and optional `dry_run:`, `batch_size:`, `max_batches:`, and `cutoff:`. Alternatively, use `mix threadline.retention.purge`. Either way, always do a dry run (`--dry-run`) first. Run `MIX_ENV=prod mix threadline.retention.purge --execute` in production only after ops sign-off. Until `enabled: true` is set, programmatic calls return `{:error, :disabled}` and the Mix task raises.
- Monitor each run. Mix and library logs include batch indices and cumulative `deleted_changes` (and empty-transaction counts when applicable). Track the wall-clock duration per run and whether the final summary shows unused `max_batches` headroom.
- Cutoff clock, orphan `audit_transactions`, and empty-parent semantics are specified in `domain-reference.md` (Retention, Phase 13). Do not fork a second spec in this checklist.
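For the automation path above, a scheduled job might wrap `Threadline.Retention.purge/1` like this. This is a minimal sketch: `MyApp.AuditRetention` is hypothetical, and it relies only on the options and the `{:error, :disabled}` return described in this section. Any other return value is passed through and logged as-is. The purge function is an injectable argument so the wrapper can be exercised without a database. Dry run is the default, so a real purge has to be requested explicitly.

```elixir
defmodule MyApp.AuditRetention do
  @moduledoc """
  Hypothetical wrapper around Threadline.Retention.purge/1 for scheduled jobs.
  Defaults to dry_run: true; pass dry_run: false only after ops sign-off.
  """
  require Logger

  # Options documented for purge/1: repo: (required), dry_run:, batch_size:,
  # max_batches:, cutoff:. Caller-supplied opts override the defaults below.
  def run(repo, opts \\ [], purge \\ &Threadline.Retention.purge/1) do
    opts = Keyword.merge([repo: repo, dry_run: true, batch_size: 500], opts)

    case purge.(opts) do
      {:error, :disabled} ->
        Logger.warning("Threadline retention is disabled (enabled: true not set); nothing purged")
        {:error, :disabled}

      result ->
        # Other result shapes are not specified here, so they are logged whole.
        Logger.info("Threadline retention purge (dry_run=#{opts[:dry_run]}): #{inspect(result)}")
        result
    end
  end
end
```

Usage: `MyApp.AuditRetention.run(MyApp.Repo, max_batches: 20)` performs a dry run. Adding `dry_run: false` performs the real purge.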
5. Export and investigation
- [ ] Exports use the same filter keys as `Threadline.timeline/2` (`:repo`, `:table`, `:actor_ref`, `:from`, `:to`, `:correlation_id` only). Unknown keys raise `ArgumentError` with a message pointing at `Threadline.Query`.
- [ ] Large exports: respect the default `max_rows` and `truncated` metadata, or use `Threadline.Export.stream_changes/2` with `Stream.take/2` intentionally.
- [ ] Retention vs filters: `Threadline.timeline/2`, `mix threadline.export`, and correlation-heavy playbooks only return rows that still exist after your purge windows. Align `:from`/`:to`, `max_rows`, streaming (`Threadline.Export.stream_changes/2`), and `:correlation_id` investigations with the policy in §4 Retention and purge. Export behavior details live in `domain-reference.md` (Export, Phase 14).
- [ ] Operator Surface: if you mount the LiveView operator UI, protect it behind an authenticated admin pipeline with a strict `:authorize_fn` policy. See the Operator Surface guide for fail-closed requirements.
- [ ] Policy drift review: add `/audit/policy/redaction` or `mix threadline.policy.show` to the post-deploy operational pass where you already check trigger coverage, especially after redaction config changes.
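A job that builds export or timeline filters dynamically can check them against the documented key list before calling Threadline. That way a typo fails at the job boundary with a clear message. `MyApp.AuditFilters` is hypothetical and only mirrors the list in the first item. Threadline still performs its own validation and raises `ArgumentError` on unknown keys.

```elixir
defmodule MyApp.AuditFilters do
  @moduledoc """
  Hypothetical pre-flight check for export/timeline filter keyword lists.
  Mirrors the filter keys this checklist documents for Threadline.timeline/2.
  """

  @allowed [:repo, :table, :actor_ref, :from, :to, :correlation_id]

  def allowed_keys, do: @allowed

  # Returns the filters unchanged when every key is allowed; raises otherwise.
  def validate!(filters) when is_list(filters) do
    case Keyword.keys(filters) -- @allowed do
      [] ->
        filters

      unknown ->
        raise ArgumentError,
              "unsupported audit filter keys #{inspect(unknown)}; " <>
                "allowed: #{inspect(@allowed)} (see Threadline.Query)"
    end
  end
end
```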
6. Observability
- [ ] `:telemetry` handlers for Threadline events are attached wherever you need metrics or logs. Event names and measurements are listed in `domain-reference.md` (Telemetry). For per-event narrative and how health counts relate to coverage checks, see `domain-reference.md#trigger-coverage-operational`.
- [ ] Retention purge logs (`threadline retention purge batch`, etc.) are visible to operators when a purge runs.
7. Brownfield and continuity
- [ ] If tables already had rows before capture: read `brownfield-continuity.md`, run `mix threadline.continuity` where applicable, and document the honest “gap until first audited write” for stakeholders.
Support incident queries
Incident queries assume the audit rows are still within the retained window. Aggressive purges can make historical answers empty, so reconcile timelines with retention and purge settings before escalating missing data.
Pre-launch: confirm that operators can answer the five canonical support questions, plus the single-transaction incident drill-down (Q6). See domain-reference.md for full SQL and API notes. For a skimmable “which public API first?” map before diving into the playbooks, see domain-reference.md (Exploration API routing).
| Question (one line) | API / Mix | SQL |
|---|---|---|
| 1. Row history: PK in a time window | `Threadline.history/3`, `Threadline.Query.timeline/2` | Golden query in domain reference |
| 2. Actor window: one actor across tables | `Threadline.actor_history/2`, `timeline/2` + `:actor_ref` | Golden query |
| 3. Correlation bundle: shared `correlation_id` | `timeline/2`, `mix threadline.export` + `:correlation_id` | Inner-join SQL + strict semantics |
| 4. Export parity: same filters as timeline | `Threadline.Export`, `mix threadline.export` | Filter vocabulary |
| 5. Action ↔ capture: link semantics to rows | `Threadline.record_action/2`, `action_id` | Join pattern |
| 6. Single-transaction incident drill-down | Start with domain-reference.md (Exploration API routing) | Then follow the bundled incident story in incident-playbook.md |
- [ ] Q1 (row history): read the row history playbook (`audit_changes`, `audit_transactions`, bounded `captured_at`).
- [ ] Q2 (actor window): read the actor window playbook (`actor_ref` JSON, time bounds).
- [ ] Q3 (correlation): read the correlation bundle playbook. With `:correlation_id`, timeline/export return only changes whose transaction inner-joins an `audit_actions` row with that correlation (no orphan capture rows).
- [ ] Q4 (export parity): read the export parity notes; export accepts the same keys as `Threadline.Query.timeline/2`.
- [ ] Q5 (action ↔ capture): read the action/capture join notes (`audit_actions`, `action_id`, `audit_changes`).
- [ ] Q6 (single-transaction incident drill-down): start with exploration routing, then follow the bundled incident path in incident-playbook.md.
See also
- Adoption pilot backlog: the matrix for running this checklist in a real environment and filing issues with evidence.
- Domain reference: schema, retention semantics, export behavior.
- HexDocs: `Threadline`, `Threadline.Export`, `Threadline.Retention`, `Threadline.Query`.