Per-row defensive sync for media_provider_assets rows that may have
missed a webhook. Called by Rindle.Workers.MuxSyncCoordinator (Phase 34
ships the cron coordinator; Phase 35 wires up webhook-driven sync).
Job Arguments
%{"provider_asset_id" => mux_asset_id}
Behavior
- Fetch the row by provider_asset_id.
- If the row is past the stuck threshold, transition to :errored with last_sync_error: "stuck in :<state> past threshold" and emit [:rindle, :provider, :sync, :stuck].
- Otherwise, call Rindle.Streaming.Provider.Mux.get_asset/1 and reconcile FSM/playback_ids. Emit [:rindle, :provider, :sync, :resolved].
- If Mux returns 404, transition to :errored with reason "mux asset not found" and emit :resolved (the row IS now reconciled with reality: there is no asset to wait for).
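
The branching described above can be sketched as a pure function. This is only an illustration, not the worker's actual code: the module name, the row shape (a map with state and updated_at), the return tuples, and the assumption that the provider lookup yields {:ok, asset} or {:error, :not_found} are all hypothetical. The lookup is passed in as a function so the sketch runs without Mux.

```elixir
defmodule SyncDecisionSketch do
  @moduledoc false
  # Hypothetical sketch; names and return shapes are assumptions,
  # not Rindle's real API.

  # Matches the documented default for :provider_stuck_threshold_seconds.
  @default_threshold_seconds 7200

  # row:   %{state: atom, updated_at: DateTime.t()}
  # now:   DateTime.t()
  # fetch: 0-arity function standing in for the provider lookup; assumed to
  #        return {:ok, asset} or {:error, :not_found}.
  def decide(row, now, fetch, threshold_seconds \\ @default_threshold_seconds) do
    age_seconds = DateTime.diff(now, row.updated_at, :second)

    if age_seconds > threshold_seconds do
      # Stuck rows are errored without calling the provider at all.
      {:stuck, :errored, "stuck in #{inspect(row.state)} past threshold"}
    else
      case fetch.() do
        # Asset exists: the caller reconciles FSM state and playback IDs.
        {:ok, asset} -> {:resolved, :reconcile, asset}
        # 404: the row is reconciled with reality by marking it errored.
        {:error, :not_found} -> {:resolved, :errored, "mux asset not found"}
      end
    end
  end
end
```

The first element of each tuple names the telemetry event suffix to emit, the second the action to take on the row.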
Telemetry Contract
- [:rindle, :provider, :sync, :resolved] fires on every successful get_asset/1 call (whether or not a state change occurred).
  measurements: %{system_time}
  metadata: %{profile, provider, asset_id, provider_state, age_ms, no_change}
- [:rindle, :provider, :sync, :stuck] fires when the time since the row's updated_at exceeds :provider_stuck_threshold_seconds (default 7200, i.e. 2 hours). Same metadata shape; provider_state reflects the row's final :errored state.
metadata.asset_id is the redacted last-4-char tag of the
provider_asset_id (security invariant 14, via
Rindle.Domain.MediaProviderAsset.redact_id/1).
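
As a rough illustration of the metadata shape, the sketch below builds the map from a row. It is not the library's code: the module, the row field names (profile, provider, state), and the redaction helper are hypothetical stand-ins, and the real redact_id/1 may format its tag differently. Only the metadata keys and the "last 4 characters" rule come from the contract above.

```elixir
defmodule SyncTelemetrySketch do
  @moduledoc false
  # Hypothetical sketch; row field names and the redaction format are assumptions.

  # Stand-in for the real redact_id/1: keeps only the last 4 characters.
  def redact_id(id) when is_binary(id) do
    id |> String.graphemes() |> Enum.take(-4) |> Enum.join()
  end

  # Builds the documented metadata map. age_ms is always measured from
  # the row's updated_at, never from the last sync attempt.
  def metadata(row, now, no_change) when is_boolean(no_change) do
    %{
      profile: row.profile,
      provider: row.provider,
      asset_id: redact_id(row.provider_asset_id),
      provider_state: row.state,
      age_ms: DateTime.diff(now, row.updated_at, :millisecond),
      no_change: no_change
    }
  end

  # The documented measurements map carries only system_time.
  def measurements, do: %{system_time: System.system_time()}
end
```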
age_ms semantics across :resolved and :stuck (WR-03 / POLISH-01/D-13)
metadata.age_ms is ALWAYS "time since the row's updated_at", never
"time since the last sync attempt". Because a no-op :resolved event
(live state already matches the row — see metadata.no_change: true)
performs no DB write, updated_at is unchanged and age_ms keeps growing
even while syncs are succeeding. The same metric therefore carries different
operational semantics depending on the event/branch:
- :resolved with no_change: false means a transition just happened, so age_ms measures the age of the row before this sync resolved it.
- :resolved with no_change: true means nothing changed, so age_ms reflects "time since the last actual state change", which legitimately balloons even though sync is healthy. Dashboards MUST gate on no_change (or use :stuck) before treating a large age_ms as a liveness problem.
- On :stuck, age_ms IS the threshold-driving liveness metric (the row exceeded :provider_stuck_threshold_seconds).
Filter :resolved events by no_change before alerting on age_ms; the
:stuck event is the canonical staleness signal.
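
A consumer-side gate might look like the sketch below. The module and function names are hypothetical, and treating :resolved with no_change: false as usable is one reading of the rules above rather than a documented requirement. The handler takes the same four arguments a :telemetry handler receives, so it could be attached with :telemetry.attach_many/4, but the block does not depend on that library.

```elixir
defmodule AgeAlertSketch do
  @moduledoc false
  # Hypothetical consumer-side sketch; not part of Rindle.

  @resolved [:rindle, :provider, :sync, :resolved]
  @stuck [:rindle, :provider, :sync, :stuck]

  # Returns the age_ms value that is safe to alert on, or nil when the
  # value must be ignored for liveness purposes.
  def alertable_age_ms(@stuck, %{age_ms: age_ms}), do: age_ms
  def alertable_age_ms(@resolved, %{no_change: true}), do: nil
  def alertable_age_ms(@resolved, %{no_change: false, age_ms: age_ms}), do: age_ms
  def alertable_age_ms(_event, _metadata), do: nil

  # Same shape as a :telemetry handler: (event, measurements, metadata, config).
  # Returns a tagged tuple instead of writing to a real metrics backend.
  def handle_event(event, _measurements, metadata, _config) do
    case alertable_age_ms(event, metadata) do
      nil -> :ignored
      age_ms -> {:record, age_ms}
    end
  end
end
```

With this gate, a healthy row that has not changed for days produces :ignored on every sync, while a :stuck event always surfaces its age_ms.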