Rindle.Workers.MuxSyncProviderAsset (Rindle v0.1.6)


Per-row defensive sync for media_provider_assets rows that may have missed a webhook. Called by Rindle.Workers.MuxSyncCoordinator (Phase 34 ships the cron coordinator; Phase 35 wires up webhook-driven sync).

Job Arguments

%{"provider_asset_id" => mux_asset_id}
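A minimal sketch of producing and consuming this args map. The module and both functions are hypothetical illustrations, not part of Rindle. The point is that the key is the string "provider_asset_id", so an atom-keyed map would not match.

```elixir
defmodule MuxSyncArgsSketch do
  # Hypothetical helpers for illustration only; neither function exists in Rindle.

  # Builds the documented job-args map (string key, as documented).
  def build(mux_asset_id) when is_binary(mux_asset_id) and mux_asset_id != "" do
    %{"provider_asset_id" => mux_asset_id}
  end

  # Pattern-matches the id out of the args, as a worker might before fetching the row.
  def fetch_id(%{"provider_asset_id" => id}) when is_binary(id), do: {:ok, id}
  def fetch_id(_args), do: {:error, :missing_provider_asset_id}
end

# Usage: the string-keyed map round-trips; an atom-keyed map does not match.
{:ok, "abc123"} = "abc123" |> MuxSyncArgsSketch.build() |> MuxSyncArgsSketch.fetch_id()
{:error, :missing_provider_asset_id} = MuxSyncArgsSketch.fetch_id(%{provider_asset_id: "abc123"})
```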

Behavior

  1. Fetch row by provider_asset_id.
  2. If the row's updated_at is older than :provider_stuck_threshold_seconds (default 7200), transition to :errored with last_sync_error: "stuck in :<state> past threshold" and emit [:rindle, :provider, :sync, :stuck].
  3. Otherwise, call Rindle.Streaming.Provider.Mux.get_asset/1 and reconcile FSM/playback_ids. Emit [:rindle, :provider, :sync, :resolved].
  4. If Mux returns 404, transition to :errored with reason "mux asset not found" and emit :resolved (the row IS now reconciled with reality — there is no asset to wait for).
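Steps 2 to 4 can be modelled as a pure decision function. This is a hypothetical sketch, not Rindle's implementation: the row is a plain map with :provider_asset_id, :state and :updated_at, and fetch_live stands in for Rindle.Streaming.Provider.Mux.get_asset/1, assumed here to return {:ok, live_state} or {:error, :not_found}. Other get_asset/1 failures are not described above, so the sketch does not model them (they would raise a CaseClauseError).

```elixir
defmodule MuxSyncDecisionSketch do
  # Hypothetical model of steps 2-4; returns {new_state, last_sync_error, event_kind}.
  @default_threshold_seconds 7200

  def decide(row, now, fetch_live, threshold_seconds \\ @default_threshold_seconds) do
    age_seconds = DateTime.diff(now, row.updated_at, :second)

    if age_seconds > threshold_seconds do
      # Step 2: past the stuck threshold, so Mux is never called.
      {:errored, "stuck in :#{row.state} past threshold", :stuck}
    else
      case fetch_live.(row.provider_asset_id) do
        # Step 4: a 404 still counts as resolved; there is no asset to wait for.
        {:error, :not_found} -> {:errored, "mux asset not found", :resolved}
        # Step 3: reconcile with the live state; :resolved fires either way.
        {:ok, live_state} -> {live_state, nil, :resolved}
      end
    end
  end
end
```

Checking the threshold before calling Mux means a stuck row costs no API request.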

Telemetry Contract

  • [:rindle, :provider, :sync, :resolved] fires whenever a get_asset/1 call reconciles the row, including the 404 case in step 4 (whether or not a state change occurred).

    measurements: %{system_time}
    metadata:     %{profile, provider, asset_id, provider_state, age_ms, no_change}
  • [:rindle, :provider, :sync, :stuck] — fires when the row's updated_at exceeds :provider_stuck_threshold_seconds (default 7200). Same metadata shape; provider_state reflects the row's final :errored state.
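A sketch of the payload shape both events share. The module is hypothetical, and the metadata values in the test are placeholders. A real emitter would pass the resulting triple to :telemetry.execute/3 from the :telemetry package; this sketch only builds it, so it runs without that dependency.

```elixir
defmodule MuxSyncEventSketch do
  # Hypothetical helper: builds {event_name, measurements, metadata} per the contract.
  @metadata_keys [:profile, :provider, :asset_id, :provider_state, :age_ms, :no_change]

  def build(kind, metadata) when kind in [:resolved, :stuck] do
    # Both events carry the same metadata shape, so reject incomplete maps.
    missing = @metadata_keys -- Map.keys(metadata)

    if missing != [] do
      raise ArgumentError, "missing metadata keys: #{inspect(missing)}"
    end

    event = [:rindle, :provider, :sync, kind]
    measurements = %{system_time: System.system_time()}
    {event, measurements, Map.take(metadata, @metadata_keys)}
  end
end
```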

metadata.asset_id is the redacted last-4-char tag of the provider_asset_id (security invariant 14, via Rindle.Domain.MediaProviderAsset.redact_id/1).
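To illustrate the invariant, here is a hypothetical stand-in for redact_id/1. Only "last 4 characters" is documented. The "****" prefix and the handling of ids of 4 characters or fewer are assumptions, so consult Rindle.Domain.MediaProviderAsset.redact_id/1 for the real format.

```elixir
defmodule RedactSketch do
  # Hypothetical; the "****" prefix and short-id rule are assumptions, not Rindle's format.
  def redact_id(id) when is_binary(id) do
    if String.length(id) <= 4 do
      # Keeping a 4-char suffix here would reveal the whole id.
      "****"
    else
      "****" <> String.slice(id, -4, 4)
    end
  end
end
```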

age_ms semantics across :resolved and :stuck (WR-03 / POLISH-01/D-13)

metadata.age_ms is ALWAYS "time since the row's updated_at", never "time since the last sync attempt". Because a no-op :resolved event (live state already matches the row; see metadata.no_change: true) performs no DB write, updated_at is unchanged, so age_ms keeps growing even while syncs succeed. The same metric therefore carries different operational meanings depending on the event and branch:

  • :resolved with no_change: false — a transition just happened, so age_ms measures the age of the row before this sync resolved it.
  • :resolved with no_change: true — nothing changed, so age_ms reflects "time since the last actual state change", which legitimately balloons even though sync is healthy. Dashboards MUST gate on no_change (or use :stuck) before treating a large age_ms as a liveness problem.
  • :stuck, where age_ms IS the threshold-driving liveness metric (the row exceeded :provider_stuck_threshold_seconds).
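The growth of age_ms across healthy no-op syncs can be shown in a few lines. This is a hypothetical sketch, not Rindle's code: because age_ms is measured from updated_at, two successful no-op syncs an hour apart report one and two hours respectively.

```elixir
defmodule AgeMsSketch do
  # Hypothetical illustration: age_ms is measured from updated_at, not the last sync.
  def age_ms(%DateTime{} = updated_at, %DateTime{} = now) do
    DateTime.diff(now, updated_at, :millisecond)
  end

  # What a given age_ms means, per event and no_change flag.
  def meaning(:resolved, %{no_change: false}), do: :age_before_this_transition
  def meaning(:resolved, %{no_change: true}), do: :time_since_last_state_change
  def meaning(:stuck, _metadata), do: :liveness_threshold_exceeded
end

# Two healthy no-op syncs: no DB write, so updated_at stays put and age_ms grows.
updated_at = ~U[2024-01-01 00:00:00Z]
first = AgeMsSketch.age_ms(updated_at, ~U[2024-01-01 01:00:00Z])
second = AgeMsSketch.age_ms(updated_at, ~U[2024-01-01 02:00:00Z])
```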

Filter :resolved events by no_change before alerting on age_ms; the :stuck event is the canonical staleness signal.
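A consumer-side filter following that rule might look like the sketch below. The module and the threshold_ms parameter (a dashboard setting) are hypothetical. In a real application a :telemetry handler, attached with :telemetry.attach_many/4 to both event names, would call classify/3 and act on the result. It is kept pure here so it runs without the :telemetry package.

```elixir
defmodule SyncAlertFilterSketch do
  # Hypothetical alerting filter; threshold_ms is an assumed dashboard setting.
  @resolved [:rindle, :provider, :sync, :resolved]
  @stuck [:rindle, :provider, :sync, :stuck]

  # :stuck is the canonical staleness signal, so always alert on it.
  def classify(@stuck, _metadata, _threshold_ms), do: :alert

  # A no-op resolve only means nothing changed; its age_ms is not a liveness signal.
  def classify(@resolved, %{no_change: true}, _threshold_ms), do: :ignore

  # A real transition: age_ms is the row's age before this sync resolved it.
  def classify(@resolved, %{no_change: false, age_ms: age}, threshold_ms)
      when age > threshold_ms,
      do: :alert

  def classify(_event, _metadata, _threshold_ms), do: :ignore
end
```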