Runs LLM-as-judge evaluations over dataset items.
--dataset <id>  ID of the dataset whose items are evaluated (a UUID, as in the example below)
mix scoria.eval --dataset 00000000-0000-0000-0000-000000000000