Credence.Pattern.PreferGraphemesForCharacterUniqueness (credence v0.8.0)

Readability & correctness rule: Detects the pattern String.to_charlist(s) |> Enum.uniq() |> Enum.count() |> (&(&1 == String.length(s))).().

String.to_charlist/1 decomposes a string into codepoints (integers), while String.length/1 counts graphemes (whole user-perceived characters). For decomposed Unicode (e.g. "é" written as "e" + U+0301) the two differ: there are more codepoints than graphemes, so the number of unique codepoints is compared against a length measured in a different unit and the check can give the wrong answer. Using String.graphemes/1 instead keeps both sides of the comparison grapheme-level.
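The mismatch can be seen on a single decomposed character. The following is a minimal sketch, not part of the rule itself; the variable names are chosen here for illustration, and then/2 assumes Elixir 1.12 or later.

```elixir
# "é" written as "e" followed by U+0301 (combining acute accent).
s = "e\u0301"

codepoints = String.to_charlist(s)   # list of integer codepoints
graphemes = String.graphemes(s)      # list of grapheme strings

# Codepoint-based check: unique codepoints compared against a grapheme count.
charlist_check =
  codepoints |> Enum.uniq() |> Enum.count() |> then(&(&1 == String.length(s)))

# Grapheme-based check: both sides are counted in graphemes.
grapheme_check =
  graphemes |> Enum.uniq() |> Enum.count() |> then(&(&1 == String.length(s)))

IO.inspect({length(codepoints), length(graphemes), charlist_check, grapheme_check})
# → {2, 1, false, true}
```

The string holds one character with no duplicates, yet the codepoint-based check returns false because it compares 2 unique codepoints with a length of 1.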

Under the single_codepoint_graphemes assumption (every grapheme is exactly one codepoint), the two decompositions agree, so the rewrite is safe.
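As a rough illustration of that assumption, the sketch below compares both forms on strings where every grapheme is a single codepoint. The helper names via_charlist and via_graphemes are hypothetical and exist only for this example.

```elixir
# Hypothetical helpers wrapping the two forms of the check.
via_charlist = fn s ->
  String.to_charlist(s) |> Enum.uniq() |> Enum.count() |> then(&(&1 == String.length(s)))
end

via_graphemes = fn s ->
  String.graphemes(s) |> Enum.uniq() |> Enum.count() |> then(&(&1 == String.length(s)))
end

# Every grapheme here is one codepoint ("\u00E9" is the precomposed "é").
inputs = ["abc", "hello", "caf\u00E9"]

# For each input, do the two forms return the same result?
agreement = Enum.map(inputs, fn s -> via_charlist.(s) == via_graphemes.(s) end)

IO.inspect(agreement)
# → [true, true, true]
```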

The fix also modernises the capture call idiom from (&expr).() to then(&expr), which reads more naturally in a pipeline.
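The two call idioms behave the same way; only the spelling differs. A small sketch, assuming Elixir 1.12 or later (where Kernel.then/2 is available) and using an arbitrary value for illustration:

```elixir
n = 3

# Older idiom: build an anonymous function and call it immediately.
a = n |> (&(&1 == 3)).()

# Same result with then/2, which passes the piped value to the function.
b = n |> then(&(&1 == 3))

IO.inspect({a, b})
# → {true, true}
```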

Bad (only rewritten while single_codepoint_graphemes is on)

String.to_charlist(s) |> Enum.uniq() |> Enum.count() |> (&(&1 == String.length(s))).()

Good

String.graphemes(s) |> Enum.uniq() |> Enum.count() |> then(&(&1 == String.length(s)))