Credence.Pattern.UnnecessaryGraphemeChunking.Unfixable (credence v0.4.0)

Detects inefficient string transformation pipelines that convert strings to graphemes or codepoints, perform chunking or grouping, and reconstruct strings from the result. These patterns cannot be automatically fixed.

This rule catches variants NOT covered by the fixable UnnecessaryGraphemeChunking rule, including:

  • Using String.codepoints/1 instead of String.graphemes/1 (String.slice/3 operates on graphemes, not codepoints)
  • Using Enum.chunk_by/2 (predicate-based grouping, not a sliding window)
  • Using Enum.split/2 (splits at an index, not a chunking operation)
  • Using Enum.chunk_every with step != 1 (non-standard window stride)
  • Using Enum.chunk_every without :discard as the leftover argument (the result includes incomplete trailing chunks, which a String.slice-based replacement would drop)
  • Using a map function other than Enum.join/1
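To see why these variants fall outside a mechanical String.slice/3 rewrite, the following sketch compares a few of them on small inputs. The module name VariantSketch and the input strings are illustrative assumptions and are not taken from Credence itself.

```elixir
defmodule VariantSketch do
  # "e" followed by a combining acute accent: one grapheme, two codepoints.
  def decomposed, do: "e\u0301"

  # Sliding window of size 2 that discards the incomplete tail.
  def windows_discard(string) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Same call with the default leftover ([]): the short trailing chunk is kept.
  def windows_keep_tail(string) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(2, 1)
    |> Enum.map(&Enum.join/1)
  end

  # Step equal to the chunk size: non-overlapping blocks, not a sliding window.
  def blocks(string) do
    string
    |> String.graphemes()
    |> Enum.chunk_every(2, 2)
    |> Enum.map(&Enum.join/1)
  end
end

IO.inspect(length(String.graphemes(VariantSketch.decomposed())))   # 1
IO.inspect(length(String.codepoints(VariantSketch.decomposed())))  # 2
IO.inspect(VariantSketch.windows_discard("abcd"))    # ["ab", "bc", "cd"]
IO.inspect(VariantSketch.windows_keep_tail("abcd"))  # ["ab", "bc", "cd", "d"]
IO.inspect(VariantSketch.blocks("abcd"))             # ["ab", "cd"]
```

Each variant yields a different list for the same input, so a single slice-based replacement cannot be assumed to preserve behavior.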

Why this is a problem

Elixir strings are UTF-8 binaries. Converting them into grapheme lists:

String.graphemes("café")
# => ["c", "a", "f", "é"]

creates a full intermediate structure in memory. If we then chunk and rebuild strings, we are effectively doing:

binary → list → list of lists → binaries

which increases memory usage, CPU cost, and GC pressure.
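The stages can be made visible by binding each intermediate value. This is a minimal sketch of the kind of pipeline being described, assuming a sliding window of size 2; the variable names are illustrative.

```elixir
string = "café"

# binary -> list: one list element per grapheme
graphemes = String.graphemes(string)
# ["c", "a", "f", "é"]

# list -> list of lists: each window is its own list
chunks = Enum.chunk_every(graphemes, 2, 1, :discard)
# [["c", "a"], ["a", "f"], ["f", "é"]]

# list of lists -> binaries: each window is joined into a new string
substrings = Enum.map(chunks, &Enum.join/1)
# ["ca", "af", "fé"]
```

All three intermediate values exist only to produce the final list of substrings.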

Alternatives that avoid some or all of these intermediate structures:

  1. Direct binary slicing (when possible):

    # The explicit //1 step keeps the range empty when n exceeds the string length
    for i <- 0..(String.length(string) - n)//1 do
      String.slice(string, i, n)
    end

  2. Single grapheme conversion (if Unicode safety is required):

    graphemes = String.graphemes(string)

    # The explicit //1 step keeps the range empty when n exceeds the list length
    for i <- 0..(length(graphemes) - n)//1 do
      graphemes |> Enum.slice(i, n) |> Enum.join()
    end

  3. Algorithmic restructuring — in many cases substring generation is not needed at all and can be replaced with streaming or incremental computation.
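As one illustration of incremental computation, consider a hypothetical task: finding the largest number of vowels in any window of n graphemes. The sketch below keeps a running count instead of building each substring. The module WindowSketch, the function max_vowels/2, and the task itself are assumptions made for this example, not part of Credence.

```elixir
defmodule WindowSketch do
  @vowels ~w(a e i o u)

  # Returns the largest vowel count over all windows of `n` graphemes,
  # or 0 when the string is shorter than `n`.
  def max_vowels(string, n) when is_binary(string) and is_integer(n) and n > 0 do
    # One pass over the graphemes: 1 for a vowel, 0 otherwise.
    flags = for g <- String.graphemes(string), do: if(g in @vowels, do: 1, else: 0)
    {first, rest} = Enum.split(flags, n)

    if length(first) < n do
      0
    else
      initial = Enum.sum(first)

      # Pair each flag entering the window with the flag leaving it,
      # then update the running count without rebuilding any substring.
      {_count, best} =
        rest
        |> Enum.zip(flags)
        |> Enum.reduce({initial, initial}, fn {entering, leaving}, {count, best} ->
          count = count + entering - leaving
          {count, max(count, best)}
        end)

      best
    end
  end
end

IO.inspect(WindowSketch.max_vowels("abciiidef", 3))  # 3
IO.inspect(WindowSketch.max_vowels("leetcode", 3))   # 2
```

The string is still converted to graphemes once, but no window is ever chunked or joined back into a binary.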