Credence.Pattern.UnnecessaryGraphemeChunking.Unfixable
(credence v0.5.0)
Detects inefficient string transformation pipelines that convert strings to graphemes or codepoints, perform chunking or grouping, and reconstruct strings from the result. These patterns cannot be automatically fixed.
This rule catches variants NOT covered by the fixable UnnecessaryGraphemeChunking rule, including:
- Using String.codepoints/1 instead of String.graphemes/1 (String.slice/3 operates on graphemes, not codepoints)
- Using Enum.chunk_by/2 (predicate-based grouping, not a sliding window)
- Using Enum.split/2 (splits at an index, not a chunking operation)
- Using Enum.chunk_every with step != 1 (non-standard window stride)
- Using Enum.chunk_every with :trim leftover (includes incomplete trailing chunks, which String.slice-based replacement would drop)
- Using a map function other than Enum.join/1
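To make the distinctions in the list above concrete, here is a minimal sketch that compares a plain sliding window with two of the listed variants. The input strings and variable names are illustrative assumptions, and the sketch does not claim to reproduce the exact shapes Credence matches. It only shows why a String.slice-based rewrite would not produce the same output for these variants.

```elixir
graphemes = String.graphemes("abcde")

# Plain sliding window: step 1, incomplete trailing chunks discarded.
sliding =
  graphemes
  |> Enum.chunk_every(2, 1, :discard)
  |> Enum.map(&Enum.join/1)

# Stride of 2 with the default leftover ([]): chunks do not overlap and
# the incomplete trailing chunk ["e"] is kept.
strided =
  graphemes
  |> Enum.chunk_every(2, 2)
  |> Enum.map(&Enum.join/1)

# Predicate-based grouping: chunk sizes depend on the data, not on a window.
grouped =
  "aabccc"
  |> String.graphemes()
  |> Enum.chunk_by(& &1)
  |> Enum.map(&Enum.join/1)

IO.inspect(sliding, label: "sliding")
IO.inspect(strided, label: "strided")
IO.inspect(grouped, label: "grouped")
```

Only the first pipeline yields fixed-size overlapping substrings. The other two produce results that a fixed-length slice at every index cannot express.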
Why this is a problem
Elixir strings are UTF-8 binaries. Converting them into grapheme lists:

    String.graphemes("café")
    # => ["c", "a", "f", "é"]

creates a full intermediate structure in memory. If we then chunk and rebuild strings, we are effectively doing:

    binary → list → list of lists → binaries

which increases memory usage, CPU cost, and GC pressure.
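The stages of that pipeline can be traced one step at a time. The sketch below is illustrative only: the window size of 2 and the variable names are assumptions, and the input is written with an explicit escape so that "é" is a single precomposed code point.

```elixir
# "café" with a precomposed é (U+00E9): 5 bytes, 4 graphemes.
string = "caf\u00E9"

# Stage 1: binary -> list of single-grapheme binaries.
graphemes = String.graphemes(string)

# Stage 2: list -> list of lists (one inner list per window).
chunks = Enum.chunk_every(graphemes, 2, 1, :discard)

# Stage 3: list of lists -> freshly allocated binaries.
substrings = Enum.map(chunks, &Enum.join/1)

IO.inspect(byte_size(string), label: "bytes in the original binary")
IO.inspect(graphemes, label: "stage 1")
IO.inspect(chunks, label: "stage 2")
IO.inspect(substrings, label: "stage 3")
```

Each stage allocates a new structure whose size grows with the input, and stages 1 and 2 are discarded as soon as stage 3 is built.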
Recommended alternatives
Direct binary slicing (when possible):
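    # The explicit //1 step (Elixir 1.12+) keeps the range empty when
    # n is larger than the string, instead of counting down from 0.
    for i <- 0..(String.length(string) - n)//1 do
      String.slice(string, i, n)
    end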
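As a usage sketch, the slicing loop can be wrapped in a small function. The module name SliceWindows and the function windows/2 are hypothetical, not part of Credence. The explicit //1 range step is an addition made here so that a window longer than the string yields an empty list. It assumes Elixir 1.12 or later.

```elixir
defmodule SliceWindows do
  # Hypothetical helper: every substring of n graphemes, in order.
  def windows(string, n) when is_binary(string) and is_integer(n) and n > 0 do
    # 0..-1//1 is an empty range, so n > String.length(string) returns [].
    for i <- 0..(String.length(string) - n)//1 do
      String.slice(string, i, n)
    end
  end
end

IO.inspect(SliceWindows.windows("caf\u00E9", 2))
IO.inspect(SliceWindows.windows("ab", 3))
```

Without a step, a range such as 0..-1 counts downward, so the loop would run with indices 0 and -1 rather than being skipped.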
Single grapheme conversion (if Unicode safety is required):
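    # Convert once, then slice the list for each window.
    graphemes = String.graphemes(string)

    # //1 keeps the range empty when n exceeds the grapheme count.
    for i <- 0..(length(graphemes) - n)//1 do
      graphemes |> Enum.slice(i, n) |> Enum.join()
    end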
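The Unicode-safety point is easiest to see with a decomposed character. The sketch below wraps the single-conversion approach in a hypothetical GraphemeWindows.windows/2 (an illustrative name, not a Credence API) and runs it on "café" spelled with "e" followed by a combining acute accent (U+0301). The //1 range step is an assumption added here to handle windows longer than the input.

```elixir
defmodule GraphemeWindows do
  # Hypothetical helper: convert to graphemes once, then slice the list.
  def windows(string, n) when is_binary(string) and is_integer(n) and n > 0 do
    graphemes = String.graphemes(string)

    for i <- 0..(length(graphemes) - n)//1 do
      graphemes |> Enum.slice(i, n) |> Enum.join()
    end
  end
end

# "e" + U+0301 is two code points but a single grapheme.
decomposed = "cafe\u0301"

IO.inspect(length(String.codepoints(decomposed)), label: "codepoints")
IO.inspect(length(String.graphemes(decomposed)), label: "graphemes")
IO.inspect(GraphemeWindows.windows(decomposed, 2), label: "windows")
```

Because the windows are built from graphemes, the accent stays attached to its base letter. A codepoint-based pipeline would split them into separate elements.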
Algorithmic restructuring: in many cases substring generation is not needed at all and can be replaced with streaming or incremental computation.
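As one possible illustration of incremental computation, suppose the goal is only to know whether any grapheme is immediately repeated. That question can be answered by walking the string with String.next_grapheme/1 and remembering the previous grapheme, without building any substrings. The module AdjacentPairs and its functions are hypothetical and stand in for whatever the real computation is.

```elixir
defmodule AdjacentPairs do
  # Hypothetical example: true if some grapheme is immediately repeated.
  def repeated?(string) when is_binary(string) do
    walk(String.next_grapheme(string), nil)
  end

  # End of string: no repeated pair was found.
  defp walk(nil, _prev), do: false
  # Current grapheme equals the previous one: stop early.
  defp walk({g, _rest}, g), do: true
  # Otherwise keep walking, carrying the current grapheme forward.
  defp walk({g, rest}, _prev), do: walk(String.next_grapheme(rest), g)
end

IO.inspect(AdjacentPairs.repeated?("hello"))
IO.inspect(AdjacentPairs.repeated?("caf\u00E9"))
```

The walk holds one grapheme of state at a time and stops at the first match, so no grapheme list, chunk list, or joined substrings are ever allocated.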