Credence.Rule.UnnecessaryGraphemeChunking
(credence v0.2.0)
Detects inefficient string transformation pipelines that:
- Convert a UTF-8 binary into graphemes or codepoints
- Perform chunking or grouping operations on the resulting list
- Immediately reconstruct strings from those chunks
This pattern often indicates unnecessary intermediate allocations: binary → list → list of lists → binaries
While correct, this transformation is usually avoidable and can often be replaced with a more direct sliding-window or binary-based approach.
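As a rough illustration of what a binary-based approach can look like, the sketch below takes fixed-width windows straight from the binary with `binary_part/3`. The module and function names (`ByteWindows.windows/2`) are hypothetical and not part of Credence, and the sketch assumes every character is a single byte (for example, ASCII-only input); on multi-byte UTF-8 it would cut characters apart.

```elixir
defmodule ByteWindows do
  # Hypothetical helper (not part of Credence): every n-byte window of `string`.
  # Assumption: each character occupies exactly one byte (e.g. ASCII input).
  def windows(string, n) when is_binary(string) and is_integer(n) and n > 0 do
    last_start = byte_size(string) - n

    # The explicit //1 step keeps the range empty when the string is shorter than n.
    for i <- 0..last_start//1, do: binary_part(string, i, n)
  end
end

ByteWindows.windows("abcde", 3)
# => ["abc", "bcd", "cde"]
```

No grapheme list or list of lists is built here; each window is taken directly from the original binary.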
Why this is a problem
Elixir strings are UTF-8 binaries. Converting them into grapheme lists:
String.graphemes("café")
# => ["c", "a", "f", "é"]

creates a full intermediate structure in memory. If we then chunk and rebuild strings, we are effectively doing:

binary → list → list of lists → binaries

which increases:
- memory usage (multiple allocations)
- CPU cost (repeated traversal)
- garbage collection pressure
Example (flagged)
string
|> String.graphemes()
|> Enum.chunk_every(3, 1, :discard)
|> Enum.map(&Enum.join/1)

This:
- expands the entire string into a list
- builds overlapping sublists
- reconstructs each substring separately
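To make the three steps above concrete, here is the same pipeline split into named intermediate values for a short sample input. The variable names and the sample string are illustrative only.

```elixir
string = "café!"

# Step 1: the whole string is expanded into a list of graphemes.
graphemes = String.graphemes(string)
# => ["c", "a", "f", "é", "!"]

# Step 2: overlapping sublists of length 3 are built, one per start position.
windows = Enum.chunk_every(graphemes, 3, 1, :discard)
# => [["c", "a", "f"], ["a", "f", "é"], ["f", "é", "!"]]

# Step 3: every sublist is joined back into its own binary.
trigrams = Enum.map(windows, &Enum.join/1)
# => ["caf", "afé", "fé!"]
```

Each grapheme except those near the ends appears in three sublists, so the intermediate data is roughly three times the size of the grapheme list.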
Better alternatives
1. Direct binary slicing (preferred when valid)
# The explicit //1 step keeps the range empty when the string is shorter than n.
for i <- 0..(String.length(string) - n)//1 do
  String.slice(string, i, n)
end

2. Single grapheme conversion (if Unicode safety is required)
graphemes = String.graphemes(string)
# The explicit //1 step keeps the range empty when there are fewer than n graphemes.
for i <- 0..(length(graphemes) - n)//1 do
  graphemes
  |> Enum.slice(i, n)
  |> Enum.join()
end

3. Algorithmic restructuring
In many cases, substring generation is not needed at all and can be replaced with a streaming or incremental computation.
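One possible shape of such an incremental computation is sketched below. Suppose the windows were only ever used to ask whether the string contains `n` identical graphemes in a row. That question can be answered in a single pass with `String.next_grapheme/1`, carrying only the previous grapheme and a run counter. The module `RunCheck` and its function are hypothetical names chosen for this example.

```elixir
defmodule RunCheck do
  # Hypothetical helper (not part of Credence): true if `string` contains
  # at least `n` identical graphemes in a row. No windows are materialized.
  def has_run?(string, n) when is_binary(string) and is_integer(n) and n > 0 do
    walk(String.next_grapheme(string), nil, 0, n)
  end

  # End of input without finding a long enough run.
  defp walk(nil, _prev, _count, _n), do: false

  defp walk({grapheme, rest}, prev, count, n) do
    # Extend the current run, or start a new one at length 1.
    count = if grapheme == prev, do: count + 1, else: 1

    if count >= n do
      true
    else
      walk(String.next_grapheme(rest), grapheme, count, n)
    end
  end
end

RunCheck.has_run?("aabbbcc", 3)
# => true
```

The traversal stops as soon as a run is found, and the only state kept between steps is one grapheme and one integer.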
When NOT to flag
- Small input sizes where clarity is more important than performance
- One-off transformations in scripts or tests
- Cases where grapheme correctness is explicitly required and simplicity is preferred