Ragex.Analysis.Duplication
(Ragex v0.11.0)
Code duplication detection using two complementary approaches.
Primary: AST-Based Detection (via Metastatic)
Delegates to Metastatic.Analysis.Duplication for precise clone detection:
- Type I: Exact clones (identical AST)
- Type II: Renamed clones (same structure, different identifiers)
- Type III: Near-miss clones (similar structure with modifications)
- Type IV: Semantic clones (different syntax, same behavior)
Works across different programming languages by comparing MetaAST representations.
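The difference between Type I and Type II clones can be illustrated with plain Elixir ASTs. The following is a minimal sketch, not Metastatic's implementation: `CloneSketch` and its functions are hypothetical names, it only handles Elixir source (no MetaAST, no cross-language support), and it only normalizes variable names, not function or module names.

```elixir
defmodule CloneSketch do
  # Hypothetical helper module for illustration; not part of Ragex or Metastatic.

  # Classifies two Elixir snippets as :type_i, :type_ii, or :no_match.
  def classify(source_a, source_b) do
    ast_a = parse!(source_a)
    ast_b = parse!(source_b)

    cond do
      # Type I: identical AST once metadata (line numbers) is stripped.
      ast_a == ast_b -> :type_i
      # Type II: identical AST after consistent variable renaming.
      rename_vars(ast_a) == rename_vars(ast_b) -> :type_ii
      true -> :no_match
    end
  end

  # Parses source and clears node metadata so only structure is compared.
  defp parse!(source) do
    source
    |> Code.string_to_quoted!()
    |> Macro.prewalk(fn node -> Macro.update_meta(node, fn _meta -> [] end) end)
  end

  # Renames variables to v0, v1, ... in order of first appearance, so the
  # same variable always maps to the same placeholder.
  defp rename_vars(ast) do
    {renamed, _seen} =
      Macro.prewalk(ast, %{}, fn
        {name, meta, ctx}, seen when is_atom(name) and is_atom(ctx) ->
          index = Map.get_lazy(seen, name, fn -> map_size(seen) end)
          {{:"v#{index}", meta, ctx}, Map.put(seen, name, index)}

        node, seen ->
          {node, seen}
      end)

    renamed
  end
end

original = "total = price * qty\ntotal + tax"
reformatted = "total   =   price*qty\n\ntotal + tax"
renamed = "sum = cost * n\nsum + fee"

CloneSketch.classify(original, reformatted) # differs only in whitespace
CloneSketch.classify(original, renamed)     # differs only in variable names
```

Renaming by order of first appearance, rather than replacing every variable with one placeholder, keeps `a + b` distinct from `a + a`.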
Secondary: Embedding-Based Detection
Uses existing semantic embeddings to find similar functions:
- Semantic similarity measured as cosine similarity between embeddings
- Configurable similarity threshold (default: 0.95)
- Complements AST-based detection
- Useful for finding "code smells" and refactoring opportunities
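The threshold check described above can be sketched in a few lines. This is an illustrative example only: `CosineSketch` is a hypothetical module, embeddings are assumed to be non-zero lists of floats of equal length, and how Ragex stores or compares its embeddings is not shown in this documentation.

```elixir
defmodule CosineSketch do
  # Hypothetical helper module for illustration; not part of Ragex.

  # Cosine similarity of two equal-length, non-zero vectors.
  def cosine_similarity(a, b) when length(a) == length(b) do
    dot =
      a
      |> Enum.zip(b)
      |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)

    dot / (norm(a) * norm(b))
  end

  # Two embeddings count as similar when their score reaches the threshold.
  def similar?(a, b, threshold \\ 0.95) do
    cosine_similarity(a, b) >= threshold
  end

  # Euclidean length of a vector.
  defp norm(v) do
    v
    |> Enum.reduce(0.0, fn x, acc -> acc + x * x end)
    |> :math.sqrt()
  end
end

CosineSketch.similar?([1.0, 2.0, 3.0], [1.0, 2.0, 3.1]) # nearly parallel vectors
CosineSketch.similar?([1.0, 0.0], [1.0, 1.0])           # 45 degrees apart
```

With a threshold as high as 0.95, only nearly parallel embedding vectors qualify, which is why this method is described as a complement to AST comparison rather than a replacement.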
Usage
alias Ragex.Analysis.Duplication
# AST-based detection (via Metastatic)
{:ok, result} = Duplication.detect_in_files(["lib/a.ex", "lib/b.ex"])
# Embedding-based detection
{:ok, similar} = Duplication.find_similar_functions(threshold: 0.95)
# Detect in directory
{:ok, results} = Duplication.detect_in_directory("lib/")
Types
@type clone_pair() :: %{ file1: String.t(), file2: String.t(), clone_type: clone_type(), similarity: float(), details: map() }
@type clone_type() :: :type_i | :type_ii | :type_iii | :type_iv
@type function_ref() :: {:function, module(), atom(), non_neg_integer()}
@type similar_pair() :: %{ function1: function_ref(), function2: function_ref(), similarity: float(), method: :embedding | :ast }
Functions
@spec detect_between_files(String.t(), String.t(), keyword()) :: {:ok, Metastatic.Analysis.Duplication.Result.t()} | {:error, term()}
Detects duplicates between two files using Metastatic's AST comparison.
Parameters
- file1_path - Path to first file
- file2_path - Path to second file
- opts - Keyword list of options
  - :threshold - Similarity threshold for Type III (default: 0.8)
  - :min_tokens - Minimum tokens for detection (default: 5)
  - :cross_language - Enable cross-language detection (default: true)
Returns
- {:ok, result} - Metastatic.Analysis.Duplication.Result struct
- {:error, reason} - Error if analysis fails
Examples
{:ok, result} = Duplication.detect_between_files("lib/a.ex", "lib/b.ex")
if result.duplicate? do
  IO.puts("Found #{result.clone_type} clone")
end
@spec detect_in_directory(String.t(), keyword()) :: {:ok, [clone_pair()]} | {:error, term()}
Detects duplicates in all supported files within a directory.
Recursively scans the directory for supported file types and detects duplicates using Metastatic's AST comparison.
Parameters
- directory - Path to directory
- opts - Keyword list of options
  - :recursive - Recursively scan subdirectories (default: true)
  - :threshold - Similarity threshold (default: 0.8)
  - :exclude_patterns - List of patterns to exclude (default: ["_build", "deps", ".git"])
Returns
- {:ok, [clone_pair]} - List of detected clone pairs
- {:error, reason} - Error if analysis fails
Examples
{:ok, clones} = Duplication.detect_in_directory("lib/")
IO.puts("Found #{length(clones)} duplicate pairs")
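How `:exclude_patterns` is matched against paths is not specified here. As a rough sketch, assuming a pattern excludes a file when it equals one of the path's segments, the filtering step could look like this. `ExcludeSketch` is a hypothetical module and Ragex's actual matching rule may differ (for example, substring or glob matching).

```elixir
defmodule ExcludeSketch do
  # Hypothetical helper module for illustration; not part of Ragex.

  @default_excludes ["_build", "deps", ".git"]

  # Assumption: a path is excluded when any pattern equals one of its segments.
  def excluded?(path, patterns \\ @default_excludes) do
    segments = Path.split(path)
    Enum.any?(patterns, fn pattern -> pattern in segments end)
  end

  # Keeps only the paths that no pattern excludes.
  def filter(paths, patterns \\ @default_excludes) do
    Enum.reject(paths, fn path -> excluded?(path, patterns) end)
  end
end

ExcludeSketch.filter([
  "lib/a.ex",
  "deps/foo/lib/foo.ex",
  "_build/dev/lib/x.ex",
  "lib/deps_helper.ex"
])
# → ["lib/a.ex", "lib/deps_helper.ex"]
```

Matching on whole segments keeps a file such as `lib/deps_helper.ex` in scope, which a plain substring check on `"deps"` would drop.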
@spec detect_in_files([String.t()], keyword()) :: {:ok, [clone_pair()]} | {:error, term()}
Detects duplicates across multiple files.
Returns a list of clone pairs found across the provided files.
Parameters
- file_paths - List of file paths to analyze
- opts - Keyword list of options (same as detect_between_files/3)
  - :ai_analyze - Use AI for semantic analysis (default: from config)
Returns
- {:ok, [clone_pair]} - List of detected clone pairs
- {:error, reason} - Error if analysis fails
Examples
{:ok, clones} = Duplication.detect_in_files(["lib/a.ex", "lib/b.ex", "lib/c.ex"])
Enum.each(clones, fn clone ->
  IO.puts("#{clone.file1} <-> #{clone.file2}: #{clone.clone_type}")
end)
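Because each clone pair names two files, one plausible way to cover a list of files is to compare every unordered pair. The sketch below shows only that enumeration step; `PairSketch` is a hypothetical module, and this documentation does not say whether Ragex enumerates pairs this way.

```elixir
defmodule PairSketch do
  # Hypothetical helper module for illustration; not part of Ragex.

  # Returns every unordered pair of distinct list elements, in input order.
  def unordered_pairs([]), do: []

  def unordered_pairs([head | tail]) do
    Enum.map(tail, fn other -> {head, other} end) ++ unordered_pairs(tail)
  end
end

PairSketch.unordered_pairs(["lib/a.ex", "lib/b.ex", "lib/c.ex"])
# → [{"lib/a.ex", "lib/b.ex"}, {"lib/a.ex", "lib/c.ex"}, {"lib/b.ex", "lib/c.ex"}]
```

Under this assumption the number of comparisons grows as n(n-1)/2, so large file lists get expensive quickly.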
Finds code duplicates in a directory.
Alias for detect_in_directory/2. Provided for API consistency with mix tasks.
Examples
{:ok, duplicates} = Duplication.find_duplicates("lib/", threshold: 0.85)
@spec find_similar_functions(keyword()) :: {:ok, [similar_pair()]} | {:error, term()}
Finds similar functions using semantic embeddings.
This is a complementary approach to AST-based detection. Uses cosine similarity on function embeddings to find semantically similar code.
Parameters
- opts - Keyword list of options
  - :threshold - Similarity threshold (0.0-1.0, default: 0.95)
  - :limit - Maximum number of pairs to return (default: 100)
  - :node_type - Type of node to compare (default: :function)
Returns
- {:ok, [similar_pair]} - List of similar function pairs
- {:error, reason} - Error if analysis fails
Examples
{:ok, similar} = Duplication.find_similar_functions(threshold: 0.95)
Enum.each(similar, fn pair ->
  IO.puts("#{format_function(pair.function1)} ~ #{format_function(pair.function2)}")
  IO.puts(" Similarity: #{pair.similarity}")
end)
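The example above calls `format_function/1`, which is not defined on this page. A minimal sketch of such a helper, based on the `function_ref()` type, might look like the following. It also shows how the `:threshold` and `:limit` options could be applied to a list of `similar_pair()` maps. `SimilarSketch` is a hypothetical module, and sorting by descending similarity is an assumption rather than documented behavior.

```elixir
defmodule SimilarSketch do
  # Hypothetical helper module for illustration; not part of Ragex.

  # Renders a function_ref() tuple as "Module.name/arity".
  def format_function({:function, module, name, arity}) do
    "#{inspect(module)}.#{name}/#{arity}"
  end

  # Keeps pairs at or above the threshold, highest similarity first,
  # capped at the limit. The ordering is an assumption.
  def top_pairs(pairs, opts \\ []) do
    threshold = Keyword.get(opts, :threshold, 0.95)
    limit = Keyword.get(opts, :limit, 100)

    pairs
    |> Enum.filter(fn pair -> pair.similarity >= threshold end)
    |> Enum.sort_by(fn pair -> pair.similarity end, :desc)
    |> Enum.take(limit)
  end
end

SimilarSketch.format_function({:function, String, :upcase, 1})
# → "String.upcase/1"
```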
Generates a duplication report for a project.
Combines both AST-based and embedding-based detection to provide a comprehensive view of code duplication.
Parameters
- directory - Path to project directory
- opts - Keyword list of options
  - :include_embeddings - Include embedding-based results (default: true)
  - :format - Output format (:summary, :detailed, :json, default: :summary)
Returns
- {:ok, report} - Duplication report map
- {:error, reason} - Error if analysis fails
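The shape of the report map is not documented here. As a rough illustration of the kind of aggregation a summary could contain, the sketch below counts `clone_pair()` maps by clone type. `ReportSketch` and the `:total` and `:by_type` keys are hypothetical and do not describe the actual report structure.

```elixir
defmodule ReportSketch do
  # Hypothetical helper module for illustration; not part of Ragex.

  # Counts clone pairs overall and per clone_type.
  def summarize(clone_pairs) do
    %{
      total: length(clone_pairs),
      by_type: Enum.frequencies_by(clone_pairs, fn pair -> pair.clone_type end)
    }
  end
end

ReportSketch.summarize([
  %{clone_type: :type_i},
  %{clone_type: :type_ii},
  %{clone_type: :type_i}
])
# → %{total: 3, by_type: %{type_i: 2, type_ii: 1}}
```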