ExDNA.Detection.Fuzzy (ExDNA v1.5.2)


Type-III (near-miss) clone detection.

Uses an inverted index on structural sub-hashes for candidate-pair generation, with MinHash acceleration for large posting lists.
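As a rough illustration of candidate-pair generation with an inverted index, the sketch below maps each sub-hash to the fragments that contain it and then emits every pair that shares a posting list. The module name InvertedIndexSketch and the fragment shape (a map with :id and :sub_hashes keys) are assumptions made for this example, not ExDNA's actual internals.

```elixir
defmodule InvertedIndexSketch do
  # Hypothetical fragment shape: %{id: term, sub_hashes: MapSet.t()}.
  # The real ExDNA fragment map may use different keys.

  # Build a map from each sub-hash to the list of fragment ids containing it
  # (the "posting list" for that sub-hash).
  def build(fragments) do
    Enum.reduce(fragments, %{}, fn %{id: id, sub_hashes: hashes}, index ->
      Enum.reduce(hashes, index, fn hash, acc ->
        Map.update(acc, hash, [id], &[id | &1])
      end)
    end)
  end

  # Collect every unordered pair of ids that co-occurs in at least one
  # posting list. The MapSet removes pairs found via several sub-hashes.
  def candidate_pairs(index) do
    for {_hash, ids} <- index,
        a <- ids,
        b <- ids,
        a < b,
        into: MapSet.new(),
        do: {a, b}
  end
end

fragments = [
  %{id: 1, sub_hashes: MapSet.new([:h1, :h2, :h3])},
  %{id: 2, sub_hashes: MapSet.new([:h2, :h3, :h4])},
  %{id: 3, sub_hashes: MapSet.new([:h9])}
]

pairs =
  fragments
  |> InvertedIndexSketch.build()
  |> InvertedIndexSketch.candidate_pairs()

IO.inspect(MapSet.to_list(pairs))
# => [{1, 2}]
```

Fragment 3 shares no sub-hash with the others, so it never becomes a candidate. Note that enumerating pairs inside one posting list is quadratic in the list's length, which is why very long posting lists need special handling.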

Each fragment carries a set of lightweight sub-hashes from its child subtrees (computed during fingerprinting). The inverted index identifies all pairs sharing at least one sub-hash. Small posting lists use exact Jaccard similarity as a pre-filter; large ones (longer than @lsh_cutover) switch to an O(k) MinHash estimate. This avoids quadratic blowup without the recall loss of a hard posting-list cap.
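The two similarity measures mentioned above can be sketched as follows. This is a minimal illustration, not ExDNA's implementation: the module name SimilaritySketch, the signature length k = 64, and the use of :erlang.phash2/2 with a seed as the hash family are all assumptions. The source does not define k; here it is taken to be the number of slots in a MinHash signature.

```elixir
defmodule SimilaritySketch do
  # Exact Jaccard similarity: |A ∩ B| / |A ∪ B|.
  def jaccard(a, b) do
    union = MapSet.size(MapSet.union(a, b))

    if union == 0 do
      0.0
    else
      MapSet.size(MapSet.intersection(a, b)) / union
    end
  end

  # MinHash signature: for each of k seeds, keep the minimum seeded hash
  # over the set. Assumes a non-empty set (Enum.min/1 raises on empty input).
  # :erlang.phash2/2 stands in for whatever hash family the library uses.
  def signature(set, k) do
    for seed <- 1..k do
      set
      |> Enum.map(&:erlang.phash2({seed, &1}, 4_294_967_296))
      |> Enum.min()
    end
  end

  # Fraction of signature slots that agree. Comparing two precomputed
  # signatures costs O(k), independent of the size of the original sets.
  def estimate(sig_a, sig_b) do
    matches = Enum.zip(sig_a, sig_b) |> Enum.count(fn {x, y} -> x == y end)
    matches / length(sig_a)
  end
end

a = MapSet.new([:h1, :h2, :h3])
b = MapSet.new([:h2, :h3, :h4])

exact = SimilaritySketch.jaccard(a, b)
IO.inspect(exact)
# => 0.5

sig_a = SimilaritySketch.signature(a, 64)
sig_b = SimilaritySketch.signature(b, 64)

# Approximates the exact value; accuracy improves as k grows.
approx = SimilaritySketch.estimate(sig_a, sig_b)
IO.inspect(approx)
```

In a scheme like this, each signature is computed once per fragment, so the per-pair cost in a large posting list drops to a fixed-size comparison, at the price of an approximate score.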

Summary

Functions

detect(fragments, min_similarity, exact_hashes, opts \\ [])

Find Type-III clones from a list of fragments at the given similarity threshold.

Types

fuzzy_opts()

@type fuzzy_opts() :: [{:mass_tolerance, float()}]

Functions

detect(fragments, min_similarity, exact_hashes, opts \\ [])

@spec detect([map()], float(), MapSet.t(), fuzzy_opts()) :: [
  ExDNA.Detection.Clone.t()
]

Find Type-III clones from a list of fragments at the given similarity threshold.